ESMFold2
Biohub ESMFold2 all-atom structure prediction using the ESMC-6B language model.
How SubSeq Runs It
- Jobs run /ref/bin/esmfold2_subseq.py in an A100 profile.
- The wrapper is mounted from the ESMFold2 ref tree, while model weights and the Hugging Face cache are loaded from the A100-local /localref mirror.
- The model root is forced to /localref/models; --model-preset=fast is the default for normal FASTA jobs unless a preset is specified.
- Network access is disabled for jobs, and output is forced to /outputs.
- Each FASTA file is folded as one complex. Multiple records in one file become multiple protein chains.
Input
Upload one or more FASTA files under /inputs. A single-chain protein can be one record; a protein complex can be one FASTA file with multiple records.
>A
MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVLRLRGG
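Because every record in one FASTA file becomes a chain of the same complex, it can help to generate multi-chain files programmatically. The following is a minimal Python sketch, not part of the wrapper: the helper names write_complex_fasta and count_records and the file name are hypothetical, and the whitespace stripping and upper-casing are assumptions about what a clean input looks like.

```python
from pathlib import Path


def write_complex_fasta(path, chains):
    """Write one FASTA file; each record is intended to become one chain.

    `chains` maps a record id (for example "A") to a sequence string.
    Whitespace is stripped and letters are upper-cased (an assumption,
    not a documented requirement of the wrapper).
    """
    records = []
    for chain_id, sequence in chains.items():
        sequence = "".join(sequence.split()).upper()
        if not sequence:
            raise ValueError(f"chain {chain_id!r} has an empty sequence")
        records.append(f">{chain_id}\n{sequence}\n")
    path = Path(path)
    path.write_text("".join(records))
    return path


def count_records(path):
    """Count FASTA header lines, i.e. the number of chains in the file."""
    lines = Path(path).read_text().splitlines()
    return sum(1 for line in lines if line.startswith(">"))
```

A file written this way with two entries in `chains` would be uploaded under /inputs and folded as a two-chain complex.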
For richer ESMFold2 inputs, upload JSON with sequences entries for proteins, DNA, RNA, ligands, modifications, MSAs, pockets, distograms, or covalent bonds.
{
"id": "protein_ligand",
"sequences": [
{"type": "protein", "id": "A", "sequence": "MKTAYIAKQRQISFVKSHFSRQDILDL"},
{"type": "ligand", "id": "L", "smiles": "CCO"}
]
}
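A structured input like the one above can also be produced from a script. The sketch below is only an illustration: build_complex_json is a hypothetical helper, and its checks (every entry has a type and an id, ids are unique) are assumptions rather than the wrapper's actual schema validation. It uses only the fields shown in the example.

```python
import json


def build_complex_json(complex_id, entries):
    """Return JSON text with an "id" and a "sequences" list.

    The checks below are assumed sanity checks, not the real schema.
    """
    seen_ids = set()
    for entry in entries:
        if "type" not in entry or "id" not in entry:
            raise ValueError(f"entry is missing 'type' or 'id': {entry}")
        if entry["id"] in seen_ids:
            raise ValueError(f"duplicate entry id: {entry['id']}")
        seen_ids.add(entry["id"])
    return json.dumps({"id": complex_id, "sequences": entries}, indent=2)


entries = [
    {"type": "protein", "id": "A", "sequence": "MKTAYIAKQRQISFVKSHFSRQDILDL"},
    {"type": "ligand", "id": "L", "smiles": "CCO"},
]
complex_json = build_complex_json("protein_ligand", entries)
```

The resulting text can be saved as a .json file under /inputs and passed with -j.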
Example Arguments
-I=/inputs
--model-preset=fast
--num-sampling-steps=32
--num-diffusion-samples=1
Use -i=/inputs/example.fasta to fold one FASTA file explicitly, or -j=/inputs/complex.json for structured JSON.
--model-preset=full
--num-sampling-steps=50
--num-loops=3
--chunk-size=64
--esmc-precision=bf16
Use fast for typical single-sequence FASTA folding. Choose full for richer structured JSON inputs, especially when using MSAs, ligands, nucleic acids, modifications, pockets, distograms, or covalent bonds.
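The two argument sets above differ mainly in preset and step count, so they can be assembled from a small helper. This is a sketch under stated assumptions: build_args is a hypothetical function, and the 32 and 50 step values are simply the ones this page recommends for fast and full. It only formats flag strings that are documented here; it does not submit a job.

```python
def build_args(input_dir="/inputs", preset="fast", sampling_steps=None,
               diffusion_samples=1, seed=None):
    """Assemble ESMFold2 argument strings in the flag=value style shown above."""
    if preset not in ("fast", "full"):
        raise ValueError(f"unknown preset: {preset!r}")
    if sampling_steps is None:
        # Step counts suggested on this page: 32 for fast, 50 for full.
        sampling_steps = 32 if preset == "fast" else 50
    args = [
        f"-I={input_dir}",
        f"--model-preset={preset}",
        f"--num-sampling-steps={sampling_steps}",
        f"--num-diffusion-samples={diffusion_samples}",
    ]
    if seed is not None:
        args.append(f"--seed={seed}")
    return args
```

For example, build_args(preset="full", seed=7) yields one argument per list item, ready to paste into the job form line by line.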
Arguments
Inputs
- -I, --input-dir: fold every FASTA file in a directory. Default: /inputs.
- -i, --input: fold one FASTA file as one complex.
- -j, --input-json: fold one structured JSON file.
- -J, --json-dir: fold every JSON file in a directory.
- --sequence: fold one direct single-chain protein sequence.
- --complex-id: set the output stem and complex id for --sequence or single FASTA jobs.
Model
- --model-preset=full|fast: choose the full ESMFold2 model or ESMFold2-Fast. Default: fast.
- --esmc-precision=bf16|fp32|fp8: precision for the ESMC language model. Default: bf16.
- --chunk-size=<int>: structure-module chunk size. Default: 64; use 0 to disable chunking.
- --tf32, --no-tf32: enable or disable TF32 matmul on NVIDIA GPUs. Default: enabled.
Sampling
- --num-loops=<int>: number of recycling/refinement loops. Default: 3.
- --num-sampling-steps=<int>: diffusion sampling steps. Default: 32.
- --num-diffusion-samples=<int>: number of samples to generate per input. Default: 1.
- --seed=<int>: random seed for reproducible sampling.
- --noise-scale=<float>: override ESMFold2's diffusion noise scale.
- --step-scale=<float>: override ESMFold2's diffusion step scale.
- --max-inference-sigma=<float>: cap the maximum inference sigma used by the sampler.
- --early-exit: enable ESMFold2's early-exit mode.
32 steps is the normal setting for routine FASTA jobs with the fast preset. Use 50 for full-preset or MSA-backed jobs, or for quality checks. Try 100-200 only as an expensive retry when confidence is borderline; prefer adding an MSA or other context before increasing steps. More diffusion samples help with ranking and diversity, but they are not a reliable fix for missing context.
Validation
--dry-run: parse and validate inputs without loading the model or folding.
Server-Owned Paths
- --model-root is forced to /localref/models.
- --output-dir is forced to /outputs.
- --model-id and --allow-download are not user-controlled in SubSeq jobs.
Outputs
The wrapper writes mmCIF structures and JSON confidence summaries, for example /outputs/example.cif and /outputs/example.json.
The JSON summary includes plddt_mean, ptm, iptm, and the output CIF path. It does not include Boltz-specific fields such as confidence_score or complex_ipde.
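When a job folds several inputs, the summaries can be collected and compared after download. The sketch below assumes that plddt_mean, ptm, and iptm are top-level numeric keys in each summary file; the page names these fields but does not spell out the layout, so treat that as an assumption. The helper names load_summaries and rank_by_plddt are hypothetical.

```python
import json
from pathlib import Path


def load_summaries(output_dir):
    """Read every *.json summary in a directory into a list of dicts.

    Assumes plddt_mean, ptm and iptm are top-level keys; missing keys
    are reported as None rather than raising.
    """
    rows = []
    for path in sorted(Path(output_dir).glob("*.json")):
        data = json.loads(path.read_text())
        rows.append({
            "name": path.stem,
            "plddt_mean": data.get("plddt_mean"),
            "ptm": data.get("ptm"),
            "iptm": data.get("iptm"),
        })
    return rows


def rank_by_plddt(rows):
    """Sort rows by plddt_mean, highest first; rows without a value go last."""
    return sorted(
        rows,
        key=lambda r: (r["plddt_mean"] is None, -(r["plddt_mean"] or 0.0)),
    )
```

Ranking by plddt_mean is only one reasonable choice; for multi-chain complexes, iptm may be the more relevant field to sort on.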
Submit
Queue a run from New Job -> ESMFold2.