Boltz-2
Boltz structure and property prediction, wrapped for subseq.bio.
Inputs and outputs
- Your files are mounted read-only under /inputs inside the container.
- Prediction outputs and metadata should be written to /outputs.
- SubSeq manages the model cache automatically.
- The container entrypoint is boltz predict, so the job arguments are passed directly to that CLI.
- If your YAML does not include MSAs, set msa: empty for each protein chain, or Boltz-2 may fail expecting MSA input.
- To use a precomputed MSA, include the alignment file under /inputs and reference it from YAML, for example msa: /inputs/target.a3m. For multi-protein custom MSAs, use Boltz's paired CSV format.
- Boltz-2 affinity is small-molecule-only. Guided affinity rejects SMILES ligands above 128 non-hydrogen atoms before scheduling; use a polymer chain for protein, DNA, RNA, or peptide binders.
- Use polymer chain entries for protein, DNA, RNA, or peptide binders. Do not encode polymers as SMILES; use residue modifications or custom YAML for modified residues.
- When using directory input, point Boltz-2 at a directory containing only .yaml/.fasta inputs; keep auxiliary files such as .a3m outside that scanned directory and reference them by absolute /inputs/... paths.
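The directory-input rule above can be checked locally before uploading. The following is a minimal sketch, not part of SubSeq or Boltz: the helper name find_stray_files is hypothetical, and the allowed extensions are an assumption taken from the ".yaml/.fasta" wording in this section.

```python
from pathlib import Path

# Assumption: only these extensions count as Boltz inputs, per the
# ".yaml/.fasta" wording above. Adjust if your inputs use other suffixes.
ALLOWED_SUFFIXES = {".yaml", ".fasta"}


def find_stray_files(input_dir):
    """Return sorted names of files in input_dir that are not .yaml/.fasta inputs.

    Only the top level of the directory is inspected; subdirectories are ignored.
    """
    stray = []
    for entry in Path(input_dir).iterdir():
        if entry.is_file() and entry.suffix.lower() not in ALLOWED_SUFFIXES:
            stray.append(entry.name)
    return sorted(stray)
```

Any names it returns (for example an .a3m file) are candidates to move into a sibling directory and reference by absolute path from the YAML.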
Guided modes
- Single Sequence: generate YAML for one protein chain.
- Build Assembly: generate YAML from a repeatable protein/DNA/RNA chain list, optional per-chain CCD residue modifications/cyclic flags, and optional small molecule. For dsDNA or paired RNA, add two nucleic-acid chains and enter the second strand as the reverse complement; the guided form can add that chain from an existing DNA or RNA chain. Turn on Predict Affinity to add properties: affinity for the small-molecule ligand. Guided affinity is limited to protein-only polymer targets.
- Custom Input: run one supplied Boltz YAML or FASTA file. Use this for paired MSAs, templates, constraints, large/symmetric assemblies, and other advanced schemas.
- Batch Folder: run all Boltz YAML/FASTA inputs in a folder as one SubSeq job.
Affinity metrics are produced only when the YAML contains properties: affinity. In guided mode that means Build Assembly with a small molecule and Predict Affinity enabled. In Custom Input or Batch Folder, each YAML file can produce affinity metrics if it contains a valid affinity property.
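For Custom Input or Batch Folder, an affinity-enabled YAML can be written by hand or generated. The sketch below renders one protein chain, one SMILES ligand, and an affinity request. It assumes the upstream Boltz schema, in which properties holds an affinity entry whose binder names the ligand chain id; verify against the upstream prediction instructions before relying on it. The function name and default chain ids are hypothetical.

```python
def render_affinity_yaml(protein_seq, ligand_smiles, protein_id="A", ligand_id="L"):
    """Render a Boltz-style YAML with one protein, one SMILES ligand, and an affinity request.

    The layout is an assumption based on the upstream Boltz YAML schema.
    """
    lines = [
        "version: 1",
        "sequences:",
        "  - protein:",
        f"      id: {protein_id}",
        f'      sequence: "{protein_seq}"',
        "      msa: empty",  # no MSA supplied, as recommended for MSA-free runs
        "  - ligand:",
        f"      id: {ligand_id}",
        # Single quotes keep SMILES backslashes literal in YAML.
        f"      smiles: '{ligand_smiles}'",
        "properties:",
        "  - affinity:",
        f"      binder: {ligand_id}",  # must match the ligand chain id above
    ]
    return "\n".join(lines) + "\n"
```

Writing the returned text to a file under /inputs and passing that path as the job input should then produce affinity metrics alongside the structure outputs, provided the ligand is within the size limit described earlier.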
Example 1 — basic run
Use the default pattern shown on the New Job form: point Boltz-2 at your inputs directory and turn on potentials.
/inputs
--use_potentials
--out_dir=/outputs
- Upstream recommends YAML inputs; FASTA inputs are supported but deprecated upstream.
- Place your YAML/FASTA and any required assets under /inputs when uploading.
- --use_potentials enables Boltz's inference-time steering potentials.
- --out_dir=/outputs directs all prediction artifacts to the job output volume.
Example 2 — local configuration file
You can also drive Boltz-2 with an explicit configuration file that lives under /inputs.
/inputs/my_config.yaml
--out_dir=/outputs/my_config_run
--use_potentials
- /inputs/my_config.yaml is a Boltz configuration file you prepare based on the upstream examples.
- Outputs for this run will be written under /outputs/my_config_run.
- The platform manages the model cache automatically.
Minimal no-MSA YAML pattern:
sequences:
  - protein:
      id: A
      sequence: "MKTAYIAKQRQISFVKSHFSRQDILDLI"
      msa: empty
Modified residues can be entered in guided Build Assembly mode, or specified directly in custom YAML:
version: 1
sequences:
  - protein:
      id: A
      sequence: "MSTNPKPQRKTKRNTNRRPQDVKFPGG"
      msa: empty
      modifications:
        - position: 2
          ccd: MSE
Example 3 — multimer from YAML
Guided Build Assembly mode can generate multiple protein, DNA, or RNA chains. Protein chains use msa: empty. For full control, create a YAML input under /inputs/multimer.yaml with multiple polymer chains:
sequences:
  - protein:
      id: A
      sequence: "MKTAYIAKQRQISFVKSHFSRQDILDLI"
      msa: empty
  - dna:
      id: B
      sequence: "ATCGATCG"
A DNA duplex is specified as two DNA chains, with both strands entered in their own 5-prime to 3-prime direction. In guided mode, use Add Reverse Complement after entering a DNA or RNA chain to append the reverse-complement strand.
version: 1
sequences:
  - dna:
      id: A
      sequence: "ATGCCGTA"
  - dna:
      id: B
      sequence: "TACGGCAT"
Then run:
/inputs/multimer.yaml
--use_potentials
--out_dir=/outputs/multimer
For de novo binder screening, a common pattern is to keep the designed binder chain as msa: empty, while giving the natural target chain a real MSA when available.
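When writing custom YAML for a duplex, the second strand has to be derived by hand. The sketch below computes a reverse complement for DNA or RNA using only standard base pairing; the function name is hypothetical, and ambiguity codes such as N are deliberately rejected rather than guessed.

```python
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(sequence, rna=False):
    """Return the reverse complement of a DNA (default) or RNA sequence, 5' to 3'."""
    seq = sequence.upper()
    allowed = "ACGU" if rna else "ACGT"
    unexpected = set(seq) - set(allowed)
    if unexpected:
        raise ValueError(f"unsupported characters: {sorted(unexpected)}")
    table = RNA_COMPLEMENT if rna else DNA_COMPLEMENT
    # Complement each base, then reverse so the result also reads 5' to 3'.
    return seq.translate(table)[::-1]
```

For the duplex shown above, reverse_complement("ATGCCGTA") gives "TACGGCAT", which is the sequence entered for chain B.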
Quality-oriented sampling
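The upstream defaults already give a full prediction run: --sampling_steps=200, --recycling_steps=3, --diffusion_samples=1, --step_scale=1.5, and --max_msa_seqs=8192. The block below spells those defaults out, raises --recycling_steps to 10, fixes the seed, and writes the full PAE and PDE matrices: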
/inputs/my_complex.yaml
--out_dir=/outputs/my_complex_seed01
--use_potentials
--recycling_steps=10
--sampling_steps=200
--diffusion_samples=1
--max_parallel_samples=1
--step_scale=1.5
--max_msa_seqs=8192
--output_format=pdb
--write_full_pae
--write_full_pde
--seed=12345
--override
- Boltz documents --recycling_steps=10 --diffusion_samples=25 as an AlphaFold3-like high-cost sampling setting.
- For comparing sampled complex poses on SubSeq, separate one-sample jobs with different --seed values are often easier to inspect and retry than one bundled high-sample job.
- For unbiased binder screening, avoid hotspot/interface constraints unless you intentionally want constrained prediction; constraints can hide interface failure.
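The one-job-per-seed pattern can be scripted when preparing argument blocks. This is a minimal sketch that only builds argument lists; it does not submit anything to SubSeq. The function name is hypothetical, and the output naming (input stem plus a two-digit seed index) is an assumption modeled on the /outputs/my_complex_seed01 example above.

```python
from pathlib import PurePosixPath


def seed_sweep_args(input_yaml, seeds, out_root="/outputs"):
    """Return one boltz predict argument list per seed, each a one-sample job."""
    stem = PurePosixPath(input_yaml).stem
    jobs = []
    for index, seed in enumerate(seeds, start=1):
        jobs.append([
            input_yaml,
            # Separate output directory per seed so runs never overwrite each other.
            f"--out_dir={out_root}/{stem}_seed{index:02d}",
            "--use_potentials",
            "--diffusion_samples=1",
            f"--seed={seed}",
        ])
    return jobs
```

Each returned list can be joined with newlines and pasted as the argument block of its own job, which keeps a failed seed retryable without rerunning the others.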
Additional CLI options
SubSeq passes arguments directly to boltz predict, while managing the model cache automatically.
Common boltz predict options:
--out_dir PATH
--recycling_steps INT
--sampling_steps INT
--diffusion_samples INT
--max_parallel_samples INT
--step_scale FLOAT
--output_format [pdb|mmcif]
--num_workers INT
--preprocessing-threads INT
--override
--seed INT
--max_msa_seqs INT
--subsample_msa
--num_subsampled_msa INT
--no_kernels
--use_potentials
--method STRING
--model [boltz1|boltz2]
--affinity_mw_correction
--sampling_steps_affinity INT
--diffusion_samples_affinity INT
--write_full_pae
--write_full_pde
--write_embeddings
For detailed defaults and the YAML schema, see the upstream Boltz prediction instructions.
Notes
- Confidence files such as confidence_<id>_model_0.json include confidence_score, ptm, iptm, complex_plddt, complex_iplddt, complex_ipde, chains_ptm, and pair_chains_iptm.
- Custom A3M files should be plain text. If an upstream MSA tool writes a trailing NUL byte, strip it before passing the A3M to Boltz-2.
- For full CLI options and configuration schemas, see the Boltz GitHub repository (boltz-2 section).
- To start a job from the UI, go to New Job → Boltz-2 and paste one of the argument blocks above.
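Two of the notes above lend themselves to small local helpers: cleaning an A3M before upload and comparing downloaded confidence files. The sketch below assumes that confidence files are named confidence_*.json somewhere under the downloaded output directory and that each contains a confidence_score key, as listed above. Both function names are hypothetical.

```python
import json
from pathlib import Path


def strip_trailing_nul(a3m_path):
    """Remove trailing NUL bytes from an A3M file in place; return how many were removed."""
    path = Path(a3m_path)
    data = path.read_bytes()
    cleaned = data.rstrip(b"\x00")
    if cleaned != data:
        path.write_bytes(cleaned)
    return len(data) - len(cleaned)


def rank_by_confidence(output_dir):
    """Return (confidence_score, path) pairs for confidence_*.json files, best first."""
    ranked = []
    # Search recursively because the exact output layout is not assumed here.
    for path in Path(output_dir).rglob("confidence_*.json"):
        with path.open() as handle:
            metrics = json.load(handle)
        ranked.append((metrics["confidence_score"], str(path)))
    return sorted(ranked, reverse=True)
```

Running rank_by_confidence over the outputs of several one-seed jobs gives a quick ordering of the sampled poses; the other listed metrics, such as iptm, can be read from the same JSON files in the same way.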