BioEmu (Microsoft)
Equilibrium-ensemble sampling for protein monomers with A3M input.
How SubSeq Runs BioEmu
- The entrypoint is fixed to python -m bioemu.sample.
- Runtime dependencies are in the image; BioEmu source code is expected under /ref (for example /ref/src/bioemu).
- Model checkpoint and config are pinned to local paths: /ref/checkpoints/bioemu-v1.2/checkpoint.ckpt and /ref/checkpoints/bioemu-v1.2/config.yaml.
- Service-enforced paths: --output_dir=/outputs, --cache_embeds_dir=/outputs/.bioemu_embeds_cache, --cache_so3_dir=/ref/.bioemu_so3_cache.
- This deployment rejects direct sequence strings and FASTA input at submit time; --sequence must point to an .a3m file.
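The rules above can be pictured as a small command builder. This is a minimal sketch, not SubSeq's actual code: the function name build_bioemu_command and the choice to pass user arguments as a dict are assumptions made for illustration. Only the entrypoint, flags, and paths are taken from the list above.

```python
# Service-enforced values, copied from the list above.
ENFORCED = {
    "--output_dir": "/outputs",
    "--cache_embeds_dir": "/outputs/.bioemu_embeds_cache",
    "--cache_so3_dir": "/ref/.bioemu_so3_cache",
}


def build_bioemu_command(user_args: dict[str, str]) -> list[str]:
    """Hypothetical helper: merge user arguments with the enforced paths."""
    sequence = user_args.get("--sequence", "")
    if not sequence.endswith(".a3m"):
        # Raw sequence strings and FASTA files are rejected at submit time.
        raise ValueError("--sequence must point to an .a3m file")

    # Enforced values are applied last, so they win over user-supplied ones.
    merged = {**user_args, **ENFORCED}
    argv = ["python", "-m", "bioemu.sample"]
    argv += [f"{flag}={value}" for flag, value in merged.items()]
    return argv
```

In this sketch, a user-supplied --output_dir is silently overwritten rather than rejected; whether the real service overrides or rejects such values is not stated here.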
Required Arguments
- --sequence: path to an .a3m file (for example /inputs/sequence.a3m).
- --num_samples: number of sampled structures to generate.
User-supplied --ckpt_path, --model_config_path, and --model_name are ignored by SubSeq.
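A rough sketch of that argument handling, assuming arguments arrive as --flag=value strings: the two required flags are checked, and the three ignored flags are dropped before launch. The helper name sanitize_args is hypothetical and does not come from SubSeq.

```python
REQUIRED = ("--sequence", "--num_samples")
IGNORED = ("--ckpt_path", "--model_config_path", "--model_name")


def sanitize_args(args: list[str]) -> list[str]:
    """Hypothetical helper: drop ignored flags and verify the required ones."""
    # The flag name is everything before the first "=".
    kept = [a for a in args if a.split("=", 1)[0] not in IGNORED]
    present = {a.split("=", 1)[0] for a in kept}
    missing = [flag for flag in REQUIRED if flag not in present]
    if missing:
        raise ValueError(f"missing required argument(s): {', '.join(missing)}")
    return kept
```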
Quick Start
--sequence=/inputs/sequence.a3m
--num_samples=1
--batch_size_100=10
--filter_samples=False
--base_seed=42
This is the same style as the BioEmu prefill in the New Job form.
Input Pattern
A3M from mounted inputs:
--sequence=/inputs/sequence.a3m
--num_samples=100
--batch_size_100=20
For this deployment profile, only .a3m input is accepted.
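Because non-A3M input is rejected at submit time, a quick local check before submitting can save a round trip. The sketch below is an assumption about what such a check might look like, not the service's validator: looks_like_a3m is a hypothetical name, and the test only covers the file extension and the FASTA-like header layout that A3M files share.

```python
from pathlib import Path


def looks_like_a3m(path: str) -> bool:
    """Hypothetical pre-submit check; not a full A3M validator."""
    p = Path(path)
    if p.suffix.lower() != ".a3m" or not p.is_file():
        return False
    lines = [ln.strip() for ln in p.read_text().splitlines() if ln.strip()]
    # Skip leading "#" lines, which some MSA tools emit before the alignment.
    while lines and lines[0].startswith("#"):
        lines.pop(0)
    # Expect a ">" header followed by at least one sequence line.
    return len(lines) >= 2 and lines[0].startswith(">") and not lines[1].startswith(">")
```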
Outputs and Caches
- Main trajectory output is written under /outputs (for example samples.xtc plus metadata files).
- Embedding cache is written to /outputs/.bioemu_embeds_cache.
- SO(3) lookup cache is written to /ref/.bioemu_so3_cache.
- BioEmu may initialize a ColabFold environment at /ref/.bioemu_colabfold for the MSA-related embedding flow.
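Since the embedding cache shares /outputs with the results, it can help to separate the two when collecting files after a job. A minimal sketch, assuming caches always live in dot-prefixed directories as listed above; list_result_files is a hypothetical helper name.

```python
from pathlib import Path


def list_result_files(output_dir: str = "/outputs") -> list[str]:
    """Hypothetical helper: result files relative to output_dir, caches skipped."""
    root = Path(output_dir)
    results = []
    for path in sorted(root.rglob("*")):
        rel = path.relative_to(root)
        # Skip anything inside a hidden directory such as .bioemu_embeds_cache.
        in_hidden_dir = any(part.startswith(".") for part in rel.parts[:-1])
        if path.is_file() and not in_hidden_dir:
            results.append(rel.as_posix())
    return results
```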
Offline/Reproducible Use
- Pre-stage BioEmu source and v1.2 checkpoint/config under /ref.
- Prefer A3M input when you want to avoid remote MSA services.
- After warm-up caches are created, you can tighten network and mount policies in your deployment flow if desired.
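Before tightening policies, it is worth confirming that everything the job needs is already staged. The following preflight is a sketch under stated assumptions: missing_prestaged is a hypothetical helper, and src/bioemu is only the example source location mentioned on this page, so adjust the list to your layout.

```python
from pathlib import Path

# Paths relative to /ref, taken from this page.
# "src/bioemu" is the example source location, not a guaranteed one.
PRESTAGED = (
    "src/bioemu",
    "checkpoints/bioemu-v1.2/checkpoint.ckpt",
    "checkpoints/bioemu-v1.2/config.yaml",
)


def missing_prestaged(ref_root: str = "/ref") -> list[str]:
    """Hypothetical preflight: return the pre-staged paths that are absent."""
    root = Path(ref_root)
    return [str(root / rel) for rel in PRESTAGED if not (root / rel).exists()]
```

An empty return value means the source tree and pinned checkpoint files were found; it says nothing about whether the warm-up caches have been populated yet.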