BioEmu (Microsoft)
Equilibrium-ensemble sampling and side-chain reconstruction for protein monomers.
How SubSeq Runs BioEmu
- Guided mode is one sequence-to-ensemble workflow: precompute embeddings from A3M, sample from the warmed cache, optionally apply physical steering, and optionally reconstruct all-heavy-atom side chains in the same job.
- Advanced args use one BioEmu command per line, similar to GROMACS.
- Supported commands are python -m bioemu.subseq_precompute_embeds ..., python -m bioemu.sample ..., and python -m bioemu.sidechain_relax ....
- Arbitrary Python scripts, python -c, and unsupported bioemu.* modules are rejected.
- Shell operators, comments, quotes, and command chaining are rejected in advanced BioEmu command lines.
- Runtime dependencies, model assets, and caches are managed by SubSeq.
- Sampling paths are enforced: --output_dir=/outputs, --cache_embeds_dir=/outputs/.bioemu_embeds_cache, --cache_so3_dir=/ref/.bioemu_so3_cache.
- This deployment rejects direct sequence strings and FASTA input at submit time; --sequence must point to an .a3m file.
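The rejection rules above can be illustrated with a small pre-submit check. This is a minimal sketch, not SubSeq's actual validator: the function name validate_command_line and the exact set of forbidden characters are assumptions made for illustration, and only the three module names and the .a3m requirement come from the list above.

```python
# Module names taken from the supported-command list above.
ALLOWED_MODULES = {
    "bioemu.subseq_precompute_embeds",
    "bioemu.sample",
    "bioemu.sidechain_relax",
}

# Characters treated here as shell operators, comments, or quotes.
# ASSUMPTION: the exact set SubSeq rejects is not documented.
FORBIDDEN_CHARS = set(";&|<>`$#'\"\\")


def validate_command_line(line: str) -> list[str]:
    """Return the argument tokens of one advanced command line, or raise ValueError."""
    bad = sorted(FORBIDDEN_CHARS.intersection(line))
    if bad:
        raise ValueError(f"forbidden characters: {bad}")

    tokens = line.split()
    if len(tokens) < 3 or tokens[0] != "python" or tokens[1] != "-m":
        raise ValueError("command must start with 'python -m <module>'")
    if tokens[2] not in ALLOWED_MODULES:
        raise ValueError(f"unsupported module: {tokens[2]}")

    args = tokens[3:]
    for i, arg in enumerate(args):
        # Accept both --sequence=PATH and --sequence PATH.
        if arg.startswith("--sequence="):
            value = arg.split("=", 1)[1]
        elif arg == "--sequence" and i + 1 < len(args):
            value = args[i + 1]
        else:
            continue
        if not value.endswith(".a3m"):
            raise ValueError("--sequence must point to an .a3m file")
    return args
```

Running a check like this locally, one line at a time, can catch a rejected line before the job is submitted; the server-side rules remain authoritative.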
Sampling
Generate backbone-frame equilibrium ensemble outputs from an A3M alignment:
python -m bioemu.subseq_precompute_embeds --sequence=/inputs/sequence.a3m --cache_embeds_dir=/outputs/.bioemu_embeds_cache
python -m bioemu.sample --sequence=/inputs/sequence.a3m --num_samples=100 --batch_size_100=10 --filter_samples=True --base_seed=42
- --sequence: path to an .a3m file.
- --num_samples: number of sampled structures to generate.
- Guided physical steering is exposed as a strength parameter, not as a separate mode.
- Guided atom detail chooses between native backbone/topology trajectory outputs and all-heavy-atom side-chain outputs.
- --denoiser_config and --steering_config must point to mounted config files under /inputs, /aux, or /ref.
- User-supplied --ckpt_path, --model_config_path, and --model_name are ignored by SubSeq.
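The argument rules for bioemu.sample (enforced paths, ignored model flags, and restricted config locations) can be sketched as a normalization step. The function name normalize_sample_args is hypothetical, and the sketch assumes that "enforced" means user values are overwritten rather than rejected; SubSeq's real behavior may differ.

```python
from pathlib import PurePosixPath

# Values stated in this document.
ENFORCED = {
    "--output_dir": "/outputs",
    "--cache_embeds_dir": "/outputs/.bioemu_embeds_cache",
    "--cache_so3_dir": "/ref/.bioemu_so3_cache",
}
IGNORED = {"--ckpt_path", "--model_config_path", "--model_name"}
CONFIG_FLAGS = {"--denoiser_config", "--steering_config"}
CONFIG_ROOTS = ("/inputs", "/aux", "/ref")


def normalize_sample_args(args: dict[str, str]) -> dict[str, str]:
    """Apply the documented rules to a flag-to-value mapping (illustrative only)."""
    out = {}
    for flag, value in args.items():
        if flag in IGNORED:
            continue  # user-supplied model selection is dropped
        if flag in CONFIG_FLAGS:
            path = PurePosixPath(value)
            under_root = any(path.is_relative_to(root) for root in CONFIG_ROOTS)
            # Also refuse ".." so a path cannot escape the mounted roots.
            if ".." in path.parts or not under_root:
                raise ValueError(f"{flag} must be under one of {CONFIG_ROOTS}")
        out[flag] = value
    # ASSUMPTION: enforced paths silently replace whatever the user passed.
    out.update(ENFORCED)
    return out
```

For example, passing --output_dir=/tmp/run together with --ckpt_path yields a mapping where --output_dir is /outputs and --ckpt_path is absent.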
Side-chain Reconstruction
Reconstruct all-heavy-atom side chains from BioEmu topology.pdb and samples.xtc outputs. Guided full-atom jobs run this after sampling using the files just written under /outputs:
python -m bioemu.sidechain_relax --pdb-path /outputs/topology.pdb --xtc-path /outputs/samples.xtc --outpath /outputs --prefix samples --no-md-equil
- --outpath is normalized to /outputs.
- Without --no-md-equil, BioEmu can run OpenMM local minimization/equilibration after side-chain reconstruction.
- Outputs include files such as samples_sidechain_rec.pdb, samples_sidechain_rec.xtc, samples_md_equil.pdb, and samples_md_equil.xtc.
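Putting the sampling and reconstruction commands together, a guided job corresponds roughly to the following sequence of advanced command lines. This is a sketch under assumptions: the helper guided_command_lines and its parameters are hypothetical, and it uses only flags that appear in this document.

```python
def guided_command_lines(a3m_path: str, num_samples: int,
                         full_atom: bool, md_equil: bool = False) -> list[str]:
    """Build one command per line: precompute embeddings, sample, optionally reconstruct."""
    lines = [
        "python -m bioemu.subseq_precompute_embeds "
        f"--sequence={a3m_path} "
        "--cache_embeds_dir=/outputs/.bioemu_embeds_cache",
        "python -m bioemu.sample "
        f"--sequence={a3m_path} --num_samples={num_samples}",
    ]
    if full_atom:
        relax = (
            "python -m bioemu.sidechain_relax "
            "--pdb-path /outputs/topology.pdb "
            "--xtc-path /outputs/samples.xtc "
            "--outpath /outputs --prefix samples"
        )
        if not md_equil:
            # Skip the OpenMM minimization/equilibration step.
            relax += " --no-md-equil"
        lines.append(relax)
    return lines


if __name__ == "__main__":
    print("\n".join(guided_command_lines("/inputs/sequence.a3m", 100, full_atom=True)))
```

Each returned string is a single line suitable for the advanced args field, with no quotes, comments, or shell operators.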
This BioEmu module does not select representative centroid frames from the trajectory; use a trajectory postprocessor for clustering/extracting representative PDBs.
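As an illustration of what such a postprocessor might do, the sketch below picks a single representative frame as the medoid of a pairwise distance matrix. It assumes the matrix (for example, pairwise RMSD between frames) has already been computed by a separate trajectory tool; the function medoid_index is hypothetical and not part of BioEmu or SubSeq.

```python
def medoid_index(distances: list[list[float]]) -> int:
    """Return the index of the frame with the smallest summed distance to all frames."""
    if not distances:
        raise ValueError("empty distance matrix")
    totals = [sum(row) for row in distances]
    # Ties resolve to the lowest frame index.
    return min(range(len(totals)), key=totals.__getitem__)
```

Applying the same function to the frames within each cluster gives one representative per cluster, which can then be written out as a PDB by the trajectory tool.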
Outputs and Caches
- Sampling writes topology.pdb, samples.xtc, sequence.fasta, and batch_*.npz under /outputs.
- Embedding cache is written to /outputs/.bioemu_embeds_cache.
- SO(3) lookup cache is written to /ref/.bioemu_so3_cache.
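A downloaded output directory can be checked against the file list above with a short script. This is a minimal sketch; the function name summarize_sampling_outputs is an assumption, and the file names are the ones listed in this section.

```python
from pathlib import Path

# Single files that sampling is documented to write.
REQUIRED_FILES = ("topology.pdb", "samples.xtc", "sequence.fasta")


def summarize_sampling_outputs(output_dir: str) -> dict[str, list[str]]:
    """Report missing required files and the batch_*.npz files that are present."""
    root = Path(output_dir)
    missing = [name for name in REQUIRED_FILES if not (root / name).is_file()]
    batches = sorted(p.name for p in root.glob("batch_*.npz"))
    return {"missing": missing, "batches": batches}
```

An empty "missing" list together with at least one batch file suggests that sampling completed and wrote its trajectory outputs.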
Offline/Reproducible Use
- Prefer A3M input when you want to avoid remote MSA services.
- Side-chain reconstruction uses image-bundled OpenMM and HPacker so jobs do not install dependencies at runtime.