SUBSEQ.BIO
BLOG-GROMACS INTRO

Intro to GROMACS for protein design

Authors: Emin, Claude · Date: 2025-12-20

This is a practical, design-oriented guide to using GROMACS molecular dynamics (MD) as a computational "stress test" after you have a plausible structure from RFdiffusion / AlphaFold / Boltz, etc. MD won't prove a design works, but it can quickly surface common failure modes: unstable folds, breathing cores, leaky hydrophobics, weak interfaces, bad loops, and strained starting geometries.

Table of contents

What is GROMACS?

GROMACS runs molecular dynamics (MD): you simulate your protein (usually in explicit water + ions) using a molecular mechanics force field, and watch how it behaves over time.

For protein design, MD is most useful for answering:

Units note: GROMACS distances are typically in nm (0.10 nm = 1 Å). Time is usually in ps or ns.

Why use MD for protein design?

Structure predictors and design models give you a plausible snapshot. MD asks whether that snapshot is dynamically stable when atoms are allowed to move.

Tool What it tells you Limitation
RFdiffusion / BoltzGen Generates plausible backbone geometries Static structure only
AlphaFold2 / ESMFold Predicts likely folded structure Static structure only
GROMACS MD Whether the modeled structure stays stable when atoms move Slower; setup details matter

MD is best used as a relative filter on a shortlist (e.g., 3–20 candidates) under one consistent protocol. It is not a guarantee of experimental success.

What MD is great for

What MD is NOT

Key file types

Understanding what each file contains helps you debug problems and interpret results.

Extension What it is When you use it
.pdbInput structure (Protein Data Bank format)Starting point for pdb2gmx
.groGROMACS coordinate file (positions + box)Passed between steps; final snapshots
.topTopology (atoms, bonds, force field params)Describes the system for the simulation
.mdpRun parameters (timestep, temperature, etc.)Controls what kind of simulation you run
.tprPortable run input (binary, everything needed)Created by grompp, used by mdrun
.xtcCompressed trajectory (positions over time)Main analysis input
.edrEnergy file (temperature, pressure, etc.)Used by gmx energy
.cptCheckpoint (restart file)Continue interrupted runs
.xvgTime series output (plottable data)RMSD, Rg, energy plots, etc.
.xpmMatrix/image output (e.g., contact maps)DSSP, contact analysis

Typical workflow (prep → solvated MD)

This is the standard "minimal but sane" workflow for proteins. Each step builds on the previous one.

Step 1: Build topology and add hydrogens (pdb2gmx)

gmx pdb2gmx -f protein.pdb -o protein_processed.gro -p topol.top -ff amber99sb-ildn -water tip3p

What it does / key args:

Tip: If pdb2gmx complains about missing atoms or unknown residues, your input PDB may need cleaning (remove ligands, fix non-standard residues, check for missing loops).

Step 2: Define a solvent box and solvate (editconf, solvate)

gmx editconf -f protein_processed.gro -o box.gro -bt dodecahedron -d 1.0 -c
gmx solvate -cp box.gro -cs spc216.gro -o solv.gro -p topol.top

What it does / key args:

Step 3: Add ions (grompp, genion)

gmx grompp -f ions.mdp -c solv.gro -p topol.top -o ions.tpr
gmx genion -s ions.tpr -o solv_ions.gro -p topol.top -pname NA -nname CL -neutral -conc 0.15

What it does / key args:

You'll be prompted to choose a group to replace with ions — usually pick "SOL" (solvent).

Step 4: Minimization + equilibration (mdrun)

# Energy minimization (remove bad contacts)
gmx grompp -f em.mdp -c solv_ions.gro -p topol.top -o em.tpr
gmx mdrun -v -deffnm em

# NVT equilibration (heat up, constant volume)
gmx grompp -f nvt.mdp -c em.gro -r em.gro -p topol.top -o nvt.tpr
gmx mdrun -v -deffnm nvt

# NPT equilibration (constant pressure, let density settle)
gmx grompp -f npt.mdp -c nvt.gro -t nvt.cpt -r nvt.gro -p topol.top -o npt.tpr
gmx mdrun -v -deffnm npt

What it does / key args:

Position restraints during equilibration let the water and ions relax around a fixed protein, preventing early instability artifacts.

Step 5: Production MD

gmx grompp -f md.mdp -c npt.gro -t npt.cpt -p topol.top -o md.tpr
gmx mdrun -v -deffnm md

What it does: This is the actual simulation you'll analyze. No more restraints — the protein moves freely.

Run length guidelines (very approximate):

Critical: Your results are only comparable if you keep the protocol identical across candidates — same force field, water model, salt concentration, temperature, .mdp settings, and run length.

Key outputs explained (what they mean)

Most analysis commands use these common arguments:

0) First: PBC-clean the trajectory (important!)

Many "binder dissociates" plots are actually periodic boundary artifacts. The protein didn't fall apart — it just wrapped around the simulation box edge. Always make molecules whole and center before analysis.

gmx trjconv -s md.tpr -f md.xtc -o md_centered.xtc -pbc mol -center

What it does / key args:

You'll be prompted twice: once for centering group (usually "Protein") and once for output group (usually "System").

1) RMSD (Root Mean Square Deviation)

What it measures: How far the structure drifts from the reference after alignment. This is your primary "is it staying folded?" signal.

gmx rms -s md.tpr -f md_centered.xtc -o rmsd.xvg -tu ns

What it does / key args:

Interpretation (heuristics; depends on protein size/fold and your atom selection):

Tip: RMSD depends heavily on which atoms you select (backbone vs all-atom) and whether you handled PBC correctly. Backbone C-alpha is a common choice.

2) RMSF (Root Mean Square Fluctuation)

What it measures: Per-residue flexibility — great for finding weak regions in your design.

gmx rmsf -s md.tpr -f md_centered.xtc -o rmsf.xvg -res

What it does / key args:

Interpretation:

RMSF is like crystallographic B-factors: it tells you which parts of the structure are mobile. A well-packed core should be rigid.

3) Radius of gyration (Rg)

What it measures: Overall compactness — how "spread out" the protein is.

gmx gyrate -s md.tpr -f md_centered.xtc -o gyrate.xvg

What it does:

Interpretation:

4) Hydrogen bonds

What it measures: Hydrogen bond counts over time (global or between specific groups).

gmx hbond -s md.tpr -f md_centered.xtc -num hbnum.xvg

What it does / key args:

Interpretation:

5) Secondary structure over time (DSSP)

What it measures: Whether helices/sheets persist or fray/collapse over time.

gmx do_dssp -s md.tpr -f md_centered.xtc -o ss.xpm -sc scount.xvg

What it does / key args:

Note: do_dssp requires a DSSP executable (mkdssp or dssp) in the environment.

6) Binder/interface stability (for complexes)

What it measures: Whether two proteins stay together or drift apart. Track more than one signal:

gmx mindist -s md.tpr -f md_centered.xtc -od mindist.xvg -tu ns
gmx mdmat   -s md.tpr -f md_centered.xtc -mean mean_contacts.xpm

What they do / key args:

Interpretation:

7) Potential energy and "sanity dashboard"

What it measures: Simulation stability signals — not protein stability directly, but whether the run is behaving sanely.

gmx energy -f md.edr -o energy.xvg

What it does / key args:

Interpretation:

Practical examples

Example 1: Monomer stability screen

Goal: Check if a designed monomer stays folded and compact.

Run a production trajectory, then analyze RMSD and Rg:

gmx rms    -s md.tpr -f md_centered.xtc -o rmsd.xvg -tu ns
gmx gyrate -s md.tpr -f md_centered.xtc -o gyrate.xvg

What to look for:

Example 2: Binder/complex stability

Goal: Check whether a designed binder stays bound to its target.

gmx mindist -s md.tpr -f md_centered.xtc -od mindist.xvg -tu ns
gmx mdmat   -s md.tpr -f md_centered.xtc -mean mean_contacts.xpm

What to look for:

Example 3: Thermostability stress test

Goal: Compare stability at different temperatures (screening, not proof).

Run comparable protocols at 300 K vs 350 K and compare RMSD trends:

gmx rms -s md_300K.tpr -f md_300K_centered.xtc -o rmsd_300K.xvg -tu ns
gmx rms -s md_350K.tpr -f md_350K_centered.xtc -o rmsd_350K.xvg -tu ns

What to look for:

Example 4: Finding problem regions

Goal: Identify which part of your design is unstable for redesign.

gmx rmsf -s md.tpr -f md_centered.xtc -o rmsf.xvg -res
gmx do_dssp -s md.tpr -f md_centered.xtc -o ss.xpm -sc scount.xvg

What to look for:

Validation checklist

Use this as a screening checklist. These are heuristics, not hard rules — absolute cutoffs depend on protein size, fold, and your analysis choices.

✅ Structural stability

✅ Local stability

✅ Interface stability (if binder/complex)

✅ Simulation sanity

Common issues & troubleshooting

Problem: "It explodes immediately"

Common causes:

Try:

Problem: RMSD keeps rising

Possibilities:

Try:

Problem: Binder "dissociates" in plots

Most common cause: Periodic boundary artifacts — the complex looks separated because one chain wrapped to the other side of the box.

Try:

Problem: pdb2gmx fails with unknown residues

Common causes:

Try:

Scope & complementary tools

MD is one tool in the design validation toolkit. Understanding its scope helps you use it appropriately.

What MD is good at

What MD can't tell you

Complementary tools

Tool What it adds
AlphaFold2 pLDDTQuick confidence filter before MD
Rosetta ddGMutation effects, stability scoring
FoldXFast stability estimates
Free energy perturbationBinding affinity predictions (more expensive)
Coarse-grained MDLonger timescales, bigger systems
Enhanced samplingMetadynamics, replica exchange for rare events

Typical validation pipeline

  1. Generate candidates: RFdiffusion, BoltzGen, etc.
  2. Quick filter: pLDDT, Rosetta scoring, visual inspection
  3. MD screening: shortlist → solvated MD → rank by stability
  4. Deeper analysis: longer runs, replicas, free energy calculations
  5. Experimental validation: expression, purification, biophysical characterization

SubSeq-specific notes

On subseq.bio, GROMACS runs as "one command per line":

See /docs/gromacs for the exact runner rules, GPU flags used by the image, and a minimal relaxation example.