Evo 2

Overview

Generate DNA continuations from a typed prompt or FASTA set.
Score relative sequence likelihoods for variants, promoters, genes, or synthetic designs.
Extract sequence embeddings for clustering, visualization, classifiers, or downstream models.

Modes

Mode	Input shape	When to use it
`generate_sequence` Generate - Typed Sequence (default mode)	No uploaded input is required by the mode itself.	Continue one typed DNA prompt.
`generate_inputs` Generate - Input FASTA	Consumes a folder or output set; useful for batches and pipeline handoffs.	Continue all FASTA records in the selected source.
`score_inputs` Score - Input FASTA	Consumes a folder or output set; useful for batches and pipeline handoffs.	Score all FASTA records in the selected source.
`embed_inputs` Embeddings - Input FASTA	Consumes a folder or output set; useful for batches and pipeline handoffs.	Extract embeddings for all FASTA records in the selected source.

Canonical Job Configuration

These are the fields exposed by the default job configuration for evo2. They are also returned by GET /api/v1/program/params?program=evo2 and submitted as the params JSON object to POST /api/v1/job/submit.

Parameter	Type	Modes	What it does
`sequence` DNA Sequence	Sequence	Generate - Typed Sequence	DNA sequence using A, C, G, T, and N. Required
`sequence_name` Sequence Name	Text	Generate - Typed Sequence	Output label for a typed sequence. Default: sequence
`n_tokens` New Tokens	Integer	Generate - Typed Sequence, Generate - Input FASTA	How many new DNA tokens to generate after each prompt. Default: 400; Range: 1-4096
`num_samples` Samples	Integer	Generate - Typed Sequence, Generate - Input FASTA	Number of continuations per prompt. Default: 1; Range: 1-8
`pool` Embedding Output	Text	Embeddings - Input FASTA	`mean` writes one vector per record; `tokens` writes per-token embeddings; `both` writes both. Default: mean; Options: mean, tokens, both

Advanced configuration fields

Parameter	Type	Modes	What it does
`model_name` Model	Text	All modes	Use the default model unless you specifically need another deployed model variant. Default: evo2_7b; Options: evo2_7b, evo2_7b_262k, evo2_7b_base
`temperature` Temperature	Number	Generate - Typed Sequence, Generate - Input FASTA	Higher values increase sequence diversity. Default: 1; Range: 0.05-2
`top_k` Top K	Integer	Generate - Typed Sequence, Generate - Input FASTA	Restrict sampling to the top K next-token choices. Default: 4; Range: 1-16
`top_p` Top P	Number	Generate - Typed Sequence, Generate - Input FASTA	Nucleus sampling probability threshold. Default: 1; Range: 0.01-1
`batch_size` Batch Size	Integer	Score - Input FASTA	Sequence scoring batch size. Default: 1; Range: 1-8
`reduce_method` Score Reduction	Text	Score - Input FASTA	Mean is length-normalized; sum is total sequence log-likelihood. Default: mean; Options: mean, sum
`average_reverse_complement` Average Reverse Complement	Yes/no	Score - Input FASTA	Average each sequence score with its reverse-complement score. Default: false
`layer_name` Embedding Layer	Text	Embeddings - Input FASTA	Intermediate Evo 2 layer to extract. The default follows upstream examples. Default: blocks.28.mlp.l3
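
The defaults and ranges in the two tables above can be checked on the client before a job is submitted. The following is a minimal sketch, not part of the Evo 2 wrapper: `NUMERIC_RANGES`, `CHOICES`, and `validate_params` are hypothetical names, the bounds are copied from the tables on this page, and treating the ranges as inclusive is an assumption.

```python
# Hypothetical client-side check; bounds copied from the tables above.
# Assumption: documented ranges are inclusive at both ends.
NUMERIC_RANGES = {
    "n_tokens": (1, 4096),
    "num_samples": (1, 8),
    "temperature": (0.05, 2),
    "top_k": (1, 16),
    "top_p": (0.01, 1),
    "batch_size": (1, 8),
}
CHOICES = {
    "pool": {"mean", "tokens", "both"},
    "model_name": {"evo2_7b", "evo2_7b_262k", "evo2_7b_base"},
    "reduce_method": {"mean", "sum"},
}
# Case handling is not documented; this check is strict about uppercase.
DNA_ALPHABET = set("ACGTN")


def validate_params(params):
    """Return a list of human-readable problems; empty means none found."""
    problems = []
    for name, (low, high) in NUMERIC_RANGES.items():
        if name in params and not (low <= params[name] <= high):
            problems.append(f"{name}={params[name]} is outside {low}-{high}")
    for name, allowed in CHOICES.items():
        if name in params and params[name] not in allowed:
            problems.append(f"{name}={params[name]!r} is not one of {sorted(allowed)}")
    if params.get("mode") == "generate_sequence":
        sequence = params.get("sequence", "")
        if not sequence:
            problems.append("sequence is required for generate_sequence")
        elif set(sequence) - DNA_ALPHABET:
            problems.append("sequence contains characters other than A, C, G, T, N")
    return problems


print(validate_params({"mode": "generate_sequence", "sequence": "ACGT", "n_tokens": 400}))
print(validate_params({"mode": "generate_sequence", "sequence": "ACGU", "top_k": 32}))
```

The first call reports no problems; the second reports the out-of-range `top_k` and the invalid base. The server remains the authority on what is accepted, so a check like this only catches obvious mistakes early.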

Outputs And Metrics

Generation: generated FASTA and JSON records.
Scoring: scores TSV/JSON with likelihood and perplexity values.
Embeddings: manifest JSON and NPY arrays.
Interpret scores comparatively within the same model and settings.
Higher likelihood (a less negative log-likelihood) and lower perplexity usually mean the sequence is more consistent with the model's learned distribution.
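
To make the relationship between these quantities concrete, the sketch below reduces per-token log-probabilities to sum, mean, and perplexity. It assumes natural-log probabilities and the conventional definition perplexity = exp(-mean log-likelihood); this page does not state the exact formula the wrapper uses. The helper name `summarize_log_probs` and the input numbers are made up for illustration.

```python
import math


def summarize_log_probs(token_log_probs):
    """Reduce per-token natural-log probabilities to sequence-level scores."""
    total = sum(token_log_probs)          # "sum": total sequence log-likelihood
    mean = total / len(token_log_probs)   # "mean": length-normalized
    perplexity = math.exp(-mean)          # conventional definition (assumed)
    return {"sum": total, "mean": mean, "perplexity": perplexity}


# Illustrative numbers only: two sequences with the same per-token fit
# but different lengths.
short_seq = summarize_log_probs([-1.2] * 100)
long_seq = summarize_log_probs([-1.2] * 1000)

print(short_seq["mean"], long_seq["mean"])  # means match up to rounding
print(short_seq["sum"], long_seq["sum"])    # sums differ by about 10x
```

Under these assumptions the mean and perplexity are comparable across lengths, while the sum grows in magnitude with sequence length.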

Common Examples

Generate from a typed promoter seed with 400 new tokens and four samples.
Score a variants FASTA with mean reduction for length-normalized comparison.
Extract mean embeddings from loci.fasta for clustering.

Example API params

{
  "mode": "generate_sequence",
  "sequence": "ACGTACGTACGTACGT",
  "sequence_name": "seed",
  "n_tokens": 400,
  "num_samples": 4
}
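
The same params object can be submitted from a script. The sketch below builds, but does not send, a multipart request using only Python's standard library. It mirrors the `program` and `params` form fields and the bearer-token header shown in the curl example on this page. `build_submit_request` is a hypothetical helper, and the response format is not documented here, so no response parsing is shown.

```python
import json
import urllib.request
import uuid

SUBMIT_URL = "https://subseq.bio/api/v1/job/submit"


def build_submit_request(api_key, program, params):
    """Build a multipart/form-data POST with `program` and `params` fields."""
    boundary = uuid.uuid4().hex
    fields = {"program": program, "params": json.dumps(params)}
    lines = []
    for name, value in fields.items():
        lines += [
            f"--{boundary}",
            f'Content-Disposition: form-data; name="{name}"',
            "",
            value,
        ]
    lines += [f"--{boundary}--", ""]
    body = "\r\n".join(lines).encode("utf-8")
    return urllib.request.Request(
        SUBMIT_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


params = {
    "mode": "generate_sequence",
    "sequence": "ACGTACGTACGTACGT",
    "sequence_name": "seed",
    "n_tokens": 400,
    "num_samples": 4,
}
request = build_submit_request("<api_key>", "evo2", params)
print(request.get_method(), request.full_url)
# To actually send the job: urllib.request.urlopen(request)
```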

Caveats

Evo 2 scores are model likelihoods, not direct measurements of expression, fitness, pathogenicity, or synthesis success.
Use matched sequence contexts and identical settings when ranking variants.
Reverse-complement averaging is inappropriate when biological orientation matters.
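
For reference, the reverse complement that `average_reverse_complement` refers to can be sketched as follows. This is a generic illustration, not the wrapper's code: `reverse_complement` and `averaged_score` are hypothetical helpers, and `score_fn` stands in for whatever produces a per-sequence score.

```python
# Map each base to its complement; N stays N.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(sequence):
    """Complement each base, then reverse the string."""
    return sequence.translate(COMPLEMENT)[::-1]


def averaged_score(sequence, score_fn):
    """Average a score over both strands, discarding orientation."""
    return 0.5 * (score_fn(sequence) + score_fn(reverse_complement(sequence)))


print(reverse_complement("ATGCCN"))  # NGGCAT
```

A sequence and its reverse complement always receive the same averaged score, so any strand-specific signal is lost.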

Advanced Submit

Advanced submit is still available for direct program arguments through POST /api/v1/job/submit-advanced. Prefer canonical configuration unless you need exact low-level arguments or are reproducing a known command line.

Advanced submit exposes generation, scoring, and embedding wrapper arguments for scripted DNA-language-model workflows.
Use mean scoring for mixed-length sequence sets, because sum scores grow in magnitude with sequence length.

For comparison, the canonical Example API params submitted to POST /api/v1/job/submit:

curl -X POST https://subseq.bio/api/v1/job/submit \
  -H "Authorization: Bearer <api_key>" \
  -F program=evo2 \
  -F 'params={"mode":"generate_sequence","sequence":"ACGTACGTACGTACGT","sequence_name":"seed","n_tokens":400,"num_samples":4}'