Evo 2
DNA language model jobs for sequence generation, likelihood scoring, and embedding extraction.
How SubSeq Runs It
- Jobs run /ref/bin/evo2_subseq.py with the Evo 2 runtime.
- The default checkpoint is evo2_7b. Advanced jobs can choose evo2_7b_262k or evo2_7b_base.
- Model data is loaded from the service-local Evo 2 ref tree.
- Network access is disabled for jobs, and output is forced to /outputs.
Input
Use typed DNA sequence text, one FASTA file, or a folder of FASTA files. Sequences should use A, C, G, T, and N.
In the job form, choose the operation first, then choose Sequence Input. If you choose a FASTA file or folder, the next source selector only asks where that file or folder comes from: paste/upload, dataset, or previous job output.
>example
ACGTACGTACGTACGT
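Before uploading, it can help to check that a FASTA file only contains the allowed symbols. The following is a minimal local pre-check sketch, not part of the service: parse_fasta and invalid_symbols are hypothetical helper names, and uppercasing the input is an assumption (this page does not say whether lowercase bases are accepted).

```python
from typing import Dict, Iterable, Set

# Alphabet stated in the Input section: A, C, G, T, and N.
ALLOWED = set("ACGTN")


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA text into {header: sequence}, joining wrapped sequence lines."""
    records: Dict[str, str] = {}
    name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = ""
        elif name is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            # Assumption: normalize to uppercase before checking the alphabet.
            records[name] += line.upper()
    return records


def invalid_symbols(seq: str) -> Set[str]:
    """Return the set of characters in seq that are outside A, C, G, T, N."""
    return set(seq.upper()) - ALLOWED


records = parse_fasta([">example", "ACGTACGTACGTACGT"])
for header, seq in records.items():
    bad = invalid_symbols(seq)
    if bad:
        print(f"{header}: unexpected symbols {sorted(bad)}")
```

To check a file on disk, pass an open file handle to parse_fasta, since it accepts any iterable of lines.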
Example Arguments
Generate
generate
--sequence=ACGTACGTACGT
--name=prompt
--n-tokens=400
--num-samples=1
--temperature=1.0
--top-k=4
Score
score
--fasta=/inputs/sequences.fasta
--reduce-method=mean
--average-reverse-complement
Embeddings
embed
--input-folder=/inputs/fastas
--recursive
--pool=mean
Arguments
Global
--model-name=evo2_7b|evo2_7b_262k|evo2_7b_base: choose the Evo 2 7B-family checkpoint.
Input
--sequence=<dna>: one typed DNA sequence.
--name=<label>: output label for typed sequence input.
--fasta=/inputs/example.fasta: one FASTA file.
--input-folder=/inputs/fastas: folder of FASTA files.
--recursive: search subfolders when using a FASTA folder.
Generation
--n-tokens=<int>: number of new DNA tokens. Default: 400.
--num-samples=<int>: continuations per prompt. Default: 1.
--temperature=<float>: sampling temperature. Default: 1.0.
--top-k=<int>: top-K sampling cutoff. Default: 4.
--top-p=<float>: nucleus sampling cutoff. Default: 1.0.
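To give a feel for how the three sampling controls interact, here is a small illustrative sketch of temperature scaling followed by top-K and nucleus (top-p) filtering over a four-base vocabulary. It shows the general technique only: the function names, the made-up logits, and the order in which the filters are applied are assumptions, and Evo 2's own sampler may differ in detail.

```python
import math
import random
from typing import Dict


def filter_distribution(
    logits: Dict[str, float],
    temperature: float = 1.0,
    top_k: int = 4,
    top_p: float = 1.0,
) -> Dict[str, float]:
    """Return renormalized next-token probabilities after temperature, top-K, and top-p."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    if top_k < 1:
        raise ValueError("top_k must be at least 1")
    # Temperature: divide logits before the softmax; values below 1.0 sharpen the distribution.
    scaled = {tok: value / temperature for tok, value in logits.items()}
    peak = max(scaled.values())
    weights = {tok: math.exp(value - peak) for tok, value in scaled.items()}
    total = sum(weights.values())
    probs = {tok: w / total for tok, w in weights.items()}
    # Top-K: keep only the K most probable tokens.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept = []
    cumulative = 0.0
    for tok, p in ranked:
        kept.append((tok, p))
        cumulative += p
        if cumulative >= top_p:
            break
    norm = sum(p for _, p in kept)
    return {tok: p / norm for tok, p in kept}


def sample(probs: Dict[str, float], rng: random.Random) -> str:
    """Draw one token from a probability table."""
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]


rng = random.Random(0)
logits = {"A": 2.0, "C": 1.0, "G": 0.0, "T": -1.0}  # made-up values for illustration
probs = filter_distribution(logits, temperature=1.0, top_k=2)
next_base = sample(probs, rng)
```

With a four-letter DNA vocabulary, top_k=4 and top_p=1.0 leave the distribution unfiltered, so under those settings temperature is the only control that changes the output.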
Scoring
--batch-size=<int>: scoring batch size. Default: 1.
--reduce-method=mean|sum: length-normalized or total score. Default: mean.
--average-reverse-complement: average each score with the reverse-complement score.
--prepend-bos: prepend Evo 2's beginning-of-sequence token when scoring.
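The reduction and reverse-complement options can be pictured with a short sketch. This is an illustration under stated assumptions, not the service's code: per_token_logprobs stands in for the model, toy_logprobs is a made-up scorer, and the exact way the job combines the two strands is assumed to be a plain average of the reduced scores.

```python
from typing import Callable, List

# Complement table for the documented alphabet; N maps to itself.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def reduce_scores(token_logprobs: List[float], method: str) -> float:
    """'sum' gives the total log-likelihood; 'mean' divides by the number of tokens."""
    if not token_logprobs:
        raise ValueError("cannot reduce an empty score list")
    if method == "sum":
        return sum(token_logprobs)
    if method == "mean":
        return sum(token_logprobs) / len(token_logprobs)
    raise ValueError(f"unknown reduce method: {method}")


def score_sequence(
    seq: str,
    per_token_logprobs: Callable[[str], List[float]],
    method: str = "mean",
    average_rc: bool = False,
) -> float:
    """Score seq, optionally averaging with the score of its reverse complement."""
    forward = reduce_scores(per_token_logprobs(seq), method)
    if not average_rc:
        return forward
    reverse = reduce_scores(per_token_logprobs(reverse_complement(seq)), method)
    return (forward + reverse) / 2


def toy_logprobs(seq: str) -> List[float]:
    """Hypothetical stand-in for the model: a fixed log-probability per base."""
    table = {"A": -1.0, "C": -2.0, "G": -3.0, "T": -4.0, "N": -5.0}
    return [table[base] for base in seq.upper()]


total = score_sequence("AAC", toy_logprobs, method="sum")
both_strands = score_sequence("AAC", toy_logprobs, method="sum", average_rc=True)
```

The mean reduction makes scores comparable across sequences of different lengths, while the sum grows in magnitude with sequence length.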
Embeddings
--layer-name=<name>: Evo 2 layer to extract. Default: blocks.28.mlp.l3.
--pool=mean|tokens|both: write one vector per sequence, per-token vectors, or both.
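The three pooling modes differ only in which arrays are kept. The sketch below illustrates the idea with plain Python lists; the function names and the returned dictionary layout are hypothetical and do not describe the format of the job's .npy files or manifest.

```python
from typing import Dict, List

Matrix = List[List[float]]  # one row per token, one column per hidden dimension


def mean_pool(token_vectors: Matrix) -> List[float]:
    """Average per-token vectors into a single vector for the whole sequence."""
    if not token_vectors:
        raise ValueError("no token vectors to pool")
    count = len(token_vectors)
    return [sum(column) / count for column in zip(*token_vectors)]


def pool(token_vectors: Matrix, mode: str) -> Dict[str, object]:
    """Select what to keep for mode 'mean', 'tokens', or 'both'."""
    if mode not in ("mean", "tokens", "both"):
        raise ValueError(f"unknown pool mode: {mode}")
    result: Dict[str, object] = {}
    if mode in ("mean", "both"):
        result["mean"] = mean_pool(token_vectors)
    if mode in ("tokens", "both"):
        result["tokens"] = token_vectors
    return result


# Two tokens with a two-dimensional embedding (made-up numbers).
pooled = pool([[1.0, 2.0], [3.0, 4.0]], "both")
```

Mean pooling gives a fixed-size vector regardless of sequence length, while per-token output scales with the length of each sequence.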
Outputs
- Generation: /outputs/generated.fasta and /outputs/generated.json.
- Scoring: /outputs/scores.tsv and /outputs/scores.json.
- Embeddings: /outputs/embeddings_manifest.json and one or more .npy arrays.
Submit
Queue a run from New Job -> Evo 2.