v2ecoli: Whole-Cell E. coli via Process-Bigraph

Available since v0.9.0

v2ecoli is a whole-cell model of Escherichia coli built entirely on process-bigraph. It composes ~55 biological processes — metabolism, transcription, translation, replication, chromosome condensation, cell division, and more — into a single runnable Composite. Atlantis provides a production path to run v2ecoli simulations on HPC without any local installation.

Quick Start

Single cell

uv run atlantis compose ecoli \
    --duration 60 \
    --seed 0 \
    --poll \
    --base-url https://sms.cam.uchc.edu

This runs 60 seconds of biological time and polls until the SLURM job completes. Note your simulation ID from the output, then download results:

uv run atlantis compose results <SIM_ID> \
    --dest ./ecoli_output \
    --base-url https://sms.cam.uchc.edu

All options

| Option | Default | Description |
|---|---|---|
| --duration | 60.0 | Biological simulation time (seconds) |
| --seed | 0 | Random seed for stochastic processes |
| --interval | 1.0 | Execution timestep (seconds) |
| --features | [] | JSON list of feature modules, e.g. '["ppgpp_regulation"]' |
| --cache-dir | /out/cache | Absolute path to ParCa cache inside the container |
| --poll | off | Wait for completion and print final status |
| --base-url | http://localhost:8080 | API server URL |

Colony Simulations

A colony simulation runs the same E. coli model with multiple independent random seeds, producing an ensemble that captures cell-to-cell variability arising from stochastic processes (transcription bursting, division asymmetry, etc.).

Run a colony from the CLI

The simplest approach is a shell loop — each seed is an independent SLURM job that runs in parallel on the cluster:

BASE_URL=https://sms.cam.uchc.edu
DURATION=120
SEEDS=10

for SEED in $(seq 0 $((SEEDS - 1))); do
    SIM_ID=$(uv run atlantis compose ecoli \
        --duration $DURATION \
        --seed $SEED \
        --base-url $BASE_URL \
        2>/dev/null | grep "Simulation ID" | awk '{print $NF}')
    echo "Submitted seed $SEED → sim $SIM_ID"
done

The 10 jobs are queued independently and can run concurrently, subject to cluster capacity. Check the status of any one of them:

uv run atlantis compose status <SIM_ID> --base-url $BASE_URL
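The submission loop can also be driven from Python so that each seed stays paired with the simulation ID it produced. The following is a minimal sketch, not part of Atlantis: it uses only the CLI flags shown above, and it assumes the CLI prints a line containing "Simulation ID" whose last field is the ID, as the grep/awk pipeline implies. The helper names build_submit_command, parse_sim_id, and submit_colony are hypothetical.

```python
import subprocess


def build_submit_command(seed, duration, base_url):
    """Return the argv list for one `atlantis compose ecoli` submission."""
    return [
        "uv", "run", "atlantis", "compose", "ecoli",
        "--duration", str(duration),
        "--seed", str(seed),
        "--base-url", base_url,
    ]


def parse_sim_id(cli_output):
    """Return the last field of the first line containing "Simulation ID".

    Assumed output format; returns None when no such line is present.
    """
    for line in cli_output.splitlines():
        if "Simulation ID" in line:
            fields = line.split()
            if fields:
                return fields[-1]
    return None


def submit_colony(n_seeds, duration, base_url, run=subprocess.run):
    """Submit one job per seed and return a {seed: sim_id} mapping."""
    sim_ids = {}
    for seed in range(n_seeds):
        result = run(
            build_submit_command(seed, duration, base_url),
            capture_output=True, text=True, check=True,
        )
        sim_ids[seed] = parse_sim_id(result.stdout)
    return sim_ids
```

Keeping the mapping means the download step can name each output directory by seed rather than inferring it from the simulation ID. The run parameter exists only so the function can be exercised without a cluster.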

Download colony results

# Example IDs: substitute the simulation IDs printed by the submission loop.
for SIM_ID in 101 102 103 104 105 106 107 108 109 110; do
    uv run atlantis compose results "$SIM_ID" \
        --dest "./colony/seed_$((SIM_ID - 101))" \
        --base-url "$BASE_URL"
done

Each seed_N/ directory will contain results.zip with the simulation output (final_state.json, time-series data).

What varies between seeds

Stochastic variation between seeds comes from:

| Process | Source of randomness |
|---|---|
| Transcription | Poisson-distributed mRNA synthesis events |
| Translation | Stochastic ribosome binding and elongation |
| Replication | Probabilistic origin firing |
| Cell division | Stochastic partitioning of molecules to daughters |
| Metabolism | Flux variability from stochastic enzyme availability |

Setting --seed 0 through --seed N-1 gives you reproducible, independently varying trajectories for each simulated cell.
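To illustrate what "reproducible but varying" means, here is a toy sketch of seeded stochastic partitioning at division. It is not v2ecoli code: partition_molecules is a hypothetical stand-in that sends each molecule to one daughter with probability 0.5, using a private generator so that the seed alone determines the outcome.

```python
import random


def partition_molecules(count, seed, p=0.5):
    """Assign each of `count` molecules to daughter A with probability p.

    Toy stand-in for stochastic partitioning; not v2ecoli's implementation.
    """
    rng = random.Random(seed)  # private generator: the seed fully determines the draws
    to_a = sum(1 for _ in range(count) if rng.random() < p)
    return to_a, count - to_a


# One outcome per seed, analogous to --seed 0 through --seed N-1.
colony = {seed: partition_molecules(1000, seed) for seed in range(10)}
```

Calling partition_molecules twice with the same seed returns the same split, while different seeds generally give different splits. That is the property the colony workflow relies on.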

Feature flags

Enable biological sub-models that are off by default:

uv run atlantis compose ecoli \
    --duration 60 \
    --seed 0 \
    --features '["ppgpp_regulation", "rna_attenuation"]' \
    --poll \
    --base-url https://sms.cam.uchc.edu

Available features are defined in the v2ecoli source tree under v2ecoli/experiments/.

How It Works Under the Hood

When you call atlantis compose ecoli, the following pipeline runs:

1. Process-bigraph document generation

The API generates a .pbg document that configures the v2ecoli Composite:

{
    "state": {
        "v2ecoli": {
            "_type": "process",
            "address": "local:v2ecoli.composite.make_composite",
            "config": {
                "seed": 0,
                "cache_dir": "/out/cache",
                "features": []
            },
            "interval": 1.0
        }
    }
}
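As a rough sketch of how such a document could be assembled from the submitted parameters, the function below builds a dict with the same shape as the JSON above. The name make_ecoli_document is hypothetical and the API's real generator may differ; only the keys and values shown in the document are taken from the source.

```python
import json


def make_ecoli_document(seed=0, cache_dir="/out/cache", features=None, interval=1.0):
    """Build a dict with the same shape as the .pbg document shown above."""
    return {
        "state": {
            "v2ecoli": {
                "_type": "process",
                "address": "local:v2ecoli.composite.make_composite",
                "config": {
                    "seed": seed,
                    "cache_dir": cache_dir,
                    "features": list(features or []),
                },
                "interval": interval,
            }
        }
    }


document = make_ecoli_document(seed=3, features=["ppgpp_regulation"])
print(json.dumps(document, indent=4))
```

With no arguments, the function reproduces the default document shown above.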

2. Container auto-generation

A Singularity .def file is generated by pbest with v2ecoli injected as an extra pip dependency:

Bootstrap: docker
From: ghcr.io/astral-sh/uv:python3.12-bookworm

%post
    pip install process-bigraph pbsim-common
    pip install git+https://github.com/vivarium-collective/v2ecoli.git
    # vEcoli pulled in transitively

The definition is content-hashed. If a container with the same hash already exists on HPC, the build is skipped entirely (~0s). Otherwise, a SLURM build job runs (~15 minutes for first build).
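The skip-if-present behavior can be sketched as follows. This is an illustration under stated assumptions, not the pbest implementation: SHA-256 over the definition text and a "<hash>.sif" file naming scheme are both assumed here, and all three function names are hypothetical.

```python
import hashlib
from pathlib import Path


def definition_hash(definition_text):
    """Hash the .def text; identical definitions always map to the same digest."""
    return hashlib.sha256(definition_text.encode("utf-8")).hexdigest()


def container_path(definition_text, image_dir):
    """Assumed naming scheme for this sketch: one .sif file per definition hash."""
    return Path(image_dir) / f"{definition_hash(definition_text)}.sif"


def needs_build(definition_text, image_dir):
    """Return True when no image for this exact definition exists yet."""
    return not container_path(definition_text, image_dir).exists()
```

Because the hash covers the whole definition, any change to the dependency list yields a new digest and therefore triggers a fresh build.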

3. Runner script

A Python runner is generated and uploaded to the experiment directory:

from v2ecoli.composite import make_composite

composite = make_composite(
    cache_dir='/out/cache',
    seed=0,
    features=[],
)
composite.run(60.0)

make_composite is a factory function (not a class) — it assembles the full bigraph of ~55 biological process instances and returns a runnable Composite.

4. SLURM dispatch

singularity exec \
    --bind /experiment:/experiment \
    --bind /projects/SMS/sms_api/prod/compose/cache:/out/cache \
    /path/to/container.sif \
    python /experiment/v2ecoli_run.py
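For readers who want to see how the two bind mounts fit together, the command can be assembled programmatically. This is a sketch only: singularity_command is a hypothetical helper, and the paths are the ones shown above, including the placeholder container path.

```python
import shlex


def singularity_command(container, experiment_dir, cache_dir,
                        script="/experiment/v2ecoli_run.py"):
    """Assemble the `singularity exec` argv with the two bind mounts shown above."""
    return [
        "singularity", "exec",
        "--bind", f"{experiment_dir}:/experiment",   # runner script and output
        "--bind", f"{cache_dir}:/out/cache",         # ParCa cache
        container,
        "python", script,
    ]


argv = singularity_command(
    container="/path/to/container.sif",
    experiment_dir="/experiment",
    cache_dir="/projects/SMS/sms_api/prod/compose/cache",
)
print(shlex.join(argv))
```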

5. Results

Output is written to /experiment/output/ inside the container (bind-mounted from HPC filesystem) and zipped to results.zip for download. Primary outputs:

  • final_state.json — complete cell state at end of simulation (~14 MB typical)

  • Time-series data for observed quantities (if observables specified)
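Once results.zip has been downloaded, the final state can be read without unpacking the archive. The sketch below assumes only that the archive contains a file named final_state.json. Its location inside the archive is not specified here, so the hypothetical helper load_final_state matches on file name alone.

```python
import json
import zipfile


def load_final_state(zip_path):
    """Return the parsed final_state.json from a results archive."""
    with zipfile.ZipFile(zip_path) as archive:
        # Match by file name so this works whether or not the file sits in a subfolder.
        matches = [
            name for name in archive.namelist()
            if name.split("/")[-1] == "final_state.json"
        ]
        if not matches:
            raise FileNotFoundError(f"final_state.json not found in {zip_path}")
        with archive.open(matches[0]) as handle:
            return json.load(handle)
```

A call such as load_final_state("./ecoli_output/results.zip") returns the state as ordinary Python dicts and lists. The key structure of that state is not described in this document.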

The Biological Processes

v2ecoli decomposes whole-cell behavior into 55 Process and Step instances. Key subsystems:

| Subsystem | Processes | Biology |
|---|---|---|
| Gene expression | Transcription, Translation, RnaMaturation | mRNA synthesis, protein synthesis, RNA processing |
| Metabolism | Metabolism, PolypeptideElongation | Flux balance analysis, elongation rates |
| Regulation | TfBinding, TfUnbinding, EquilibriumModel | Transcription factor dynamics |
| Replication | ChromosomeCondensation, ChromosomeReplication | DNA replication fork tracking |
| Division | Division, BulkDivision | Cell division and molecule partitioning |
| Signaling | TwoComponentSystem | Histidine kinase signal transduction |
| Structure | Complexation | Protein complex assembly |

Each process declares typed ports and wires to shared stores via the process-bigraph wiring layer. The allocate_core() function registers all of them into a link_registry at API startup — which is also what powers the Process Runtime endpoints.

ParCa Cache

v2ecoli requires a pre-computed Parameter Calculator (ParCa) cache:

| File | Contents |
|---|---|
| initial_state.json | Baseline cell state (molecule counts, growth rates) |
| sim_data_cache.dill | Serialized configs for all 55 biological processes |
| cache_version.json | Source file hash for staleness detection |

The cache is hosted on the HPC filesystem at /projects/SMS/sms_api/prod/compose/cache/ and bind-mounted at /out/cache inside the container.

Warning

The cache must be regenerated whenever the v2ecoli package is updated. A stale cache causes a StaleCacheError on simulation startup. The verify_cache_version() check in v2ecoli catches this and gives a clear error message before the simulation attempts to run.
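The idea behind hash-based staleness detection can be sketched as below. This is not the verify_cache_version() implementation: the "source_hash" key, the use of SHA-256, and the helper names are assumptions for illustration, and StaleCacheError is redefined locally as a stand-in rather than imported from v2ecoli.

```python
import hashlib
import json
from pathlib import Path


class StaleCacheError(RuntimeError):
    """Local stand-in for the error named above; not imported from v2ecoli."""


def hash_sources(paths):
    """Digest the contents of the given source files in a stable (sorted) order."""
    digest = hashlib.sha256()
    for path in sorted(Path(p) for p in paths):
        digest.update(path.read_bytes())
    return digest.hexdigest()


def check_cache(cache_dir, source_paths):
    """Raise StaleCacheError if the recorded hash differs from the current one."""
    recorded = json.loads((Path(cache_dir) / "cache_version.json").read_text())
    current = hash_sources(source_paths)
    # "source_hash" is an assumed key name for this sketch.
    if recorded.get("source_hash") != current:
        raise StaleCacheError("ParCa cache is stale; regenerate it before running")
```

Running a check like this before the Composite is built means a mismatch surfaces immediately rather than partway through a simulation.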

REST API Reference

| Method | Path | Description |
|---|---|---|
| POST | /compose/v1/curated/ecoli | Submit v2ecoli simulation |
| GET | /compose/v1/simulation/{id}/status | SLURM job status |
| GET | /compose/v1/simulation/{id}/results | Download results ZIP |
| GET | /compose/v1/simulation/{id}/document | Retrieve PBG document used |

Submit request body

{
    "duration": 60.0,
    "seed": 0,
    "interval": 1.0,
    "features": [],
    "cache_dir": "/out/cache"
}

Response

{
    "simulation_database_id": 42,
    "simulator_database_id": 7,
    "status": "submitted"
}
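A submission can also be made without the CLI. The sketch below uses only the Python standard library and the path and body fields listed above. It assumes the endpoint accepts a JSON body with a Content-Type of application/json and needs no authentication, neither of which is confirmed here. The function names are hypothetical.

```python
import json
import urllib.request


def build_submit_request(base_url, duration=60.0, seed=0, interval=1.0,
                         features=None, cache_dir="/out/cache"):
    """Build (but do not send) the POST request for a v2ecoli submission."""
    body = {
        "duration": duration,
        "seed": seed,
        "interval": interval,
        "features": list(features or []),
        "cache_dir": cache_dir,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/compose/v1/curated/ecoli",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def submit(base_url, **params):
    """Send the request and return the decoded JSON response."""
    request = build_submit_request(base_url, **params)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Separating request construction from sending makes the payload easy to inspect or log before anything reaches the cluster. The returned simulation_database_id is presumably the {id} used in the status, results, and document paths.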