# v2ecoli: Whole-Cell *E. coli* via Process-Bigraph

*Available since v0.9.0*

[v2ecoli](https://github.com/vivarium-collective/v2ecoli) is a whole-cell model of *Escherichia coli* built entirely on [process-bigraph](https://github.com/vivarium-collective/process-bigraph). It composes ~55 biological processes (metabolism, transcription, translation, replication, chromosome condensation, cell division, and more) into a single runnable `Composite`.

Atlantis provides a production path to run v2ecoli simulations on HPC without any local installation.

## Quick Start

### Single cell

```bash
uv run atlantis compose ecoli \
  --duration 60 \
  --seed 0 \
  --poll \
  --base-url https://sms.cam.uchc.edu
```

This runs 60 seconds of biological time and polls until the SLURM job completes. Note the simulation ID in the output, then substitute it for `<SIMULATION_ID>` to download the results:

```bash
uv run atlantis compose results <SIMULATION_ID> \
  --dest ./ecoli_output \
  --base-url https://sms.cam.uchc.edu
```

### All options

| Option | Default | Description |
|---|---|---|
| `--duration` | `60.0` | Biological simulation time (seconds) |
| `--seed` | `0` | Random seed for stochastic processes |
| `--interval` | `1.0` | Execution timestep (seconds) |
| `--features` | `[]` | JSON list of feature modules, e.g. `'["ppgpp_regulation"]'` |
| `--cache-dir` | `/out/cache` | Absolute path to ParCa cache inside the container |
| `--poll` | off | Wait for completion and print final status |
| `--base-url` | `http://localhost:8080` | API server URL |

## Colony Simulations

A **colony simulation** runs the same *E. coli* model with multiple independent random seeds, producing an ensemble that captures cell-to-cell variability arising from stochastic processes (transcription bursting, division asymmetry, etc.).

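
The role of the seed can be illustrated with a toy model that does not use v2ecoli at all. In this minimal sketch, each "cell" counts random synthesis events drawn from its own seeded generator; `toy_trajectory`, `toy_colony`, and the event rate are hypothetical names and values chosen for illustration only. The point is that one seed always reproduces the same trajectory, while a range of seeds gives an ensemble to summarize.

```python
import random
import statistics


def toy_trajectory(seed: int, steps: int = 60, rate: float = 0.2) -> int:
    """Count random synthesis events over `steps` ticks for one toy cell."""
    rng = random.Random(seed)  # per-cell generator, isolated from global state
    return sum(1 for _ in range(steps) if rng.random() < rate)


def toy_colony(n_seeds: int) -> list[int]:
    """Run one toy cell per seed, using seeds 0 .. n_seeds - 1."""
    return [toy_trajectory(seed) for seed in range(n_seeds)]


colony = toy_colony(10)
print("per-seed event counts:", colony)
print("ensemble mean:", statistics.mean(colony))
print("ensemble stdev:", statistics.stdev(colony))
```

Rerunning the script prints the same counts every time, which is the property that makes a seeded ensemble reproducible.
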
### Run a colony from the CLI

The simplest approach is a shell loop: each seed is submitted as an independent SLURM job, and the jobs run in parallel on the cluster.

```bash
BASE_URL=https://sms.cam.uchc.edu
DURATION=120
SEEDS=10

for SEED in $(seq 0 $((SEEDS - 1))); do
  SIM_ID=$(uv run atlantis compose ecoli \
    --duration "$DURATION" \
    --seed "$SEED" \
    --base-url "$BASE_URL" \
    2>/dev/null | grep "Simulation ID" | awk '{print $NF}')
  echo "Submitted seed $SEED → sim $SIM_ID"
done
```

All 10 simulations run concurrently on the cluster. Poll any one of them by its simulation ID (after the loop, `$SIM_ID` still holds the ID of the last submitted seed):

```bash
uv run atlantis compose status "$SIM_ID" --base-url "$BASE_URL"
```

### Download colony results

```bash
# Example IDs; substitute the simulation IDs printed at submission.
for SIM_ID in 101 102 103 104 105 106 107 108 109 110; do
  uv run atlantis compose results "$SIM_ID" \
    --dest ./colony/seed_$((SIM_ID - 101)) \
    --base-url "$BASE_URL"
done
```

Each `seed_N/` directory will contain `results.zip` with the simulation output (`final_state.json`, time-series data).

### What varies between seeds

Stochastic variation between seeds comes from:

| Process | Source of randomness |
|---|---|
| Transcription | Poisson-distributed mRNA synthesis events |
| Translation | Stochastic ribosome binding and elongation |
| Replication | Probabilistic origin firing |
| Cell division | Stochastic partitioning of molecules to daughters |
| Metabolism | Flux variability from stochastic enzyme availability |

Setting `--seed 0` through `--seed N-1` gives you reproducible, independently varying trajectories for each simulated cell.

### Feature flags

Enable biological sub-models that are off by default:

```bash
uv run atlantis compose ecoli \
  --duration 60 \
  --seed 0 \
  --features '["ppgpp_regulation", "rna_attenuation"]' \
  --poll \
  --base-url https://sms.cam.uchc.edu
```

Available features are defined in the v2ecoli source tree under `v2ecoli/experiments/`.

## How It Works Under the Hood

When you call `atlantis compose ecoli`, the following pipeline runs:

### 1. Process-bigraph document generation

The API generates a `.pbg` document that configures the v2ecoli `Composite`:

```json
{
  "state": {
    "v2ecoli": {
      "_type": "process",
      "address": "local:v2ecoli.composite.make_composite",
      "config": {
        "seed": 0,
        "cache_dir": "/out/cache",
        "features": []
      },
      "interval": 1.0
    }
  }
}
```

### 2. Container auto-generation

A Singularity `.def` file is generated by [pbest](https://github.com/biosimulations/pbest) with v2ecoli injected as an extra pip dependency:

```singularity
Bootstrap: docker
From: ghcr.io/astral-sh/uv:python3.12-bookworm

%post
    pip install process-bigraph pbsim-common
    pip install git+https://github.com/vivarium-collective/v2ecoli.git
    # vEcoli pulled in transitively
```

The definition is content-hashed. If a container with the same hash already exists on HPC, the build is skipped entirely (~0s). Otherwise, a SLURM build job runs (~15 minutes for the first build).

### 3. Runner script

A Python runner is generated and uploaded to the experiment directory:

```python
from v2ecoli.composite import make_composite

composite = make_composite(
    cache_dir='/out/cache',
    seed=0,
    features=[],
)
composite.run(60.0)
```

`make_composite` is a factory function (not a class); it assembles the full bigraph of ~55 biological process instances and returns a runnable `Composite`.

### 4. SLURM dispatch

```bash
singularity exec \
  --bind /experiment:/experiment \
  --bind /projects/SMS/sms_api/prod/compose/cache:/out/cache \
  /path/to/container.sif \
  python /experiment/v2ecoli_run.py
```

### 5. Results

Output is written to `/experiment/output/` inside the container (bind-mounted from the HPC filesystem) and zipped to `results.zip` for download.

Primary outputs:

- `final_state.json`: complete cell state at the end of the simulation (~14 MB typical)
- Time-series data for observed quantities (if observables are specified)

## The Biological Processes

v2ecoli decomposes whole-cell behavior into 55 `Process` and `Step` instances.
Key subsystems:

| Subsystem | Processes | Biology |
|---|---|---|
| **Gene expression** | Transcription, Translation, RnaMaturation | mRNA synthesis, protein synthesis, RNA processing |
| **Metabolism** | Metabolism, PolypeptideElongation | Flux balance analysis, elongation rates |
| **Regulation** | TfBinding, TfUnbinding, EquilibriumModel | Transcription factor dynamics |
| **Replication** | ChromosomeCondensation, ChromosomeReplication | DNA replication fork tracking |
| **Division** | Division, BulkDivision | Cell division and molecule partitioning |
| **Signaling** | TwoComponentSystem | Histidine kinase signal transduction |
| **Structure** | Complexation | Protein complex assembly |

Each process declares typed ports and wires to shared stores via the process-bigraph wiring layer. The `allocate_core()` function registers all of them into a `link_registry` at API startup, which is also what powers the [Process Runtime](rest-process.md) endpoints.

## ParCa Cache

v2ecoli requires a pre-computed **Parameter Calculator (ParCa)** cache:

| File | Contents |
|---|---|
| `initial_state.json` | Baseline cell state (molecule counts, growth rates) |
| `sim_data_cache.dill` | Serialized configs for all 55 biological processes |
| `cache_version.json` | Source file hash for staleness detection |

The cache is hosted on the HPC filesystem at `/projects/SMS/sms_api/prod/compose/cache/` and bind-mounted at `/out/cache` inside the container.

```{warning}
The cache must be regenerated whenever the v2ecoli package is updated. A stale cache causes a `StaleCacheError` on simulation startup. The `verify_cache_version()` check in v2ecoli catches this and gives a clear error message before the simulation attempts to run.
```

## REST API Reference

| Method | Path | Description |
|---|---|---|
| `POST` | `/compose/v1/curated/ecoli` | Submit v2ecoli simulation |
| `GET` | `/compose/v1/simulation/{id}/status` | SLURM job status |
| `GET` | `/compose/v1/simulation/{id}/results` | Download results ZIP |
| `GET` | `/compose/v1/simulation/{id}/document` | Retrieve PBG document used |

### Submit request body

```json
{
  "duration": 60.0,
  "seed": 0,
  "interval": 1.0,
  "features": [],
  "cache_dir": "/out/cache"
}
```

### Response

```json
{
  "simulation_database_id": 42,
  "simulator_database_id": 7,
  "status": "submitted"
}
```
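
The endpoints above can also be called without the CLI. The following is a minimal sketch using only the Python standard library; it builds the submit request and the follow-up URLs but does not send anything. The helper names `build_submit_request` and `simulation_url` are hypothetical, and the sketch assumes that the body is sent as JSON, that `{id}` corresponds to the `simulation_database_id` in the response, and that no authentication headers are needed.

```python
import json
from urllib import request

BASE_URL = "https://sms.cam.uchc.edu"  # server used in the CLI examples


def build_submit_request(base_url, duration=60.0, seed=0, interval=1.0,
                         features=None, cache_dir="/out/cache"):
    """Build (but do not send) the POST request that submits a v2ecoli run."""
    body = {
        "duration": duration,
        "seed": seed,
        "interval": interval,
        "features": features if features is not None else [],
        "cache_dir": cache_dir,
    }
    return request.Request(
        url=f"{base_url.rstrip('/')}/compose/v1/curated/ecoli",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed content type
        method="POST",
    )


def simulation_url(base_url, simulation_id, resource):
    """Return the URL for a simulation's 'status', 'results', or 'document'."""
    if resource not in {"status", "results", "document"}:
        raise ValueError(f"unknown resource: {resource!r}")
    return f"{base_url.rstrip('/')}/compose/v1/simulation/{simulation_id}/{resource}"


req = build_submit_request(BASE_URL, duration=120.0, seed=3)
print(req.get_method(), req.full_url)
print(simulation_url(BASE_URL, 42, "status"))

# Sending the request requires network access to the server (assumption:
# no extra authentication headers are needed):
# with request.urlopen(req) as resp:
#     sim_id = json.load(resp)["simulation_database_id"]
```

Keeping request construction separate from sending makes it possible to inspect or log the exact payload before any job is submitted to the cluster.
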