🧬 PETase Competition · April 2026
LCC Deep Mining
Multi-Signal Integration
Goal: Engineer LCC variants with higher PET hydrolysis activity than wild-type and the ICCG benchmark,
by systematically mining a deep mutational scanning (DMS) dataset with 5 computational tools.
Target Enzyme
LCC — Leaf-branch Compost Cutinase (PDB 4EB0)
Benchmark
ICCG: F243I/D238C/S283C/Y127G (Tournier et al., Nature 2020, Tm +19°C)
DMS Dataset
8,179 variants · micro-droplet FACS · 40h PET hydrolysis
Fitness Definition
log₂(enrichment) after FACS sorting; >0 = better than WT
01

LCC Structure, Activity & Published Variants

Background

α/β-Hydrolase Fold

261 residues, chain A. Catalytic triad: Ser165 (nucleophile), Asp210 (acid), His242 (base). Oxyanion hole: Gly167, Asn246. All are DMS coldspots.

LCC WT vs ICCG — Literature Activity

| Property | LCC WT | ICCG |
|---|---|---|
| Tm | 84.7°C | 90.9°C (+6.2°C) |
| PET degradation | 31% Pf-PET, 3 d, 65°C | 90% in 10 h, 72°C |
| Rate (Gf-PET) | 93.2 mgTA/h/mg | Higher at 72°C |
| Productivity | | 16.7 g TPA/L/h |

Source: Tournier et al., Nature 2020. ICCG = F243I/D238C/S283C/Y127G. All residue numbers in this report use PDB/UniProt numbering.

ICCG Mutations in DMS

| Mutation | DMS Fitness | Note |
|---|---|---|
| F243I | −0.130 | Slightly deleterious alone |
| D238C | not measured | D238G = +0.42 |
| S283C | +0.803 | Disulfide partner; beneficial alone |
| Y127G | not measured | Y127H = +0.25, Y127C = −1.57 |
| N246D | −1.027 | Strongly deleterious alone |

Key Variants (by improvement over ICCG) + DMS Singles Sum†

| Variant | Additional Muts | vs ICCG | DMS Singles Sum† | Coverage | Source |
|---|---|---|---|---|---|
| LCC-LANL ★ | +P38L/Y61C/M91I/L117P/A183V/H218Y/Q224H/S247L/T256I | 14.3× | +1.26 | 11/13 scored (D238C, Y127G absent from DMS) | NREL 2024 |
| RITK | +D53R/R143I/D193T/E208K | 8.3× | +0.90 | 3/8 scored | Fang 2023 |
| LCC-I40M | +6 mutations (many outside DMS range) | 3.6× | +0.67 | partial | ML study |
| ICCG/H252Y | +H218Y | 2.6× | +1.16 | 3/5 scored | Cribari |
| LCC-A2 | +H218Y/N248D | +40% | +1.43 | 4/6 scored | Zheng 2024 |
| ICCG | (baseline) | 1.0× | +0.67 | 2/4 scored | Tournier 2020 |

† Interpretation and limitations of DMS Singles Sum: This column is a naive linear sum of single-site DMS fitness values for scored mutations only — it does not predict true multi-mutant activity. Three caveats: (1) Missing mutations (e.g. D238C, Y127G absent from DMS singles) contribute 0 to the sum, causing underestimation; (2) Epistasis is entirely ignored — DMS directly measures F243I+P38L combinatorial fitness = −1.52, far below the additive sum of the two singles, demonstrating strong antagonism that this column cannot capture; (3) Coverage varies widely across variants (3/8 to 11/13), making cross-row comparisons unreliable. This column is a rough proxy for "how many DMS-beneficial singles does this variant carry" and must not be used for activity ranking.
★ LCC-LANL (14.3× ICCG, NREL/LANL ACS Catal. 2024). H218Y appears in 3 of the top-4 variants (single-mutant DMS fitness +0.490).
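The Singles Sum and Coverage columns can be reproduced with a few lines. This is a minimal sketch using only the ICCG values quoted in the tables above; the function name `singles_sum` and the use of `None` for unmeasured substitutions are illustrative choices, not the pipeline's actual code.

```python
from typing import Optional

# Single-mutant DMS fitness values quoted for the ICCG mutations above.
# None marks a substitution that is absent from the DMS singles.
DMS_SINGLES: dict[str, Optional[float]] = {
    "F243I": -0.130,
    "S283C": 0.803,
    "D238C": None,
    "Y127G": None,
}

def singles_sum(mutations: list[str],
                table: dict[str, Optional[float]]) -> tuple[float, int, int]:
    """Return (naive additive sum, number of scored mutations, total mutations)."""
    scored = [table[m] for m in mutations if table.get(m) is not None]
    return sum(scored), len(scored), len(mutations)

total, n_scored, n_total = singles_sum(["F243I", "D238C", "S283C", "Y127G"], DMS_SINGLES)
print(f"ICCG singles sum = {total:+.2f} ({n_scored}/{n_total} scored)")
```

Unmeasured mutations silently contribute 0, which is caveat (1) in the footnote: the sum is only as informative as its coverage.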

Project context: Goal: identify LCC multi-mutant candidates that can outperform ICCG and potentially approach LCC-LANL (14.3× ICCG) performance, guided by computational mining of the 8,179-variant DMS dataset.
01b

LCC Structural Zone Definitions — Binding Site & Secondary Shell

Structure

Definition: Residues with direct contact to PET/MHET substrate, identified by molecular docking of 2-HE(MHET)₃ into PDB 4EB0 (Tournier et al., Nature 2020). Highest priority zone.

| PDB | AA | Role | DMS fitness† | Scorecons | Published variant |
|---|---|---|---|---|---|
| 165 | Ser | Catalytic nucleophile | −0.685 (n=6) | 0.948 | |
| 210 | Asp | Catalytic acid | −0.467 (n=5) | 1.000 | |
| 242 | His | Catalytic base | −0.802 (n=6) | 1.000 | |
| 95 | Tyr | Oxyanion hole (2nd) | −0.231 (n=5) | 0.883 | |
| 166 | Met | Oxyanion hole (1st) | −0.719 (n=6) | 1.000 | |
| 164 | His | Subsite S2 | −0.138 (n=6) | 1.000 | |
| 125 | Phe | Hydrophobic groove | −0.542 (n=5) | 0.625 | M125I (LCC-LANL) |
| 127 | Tyr | Aromatic clamp | −0.680 (n=4) | 0.610 | Y127G (ICCG) |
| 130 | Ser | Substrate binding | −0.232 (n=5) | 0.984 | T130M (WCCM) |
| 94 | Gly | Binding pocket | −0.317 (n=4) | 1.000 | |
| 190 | Trp | Aromatic binding | −0.782 (n=5) | 1.000 | |
| 212 | Val | Binding pocket | +0.111 (n=4) | 0.836 | |
| 243 | Phe | Binding pocket | −0.319 (n=6) | 0.748 | F243I (ICCG/LCC-LANL) |
| 246 | Asn | Binding pocket | −1.271 (n=6) | 0.985 | N246D (ICCG) |

†DMS fitness = mean log₂-enrichment across all measured substitutions at this position. Scorecons 0–1 (1=fully conserved); catalytic triad values from ganon scorecons_conservation.csv.

Definition: Any residue not in the binding site whose closest atom is ≤5Å from any atom of a binding site or catalytic triad residue. Computed from PDB 4EB0 all-atom coordinates. Priority: binding site > secondary shell > surface/core.
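The ≤5 Å rule can be sketched as below. This is an illustration under stated assumptions: the coordinates are toy values (not read from 4EB0), and the helper names `min_interatomic_distance` and `secondary_shell` are hypothetical.

```python
import numpy as np

def min_interatomic_distance(atoms_a: np.ndarray, atoms_b: np.ndarray) -> float:
    """Smallest distance between any atom in A (n, 3) and any atom in B (m, 3)."""
    diff = atoms_a[:, None, :] - atoms_b[None, :, :]
    return float(np.sqrt((diff ** 2).sum(axis=-1)).min())

def secondary_shell(coords: dict[int, np.ndarray],
                    binding_site: set[int],
                    cutoff: float = 5.0) -> dict[int, float]:
    """Map each non-binding-site residue within `cutoff` Å of the site to its min distance."""
    shell = {}
    for res, xyz in coords.items():
        if res in binding_site:
            continue
        d = min(min_interatomic_distance(xyz, coords[b]) for b in binding_site)
        if d <= cutoff:
            shell[res] = d
    return shell

# Toy all-atom coordinates keyed by residue number (hypothetical values).
coords = {
    1: np.array([[0.0, 0.0, 0.0]]),
    2: np.array([[3.0, 0.0, 0.0], [4.0, 0.0, 0.0]]),
    3: np.array([[20.0, 0.0, 0.0]]),
}
shell = secondary_shell(coords, binding_site={1})
print(shell)
```

With an all-atom minimum distance, residues that are sequence-adjacent to a binding-site residue score about 1.3 Å, because their backbone C and N atoms are joined by a peptide bond of roughly that length. This is consistent with the many 1.3 Å entries in the table below.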

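| PDB | AA | Dist to BS† | DMS fitness | Scorecons | Published variant |
|---|---|---|---|---|---|
| 92 | Ser | 2.9Å | −0.557 (n=3) | 0.910 | |
| 93 | Pro | 1.3Å | −0.397 (n=5) | 1.000 | |
| 96 | Thr | 1.3Å | −0.709 (n=3) | 1.000 | |
| 97 | Ala | 3.0Å | −0.283 (n=5) | 0.725 | |
| 101 | Ser | 3.0Å | −0.301 (n=6) | 0.884 | |
| 102 | Leu | 4.0Å | −0.492 (n=4) | 0.697 | |
| 104 | Trp | 3.7Å | +0.395 (n=4) | 1.000 | |
| 123 | Ser | 3.0Å | +0.123 (n=4) | 0.697 | |
| 124 | Arg | 1.3Å | −0.398 (n=5) | 0.514 | |
| 126 | Asp | 1.3Å | −1.275 (n=5) | 1.000 | |
| 128 | Pro | 1.3Å | −0.694 (n=5) | 0.984 | |
| 129 | Asp | 1.3Å | +0.030 (n=5) | 0.914 | |
| 131 | Arg | 1.3Å | −0.646 (n=7) | 0.978 | |
| 132 | Ala | 3.2Å | −0.221 (n=4) | 0.786 | |
| 133 | Ser | 2.9Å | +0.163 (n=6) | 0.567 | |
| 134 | Gln | 3.0Å | −0.604 (n=5) | 0.986 | |
| 162 | Ala | 4.7Å | +0.149 (n=5) | 0.630 | |
| 163 | Gly | 1.3Å | −0.634 (n=6) | 1.000 | |
| 167 | Gly | 1.3Å | −1.097 (n=4) | 1.000 | |
| 168 | Gly | 2.5Å | −0.195 (n=4) | 1.000 | |
| 169 | Gly | 2.9Å | −0.657 (n=5) | 1.000 | |
| 170 | Gly | 2.9Å | −0.737 (n=5) | 0.965 | |
| 187 | Leu | 2.9Å | −0.404 (n=4) | 1.000 | |
| 188 | Thr | 2.9Å | −0.118 (n=4) | 0.854 | |
| 189 | Pro | 1.3Å | −0.243 (n=5) | 0.766 | |
| 191 | His | 1.3Å | −0.639 (n=3) | 0.792 | |
| 192 | Thr | 3.8Å | +0.826 (n=5) | 0.763 | |
| 207 | Ala | 3.0Å | +0.503 (n=5) | 0.845 | |
| 208 | Glu | 3.3Å | +0.101 (n=4) | 0.897 | |
| 209 | Ala | 1.3Å | +0.256 (n=5) | 0.462 | |
| 211 | Thr | 1.3Å | +0.297 (n=3) | 0.770 | |
| 213 | Ala | 1.3Å | +0.400 (n=3) | 1.000 | |
| 214 | Pro | 3.4Å | +0.071 (n=6) | 0.845 | |
| 215 | Val | 4.7Å | −0.112 (n=5) | 1.000 | |
| 218 | His | 3.3Å | +0.041 (n=7) | 0.960 | H218Y (LCC-LANL) |
| 222 | Phe | 4.1Å | +0.204 (n=5) | 0.975 | |
| 240 | Ala | 3.5Å | −0.327 (n=4) | 0.986 | |
| 241 | Ser | 1.3Å | −0.418 (n=3) | 0.796 | |
| 244 | Ala | 1.3Å | −0.078 (n=5) | 0.747 | |
| 245 | Pro | 1.3Å | +0.361 (n=5) | 1.000 | |
| 247 | Ser | 1.3Å | +0.213 (n=3) | 0.582 | S247L (LCC-LANL) |
| 248 | Asn | 4.7Å | −0.181 (n=6) | 0.550 | |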

†Min atom-to-atom distance to any binding site residue (incl. catalytic triad). DMS fitness = mean log₂-enrichment. Scorecons 0–1 (1=fully conserved). Yellow = our top-5 candidate. Blue = published variant mutation.

02

Published Variants vs DMS Dataset — Coverage & Alignment

Gap Analysis

No published variant combination exists in DMS

The DMS library was generated by random combinatorial mutagenesis, not targeted at specific known variants. None of the 7 published LCC variants has an exact combination match in the 8,179-variant dataset. This means we cannot directly validate any published variant's performance using DMS data.

| Variant | Activity | Total Muts | Singles in DMS | Key Missing | Subset Match in DMS? |
|---|---|---|---|---|---|
| LCC-LANL ★ | 14.3× ICCG | 13 | 11/13 | D238C, Y127G | 1 pair found: F243I+P38L combinatorial fitness = −1.52 (directly measured DMS double mutant, independent of Singles Sum; shows strong antagonism between two mutations that LCC-LANL carries) |
| RITK | 8.3× | 8 | 3/8 | D238C, Y127G, D53R, R143I, D193T | None |
| LCC-A2 | +40% | 6 | 4/6 | D238C, Y127G | None |
| ICCG/H252Y | 2.6× | 5 | 3/5 | D238C, Y127G | None |
| ICCG | 1.0× (baseline) | 4 | 2/4 | D238C, Y127G | None |
| WCCG | Tm +7°C | 4 | 1/4 | F243W, D238C, Y127G | None |

Key Observation: D238C and Y127G Missing Everywhere

These two ICCG core mutations are absent from all DMS singles. D238 has D238G (+0.42), D238V (−0.90) but not D238C. Y127 has Y127H (+0.25), Y127N (−0.19) but not Y127G. This is a fundamental coverage gap — the DMS library simply didn't sample these specific amino acid substitutions.

LCC-LANL: F243I+P38L Antagonism

The only subset match found: LCC-LANL contains both F243I and P38L, and this pair exists in DMS as a double with fitness −1.52. Additive prediction: F243I(−0.13) + P38L(+0.42) = +0.29. Actual: −1.52. Δ = −1.81, a severe antagonism. Yet LCC-LANL, which carries this pair alongside 11 other mutations, is the most active published variant. This suggests, but does not prove, that higher-order epistasis can rescue pairwise antagonism.

Implication: DMS data alone cannot validate published variants, and pairwise fitness does not reliably predict multi-mutant outcomes. The best published variant (LCC-LANL) appears to rely on higher-order epistasis that is invisible to any pairwise or additive analysis.
02

DMS Fitness Distribution — Single Mutations

Data
Beneficial (>+0.3): 379 mutations (30.4%)
Neutral (−0.3 to +0.3): 478 mutations (38.4%)
Deleterious (<−0.3): 389 mutations (31.2%)

Distribution Shape

The distribution is roughly symmetric around 0 (WT level) with a slight left skew. The high beneficial fraction (30%) is unusual for enzymes — most DMS datasets show <10% beneficial. This indicates LCC has extensive room for improvement through mutation, consistent with it being a natural enzyme not previously optimized for PET degradation.

03

DMS Dataset Overview — Multi-Site Variants

Data

What is Micro-Droplet DMS?

Each LCC variant is encapsulated in a water-in-oil micro-droplet with a PET substrate. After 40h of hydrolysis at 65°C, droplets are sorted by fluorescence (FACS) — brighter = more PET degraded = higher activity. Fitness = log₂(enrichment ratio) after sorting. A fitness of 0 = wild-type level; >0 = better than WT; <0 = worse than WT. The measurement integrates activity, stability, and expression into a single readout.
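A minimal sketch of the fitness definition follows. The count layout, the pseudocount, and the explicit subtraction of the wild-type enrichment are assumptions made for illustration; the text only states that fitness is log₂(enrichment) with 0 at wild-type level.

```python
import math

def log2_enrichment(sorted_count: float, sorted_total: float,
                    input_count: float, input_total: float,
                    pseudocount: float = 0.5) -> float:
    """log2 of (variant frequency after sorting) / (variant frequency before sorting)."""
    freq_sorted = (sorted_count + pseudocount) / sorted_total
    freq_input = (input_count + pseudocount) / input_total
    return math.log2(freq_sorted / freq_input)

def fitness(variant_counts: tuple, wt_counts: tuple, pseudocount: float = 0.5) -> float:
    """Variant enrichment relative to wild-type, so WT itself scores 0 (assumed normalisation)."""
    return (log2_enrichment(*variant_counts, pseudocount=pseudocount)
            - log2_enrichment(*wt_counts, pseudocount=pseudocount))

# Hypothetical read counts: (sorted_count, sorted_total, input_count, input_total)
wt = (100, 10_000, 100, 10_000)
better = (400, 10_000, 100, 10_000)
print(fitness(better, wt, pseudocount=0.0))
```

In this toy case the variant is enriched four-fold relative to WT, which corresponds to a fitness of about 2 on the log₂ scale.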

Beneficial (>+0.3): 379 (30.4%)
Neutral (−0.3 to +0.3): 478 (38.4%)
Deleterious (<−0.3): 389 (31.2%)

Total: 1,246 single mutants covering 259 of 261 positions. 30% beneficial rate is unusually high — LCC is a mutationally tolerant enzyme.
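The three classes and their percentages follow directly from the ±0.3 thresholds and the counts above. A small sketch (the function name `classify` is illustrative):

```python
def classify(fitness: float, threshold: float = 0.3) -> str:
    """Bin a log2-enrichment value using the +/-0.3 cut-offs from the slide."""
    if fitness > threshold:
        return "beneficial"
    if fitness < -threshold:
        return "deleterious"
    return "neutral"

# Counts reported above for the single mutants.
counts = {"beneficial": 379, "neutral": 478, "deleterious": 389}
total = sum(counts.values())
fractions = {label: round(100 * n / total, 1) for label, n in counts.items()}
print(total, fractions)
```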

| k | Count | Mean Fitness | Note |
|---|---|---|---|
| 2 | 2,701 | −0.75 | Most common; antagonistic epistasis dominates |
| 3 | 1,902 | −0.93 | Further fitness decline on average |
| 4 | 1,225 | −1.10 | Best combo: +4.00 (R47H+T82A+A209V+S241P) |
| 5 | 636 | −1.00 | Best combo: +6.98 (top of entire dataset) |
| 6–13 | 544 | −1.321 | Diminishing returns at higher k |

Total: 8,179 variants. Mean fitness declines with k, but rare combinations massively outperform WT — the dataset contains hidden gems.

04

DMS Fitness Landscape — Top Performers

Data
| # | Mutation | Fitness | Structural Zone | OHM Zone |
|---|---|---|---|---|
| 1 | N249H | +1.839 | Surface | structural_essential |
| 2 | N140D | +1.835 | Surface | safe_target |
| 3 | A207E | +1.827 | 2nd Shell | allosteric_core |
| 4 | T192P | +1.808 | 2nd Shell | safe_target |
| 5 | L142M | +1.789 | Surface | structural_essential |
| 6 | Q40P | +1.757 | Surface | safe_target |
| 7 | N44H | +1.664 | Surface | safe_target |
| 8 | N225K | +1.653 | Surface | allosteric_handle |
| 9 | Y95F | +1.621 | Binding site | structural_essential |
| 10 | Q217P | +1.597 | Surface | allosteric_handle |

OHM Zone legend:

allosteric_core = high ACI + conserved, directly relays catalytic signal;

allosteric_handle = high ACI + low conservation, tunable modulator and ideal engineering target;

safe_target = low ACI, mutations are additive with low epistasis risk;

structural_essential = conserved but low ACI, maintains fold integrity.

| k | Mutations | Fitness |
|---|---|---|
| 5 | T121M+Y127N+A183V+F196S+A281T | +6.975 |
| 2 | I204F+A207T | +3.911 |
| 3 | K194R+I204N+N288I | +3.888 |
| 4 | R47H+T82A+A209V+S241P | +3.998 |
| 6 | T60S+P93R+S100P+P189L+S258T+P280Q | +4.607 |

Key Observation

The best 5-mutant combination (+6.98) scores 3.8× the best single mutant (+1.84) on the log₂ scale; the gap of about 5.1 log₂ units corresponds to roughly 35-fold higher enrichment. This is not merely additive: specific combinations synergize. The challenge: with 259 positions and 20 amino acids, the combinatorial space is vast. We need computational tools to navigate it efficiently.

Additive fitness = sum of individual single-mutant fitness values. If a combination's observed fitness exceeds the additive prediction, there is positive epistasis (synergy). Individual singles: T121M(−0.261)+Y127N(−0.191)+A183V(−0.084)+F196S(−1.272)+A281T(−0.660) = additive sum −2.47. The observed +6.975 represents extreme positive epistasis of +9.44 — the combination works despite each component being individually neutral or deleterious.
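The epistasis arithmetic can be checked directly. A minimal sketch using the values quoted in this report (the helper name `epistasis` is illustrative):

```python
def epistasis(observed: float, singles: list[float]) -> tuple[float, float]:
    """Return (additive prediction, observed minus additive) on the log2 scale."""
    additive = sum(singles)
    return additive, observed - additive

# Best 5-mutant: T121M + Y127N + A183V + F196S + A281T
additive, delta = epistasis(6.975, [-0.261, -0.191, -0.084, -1.272, -0.660])
print(f"additive = {additive:+.2f}, epistasis = {delta:+.2f}")

# F243I + P38L double from the gap analysis
pair_additive, pair_delta = epistasis(-1.52, [-0.130, 0.42])
print(f"additive = {pair_additive:+.2f}, epistasis = {pair_delta:+.2f}")
```

The same function covers both directions: a positive difference is synergy, a negative one is antagonism.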
05

Conservation vs DMS Fitness — Identifying Engineering Targets

Sequence Analysis

[Figure: scatter plot of conservation (x) vs DMS fitness (y). Each dot = one position (mean fitness across all single mutations at that position).] Ideal targets: top-left quadrant (low conservation + high fitness).

✓ Top-Left: Ideal Targets

Low conservation + high fitness. Evolutionarily unconstrained AND mutationally tolerant. Q217P (cons=0.531, fit=+0.34) and N140D (cons=0.713, fit=+0.06) sit here — safest candidates to engineer.

⚠ Top-Right: High-Risk High-Reward

High conservation but positive fitness for specific substitutions. W104L (cons=1.0) and A207E (cons=0.845) — conserved positions where rare mutations improve function. Similar to ICCG's strategy. Higher epistasis risk.

✗ Bottom-Right: Avoid

High conservation + low fitness. Catalytic triad (Ser165, Asp210, His242) and structural core cluster here. Locked by evolution — almost all mutations are catastrophic.

Bottom-Left: Low Priority

Low conservation + low fitness. Not evolutionarily constrained, but mutations don't help either. These are surface-exposed or disordered positions with little functional relevance.

Spearman ρ = −0.247, NDCG@10% = 0.70 (p = 6.8×10⁻⁵). The negative correlation confirms that conservation predicts mutational intolerance, but the scatter is wide — many positions deviate from the trend, creating engineering opportunities.
06

Conservation, Hotspots & Coldspots

Sequence Analysis

Conservation Analysis (Scorecons, 150 Orthologs)

Method: BLASTp search against UniRef90 (E-value < 1e-30, query coverage > 80%) yielded ~150 LCC homologs. Multiple sequence alignment via Clustal Omega, then conservation scored by Scorecons (Valdar 2002). Score range 0→1, where 1.0 = identical across all orthologs. 

Rationale: Conserved positions are under evolutionary constraint — mutations are more likely to be deleterious.

Conservation vs DMS fitness (per-position, single-mutation mean): Spearman ρ = −0.247 (p = 6.8×10⁻⁵, n=255 positions). As expected, more conserved positions have lower mean fitness across all substitutions.

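NDCG@10% = 0.70 (computed on per-position mean single-mutation fitness, ranking 259 positions by negated conservation score). NDCG is the ratio of the ranking's discounted cumulative gain to that of an ideal ranking, so 0.70 means conservation alone recovers about 70% of the ideal gain in the top 10% of positions; it is not the fraction of top positions identified correctly.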
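Both metrics can be sketched in a few lines of NumPy. The exact NDCG@10% definition used for this report is not stated, so the gain function below (fitness shifted to be non-negative) and the cut-off rounding are assumptions; the toy arrays are hypothetical.

```python
import numpy as np

def average_ranks(x) -> np.ndarray:
    """1-based ranks with ties replaced by their average rank."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(len(x), dtype=float)
    ranks[order] = np.arange(1, len(x) + 1)
    for value in np.unique(x):
        tie = x == value
        ranks[tie] = ranks[tie].mean()
    return ranks

def spearman(x, y) -> float:
    """Spearman rho = Pearson correlation of the rank vectors."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])

def ndcg_at_fraction(scores, fitness, fraction: float = 0.10) -> float:
    """DCG of the top `fraction` ranked by `scores`, divided by the ideal DCG."""
    scores = np.asarray(scores, dtype=float)
    fitness = np.asarray(fitness, dtype=float)
    k = max(1, int(round(fraction * len(scores))))
    gains = fitness - fitness.min()            # assumption: shift gains to be >= 0
    discounts = 1.0 / np.log2(np.arange(2, k + 2))
    dcg = (gains[np.argsort(-scores)[:k]] * discounts).sum()
    ideal = (np.sort(gains)[::-1][:k] * discounts).sum()
    return float(dcg / ideal) if ideal > 0 else 0.0

# Toy data: conservation perfectly anti-correlated with mean fitness.
conservation = np.array([1.0, 0.9, 0.5, 0.2])
mean_fitness = np.array([-1.0, -0.5, 0.1, 0.8])
rho = spearman(conservation, mean_fitness)
ndcg = ndcg_at_fraction(-conservation, mean_fitness)
print(rho, ndcg)
```

Positions are ranked by negated conservation, as in the text, so that the least conserved positions come first.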

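Coldspots

| Position | WT AA | Mean Fitness | % Deleterious | Role |
|---|---|---|---|---|
| 165 | Ser | −0.69 | 83% | Catalytic nucleophile |
| 167 | Gly | −1.10 | 100% | Oxyanion hole |
| 210 | Asp | −0.47 | 80% | Catalytic acid |
| 242 | His | −0.80 | 100% | Catalytic base |
| 246 | Asn | −1.27 | 100% | Oxyanion hole |
| 96 | Thr | −0.71 | 100% | Buried structural |
| 170 | Gly | −0.74 | 100% | Near active site |

Hotspots

| Position | Mean Fitness | % Beneficial | Max | Note |
|---|---|---|---|---|
| 217 | +0.76 | 67% | +1.60 | allosteric handle ★ candidate |
| 207 | +0.50 | 60% | +1.83 | allosteric core ★ candidate |
| 286 | +0.44 | 60% | +1.33 | structural_essential C-term loop, ACI 60.1% ★ candidate |
| 117 | +0.64 | 100% | +1.05 | Surface exposed |
| 44 | +0.63 | 86% | +1.70 | N-terminus |
| 229 | +0.56 | 100% | +0.84 | Surface loop |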

Candidate Positions — Conservation Status

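| Position | Scorecons | Median | Safe? | Status |
|---|---|---|---|---|
| 217 (Q→P) | 0.531 | 0.826 | Yes | Hotspot |
| 140 (N→D) | 0.713 | 0.826 | Yes | Neutral |
| 207 (A→E) | 0.845 | 0.826 | Borderline | Hotspot |
| 104 (W→L) | 1.000 | 0.826 | Conserved | Neutral |
| 286 (R→P) | 0.990 | 0.826 | Conserved | Hotspot |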

Apparent contradiction: conserved yet hotspot? "Hotspot" means that specific substitutions at this position are beneficial (e.g. R286P = +1.33), while "conserved" means most organisms keep the wild-type residue. This happens when one or two specific mutations escape the evolutionary constraint — e.g. Pro rigidifies a loop in a way evolution didn't explore. ICCG similarly mutated the highly conserved N246 (scorecons 0.985) successfully.

07

Deep Mining Strategy — 5 Computational Tools

Methods Overview

Rationale

DMS fitness alone ranks mutations by observed performance, but doesn't explain why they work or predict how they'll combine. We integrate 5 orthogonal zero-shot tools + DMS epistasis analysis — each capturing a different aspect of protein function — to identify positions with convergent multi-signal support, maximizing confidence for wet lab validation.

Tool 1: OHM Allostery (allosteric paths)
Tool 2: RINpy Network (hub residues)
Tool 3: ESM-2 PLM (evolutionary fit)
Tool 4: MULTI-evolve (combo prediction)
Step 5: Benchmark (16 models × 65K combos)

OHM — Why?

Identifies positions that participate in allosteric signal transduction to the active site. Mutations at allosteric positions can modulate activity through long-range effects — a mechanism distinct from direct fitness.

RINpy — Why?

Builds a residue interaction network from atomic contacts in the PDB. Identifies structural hub residues. Mutations at hubs are usually catastrophic — exceptions are exceptionally valuable engineering targets.

ESM-2 — Why?

Protein language model trained on millions of sequences. Captures evolutionary constraints beyond simple conservation — understands amino acid context and co-evolution patterns.

MULTI-evolve — Why?

Arc Institute framework (Science 2026). The only tool that directly predicts multi-mutant fitness from single/double data using a neural network. Tests if top positions remain top in combination.

Scoring Benchmark — Why?

Exhaustive search of all 65,535 subsets of 16 tools via rank averaging. Best combo: ESM-2 3B + ThermoMPNN + OHM ACI (Spearman=0.248, NDCG@10%=0.833) — outperforms any single tool.
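The subset search can be sketched as follows: enumerate every non-empty subset of tools, rank-average the member scores, and evaluate each combined score against DMS fitness. The function names and the toy score matrix are illustrative assumptions; only the subset count (2¹⁶ − 1 = 65,535) and the use of rank averaging come from the text.

```python
from itertools import combinations
import numpy as np

def rank_average(score_matrix: np.ndarray, columns) -> np.ndarray:
    """Mean per-tool rank (0 = lowest score) over the chosen tool columns."""
    sub = score_matrix[:, list(columns)]
    ranks = sub.argsort(axis=0).argsort(axis=0)   # rank within each column (no tie handling)
    return ranks.mean(axis=1)

def all_subsets(n_tools: int):
    """Yield every non-empty subset of tool indices."""
    for size in range(1, n_tools + 1):
        yield from combinations(range(n_tools), size)

# 16 tools give 2**16 - 1 non-empty subsets.
n_subsets_16 = sum(1 for _ in all_subsets(16))
print(n_subsets_16)

# Toy example: 4 mutations scored by 2 hypothetical tools.
scores = np.array([[0.1, 5.0],
                   [0.4, 1.0],
                   [0.3, 9.0],
                   [0.9, 7.0]])
combined = rank_average(scores, (0, 1))
print(combined)
```

Rank averaging sidesteps the fact that the tools report scores on unrelated scales; each subset's combined rank would then be scored with Spearman and NDCG@10% against the DMS singles.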

08

Tool 1: OHM Allosteric Communication Analysis

Allostery

What OHM Computes

OHM (Ohm-based Allosteric Model) analyzes how perturbations at one residue propagate through the protein to the active site. 

Output: one ACI score per position (not per amino acid) — it is a property of the position in the structure, not of specific mutations. ACI is a percentile (0–100%) measuring how strongly that position participates in signal transduction to the catalytic triad.

Higher ACI = stronger allosteric coupling to the catalytic triad. This is not simply "better" — it depends on context: high-ACI positions in the Allosteric Handle zone (high ACI + low conservation) are the preferred engineering targets, as mutations there can tune catalysis with manageable epistasis risk. High-ACI positions in the Allosteric Core (conserved) are risky to mutate. Low-ACI positions (Safe Target) produce additive, predictable effects.

Zone Classification

OHM classifies each position into one of 4 zones based on ACI and conservation:

| Zone | n | Meaning |
|---|---|---|
| Allosteric Core | 39 | High ACI + conserved → signal relay backbone |
| Allosteric Handle | 23 | High ACI + not conserved → tunable modulators |
| Safe Target | 97 | Low ACI → mutations are additive, low epistasis risk |
| Structural Essential | 94 | Conserved but low ACI → structural integrity |
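The four-zone logic amounts to two thresholds. In the sketch below both cut-offs are assumptions: the report does not state them. The conservation cut-off is set to the Scorecons median (0.826) quoted earlier, and the ACI cut-off (75%) is chosen only because it reproduces the zones reported for the five candidate positions.

```python
def ohm_zone(aci_percentile: float, conservation: float,
             aci_cutoff: float = 75.0, cons_cutoff: float = 0.826) -> str:
    """Assign a zone from ACI (0-100) and Scorecons (0-1); cut-offs are hypothetical."""
    high_aci = aci_percentile >= aci_cutoff
    conserved = conservation >= cons_cutoff
    if high_aci and conserved:
        return "allosteric_core"
    if high_aci:
        return "allosteric_handle"
    if conserved:
        return "structural_essential"
    return "safe_target"

# (ACI %, Scorecons) for the five candidate positions, as quoted in this report.
candidates = {
    "Q217": (84.9, 0.531),
    "W104": (78.7, 1.000),
    "A207": (98.4, 0.845),
    "R286": (60.1, 0.990),
    "N140": (39.9, 0.713),
}
zones = {name: ohm_zone(aci, cons) for name, (aci, cons) in candidates.items()}
print(zones)
```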

ACI vs DMS single-mutation fitness (per-mutation, n=1,228): Spearman=0.127, NDCG@10%=0.822. ACI has modest Spearman but high NDCG — it excels at identifying the top-10% beneficial positions, even if overall rank ordering is weaker.

Insight 1: Among our five candidates, Q217 is the only position that is both an allosteric handle (ACI 84.9%) and DMS-beneficial (+1.60); N225K is the only other handle-zone mutation in the top-10 singles. Handle positions are ideal engineering targets because they modulate the catalytic relay without being structurally essential.

Insight 2: A207 has the highest ACI (98.4%) among all beneficial mutations — it sits directly on the catalytic relay connecting Ser165 to Asp210. A207E can electrostatically tune the catalytic acid.

Insight 3: ICCG mutations F243I and N246D sit on Path 2 (substrate access channel). Their negative individual fitness is compensated by allosteric pathway optimization in combination — OHM explains why ICCG works despite deleterious singles.

Insight 4: Mutations on different allosteric paths (e.g. Path 1 + Path 3) have orthogonal effects on catalysis → predicted low antagonistic epistasis when combined.
09

OHM: Allosteric Pathways — Two Perspectives

Results

From Ser165 (Nucleophile) Outward

ACI radiates outward through a highly conserved, mutationally intolerant core:

| Pos | AA | ACI% | Cons | Fitness | Insight |
|---|---|---|---|---|---|
| 164 | His | 99.6 | 1.00 | −0.18 | DO NOT TOUCH: relay backbone |
| 168 | Gly | 99.2 | 1.00 | −0.13 | DO NOT TOUCH: relay backbone |
| 167 | Gly | 98.1 | 1.00 | −0.67 | Oxyanion hole backbone: ABSOLUTELY IMMUTABLE |
| 171 | Thr | 96.9 | 0.84 | +0.03 | Neutral: tolerable but no gain |
| 169 | Gly | 95.0 | 1.00 | −0.42 | Path 3 junction: Q217P relay feeds through here |
| 166 | Met | 93.8 | 1.00 | −0.47 | Conserved relay residue |
| 170 | Gly | 92.2 | 0.97 | −0.47 | Relay backbone |
| 163 | Gly | 91.9 | 1.00 | −0.46 | Relay backbone |

Insight: The Ser165 relay core is entirely locked by conservation + negative fitness. Engineering must approach from the periphery (Path 3: Q217P → ... → Ser165).

From Asp210 (Acid) Outward

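| Pos | AA | ACI% | Cons | Fitness | Insight |
|---|---|---|---|---|---|
| 207 | Ala | 98.4 | 0.85 | +0.23 | ★ A207E: HOTSPOT on relay! |
| 209 | Ala | 96.1 | 0.46 | +0.04 | Low conservation, handle zone |
| 213 | Ala | 94.6 | 1.00 | +0.13 | Synergistic (A213T+T230A Δ=+1.22) |
| 212 | Val | 94.2 | 0.84 | −0.02 | Neutral |
| 208 | Glu | 91.1 | 0.90 | −0.05 | Conserved, relay core |
| 217 | Gln | 84.9 | 0.53 | +0.34 | ★ Q217P: HANDLE + HOTSPOT |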

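Insight: Position 207 combines the highest ACI near Asp210 (98.4%) with DMS hotspot status. Apart from the handle position 217, the other high-ACI neighbors have near-neutral mean fitness (−0.05 to +0.13), and three of the four are conserved (scorecons ≥ 0.84).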

From His242 (Base) Outward

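| Pos | AA | ACI% | Cons | Fitness | Insight |
|---|---|---|---|---|---|
| 241 | Ser | 98.8 | 0.80 | −0.38 | Handle, substrate entry |
| 244 | Ala | 97.3 | 0.75 | −0.10 | Handle zone |
| 240 | Ala | 96.5 | 0.99 | −0.27 | Core relay, conserved |
| 245 | Pro | 95.7 | 1.00 | +0.15 | Only mutable residue on channel |
| 243 | Phe | 93.4 | 0.75 | −0.29 | ICCG F243I sits here |
| 246 | Asn | 93.0 | 0.99 | −0.75 | Oxyanion hole, ICCG N246D |
| 238 | Asp | 90.3 | 0.55 | −0.20 | ICCG D238C disulfide partner |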

Insight: The His242 channel is tightly optimized. ICCG mutated two positions here (F243I, N246D) — both individually harmful but combinatorially rescued.

11

Tool 2: RINpy — Residue Interaction Network Analysis

Network

What RINpy Computes

RINpy (Residue Interaction Network in Python) builds a graph where each residue is a node and edges connect residues within 4.5 Å (non-bonded atomic contacts in PDB 4EB0). It then computes betweenness centrality (BC) — the fraction of all shortest paths in the network that pass through each residue.

High-BC residues are structural "hubs" — removing or modifying them disrupts the most communication pathways in the protein. Result: 258 nodes, 1,314 edges.
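Betweenness centrality on an unweighted contact graph can be computed with Brandes' algorithm. The sketch below is a generic pure-Python implementation, not RINpy's own code; the adjacency input (residue number → set of residues within 4.5 Å) is assumed to have been built beforehand, and the scores are left unnormalised.

```python
from collections import deque

def betweenness_centrality(adjacency: dict[int, set[int]]) -> dict[int, float]:
    """Brandes' algorithm for an unweighted, undirected graph (unnormalised)."""
    bc = {v: 0.0 for v in adjacency}
    for source in adjacency:
        stack = []
        preds = {v: [] for v in adjacency}
        sigma = {v: 0 for v in adjacency}     # number of shortest paths from source
        dist = {v: -1 for v in adjacency}
        sigma[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:                          # breadth-first search
            v = queue.popleft()
            stack.append(v)
            for w in adjacency[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adjacency}
        while stack:                          # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != source:
                bc[w] += delta[w]
    return {v: score / 2.0 for v, score in bc.items()}  # each pair was counted twice

# Toy graphs: a 3-residue chain and a hub with three neighbours.
chain = {1: {2}, 2: {1, 3}, 3: {2}}
star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
print(betweenness_centrality(chain))
print(betweenness_centrality(star))
```

In the star graph every shortest path between two leaves runs through the centre, which is the "hub" behaviour that a high BC rank is meant to capture.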

BC vs DMS Fitness

BC vs DMS single-mutation fitness (per-mutation, n=1,068): Spearman=+0.027 (near zero — BC alone barely predicts fitness), NDCG@10%=0.771. Hub residues tend to have lower fitness tolerance, but the relationship is weak at per-mutation level. BC contributes mainly through its inclusion in the best doubles combo.

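| Mutation | BC Rank | Degree | Fitness | Category |
|---|---|---|---|---|
| N140D | #5/258 | 14 | +1.84 | Beneficial Hub |
| S136Y | #8/258 | 16 | +1.43 | Beneficial Hub |
| W104L | #104/258 | 9 | +1.39 | Moderate |
| R286P | #99/258 | 10 | +1.33 | Moderate |
| Q217P | #253/258 | 6 | +1.60 | Low BC (Safe) |
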
Insight 1: N140D is a major structural hub (BC rank #5/258). N140D sits on many shortest paths between residues — a true connector node with 14 direct contacts. Yet it is DMS-beneficial (+1.84), making it a rare "Beneficial Hub." This makes it a uniquely valuable engineering target: hub mutations usually disrupt function, but N140D defies that trend.

Insight 2: Q217P has near-zero betweenness centrality (BC rank #253/258). Q217P carries almost no network load — it sits at the periphery of the structural communication graph. This makes it exceptionally safe to mutate from a network perspective. Its allosteric effect (ACI 84.9%) operates through OHM's allosteric relay (Path 3), not through direct structural shortest paths.

Insight 3: High BC (N140D #5) + Low BC (Q217P #253) = complementary pairing. N140D anchors the structural network as a hub; Q217P operates through allosteric relay orthogonally. Combining a hub mutation with a peripheral allosteric handle is predicted to have lower antagonistic epistasis than two hub mutations. Source: rinpy_dms_merged.csv pdb_pos column (correct PDB node mapping).

Column Definitions

BC Rank: Betweenness Centrality rank out of 258 residues. #1 = most central hub (most shortest paths pass through it).
Degree: Number of direct residue contacts within 4.5 Å; higher degree = more packed neighbors in the structure.
Categories:
Beneficial Hub = BC top-10% AND DMS fitness > +0.3 (rare: hub residues usually can't be mutated).
Safe Peripheral = BC bottom-10%, very few structural contacts, safe to mutate without disrupting the fold.
Moderate = BC in middle range, some structural role but not a critical hub.

12

Tool 3: ESM-2 Protein Language Model

PLM

What ESM-2 Computes

ESM-2 (650M parameters) is a transformer-based protein language model trained on ~250M protein sequences. We use masked marginal scoring: for each position, mask the residue, and compute log P(mutant|context) − log P(wildtype|context). A positive score means the PLM considers the mutation more "natural" in this sequence context.
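The scoring rule itself is a difference of two log-probabilities at the masked position. The sketch below uses a hypothetical logit vector in place of a real model call, so it illustrates only the arithmetic; the amino-acid ordering and function names are assumptions.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed column order of the logits

def log_softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable log-softmax over the amino-acid axis."""
    shifted = logits - logits.max()
    return shifted - np.log(np.exp(shifted).sum())

def masked_marginal_score(logits_at_masked_pos, wt: str, mut: str) -> float:
    """log P(mutant | context) - log P(wild-type | context) at one masked position."""
    log_probs = log_softmax(np.asarray(logits_at_masked_pos, dtype=float))
    return float(log_probs[AMINO_ACIDS.index(mut)] - log_probs[AMINO_ACIDS.index(wt)])

# Hypothetical logits for one masked position (not real ESM-2 output).
logits = np.zeros(len(AMINO_ACIDS))
logits[AMINO_ACIDS.index("Q")] = 1.0   # wild-type residue
logits[AMINO_ACIDS.index("P")] = 2.0   # candidate substitution
score_qp = masked_marginal_score(logits, wt="Q", mut="P")
score_qa = masked_marginal_score(logits, wt="Q", mut="A")
print(score_qp, score_qa)
```

Because the normalising constant cancels, the score equals the difference of the two raw logits; a positive value means the model prefers the mutant residue in this context.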

Why Use a PLM?

Unlike simple conservation (MSA counting), ESM-2 captures context-dependent co-evolutionary patterns. It can detect that a mutation is acceptable in this specific protein even if the residue is conserved across the family — because the surrounding context compensates.

Calibration Against DMS

Spearman ρ = 0.242 · NDCG@10% = 0.817 (n=1,246 single mutations scored by ESM-2 3B)

ESM-2 is a weak predictor of micro-droplet fitness. This is expected: DMS fitness integrates activity + stability + expression, while ESM-2 primarily captures evolutionary plausibility. ESM-2 is one input signal, not a standalone predictor.

Insight 1: ESM-2 correctly identifies the catalytic triad as unmutable — Ser165, Asp210, His242 all have strongly negative ESM-2 scores for any substitution, consistent with DMS coldspot status.

Insight 2: ESM-2 flags Q217P as mildly positive (evolutionary context accepts Pro at this position), consistent with its low conservation (scorecons=0.531) and DMS hotspot status.

Insight 3: ESM-2 3B is the best single zero-shot predictor (Spearman=0.242, NDCG@10%=0.817 in the full 1,050-mutation evaluation). Combining it with ThermoMPNN (stability) and OHM ACI (allostery) via rank averaging reaches Spearman=0.248, NDCG=0.833 (see benchmark slide): a small gain in Spearman (+0.006) and a somewhat larger one in NDCG (+0.016) over any single tool. We also tested SaProt-650M (Sp=0.204, NDCG=0.809) and ProstT5 (Sp=0.179, NDCG=0.773); both are moderate and neither is in the best combination.
12

Structural Criteria: Where Are the Top Mutations?

Structure
| Mutation | Fitness | SASA | Min Dist† | Structural Zone | Location |
|---|---|---|---|---|---|
| N249H | +1.84 | Surface | 6.6Å | Surface | Surface, near C-terminus |
| N140D | +1.84 | Surface | 12.3Å | Surface | β-sheet edge, RINpy BC major hub (#5/258) |
| A207E | +1.83 | Buried | 3.0Å | 2nd Shell | Secondary shell, allosteric core (3.0Å to binding pocket) |
| T192P | +1.81 | Surface | 3.8Å | 2nd Shell | Secondary shell, surface loop |
| L142M | +1.79 | Buried | 14.9Å | Core | Core, buried |
| Q40P | +1.76 | Surface | 26.0Å | Surface | N-terminus, flexible |
| N44H | +1.66 | Surface | 25.5Å | Surface | N-terminus, flexible |
| N225K | +1.65 | Surface | 12.3Å | Surface | Surface loop |
| Y95F | +1.62 | Surface | n/a | Binding site | Oxyanion hole (Tournier 2020), direct substrate contact |
| Q217P | +1.60 | Surface | 7.1Å | Surface | Allosteric handle, surface-exposed |

Pattern by structural zone: 6/10 surface, 2/10 secondary shell (A207E, T192P), 1/10 core (L142M), 1/10 binding site (Y95F). By solvent exposure: 8/10 surface, 2/10 buried (A207E, L142M).
Binding site: 14 residues with direct PET/MHET contact from Tournier 2020 molecular docking (PDB 94, 95, 125, 127, 130, 164, 165, 166, 190, 210, 212, 242, 243, 246).
Secondary shell: any residue with closest atom ≤5Å to any binding site residue (42 residues total, computed from PDB 4EB0).
†Min Dist = minimum atom-to-atom distance to any binding site residue; "n/a" = residue IS in binding site.

Binding Site & Secondary Shell in Published Variants

ICCG Y127G: in binding site (aromatic clamp, direct substrate contact).
ICCG/LCC-LANL F243I: in binding site (binding pocket).
LCC-LANL M125I: in binding site (hydrophobic groove).
WCCM T130M: in binding site.
Key insight: ICCG and LCC-LANL primarily engineer the binding pocket itself, not just the secondary shell.

Position 217 — Allosteric Hub

23 beneficial mutations within 10Å of position 217, forming a cluster in the 192–222 region. Key neighbors: T192P (+1.81), A213S (+1.10), F222C (+1.02), P214L (+0.75).
4 synergistic doubles involve Q217, including Q217R+I252T (Δ=+2.45), G170D+Q217H (Δ=+2.17), and Q217K+A250S (Δ=+1.96).
Position 217 is surface-exposed (46% rSASA), polar, 18Å from active site — ideal for engineering without disrupting the catalytic machinery.

Double Mutation Pattern by Structure

Allosteric+Allosteric pairs: worst epistasis (mean Δ=−1.63). Secondary shell+shell: also bad (−1.55). Best pairs: other+other (−0.50) and close_to_active+close_to_active (~39% positive Δ). Structural proximity to active site may increase synergy potential.

13

Tool 4: MULTI-evolve — Can We Predict Combinations?

ML

What MULTI-evolve Is

MULTI-evolve (Arc Institute, Science 2026) trains a fully connected neural network (FCNN) on measured single + double mutant fitness to predict multi-mutant combinations. We have no experimental doubles for the top-20 singles, so we tried using zero-shot pseudo-doubles instead.

Our Attempt: Zero-Shot Pseudo-Doubles

Step 1: Use best doubles zero-shot combo (ESM-2 650M + BLOSUM62 + SaProt + OHM ACI, Sp=0.186 with real doubles) to score 190 pairwise doubles of top-20 singles.
Step 2: Normalize scores to DMS fitness scale using linear mapping learned from 2,467 real doubles (DMS_fitness = 0.0044 × zs_score − 3.87, R²=0.03).
Step 3: Train FCNN on 21 real singles + 190 normalized pseudo-doubles = 211 training points.

Validation on Real k=3 DMS (n=1,650)

| Method | Sp | NDCG |
|---|---|---|
| ESM-2 3B additive (no training) | +0.177 | 0.703 |
| ZS combo additive (no training) | +0.166 | 0.706 |
| DMS additive (no training) | +0.105 | 0.667 |
| FCNN w/ ZS pseudo-doubles | −0.011 | 0.646 |
| FCNN w/ DMS additive pseudo | −0.023 | 0.620 |

One-hot features cannot generalize

The FCNN uses 5,180-dim one-hot features (259 positions × 20 AAs). It only sees the 20 positions in the top-20 singles during training. For k=3 variants involving any other position, the model has zero learned weights — it outputs random predictions.
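The generalisation failure is visible in the encoding itself. The sketch below builds the 259 × 20 one-hot vector and shows that a variant touching only unseen positions shares no active feature with the training set. The 0-based position indexing, the amino-acid order, and the example positions are assumptions for illustration.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
N_POSITIONS = 259
N_FEATURES = N_POSITIONS * len(AMINO_ACIDS)   # 5,180

def one_hot(mutations: list[tuple[int, str]]) -> np.ndarray:
    """Encode (0-based position index, mutant amino acid) pairs as a 5,180-dim vector."""
    x = np.zeros(N_FEATURES)
    for pos, aa in mutations:
        x[pos * len(AMINO_ACIDS) + AMINO_ACIDS.index(aa)] = 1.0
    return x

def trained_feature_mask(seen_positions: set[int]) -> np.ndarray:
    """1.0 for every feature belonging to a position present in the training set."""
    mask = np.zeros(N_FEATURES)
    for pos in seen_positions:
        mask[pos * len(AMINO_ACIDS):(pos + 1) * len(AMINO_ACIDS)] = 1.0
    return mask

seen = {10, 50, 120}                                       # hypothetical training positions
mask = trained_feature_mask(seen)
unseen_variant = one_hot([(3, "K"), (200, "P"), (222, "C")])
partly_seen_variant = one_hot([(10, "K"), (200, "P"), (222, "C")])
print(float(mask @ unseen_variant), float(mask @ partly_seen_variant))
```

Input features that are never active during training receive no gradient, so the weights attached to them stay at their initial values and carry no fitness information.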

Only 36 of 1,650 k=3 validation variants had even 1 mutation in the top-20. On those 36, the zero-shot FCNN achieves Sp=+0.215 — but this is too few to be reliable.

Alternative 1: 16-dim Features on All Doubles

Replace one-hot with 16 zero-shot model scores per mutation (sum for doubles). Train Ridge/GBR on 1,787 real doubles:

| Method | k=3 Sp | k=3 NDCG |
|---|---|---|
| Ridge (16-dim, all doubles) | +0.129 | 0.681 |
| ESM-2 3B additive (no training) | +0.177 | 0.703 |

Alternative 2: Train on Real Top-50 Doubles

Use top-50 DMS doubles (fitness 2.1–3.9) as training data. 154 samples: 87 singles + 66 doubles + WT. 16-dim features, FCNN (128-64):

| Method | k=3 All (n=1650) | k=3 In-Training Positions (n=59) |
|---|---|---|
| FCNN 16-dim (top doubles) | Sp=+0.066 | Sp=+0.132, NDCG=0.728 |
| ESM-2 3B additive | Sp=+0.177 | Sp=+0.241, NDCG=0.710 |

Key finding: Within the trained position set, 16-dim FCNN achieves the best NDCG (0.728) — it captures some epistasis signal from real doubles. But it doesn't generalize to unseen positions.

Conclusion: Training on real doubles with 16-dim features captures epistasis within the training positions (NDCG=0.728 > ESM-2 3B's 0.710). But no model generalizes to all positions. For a full combinatorial prediction, Phase 2 experimental doubles covering more positions remain essential.
14

Cross-Tool Consensus — Which Positions Do All Tools Agree On?

Consensus

Consensus Criteria (5 Independent Signals)

For each of 259 positions, we count how many tools independently flag it as "interesting": (1) DMS fitness in top 25% · (2) ACI above median · (3) BC above median · (4) Appears in top-100 multi-mutants · (5) Low conservation (scorecons < median). Only 13 of 259 positions (5%) have 4+ tools agreeing — these are the consensus positions.

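| PDB Pos | Structural Zone | n Tools | DMS Top25% | ACI > median | BC > median | In Top Multimuts | Low Conserv. | Candidate? |
|---|---|---|---|---|---|---|---|---|
| 217 | Surface | 4/5 | | | | | | ★ Yes (Q217P) |
| 286 | Surface | 4/5 | | | | | | ★ Yes (R286P) |
| 104 | 2nd Shell | 4/5 | | | | | | ★ Yes (W104L) |
| 140 | Surface | 4/5 | | | | | | ★ Yes (N140D) |
| 207 | 2nd Shell | 3/5 | | | | | | ★ Yes (A207E) |
| 197 | Surface | 5/5 | | | | | | |
| 203 | Surface | 5/5 | | | | | | |
| 192 | 2nd Shell | 5/5 | | | | | | |
| 193 | Surface | 4/5 | | | | | | |
| 136 | Surface | 5/5 | | | | | | |
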
Q217P, R286P, W104L, N140D each reach 4/5 consensus. A207E reaches 3/5 (BC #132 below median, cons 0.845 above median). 4 of our 5 candidates meet ≥4/5 threshold. BC source: rinpy_dms_merged.csv pdb_pos column (correct mapping). Q217P BC #253/258 (below median — safe peripheral); N140D BC #5/258 (major hub). W104L and A207E are in 2nd Shell (≤5Å from binding site), not core. †R286P BC #99/258 (above median ✓).
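The consensus count is a tally of five boolean signals. In the sketch below the flag values for A207E and Q217P are inferred from the statements in this report (BC rank, conservation relative to the median, and the stated totals), not read from the original per-signal columns; the signal names are illustrative.

```python
SIGNALS = (
    "dms_top25",
    "aci_above_median",
    "bc_above_median",
    "in_top_multimuts",
    "low_conservation",
)

def consensus(flags: dict[str, bool]) -> int:
    """Number of the five independent signals that flag a position."""
    return sum(bool(flags[s]) for s in SIGNALS)

# Inferred flags: A207E misses on BC (#132, below median) and conservation (0.845, above median).
a207e = {"dms_top25": True, "aci_above_median": True, "bc_above_median": False,
         "in_top_multimuts": True, "low_conservation": False}
# Inferred flags: Q217P misses only on BC (#253, below median).
q217p = {"dms_top25": True, "aci_above_median": True, "bc_above_median": False,
         "in_top_multimuts": True, "low_conservation": True}

print(f"A207E: {consensus(a207e)}/5, Q217P: {consensus(q217p)}/5")
```

A position counts as a consensus position when the tally reaches 4 or more.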
15

Candidate Mutations Mapped onto LCC Structure

Structure

[Figure: 3D structure of LCC.] Orange = 5 candidate mutations. Red = catalytic triad. Grey = protein backbone.

3D viewer labels use PDB 4EB0 residue numbers (= DMS position + 34).

| Mutation | Viewer label | Location | Structural Role |
|---|---|---|---|
| Q217P | GLN217 | Surface loop | Allosteric handle: remote from active site, modulates via Path 3 |
| W104L | TRP104 | Core helix | Allosteric core: buried, part of signal relay network |
| A207E | ALA207 | Active site adjacent | On Path 1 relay: directly influences catalytic Asp210 |
| R286P | ARG286 | C-terminal loop | Structural essential: Pro rigidifies C-terminus |
| N140D | ASN140 | β-sheet edge | RINpy BC #5/258 (major hub): sits on many structural shortest paths; defies trend as a beneficial hub |
| Ser165 | SER165 | Active site | Catalytic nucleophile: DO NOT TOUCH |
| Asp210 | ASP210 | Active site | Catalytic acid: DO NOT TOUCH |
| His242 | HIS242 | Active site | Catalytic base: DO NOT TOUCH |

Spatial logic: The 5 mutations are spread across the structure — not clustered in one region. Q217P is on the surface (~25 Å from active site), A207E is near the active site, N140D is on the β-sheet edge, W104L is in a core helix, R286P is at the C-terminus. This spatial distribution minimizes steric clash risk between mutations.
16

The Case for Q217P + W104L + A207E + R286P + N140D

Recommendation
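Mutation | Fitness | Deep Mining Rank† | ACI % | BC Centrality Rank (of 258 positions) | Scorecons | OHM Zone | Hotspot? | In Top Combos? | Consensus
Q217P | +1.60 | #1 | 84.9% | #253 | 0.531 | handle | Yes | Yes | 4/5
W104L | +1.39 | #2 | 78.7% | #104 | 1.000 | core | No | Yes | 4/5
A207E | +1.83 | #3 | 98.4% | #132 | 0.845 | core | Yes | Yes | 3/5
R286P | +1.33 | #4 | 60.1% | #99 | 0.990 | essential | Yes | Yes | 4/5
N140D | +1.84 | #5 | 39.9% | #5 | 0.713 | hub | No | Yes | 4/5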

Why This Specific Combination?

  • Orthogonal allosteric paths: A207E (Path 1) + Q217P (Path 3) → different relay networks, predicted low antagonism
  • Hub + Low-BC mix: N140D (hub #5/258) + Q217P (peripheral #253/258) → N140D anchors the structural network, Q217P is safe to engineer via allosteric relay without structural network disruption
  • All 5 appear in real top combos in DMS — not just theoretically combined
  • 3 of 5 are DMS hotspots (Q217P, A207E, R286P) — positions where most mutations improve fitness
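  • Precedent for strong positive epistasis: the best DMS 5-mutant (T121M+Y127N+A183V+F196S+A281T) has an additive prediction of −2.47 but an observed fitness of +6.98 (epistasis +9.44). This was measured for that combination, not for our candidate, but it shows that extreme synergy is achievable in LCC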

Honest Risk Assessment

  • W104 (scorecons 1.0) and R286 (scorecons 0.99) are highly conserved, so mutating them carries a higher risk of disrupting fold stability. Mitigation: ICCG also mutated conserved positions successfully.
  • No experimental double data exist for these specific pairs: epistasis is predicted, not measured.
  • Antagonistic epistasis is common (mean Δ = −0.70 in DMS): even well-chosen combos may underperform additivity.

This is why we propose a phased approach with k=3 as a safer first test.

17

Experimental Recommendation — Phased Approach

Next Steps

Phase 1: Immediate — 5 Constructs (2 weeks)

  1. Q217P+A207E+N140D (k=3) — Safest bet: 3 different allosteric paths, 2 hotspots, 1 hub
  2. Q217P+W104L+A207E (k=3) — Pure allosteric: handle + core + core
  3. Q217P+W104L+A207E+R286P+N140D (k=5) — Full top-1 candidate
  4. N249H+N140D+A207E+T192P (k=4) — Top-4 by pure single fitness; N140D is a major structural hub (BC #5/258)
  5. T121M+Y127N+A183V+F196S+A281T (k=5) — Positive control: this exact combination already measured at +6.98 in DMS, serves as experimental benchmark to validate assay conditions

Phase 2: Pairwise Doubles — 105 Constructs (4-6 weeks)

Select 15 top mutations → synthesize C(15,2)=105 pairwise doubles. Measure in micro-droplet or plate assay. Unlocks real epistasis data for MULTI-evolve retraining.
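
A minimal sketch of the Phase 2 construct list follows. The 15 mutation labels are hypothetical placeholders, since the selected mutations are not listed here; only the count C(15,2) = 105 comes from the plan above.

```python
from itertools import combinations
from math import comb

# HYPOTHETICAL labels: stand-ins for the 15 top mutations to be selected.
top_mutations = [f"MUT{i:02d}" for i in range(1, 16)]

# Every unordered pair of distinct mutations becomes one double-mutant construct.
doubles = [f"{a}+{b}" for a, b in combinations(top_mutations, 2)]

print(len(doubles), "pairwise constructs; first three:", doubles[:3])
```

Because the number of pairs grows quadratically, each additional mutation in the panel adds as many constructs as there are mutations already selected (a 16th mutation would add 15 more doubles).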

Phase 3: MULTI-evolve R2 — ~50 Constructs (2-4 weeks)

Retrain FCNN on 105 real doubles + 15 singles. Predict k=3..10 with epistasis-aware model. Expected: identify variants exceeding ICCG (Tm > 84°C, >90% PET in 10h).

Phase 1: 2 weeks → Phase 2: 4-6 weeks → Phase 3: 2-4 weeks · Total: ~3 months to optimized LCC variant
18

DMS Observation: The Epistasis Paradox in LCC

Critical Finding

What We Computed

For 2,467 double mutants where both singles are measured: Δ = f_observed(AB) − [f(A) + f(B)]. Positive Δ = synergy (better than expected). Negative Δ = antagonism (worse than expected). We then grouped pairs by whether each single is beneficial (>+0.3), neutral, or deleterious (<−0.3).

Pair Type | n | Mean Δ | % Synergy | % Antagonism | Mean Observed
Beneficial + Beneficial | 192 | −1.724 | 12.5% | 87.5% | −0.425
Beneficial + Neutral | 606 | −1.276 | 19.3% | 80.7% | −0.610
Beneficial + Deleterious | 424 | −0.737 | 31.4% | 68.6% | −0.826
Neutral + Neutral | 429 | −0.653 | 33.8% | 66.2% | −0.635
Deleterious + Neutral | 589 | −0.245 | 45.5% | 54.5% | −0.963
Deleterious + Deleterious | 227 | +0.496 | 63.4% | 36.6% | −0.995
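
The Δ calculation and pair grouping described above can be sketched as follows. The ±0.3 thresholds and the Δ definition come from the text; the three toy measurements are hypothetical and are not taken from the DMS dataset.

```python
from collections import defaultdict

THRESHOLD = 0.3  # beneficial if > +0.3, deleterious if < -0.3


def classify_single(fitness):
    """Label a single-mutation fitness as Beneficial, Neutral, or Deleterious."""
    if fitness > THRESHOLD:
        return "Beneficial"
    if fitness < -THRESHOLD:
        return "Deleterious"
    return "Neutral"


def pair_type(f_a, f_b):
    """Order-independent label such as 'Beneficial + Neutral'."""
    return " + ".join(sorted([classify_single(f_a), classify_single(f_b)]))


def epistasis(f_ab, f_a, f_b):
    """Delta = observed double fitness minus the additive expectation."""
    return f_ab - (f_a + f_b)


# HYPOTHETICAL toy measurements: (f_a, f_b, f_ab)
toy_doubles = [
    (0.9, 0.5, -0.4),    # two beneficial singles, antagonistic double
    (0.8, 0.0, 0.1),     # beneficial + neutral
    (-0.6, -0.9, -1.0),  # two deleterious singles, synergistic double
]

groups = defaultdict(list)
for f_a, f_b, f_ab in toy_doubles:
    groups[pair_type(f_a, f_b)].append(epistasis(f_ab, f_a, f_b))

for name, deltas in groups.items():
    mean_delta = sum(deltas) / len(deltas)
    pct_synergy = 100 * sum(d > 0 for d in deltas) / len(deltas)
    print(f"{name}: n={len(deltas)}, mean delta={mean_delta:+.2f}, synergy={pct_synergy:.0f}%")
```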

The Paradox

Combining two beneficial mutations is the worst strategy. 87.5% of beneficial+beneficial pairs show antagonism (mean Δ = −1.72). The average observed fitness of two beneficial mutations combined is −0.425 — worse than wild-type, despite an additive prediction of +1.30.

Conversely, two deleterious mutations combined synergize 63.4% of the time (mean Δ = +0.50). This explains why the best k=5 in DMS (T121M+Y127N+A183V+F196S+A281T = +6.98) uses 5 individually deleterious mutations.

What This Means for Our Strategy

  • Naively combining top-fitness singles is almost guaranteed to fail (87.5% antagonism rate)
  • The best existing DMS combo exploits compensatory epistasis among deleterious mutations — a strategy we cannot replicate computationally
  • No zero-shot tool (OHM ACI, RINpy BC, ESM-2, ThermoMPNN) can predict which pairs will synergize (all Sp ≈ 0 with Δ)
  • Phase 2 (measuring 105 pairwise doubles) is not optional — it's the only way to identify synergistic pairs for rational combination design
Honest conclusion: Any k≥3 candidate we propose based on additive singles fitness has an ~87% probability of antagonism per pair. We should present candidates with this caveat and prioritize Phase 2 experiments.
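
As a consistency check, the overall mean Δ of −0.70 and the additive prediction of +1.30 for beneficial+beneficial pairs can both be recomputed from the pair-type table. This sketch uses only numbers from that table; the additive expectation is recovered as mean observed minus mean Δ, which follows directly from the definition of Δ.

```python
# (pair type, n, mean delta, mean observed), copied from the pair-type table.
rows = [
    ("Beneficial + Beneficial", 192, -1.724, -0.425),
    ("Beneficial + Neutral", 606, -1.276, -0.610),
    ("Beneficial + Deleterious", 424, -0.737, -0.826),
    ("Neutral + Neutral", 429, -0.653, -0.635),
    ("Deleterious + Neutral", 589, -0.245, -0.963),
    ("Deleterious + Deleterious", 227, 0.496, -0.995),
]

total_n = sum(n for _, n, _, _ in rows)

# Pooled mean delta = n-weighted average of the per-group means.
pooled_mean_delta = sum(n * d for _, n, d, _ in rows) / total_n

# Since delta = observed - additive, mean additive = mean observed - mean delta.
_, _, bb_delta, bb_observed = rows[0]
bb_additive = bb_observed - bb_delta

print(f"doubles: {total_n}")
print(f"pooled mean delta: {pooled_mean_delta:+.2f}")
print(f"beneficial+beneficial additive expectation: {bb_additive:+.2f}")
```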
19

Zero-Shot Scoring Benchmark — 16 Models × 65,535 Combinations

Benchmark

Method

We tested 16 zero-shot scoring functions (no DMS data used for training), including the recently published FAMPNN (Full-Atom MPNN, ICML 2025). Each scores every single mutation independently. We then exhaustively searched all 2^16 − 1 = 65,535 subsets, combining scores via rank averaging (convert each model's scores to ranks, then average). Evaluated by Spearman ρ and NDCG@10% against DMS single-mutation fitness (n=1,228 mutations with all 16 scores available).
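
The rank-averaging and exhaustive subset search can be sketched in plain Python. This is an illustration under stated assumptions, not the pipeline used for the benchmark: the three model names and all score values are hypothetical, and Spearman ρ is computed as the Pearson correlation of average ranks.

```python
from itertools import combinations


def ranks(values):
    """Average ranks (1 = smallest value); tied values share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out


def spearman(x, y):
    """Spearman rho as the Pearson correlation of the two rank vectors."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5 if vx > 0 and vy > 0 else 0.0


def rank_average(score_lists):
    """Convert each model's scores to ranks, then average per mutation."""
    per_model = [ranks(s) for s in score_lists]
    return [sum(col) / len(col) for col in zip(*per_model)]


def best_subset(scores, fitness):
    """Try every non-empty subset of models; return the best by Spearman."""
    names = list(scores)
    best, n_subsets = None, 0
    for k in range(1, len(names) + 1):
        for subset in combinations(names, k):
            n_subsets += 1
            rho = spearman(rank_average([scores[m] for m in subset]), fitness)
            if best is None or rho > best[0]:
                best = (rho, subset)
    return best, n_subsets


# HYPOTHETICAL toy data: 6 mutations scored by 3 made-up models.
fitness = [1.8, 0.4, -0.2, -1.0, 0.9, -0.5]
scores = {
    "model_a": [0.9, 0.3, 0.1, -0.8, 0.2, -0.1],
    "model_b": [0.1, 0.5, -0.3, 0.0, 0.4, -0.6],
    "model_c": [0.2, -0.1, 0.3, 0.0, -0.4, 0.5],
}

(best_rho, best_models), n_subsets = best_subset(scores, fitness)
print(f"searched {n_subsets} subsets; best = {best_models} (rho = {best_rho:.3f})")
```

With 16 scoring functions the same loop visits 2^16 − 1 = 65,535 subsets, which is still cheap because each evaluation is a single rank correlation.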

# | Model | Type | Spearman | NDCG@10%
1 | ESM-2 3B | PLM | +0.242 | 0.817
2 | ESM-2 650M | PLM | +0.215 | 0.789
3 | SaProt 650M | Structure-PLM | +0.204 | 0.809
4 | ESM-1v | PLM | +0.191 | 0.789
5 | ESM-2 150M | PLM | +0.186 | 0.764
6 | ThermoMPNN | ddG/Structure | +0.179 | 0.803
7 | MSA log-odds | Evolution | +0.162 | 0.807
8 | ProstT5 | Structure-PLM | +0.179 | 0.773
9 | Conservation | MSA/Scorecons | +0.132 | 0.793
10 | OHM ACI | Allostery | +0.127 | 0.822
11 | MSA mut-freq | Evolution | +0.132 | 0.819
12 | BLOSUM62 | Substitution | +0.123 | 0.807
13 | ProFAM | Autoregressive PLM | +0.154 | 0.816
14 | FAMPNN | Full-Atom Design | +0.115 | 0.800
15 | ProteinMPNN | Structure | +0.140 | 0.803
16 | RINpy BC | Network | +0.027 | 0.771

Size | Best Combination | Spearman | NDCG@10%
1 | ESM-2 3B | +0.242 | 0.817
2 | ESM-2 3B + OHM ACI | +0.248 | 0.811
3 | ESM-2 3B + ThermoMPNN + OHM ACI | +0.248 | 0.833
4 | + ESM-2 650M | +0.254 | 0.808
5 | + SaProt + Conservation | +0.252 | 0.799
6 | + BLOSUM62 | +0.249 | 0.801
... | Spearman decreases monotonically as more features are added | |
15 | All 15 features | +0.213 | 0.793

Size | Best Combination | Spearman | NDCG@10%
3 | ThermoMPNN + FAMPNN + OHM ACI | +0.200 | 0.823
5 | ThermoMPNN + BLOSUM62 + FAMPNN + OHM ACI + RINpy | +0.200 | 0.826

Best Spearman (overall ranking): ESM-2 3B + ThermoMPNN + OHM ACI (Sp=0.248, NDCG=0.833) — unchanged after adding FAMPNN.

Best NDCG (finding top mutations): ThermoMPNN + BLOSUM62 + FAMPNN + OHM ACI + RINpy BC (Sp=0.200, NDCG=0.826) — FAMPNN improves top-mutation identification. Note: ESM-2 3B is NOT in the best NDCG combo — structure-based models dominate here.

Why these 3 (Spearman)? ESM-2 3B = evolutionary plausibility, ThermoMPNN = stability, OHM ACI = allostery. Three orthogonal signals.
Why FAMPNN helps NDCG? FAMPNN is a full-atom design model — it excels at identifying the best mutations (top-10%) but doesn't distinguish well among average ones.
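
To make the NDCG@10% metric concrete, here is one possible implementation. The exact gain function used for the benchmark is not specified in this report, so this sketch makes two assumptions: fitness values are shifted so the lowest becomes zero (keeping gains non-negative), and the standard log2 position discount is applied to the top 10% of predictions.

```python
from math import ceil, log2


def ndcg_at_fraction(pred, fitness, fraction=0.10):
    """NDCG over the top `fraction` of mutations ranked by predicted score."""
    n = len(pred)
    k = max(1, ceil(fraction * n))

    # ASSUMPTION: shift fitness so every gain is >= 0.
    shift = min(fitness)
    gains = [f - shift for f in fitness]

    # Indices of the k mutations with the highest predicted score.
    top_pred = sorted(range(n), key=lambda i: pred[i], reverse=True)[:k]
    # Best achievable ordering: the k largest true gains.
    ideal = sorted(gains, reverse=True)[:k]

    dcg = sum(gains[i] / log2(pos + 2) for pos, i in enumerate(top_pred))
    idcg = sum(g / log2(pos + 2) for pos, g in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0


# HYPOTHETICAL toy data: 20 mutations with distinct fitness values.
toy_fitness = [float(v) for v in range(20)]
perfect = list(toy_fitness)          # predictor that ranks exactly like fitness
reversed_pred = [-v for v in toy_fitness]  # predictor that ranks in reverse

print(ndcg_at_fraction(perfect, toy_fitness))
print(ndcg_at_fraction(reversed_pred, toy_fitness))
```

Because only the top slice of the predicted ranking contributes, a model can score well on NDCG@10% while having a low Spearman ρ over all mutations, which is the pattern reported for FAMPNN.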

65,535 subsets exhaustively searched — these are global optima, not cherry-picked.
20

Double-Mutation Benchmark — Additive Zero-Shot Prediction

Doubles

Method

For each double mutant A+B with both singles measured (n=1,787), predict fitness as score(A)+score(B). Same 16 zero-shot features (including FAMPNN), exhaustive subset search (32,767 combos). Also tested with DMS additive f(A)+f(B) included as a 16th feature.
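
The additive baseline for doubles amounts to summing two single-mutation scores per construct. In the sketch below, the mutation names and score values are hypothetical placeholders for any one of the zero-shot models.

```python
# HYPOTHETICAL single-mutation scores from one zero-shot model.
single_score = {
    "A10V": 0.40,
    "G20S": -0.10,
    "K30E": 0.25,
    "L40P": -0.90,
}

# HYPOTHETICAL double mutants for which both singles have a score.
doubles = [("A10V", "G20S"), ("A10V", "K30E"), ("G20S", "L40P"), ("K30E", "L40P")]


def additive_score(double):
    """Predicted score of a double = score(A) + score(B)."""
    a, b = double
    return single_score[a] + single_score[b]


predictions = {"+".join(d): additive_score(d) for d in doubles}

# Rank doubles from most to least promising under the additive assumption.
ranked = sorted(predictions, key=predictions.get, reverse=True)
for name in ranked:
    print(f"{name}: {predictions[name]:+.2f}")
```

This baseline has no interaction term by construction, so any epistasis between A and B is invisible to it.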

# | Model | Spearman | NDCG@10%
1 | ESM-2 650M | +0.159 | 0.674
2 | ESM-2 3B | +0.154 | 0.669
3 | ESM-1v | +0.141 | 0.677
4 | SaProt | +0.140 | 0.664
5 | ProstT5 | +0.137 | 0.668
6 | ESM-2 150M | +0.131 | 0.668
(ref) | DMS additive f(A)+f(B) | +0.127 | 0.680
7 | MSA log-odds | +0.128 | 0.663
8 | OHM ACI | +0.103 | 0.678
9 | ThermoMPNN | +0.066 | 0.676

Note: ESM-2 650M beats 3B for doubles. ThermoMPNN drops significantly (stability ≠ combinatorial fitness). All PLM additive scores outperform DMS additive f(A)+f(B) on Spearman, although DMS additive has the highest single-feature NDCG@10% (0.680).

Category | Best Combination | Spearman | NDCG
Best Sp (zero-shot) | ESM2-650M + BLOSUM62 + SaProt + OHM ACI | +0.186 | 0.676
Best NDCG (zero-shot) | ESM2-3B + 650M + ProtMPNN + MSA-lo + MSA-mf + Cons | +0.145 | 0.696
Best Sp (+DMS add) | ESM2-650M + BLOSUM62 + OHM ACI + DMS_add | +0.202 | 0.674
Best NDCG (+DMS add) | ProstT5 + Cons + OHM ACI + RINpy + DMS_add | +0.166 | 0.703

Key findings for doubles:
1. Different best model: ESM-2 650M (not 3B) wins for doubles. Larger models may overfit to single-position context.

2. Different best combo: ESM2-650M + BLOSUM62 + SaProt + OHM ACI (Sp=0.186) — SaProt and BLOSUM62 enter the top combo for doubles but not singles. BLOSUM62's substitution matrix may capture pairwise compatibility.

3. OHM ACI appears in nearly all top combos for both singles and doubles — allosteric information is consistently valuable.

4. All methods are still weak (Sp < 0.20 zero-shot; 0.202 at best with DMS additive included): epistasis (mean Δ=−0.70) makes double-mutation fitness fundamentally hard to predict from single-mutation scores alone. Phase 2 experimental doubles remain essential.
21

Summary: Multi-Signal Evidence at a Glance

Summary
Tool | What It Measures | Sp / NDCG | Key Finding for Our Candidate
DMS Fitness | Direct experimental activity+stability+expression | n/a | All 5 mutations are beneficial singles (+1.33 to +1.84)
OHM ACI | Allosteric communication to active site | 0.127 / 0.822 | A207E on Path 1 (ACI 98.4%), Q217P on Path 3 (handle). Orthogonal paths
RINpy BC | Structural hub identification (betweenness centrality) | 0.027 / 0.771 | N140D = structural hub #5/258 (rare beneficial hub). Q217P = low BC #253/258 (safe peripheral, allosteric effect via OHM relay not structural network)
ESM-2 3B | Evolutionary constraint (masked marginal, 1246 singles) | 0.242 / 0.817 | Best single zero-shot predictor. Correctly flags catalytic triad
SaProt 650M | Structure-aware PLM (AA + 3Di tokens) | 0.204 / 0.809 | Adds structure signal; enters best doubles combo but not singles
ProstT5 | Structure-aware PLM (3Di conditional LLR) | 0.179 / 0.773 | Moderate; enters best doubles NDCG combo
ThermoMPNN | Stability prediction (ddG from PDB) | 0.179 / 0.803 | Captures stability, orthogonal to PLM. In best singles combo
ProFAM | Autoregressive protein family LM (251M params) | 0.154 / 0.816 | Moderate Spearman, good NDCG. Family-specific autoregressive model.
FAMPNN | Full-atom protein design (ICML 2025) | 0.115 / 0.800 | Low Spearman but high NDCG: best at finding top-10%. In best NDCG combo.
Conservation | Evolutionary constraint (150 orthologs, Scorecons) | 0.132 / 0.793 | Q217P & N140D below median (safe). W104L & R286P conserved (like ICCG)
MULTI-evolve | Multi-mutant fitness prediction (FCNN) | n/a | Converges on same top positions. Needs real doubles to unlock
Epistasis (DMS obs.) | Observed non-additive interactions from 2,467 doubles (not a predictor) | n/a | Mean Δ=−0.70. Core×core pairs synergize. Data-derived, not zero-shot.
Best Singles Combo | ESM-2 3B + ThermoMPNN + OHM ACI (rank avg) | 0.248 / 0.833 | 65,535 subsets searched. 3 orthogonal signals: evolution + stability + allostery
Best Doubles Combo | ESM2-650M + BLOSUM62 + SaProt + OHM ACI (rank avg) | 0.186 / 0.676 | Different optimal combo for doubles. OHM ACI appears in both.

The multi-signal approach works: Rank averaging ESM-2 3B + ThermoMPNN + OHM ACI (Spearman=0.248, NDCG=0.833) outperforms any single tool. 65,535 subsets exhaustively searched. Our candidate is supported by convergent evidence from DMS fitness, allosteric pathways, network topology, and this zero-shot scoring benchmark.
22

Conclusions

  • 0.248 Spearman · NDCG 0.833: best 3-tool rank avg (65,535 subsets)
  • 4/5 candidate positions with 4+ tool consensus
  • +9.44 positive epistasis in the best DMS 5-mutant, T121M+Y127N+A183V+F196S+A281T (combo +6.98 vs additive −2.47)
  • 3 allosteric paths covered by candidate

  1. LCC is unusually mutationally tolerant (30% beneficial rate), with the best measured 5-mutant reaching +6.98 (3.8× best single)
  2. OHM reveals 3 distinct allosteric paths to the active site; combining mutations from different paths provides orthogonal effects with predicted low epistasis
  3. RINpy identifies rare "beneficial hub" mutations (N140D, S136Y): positions where network restructuring improves function
  4. Rank averaging ESM-2 3B + ThermoMPNN + OHM ACI (Sp=0.248, NDCG=0.833) outperforms any single tool, as validated by exhaustive search of 65,535 subsets of 16 scoring functions. For doubles: ESM2-650M + BLOSUM62 + SaProt + OHM ACI (Sp=0.186, NDCG=0.676)
  5. 4/5 candidate positions have 4+ tool consensus; only 5% of all positions achieve this level of multi-signal agreement
  6. ICCG proves the principle: individually deleterious mutations can be combinatorially powerful in LCC, and our candidates build on this with more data-driven evidence
  7. MULTI-evolve is limited by lack of experimental doubles; Phase 2 (105 doubles) will unlock its full predictive power
  8. Phase 1 (5 constructs, 2 weeks) is a low-cost, high-information experiment that tests our computational predictions directly
Q217P + W104L + A207E + R286P + N140D — the highest-confidence multi-mutant candidate from 5 computational tools and 8,179 DMS variants.