Solver-Aligned Initialization Learning · Preprint

An ML initial guess that doesn't break on big molecules.

Paper (arXiv)

Matrix-prediction models trained on ground-state targets slow down the SCF solver when they extrapolate beyond their training distribution. SAIL fixes this by backpropagating through the SCF solver itself, supervising the trajectory toward self-consistency rather than fitting a converged matrix. The result is reliable acceleration on molecules up to 10× larger than those seen in training, on hybrid functionals, with no labels required.

Eberhard, Kotsev, Güthle, Günnemann

TUM / MDSI / MCML

PBE / def2-SVP: 37% ERIC reduction on QM40
SCAN / def2-SVP: 33% ERIC reduction on QM40
B3LYP / def2-SVP: 28% ERIC reduction on QM40
B3LYP wall-time on QMugs: 1.35× wall-time speedup

Two losses, opposite outcomes

Supervision matters

Both approaches use the same backbone, an SE(3)-equivariant GNN reading the molecular point cloud, and differ only in what they minimize. The matrix surrogate loss fits a converged target tensor; SAIL minimizes the norm of the energy gradient along the SCF trajectory. The surrogate target tells you where to land; the trajectory loss tells you how the solver gets there.

Baseline · ground-state supervision

Fit the converged tensor.

Train the ML head to reproduce the SCF-converged matrix on labeled data. The loss is minimal at the converged answer, but it weights all matrix entries uniformly, including occupied–occupied and virtual–virtual rotations that affect neither the energy nor the solver dynamics.
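To make the uniform-weighting point concrete, here is a toy sketch in an orthonormal molecular-orbital basis (S = I) with two occupied and two virtual orbitals and made-up orbital energies. It compares an elementwise MSE surrogate against the RMS of the commutator F P − P F, which vanishes exactly when F and P commute, the SCF stationarity condition in an orthonormal basis. The helper names are hypothetical, and using the commutator as the stationarity measure is an assumption for illustration, not necessarily the paper's exact loss.

```python
# Toy illustration with hypothetical numbers: orthonormal MO basis, orbitals 0-1
# occupied, orbitals 2-3 virtual.


def matmul(a, b):
    """Plain-Python matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def mse(a, b):
    """Surrogate loss: uniform mean squared error over all matrix entries."""
    diffs = [(x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return sum(diffs) / len(diffs)


def rms_commutator(f, p):
    """RMS of F P - P F; zero exactly when F and P commute (stationarity, S = I)."""
    fp, pf = matmul(f, p), matmul(p, f)
    entries = [(x - y) ** 2 for r1, r2 in zip(fp, pf) for x, y in zip(r1, r2)]
    return (sum(entries) / len(entries)) ** 0.5


# Converged Fock matrix in its own eigenbasis (diagonal) and the matching density.
f_converged = [[-1.0, 0.0, 0.0, 0.0],
               [0.0, -0.5, 0.0, 0.0],
               [0.0, 0.0, 0.3, 0.0],
               [0.0, 0.0, 0.0, 0.8]]
p_converged = [[1.0 if i == j and i < 2 else 0.0 for j in range(4)] for i in range(4)]

# Prediction A: error confined to the occupied-occupied block.
f_occ_occ_err = [row[:] for row in f_converged]
f_occ_occ_err[0][1] = f_occ_occ_err[1][0] = 0.2

# Prediction B: an error of the same size in the occupied-virtual block.
f_occ_vir_err = [row[:] for row in f_converged]
f_occ_vir_err[1][2] = f_occ_vir_err[2][1] = 0.2

results = {}
for name, f_pred in [("occ-occ error", f_occ_occ_err), ("occ-vir error", f_occ_vir_err)]:
    results[name] = (mse(f_pred, f_converged), rms_commutator(f_pred, p_converged))
    print(name, "surrogate MSE:", results[name][0], "RMS [F, P]:", results[name][1])
```

Both predictions carry the same surrogate error, but only the occupied–virtual error changes the occupied subspace (and hence the density and energy); the occupied–occupied error leaves the commutator at exactly zero.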

Ground-state surrogate: small error at the converged solution, but poor extrapolation when molecules leave the training hull.

SAIL · trajectory supervision

Differentiate through the solver.

Run T SCF cycles end-to-end, starting from the predicted P⁽⁰⁾. At each cycle, compute the RMS of the energy gradient on the Grassmannian; sum these over the cycles and backpropagate through DIIS and the eigendecomposition into the GNN. The loss is label-free: only the geometry is needed.
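The per-cycle accumulation can be sketched on a hypothetical two-orbital toy model with a made-up density dependence F(P) = H + U·P and one occupied orbital (the factor of 2 for double occupancy is dropped). Plain Roothaan iterations (diagonalize F, occupy the lowest orbital) stand in for the DIIS-accelerated solver, and the RMS of F P − P F stands in for the Grassmannian gradient; both are assumptions for illustration. This sketch only evaluates the loss; SAIL additionally backpropagates through DIIS and the eigendecomposition, which needs an autodiff framework and is not shown.

```python
import math

# Hypothetical toy model (not from the paper): two orthonormal orbitals, one
# occupied orbital, and a density-dependent Fock matrix F(P) = H + U * P.
H = [[-1.0, 0.2], [0.2, 0.5]]
U = 0.5


def fock(p):
    """Toy Fock build: core matrix plus a density-dependent term."""
    return [[H[i][j] + U * p[i][j] for j in range(2)] for i in range(2)]


def lowest_eigvec(m):
    """Normalized eigenvector of the lowest eigenvalue of a symmetric 2x2 matrix."""
    a, b, d = m[0][0], m[0][1], m[1][1]
    if b == 0.0:
        return (1.0, 0.0) if a <= d else (0.0, 1.0)
    lam = 0.5 * (a + d) - math.sqrt((0.5 * (a - d)) ** 2 + b * b)
    v = (b, lam - a)  # solves (a - lam) * v0 + b * v1 = 0
    norm = math.hypot(v[0], v[1])
    return (v[0] / norm, v[1] / norm)


def density(c):
    """Projector onto the occupied orbital c."""
    return [[c[i] * c[j] for j in range(2)] for i in range(2)]


def rms_gradient(f, p):
    """RMS of F P - P F, used here as the stationarity measure (zero at self-consistency)."""
    comm = [[sum(f[i][k] * p[k][j] - p[i][k] * f[k][j] for k in range(2))
             for j in range(2)] for i in range(2)]
    return math.sqrt(sum(x * x for row in comm for x in row) / 4)


def trajectory_loss(p0, n_cycles):
    """Sum of per-cycle RMS gradients along plain Roothaan iterations from p0."""
    p, loss = p0, 0.0
    for _ in range(n_cycles):
        f = fock(p)
        loss += rms_gradient(f, p)
        p = density(lowest_eigvec(f))  # diagonalize F, occupy the lowest orbital
    return loss


bad_guess = [[0.0, 0.0], [0.0, 1.0]]  # occupies the higher-energy orbital
p_star = bad_guess
for _ in range(200):  # iterate to (near) self-consistency
    p_star = density(lowest_eigvec(fock(p_star)))

print(trajectory_loss(bad_guess, 5), trajectory_loss(p_star, 5))
```

A guess that is already self-consistent contributes essentially zero at every cycle, while a poor guess is penalized for each cycle the solver spends away from stationarity, which is what ties the loss to solver effort rather than to a single target matrix.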

Trajectory loss: penalizes non-stationarity at every SCF cycle; no labels are needed beyond the geometry.

Size extrapolation, B3LYP / def2-SVP

QM9 → QM40 → QMugs · trained on ≤ 20 atoms

Every model below was trained on the same QM9 split (molecules with ≤ 20 atoms, i.e. at most 9 heavy atoms) and evaluated on molecules with up to 90 heavy atoms, 10× larger than anything seen during training. We summarize solver cost with ERIC (Effective Relative Iteration Count), defined as in the paper: ERIC < 1 means a speedup over the standard MINAO guess; ERIC > 1 means the ML initialization slows the solver down. Shaded vertical bands mark the heavy-atom bins shared with the bar charts in this figure.

B3LYP / def2-SVP · NVIDIA A100 · GPU4PySCF. ERIC is only a proxy for cost, so below we measure actual SCF wall-time on the GPU, including the model forward pass and initial-guess construction. On QMugs the speedup grows roughly linearly with molecular size and does not degrade even at 10× the training size, confirming that the ERIC reductions translate into hybrid-functional wall-time savings on drug-like molecules.
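The accounting behind an end-to-end speedup can be sketched as follows: the ML pipeline is charged for its forward pass and its initial-guess construction (including any Fock build) on top of its SCF iterations. The function name and all timings below are made up for illustration; they are not measurements from the paper.

```python
def wall_time_speedup(ref_guess_s, ref_scf_s, forward_s, ml_guess_s, ml_scf_s):
    """End-to-end speedup: reference wall-time over ML wall-time, where the ML
    side is charged for the model forward pass and initial-guess construction."""
    reference_total = ref_guess_s + ref_scf_s
    ml_total = forward_s + ml_guess_s + ml_scf_s
    return reference_total / ml_total


# Made-up timings in seconds, for illustration only.
scf_only = 60.0 / 40.0  # compares the SCF iterations alone
end_to_end = wall_time_speedup(ref_guess_s=0.5, ref_scf_s=60.0,
                               forward_s=0.3, ml_guess_s=4.0, ml_scf_s=40.0)
print(f"SCF-only ratio: {scf_only:.2f}x, end-to-end speedup: {end_to_end:.2f}x")
```

Counting only SCF iterations would overstate the gain; charging the forward pass and guess construction is what makes the reported wall-time speedup comparable to the reference.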

Effective Relative Iteration Count (ERIC)

Solid lines: SAIL. Dotted lines: baseline (ground-state supervised). Bands: ±1 SE across ≥30 molecules per bin. Logarithmic ERIC axis, as in the paper.

SCF wall-time by molecule size

Numbers above the bars show the ratio of baseline to best ML method (density P). Vertical time scales align with the breakdown panel (70 s and 7 s span the same height).

Initial-guess cost breakdown

Per-component time on QMugs. The hidden Fock build M_P→F is invisible to RIC but counted by ERIC, which is why ERIC matters.

Why ERIC and not RIC?

Eqs. (6)–(7) · preprint

Prior work often reports acceleration using the relative iteration count (RIC): the number of SCF iterations starting from a learned guess, divided by the iterations needed from an established non-ML reference initialization P_ref⁽⁰⁾ (for example MINAO). RIC ignores hidden work in the initialization, such as extra Fock constructions inside Δ-learning or coefficient pipelines, so it need not match the wall-time speedup.

The effective relative iteration count (ERIC) is the paper’s correction: it counts every Fock build from initialization through convergence, including those performed only to obtain P⁽⁰⁾. On hybrids, where each iteration is dominated by the Fock update, ERIC tracks total Fock work more faithfully than raw iteration counts.

RIC — iteration ratio only (Eq. 6).

ERIC — ratio of total Fock builds, including initialization (Eq. 7). Values below 1 mean fewer Fock builds than the reference; above 1 mean the ML pipeline costs more in Fock-equivalent units.
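A minimal sketch of the two ratios, assuming Eq. (7) divides the total Fock builds (initialization plus SCF iterations) of the ML run by those of the reference run, and that each SCF iteration costs one Fock build. The `SCFRun` record and the iteration counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SCFRun:
    init_fock_builds: int  # Fock builds spent only to obtain P^(0)
    scf_iterations: int    # SCF iterations to convergence (assumed one Fock build each)


def ric(ml: SCFRun, ref: SCFRun) -> float:
    """Relative iteration count (Eq. 6 as described): iteration ratio only."""
    return ml.scf_iterations / ref.scf_iterations


def eric(ml: SCFRun, ref: SCFRun) -> float:
    """Effective RIC (Eq. 7 as described): ratio of total Fock builds,
    including those used to construct the initial guess."""
    total_ml = ml.init_fock_builds + ml.scf_iterations
    total_ref = ref.init_fock_builds + ref.scf_iterations
    return total_ml / total_ref


minao = SCFRun(init_fock_builds=0, scf_iterations=20)    # hypothetical counts
ml_guess = SCFRun(init_fock_builds=2, scf_iterations=15)
print(ric(ml_guess, minao), eric(ml_guess, minao))       # prints: 0.75 0.85
```

Here a two-Fock-build initialization turns a 25% iteration saving (RIC 0.75) into a 15% saving in Fock work (ERIC 0.85).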

Three rungs of Jacob's ladder

GGA · meta-GGA · hybrid

The same failure mode reappears across all three functional classes: baselines diverge on the Hamiltonian and density-matrix ansätze, while the coefficient model stays flat but cannot fall below an ERIC of ~0.85 on SCAN and B3LYP, because it must compensate for the fixed Fock builds in its initialization. SAIL flattens every curve, regardless of functional or ansatz.

Across basis sets, without retraining

Coefficient ansatz · B3LYP · QM40

Matrix-based ansätze tie their predictions to a specific AO basis and must be retrained for each target basis. Coefficient models predict in a fixed auxiliary basis and apply unchanged to any compatible AO basis. Trained on def2-SVP and evaluated up to def2-QZVP, SAIL still beats the surrogate-trained baseline.

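Target basis set: RIC (baseline → SAIL) · ERIC (baseline → SAIL) · SAIL improvement in ERIC

def2-SVP (ID): RIC 0.83 → 0.67 · ERIC 0.91 → 0.75 · −16%
def2-TZVP (OOD): RIC 0.82 → 0.74 · ERIC 0.90 → 0.82 · −8%
def2-TZVPPD (OOD): RIC 0.82 → 0.74 · ERIC 0.90 → 0.82 · −8%
def2-QZVP (OOD): RIC 0.82 → 0.74 · ERIC 0.90 → 0.82 · −8%

ID: the training basis (def2-SVP); OOD: a basis not seen during training. The improvement is the drop in ERIC from baseline to SAIL, in percentage points of the reference cost.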