Solver-Aligned Initialization Learning · Preprint

An ML initial guess that doesn't break on big molecules.

Matrix-prediction models trained on ground-state targets slow the SCF solver down when extrapolating beyond their training distribution. SAIL fixes this by backpropagating through the SCF solver itself, supervising the trajectory toward self-consistency rather than fitting a converged matrix. The result: reliable acceleration on molecules up to 10× larger than those seen in training, on hybrid functionals, with no labels required.

Eberhard, Kotsev, Güthle, Günnemann
TUM / MDSI / MCML
PBE / def2-SVP: 37% ERIC reduction on QM40
SCAN / def2-SVP: 33% ERIC reduction on QM40
B3LYP / def2-SVP: 28% ERIC reduction on QM40
B3LYP wall-time: 1.35× speedup on QMugs

Two losses, opposite outcomes

Both approaches use the same backbone — an SE(3)-equivariant GNN reading the molecular point cloud. They differ only in what they minimize. The matrix surrogate loss fits a converged target tensor; SAIL minimizes the energy gradient along the SCF trajectory. The surrogate target tells you where to land; the trajectory loss tells you how the solver gets there.

Baseline · ground-state supervision
Fit the converged tensor.

Train the ML head to reproduce the SCF-converged matrix on labeled data. The loss is small at the answer — but it weighs all matrix entries uniformly, including occupied–occupied and virtual–virtual rotations that don't affect the energy or the solver dynamics.

Ground-state surrogate: small error at the converged solution, but poor extrapolation when molecules leave the training hull.
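
The point about uniformly weighted matrix entries can be illustrated with a small NumPy sketch. It assumes an orthonormal basis and a predicted Fock-like matrix; the toy spectrum, the perturbation size, and all names (`density`, `D_diag`, `D_ov`) are illustrative choices, not taken from the paper. Two prediction errors with the same Frobenius norm are applied: one confined to the occupied–occupied and virtual–virtual blocks, one confined to the occupied–virtual block. Only the second changes the density matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_occ = 6, 3

# Toy "converged" Fock matrix with a clear occupied/virtual gap (assumed spectrum).
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))   # columns play the role of MOs
energies = np.array([-3.0, -2.0, -1.0, 1.0, 2.0, 3.0])
F = Q @ np.diag(energies) @ Q.T

def density(F, n_occ):
    """Projector onto the n_occ lowest eigenvectors of F."""
    _, C = np.linalg.eigh(F)
    return C[:, :n_occ] @ C[:, :n_occ].T

# Random symmetric error in the MO basis, split by block.
A = rng.standard_normal((n, n))
D = A + A.T
occ = np.arange(n) < n_occ
same_block = occ[:, None] == occ[None, :]          # True on oo and vv blocks

def scaled(M, norm=0.3):
    """Rescale M to a fixed Frobenius norm."""
    return M * (norm / np.linalg.norm(M))

D_diag = scaled(np.where(same_block, D, 0.0))      # oo + vv error only
D_ov = scaled(np.where(same_block, 0.0, D))        # ov + vo error only

F_diag = F + Q @ D_diag @ Q.T
F_ov = F + Q @ D_ov @ Q.T

P = density(F, n_occ)
err_diag = np.linalg.norm(F_diag - F)              # matrix loss sees this error...
err_ov = np.linalg.norm(F_ov - F)                  # ...as equal to this one
shift_diag = np.linalg.norm(density(F_diag, n_occ) - P)
shift_ov = np.linalg.norm(density(F_ov, n_occ) - P)

print(f"matrix error: {err_diag:.3f} vs {err_ov:.3f}")
print(f"density shift: {shift_diag:.2e} vs {shift_ov:.2e}")
```

A plain matrix-fitting loss penalizes both errors identically, even though the block-diagonal one leaves the occupied subspace, and therefore the starting density, untouched.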

SAIL · trajectory supervision
Differentiate through the solver.

Run T SCF cycles end-to-end from the predicted P⁽⁰⁾. At each cycle compute the RMS energy gradient on the Grassmannian; sum, backpropagate through DIIS and the eigendecomposition into the GNN. Label-free — only the geometry is needed.

Trajectory loss: penalizes non-stationarity at each SCF cycle and needs only the geometry as input.
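
As a rough illustration of the trajectory loss, the sketch below evaluates a sum of per-cycle residuals on a toy mean-field model. Everything in it is an assumption made for illustration: the Hubbard-like Fock matrix, the orthonormal basis, plain Roothaan steps in place of DIIS, and the use of the RMS commutator [F, P] as a stand-in for the Grassmannian energy gradient (it vanishes exactly at self-consistency). It is written in NumPy and only computes the loss value; in an autodiff framework the same loop would be differentiated with respect to the predicted P⁽⁰⁾.

```python
import numpy as np

def fock(H, P, U):
    """Toy mean-field Fock matrix: core Hamiltonian plus on-site repulsion."""
    return H + U * np.diag(np.diag(P))

def density_from_fock(F, n_occ):
    """One Roothaan step: occupy the n_occ lowest eigenvectors of F."""
    _, C = np.linalg.eigh(F)
    C_occ = C[:, :n_occ]
    return C_occ @ C_occ.T

def trajectory_loss(H, P0, U, n_occ, n_cycles):
    """Sum over SCF cycles of the RMS commutator residual [F(P), P]."""
    P, loss = P0, 0.0
    for _ in range(n_cycles):
        F = fock(H, P, U)
        R = F @ P - P @ F                 # zero iff P is self-consistent
        loss += np.sqrt(np.mean(R ** 2))
        P = density_from_fock(F, n_occ)   # no DIIS in this sketch
    return loss

# Small chain with a linear on-site potential (hypothetical test system).
n, n_occ, U = 6, 3, 0.1
H = np.diag(np.arange(n, dtype=float)) - 0.2 * (np.eye(n, k=1) + np.eye(n, k=-1))

# A deliberately poor guess: occupy the three highest-energy sites.
P_bad = np.diag([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

# A near-perfect guess: iterate to self-consistency first.
P_good = density_from_fock(H, n_occ)
for _ in range(100):
    P_good = density_from_fock(fock(H, P_good, U), n_occ)

loss_bad = trajectory_loss(H, P_bad, U, n_occ, n_cycles=5)
loss_good = trajectory_loss(H, P_good, U, n_occ, n_cycles=5)
print(f"loss from poor guess: {loss_bad:.3e}")
print(f"loss from converged guess: {loss_good:.3e}")
```

Note that the loss never references a converged target matrix: it is computed from the core Hamiltonian (which follows from the geometry) and the initial guess alone, which is what makes this style of supervision label-free.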

Size extrapolation, B3LYP / def2-SVP

Every model below was trained on the same QM9 split (molecules with ≤ 20 atoms) and evaluated across the full range up to 90 heavy atoms, 10× larger than anything seen during training. We summarize solver cost with ERIC (Effective Relative Iteration Count), defined as in the paper: ERIC < 1 means a speedup over the standard MINAO guess; ERIC > 1 means the ML initialization slows the solver down. Shaded vertical bands mark the heavy-atom bins shared with the bar charts in this figure.

ERIC is a proxy. Below we measure actual SCF wall-time on the GPU (B3LYP / def2-SVP, NVIDIA A100, GPU4PySCF), including the model forward pass and initial-guess construction. The speedup grows roughly linearly with molecular size on QMugs and does not degrade on molecules 10× larger than those seen in training, which confirms that the ERIC reductions translate into hybrid-functional wall-time savings on drug-like molecules.

Effective Relative Iteration Count (ERIC)
Solid lines: SAIL. Dotted lines: baseline (ground-state supervised). Bands: ±1 SE across ≥30 molecules per bin. The ERIC axis is logarithmic, as in the paper.
SCF wall-time by molecule size
Numbers above the bars show the ratio of baseline to best ML method (density P). Vertical time scales align with the breakdown panel (70 s and 7 s span the same height).
Initial-guess cost breakdown
Per-component time on QMugs. The hidden Fock build MP→F is why ERIC matters.

Why ERIC and not RIC?

Prior work often reports acceleration using the relative iteration count (RIC): the number of SCF iterations starting from a learned guess, divided by the iterations needed from an established non-ML reference initialization P_ref⁽⁰⁾ (for example MINAO). RIC ignores hidden work in the initialization, such as extra Fock constructions inside Δ-learning or coefficient pipelines, so it need not match wall-time speedup.

The effective relative iteration count (ERIC) is the paper’s correction: it counts every Fock build from initialization through convergence, including those performed only to obtain P⁽⁰⁾. On hybrids, where each iteration is dominated by the Fock update, ERIC tracks total Fock work more faithfully than raw iteration counts.

RIC — iteration ratio only (Eq. 6).

ERIC — ratio of total Fock builds, including initialization (Eq. 7). Values below 1 mean fewer Fock builds than the reference; above 1 mean the ML pipeline costs more in Fock-equivalent units.
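
The difference between the two metrics can be sketched in a few lines of Python. This follows the verbal definitions above rather than reproducing Eq. 6 and Eq. 7 of the paper, and it assumes one Fock build per SCF iteration; the function names, parameters, and the example counts are hypothetical.

```python
def ric(n_iter_ml, n_iter_ref):
    """Relative iteration count: SCF iterations from the ML guess over the reference."""
    return n_iter_ml / n_iter_ref

def eric(n_iter_ml, n_iter_ref, init_fock_ml=0, init_fock_ref=0):
    """Effective relative iteration count: total Fock builds, including any
    builds spent only on constructing the initial guess (assumed bookkeeping)."""
    return (n_iter_ml + init_fock_ml) / (n_iter_ref + init_fock_ref)

# Made-up counts: the ML guess converges in 9 iterations instead of 12,
# but its pipeline needs one extra Fock build to produce the initial density.
example_ric = ric(9, 12)
example_eric = eric(9, 12, init_fock_ml=1)
print(f"RIC = {example_ric:.2f}, ERIC = {example_eric:.2f}")
```

With these made-up counts the hidden Fock build moves the metric from 0.75 to about 0.83, which is the kind of gap RIC alone would hide.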

Three rungs of Jacob's ladder

The same failure mode reappears across all three functional classes — baselines diverge on Hamiltonian and density-matrix ansätze, while the coefficient model stays flat but cannot fall below ~0.85 on SCAN and B3LYP because it has to compensate for fixed Fock-build substitutions. SAIL flattens every curve, regardless of functional or ansatz.

Across basis sets, without retraining

Matrix-based ansätze tie their predictions to a specific AO basis and must be retrained for each target basis. Coefficient models predict in a fixed auxiliary basis and apply unchanged to any compatible AO basis. Train on def2-SVP, evaluate on def2-QZVP, and SAIL still beats the surrogate-trained baseline.

Target basis set   | RIC baseline | RIC SAIL | ERIC baseline | ERIC SAIL | SAIL improvement
def2-SVP (ID)      | 0.83         | 0.67     | 0.91          | 0.75      | −16%
def2-TZVP (OOD)    | 0.82         | 0.74     | 0.90          | 0.82      | −8%
def2-TZVPPD (OOD)  | 0.82         | 0.74     | 0.90          | 0.82      | −8%
def2-QZVP (OOD)    | 0.82         | 0.74     | 0.90          | 0.82      | −8%