Matrix-prediction models trained on ground-state targets slow the SCF solver down when extrapolating beyond their training distribution. SAIL fixes this by backpropagating through the SCF solver itself, supervising the trajectory toward self-consistency rather than fitting a converged matrix. The result: reliable acceleration on molecules up to 10× larger than those seen in training, on hybrid functionals, with no labels required.
Both approaches use the same backbone — an SE(3)-equivariant GNN reading the molecular point cloud. They differ only in what they minimize. The matrix surrogate loss fits a converged target tensor; SAIL minimizes the energy gradient along the SCF trajectory. The surrogate target tells you where to land; the trajectory loss tells you how the solver gets there.
Train the ML head to reproduce the SCF-converged matrix on labeled data. The loss is small at the answer — but it weighs all matrix entries uniformly, including occupied–occupied and virtual–virtual rotations that don't affect the energy or the solver dynamics.
Ground-state surrogate: small error at the converged solution, but poor extrapolation when molecules leave the training hull.
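The invariance mentioned above can be checked on a toy example. The sketch below is illustrative only and is not taken from the paper: it assumes an orthonormal basis (overlap S = I), uses three hand-picked orbitals, and the helper names (`outer_sum`, `rotate_pair`, `max_abs_diff`) are hypothetical. It shows that mixing occupied orbitals among themselves leaves the density matrix unchanged, while mixing an occupied orbital with a virtual one does not.

```python
import math

def outer_sum(vectors):
    """Density matrix P = sum_i c_i c_i^T for orthonormal occupied orbitals (S = I assumed)."""
    n = len(vectors[0])
    return [[sum(c[i] * c[j] for c in vectors) for j in range(n)] for i in range(n)]

def rotate_pair(a, b, theta):
    """Mix two orbitals a and b by a planar rotation of angle theta."""
    ca, sa = math.cos(theta), math.sin(theta)
    new_a = [ca * x + sa * y for x, y in zip(a, b)]
    new_b = [-sa * x + ca * y for x, y in zip(a, b)]
    return new_a, new_b

def max_abs_diff(p, q):
    """Largest element-wise absolute difference between two matrices."""
    return max(abs(x - y) for rp, rq in zip(p, q) for x, y in zip(rp, rq))

# Toy system: two occupied orbitals and one virtual orbital in a 3-function basis.
occ1, occ2, virt = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]
p_ref = outer_sum([occ1, occ2])

# Occupied-occupied rotation: the occupied subspace, and hence P, is unchanged.
oo1, oo2 = rotate_pair(occ1, occ2, 0.7)
diff_oo = max_abs_diff(p_ref, outer_sum([oo1, oo2]))

# Occupied-virtual rotation: the occupied subspace changes, so P changes.
ov1, _ = rotate_pair(occ1, virt, 0.7)
diff_ov = max_abs_diff(p_ref, outer_sum([ov1, occ2]))

print(f"occ-occ rotation:  max |dP| = {diff_oo:.2e}")
print(f"occ-virt rotation: max |dP| = {diff_ov:.2e}")
```

A loss that weighs every matrix entry equally spends capacity on directions like the first rotation, which move the matrix representation without moving the physical state.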
Run T SCF cycles end-to-end from the predicted P⁽⁰⁾. At each cycle, compute the RMS energy gradient on the Grassmannian, sum these values over the cycles, and backpropagate through DIIS and the eigendecomposition into the GNN. The loss is label-free: only the geometry is needed.
Trajectory loss: penalizes non-stationarity at each SCF cycle; label-free, requiring only the geometry.
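To make the shape of such a trajectory loss concrete, here is a minimal toy sketch. It is not the paper's implementation, and several things are assumptions made purely for illustration: a two-site mean-field model with one occupied orbital, an orthonormal basis, damped fixed-point updates standing in for DIIS, and the commutator FP − PF standing in for the Grassmannian energy gradient (both vanish at self-consistency). All function names and parameters are hypothetical. The sketch only evaluates the loss; training would additionally need an automatic-differentiation framework to backpropagate through the iterations.

```python
import math

def lowest_eigvec(f):
    """Normalized eigenvector of the lowest eigenvalue of a symmetric 2x2 matrix."""
    a, b, d = f[0][0], f[0][1], f[1][1]
    if b == 0.0:
        return [1.0, 0.0] if a <= d else [0.0, 1.0]
    lam = 0.5 * (a + d) - math.hypot(0.5 * (a - d), b)
    v = [b, lam - a]
    norm = math.hypot(v[0], v[1])
    return [v[0] / norm, v[1] / norm]

def fock(p, e=(0.0, 0.5), t=1.0, u=1.0):
    """Toy mean-field Fock matrix: core Hamiltonian plus an on-site term u * P_ii."""
    return [[e[0] + u * p[0][0], -t], [-t, e[1] + u * p[1][1]]]

def rms_residual(f, p):
    """RMS of the commutator FP - PF (orthonormal basis); zero at self-consistency."""
    # For symmetric 2x2 F and P the commutator has the form [[0, r], [-r, 0]].
    r = (f[0][0] * p[0][1] + f[0][1] * p[1][1]) - (p[0][0] * f[0][1] + p[0][1] * f[1][1])
    return abs(r) / math.sqrt(2.0)

def scf_step(p, mix=0.5):
    """One damped fixed-point update (a simple stand-in for DIIS)."""
    c = lowest_eigvec(fock(p))
    p_new = [[c[i] * c[j] for j in range(2)] for i in range(2)]
    return [[(1 - mix) * p[i][j] + mix * p_new[i][j] for j in range(2)] for i in range(2)]

def trajectory_loss(p0, n_cycles=5):
    """Sum of RMS stationarity residuals over the first n_cycles SCF iterates."""
    p, total = p0, 0.0
    for _ in range(n_cycles):
        total += rms_residual(fock(p), p)
        p = scf_step(p)
    return total

# A poor guess (all density on site 1) versus a guess that is already converged.
poor_guess = [[1.0, 0.0], [0.0, 0.0]]
good_guess = poor_guess
for _ in range(200):
    good_guess = scf_step(good_guess)

loss_poor = trajectory_loss(poor_guess)
loss_good = trajectory_loss(good_guess)
print(f"poor guess: {loss_poor:.3e}, converged guess: {loss_good:.3e}")
```

Note that the loss needs no reference matrix: it is computed from the model Hamiltonian and the initial guess alone, which mirrors the label-free property described above.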
Every model below was trained on the same QM9 split (molecules with ≤ 20 atoms) and evaluated across the full range up to 90 heavy atoms, 10× larger than anything seen during training. We summarize solver cost with ERIC (Effective Relative Iteration Count), defined as in the paper: ERIC < 1 means speedup over the standard MINAO guess; ERIC > 1 means the ML initialization slows the solver down. Shaded vertical bands mark the heavy-atom bins shared with the bar charts in this figure.
B3LYP / def2-SVP · NVIDIA A100 · GPU4PySCF.
ERIC is only a proxy for cost. Below we measure actual SCF wall-time on the GPU, including the model forward pass and initial-guess construction. The speedup grows roughly linearly with molecular size on QMugs and does not degrade at 10× beyond the training size, confirming that the ERIC reductions translate into hybrid-functional wall-time savings on drug-like molecules.
Prior work often reports acceleration using the relative iteration count (RIC): the number of SCF iterations starting from a learned guess, divided by the iterations needed from an established non-ML reference initialization P_ref⁽⁰⁾ (for example MINAO). RIC ignores hidden work in the initialization, such as extra Fock constructions inside Δ-learning or coefficient pipelines, so it need not match wall-time speedup.
The effective relative iteration count (ERIC) is the paper's correction: it counts every Fock build from initialization through convergence, including those performed only to obtain P⁽⁰⁾. On hybrids, where each iteration is dominated by the Fock update, ERIC tracks total Fock work more faithfully than raw iteration counts.
RIC — iteration ratio only (Eq. 6).
ERIC — ratio of total Fock builds, including initialization (Eq. 7). Values below 1 mean fewer Fock builds than the reference; above 1 mean the ML pipeline costs more in Fock-equivalent units.
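The difference between the two metrics is easy to see with a small calculation. The sketch below follows the verbal definitions above rather than the paper's exact Eq. 6 and Eq. 7, which are not reproduced here. It assumes one Fock build per SCF iteration, the function names are hypothetical, and the iteration counts are made-up numbers chosen only for illustration.

```python
def ric(iters_ml, iters_ref):
    """Relative iteration count: SCF iterations from the learned guess over the reference."""
    return iters_ml / iters_ref

def eric(fock_builds_init, fock_builds_scf, fock_builds_ref):
    """Effective relative iteration count: all Fock builds (init + SCF) over the reference total."""
    return (fock_builds_init + fock_builds_scf) / fock_builds_ref

# Hypothetical run: the reference guess converges in 12 iterations,
# the learned guess in 9, assuming one Fock build per iteration.
iters_ref, iters_ml = 12, 9

ric_value = ric(iters_ml, iters_ref)

# Case A: the learned pipeline spends 2 extra Fock builds producing its initial guess.
eric_cheap_init = eric(2, iters_ml, iters_ref)

# Case B: the pipeline spends 4 extra Fock builds, erasing the apparent gain.
eric_costly_init = eric(4, iters_ml, iters_ref)

print(f"RIC = {ric_value:.3f}")
print(f"ERIC (2 init builds) = {eric_cheap_init:.3f}")
print(f"ERIC (4 init builds) = {eric_costly_init:.3f}")
```

In the second case RIC still reports a speedup while ERIC exceeds 1, which is the kind of hidden initialization cost that RIC alone would miss.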
The same failure mode reappears across all three functional classes — baselines diverge on Hamiltonian and density-matrix ansätze, while the coefficient model stays flat but cannot fall below ~0.85 on SCAN and B3LYP because it has to compensate for fixed Fock-build substitutions. SAIL flattens every curve, regardless of functional or ansatz.
Matrix-based ansätze tie their predictions to a specific AO basis and must be retrained per target. Coefficient models predict in a fixed auxiliary basis and apply unchanged to any compatible AO basis. Train on def2-SVP, evaluate on def2-QZVP — and SAIL still beats the surrogate-trained baseline.