lambench/metrics/results/README.md: 13 additions & 9 deletions
@@ -1,6 +1,6 @@
# Overview

-Large atomic models (LAM), also known as machine learning interatomic potentials (MLIPs), are considered foundation models that predict atomic interactions across diverse systems using data-driven approaches. LAMBench is a benchmark designed to evaluate the performance of such models. It provides a comprehensive suite of tests and metrics to help developers and researchers understand the accuracy and generalizability of their machine learning models.
+Large atomistic models (LAM), also known as machine learning interatomic potentials (MLIPs), are considered foundation models that predict atomic interactions across diverse systems using data-driven approaches. **LAMBench** is a benchmark designed to evaluate the performance of such models. It provides a comprehensive suite of tests and metrics to help developers and researchers understand the accuracy and generalizability of their machine learning models.

## Our mission includes
@@ -26,7 +26,7 @@ Figure 1: Generalizability on force field prediction tasks, 1 - $\bar{M}^m_{FF}$
<!-- scatter plot -->
Figure 2: Accuracy-Efficiency Trade-off, $\bar{M}^m_{FF}$ vs $M_E^m$.

-# LAMBench Metrics Calculations
+# LAMBench Metrics Calculation

## Generalizability
@@ -81,12 +81,16 @@ In contrast, an ideal model that perfectly matches Density Functional Theory (DF

### Domain Specific Property Calculation

-For the domain-specific property tasks, we adopt the MAE as the error metric.
-In the Inorganic Materials domain, the MDR phonon benchmark predicts maximum phonon
-frequency, entropy, free energy, and heat capacity at constant volume, with each prediction type assigned a weight of 0.25.
-In the Molecules domain, the TorsionNet500 benchmark predicts the torsion profile energy, torsion barrier height, and the number of molecules for which the model's prediction of the torsional barrier height has an error exceeding 1 kcal/mol.
-Each prediction type in this domain is assigned a weight of $\frac{1}{3}$.
-The resulting score is denoted as $\bar M^{m}_{PC}$.
+For the domain-specific property calculation tasks, we adopt the MAE as the primary error metric.
+
+In the Inorganic Materials domain, the MDR phonon benchmark predicts the maximum phonon frequency, entropy, free energy, and heat capacity at constant volume, while the elasticity benchmark evaluates the shear and bulk moduli. Each prediction type
+is assigned an equal weight of $\frac{1}{6}$.
+
+In the Molecules domain, the TorsionNet500 benchmark evaluates the torsion profile energy, torsional barrier height, and the number of molecules for which the predicted torsional barrier height error exceeds 1 kcal/mol. The Wiggle150 benchmark assesses the relative conformer energy profile. Each prediction type in this domain is assigned a weight of 0.25.
+
+In the Catalysis domain, the OC20NEB-OOD benchmark evaluates the energy barrier, reaction energy change (delta energy), and the percentage of reactions with predicted energy barrier errors exceeding 0.1 eV for three reaction types: transfer, dissociation, and desorption. Each prediction type in this domain is assigned a weight of 0.2.
+
+The resulting error metric after averaging over all domains is denoted as $\bar M^{m}_{PC}$.
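
The weighting scheme described in the added lines can be sketched in Python as a weighted sum within each domain followed by a plain mean across domains. This is only an illustrative sketch, not LAMBench's actual implementation: the function names, dictionary keys, and error values are hypothetical. It also assumes that each per-property error has already been brought to a comparable, dimensionless scale (this hunk does not show that step) and that "averaging over all domains" means an unweighted mean. The Catalysis domain is left out of the example because the hunk does not itemize its prediction types; it would follow the same pattern with weights of 0.2.

```python
from statistics import fmean

# Weights as stated in the README text; the keys are hypothetical labels.
DOMAIN_WEIGHTS = {
    "inorganic_materials": {
        "max_phonon_frequency": 1 / 6,
        "entropy": 1 / 6,
        "free_energy": 1 / 6,
        "heat_capacity": 1 / 6,
        "shear_modulus": 1 / 6,
        "bulk_modulus": 1 / 6,
    },
    "molecules": {
        "torsion_profile_energy": 0.25,
        "torsional_barrier_height": 0.25,
        "barrier_error_over_1_kcal_mol": 0.25,
        "wiggle150_conformer_energy": 0.25,
    },
}


def domain_error(errors, weights):
    """Weighted sum of per-property errors within a single domain."""
    if set(errors) != set(weights):
        raise ValueError("errors and weights must cover the same prediction types")
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights within a domain must sum to 1")
    return sum(weights[name] * errors[name] for name in weights)


def property_calculation_error(errors_by_domain, weights_by_domain):
    """Unweighted mean of the per-domain errors (assumed reading of the text)."""
    per_domain = [
        domain_error(errors_by_domain[domain], weights_by_domain[domain])
        for domain in weights_by_domain
    ]
    return fmean(per_domain)


# Placeholder (made-up) normalized errors, used only to exercise the functions.
EXAMPLE_ERRORS = {
    "inorganic_materials": {name: 0.3 for name in DOMAIN_WEIGHTS["inorganic_materials"]},
    "molecules": {
        "torsion_profile_energy": 0.1,
        "torsional_barrier_height": 0.2,
        "barrier_error_over_1_kcal_mol": 0.3,
        "wiggle150_conformer_energy": 0.4,
    },
}

score = property_calculation_error(EXAMPLE_ERRORS, DOMAIN_WEIGHTS)
```

Checking that the weights in each domain sum to 1 catches the kind of drift this change corrects, where adding a benchmark to a domain (for example, elasticity alongside MDR phonon) requires every weight in that domain to be updated.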

## Applicability
@@ -122,4 +126,4 @@ The final instability metric is computed as the average over all nine structures