Raw training arrays are stored as Git LFS archive assets under release/, not
as regular Git blobs. See DATA_RELEASES.md for the packaged datasets and
checksums. Custom training data should use the same paired full-grid SAD/ELF
NumPy convention.
Required files:
<stem>_sad.npy
<stem>_elf.npy
Optional files:
<stem>_sym.npy
The current full-grid ELFPredictor ignores _sym.npy files.
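The naming convention above can be checked before training. The following is a minimal sketch, not part of the ELFNet code base: the helper name `find_pairs` is hypothetical, and it assumes only the `<stem>_sad.npy` / `<stem>_elf.npy` naming described above. It inspects file names only and does not open the arrays.

```python
from pathlib import Path


def find_pairs(data_dir):
    """Return (paired, unpaired) stems found in data_dir.

    paired   : sorted stems that have both <stem>_sad.npy and <stem>_elf.npy
    unpaired : sorted stems that have only one of the two required files
    """
    data_dir = Path(data_dir)
    # Strip the "_sad.npy" / "_elf.npy" suffix to recover each stem.
    sad = {p.name[: -len("_sad.npy")] for p in data_dir.glob("*_sad.npy")}
    elf = {p.name[: -len("_elf.npy")] for p in data_dir.glob("*_elf.npy")}
    # Optional <stem>_sym.npy files are deliberately not inspected.
    return sorted(sad & elf), sorted(sad ^ elf)
```

Calling `find_pairs("/path/to/paired_arrays")` and confirming that the second list is empty is one way to catch a missing SAD or ELF file early.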
Each *_sad.npy and *_elf.npy pair must have identical shape for a given
structure. The loader yields complete unit-cell grids shaped:
sad: (1, D, H, W)
elf: (1, D, H, W)
There is no patch extraction. Shape-bucketed training groups samples by exact grid shape.
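To illustrate what grouping by exact grid shape means, here is a minimal sketch. It is an assumption about the general technique, not the actual ELFNet sampler: the function name `bucket_by_shape` and its arguments are hypothetical. The shapes could be collected without loading full arrays, for example with `numpy.load(path, mmap_mode="r").shape`.

```python
from collections import defaultdict


def bucket_by_shape(shapes, batch_size):
    """Group stems by exact grid shape, then split each group into batches.

    shapes     : dict mapping stem -> grid shape tuple, e.g. (1, D, H, W)
    batch_size : maximum number of samples per batch
    """
    buckets = defaultdict(list)
    for stem, shape in sorted(shapes.items()):
        buckets[tuple(shape)].append(stem)

    batches = []
    for shape, stems in buckets.items():
        # Each batch holds a single shape, so no padding or cropping is needed.
        for i in range(0, len(stems), batch_size):
            batches.append((shape, stems[i : i + batch_size]))
    return batches
```

Because every batch contains grids of one shape only, the samples can be stacked directly. Shapes that occur rarely produce batches smaller than the requested batch size.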
The SAD input is a project-defined superposed atomic density:
- neutral spherical atomic density tables are loaded from packaged
  .pkl files;
- each atom contributes a periodic neutral density centered at its fractional position;
- the grid is normalized to the configured total valence electron count;
- the density is scaled into the ELFNet training convention.
This SAD is not VASP's internal ICHARG=2 density.
The target ELF grid is parsed from VASP ELFCAR volumetric output and stored as a
float32 NumPy array. Values are expected to be ELF-like and typically lie in the
range [0, 1].
elfnet-train /path/to/paired_arrays \
--epochs 100 \
--batch 32 \
--batching shape \
--val-frac 0.05 \
--lambda-cdf 0.05
The trainer expects all required *_sad.npy and *_elf.npy files to be in the
same directory.