@Copilot Copilot AI commented Aug 29, 2025

This PR adds support for the FeNNol format, so dpdata users can export either a single LabeledSystem or multiple systems (via MultiSystems) to FeNNol's pickle format for machine learning training.

Overview

FeNNol is a machine learning framework that requires data in a specific pickle format with training/validation splits. This implementation adds a new format plugin that converts dpdata systems to the required structure, with support for combining multiple different systems into a single training file.

Key Features

  • Format Registration: Adds fennol format to dpdata's format registry
  • Single System Export: Enables system.to("fennol", "output.pkl") for any LabeledSystem
  • MultiSystems Support: Enables multi_systems.to("fennol", "output.pkl") to combine different systems into the same file
  • Proper Data Structure: Generates pickle files with the required structure:
    {
        'training': [...],      # List of training structures
        'validation': [...],    # List of validation structures  
        'description': '...'    # Metadata description
    }
  • Required Fields: Each structure contains the FeNNol-required fields:
    • species: List of atomic species/elements
    • coordinates: Atomic positions in Å
    • formation_energy: Energy in kcal/mol
    • shifted_energy: Energy in kcal/mol (same as formation_energy)
    • forces: Atomic forces in kcal/mol/Å
    • system_name: Name of the originating system (for MultiSystems tracking)
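The layout described above can be checked with a short sketch. It builds a dictionary by hand, round-trips it through `pickle`, and verifies the top-level keys and per-structure fields listed in this description. The helper name `check_fennol_dict` is hypothetical (it is not part of dpdata or FeNNol), the numbers are made-up placeholders, and storing species as element symbols is an assumption.

```python
import io
import pickle

# Per-structure fields listed in the PR description.
REQUIRED_FIELDS = {
    "species",
    "coordinates",
    "formation_energy",
    "shifted_energy",
    "forces",
    "system_name",
}


def check_fennol_dict(data):
    """Return True if `data` matches the layout described in this PR.

    Hypothetical helper for illustration only.
    """
    if not {"training", "validation", "description"} <= set(data):
        return False
    for split in ("training", "validation"):
        for structure in data[split]:
            if not REQUIRED_FIELDS <= set(structure):
                return False
    return True


# Placeholder structure with made-up values (assumption: species are symbols).
water = {
    "species": ["O", "H", "H"],
    "coordinates": [[0.0, 0.0, 0.0], [0.96, 0.0, 0.0], [-0.24, 0.93, 0.0]],
    "formation_energy": -1.0,  # kcal/mol
    "shifted_energy": -1.0,    # kcal/mol, same value as formation_energy
    "forces": [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
    "system_name": "water_example",
}
data = {"training": [water], "validation": [], "description": "placeholder data"}

# Round-trip through pickle in memory instead of writing a file.
buffer = io.BytesIO()
pickle.dump(data, buffer)
buffer.seek(0)
loaded = pickle.load(buffer)
```

A check like this could be run on a file produced by the export to confirm that no field is missing before training.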

Unit Conversions

The plugin automatically handles unit conversions to match FeNNol's expected units:

  • Energy: eV → kcal/mol (factor: ~23.06)
  • Forces: eV/Å → kcal/mol/Å (factor: ~23.06)
  • Coordinates: Å → Å (no conversion needed)
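As a rough illustration of the conversions listed above, the sketch below applies a single factor to energies and forces. The constant name and both function names are hypothetical, and the value 23.0605 is an approximation of 1 eV in kcal/mol; the plugin may use a more precise constant. Forces take the same factor as energies because the length unit (Å) is unchanged.

```python
# Approximate conversion: 1 eV is about 23.0605 kcal/mol (assumed precision).
EV_TO_KCAL_PER_MOL = 23.0605


def energy_to_kcal_per_mol(energy_ev):
    """Convert one energy from eV to kcal/mol."""
    return energy_ev * EV_TO_KCAL_PER_MOL


def forces_to_kcal_per_mol_angstrom(forces_ev_per_angstrom):
    """Convert per-atom force vectors from eV/Å to kcal/mol/Å."""
    return [
        [component * EV_TO_KCAL_PER_MOL for component in atom_force]
        for atom_force in forces_ev_per_angstrom
    ]
```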

Usage Examples

import dpdata

# Single system export
ls = dpdata.LabeledSystem("OUTCAR", fmt="vasp/outcar")
ls.to("fennol", "data.pkl")

# Multiple systems combined into single file
ls1 = dpdata.LabeledSystem("system1/OUTCAR", fmt="vasp/outcar")
ls2 = dpdata.LabeledSystem("system2/OUTCAR", fmt="vasp/outcar")
ms = dpdata.MultiSystems(ls1, ls2)
ms.to("fennol", "combined_data.pkl")

# Custom training/validation split
ms.to("fennol", "data.pkl", train_size=0.9)
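The PR description does not spell out how `train_size` divides the frames, so the following is only a minimal sketch of one plausible approach: shuffle the frame indices with a fixed seed and cut the list at the requested fraction. The function name `split_frames`, the default of 0.8, and the `seed` parameter are assumptions made for illustration, not the plugin's actual behavior.

```python
import random


def split_frames(frames, train_size=0.8, seed=0):
    """Split `frames` into (training, validation) lists.

    Hypothetical sketch: the real plugin may split differently.
    """
    if not 0.0 < train_size <= 1.0:
        raise ValueError("train_size must be in the interval (0, 1]")
    indices = list(range(len(frames)))
    random.Random(seed).shuffle(indices)  # deterministic shuffle
    n_train = round(train_size * len(frames))
    if frames:
        # Keep at least one training frame, e.g. for a single-frame system.
        n_train = max(1, n_train)
    training = [frames[i] for i in indices[:n_train]]
    validation = [frames[i] for i in indices[n_train:]]
    return training, validation
```

With ten frames and `train_size=0.9`, this sketch puts nine frames in the training list and one in the validation list; with `train_size=1.0`, the validation list is empty.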

Testing

The test suite covers:

  • Basic export functionality with synthetic data
  • Custom training/validation split ratios
  • Edge cases (single frame, all training data)
  • MultiSystems export combining multiple different systems
  • Unit conversion verification
  • Integration testing with real system data

All tests pass and the implementation follows project linting standards.

Fixes #876.



Add support for exporting dpdata LabeledSystem to FeNNol format (.pkl files).
Includes comprehensive unit tests and proper unit conversions (eV to kcal/mol).

Co-authored-by: njzjz <[email protected]>
@Copilot Copilot AI changed the title from "[WIP] [Feature Request] support FeNNol" to "feat(fennol): add FeNNol format support for exporting LabeledSystem data" Aug 29, 2025
@Copilot Copilot AI requested a review from njzjz August 29, 2025 10:19
Copilot finished work on behalf of njzjz August 29, 2025 10:19
Member

@copilot Please also support to_multi_systems, which writes different systems into the same file.

Contributor Author

Added to_multi_systems support that writes different systems into the same FeNNol file. MultiSystems can now be exported using multi_systems.to("fennol", "combined_data.pkl") which combines all systems into a single pickle file with the proper FeNNol structure. Includes comprehensive tests and maintains backward compatibility. (c2e7011)
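A minimal sketch of the combining step described in this comment, assuming each system has already been converted into a list of per-frame structure dictionaries. The function name `combine_systems` and the input layout (a mapping from system name to structures) are hypothetical; the actual plugin code may organize this differently.

```python
def combine_systems(named_systems):
    """Flatten several systems into one list of structures.

    `named_systems` maps a system name to its list of structure dicts.
    Each structure is copied and tagged with `system_name` so its origin
    can still be traced after merging. Hypothetical helper.
    """
    combined = []
    for name, structures in named_systems.items():
        for structure in structures:
            tagged = dict(structure)  # shallow copy; the input stays untouched
            tagged["system_name"] = name
            combined.append(tagged)
    return combined
```

The combined list could then be divided into training and validation parts and stored under the `training` and `validation` keys of the output dictionary.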

…ms to single file

Add support for MultiSystems.to("fennol", "file.pkl") that combines data from different systems into a single FeNNol pickle file. Includes comprehensive tests and maintains backward compatibility.

Co-authored-by: njzjz <[email protected]>
@Copilot Copilot AI changed the title from "feat(fennol): add FeNNol format support for exporting LabeledSystem data" to "feat(fennol): add FeNNol format support with MultiSystems capability for ML training data" Sep 1, 2025
@Copilot Copilot AI requested a review from njzjz September 1, 2025 12:59
Copilot finished work on behalf of njzjz September 1, 2025 12:59
Development

Successfully merging this pull request may close these issues.

[Feature Request] support FeNNol

2 participants