Physics LLM Evaluation Benchmark

A curated benchmark of physics problems, step-by-step solutions, and evaluation rubrics for testing the scientific reasoning of large language models (LLMs).

This repository is designed to evaluate whether AI systems can solve physics problems with mathematical consistency, conceptual clarity, and physically meaningful reasoning.

Goals

Create challenging physics problems spanning undergraduate- and graduate-level topics.
Provide detailed step-by-step solutions.
Define evaluation rubrics for grading AI-generated answers.
Highlight common reasoning mistakes made in physics problem solving.
Support benchmarking of LLMs in scientific reasoning tasks.
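The rubric file itself is not reproduced here. As a hypothetical sketch, assume each criterion is marked on a 0 to 1 scale and the marks are combined by a weighted average. Under that assumption, the three qualities named in the introduction (mathematical consistency, conceptual clarity, physically meaningful reasoning) could be encoded as follows. The criterion names, the equal weights, and the function rubric_score are illustrative assumptions, not the contents of rubrics/evaluation_rubric.md.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    name: str
    weight: float  # hypothetical weight; the repository's rubric may differ


# Hypothetical criteria mirroring the three qualities named in the introduction.
RUBRIC = (
    Criterion("mathematical_consistency", 1.0),
    Criterion("conceptual_clarity", 1.0),
    Criterion("physical_reasoning", 1.0),
)


def rubric_score(marks: dict[str, float], rubric: tuple[Criterion, ...] = RUBRIC) -> float:
    """Weighted mean of per-criterion marks, each expected in [0, 1]."""
    total_weight = sum(c.weight for c in rubric)
    score = 0.0
    for c in rubric:
        mark = marks.get(c.name, 0.0)  # an unscored criterion counts as 0
        if not 0.0 <= mark <= 1.0:
            raise ValueError(f"mark for {c.name} must be in [0, 1], got {mark}")
        score += c.weight * mark
    return score / total_weight
```

Keeping the weights explicit means a rubric that values a correct derivation over presentation can raise one weight without changing the scoring code.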

Topics

The benchmark covers, or is planned to cover, problems in the following areas:

Classical mechanics
Electromagnetism
Quantum mechanics
Statistical physics
Condensed matter physics
Quantum information and quantum dynamics

Repository Structure

problems/
  mechanics/
  electromagnetism/
  quantum_mechanics/
  statistical_physics/
  condensed_matter/

solutions/
  mechanics/
  electromagnetism/
  quantum_mechanics/
  statistical_physics/
  condensed_matter/

rubrics/
  evaluation_rubric.md

scripts/
  validate_dataset.py
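What scripts/validate_dataset.py actually checks is not described here. The following is a hedged illustration of one check such a script might perform. It assumes that each problem file is paired with a solution file of the same stem in the matching topic folder, which is a hypothetical convention. The function find_unmatched_problems is likewise hypothetical.

```python
from pathlib import Path

# Topic folders listed in the repository structure above.
TOPICS = (
    "mechanics",
    "electromagnetism",
    "quantum_mechanics",
    "statistical_physics",
    "condensed_matter",
)


def find_unmatched_problems(root: Path) -> list[str]:
    """Return 'topic/file' entries under problems/ with no same-stem file in solutions/.

    Pairing problems and solutions by file stem is a hypothetical convention.
    """
    unmatched = []
    for topic in TOPICS:
        problem_dir = root / "problems" / topic
        solution_dir = root / "solutions" / topic
        if not problem_dir.is_dir():
            continue  # topic not populated yet
        solution_stems = set()
        if solution_dir.is_dir():
            solution_stems = {p.stem for p in solution_dir.iterdir() if p.is_file()}
        for problem in sorted(problem_dir.iterdir()):
            if problem.is_file() and problem.stem not in solution_stems:
                unmatched.append(f"{topic}/{problem.name}")
    return unmatched
```

Called as find_unmatched_problems(Path(".")) from the repository root, it returns an empty list when every problem has a matching solution.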

Current Examples

Classical Mechanics: Inclined plane with friction and an external force
Electromagnetism: Electric field of a uniformly charged spherical shell
Quantum Mechanics: Spin-1/2 particle in a magnetic field
Statistical Physics: Two-level system and partition function
Condensed Matter / Quantum Dynamics: Two-spin transverse-field Ising Hamiltonian
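Two of these examples have closed-form answers that a grader can cross-check numerically. The sketch below makes two assumptions about conventions. For the two-spin Ising example it assumes H = -J σᶻ⊗σᶻ - h(σˣ⊗1 + 1⊗σˣ), whose eigenvalues are ±J and ±√(J² + 4h²). For the two-level system it assumes levels at 0 and Δ, so that Z = 1 + e^(-βΔ) and ⟨E⟩ = -∂ln Z/∂β = Δ/(e^(βΔ) + 1). The repository's own problem statements may use different sign or energy conventions, and all function names here are hypothetical.

```python
import numpy as np

# Pauli matrices and the 2x2 identity
SIGMA_X = np.array([[0.0, 1.0], [1.0, 0.0]])
SIGMA_Z = np.array([[1.0, 0.0], [0.0, -1.0]])
IDENTITY = np.eye(2)


def two_spin_tfim(J: float, h: float) -> np.ndarray:
    """4x4 Hamiltonian H = -J sz(x)sz - h (sx(x)1 + 1(x)sx); sign convention assumed."""
    zz = np.kron(SIGMA_Z, SIGMA_Z)
    x_field = np.kron(SIGMA_X, IDENTITY) + np.kron(IDENTITY, SIGMA_X)
    return -J * zz - h * x_field


def tfim_exact_spectrum(J: float, h: float) -> np.ndarray:
    """Closed-form eigenvalues for the convention above, sorted ascending."""
    r = np.sqrt(J**2 + 4.0 * h**2)
    return np.sort(np.array([-r, -J, J, r]))


def two_level_partition_function(delta: float, beta: float) -> float:
    """Z = 1 + exp(-beta * delta) for levels at 0 and delta."""
    return 1.0 + np.exp(-beta * delta)


def two_level_mean_energy(delta: float, beta: float) -> float:
    """<E> = -d(ln Z)/d(beta) = delta / (exp(beta * delta) + 1)."""
    return delta / (np.exp(beta * delta) + 1.0)


def log_partition(delta: float, beta: float) -> float:
    """ln Z, used for the finite-difference check below."""
    return np.log(two_level_partition_function(delta, beta))


# Cross-check the closed forms against direct numerics.
J, h = 1.0, 0.7
numeric = np.linalg.eigvalsh(two_spin_tfim(J, h))  # eigvalsh returns ascending eigenvalues
assert np.allclose(numeric, tfim_exact_spectrum(J, h))

delta, beta, step = 1.0, 2.0, 1e-6
finite_diff = -(log_partition(delta, beta + step) - log_partition(delta, beta - step)) / (2.0 * step)
assert abs(finite_diff - two_level_mean_energy(delta, beta)) < 1e-6
```

Checking an exact diagonalization against the closed form is a cheap way to catch sign or factor-of-two slips (for example, writing √(J² + h²) instead of √(J² + 4h²)). These are the kinds of mistakes the benchmark aims to highlight.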

Status

This repository is under active development. Initial examples focus on physics reasoning tasks relevant to AI model evaluation, including multi-step problem solving, symbolic manipulation, and conceptual interpretation.

Author

Thiago Rocha Girão Souza
PhD Candidate in Physics
Quantum Computing | Quantum Dynamics | Scientific Python
