davidfish-g/hypothesisai

HypothesisAI

A platform where domain experts evaluate AI-generated scientific hypotheses to benchmark how well different models perform at scientific reasoning.

What It Does

AI models generate hypotheses across scientific domains (Physics, Chemistry, Biology, and more). Experts rate each hypothesis on three criteria (novelty, plausibility, and testability), each on a 1–5 scale. The ratings are aggregated into a leaderboard that ranks models by their average expert score.
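
The aggregation step can be illustrated with a minimal sketch. It is not the project's actual implementation: the `Rating` and `LeaderboardRow` shapes, the `buildLeaderboard` name, and the choice to weight the three criteria equally are all assumptions made for illustration.

```typescript
// Hypothetical shape of one expert rating; field names are assumptions.
interface Rating {
  model: string;
  novelty: number; // 1 to 5
  plausibility: number; // 1 to 5
  testability: number; // 1 to 5
}

// Hypothetical shape of one leaderboard row.
interface LeaderboardRow {
  model: string;
  ratings: number; // number of expert ratings received
  averageScore: number; // mean of per-rating scores
}

function buildLeaderboard(ratings: Rating[]): LeaderboardRow[] {
  // Accumulate the per-rating score (mean of the three criteria,
  // equal weighting assumed) and the rating count for each model.
  const totals: Record<string, { sum: number; count: number }> = {};
  for (const r of ratings) {
    const score = (r.novelty + r.plausibility + r.testability) / 3;
    if (!totals[r.model]) {
      totals[r.model] = { sum: 0, count: 0 };
    }
    totals[r.model].sum += score;
    totals[r.model].count += 1;
  }

  // Convert totals to rows and sort by average score, highest first.
  return Object.keys(totals)
    .map((model) => ({
      model,
      ratings: totals[model].count,
      averageScore: totals[model].sum / totals[model].count,
    }))
    .sort((a, b) => b.averageScore - a.averageScore);
}

// Example usage with made-up model names and ratings.
const board = buildLeaderboard([
  { model: "model-a", novelty: 5, plausibility: 4, testability: 3 },
  { model: "model-b", novelty: 2, plausibility: 2, testability: 2 },
  { model: "model-a", novelty: 3, plausibility: 3, testability: 3 },
]);
console.log(board);
```

A real leaderboard would likely also want a minimum number of ratings per model before ranking it, since an average over one or two ratings is noisy.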

The platform currently benchmarks 180+ models from OpenAI, Anthropic, Google, Meta, Mistral, DeepSeek, xAI, and others.

Evaluations

We welcome evaluations from everyone. Sign in at hypothesisai-production.up.railway.app and begin rating hypotheses in your area of expertise. The platform is free for all users.

Tech Stack

  • React
  • Next.js
  • PostgreSQL
  • OpenRouter
  • Tailwind CSS

About

Crowdsourced AI benchmark for scientific hypothesis generation
