diff --git a/README.md b/README.md index 8301229..24419ee 100644 --- a/README.md +++ b/README.md @@ -18,6 +18,8 @@ We rank everything using a transparent scoring formula based on: Want a shortcut? Jump to the [Fast-Start table](FAST_START.md). +Full methodology → [docs/METHODOLOGY.md](docs/METHODOLOGY.md) + ----- \

@@ -55,10 +57,10 @@ This catalogue is maintained by the AgentOps initiative and is updated regularly In the fast-moving world of Agentic-AI, finding high-quality, actively maintained, and truly impactful frameworks can be a pain. Many lists are subjective or just track stars. AgentOps cuts through the noise with an analytical approach: * **Systematic Scoring:** Every repo gets crunched through our [transparent scoring formula](https://www.google.com/search?q=%23our-methodology--scoring-explained). We look at real signals: community traction (stars [1, 2]), development activity (commit recency [1, 2]), maintenance health (issue management [3, 4]), documentation quality, license permissiveness [1, 2], and ecosystem integration.[1, 5] No black boxes. - * **Focus on Builder Tools:** We spotlight frameworks, toolkits, and platforms that *actually help you build and orchestrate AI agents*. Check our [scope definition](https://www.google.com/search?q=./docs/methodology.md) for the nitty-gritty. + * **Focus on Builder Tools:** We spotlight frameworks, toolkits, and platforms that *actually help you build and orchestrate AI agents*. Check our [scope definition](https://www.google.com/search?q=./docs/METHODOLOGY.md) for the nitty-gritty. * **Relentlessly Fresh:** Data gets a refresh monthly (or sooner if big shifts happen). Stale lists suck. Our [Changelog](https://www.google.com/search?q=./CHANGELOG.md) keeps score. * **Automated Vigilance:** A GitHub Action keeps an eye on things weekly, flagging big score or rank changes for review. This keeps the "freshness" promise real. - * **Open & Transparent:** Our entire [methodology](https://www.google.com/search?q=./docs/methodology.md) – data sources, scoring weights, the lot – is out in the open. Trust through transparency. + * **Open & Transparent:** Our entire [methodology](https://www.google.com/search?q=./docs/METHODOLOGY.md) – data sources, scoring weights, the lot – is out in the open. Trust through transparency. AgentOps is built to be a reliable, data-driven launchpad for your next Agentic-AI project. @@ -98,7 +100,7 @@ The definitive list of Agentic-AI repositories, ranked by the AgentOps Score. Th |... |... |... |... |... |... |... | -*➡️ Dig into how these scores are cooked up in our [Methodology section](https://www.google.com/search?q=%23our-methodology--scoring-explained) and the [full recipe in /docs/methodology.md](https://www.google.com/search?q=./docs/methodology.md).* +*➡️ Dig into how these scores are cooked up in our [Methodology section](https://www.google.com/search?q=%23our-methodology--scoring-explained) and the [full recipe in /docs/METHODOLOGY.md](https://www.google.com/search?q=./docs/METHODOLOGY.md).* ----- @@ -122,16 +124,16 @@ AgentOps believes in full transparency. Here’s the lowdown on how we find, vet The core AgentOps Scoring Formula: `Score = 0.35*log2(stars+1) + 0.20*recency_factor + 0.15*issue_health + 0.15*doc_completeness + 0.10*license_freedom + 0.05*ecosystem_integration`\†\ -\†\ *Weights are reviewed and potentially tuned quarterly. Full math and reasoning in [`/docs/methodology.md`](https://www.google.com/search?q=./docs/methodology.md).* +\†\ *Weights are reviewed and potentially tuned quarterly. Full math and reasoning in [`/docs/METHODOLOGY.md`](https://www.google.com/search?q=./docs/METHODOLOGY.md).* **Quick Look at Components:** * **Seed Discovery:** GitHub searches (e.g., `"agent framework"`, `"LLM agent"`), topic filters (e.g., `topic:agent` [17]), and crawling curated lists [24, 25, 7] to cast a wide net. - * **Metadata Harvest:** Pulling key data: stars, forks, open/closed issues, commit dates, language, license, README snippets. (Examples: [13, 1, 12, 26, 23, 2, 10, 8, 3, 14, 15, 16, 19, 22, 27, 28] and many others as detailed in `docs/methodology.md`) + * **Metadata Harvest:** Pulling key data: stars, forks, open/closed issues, commit dates, language, license, README snippets. (Examples: [13, 1, 12, 26, 23, 2, 10, 8, 3, 14, 15, 16, 19, 22, 27, 28] and many others as detailed in `docs/METHODOLOGY.md`) * **Quality & Activity Scoring:** The formula balances community buzz, dev activity, maintenance, docs, license, and how well it plays with others. * **De-duplication & Categorisation:** Forks usually get skipped unless they’re their own thing now. Repos get bucketed by their main gig. -For the full, unabridged version, see **[./docs/methodology.md](https://www.google.com/search?q=./docs/methodology.md)**. +For the full, unabridged version, see **[./docs/METHODOLOGY.md](https://www.google.com/search?q=./docs/METHODOLOGY.md)**. \ diff --git a/docs/METHODOLOGY.md b/docs/METHODOLOGY.md new file mode 100644 index 0000000..235a951 --- /dev/null +++ b/docs/METHODOLOGY.md @@ -0,0 +1,37 @@ +# Methodology + +AgentOps ranking and research utilities. + +This module encapsulates the high-level research loop used for the AgentOps catalogue. The +process begins by seeding GitHub searches with a mix of hand curated queries and topic +filters. Each query fetches a batch of repositories through the GitHub API. Results are +stored and we capture repo metadata such as stars, forks, issue counts, commit history, +primary language, and license information. The crawler also pulls README excerpts so that +projects can be categorised and assessed for documentation quality. + +After harvesting raw data we compute a composite score for each repository. The goal of the +score is to surface well maintained, permissively licensed projects that demonstrate strong +community traction. We normalise recency so that active projects are favoured, but we do not +penalise established libraries. Issue health looks at the ratio of open to closed issues to +spot abandoned repos. Documentation completeness checks for a reasonably detailed README +and inline code examples. License freedom considers whether a project uses a permissive or +viral license. Finally, ecosystem integration detects references to popular agent frameworks +or tooling within the README text and repository topics. + +The research loop repeats until the top results stabilise across multiple iterations. This +helps smooth out one-off spikes in GitHub search results. Between iterations we also prune +repositories that fall below a minimum star threshold or that clearly lie outside the +framework or tooling categories. The output is a ranked CSV and Markdown table describing +the top repositories along with a simple changelog noting additions or removals since the +previous run. + +This docstring acts as the canonical description of the research workflow so that +`docs/METHODOLOGY.md` can be auto-generated and kept in sync with the code. Running the +`gen_methodology.py` script extracts this text and combines it with the scoring formula from +the README to produce the full documentation. + +The collected metrics are versioned with each run so that score trends can be analysed over time. We encourage community contributions via pull requests, which can add new search seeds or propose changes to the weighting scheme. Because everything is scripted, the entire pipeline can be executed locally for transparency. The methodology outlined here reflects our current best attempt at a fair ranking system, and feedback is always welcome. Our approach aims to remain lightweight and reproducible so other researchers can fork the pipeline, rerun it on new datasets, and compare results with minimal fuss. + +## Scoring Formula + +`Score = 0.35*log2(stars+1) + 0.20*recency_factor + 0.15*issue_health + 0.15*doc_completeness + 0.10*license_freedom + 0.05*ecosystem_integration`\†\ diff --git a/scripts/gen_methodology.py b/scripts/gen_methodology.py new file mode 100644 index 0000000..2f0d0d1 --- /dev/null +++ b/scripts/gen_methodology.py @@ -0,0 +1,42 @@ +"""Generate docs/METHODOLOGY.md from rank.py and README.md.""" + +from pathlib import Path +import importlib.util +import textwrap + + +def load_rank_docstring(): + path = Path(__file__).parent / "rank.py" + spec = importlib.util.spec_from_file_location("rank", path) + mod = importlib.util.module_from_spec(spec) + spec.loader.exec_module(mod) + return textwrap.dedent(mod.__doc__ or "") + + +def extract_formula(readme_text: str) -> str: + for line in readme_text.splitlines(): + if "Score =" in line: + return line.strip() + return "" + + +def main(): + doc = load_rank_docstring() + readme_text = Path("README.md").read_text() + formula = extract_formula(readme_text) + + content = ["# Methodology", ""] + content.append(doc.strip()) + content.append("") + if formula: + content.append("## Scoring Formula") + content.append("") + content.append(formula) + content.append("") + output = "\n".join(content) + Path("docs").mkdir(exist_ok=True) + Path("docs/METHODOLOGY.md").write_text(output) + + +if __name__ == "__main__": + main() diff --git a/scripts/rank.py b/scripts/rank.py index b07c583..c220ebc 100644 --- a/scripts/rank.py +++ b/scripts/rank.py @@ -1,3 +1,42 @@ +"""AgentOps ranking and research utilities. + +This module encapsulates the high-level research loop used for the AgentOps catalogue. The +process begins by seeding GitHub searches with a mix of hand curated queries and topic +filters. Each query fetches a batch of repositories through the GitHub API. Results are +stored and we capture repo metadata such as stars, forks, issue counts, commit history, +primary language, and license information. The crawler also pulls README excerpts so that +projects can be categorised and assessed for documentation quality. + +After harvesting raw data we compute a composite score for each repository. The goal of the +score is to surface well maintained, permissively licensed projects that demonstrate strong +community traction. We normalise recency so that active projects are favoured, but we do not +penalise established libraries. Issue health looks at the ratio of open to closed issues to +spot abandoned repos. Documentation completeness checks for a reasonably detailed README +and inline code examples. License freedom considers whether a project uses a permissive or +viral license. Finally, ecosystem integration detects references to popular agent frameworks +or tooling within the README text and repository topics. + +The research loop repeats until the top results stabilise across multiple iterations. This +helps smooth out one-off spikes in GitHub search results. Between iterations we also prune +repositories that fall below a minimum star threshold or that clearly lie outside the +framework or tooling categories. The output is a ranked CSV and Markdown table describing +the top repositories along with a simple changelog noting additions or removals since the +previous run. + +This docstring acts as the canonical description of the research workflow so that +`docs/METHODOLOGY.md` can be auto-generated and kept in sync with the code. Running the +`gen_methodology.py` script extracts this text and combines it with the scoring formula from +the README to produce the full documentation. + +The collected metrics are versioned with each run so that score trends can be analysed over time. We encourage community contributions via pull requests, which can add new search seeds or propose changes to the weighting scheme. Because everything is scripted, the entire pipeline can be executed locally for transparency. The methodology outlined here reflects our current best attempt at a fair ranking system, and feedback is always welcome. Our approach aims to remain lightweight and reproducible so other researchers can fork the pipeline, rerun it on new datasets, and compare results with minimal fuss. +""" + +# dummy function to avoid unused module + +def identity(x): + """Return x without modification.""" + return x +======= #!/usr/bin/env python3 """ Rank agentic-AI repos, write a Markdown table, and emit Shields.io badges @@ -100,4 +139,4 @@ def main(json_path: str = "data/repos.json") -> None: if __name__ == "__main__": src = sys.argv[1] if len(sys.argv) > 1 else "data/repos.json" - main(src) \ No newline at end of file + main(src) diff --git a/tests/test_docs.py b/tests/test_docs.py new file mode 100644 index 0000000..664ce58 --- /dev/null +++ b/tests/test_docs.py @@ -0,0 +1,8 @@ +from pathlib import Path + + +def test_methodology_length(): + path = Path('docs/METHODOLOGY.md') + assert path.exists() + words = path.read_text().split() + assert len(words) >= 400