
Commit 0f6cf4b

greynewell and claude authored
feat: enforce strict code quality with expanded linting and type checking (v0.14.0)
## Summary

- Expand Ruff linting rules (B, UP, SIM, RUF, C4, PIE, PT, ASYNC, S, T20) and add mypy type checking with Pydantic plugin across all 134 source files
- Fix all 72 ruff lint violations across 36 files and all 267 mypy type errors across 39 files
- Add mypy pre-commit hook (local, using `uv run mypy`) and CI type-check job

## Test plan

- [x] `uvx ruff check src/ tests/`: All checks passed
- [x] `uvx ruff format --check src/ tests/`: 266 files already formatted
- [x] `uv run mypy src/mcpbr/`: Success: no issues found in 134 source files
- [x] `uv run pytest -m "not integration"`: 4293 passed, 0 failures
- [x] All pre-commit hooks pass (sync-version, ruff, ruff-format, mypy, trailing-whitespace, end-of-file-fixer, check-yaml, check-added-large-files, check-merge-conflict)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

<!-- This is an auto-generated comment: release notes by coderabbit.ai -->

## Summary by CodeRabbit

# Release Notes: Version 0.14.0

* **New Features**
  * Added comprehensive type checking with mypy integration in CI/pre-commit workflows
  * Expanded code quality enforcement with stricter linting rules
* **Bug Fixes**
  * Fixed multiple code quality violations and type errors across the codebase
  * Improved error handling and timezone handling consistency
  * Enhanced validation and safety checks throughout infrastructure providers
* **Documentation**
  * Updated development guidelines to include type checking requirements

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 5336c9d commit 0f6cf4b

143 files changed

Lines changed: 1412 additions & 1093 deletions


.claude-plugin/marketplace.json

Lines changed: 2 additions & 2 deletions
@@ -1,7 +1,7 @@
 {
   "$schema": "https://anthropic.com/claude-code/marketplace.schema.json",
   "name": "mcpbr",
-  "version": "0.13.4",
+  "version": "0.14.0",
   "description": "mcpbr - MCP Benchmark Runner plugin marketplace",
   "owner": {
     "name": "mcpbr Contributors",
@@ -11,7 +11,7 @@
     {
       "name": "mcpbr",
       "description": "Expert benchmark runner for MCP servers using mcpbr. Handles Docker checks, config generation, and result parsing.",
-      "version": "0.13.4",
+      "version": "0.14.0",
       "author": {
         "name": "mcpbr Contributors"
       },

.claude-plugin/package.json

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 {
   "name": "@greynewell/mcpbr-claude-plugin",
-  "version": "0.13.4",
+  "version": "0.14.0",
   "description": "Claude Code plugin for mcpbr - Expert benchmark runner for MCP servers with specialized skills",
   "keywords": [
     "claude-code",

.claude-plugin/plugin.json

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 {
   "name": "mcpbr",
-  "version": "0.13.4",
+  "version": "0.14.0",
   "description": "Expert benchmark runner for MCP servers using mcpbr. Handles Docker checks, config generation, and result parsing.",
   "schema_version": "1.0"
 }

.github/workflows/ci.yml

Lines changed: 28 additions & 1 deletion
@@ -57,7 +57,34 @@ jobs:
           pip install pre-commit

       - name: Run pre-commit hooks
-        run: pre-commit run --all-files --show-diff-on-failure
+        # Skip mypy in pre-commit; the dedicated type-check job runs it
+        # with full project dependencies installed.
+        run: SKIP=mypy pre-commit run --all-files --show-diff-on-failure
+
+  type-check:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Set up Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: "3.11"
+
+      - name: Install dependencies
+        run: |
+          python -m pip install --upgrade pip
+          pip install -e ".[dev]"
+
+      - name: Cache mypy
+        uses: actions/cache@v4
+        with:
+          path: .mypy_cache
+          key: mypy-${{ hashFiles('pyproject.toml') }}
+          restore-keys: mypy-
+
+      - name: Run mypy
+        run: mypy src/mcpbr/

   test:
     runs-on: ubuntu-latest

.pre-commit-config.yaml

Lines changed: 9 additions & 0 deletions
@@ -16,6 +16,15 @@ repos:
         args: [--fix]
       - id: ruff-format

+  - repo: local
+    hooks:
+      - id: mypy
+        name: mypy
+        entry: uv run --extra dev mypy src/mcpbr/
+        language: system
+        pass_filenames: false
+        types: [python]
+
   - repo: https://github.com/pre-commit/pre-commit-hooks
     rev: v5.0.0
     hooks:

AGENTS.md

Lines changed: 40 additions & 18 deletions
@@ -175,7 +175,18 @@ If any linting errors remain, they MUST be fixed manually before proceeding.
 uvx ruff check --fix src/ tests/ && uvx ruff format src/ tests/ && uvx ruff check src/ tests/
 ```

-### 2. Run Tests
+### 2. Run Type Checking
+
+```bash
+# Run mypy on source code
+uv run mypy src/mcpbr/
+```
+
+**Expected output:** `Success: no issues found`
+
+If any type errors remain, they MUST be fixed before proceeding.
+
+### 3. Run Tests

 ```bash
 # Run all non-integration tests
@@ -187,7 +198,7 @@ uv run pytest -m integration

 **Expected result:** All tests must pass with 0 failures.

-### 3. Update CHANGELOG
+### 4. Update CHANGELOG

 **MANDATORY:** If your changes are user-visible, update CHANGELOG.md:

@@ -201,7 +212,7 @@ uv run pytest -m integration
 cat CHANGELOG.md | head -30
 ```

-### 4. Verify Changes
+### 5. Verify Changes

 - Review all modified files
 - Ensure no unintended changes were introduced
@@ -217,7 +228,8 @@ The project uses Ruff for linting with the following configuration:

 - **Line length:** 100 characters (E501 is ignored)
 - **Target Python version:** 3.11+
-- **Enabled rules:** E (pycodestyle errors), F (pyflakes), I (isort), N (pep8-naming), W (pycodestyle warnings)
+- **Enabled rules:** E (pycodestyle), F (pyflakes), I (isort), N (pep8-naming), W (warnings), B (bugbear), UP (pyupgrade), SIM (simplify), RUF (ruff-specific), C4 (comprehensions), PIE (misc), PT (pytest-style), ASYNC (async bugs), S (security/bandit), T20 (print detection)
+- **Type checking:** mypy with Pydantic plugin, strict mode on core modules

 ### Common Linting Issues to Avoid

@@ -226,6 +238,10 @@ The project uses Ruff for linting with the following configuration:
 3. **Undefined names** - All variables and functions must be defined before use
 4. **Line too long** - While E501 is ignored, try to keep lines under 100 chars when reasonable
 5. **Trailing whitespace** - Remove trailing whitespace from all lines
+6. **Mutable default args** (B006) - Don't use `[]` or `{}` as default arguments
+7. **Exception chaining** (B904) - Use `raise X from err` inside `except` blocks
+8. **Modern Python** (UP) - Use Python 3.11+ patterns (e.g., `X | Y` unions, `match` statements)
+9. **Simplifications** (SIM) - Collapse nested `with`/`if` statements, use `contextlib.suppress()`

 ### Code Style

@@ -422,11 +438,12 @@ Checklist for CHANGELOG:

 1. ✅ All linting checks pass (`uvx ruff check src/ tests/`)
 2. ✅ Code is formatted (`uvx ruff format src/ tests/`)
-3. ✅ All tests pass (`uv run pytest -m "not integration"`)
-4. **CHANGELOG.md is updated** (for user-visible changes)
-5. ✅ Code is documented
-6. ✅ README is updated (if applicable)
-7. ✅ Changes are committed with descriptive commit messages
+3. ✅ Type checking passes (`uv run mypy src/mcpbr/`)
+4. ✅ All tests pass (`uv run pytest -m "not integration"`)
+5. **CHANGELOG.md is updated** (for user-visible changes)
+6. ✅ Code is documented
+7. ✅ README is updated (if applicable)
+8. ✅ Changes are committed with descriptive commit messages

 ### PR Title Format

@@ -537,9 +554,10 @@ git push
 ### ✅ DO: Check Linting First

 ```bash
-# Good: Check linting before commit
+# Good: Check linting and types before commit
 uvx ruff check --fix src/ tests/
 uvx ruff format src/ tests/
+uv run mypy src/mcpbr/
 uv run pytest -m "not integration"
 git commit -m "feat: add new feature"
 git push
@@ -590,14 +608,17 @@ uvx ruff check --fix src/ tests/
 uvx ruff format src/ tests/
 uvx ruff check src/ tests/ # Verify all fixed

-# 5. Run tests
+# 5. Run type checking
+uv run mypy src/mcpbr/
+
+# 6. Run tests
 uv run pytest -m "not integration"

-# 6. Commit changes (include CHANGELOG.md)
+# 7. Commit changes (include CHANGELOG.md)
 git add src/ tests/ CHANGELOG.md
 git commit -m "feat: add my new feature"

-# 7. Push and create PR
+# 8. Push and create PR
 git push -u origin feature/my-new-feature
 gh pr create --title "feat: add my new feature" --body "Implements #123"
 ```
@@ -615,9 +636,10 @@ The project uses GitHub Actions for CI/CD. All PRs must pass:

 1. **Lint Check** - `uvx ruff check src/ tests/`
 2. **Format Check** - `uvx ruff format --check src/ tests/`
-3. **Build Check** - Package builds successfully
-4. **Test (Python 3.11)** - All tests pass on Python 3.11
-5. **Test (Python 3.12)** - All tests pass on Python 3.12
+3. **Type Check** - `mypy src/mcpbr/`
+4. **Build Check** - Package builds successfully
+5. **Test (Python 3.11)** - All tests pass on Python 3.11
+6. **Test (Python 3.12)** - All tests pass on Python 3.12

 You can view check results on any PR:
 ```bash
@@ -626,11 +648,11 @@ gh pr checks <PR_NUMBER>

 ## Summary

-**Remember:** The most important rule is to run linting, formatting, and tests BEFORE committing. This ensures high code quality and prevents CI/CD failures.
+**Remember:** The most important rule is to run linting, formatting, type checking, and tests BEFORE committing. This ensures high code quality and prevents CI/CD failures.

 **Pre-commit command:**
 ```bash
-uvx ruff check --fix src/ tests/ && uvx ruff format src/ tests/ && uv run pytest -m "not integration"
+uvx ruff check --fix src/ tests/ && uvx ruff format src/ tests/ && uv run mypy src/mcpbr/ && uv run pytest -m "not integration"
 ```

 Happy coding! 🚀

CHANGELOG.md

Lines changed: 24 additions & 0 deletions
@@ -7,6 +7,30 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

 ## [Unreleased]

+## [0.14.0] - 2026-02-13
+
+### Added
+
+- **Strict code quality enforcement**: Expanded Ruff linting rules (B, UP, SIM, RUF, C4, PIE, PT,
+ASYNC, S, T20) and added mypy type checking with Pydantic plugin across all 134 source files
+- Added mypy pre-commit hook and CI type-check job
+- Zero ruff violations (72 fixed across 36 files)
+- Zero mypy errors (267 fixed across 39 files)
+- All 4293 tests pass with no regressions
+
+### Fixed
+
+- **72 ruff lint violations** across 36 files: B904 (raise-without-from), SIM102/SIM105/SIM115/
+SIM116/SIM117 (simplifications), RUF059/RUF003 (unused vars, Unicode), B007 (unused loop vars),
+PT019 (pytest fixtures), S-rules (security: S310 URL validation, S108 temp dirs, S311 non-crypto
+random, S110 exception handling, S608 SQL, S112 try-except-continue, S104 binding, S602 shell)
+- **267 mypy type errors** across 39 files: union-attr (128), assignment (33), no-any-return (28),
+arg-type (23), and others. Fixed with proper type narrowing, assertions, annotations, and
+type-safe patterns across infrastructure providers (GCP, AWS, Azure, Cloudflare, K8s), core
+modules (harness, CLI, docker_env), and utility modules (providers, notifications, benchmarks)
+
+[0.14.0]: https://github.com/greynewell/mcpbr/releases/tag/v0.14.0
+
 ## [0.13.0] - 2026-02-13

 ### Fixed

package.json

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 {
   "name": "@greynewell/mcpbr",
-  "version": "0.13.4",
+  "version": "0.14.0",
   "description": "Model Context Protocol Benchmark Runner - CLI tool for evaluating MCP servers",
   "keywords": [
     "mcpbr",

pyproject.toml

Lines changed: 91 additions & 3 deletions
@@ -4,7 +4,7 @@ build-backend = "hatchling.build"

 [project]
 name = "mcpbr"
-version = "0.13.4"
+version = "0.14.0"
 description = "Model Context Protocol Benchmark Runner - evaluate MCP servers against software engineering benchmarks"
 readme = "README.md"
 license = "MIT"
@@ -46,6 +46,12 @@ dev = [
     "ruff>=0.1.0",
     "pre-commit>=3.0.0",
     "slack_sdk>=3.27.0",
+    "mypy>=1.11.0",
+    "types-docker",
+    "types-paramiko",
+    "types-PyYAML",
+    "types-requests",
+    "types-psutil",
 ]
 docs = [
     "mkdocs>=1.5.0",
@@ -90,12 +96,94 @@ line-length = 100
 target-version = "py311"

 [tool.ruff.lint]
-select = ["E", "F", "I", "N", "W"]
-ignore = ["E501"]
+select = [
+    "E",  # pycodestyle errors
+    "F",  # pyflakes
+    "I",  # isort
+    "N",  # pep8-naming
+    "W",  # pycodestyle warnings
+    "B",  # flake8-bugbear
+    "UP",  # pyupgrade (Python 3.11+)
+    "SIM",  # simplify
+    "RUF",  # ruff-specific
+    "C4",  # flake8-comprehensions
+    "PIE",  # misc linting
+    "PT",  # pytest-style
+    "ASYNC",  # async bugs
+    "S",  # bandit (security)
+    "T20",  # print detection
+]
+ignore = [
+    "E501",  # line too long (handled by formatter)
+    "B008",  # function call in default argument (Click pattern)
+    "S101",  # assert usage (fine in tests)
+    "S603",  # subprocess call - check for untrusted input
+    "S607",  # start process with partial path
+    "T201",  # print statement (CLI tool uses print)
+    "SIM108",  # ternary operator (readability preference)
+    "PT011",  # pytest.raises too broad
+    "PT012",  # pytest.raises multiple statements
+    "RUF012",  # mutable class variable (Pydantic models)
+    "ASYNC109",  # async function timeout param (trio-specific, not asyncio)
+    "ASYNC110",  # async sleep in loop (trio-specific)
+    "ASYNC221",  # await in async for (trio-specific)
+    "ASYNC230",  # open call in async function (trio-specific)
+    "ASYNC240",  # async generator (trio-specific)
+    "ASYNC251",  # async sleep in async for (trio-specific)
+]
+
+[tool.ruff.lint.per-file-ignores]
+"tests/**/*.py" = ["S", "T20"]
+"infrastructure/**/*.py" = ["S603", "S607"]
+"src/mcpbr/infrastructure/**/*.py" = ["S603", "S607", "S108"]
+"scripts/**/*.py" = ["T20", "S"]

 [tool.pytest.ini_options]
 asyncio_mode = "auto"
 testpaths = ["tests"]
 markers = [
     "integration: marks tests as integration tests (deselect with '-m not integration')",
 ]
+
+[tool.mypy]
+python_version = "3.11"
+warn_return_any = true
+warn_unreachable = true
+no_implicit_optional = true
+strict_equality = true
+check_untyped_defs = true
+disallow_incomplete_defs = true
+plugins = ["pydantic.mypy"]
+
+[[tool.mypy.overrides]]
+module = [
+    "datasets",
+    "datasets.*",
+    "google.generativeai",
+    "google.generativeai.*",
+    "wandb",
+    "wandb.*",
+    "slack_sdk",
+    "slack_sdk.*",
+    "uvicorn",
+    "uvicorn.*",
+    "fastapi",
+    "fastapi.*",
+    "tomli",
+    "tomli.*",
+    "weasyprint",
+    "weasyprint.*",
+    "terminal_bench",
+    "terminal_bench.*",
+]
+ignore_missing_imports = true
+
+[[tool.mypy.overrides]]
+module = [
+    "mcpbr.models",
+    "mcpbr.config",
+    "mcpbr.evaluation",
+    "mcpbr.pricing",
+]
+disallow_untyped_defs = true
+warn_unused_ignores = true
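
The `S` (bandit) rule family enabled in `[tool.ruff.lint]` above covers checks such as S311 (non-cryptographic random), S108 (hard-coded temp paths), and S310 (unvalidated URL schemes). The sketch below shows one conventional way to write code that satisfies each of them. It is an assumption about typical fixes, not a copy of what this commit changed, and every function name is hypothetical.

```python
import secrets
import tempfile
from pathlib import Path
from urllib.parse import urlparse


def new_run_token(nbytes: int = 16) -> str:
    """S311: use `secrets` rather than `random` for security-sensitive values."""
    return secrets.token_hex(nbytes)


def write_scratch_file(payload: str) -> Path:
    """S108: let tempfile create a private directory instead of hard-coding /tmp."""
    workdir = Path(tempfile.mkdtemp(prefix="mcpbr-example-"))
    target = workdir / "payload.txt"
    target.write_text(payload, encoding="utf-8")
    return target


def require_http_url(url: str) -> str:
    """S310: check the scheme before handing a URL to urllib."""
    scheme = urlparse(url).scheme
    if scheme not in {"http", "https"}:
        raise ValueError(f"unsupported URL scheme: {scheme!r}")
    return url
```

Where a flagged pattern is intentional (for example, subprocess calls in infrastructure code), the per-file ignores in the diff above are the alternative to rewriting the code.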

scripts/sync_version.py

Lines changed: 0 additions & 2 deletions
@@ -21,8 +21,6 @@
 class VersionNotFoundError(Exception):
     """Raised when version cannot be found in pyproject.toml."""

-    pass
-

 def get_version_from_pyproject(pyproject_path: Path) -> str:
     """Extract version from pyproject.toml."""
