-
Notifications
You must be signed in to change notification settings - Fork 120
feat(analytics): add filtering of bot events in create_pr_metrics and… #645
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
… corresponding tests
WalkthroughThe changes update the pull request metrics calculation by adding filtering mechanisms to exclude bot-generated events. A new method Changes
Sequence Diagram(s)sequenceDiagram
participant PR as Pull Request Events
participant GH as GithubETLHandler
participant ETL as CodeETLAnalyticsService
participant Calc as Metrics Calculator
PR->>GH: process_pr(pr_events)
GH->>GH: _github_bot_filter(pr_events)
GH->>ETL: create_pr_metrics(filtered_events)
ETL->>ETL: filter_non_bot_events(filtered_events)
ETL->>Calc: calculate metrics from human events
Calc-->>ETL: metrics results
ETL-->>GH: metrics
GH-->>PR: return original pr_events
Poem
📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
⏰ Context from checks skipped due to timeout of 90000ms (1)
✨ Finishing Touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
backend/analytics_server/tests/service/code/sync/test_etl_code_analytics.py (1)
505-603
: Consider adding a test for the second bot detection condition.The current tests thoroughly cover filtering based on usernames ending with "[bot]", but don't explicitly test the second condition that checks if
event.data.get("user", {}).get("type") == "Bot"
.You could add a test like:
def test_create_pr_metrics_filters_bot_by_type(): pr_service = CodeETLAnalyticsService() t1 = time_now() t2 = t1 + timedelta(hours=1) t3 = t2 + timedelta(hours=1) pr = get_pull_request(state=PullRequestState.MERGED, created_at=t1, updated_at=t1) # Create an event with user type set to "Bot" bot_event = get_pull_request_event( pull_request_id=pr.id, reviewer="normal_name", # Not ending with [bot] state=PullRequestEventState.COMMENTED.value, created_at=t2, data={"user": {"login": "normal_name", "type": "Bot"}} ) human_event = get_pull_request_event( pull_request_id=pr.id, reviewer="human_user", state=PullRequestEventState.APPROVED.value, created_at=t3, ) pr_metrics = pr_service.create_pr_metrics(pr, [bot_event, human_event], []) assert "human_user" in pr_metrics.reviewers assert "normal_name" not in pr_metrics.reviewers
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
backend/analytics_server/mhq/service/code/sync/etl_code_analytics.py
(1 hunks)backend/analytics_server/tests/service/code/sync/test_etl_code_analytics.py
(1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
backend/analytics_server/tests/service/code/sync/test_etl_code_analytics.py (4)
backend/analytics_server/mhq/service/code/sync/etl_code_analytics.py (2)
CodeETLAnalyticsService
(15-186)create_pr_metrics
(16-63)backend/analytics_server/mhq/utils/time.py (1)
time_now
(10-11)backend/analytics_server/mhq/store/models/code/enums.py (2)
PullRequestState
(18-21)PullRequestEventState
(24-27)backend/analytics_server/tests/factories/models/code.py (1)
get_pull_request_event
(79-107)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: All file linting
🔇 Additional comments (6)
backend/analytics_server/mhq/service/code/sync/etl_code_analytics.py (1)
24-35
: The bot filtering logic looks robust.The implementation correctly filters out bot events from PR metrics using two detection mechanisms: checking for usernames ending with "[bot]" and examining the user type in event data. This ensures that metrics like response time and rework cycles are based solely on human interactions.
backend/analytics_server/tests/service/code/sync/test_etl_code_analytics.py (5)
505-526
: Good test case for verifying bot filtering.This test correctly verifies that bot reviewers are excluded from the
reviewers
list while human reviewers are retained. The test creates both a bot event and a human event, then checks that only the human reviewer is included in the final metrics.
528-541
: Good test case for human first response time.This test verifies that when a human reviewer provides the first response, the
first_response_time
metric is correctly calculated as 3600 seconds (1 hour).
543-565
: Good test case for human rework time.This test confirms that when a human reviewer requests changes and later approves the PR, the
rework_time
metric is properly calculated as 3600 seconds (1 hour).
567-580
: Good test case for bot-only first response time.This test properly verifies that when only bot events are present, the
first_response_time
metric is set toNone
since bot events are filtered out.
582-603
: Good test case for bot-only rework time.This test confirms that when both the change request and approval events come from a bot, the
rework_time
metric is set toNone
since bot events are filtered out.
backend/analytics_server/mhq/service/code/sync/etl_code_analytics.py
Outdated
Show resolved
Hide resolved
backend/analytics_server/mhq/service/code/sync/etl_code_analytics.py
Outdated
Show resolved
Hide resolved
backend/analytics_server/tests/service/code/sync/test_etl_code_analytics.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🔭 Outside diff range comments (1)
backend/analytics_server/tests/utils/string/test_is_bot_name.py (1)
1-49
:⚠️ Potential issueComprehensive test coverage for bot name detection!
The test suite effectively covers a wide range of scenarios:
- Simple bot name formats with different delimiters
- Various prefixes and suffixes
- Special patterns like "bot" and "bot_"
- Case insensitivity
- Special characters
- Negative cases where "bot" is part of a word
- Edge cases with empty strings and non-text inputs
The file needs to be formatted with Black according to the pipeline failure. Please run
black .
to fix this formatting issue.🧰 Tools
🪛 GitHub Actions: Black Integration
[error] 1-1: Black formatting check failed. File would be reformatted. Run 'black .' to fix code style issues.
🧹 Nitpick comments (1)
backend/analytics_server/mhq/service/code/sync/etl_github_handler.py (1)
357-366
: Optimize boolean comparison using!=
operator.The boolean expression can be simplified by using the inequality operator.
- if not (pr_event_data.get("type") == "Bot"): + if pr_event_data.get("type") != "Bot":🧰 Tools
🪛 Ruff (0.8.2)
364-364: Use
pr_event_data.get("type") != "Bot"
instead ofnot pr_event_data.get("type") == "Bot"
Replace with
!=
operator(SIM201)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
backend/analytics_server/mhq/service/code/sync/etl_code_analytics.py
(4 hunks)backend/analytics_server/mhq/service/code/sync/etl_github_handler.py
(2 hunks)backend/analytics_server/mhq/utils/string.py
(1 hunks)backend/analytics_server/tests/service/code/sync/test_etl_code_analytics.py
(1 hunks)backend/analytics_server/tests/service/code/sync/test_etl_github_handler.py
(1 hunks)backend/analytics_server/tests/utils/string/test_is_bot_name.py
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- backend/analytics_server/mhq/service/code/sync/etl_code_analytics.py
- backend/analytics_server/tests/service/code/sync/test_etl_code_analytics.py
🧰 Additional context used
🧬 Code Graph Analysis (2)
backend/analytics_server/mhq/service/code/sync/etl_github_handler.py (2)
backend/analytics_server/mhq/service/code/sync/etl_code_analytics.py (1)
create_pr_metrics
(17-58)backend/analytics_server/mhq/store/models/code/pull_requests.py (1)
PullRequestEvent
(79-103)
backend/analytics_server/tests/utils/string/test_is_bot_name.py (1)
backend/analytics_server/mhq/utils/string.py (1)
is_bot_name
(9-13)
🪛 Ruff (0.8.2)
backend/analytics_server/mhq/service/code/sync/etl_github_handler.py
364-364: Use pr_event_data.get("type") != "Bot"
instead of not pr_event_data.get("type") == "Bot"
Replace with !=
operator
(SIM201)
🪛 GitHub Actions: Black Integration
backend/analytics_server/tests/utils/string/test_is_bot_name.py
[error] 1-1: Black formatting check failed. File would be reformatted. Run 'black .' to fix code style issues.
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: All file linting
🔇 Additional comments (6)
backend/analytics_server/mhq/utils/string.py (2)
9-13
: Good implementation of bot name detection!The regex pattern effectively captures common bot name patterns, including various delimiters and positions of the "bot" substring. The case-insensitive flag ensures better coverage. Well-structured function with proper type hinting.
2-2
: LGTM! Good addition of the re module import.Clean import of the required
re
module to support the new function.backend/analytics_server/mhq/service/code/sync/etl_github_handler.py (1)
184-187
: Good integration of bot filtering in the workflow.The implementation properly filters bot events before calculating PR metrics, improving data accuracy for human interactions. This aligns well with the PR objective to exclude bot events from metrics.
backend/analytics_server/tests/service/code/sync/test_etl_github_handler.py (3)
356-362
: Good test for bot filtering - empty case.This test properly verifies that events with bot users are filtered out completely.
364-372
: Good test for non-bot events preservation.This test ensures that legitimate human events are preserved during filtering.
374-382
: Good test for mixed bot and human events.This test verifies that only non-bot events are preserved when processing a mixed list. It also correctly tests the "actor" field in addition to the "user" field, ensuring complete coverage of both data structures.
This pull request introduces changes to filter out bot events from pull request metrics and adds corresponding test cases to ensure the correctness of this new functionality. The most important changes include modifying the
create_pr_metrics
function and adding several new test functions.Filtering bot events:
backend/analytics_server/mhq/service/code/sync/etl_code_analytics.py
: Modified thecreate_pr_metrics
function to filter out bot events from the pull request events.New test cases:
backend/analytics_server/tests/service/code/sync/test_etl_code_analytics.py
: Added thetest_create_pr_metrics_filters_bot_events
function to verify that bot events are filtered out correctly.backend/analytics_server/tests/service/code/sync/test_etl_code_analytics.py
: Added thetest_create_pr_metrics_no_bot_first_response_time
function to ensure that the first response time is calculated correctly when there are no bot events.backend/analytics_server/tests/service/code/sync/test_etl_code_analytics.py
: Added thetest_create_pr_metrics_no_bot_rework_time
function to verify that the rework time is calculated correctly in the absence of bot events.backend/analytics_server/tests/service/code/sync/test_etl_code_analytics.py
: Added thetest_create_pr_metrics_no_human_first_response_time
andtest_create_pr_metrics_no_human_rework_time
functions to ensure that metrics are not calculated when only bot events are present.Filter BOT/AI Pull Request Reviews #635Testing
Youtube.
Summary by CodeRabbit
New Features
Tests