-
Notifications
You must be signed in to change notification settings - Fork 120
Track accurate PR cycle time by identifying first ready-for-review time #649
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…cking - Implemented GitHubApiService methods to retrieve PR timeline events. - Defined dataclasses and TypedDict for GitHubUser, GitHubReviewEvent, and GitHubReadyForReviewEvent. - Introduced TimelineEventUtils for sharing event filtering and sorting functions. - Update cycle time computation to use earliest ready-for-review_event, enhancing performance analysis. - Added corresponding unit tests for new functionalities.
…ng to use actor_username in tests
WalkthroughThis update introduces comprehensive support for handling GitHub pull request timeline events in the analytics backend. New data models and type definitions are added for timeline events such as reviews and "ready for review" actions. The GitHub API service is extended to fetch and process these timeline events, converting them into strongly typed objects. Event processing logic in ETL services and handlers is refactored to utilize timeline events, focusing on accurate event filtering and extraction. Utility functions enable flexible event selection, and enums and factories are updated to support the new event types. Extensive tests are added and updated to validate the new behaviors. Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant GithubApiService
participant GitHub
participant TimelineEventUtils
Client->>GithubApiService: get_pr_timeline(org, repo, pr_number)
GithubApiService->>GitHub: GET /repos/{org}/{repo}/pulls/{pr_number}/timeline (paginated)
GitHub-->>GithubApiService: Timeline events (JSON)
GithubApiService->>GithubApiService: _fetch_all_timeline_events()
GithubApiService->>GithubApiService: _process_timeline_events()
GithubApiService->>TimelineEventUtils: Filter/sort events by type
GithubApiService-->>Client: List[GitHubTimeline]
Poem
✨ Finishing Touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 7
🧹 Nitpick comments (8)
backend/analytics_server/mhq/service/code/sync/etl_code_analytics.py (1)
58-66
: Redundant in-place sort after already receiving a sorted list
TimelineEventUtils.get_sorted_events_by_type()
returns an already sorted list.
Callingreview_events.sort(...)
immediately afterwards performs an unnecessary O(n log n) sort on the same data.- review_events.sort(key=lambda x: x.created_at)
You can safely remove the above line to save a few CPU cycles and keep the code intent clear.
backend/analytics_server/mhq/service/code/sync/utils/timeline.py (1)
30-42
: Type-var mismatch infilter_by_instance_type
return annotation
events
isList[E]
,event_class
isType[T]
, but the function returns
List[T]
. When filtering, the resulting list actually contains items of
the same runtime type as the input (E
), not necessarilyT
.
This confuses static type-checkers (e.g., mypy/pyright).- def filter_by_instance_type(events: List[E], event_class: Type[T]) -> List[T]: + def filter_by_instance_type(events: List[E], event_class: Type[T]) -> List[T]:Change the return type to
List[E]
(or drop the type vars and useAny
) or
replace the parameters with a singleType[T] / List[T]
pairing.backend/analytics_server/tests/service/code/sync/test_etl_github_handler.py (1)
215-233
:idempotency_key
type mismatch may hide bugs
get_pull_request_event
ultimately storesidempotency_key
as a string,
but the test passes the rawevent.id
(likely anint
). Because Python
silently accepts both, the discrepancy can slip into production where DB
schemas often enforcevarchar
.- idempotency_key=event1.id, + idempotency_key=str(event1.id),Apply the same change for
event2.id
and the old-event block below to keep
test data faithful to production types.backend/analytics_server/mhq/exapi/github.py (1)
221-223
: Avoid re-allocating the supported-event set on every call
SUPPORTED_EVENTS
is re-created each iteration; move it to a module/class level constant to save allocations and ease discoverability.- SUPPORTED_EVENTS = {"ready_for_review", "reviewed"} - return timeline_event.get("event") in SUPPORTED_EVENTS + return timeline_event.get("event") in _SUPPORTED_TIMELINE_EVENTS(define
_SUPPORTED_TIMELINE_EVENTS
once at the top of the file).backend/analytics_server/mhq/service/code/sync/etl_github_handler.py (1)
308-324
: Minor: don’t silently drop draft / pending reviews
continue # Skip incomplete reviews
filters on the presence of
review.submitted_at
, thereby omitting reviews in “PENDING” state.
If the intention is to exclude only draft/pending entries, also
checkreview.state != "PENDING"
so that future GitHub changes do not
invalidate the logic.backend/analytics_server/mhq/exapi/models/timeline.py (3)
29-42
: Consider using more specific types for Dict fields and enum values.Two improvements to consider:
- The
_links
field is typed as plainDict
without specifying key and value types. UseDict[str, Any]
or a more specific type if the structure is known.- The
state
field should use aLiteral
type to restrict values to valid GitHub review states ("APPROVED", "CHANGES_REQUESTED", "COMMENTED", etc.)- from typing import Optional, Dict, Literal, Union + from typing import Optional, Dict, Literal, Union, Any @dataclass class GitHubReviewEvent: id: int node_id: str user: GitHubUser body: str commit_id: str submitted_at: Union[str, datetime] - state: str + state: Literal["APPROVED", "CHANGES_REQUESTED", "COMMENTED", "DISMISSED"] html_url: str pull_request_url: str author_association: str - _links: Dict + _links: Dict[str, Any] event: Literal["reviewed"]
44-49
: Defensively handle potential timestamp parsing errors.The timestamp conversion looks good, but consider adding error handling for malformed timestamps to prevent unexpected exceptions at runtime.
def __post_init__(self): if isinstance(self.submitted_at, str) and self.submitted_at: # Convert ISO format with Z (UTC) to datetime + try: self.submitted_at = datetime.fromisoformat( self.submitted_at.replace("Z", "+00:00") ) + except ValueError as e: + # Log the error or handle invalid timestamp format + print(f"Warning: Invalid timestamp format {self.submitted_at}: {e}")
44-69
: Extract duplicate timestamp conversion logic to a utility function.Both event classes have nearly identical timestamp conversion logic in their
__post_init__
methods. Consider extracting this to a utility function to avoid code duplication.+ def _convert_iso_timestamp(timestamp: Optional[Union[str, datetime]]) -> Optional[datetime]: + """Convert ISO 8601 timestamp string to datetime object.""" + if not isinstance(timestamp, str) or not timestamp: + return timestamp + try: + return datetime.fromisoformat(timestamp.replace("Z", "+00:00")) + except ValueError as e: + # Log the error or handle invalid timestamp format + print(f"Warning: Invalid timestamp format {timestamp}: {e}") + return timestamp @dataclass class GitHubReviewEvent: # ... fields ... def __post_init__(self): - if isinstance(self.submitted_at, str) and self.submitted_at: - # Convert ISO format with Z (UTC) to datetime - self.submitted_at = datetime.fromisoformat( - self.submitted_at.replace("Z", "+00:00") - ) + self.submitted_at = _convert_iso_timestamp(self.submitted_at) @dataclass class GitHubReadyForReviewEvent: # ... fields ... def __post_init__(self): - if isinstance(self.created_at, str) and self.created_at: - # Convert ISO format with Z (UTC) to datetime - self.created_at = datetime.fromisoformat( - self.created_at.replace("Z", "+00:00") - ) + self.created_at = _convert_iso_timestamp(self.created_at)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (11)
backend/analytics_server/mhq/exapi/github.py
(3 hunks)backend/analytics_server/mhq/exapi/models/timeline.py
(1 hunks)backend/analytics_server/mhq/exapi/types/timeline.py
(1 hunks)backend/analytics_server/mhq/service/code/sync/etl_code_analytics.py
(6 hunks)backend/analytics_server/mhq/service/code/sync/etl_github_handler.py
(4 hunks)backend/analytics_server/mhq/service/code/sync/utils/timeline.py
(1 hunks)backend/analytics_server/mhq/store/models/code/enums.py
(1 hunks)backend/analytics_server/tests/factories/models/code.py
(2 hunks)backend/analytics_server/tests/factories/models/exapi/github.py
(2 hunks)backend/analytics_server/tests/service/code/sync/test_etl_code_analytics.py
(2 hunks)backend/analytics_server/tests/service/code/sync/test_etl_github_handler.py
(4 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (2)
backend/analytics_server/mhq/exapi/github.py (2)
backend/analytics_server/mhq/exapi/types/timeline.py (2)
GitHubReadyForReviewEventDict
(53-54)GitHubReviewEventDict
(26-38)backend/analytics_server/mhq/exapi/models/timeline.py (3)
GitHubReadyForReviewEvent
(53-69)GitHubReviewEvent
(30-49)GitHubUser
(7-26)
backend/analytics_server/tests/service/code/sync/test_etl_github_handler.py (2)
backend/analytics_server/mhq/service/code/sync/etl_github_handler.py (1)
GithubETLHandler
(43-420)backend/analytics_server/tests/factories/models/code.py (1)
get_pull_request_event
(79-108)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: All file linting
🔇 Additional comments (14)
backend/analytics_server/mhq/store/models/code/enums.py (1)
32-32
: Well-structured enum additionAdding the READY_FOR_REVIEW enum value aligns perfectly with the PR objective of tracking accurate PR cycle times. This allows the system to explicitly model and process "ready for review" timeline events from GitHub.
backend/analytics_server/tests/factories/models/code.py (1)
89-89
: Good improvement to the test factoryAdding the optional
actor_username
parameter enhances test flexibility by allowing explicit actor specification independent of reviewer, which aligns with the PR's focus on timeline events.Also applies to: 107-107
backend/analytics_server/tests/service/code/sync/test_etl_code_analytics.py (5)
4-8
: Well-structured importsGood job explicitly importing the required enums - this makes dependencies clear and the code more maintainable.
509-528
: Comprehensive test for ready-for-review cycle time calculationThis test properly verifies that the cycle time is calculated from the ready-for-review event timestamp rather than PR creation time, which is a key improvement in this PR.
531-543
: Good fallback test caseExcellent test for the fallback behavior when no ready-for-review event exists. This ensures backward compatibility with PRs that don't have explicit ready-for-review events.
546-574
: Thorough test for multiple ready-for-review eventsThis test correctly verifies that the earliest ready-for-review event is used when multiple events exist. The test intentionally passes events in reverse chronological order, which is a good practice to ensure the implementation is sorting correctly.
577-616
: Comprehensive integration test for all performance metricsThis test effectively validates all performance metrics (cycle time, first review time, rework time, merge time) when both ready-for-review and review events are present. It ensures that all timing calculations work together as expected.
backend/analytics_server/tests/factories/models/exapi/github.py (4)
32-35
: Good data modeling with GitHubUser dataclassAdding this dataclass improves type safety and structure, which aligns with the PR objective of enhancing data modeling for timeline events.
41-41
: Well-structured model updateChanging from a simple string to a structured GitHubUser object improves type safety and readability.
49-49
: Consistent property access updateCorrectly updated to access the login via the user object structure.
59-60
: Clear implementation with helpful commentGood job adding an explanatory comment and updating the implementation to use the new GitHubUser class.
backend/analytics_server/mhq/service/code/sync/etl_github_handler.py (1)
357-374
: Preserve timestamp fields indata
or document their removalStripping
submitted_at
/created_at
from the storeddata
dict means the
raw event no longer contains its own timestamp, forcing consumers to look at the
enclosingPullRequestEvent.created_at
. That is fine, but please add a comment
or keep the fields if downstream analytics expect the original GitHub field
names.backend/analytics_server/mhq/exapi/models/timeline.py (2)
64-69
: 🛠️ Refactor suggestionUpdate timestamp conversion logic to handle None value.
With the suggested change to make
created_at
optional, the conversion logic should be updated to handle None values.def __post_init__(self): - if isinstance(self.created_at, str) and self.created_at: + if isinstance(self.created_at, str) and self.created_at and self.created_at is not None: # Convert ISO format with Z (UTC) to datetime + try: self.created_at = datetime.fromisoformat( self.created_at.replace("Z", "+00:00") ) + except ValueError as e: + # Log the error or handle invalid timestamp format + print(f"Warning: Invalid timestamp format {self.created_at}: {e}")Likely an incorrect or invalid review comment.
7-26
:❓ Verification inconclusive
Verify the completeness of the GitHubUser fields.
The GitHubUser class looks well-structured with comprehensive fields. The
user_view_type
field is not a standard GitHub API user object field in my experience. Verify if this is a custom field or a newer addition to the GitHub API.
🏁 Script executed:
#!/bin/bash # Check GitHub API documentation or examples to verify user_view_type field # First, let's check if this field appears in GitHub's API documentation echo "Searching for user_view_type in GitHub API docs" curl -s https://docs.github.com/en/rest/users/users?apiVersion=2022-11-28 | grep -i "user_view_type" # Let's also check if it exists in actual user data from GitHub API echo "Fetching a sample user to check fields" curl -s https://api.github.com/users/octocat | jq 'keys' | grep -i "user_view_type"Length of output: 65872
GitHubUser.user_view_type is an official field—confirm your endpoint usage
We’ve verified that
user_view_type
appears in GitHub’s REST API spec (it’s part of the “Simple User” object returned byGET /users
– see https://docs.github.com/rest/reference/users#list-users). However, neitherGET /users/{username}
norGET /user
includes this field in their actual responses.• If you consume list-users responses (where
user_view_type
is present), it’s safe to keep.
• If you model full user endpoints, remove or markuser_view_type
optional to avoid parsing errors.File: backend/analytics_server/mhq/exapi/models/timeline.py (lines 7–26)
Suggested diff:class GitHubUser: login: str id: int node_id: str avatar_url: str gravatar_id: str url: str html_url: str followers_url: str following_url: str gists_url: str starred_url: str subscriptions_url: str organizations_url: str repos_url: str events_url: str received_events_url: str type: str - user_view_type: str + user_view_type: Optional[str] # only returned by List-Users endpoints site_admin: bool
backend/analytics_server/mhq/service/code/sync/etl_github_handler.py
Outdated
Show resolved
Hide resolved
…al fields and improved type handling
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
backend/analytics_server/mhq/exapi/types/timeline.py (1)
1-13
: Consider handling potential missing fields in GitHubUserDictThe base user dictionary defines all fields as required, but GitHub API responses might not consistently include all these fields.
-from typing import TypedDict, Dict, Union, Literal +from typing import TypedDict, Dict, Union, Literal +try: + # py≥3.11 + from typing import NotRequired +except ImportError: # py≤3.10 + from typing_extensions import NotRequired class GitHubUserDict(TypedDict): """GitHub user information as a dictionary.""" # Required fields login: str id: int node_id: str - type: str + type: NotRequired[str] # Sometimes might be missing
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
backend/analytics_server/mhq/exapi/github.py
(3 hunks)backend/analytics_server/mhq/exapi/models/timeline.py
(1 hunks)backend/analytics_server/mhq/exapi/types/timeline.py
(1 hunks)backend/analytics_server/mhq/service/code/sync/etl_code_analytics.py
(6 hunks)backend/analytics_server/mhq/service/code/sync/etl_github_handler.py
(4 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- backend/analytics_server/mhq/service/code/sync/etl_code_analytics.py
- backend/analytics_server/mhq/exapi/models/timeline.py
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: All file linting
🔇 Additional comments (12)
backend/analytics_server/mhq/service/code/sync/etl_github_handler.py (5)
178-181
: Good fix: Now using correct owner login for timeline fetch!This correctly addresses the previous issue where
pr.user.login
(the PR author) was incorrectly used instead of the repository owner, which would have failed with 404 errors for organization repositories.
182-187
: Neat implementation using TimelineEventUtilsGood separation of concerns using the utility class to extract specific event types from the timeline. I like how you're only keeping the earliest ready-for-review event which accurately reflects when the PR first became reviewable.
299-325
: Clear processing of review eventsGood implementation of review event processing with the appropriate skip condition for incomplete reviews.
327-342
: Well-structured ready-for-review event handlingThe conditional processing of the ready-for-review event is clean and follows the same pattern as review events.
347-375
: Good extraction of common event creation logicThe helper method effectively standardizes PullRequestEvent creation, improving maintainability and reducing duplicate code.
backend/analytics_server/mhq/exapi/github.py (4)
128-155
: Well-structured PR timeline fetch methodGood implementation with proper error handling and clear documentation.
195-203
: Great improvements to API request handlingThis implementation properly addresses previous concerns by:
- Adding the required "mockingbird-preview" Accept header for the preview API
- Including a 15-second timeout to prevent hanging
284-294
: Properly handling possible extra fields in GitHub user dataGood use of dataclasses.fields to dynamically filter the input dictionary to only include fields present in the GitHubUser dataclass, preventing TypeError exceptions.
330-340
: Consistent field filtering in both event creation methodsThe same approach is consistently used for both review events and ready-for-review events.
backend/analytics_server/mhq/exapi/types/timeline.py (3)
35-37
: Comprehensive user type definitionGood job combining the required and optional fields for GitHub users into a complete type definition.
40-52
: Well-defined review event structureClear definition of required fields with appropriate types. Using Literal["reviewed"] is a good way to restrict the event type.
96-106
: Clear ready-for-review event definition and union typeGood implementation of the specific event type by extending the base event and using a union type to represent different timeline items.
#634
This PR enhances GitHub analytics by incorporating pull request timeline events like
ready_for_review
andreviewed
to improve the accuracy of PR cycle time metrics.Key Enhancements:
ready_for_review
andreviewed
events to ensure type safety and consistency.get_pr_performance
andget_rework_cycles
to base cycle time and review analysis on actual timeline events instead of PR creation time.ready_for_review
events in PR event analytics.Continuation of PR #636
Summary by CodeRabbit
New Features
Bug Fixes
Refactor
Tests