🧪 [testing improvement] Add tests for extract_entities_rule_based#111
🧪 [testing improvement] Add tests for extract_entities_rule_based#111wjohns989 wants to merge 1 commit into
Conversation
Add new unit tests `TestExtractEntitiesRuleBased` to cover the previously untested `extract_entities_rule_based` function. The tests verify positive and negative paths for email extraction logic. Co-authored-by: wjohns989 <56205870+wjohns989@users.noreply.github.com>
|
👋 Jules, reporting for duty! I'm here to lend a hand with this pull request. When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down. I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job! For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with New to Jules? Learn more at jules.google/docs. For security, I will only act on instructions from the user who triggered this task. |
There was a problem hiding this comment.
Code Review
This pull request introduces unit tests for the rule-based entity extraction logic, specifically covering email extraction and negative test scenarios. The reviewer suggested improving test precision by using exact equality assertions for entity counts and verifying that the entire entity list is empty in negative cases to ensure no unintended matches occur.
| assert "support@company.co.uk" in entity_names | ||
|
|
||
| person_entities = [e for e in entities if e.entity_type == "person"] | ||
| assert len(person_entities) >= 2 |
There was a problem hiding this comment.
The assertion assert len(person_entities) >= 2 is imprecise for a deterministic rule-based extractor. Since the input text contains exactly two email addresses and no other matching patterns, using == 2 ensures that the extraction logic is performing exactly as expected without unintended matches.
| assert len(person_entities) >= 2 | |
| assert len(person_entities) == 2 |
| person_entities = [e for e in entities if e.entity_type == "person"] | ||
| assert len(person_entities) == 0 |
There was a problem hiding this comment.
In a negative test case where no matches are expected, it is more robust to assert that the entire entities list is empty. This ensures that no other entity types (such as 'tech' or 'file') were incorrectly extracted from the plain text.
| person_entities = [e for e in entities if e.entity_type == "person"] | |
| assert len(person_entities) == 0 | |
| assert len(entities) == 0 |
🎯 What: The testing gap addressed
The
extract_entities_rule_basedfunction lacked test coverage. This PR adds tests specifically targeting the deterministic string processing behavior of the function, prioritizing the email extraction logic visible in the snippet context.📊 Coverage: What scenarios are now tested
test_extracts_emailsto ensure valid email addresses are correctly identified and extracted aspersonentities.test_no_extraction_when_no_matchto verify that text containing no valid email patterns gracefully returns no entity extraction.✨ Result: The improvement in test coverage
The rule-based extraction system is now explicitly verified via unit tests, improving overall robustness and providing a safety net against future refactors to the regex logic. All tests pass successfully across the suite.
PR created automatically by Jules for task 4092336835865912771 started by @wjohns989