Skip to content

🧪 [testing improvement] Add tests for extract_entities_rule_based#111

Open
wjohns989 wants to merge 1 commit into
mainfrom
test/extract-entities-rule-based-4092336835865912771
Open

🧪 [testing improvement] Add tests for extract_entities_rule_based#111
wjohns989 wants to merge 1 commit into
mainfrom
test/extract-entities-rule-based-4092336835865912771

Conversation

@wjohns989

Copy link
Copy Markdown
Owner

🎯 What: The testing gap addressed
The extract_entities_rule_based function lacked test coverage. This PR adds tests specifically targeting the deterministic string processing behavior of the function, prioritizing the email extraction logic visible in the snippet context.

📊 Coverage: What scenarios are now tested

  • Added test_extracts_emails to ensure valid email addresses are correctly identified and extracted as person entities.
  • Added test_no_extraction_when_no_match to verify that text containing no valid email patterns gracefully returns no entity extraction.

Result: The improvement in test coverage
The rule-based extraction system is now explicitly verified via unit tests, improving overall robustness and providing a safety net against future refactors to the regex logic. All tests pass successfully across the suite.


PR created automatically by Jules for task 4092336835865912771 started by @wjohns989

Add new unit tests `TestExtractEntitiesRuleBased` to cover the previously
untested `extract_entities_rule_based` function. The tests verify positive
and negative paths for email extraction logic.

Co-authored-by: wjohns989 <56205870+wjohns989@users.noreply.github.com>
@google-labs-jules

Copy link
Copy Markdown

👋 Jules, reporting for duty! I'm here to lend a hand with this pull request.

When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down.

I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job!

For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with @jules. You can find this option in the Pull Request section of your global Jules UI settings. You can always switch back!

New to Jules? Learn more at jules.google/docs.


For security, I will only act on instructions from the user who triggered this task.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces unit tests for the rule-based entity extraction logic, specifically covering email extraction and negative test scenarios. The reviewer suggested improving test precision by using exact equality assertions for entity counts and verifying that the entire entity list is empty in negative cases to ensure no unintended matches occur.

assert "support@company.co.uk" in entity_names

person_entities = [e for e in entities if e.entity_type == "person"]
assert len(person_entities) >= 2

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The assertion assert len(person_entities) >= 2 is imprecise for a deterministic rule-based extractor. Since the input text contains exactly two email addresses and no other matching patterns, using == 2 ensures that the extraction logic is performing exactly as expected without unintended matches.

Suggested change
assert len(person_entities) >= 2
assert len(person_entities) == 2

Comment on lines +96 to +97
person_entities = [e for e in entities if e.entity_type == "person"]
assert len(person_entities) == 0

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

In a negative test case where no matches are expected, it is more robust to assert that the entire entities list is empty. This ensures that no other entity types (such as 'tech' or 'file') were incorrectly extracted from the plain text.

Suggested change
person_entities = [e for e in entities if e.entity_type == "person"]
assert len(person_entities) == 0
assert len(entities) == 0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant