[Feature] Integrated seroski-dupbot for duplicate detection in github issues #143
base: main
Pull Request Overview
This PR introduces a comprehensive duplicate issue detection system called "seroski-dupbot" that automatically identifies and manages duplicate GitHub issues using machine learning embeddings and similarity scoring. The system provides three-tier behavior based on similarity scores: unique issues (< 0.55), potentially related issues (0.55-0.84), and clear duplicates (≥ 0.85) which are auto-closed.
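The three-tier behavior described above can be sketched as a small classifier. This is only an illustration: the function name `classifySimilarity`, the constant names, and the tier labels are hypothetical and not taken from the PR's `check-duplicates.js`; only the two thresholds (0.55 and 0.85) come from the overview. Scores between 0.84 and 0.85 are assumed to fall in the "related" tier.

```javascript
// Thresholds from the PR overview; constant names are illustrative assumptions.
const RELATED_THRESHOLD = 0.55;
const DUPLICATE_THRESHOLD = 0.85;

/**
 * Map a similarity score to one of three tiers:
 * 'unique' (< 0.55), 'related' (0.55 up to but not including 0.85),
 * or 'duplicate' (>= 0.85).
 */
function classifySimilarity(score) {
  if (typeof score !== 'number' || Number.isNaN(score)) {
    throw new TypeError('score must be a number');
  }
  if (score >= DUPLICATE_THRESHOLD) return 'duplicate';
  if (score >= RELATED_THRESHOLD) return 'related';
  return 'unique';
}
```

A caller would then branch on the returned tier, for example auto-closing only when the result is `'duplicate'`.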
Key Changes:
- Automated duplicate detection workflow that triggers on issue creation, editing, and closure
- Vector database integration using Pinecone for storing and querying issue embeddings
- Database management utilities for population, cleanup, and validation operations
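For readers unfamiliar with embedding similarity, the sketch below shows one common way to score two embedding vectors. The PR text does not say which metric its Pinecone index uses, so cosine similarity is an assumption here, and `cosineSimilarity` is a hypothetical helper rather than code from the PR. In practice the vector database computes this score server-side.

```javascript
/**
 * Cosine similarity between two equal-length numeric vectors.
 * Returns a value in [-1, 1]; returns 0 if either vector has zero length
 * (zero norm), to avoid dividing by zero.
 */
function cosineSimilarity(a, b) {
  if (a.length !== b.length || a.length === 0) {
    throw new RangeError('vectors must be non-empty and of equal length');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i += 1) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

Two embeddings pointing in the same direction score close to 1, while unrelated (orthogonal) embeddings score close to 0.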
Reviewed Changes
Copilot reviewed 12 out of 14 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| package.json | Defines Node.js dependencies for GitHub API, vector database, and embedding services |
| .github/workflows/duplicate-issue.yml | Main workflow for duplicate detection with cleanup jobs for closed issues |
| .github/workflows/database-operations.yml | Administrative workflow for database management operations |
| .github/workflows/api-validation.yml | Validation workflow to test API connectivity before operations |
| .github/scripts/validate-apis.js | API validation script testing connections to all required services |
| .github/scripts/populate-existing-issues.js | Script to populate vector database with existing repository issues |
| .github/scripts/debug-pinecone.js | Debugging utility for inspecting Pinecone database state |
| .github/scripts/clear-all-vectors.js | Destructive operation script to clear all vectors from database |
| .github/scripts/cleanup-specific-issue.js | Utility to remove specific issue vectors from database |
| .github/scripts/cleanup-duplicates.js | Script to clean up duplicate vectors in the database |
| .github/scripts/cleanup-closed-issue.js | Automated cleanup script for removing closed issue vectors |
| .github/scripts/check-duplicates.js | Core duplicate detection logic with three-tier similarity analysis |
Reviewed code:

    steps:
      - name: Checkout repository
        uses: actions/checkout@v3

Copilot (AI), Oct 3, 2025:
Using actions/checkout@v3 is deprecated. Consider upgrading to actions/checkout@v4 for better performance and security updates.
Reviewed code:

        uses: actions/checkout@v3

      - name: Setup Node.js
        uses: actions/setup-node@v3

Copilot (AI), Oct 3, 2025:
Using actions/setup-node@v3 is deprecated. Consider upgrading to actions/setup-node@v4 for better performance and security updates.
Reviewed code:

        uses: actions/checkout@v3

      - name: Setup Node.js
        uses: actions/setup-node@v3

Copilot (AI), Oct 3, 2025:
Using actions/setup-node@v3 is deprecated. Consider upgrading to actions/setup-node@v4 for better performance and security updates.
Suggested change:

    - uses: actions/setup-node@v3
    + uses: actions/setup-node@v4

Reviewed code:

    steps:
      - name: Checkout repository
        uses: actions/checkout@v3

Copilot (AI), Oct 3, 2025:
Using actions/checkout@v3 is deprecated. Consider upgrading to actions/checkout@v4 for better performance and security updates.
Suggested change:

    - uses: actions/checkout@v3
    + uses: actions/checkout@v4

Reviewed code:

        uses: actions/checkout@v3

      - name: Setup Node.js
        uses: actions/setup-node@v3

Copilot (AI), Oct 3, 2025:
Using actions/setup-node@v3 is deprecated. Consider upgrading to actions/setup-node@v4 for better performance and security updates.
Suggested change:

    - uses: actions/setup-node@v3
    + uses: actions/setup-node@v4

Reviewed code in .github/workflows/api-validation.yml (outdated):

    steps:
      - name: Checkout repository
        uses: actions/checkout@v3

Copilot (AI), Oct 3, 2025:
Using actions/checkout@v3 is deprecated. Consider upgrading to actions/checkout@v4 for better performance and security updates.
Suggested change:

    - uses: actions/checkout@v3
    + uses: actions/checkout@v4

Reviewed code in .github/workflows/api-validation.yml (outdated):

        uses: actions/checkout@v3

      - name: Setup Node.js
        uses: actions/setup-node@v3

Copilot (AI), Oct 3, 2025:
Using actions/setup-node@v3 is deprecated. Consider upgrading to actions/setup-node@v4 for better performance and security updates.
Suggested change:

    - uses: actions/setup-node@v3
    + uses: actions/setup-node@v4

Reviewed code:

      repo: REPO,
      issue_number: ISSUE_NUMBER,
      state: 'closed',
      state_reason: 'duplicate'

Copilot (AI), Oct 3, 2025:
The state_reason 'duplicate' may not be supported by the GitHub API. The valid state_reason values are typically 'completed' or 'not_planned'. Consider using 'not_planned' for duplicate issues.
Suggested change:

    - state_reason: 'duplicate'
    + state_reason: 'not_planned'
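One way to make the close reason robust to the concern raised in this comment is to fall back to `'not_planned'` when the requested reason is not in a list the maintainers have verified. The sketch below is hypothetical: `buildClosePayload`, its parameter names, and the `supportedReasons` argument are illustrative and not part of the PR. The `owner` field is assumed, since the reviewed snippet only shows `repo`, `issue_number`, `state`, and `state_reason`. Which `state_reason` values the GitHub API accepts should be checked against the current REST documentation rather than taken from this example.

```javascript
/**
 * Build the parameters for closing an issue, falling back to 'not_planned'
 * when the requested reason is not in the caller-supplied supported list.
 * Hypothetical helper; the supported list is an assumption to be verified.
 */
function buildClosePayload({ owner, repo, issueNumber, stateReason, supportedReasons }) {
  const reason = supportedReasons.includes(stateReason) ? stateReason : 'not_planned';
  return {
    owner,
    repo,
    issue_number: issueNumber,
    state: 'closed',
    state_reason: reason,
  };
}

// Example: 'duplicate' is not in the verified list, so the fallback is used.
const payload = buildClosePayload({
  owner: 'example-owner', // placeholder values for illustration
  repo: 'example-repo',
  issueNumber: 42,
  stateReason: 'duplicate',
  supportedReasons: ['completed', 'not_planned'],
});
```

The resulting object has the same shape as the fields in the reviewed snippet and could be passed to the issue-update call the script already makes.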
@yep-yogesh, please review this.
📌 Problem
Repositories often get cluttered with duplicate issues, making it difficult for maintainers to manage and causing wasted effort for contributors. Manual duplicate detection is slow, inconsistent, and error-prone.
💡 Solution
This PR integrates seroski-dupbot, a bot that automatically detects duplicate issues using embeddings and similarity scoring.
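Behavior based on similarity score:
- Below 0.55: the issue is treated as unique.
- 0.55-0.84: the issue is treated as potentially related to an existing issue.
- 0.85 and above: the bot treats the issue as a duplicate, and auto-closes it.
🧩 Benefits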
Impact: Provides a scalable, automated solution to handle duplicate issues across repositories, ensuring cleaner and more manageable issue tracking.
Closes #140.