Longest transcript #100
I asked claude-code to review this. Here is the response:
Code Review: PR #100 - Longest Transcript
Overview
This PR introduces functionality to filter FASTA files for the longest transcript per gene, adds a plugin to update ortholog group member counts, and includes several bug fixes.
Critical Issues
Location: Load/plugin/perl/AddNumberOfMembers.pm:80
    og.core_peripheral = 'peripheral'
This line starts with a SQL fragment but has no context. The code appears to be truncated or to contain a copy-paste error. The getUnfinishedOrthologGroups() method is called but never defined.
Impact: This plugin will fail to compile and run.
Location: Load/plugin/perl/AddNumberOfMembers.pm:146-148
    my $sql = "UPDATE apidb.OrthologGroup SET number_of_members = -1";
Three my $sql declarations overwrite each other. Only the last one will execute during undo.
Recommendation: Execute all three statements.
Code Quality Issues
Location: Load/bin/filterForLongestTranscript:33
The file move operation could fail partway through, leaving the directory in an inconsistent state.
Recommendation: Consider creating backups before overwriting or implementing atomic operations.
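The backup-and-atomic-replace recommendation above could look roughly like the following. This is a minimal sketch under stated assumptions, not the PR's code: `replace_file_atomically` is a hypothetical helper name, the `.bak` suffix is an arbitrary choice, and the filtered FASTA content is assumed to fit in a string.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Copy qw(copy);
use File::Temp qw(tempfile);

# Hypothetical helper (not part of the PR): write the new content to a
# temporary file next to the target, back up the original, then rename the
# temporary file over the target.
sub replace_file_atomically {
    my ($target, $content) = @_;

    # Create the temp file in the target's directory so that rename()
    # never has to cross a filesystem boundary.
    my ($fh, $tmp) = tempfile('filterXXXXXX', DIR => dirname($target), UNLINK => 0);
    print {$fh} $content or die "Cannot write $tmp: $!";
    close($fh) or die "Cannot close $tmp: $!";

    # Keep a copy of the original so a failed run can be rolled back by hand.
    if (-e $target) {
        copy($target, "$target.bak") or die "Cannot back up $target: $!";
    }

    # On POSIX filesystems, rename() within one filesystem replaces the
    # target in a single step, so readers see the old or the new file.
    rename($tmp, $target) or die "Cannot replace $target with $tmp: $!";
    return;
}
```

If the write or the backup fails, the original file is left untouched. Note that File::Temp creates its files with owner-only permissions, so a chmod on the temporary file may be needed before the rename if the original mode matters.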
Location: Load/bin/filterForLongestTranscript:72
    my ($gene_id) = $header =~ /gene=([^\s]+)/;
The script assumes all FASTA headers contain gene=. The comment acknowledges this, but the assumption is fragile: if the header format changes, every file will be silently skipped with only warnings.
Recommendation: Make the regex pattern configurable via a command-line argument, with gene= as the default.
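A configurable header pattern, as recommended above, might be wired in along these lines. This is only a sketch: the `--gene_pattern` option name and the helpers `parse_gene_pattern` and `extract_gene_id` are assumptions made for illustration and do not appear in the PR.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical option (--gene_pattern). The default reproduces the gene=...
# convention quoted in the review. The pattern is expected to contain one
# capture group that yields the gene ID.
sub parse_gene_pattern {
    my (@args) = @_;
    my $pattern = 'gene=(\S+)';
    GetOptionsFromArray(\@args, 'gene_pattern=s' => \$pattern)
        or die "Usage: filterForLongestTranscript [--gene_pattern REGEX]\n";

    # Compile once and fail early on an invalid pattern.
    my $regex = eval { qr/$pattern/ };
    die "Invalid --gene_pattern '$pattern': $@" unless defined $regex;
    return $regex;
}

# Returns the gene ID captured from a FASTA header, or undef if the
# header does not match the pattern.
sub extract_gene_id {
    my ($header, $regex) = @_;
    my ($gene_id) = $header =~ $regex;
    return $gene_id;
}
```

In the script itself this would be called as `parse_gene_pattern(@ARGV)`. Compiling the pattern up front means a bad pattern aborts the run immediately instead of producing a warning per header.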
Location: Load/bin/filterForLongestTranscript:42-56
The entire FASTA file is loaded into memory via the %genes hash. For very large files with many genes, this could consume significant memory.
Recommendation: Document memory requirements or consider streaming approaches for very large datasets.
Improvements in InsertPhylogeneticProfile.pm ✅
Location: Load/plugin/perl/InsertPhylogeneticProfile.pm:94-100
The change from parsing protein IDs with regex to using database lookups is a significant improvement:
OLD:
    my ($taxonCode, $sourceId) = split(/|/, $member);
NEW: Preload mapping from database
    my $taxonCode = $proteinIdToOrthomclAbbrev{$member};
Benefits:
The indentation fixes improve readability and follow Perl conventions.
Security Considerations
Location: Load/bin/filterForLongestTranscript:22-24
While using File::Spec->catfile() is good practice, nothing validates that the files in the directory are owned by the expected user or that they are not symlinks.
Risk Level: Low (assuming trusted environment)
Testing Gaps
The PR adds:
Recommendation: Add test cases for:
Style & Convention
Location: Load/plugin/perl/InsertPhylogeneticProfile.pm:82
    while (my ($orthomclAbbrev) = $sth->fetchrow_array()) {
The file mixes tabs and spaces. It should follow project conventions consistently.
The new plugin follows VEuPathDB conventions with a proper documentation hash structure, though the purpose and notes entries are empty placeholders.
Recommendation: Fill in the purpose and notes entries of the documentation hash.
Summary
Must Fix Before Merge ❌
Should Fix
Good Changes ✅
Recommendation: Request changes. The AddNumberOfMembers.pm plugin has critical errors that will prevent it from running. The other changes look good once these are fixed.
No description provided.