-
Notifications
You must be signed in to change notification settings - Fork 63
refactor: refactor baselines #138
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Summary of ChangesHello @ChenZiHong-Gavin, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces a significant refactoring effort across multiple baseline methods within the GraphGen project. The core objective is to enhance code maintainability and simplify interfaces by centralizing LLM client management, standardizing data loading mechanisms, and consolidating related functionalities into more cohesive structures. This refactoring reduces boilerplate, improves error handling, and sets the stage for more consistent development of future baseline implementations. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request is a significant refactoring of the baseline methods, which greatly improves code structure and maintainability, particularly in the EntiGraph baseline. The introduction of a class-based approach and the simplification of interfaces are excellent changes. However, the goal of unifying data handling logic seems incomplete, as EntiGraph uses a robust data loader while other baselines use a simpler, inconsistent method. There is also a critical bug in wrap.py where it fails to create the output directory before writing results. I've left specific comments on these issues and other areas for improvement.
| with open(args.input_file, "r", encoding="utf-8") as f: | ||
| data = json.load(f) | ||
|
|
||
| results = wrap.generate(data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This script is missing the logic to create the output directory before saving the results. Unlike the other refactored baseline scripts, this one will raise a FileNotFoundError if the output directory (e.g., cache/data/) does not exist. This is a critical bug. Please add the directory creation logic before writing to the file, like so:
# Save results
output_dir = os.path.dirname(args.output_file)
if output_dir:
os.makedirs(output_dir, exist_ok=True)
with open(args.output_file, "w", encoding="utf-8") as f:
json.dump(results, f, indent=4, ensure_ascii=False)| with open(args.input_file, "r", encoding="utf-8") as f: | ||
| data = json.load(f) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The data loading logic here is a simple json.load(f), which is inconsistent with the more robust load_and_dedup_data function in baselines/EntiGraph/entigraph.py. To fulfill the PR's goal of unifying data handling, this logic should be extracted into a shared utility and used across all baseline scripts.
| with open(args.input_file, "r", encoding="utf-8") as f: | ||
| data = json.load(f) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The data loading logic here is a simple json.load(f), which is inconsistent with the more robust load_and_dedup_data function in baselines/EntiGraph/entigraph.py. For consistency and to meet the PR's goal of unifying data handling, a shared, robust data loading utility should be used across all baselines.
| with open(args.input_file, "r", encoding="utf-8") as f: | ||
| data = json.load(f) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This script uses a simple json.load(f) for data loading, which is inconsistent with the more robust load_and_dedup_data function implemented in baselines/EntiGraph/entigraph.py. To truly unify data handling as stated in the PR description, this logic should be centralized in a shared utility and used here as well.
| with open(args.input_file, "r", encoding="utf-8") as f: | ||
| data = json.load(f) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| for i, entity_i in enumerate(entities): | ||
| for j in range(i + 1, len(entities)): | ||
| temp_pairs.append((doc_text, entity_i, entities[j])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
baselines/EntiGraph/entigraph.py
Outdated
| if output_dir and not os.path.exists(output_dir): | ||
| os.makedirs(output_dir) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
This pull request refactors multiple baseline methods in the GraphGen project to simplify their interfaces, unify data handling logic, and improve code maintainability.