@ChenZiHong-Gavin (Collaborator) commented Dec 24, 2025

This pull request enhances the flexibility of output management within the graph processing engine. By introducing an explicit save_output configuration option for each node, the system no longer relies on the topological position (i.e., being a leaf node) to decide which intermediate results are preserved. This change provides developers with more precise control over data persistence, allowing them to specify exactly which node outputs are relevant for downstream use or inspection, thereby streamlining data flow and potentially optimizing resource usage.
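To make the difference concrete, here is a minimal sketch contrasting the old implicit rule (keep leaf-node outputs) with the new explicit rule (keep outputs flagged with save_output). Only the save_output key comes from this PR; the dict layout, the id and dependencies keys, and both helper functions are assumptions made for illustration, not the project's actual code.

```python
# Hypothetical node specs: only the `save_output` key comes from this PR;
# the `id` and `dependencies` keys are assumed for illustration.
nodes = [
    {"id": "read", "dependencies": [], "save_output": False},
    {"id": "chunk", "dependencies": ["read"], "save_output": True},
    {"id": "generate", "dependencies": ["chunk"], "save_output": True},
]


def leaf_node_ids(nodes):
    """Old implicit rule: keep outputs of nodes that nothing depends on."""
    depended_on = {dep for node in nodes for dep in node["dependencies"]}
    return [node["id"] for node in nodes if node["id"] not in depended_on]


def saved_node_ids(nodes):
    """New explicit rule: keep outputs of nodes flagged with save_output."""
    return [node["id"] for node in nodes if node.get("save_output", False)]


print(leaf_node_ids(nodes))   # ['generate']
print(saved_node_ids(nodes))  # ['chunk', 'generate']
```

With the flag, the intermediate chunk output can be kept alongside the final one, which the leaf-based rule could not express.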

@gemini-code-assist (Contributor)

Summary of Changes

Hello @ChenZiHong-Gavin, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the flexibility of output management within the graph processing engine. By introducing an explicit save_output configuration option for each node, the system no longer relies on the topological position (i.e., being a leaf node) to decide which intermediate results are preserved. This change provides developers with more precise control over data persistence, allowing them to specify exactly which node outputs are relevant for downstream use or inspection, thereby streamlining data flow and potentially optimizing resource usage.

Highlights

  • Explicit Output Saving: Introduced a save_output boolean field in the Node class to explicitly control whether a node's output should be saved, replacing the previous implicit mechanism based on leaf nodes.
  • Configuration Updates: Updated several example YAML configuration files to set save_output: true for their respective nodes, demonstrating the new configuration approach.
  • Engine Logic Refinement: Modified the Engine's execution logic to leverage the new save_output flag for determining which datasets to return, and removed the _find_leaf_nodes utility method.
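
The highlights above can be sketched roughly as follows. This is a simplified stand-in, not the project's implementation: it uses plain Python values instead of Ray datasets, and every name except save_output (the Node fields id, op, and dependencies, the execute function, and the assumed default of False) is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Node:
    # `save_output` mirrors the field added in this PR; every other field
    # name here is an assumption made for the sketch.
    id: str
    op: Callable[[List[Any]], Any]
    dependencies: List[str] = field(default_factory=list)
    save_output: bool = False  # default value is assumed, not confirmed


def execute(nodes: List[Node]) -> Dict[str, Any]:
    """Run nodes in the given (already topologically sorted) order and
    return only the outputs whose node sets save_output=True."""
    datasets: Dict[str, Any] = {}
    for node in nodes:
        inputs = [datasets[dep] for dep in node.dependencies]
        datasets[node.id] = node.op(inputs)
    return {n.id: datasets[n.id] for n in nodes if n.save_output}


pipeline = [
    Node("read", lambda _: [1, 2, 3]),
    Node("double", lambda ins: [x * 2 for x in ins[0]], ["read"], save_output=True),
    Node("total", lambda ins: sum(ins[0]), ["double"]),
]
result = execute(pipeline)
print(result)  # {'double': [2, 4, 6]}
```

Note that total is a leaf node here but is not returned, because the flag rather than the graph position decides what is kept.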

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request introduces a save_output configuration flag for nodes in the computation graph. This provides an explicit way to specify which node outputs should be saved, replacing the previous implicit behavior of saving outputs only for leaf nodes. The changes include updating the Node data model, modifying the execution engine to use this new flag, and updating all example configurations to be compatible with the new approach. The changes are well-implemented and make the system's behavior more predictable and configurable. I have one minor suggestion regarding import ordering for better code style.

Comment on lines 8 to 11
import ray
import ray.data
from dotenv import load_dotenv
from ray.data import DataContext

medium

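PEP 8 asks for imports to be grouped (standard library, third party, local) but does not itself require alphabetical order. Sorting imports alphabetically within a group is a common convention, typically enforced by tools such as isort, that improves readability and makes imports easier to find.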

Suggested change
- import ray
- import ray.data
- from dotenv import load_dotenv
- from ray.data import DataContext
+ from dotenv import load_dotenv
+ import ray
+ import ray.data
+ from ray.data import DataContext
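
For reference, the ordering in the suggestion is consistent with sorting by module name and breaking ties by placing a plain import before a from-import. The sketch below is an assumption about the rule the reviewer had in mind, not something stated in the review; the helper import_sort_key is hypothetical, and formatters may be configured to order these lines differently.

```python
def import_sort_key(line: str):
    """Sort key: module name first, then plain `import` before `from`."""
    keyword, module = line.split()[:2]
    return (module, keyword == "from")


original = [
    "import ray",
    "import ray.data",
    "from dotenv import load_dotenv",
    "from ray.data import DataContext",
]
suggested = sorted(original, key=import_sort_key)
print("\n".join(suggested))
```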

@ChenZiHong-Gavin merged commit 60f4d1b into main Dec 24, 2025
4 checks passed
@ChenZiHong-Gavin deleted the save-output-config branch December 24, 2025 10:51