Flytekit: Rename map_task to map, replace min_successes and min_success_ratio with tolerance, rename max_parallelism to concurrency #3107

Open: wants to merge 9 commits into base: master

@ChihTsungLu commented Feb 4, 2025

Tracking issue

Related to flyteorg/flyte#6139

Why are the changes needed?

Flytekit's current API has several rough edges that affect the developer experience:

  1. The name map_task is unnecessarily verbose when Flytekit is imported via the recommended import flytekit as fl (callers write fl.map_task)
  2. The failure tolerance parameters (min_successes and min_success_ratio) are powerful but overly verbose
  3. The max_parallelism parameter of workflow and LaunchPlan should be renamed to match map_task's concurrency parameter

What changes were proposed in this pull request?

  1. Rename map_task to map

    • Although the name clashes with Python's built-in map, this is acceptable because we recommend import flytekit as fl, under which the function is called as fl.map and the built-in is left untouched
    • All changes will maintain backwards compatibility
  2. Simplify failure tolerance parameters

    • Deprecate min_successes and min_success_ratio
    • Introduce a new tolerance parameter that accepts either an int or a float
    • Maintain backwards compatibility with existing parameters
  3. Standardize parallelism parameter

    • Deprecate the max_parallelism argument of workflow and LaunchPlan
    • Introduce a new concurrency parameter that matches map_task's parameter of the same name
    • Maintain backwards compatibility with existing parameter
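A minimal sketch of how a single tolerance argument could be translated into the two deprecated parameters. The description above only says that tolerance accepts an int or a float; this sketch assumes an int is a minimum success count and a float in [0, 1] is a minimum success ratio, mirroring the parameters being replaced. resolve_tolerance is a hypothetical helper, not Flytekit API.

```python
import warnings
from typing import Optional, Tuple, Union


def resolve_tolerance(
    tolerance: Optional[Union[int, float]] = None,
    min_successes: Optional[int] = None,
    min_success_ratio: Optional[float] = None,
) -> Tuple[Optional[int], Optional[float]]:
    """Hypothetical helper: translate `tolerance` into the legacy
    (min_successes, min_success_ratio) pair.

    Assumption: an int means a minimum success count and a float means a
    minimum success ratio, mirroring the parameters it replaces.
    """
    if min_successes is not None or min_success_ratio is not None:
        warnings.warn(
            "min_successes and min_success_ratio are deprecated; use tolerance",
            DeprecationWarning,
            stacklevel=2,
        )
        if tolerance is not None:
            raise ValueError("pass either tolerance or the deprecated parameters, not both")
        return min_successes, min_success_ratio
    if tolerance is None:
        return None, None
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(tolerance, bool):
        raise TypeError("tolerance must be an int or a float, not bool")
    if isinstance(tolerance, int):
        if tolerance < 0:
            raise ValueError("an int tolerance must be non-negative")
        return tolerance, None
    if isinstance(tolerance, float):
        if not 0.0 <= tolerance <= 1.0:
            raise ValueError("a float tolerance must be in [0, 1]")
        return None, tolerance
    raise TypeError(f"unsupported tolerance type: {type(tolerance).__name__}")


print(resolve_tolerance(3))    # (3, None)
print(resolve_tolerance(0.8))  # (None, 0.8)
```

Dispatching on the Python type keeps one argument for both cases, at the cost that 1 and 1.0 mean different things; that trade-off is inherent in accepting both int and float.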

Known issue

These changes introduce a concurrency field in Flytekit that is not yet defined in flyteidl's LaunchPlanSpec, which currently results in a ValueError (see screenshot):

<img width="1561" alt="valueError" src="https://github.com/user-attachments/assets/e794e7d0-6393-4009-a320-988fdd1769cb" />

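Code to address the issue:
The following from_flyte_idl implementation handles the transition between the two fields: it reads concurrency when the installed flyteidl defines it and otherwise falls back to the deprecated max_parallelism field.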

    @classmethod
    def from_flyte_idl(cls, pb2):
        """
        :param flyteidl.admin.launch_plan_pb2.LaunchPlanSpec pb2:
        :rtype: LaunchPlanSpec
        """

        auth_role = None
        # First check the newer field, auth_role.
        if pb2.auth_role is not None and (pb2.auth_role.assumable_iam_role or pb2.auth_role.kubernetes_service_account):
            auth_role = _common.AuthRole.from_flyte_idl(pb2.auth_role)
        # Fallback to the deprecated field.
        elif pb2.auth is not None:
            if pb2.auth.assumable_iam_role:
                auth_role = _common.AuthRole(assumable_iam_role=pb2.auth.assumable_iam_role)
            else:
                auth_role = _common.AuthRole(kubernetes_service_account=pb2.auth.kubernetes_service_account)

        # Handle the concurrency/max_parallelism transition.
        # getattr() with a default avoids errors when the installed flyteidl's
        # LaunchPlanSpec does not define `concurrency` yet. HasField() is not used
        # because it raises ValueError for non-optional scalar fields, which would
        # skip `concurrency` even after flyteidl adds it. Assuming both fields are
        # proto3 integer scalars, an unset value reads as 0, so fall back to the
        # deprecated max_parallelism in that case.
        concurrency = getattr(pb2, "concurrency", 0)
        final_concurrency = concurrency if concurrency else pb2.max_parallelism

        return cls(
            workflow_id=_identifier.Identifier.from_flyte_idl(pb2.workflow_id),
            entity_metadata=LaunchPlanMetadata.from_flyte_idl(pb2.entity_metadata),
            default_inputs=_interface.ParameterMap.from_flyte_idl(pb2.default_inputs),
            fixed_inputs=_literals.LiteralMap.from_flyte_idl(pb2.fixed_inputs),
            labels=_common.Labels.from_flyte_idl(pb2.labels),
            annotations=_common.Annotations.from_flyte_idl(pb2.annotations),
            auth_role=auth_role,
            raw_output_data_config=_common.RawOutputDataConfig.from_flyte_idl(pb2.raw_output_data_config),
            concurrency=final_concurrency,
            max_parallelism=pb2.max_parallelism,
            security_context=security.SecurityContext.from_flyte_idl(pb2.security_context)
            if pb2.security_context
            else None,
            overwrite_cache=pb2.overwrite_cache if pb2.overwrite_cache else None,
        )
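The fallback rule (prefer concurrency, otherwise use max_parallelism) can be checked without flyteidl installed. This is a simplified model, not Flytekit code: SimpleNamespace objects stand in for LaunchPlanSpec messages, a missing attribute models an older flyteidl that lacks the field, and 0 is treated as unset, as it would be for a proto3 integer scalar (an assumption about the eventual field type).

```python
from types import SimpleNamespace


def pick_concurrency(spec) -> int:
    """Prefer `concurrency`; fall back to the deprecated `max_parallelism`.

    getattr() with a default handles specs that have no `concurrency`
    attribute at all; a value of 0 is treated as "not set".
    """
    concurrency = getattr(spec, "concurrency", 0)
    return concurrency if concurrency else spec.max_parallelism


old_spec = SimpleNamespace(max_parallelism=25)                   # field not defined yet
new_spec = SimpleNamespace(concurrency=10, max_parallelism=25)   # new field set
unset_spec = SimpleNamespace(concurrency=0, max_parallelism=25)  # new field left at default

print(pick_concurrency(old_spec))    # 25
print(pick_concurrency(new_spec))    # 10
print(pick_concurrency(unset_spec))  # 25
```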

How was this patch tested?

Ran tests with the command: make test

Check all the applicable boxes

  • I updated the documentation accordingly.
  • All new and existing tests passed.
  • All commits are signed-off.

Summary by Bito

This PR implements comprehensive API improvements in Flytekit, including renaming map_task to map, introducing tolerance parameters, and standardizing parallelism control. It enhances agent module support in CLI, fixes output formatting in AWS SageMaker and OpenAI plugins, expands test coverage, and improves Ray plugin configuration. The changes maintain backward compatibility through deprecation warnings while providing a cleaner, more consistent API with better error handling and documentation.

Unit tests added: True

Estimated effort to review (1-5, lower is better): 5

@flyte-bot (Contributor) commented Feb 4, 2025

Code Review Agent Run #d47fe6

Actionable Suggestions - 13
  • tests/flytekit/unit/types/directory/test_listdir.py - 2
    • Consider implications of map vs map_task · Line 4-4
    • Consider using map_task for workflow operations · Line 29-29
  • plugins/flytekit-papermill/tests/test_task.py - 1
    • Consider using map_task for notebook tasks · Line 417-417
  • flytekit/__init__.py - 1
    • Consider maintaining backward compatibility for imports · Line 222-222
  • flytekit/core/array_node_map_task.py - 1
    • Consider keeping descriptive function name · Line 373-373
  • tests/flytekit/unit/core/test_array_node_map_task.py - 8
Additional Suggestions - 10
  • flytekit/core/options.py - 3
    • Consider adding concurrency parameter validation · Line 26-27
    • Consider adding validation for concurrency parameter · Line 38-38
    • Consider using @deprecated decorator instead · Line 43-66
  • tests/flytekit/unit/core/test_array_node_map_task.py - 2
  • tests/flytekit/integration/remote/workflows/basic/array_map.py - 1
    • Consider potential naming confusion with map · Line 4-4
  • tests/flytekit/unit/core/test_array_node.py - 1
  • flytekit/models/launch_plan.py - 2
    • Consider validating concurrency value before use · Line 277-277
    • Consider simplifying concurrency handling logic · Line 301-318
  • flytekit/tools/translator.py - 1
    • Consider consolidating duplicate warning logic · Line 355-382
Review Details
  • Files reviewed - 24 · Commit Range: 87dfe2f..d8e5d4b
    • flytekit/__init__.py
    • flytekit/clis/sdk_in_container/run.py
    • flytekit/core/array_node_map_task.py
    • flytekit/core/launch_plan.py
    • flytekit/core/options.py
    • flytekit/models/execution.py
    • flytekit/models/launch_plan.py
    • flytekit/remote/entities.py
    • flytekit/remote/remote.py
    • flytekit/tools/translator.py
    • plugins/flytekit-k8s-pod/tests/test_pod.py
    • plugins/flytekit-papermill/tests/test_task.py
    • tests/flytekit/integration/remote/workflows/basic/array_map.py
    • tests/flytekit/integration/remote/workflows/basic/pydantic_wf.py
    • tests/flytekit/unit/core/test_array_node.py
    • tests/flytekit/unit/core/test_array_node_map_task.py
    • tests/flytekit/unit/core/test_artifacts.py
    • tests/flytekit/unit/core/test_interface.py
    • tests/flytekit/unit/core/test_launch_plan.py
    • tests/flytekit/unit/core/test_node_creation.py
    • tests/flytekit/unit/core/test_partials.py
    • tests/flytekit/unit/core/test_type_hints.py
    • tests/flytekit/unit/remote/test_remote.py
    • tests/flytekit/unit/types/directory/test_listdir.py
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful

AI Code Review powered by Bito

@flyte-bot (Contributor) commented Feb 4, 2025

Changelist by Bito

This pull request implements the following key changes.

Key Change: Feature Improvement - API Naming and Parameter Standardization

Files Impacted:
  • test_artifacts.py - Updated imports and usage from map_task to map
  • test_interface.py - Updated test cases to use map instead of map_task
  • test_launch_plan.py - Replaced max_parallelism with concurrency in launch plan tests
  • test_node_creation.py - Updated map_task references to map in node creation tests
  • test_partials.py - Updated import statement to use map instead of map_task
  • test_type_hints.py - Modified imports and usage to use map instead of map_task
  • test_remote.py - Updated imports and usage of map_task to map in remote tests
  • test_listdir.py - Updated directory tests to use map instead of map_task

@@ -1,7 +1,7 @@
 import tempfile
 from pathlib import Path

-from flytekit import FlyteDirectory, FlyteFile, map_task, task, workflow
+from flytekit import FlyteDirectory, FlyteFile, map, task, workflow

Consider implications of map vs map_task

Consider if replacing map_task with map is intentional as they might have different functionality in the Flyte framework. map_task is typically used for task parallelization while map might have different semantics.

Suggested change:
-from flytekit import FlyteDirectory, FlyteFile, map, task, workflow
+from flytekit import FlyteDirectory, FlyteFile, map_task, task, workflow

Code Review Run #d47fe6

@@ -26,6 +26,6 @@ def list_dir(dir: FlyteDirectory) -> list[FlyteFile]:
 def wf() -> list[str]:
     tmpdir = setup()
     files = list_dir(dir=tmpdir)
-    return map_task(read_file)(file=files)
+    return map(read_file)(file=files)

Consider using map_task for workflow operations

Consider using map_task instead of map for task mapping operations in Flytekit workflows. The map function may not provide the same task-level parallelization and execution guarantees as map_task.

Suggested change:
-    return map(read_file)(file=files)
+    return map_task(read_file)(file=files)

Code Review Run #d47fe6

@@ -414,7 +414,7 @@ def create_sd() -> StructuredDataset:
 def test_map_over_notebook_task():
     @workflow
     def wf(a: float) -> typing.List[float]:
-        return map_task(nb_sub_task)(a=[a, a])
+        return map(nb_sub_task)(a=[a, a])

Consider using map_task for notebook tasks

Consider using map_task instead of map for mapping over notebook tasks. The map function may not handle notebook task specific requirements correctly.

Suggested change:
-        return map(nb_sub_task)(a=[a, a])
+        return map_task(nb_sub_task)(a=[a, a])

Code Review Run #d47fe6

 from flytekit._version import __version__
 from flytekit.configuration import Config
-from flytekit.core.array_node_map_task import map_task
+from flytekit.core.array_node_map_task import map

Consider maintaining backward compatibility for imports

Consider keeping both map_task and map imports to maintain backward compatibility. The alias is defined later but importing directly as map may break existing code that uses map_task.

Suggested change:
-from flytekit.core.array_node_map_task import map
+from flytekit.core.array_node_map_task import map_task

Code Review Run #d47fe6

@@ -369,11 +370,12 @@ def _raw_execute(self, **kwargs) -> Any:
         return outputs


-def map_task(
+def map(

Consider keeping descriptive function name

Consider keeping the original function name map_task instead of renaming to map as it could conflict with Python's built-in map function and cause confusion. The original name was more descriptive of the function's purpose.

Suggested change:
-def map(
+def map_task(

Code Review Run #d47fe6
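The PR argues that the clash with Python's built-in map is acceptable because users call the function through the module namespace. The sketch below illustrates that point with a stand-in module object (fl and _flyte_map are hypothetical placeholders, not Flytekit): fl.map and the built-in map coexist. By contrast, a direct from flytekit import map, as used in several of the updated tests, does shadow the built-in for the rest of that module.

```python
import types

# Hypothetical stand-in for `import flytekit as fl`: a module object whose
# `map` attribute is a placeholder, not the real Flytekit function.
fl = types.ModuleType("fl")


def _flyte_map(target, concurrency=None):
    # Placeholder body: report what would be mapped and with which limit.
    return ("mapped", target.__name__, concurrency)


fl.map = _flyte_map


def square(x: int) -> int:
    return x * x


# Qualified access keeps the two functions apart.
print(fl.map(square, concurrency=4))  # ('mapped', 'square', 4)
print(list(map(square, [1, 2, 3])))   # [1, 4, 9]
```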

@@ -63,7 +63,7 @@ def say_hello(name: str) -> str:

 @workflow
 def wf() -> List[str]:
-    return map_task(say_hello)(name=["abc", "def"])
+    return map(say_hello)(name=["abc", "def"])

Map task function call change

Consider if using map() instead of map_task() is intentional as it changes the behavior from using Flyte's map task functionality to Python's built-in map().

Suggested change:
-    return map(say_hello)(name=["abc", "def"])
+    return map_task(say_hello)(name=["abc", "def"])

Code Review Run #d47fe6

@@ -575,7 +575,7 @@ def say_hello(name: str) -> str:
 for index, map_input_str in enumerate(list_strs):
     monkeypatch.setenv("BATCH_JOB_ARRAY_INDEX_VAR_NAME", "name")
     monkeypatch.setenv("name", str(index))
-    t = map_task(say_hello)
+    t = map(say_hello)

Potential task mapping behavior change

Consider if using map() instead of map_task() is intentional as this could change the behavior of task mapping functionality.

Suggested change:
-    t = map(say_hello)
+    t = map_task(say_hello)

Code Review Run #d47fe6

@@ -410,7 +410,7 @@ def test_serialization_metadata(serialization_settings):
 def t1(a: int) -> int:
     return a + 1

-arraynode_maptask = map_task(t1, metadata=TaskMetadata(retries=2))
+arraynode_maptask = map(t1, metadata=TaskMetadata(retries=2))

Function rename may affect compatibility

Consider if changing from map_task to map could impact backward compatibility. The function name change from map_task to map may affect existing code that imports and uses the original function name.

Suggested change:
-arraynode_maptask = map(t1, metadata=TaskMetadata(retries=2))
+# Maintain both for backward compatibility
+arraynode_maptask = map_task(t1, metadata=TaskMetadata(retries=2))

Code Review Run #d47fe6

Comment on lines +229 to +230
t1 = map(say_hello, **kwargs1)
t2 = map(say_hello, **kwargs2)

Verify intended function call change

Consider if replacing map_task with map is intentional as this changes the function being called which could affect functionality. The map_task decorator appears to be imported but not used after this change.

Suggested change:
-t1 = map(say_hello, **kwargs1)
-t2 = map(say_hello, **kwargs2)
+t1 = map_task(say_hello, **kwargs1)
+t2 = map_task(say_hello, **kwargs2)

Code Review Run #d47fe6

@@ -316,7 +316,7 @@ def test_bounded_inputs_vars_order(serialization_settings):
 def task1(a: int, b: float, c: str) -> str:
     return f"{a} - {b} - {c}"

-mt = map_task(functools.partial(task1, c=1.0, b="hello", a=1))
+mt = map(functools.partial(task1, c=1.0, b="hello", a=1))

Consider using map_task instead of map

Consider using map_task() instead of map() as it appears to be the intended function based on the test context and imports. Using map() could lead to unexpected behavior since it's a built-in Python function.

Suggested change:
-mt = map(functools.partial(task1, c=1.0, b="hello", a=1))
+mt = map_task(functools.partial(task1, c=1.0, b="hello", a=1))

Code Review Run #d47fe6

@@ -492,7 +492,7 @@ def test_supported_node_type():
 def test_task():
     ...

-map_task(test_task)
+map(test_task)

Consider using map_task instead of map

The function call has been changed from map_task(test_task) to map(test_task). This could potentially cause confusion with Python's built-in map() function. Consider using the imported map_task decorator/function to maintain clarity and avoid potential naming conflicts.

Suggested change:
-map(test_task)
+map_task(test_task)

Code Review Run #d47fe6

@@ -533,7 +533,7 @@ def consume_directories(dirs: List[FlyteDirectory]):
     for path_info, other_info in d.crawl():
         print(path_info)

-mt = map_task(generate_directory, min_success_ratio=0.1)
+mt = map(generate_directory, min_success_ratio=0.1)

Verify map function usage intention

Consider if using map() instead of map_task() is intentional as it may change the expected behavior. The map_task() function is typically used for array node map tasks in Flytekit.

Suggested change:
-mt = map(generate_directory, min_success_ratio=0.1)
+mt = map_task(generate_directory, min_success_ratio=0.1)

Code Review Run #d47fe6
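To make the min_success_ratio=0.1 in this test concrete, here is a hypothetical check of whether a mapped run meets its failure tolerance. It assumes an explicit min_successes takes precedence, that a ratio is converted to a count by rounding up, and that with neither set every subtask must succeed; these are illustrative assumptions, not a description of how Flyte's backend evaluates the threshold.

```python
import math
from typing import Optional


def map_succeeded(
    num_subtasks: int,
    num_succeeded: int,
    min_successes: Optional[int] = None,
    min_success_ratio: Optional[float] = None,
) -> bool:
    """Hypothetical check of whether a mapped run meets its failure tolerance."""
    if min_successes is not None:
        required = min_successes
    elif min_success_ratio is not None:
        # round() guards against float artifacts such as
        # 0.7 * 10 == 7.000000000000001, which ceil() would turn into 8.
        required = math.ceil(round(min_success_ratio * num_subtasks, 9))
    else:
        required = num_subtasks
    return num_succeeded >= required


print(map_succeeded(10, 1, min_success_ratio=0.1))  # True
print(map_succeeded(10, 0, min_success_ratio=0.1))  # False
```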

@@ -575,7 +575,7 @@ def say_hello(name: str) -> str:
 for index, map_input_str in enumerate(list_strs):
     monkeypatch.setenv("BATCH_JOB_ARRAY_INDEX_VAR_NAME", "name")
     monkeypatch.setenv("name", str(index))
-    t = map_task(say_hello)
+    t = map(say_hello)

Consider using map_task instead of map

Consider using map_task instead of map as it appears to be the intended decorator based on the imports and test context. The map function could be confused with Python's built-in map function.

Suggested change:
-    t = map(say_hello)
+    t = map_task(say_hello)

Code Review Run #d47fe6

@flyte-bot (Contributor) commented Feb 5, 2025

Code Review Agent Run #99b31d

Actionable Suggestions - 8
  • tests/flytekit/unit/core/test_array_node_map_task.py - 4
  • flytekit/remote/remote.py - 1
  • tests/flytekit/unit/core/test_node_creation.py - 1
    • Consider using map_task for workflow testing · Line 276-276
  • tests/flytekit/unit/remote/test_remote.py - 1
  • flytekit/core/launch_plan.py - 1
Additional Suggestions - 10
  • flytekit/core/options.py - 4
    • Consider adding concurrency parameter validation · Line 26-27
    • Consider adding validation for concurrency parameter · Line 38-38
    • Consider using standard deprecation decorator pattern · Line 43-66
    • Consider consolidating duplicate warning message · Line 48-64
  • flytekit/models/execution.py - 1
    • Consider adding property setter for deprecation · Line 290-302
  • flytekit/clis/sdk_in_container/run.py - 1
    • Consider updating deprecated parameter name · Line 529-529
  • tests/flytekit/unit/core/test_node_creation.py - 1
  • tests/flytekit/unit/core/test_array_node_map_task.py - 3
Review Details
  • Files reviewed - 24 · Commit Range: 87dfe2f..09755a2
    • flytekit/__init__.py
    • flytekit/clis/sdk_in_container/run.py
    • flytekit/core/array_node_map_task.py
    • flytekit/core/launch_plan.py
    • flytekit/core/options.py
    • flytekit/models/execution.py
    • flytekit/models/launch_plan.py
    • flytekit/remote/entities.py
    • flytekit/remote/remote.py
    • flytekit/tools/translator.py
    • plugins/flytekit-k8s-pod/tests/test_pod.py
    • plugins/flytekit-papermill/tests/test_task.py
    • tests/flytekit/integration/remote/workflows/basic/array_map.py
    • tests/flytekit/integration/remote/workflows/basic/pydantic_wf.py
    • tests/flytekit/unit/core/test_array_node.py
    • tests/flytekit/unit/core/test_array_node_map_task.py
    • tests/flytekit/unit/core/test_artifacts.py
    • tests/flytekit/unit/core/test_interface.py
    • tests/flytekit/unit/core/test_launch_plan.py
    • tests/flytekit/unit/core/test_node_creation.py
    • tests/flytekit/unit/core/test_partials.py
    • tests/flytekit/unit/core/test_type_hints.py
    • tests/flytekit/unit/remote/test_remote.py
    • tests/flytekit/unit/types/directory/test_listdir.py
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful

@@ -315,7 +316,7 @@ def test_bounded_inputs_vars_order(serialization_settings):
 def task1(a: int, b: float, c: str) -> str:
     return f"{a} - {b} - {c}"

-mt = map_task(functools.partial(task1, c=1.0, b="hello", a=1))
+mt = map(functools.partial(task1, c=1.0, b="hello", a=1))

Parameter type mismatch in task call

The function call parameters c=1.0, b="hello", a=1 appear to have mismatched types with the task definition. The task expects a: int, b: float, c: str but receives c as float, b as string, and a as int. Consider adjusting the parameter types to match the task signature.

Suggested change:
-mt = map(functools.partial(task1, c=1.0, b="hello", a=1))
+mt = map(functools.partial(task1, c="1.0", b=1.0, a=1))

Code Review Run #99b31d
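Python does not enforce annotations when functools.partial binds arguments, so the mismatch the agent points out would not raise at bind time. A hypothetical helper like the one below (not part of Flytekit) makes such mismatches visible; it uses a strict isinstance check and only handles plain classes such as int, float and str, not generics like typing.List[int].

```python
import functools
from typing import get_type_hints


def task1(a: int, b: float, c: str) -> str:
    return f"{a} - {b} - {c}"


def mismatched_bindings(p: functools.partial) -> dict:
    """Hypothetical helper: report keyword arguments bound by a partial whose
    runtime type does not match the wrapped function's annotation.

    Strict isinstance check: an int bound to a float parameter is reported too.
    """
    hints = get_type_hints(p.func)
    return {
        name: (type(value).__name__, hints[name].__name__)
        for name, value in p.keywords.items()
        if name in hints and not isinstance(value, hints[name])
    }


print(mismatched_bindings(functools.partial(task1, c=1.0, b="hello", a=1)))
# {'c': ('float', 'str'), 'b': ('str', 'float')}
print(mismatched_bindings(functools.partial(task1, c="1.0", b=1.0, a=1)))
# {}
```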

@@ -1551,7 +1551,7 @@ def _execute(
 annotations=options.annotations,
 raw_output_data_config=options.raw_output_data_config,
 auth_role=None,
-max_parallelism=options.max_parallelism,
+concurrency=options.concurrency,

Parameter rename may break compatibility

Consider verifying if renaming max_parallelism to concurrency maintains backward compatibility. This change could potentially break existing code that relies on the max_parallelism parameter.

Suggested change:
-concurrency=options.concurrency,
+concurrency=options.max_parallelism if hasattr(options, 'max_parallelism')
+else options.concurrency,
+# TODO: Remove max_parallelism support in next major version
+# Deprecated in favor of concurrency parameter

Code Review Run #99b31d
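One way to keep old callers working, so that _execute can read options.concurrency unconditionally as in the diff, is to normalise the deprecated argument when the options object is built. The class below is a hypothetical, trimmed-down stand-in written for illustration, not flytekit's Options; it only shows the concurrency/max_parallelism transition, and treating negative values as invalid is an assumption.

```python
import warnings
from dataclasses import dataclass
from typing import Optional


@dataclass
class Options:
    """Hypothetical stand-in showing only the concurrency/max_parallelism fields."""

    concurrency: Optional[int] = None
    max_parallelism: Optional[int] = None  # deprecated alias for concurrency

    def __post_init__(self) -> None:
        if self.max_parallelism is not None:
            warnings.warn(
                "max_parallelism is deprecated; use concurrency instead",
                DeprecationWarning,
                stacklevel=3,  # point at the caller of Options(...)
            )
            if self.concurrency is None:
                # Old callers: copy the deprecated value into the new field.
                self.concurrency = self.max_parallelism
            elif self.concurrency != self.max_parallelism:
                raise ValueError("concurrency and max_parallelism disagree")
        # Assumption: negative limits are never meaningful.
        if self.concurrency is not None and self.concurrency < 0:
            raise ValueError("concurrency must be non-negative")


opts = Options(max_parallelism=25)
print(opts.concurrency)  # 25
```

Normalising once at construction time means downstream code such as _execute never needs the hasattr branching the agent suggests.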

@@ -273,7 +273,7 @@ def t1(a: str) -> str:

 @workflow
 def my_wf(a: typing.List[str]) -> typing.List[str]:
-    mappy = map_task(t1)
+    mappy = map(t1)

Consider using map_task for workflow testing

Consider using map_task instead of map as it appears to be the intended function based on the test context. The map function may not provide the same task mapping functionality needed for workflow testing.

Suggested change:
-    mappy = map(t1)
+    mappy = map_task(t1)

Code Review Run #99b31d

@@ -726,7 +726,7 @@ def t1(x: int, y: int) -> int:

 @workflow
 def w() -> int:
-    return map_task(partial(t1, y=2))(x=[1, 2, 3])
+    return map(partial(t1, y=2))(x=[1, 2, 3])

Consider using map_task for consistency

Consider using map_task instead of map as it appears to be testing map task functionality based on the test name and context.

Suggested change:
-    return map(partial(t1, y=2))(x=[1, 2, 3])
+    return map_task(partial(t1, y=2))(x=[1, 2, 3])

Code Review Run #99b31d
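In this test, partial fixes y=2 and the mapped call supplies a list for x. The pure-Python emulation below (a hypothetical local_map helper, with a made-up body for t1 since the hunk only shows its signature) sketches the element-wise semantics being tested: one call per list index, with the partially bound argument held fixed. It does not model Flyte's execution, concurrency or failure handling.

```python
from functools import partial
from typing import Callable, List


def t1(x: int, y: int) -> int:
    # Hypothetical body; the hunk only shows the signature.
    return x + y


def local_map(fn: Callable[..., int], **inputs: List[int]) -> List[int]:
    """Call fn once per index, passing the i-th element of every list input.

    Arguments already bound with functools.partial stay fixed across calls.
    """
    lengths = {len(values) for values in inputs.values()}
    if len(lengths) != 1:
        raise ValueError("expected at least one list input, all of the same length")
    (n,) = lengths
    return [fn(**{name: values[i] for name, values in inputs.items()}) for i in range(n)]


print(local_map(partial(t1, y=2), x=[1, 2, 3]))  # [3, 4, 5]
```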

Comment on lines +307 to +309
m1 = map(functools.partial(task1, c=param_c))(a=param_a, b=param_b)
m2 = map(functools.partial(task2, c=param_c))(a=param_a, b=param_b)
m3 = map(functools.partial(task3, c=param_c))(a=param_a, b=param_b)

Consider using map_task for array testing

Consider using map_task instead of map for consistency with the test name and module being tested (test_array_node_map_task.py). The test appears to be validating array node map task functionality.

Suggested change:
-m1 = map(functools.partial(task1, c=param_c))(a=param_a, b=param_b)
-m2 = map(functools.partial(task2, c=param_c))(a=param_a, b=param_b)
-m3 = map(functools.partial(task3, c=param_c))(a=param_a, b=param_b)
+m1 = map_task(functools.partial(task1, c=param_c))(a=param_a, b=param_b)
+m2 = map_task(functools.partial(task2, c=param_c))(a=param_a, b=param_b)
+m3 = map_task(functools.partial(task3, c=param_c))(a=param_a, b=param_b)

Code Review Run #99b31d

- Rename map_task to map for simpler API
- Replace min_successes/min_success_ratio with tolerance parameter
- Rename max_parallelism to concurrency for consistency

Signed-off-by: Chih Tsung Lu <[email protected]>
@flyte-bot (Contributor) commented Feb 17, 2025

Code Review Agent Run #cbd7b1

Actionable Suggestions - 7
  • tests/flytekit/unit/remote/test_remote.py - 1
    • Consider using map_task instead of map · Line 20-20
  • flytekit/__init__.py - 1
    • Consider deprecation notice for map_task rename · Line 225-225
  • flytekit/core/options.py - 2
    • Consider adding constructor deprecation warning · Line 38-38
    • Consider using standard deprecation decorator pattern · Line 43-65
  • flytekit/clis/sdk_in_container/run.py - 1
    • Consider removing deprecated max-parallelism option · Line 269-277
  • tests/flytekit/unit/core/test_interface.py - 1
  • tests/flytekit/unit/core/test_array_node_map_task.py - 1
    • Type mismatch in task1 function parameters · Line 318-318
Additional Suggestions - 10
  • flytekit/clis/sdk_in_container/run.py - 1
    • Consider removing deprecated max-parallelism option · Line 269-277
  • flytekit/models/launch_plan.py - 2
    • Consider validating concurrency value before use · Line 277-277
    • Consider extracting concurrency handling logic · Line 301-318
  • flytekit/tools/translator.py - 1
    • Consider consolidating duplicate warning code blocks · Line 357-384
  • tests/flytekit/unit/core/test_node_creation.py - 1
    • Consider using more specific function name · Line 248-248
  • plugins/flytekit-k8s-pod/tests/test_pod.py - 1
    • Consider using more explicit map_task function · Line 331-331
  • tests/flytekit/unit/core/test_type_hints.py - 2
  • tests/flytekit/unit/core/test_artifacts.py - 2
    • Consider maintaining backward compatibility with imports · Line 15-15
    • Consider using more specific map_task function · Line 582-582
Review Details
  • Files reviewed - 24 · Commit Range: 2c2d41d..d6c460c
    • flytekit/__init__.py
    • flytekit/clis/sdk_in_container/run.py
    • flytekit/core/array_node_map_task.py
    • flytekit/core/launch_plan.py
    • flytekit/core/options.py
    • flytekit/models/execution.py
    • flytekit/models/launch_plan.py
    • flytekit/remote/entities.py
    • flytekit/remote/remote.py
    • flytekit/tools/translator.py
    • plugins/flytekit-k8s-pod/tests/test_pod.py
    • plugins/flytekit-papermill/tests/test_task.py
    • tests/flytekit/integration/remote/workflows/basic/array_map.py
    • tests/flytekit/integration/remote/workflows/basic/pydantic_wf.py
    • tests/flytekit/unit/core/test_array_node.py
    • tests/flytekit/unit/core/test_array_node_map_task.py
    • tests/flytekit/unit/core/test_artifacts.py
    • tests/flytekit/unit/core/test_interface.py
    • tests/flytekit/unit/core/test_launch_plan.py
    • tests/flytekit/unit/core/test_node_creation.py
    • tests/flytekit/unit/core/test_partials.py
    • tests/flytekit/unit/core/test_type_hints.py
    • tests/flytekit/unit/remote/test_remote.py
    • tests/flytekit/unit/types/directory/test_listdir.py
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful

@ChihTsungLu force-pushed the master branch 4 times, most recently from 8e230da to c2ff9e1 (February 17, 2025 06:48)
@@ -17,7 +17,7 @@
 from mock import ANY, MagicMock, patch

 import flytekit.configuration
-from flytekit import CronSchedule, ImageSpec, LaunchPlan, WorkflowFailurePolicy, task, workflow, reference_task, map_task, dynamic, eager
+from flytekit import CronSchedule, ImageSpec, LaunchPlan, WorkflowFailurePolicy, task, workflow, reference_task, map, dynamic, eager

Consider using map_task instead of map

Consider updating the import statement to use map_task instead of map to maintain consistency with the module's naming convention and avoid potential confusion with Python's built-in map function.

Code suggestion
Check the AI-generated fix before applying
Suggested change
from flytekit import CronSchedule, ImageSpec, LaunchPlan, WorkflowFailurePolicy, task, workflow, reference_task, map, dynamic, eager
from flytekit import CronSchedule, ImageSpec, LaunchPlan, WorkflowFailurePolicy, task, workflow, reference_task, map, dynamic, eager

Code Review Run #cbd7b1
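
Whether or not this flag applies, the naming concern can be made concrete with a small sketch. It uses a hypothetical stand-in module (`fake_flytekit`, not the real package) whose `map` and `map_task` attributes point at the same placeholder function, mirroring the alias mentioned in the review. Namespaced access (`fl.map`) leaves Python's built-in `map` alone, while a direct `from ... import map` rebinds the name in the importing scope.

```python
import builtins
import types

# Hypothetical stand-in module; this is NOT flytekit, just a namespace with a `map` attribute.
fake_flytekit = types.ModuleType("fake_flytekit")


def _array_map(target, concurrency=None):
    """Placeholder for a task-mapping entry point; it only records its arguments."""
    return ("array_node", target, concurrency)


fake_flytekit.map = _array_map
fake_flytekit.map_task = _array_map  # backward-compatible alias: the very same object

# Namespaced style (`import flytekit as fl`): the built-in map is untouched.
fl = fake_flytekit
node = fl.map(len, concurrency=2)
squares = list(map(lambda x: x * x, [1, 2, 3]))  # still Python's built-in map


def direct_import_scope():
    # Equivalent of `from fake_flytekit import map`: rebinds `map` in this scope only.
    map = fake_flytekit.map
    return map(len)
```

Because the alias is the same object, code written against either name behaves identically; only unqualified imports change what the bare name `map` means.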


from flytekit._version import __version__
from flytekit.configuration import Config
from flytekit.core.array_node_map_task import map_task
from flytekit.core.array_node_map_task import map
from flytekit.core.artifact import Artifact
from flytekit.core.base_sql_task import SQLTask
from flytekit.core.base_task import SecurityContext, TaskMetadata, kwtypes

Consider deprecation notice for map_task rename

Consider keeping the original map_task import and marking it as deprecated with the @deprecated decorator if this is an API change, to maintain backward compatibility. The alias on line 277 may not be sufficient for all use cases.

Code suggestion
Check the AI-generated fix before applying
Suggested change
from flytekit.core.base_task import SecurityContext, TaskMetadata, kwtypes
from flytekit.core.array_node_map_task import map_task, map
from deprecated import deprecated

Code Review Run #cbd7b1
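
A minimal sketch of what the suggested deprecation could look like, assuming a hypothetical `map(target, concurrency=None)` signature rather than flytekit's actual one: `map_task` becomes a thin shim that emits a `DeprecationWarning` and forwards its arguments unchanged. Unlike a plain alias (`map_task = map`), the shim makes the deprecation visible to callers. Note that defining `map` at module level shadows the built-in `map` inside that module.

```python
import typing
import warnings


def map(target: typing.Callable, concurrency: typing.Optional[int] = None) -> dict:
    """Hypothetical new entry point; a real one would build an array-node map task."""
    return {"target": target, "concurrency": concurrency}


def map_task(*args, **kwargs) -> dict:
    """Deprecated alias: warn, then forward every argument to `map` unchanged."""
    warnings.warn(
        "map_task is deprecated; use map instead.",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller, not to this shim
    )
    return map(*args, **kwargs)
```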


"""

labels: typing.Optional[common_models.Labels] = None
annotations: typing.Optional[common_models.Annotations] = None
raw_output_data_config: typing.Optional[common_models.RawOutputDataConfig] = None
security_context: typing.Optional[security.SecurityContext] = None
max_parallelism: typing.Optional[int] = None
concurrency: typing.Optional[int] = None

Consider adding constructor deprecation warning

The parameter max_parallelism has been renamed to concurrency. While backward compatibility is maintained through property and setter methods, consider adding a deprecation warning in the constructor when max_parallelism is used.

Code suggestion
Check the AI-generated fix before applying
 -    def __init__(self, **kwargs):
 +    def __init__(self, max_parallelism=None, **kwargs):
 +        if max_parallelism is not None:
 +            warnings.warn(
 +                "max_parallelism is deprecated and will be removed in a future version. Use concurrency instead.",
 +                DeprecationWarning,
 +                stacklevel=2)
 +        super().__init__(**kwargs)

Code Review Run #cbd7b1
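
As written, the suggested `__init__` accepts `max_parallelism` but never stores it: the value is consumed by the signature and not passed on to `super().__init__`. (If `Options` is a dataclass, as its annotated fields suggest, a hand-written `__init__` would also replace the generated one.) The following sketch shows a constructor that keeps the legacy value, using a hypothetical `LegacyAwareOptions` class. Rejecting both arguments together is an assumption, not something the PR specifies.

```python
import typing
import warnings


class LegacyAwareOptions:
    """Hypothetical, trimmed-down options holder (not flytekit's real Options class)."""

    def __init__(
        self,
        concurrency: typing.Optional[int] = None,
        max_parallelism: typing.Optional[int] = None,
    ) -> None:
        if max_parallelism is not None:
            warnings.warn(
                "max_parallelism is deprecated and will be removed in a future version. Use concurrency instead.",
                DeprecationWarning,
                stacklevel=2,
            )
            if concurrency is not None:
                raise ValueError("Pass either concurrency or max_parallelism, not both.")
            # Keep the legacy value instead of silently dropping it.
            concurrency = max_parallelism
        self.concurrency = concurrency
```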


Comment on lines +43 to +65
    @property
    def max_parallelism(self) -> typing.Optional[int]:
        """
        [Deprecated] Use concurrency instead. This property is maintained for backward compatibility
        """
        warnings.warn(
            "max_parallelism is deprecated and will be removed in a future version. Use concurrency instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.concurrency

    @max_parallelism.setter
    def max_parallelism(self, value: typing.Optional[int]):
        """
        Setter for max_parallelism (deprecated in favor of concurrency)
        """
        warnings.warn(
            "max_parallelism is deprecated and will be removed in a future version. Use concurrency instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        self.concurrency = value

Consider using standard deprecation decorator pattern

Consider using a decorator like deprecated from the warnings module instead of manually implementing deprecation warnings. This would make the code more maintainable and consistent with Python's standard deprecation patterns.

Code suggestion
Check the AI-generated fix before applying
Suggested change
    @property
    def max_parallelism(self) -> typing.Optional[int]:
        """
        [Deprecated] Use concurrency instead. This property is maintained for backward compatibility
        """
        warnings.warn(
            "max_parallelism is deprecated and will be removed in a future version. Use concurrency instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.concurrency

    @max_parallelism.setter
    def max_parallelism(self, value: typing.Optional[int]):
        """
        Setter for max_parallelism (deprecated in favor of concurrency)
        """
        warnings.warn(
            "max_parallelism is deprecated and will be removed in a future version. Use concurrency instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        self.concurrency = value

    @property
    @deprecated("Use concurrency instead", DeprecationWarning)
    def max_parallelism(self) -> typing.Optional[int]:
        return self.concurrency

    @max_parallelism.setter
    @deprecated("Use concurrency instead", DeprecationWarning)
    def max_parallelism(self, value: typing.Optional[int]):
        self.concurrency = value

Code Review Run #cbd7b1
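
A sketch of the decorator-based pattern. `warnings.deprecated` only exists on Python 3.13 and later (PEP 702), and its `category` parameter is keyword-only, so it would be spelled `@deprecated("...", category=DeprecationWarning)`; passing `DeprecationWarning` positionally, as in the suggestion, would raise a `TypeError` there. To stay runnable on older versions, the sketch uses a small hand-rolled `deprecated` decorator factory (hypothetical, not a library API) on a hypothetical `OptionsSketch` class. Stacked under `@property` and under the setter, it emits one warning per read and one per write.

```python
import functools
import typing
import warnings


def deprecated(message: str):
    """Minimal stand-in for warnings.deprecated, usable on Python versions before 3.13."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # stacklevel=2 points the warning at the code that accessed the attribute.
            warnings.warn(message, DeprecationWarning, stacklevel=2)
            return func(*args, **kwargs)

        return wrapper

    return decorator


class OptionsSketch:
    """Hypothetical holder: only `concurrency` is stored; `max_parallelism` forwards to it."""

    def __init__(self, concurrency: typing.Optional[int] = None) -> None:
        self.concurrency = concurrency

    @property
    @deprecated("max_parallelism is deprecated; use concurrency instead.")
    def max_parallelism(self) -> typing.Optional[int]:
        return self.concurrency

    @max_parallelism.setter
    @deprecated("max_parallelism is deprecated; use concurrency instead.")
    def max_parallelism(self, value: typing.Optional[int]) -> None:
        self.concurrency = value
```

Compared with the hand-written `warnings.warn` calls in the diff, the decorator keeps the message in one place; the behavior for callers is the same.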


Comment on lines +269 to +277
    max_parallelism: int = make_click_option_field(
        click.Option(
            param_decls=["--max-parallelism"],
            required=False,
            type=int,
            show_default=True,
            help="[Deprecated] Use --concurrency instead",
        )
    )

Consider removing deprecated max-parallelism option

Consider removing the deprecated --max-parallelism option since --concurrency is now the preferred way to control parallel execution. Having both options may cause confusion for users.

Code suggestion
Check the AI-generated fix before applying
Suggested change
    max_parallelism: int = make_click_option_field(
        click.Option(
            param_decls=["--max-parallelism"],
            required=False,
            type=int,
            show_default=True,
            help="[Deprecated] Use --concurrency instead",
        )
    )

Code Review Run #cbd7b1
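
An alternative to removing `--max-parallelism` outright is to keep both flags during a deprecation window. The sketch below swaps in the standard-library `argparse` for click so it runs without dependencies. The flag names mirror the diff; the hypothetical `resolve_concurrency` helper and its precedence rule (the new flag wins when both are given) are assumptions.

```python
import argparse
import typing
import warnings


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="run-sketch")
    parser.add_argument("--concurrency", type=int, default=None)
    parser.add_argument(
        "--max-parallelism",
        type=int,
        default=None,
        help="[Deprecated] Use --concurrency instead",
    )
    return parser


def resolve_concurrency(argv: typing.List[str]) -> typing.Optional[int]:
    args = build_parser().parse_args(argv)
    if args.max_parallelism is not None:
        warnings.warn(
            "--max-parallelism is deprecated; use --concurrency instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        if args.concurrency is None:
            # Only the legacy flag was given: honor it.
            return args.max_parallelism
    # The new flag wins whenever it is present.
    return args.concurrency
```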


@@ -406,6 +406,6 @@ def test_map_task_interface(min_success_ratio, expected_type):
def t() -> str:
return "hello"

mt = map_task(t, min_success_ratio=min_success_ratio)
mt = map(t, min_success_ratio=min_success_ratio)

Map task function name mismatch

Consider whether replacing map_task with map is intentional, as this could change the behavior of the test. The test name test_map_task_interface suggests testing map_task functionality, but the implementation uses map.

Code suggestion
Check the AI-generated fix before applying
 -def test_map_task_interface(min_success_ratio, expected_type):
 +def test_map_interface(min_success_ratio, expected_type):

Code Review Run #cbd7b1


@@ -316,7 +316,7 @@ def test_bounded_inputs_vars_order(serialization_settings):
def task1(a: int, b: float, c: str) -> str:
return f"{a} - {b} - {c}"


Type mismatch in task1 function parameters

The function call parameters in task1 appear to have type mismatches. Parameter c is defined as str but passed a float value 1.0, and parameter b is defined as float but passed a str value "hello". This could lead to runtime type errors.

Code suggestion
Check the AI-generated fix before applying
Suggested change
mt = map(functools.partial(task1, b=1.0, c="hello", a=1))

Code Review Run #cbd7b1
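
Keyword arguments to `functools.partial` bind by name, so their order in the call does not matter; what matters is which value is attached to which parameter. The sketch below uses a plain-Python stand-in for `task1` (no `@task` decorator) to show the type-consistent binding from the suggestion. It also shows that plain Python does not enforce the annotations: swapped types would only be caught by something outside the function itself, presumably flytekit's type conversion when the task runs.

```python
import functools


def task1(a: int, b: float, c: str) -> str:
    # Plain-Python stand-in for the @task-decorated function in the test.
    return f"{a} - {b} - {c}"


# Keyword arguments bind by name, so their order in the partial() call is irrelevant.
p1 = functools.partial(task1, b=1.0, c="hello", a=1)
p2 = functools.partial(task1, a=1, c="hello", b=1.0)

# Annotations are not checked at call time: swapped types still "work" here.
unchecked = task1(a=1, b="hello", c=1.0)
```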


Signed-off-by: Chih Tsung Lu <[email protected]>
flyte-bot commented Feb 17, 2025

Code Review Agent Run #351655

Actionable Suggestions - 5
  • flytekit/core/options.py - 2
    • Consider adding concurrency parameter validation · Line 38-38
    • Consider adding concurrency parameter validation · Line 38-38
  • tests/flytekit/integration/remote/workflows/basic/pydantic_wf.py - 1
    • Consider implications of map vs map_task · Line 3-3
  • flytekit/core/launch_plan.py - 1
  • tests/flytekit/integration/remote/workflows/basic/array_map.py - 1
    • Consider implications of map vs map_task · Line 4-4
Additional Suggestions - 10
  • flytekit/tools/translator.py - 1
    • Consider consolidating duplicate warning code blocks · Line 357-384
  • flytekit/models/execution.py - 2
    • Consider consistent parameter naming for concurrency · Line 380-380
    • Consider consolidating duplicate property methods · Line 294-306
  • flytekit/models/launch_plan.py - 3
    • Consider adding concurrency value validation · Line 249-250
    • Consider consolidating concurrency variable assignments · Line 172-173
    • Consider validating concurrency value before use · Line 277-277
  • tests/flytekit/unit/core/test_type_hints.py - 1
  • plugins/flytekit-k8s-pod/tests/test_pod.py - 1
  • flytekit/core/array_node_map_task.py - 2
    • Improve deprecated parameter warning handling · Line 376-378
    • Improve deprecation warning implementation · Line 398-398
Review Details
  • Files reviewed - 24 · Commit Range: 2c2d41d..d8304e5
    • flytekit/__init__.py
    • flytekit/clis/sdk_in_container/run.py
    • flytekit/core/array_node_map_task.py
    • flytekit/core/launch_plan.py
    • flytekit/core/options.py
    • flytekit/models/execution.py
    • flytekit/models/launch_plan.py
    • flytekit/remote/entities.py
    • flytekit/remote/remote.py
    • flytekit/tools/translator.py
    • plugins/flytekit-k8s-pod/tests/test_pod.py
    • plugins/flytekit-papermill/tests/test_task.py
    • tests/flytekit/integration/remote/workflows/basic/array_map.py
    • tests/flytekit/integration/remote/workflows/basic/pydantic_wf.py
    • tests/flytekit/unit/core/test_array_node.py
    • tests/flytekit/unit/core/test_array_node_map_task.py
    • tests/flytekit/unit/core/test_artifacts.py
    • tests/flytekit/unit/core/test_interface.py
    • tests/flytekit/unit/core/test_launch_plan.py
    • tests/flytekit/unit/core/test_node_creation.py
    • tests/flytekit/unit/core/test_partials.py
    • tests/flytekit/unit/core/test_type_hints.py
    • tests/flytekit/unit/remote/test_remote.py
    • tests/flytekit/unit/types/directory/test_listdir.py
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful

AI Code Review powered by Bito

"""

labels: typing.Optional[common_models.Labels] = None
annotations: typing.Optional[common_models.Annotations] = None
raw_output_data_config: typing.Optional[common_models.RawOutputDataConfig] = None
security_context: typing.Optional[security.SecurityContext] = None
max_parallelism: typing.Optional[int] = None
concurrency: typing.Optional[int] = None

Consider adding concurrency parameter validation

Consider adding validation for the concurrency parameter to ensure it's a positive integer when set. A negative or zero value for concurrency could cause unexpected behavior.

Code suggestion
Check the AI-generated fix before applying
Suggested change
    concurrency: typing.Optional[int] = None
    _concurrency: typing.Optional[int] = None

    @property
    def concurrency(self) -> typing.Optional[int]:
        return self._concurrency

    @concurrency.setter
    def concurrency(self, value: typing.Optional[int]):
        if value is not None and value <= 0:
            raise ValueError('concurrency must be a positive integer')
        self._concurrency = value

Code Review Run #351655
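
If `Options` is a dataclass (as its annotated fields with defaults suggest), the suggested change has a side effect: the generated constructor takes field names as keywords, so renaming the field to `_concurrency` would make `Options(concurrency=...)` fail. A `__post_init__` hook validates without renaming anything. The sketch uses a hypothetical `OptionsSketch` class and only rejects negative values, since flytekit's map_task documentation describes a concurrency of 0 as unbounded; whether launch-plan concurrency should accept 0 is an assumption here.

```python
import typing
from dataclasses import dataclass


@dataclass
class OptionsSketch:
    """Hypothetical, trimmed-down options dataclass (not flytekit's real Options)."""

    concurrency: typing.Optional[int] = None

    def __post_init__(self) -> None:
        # Runs once, right after the generated __init__; later assignments are not re-checked.
        if self.concurrency is not None and self.concurrency < 0:
            raise ValueError("concurrency must be a non-negative integer")
```

The trade-off versus the property-based suggestion is that reassigning `concurrency` after construction bypasses the check.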


@@ -1,6 +1,6 @@
from pydantic import BaseModel

from flytekit import map_task
from flytekit import map

Consider implications of map vs map_task

Consider whether replacing map_task with map is intentional, as they might have different behaviors. The map function is typically used for parallel execution, while map_task might have had specific task-related functionality.

Code suggestion
Check the AI-generated fix before applying
Suggested change
from flytekit import map
from flytekit import map_task

Code Review Run #351655


Comment on lines +307 to +308
concurrency if concurrency is not None else max_parallelism,
cached_outputs.get("_concurrency", cached_outputs.get("")),

Possible incorrect dictionary key lookup

The cached output lookup for concurrency appears to use an incomplete key: the inner get() call is passed an empty string ("") instead of a real key name, which could lead to unexpected behavior. Consider fixing the nested get calls.

Code suggestion
Check the AI-generated fix before applying
 -                    cached_outputs.get("_concurrency", cached_outputs.get(""))
 +                    cached_outputs.get("_concurrency", cached_outputs.get("_max_parallelism"))

Code Review Run #351655
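
The sketch below shows the lookup the suggestion aims for, using a hypothetical `cached_concurrency` helper. The `_max_parallelism` key name comes from the suggestion and is not confirmed by the visible diff. The sketch also illustrates a subtlety of the nested `dict.get` form: the fallback applies only when `_concurrency` is absent, not when it is present with a `None` value.

```python
import typing


def cached_concurrency(cached_outputs: typing.Dict[str, typing.Any]) -> typing.Optional[int]:
    """Prefer the new key; fall back to the legacy key (key names follow the suggestion)."""
    value = cached_outputs.get("_concurrency")
    if value is None:
        # Also covers a "_concurrency" key that is present but set to None,
        # which the nested dict.get(...) form would return as-is.
        value = cached_outputs.get("_max_parallelism")
    return value


legacy_cache = {"_max_parallelism": 10}
```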


@@ -1,7 +1,7 @@
import typing
from functools import partial

from flytekit import map_task, task, workflow
from flytekit import map, task, workflow

Consider implications of map vs map_task

Consider whether replacing map_task with map is intentional as this could change the behavior of the workflow. The map function might have different semantics or performance characteristics compared to map_task.

Code suggestion
Check the AI-generated fix before applying
Suggested change
from flytekit import map, task, workflow
from flytekit import map_task, task, workflow

Code Review Run #351655


flyte-bot commented Feb 20, 2025

Code Review Agent Run #97cb30

Actionable Suggestions - 0
Additional Suggestions - 1
  • tests/flytekit/integration/remote/test_remote.py - 1
Review Details
  • Files reviewed - 12 · Commit Range: d8304e5..3c46c8b
    • flytekit/clis/sdk_in_container/serve.py
    • flytekit/core/type_engine.py
    • flytekit/remote/remote.py
    • plugins/flytekit-aws-sagemaker/flytekitplugins/awssagemaker_inference/boto3_agent.py
    • plugins/flytekit-aws-sagemaker/tests/test_boto3_agent.py
    • plugins/flytekit-greatexpectations/tests/test_schema.py
    • plugins/flytekit-openai/flytekitplugins/openai/batch/agent.py
    • plugins/flytekit-openai/tests/openai_batch/test_agent.py
    • plugins/flytekit-ray/flytekitplugins/ray/task.py
    • plugins/flytekit-ray/tests/test_ray.py
    • tests/flytekit/integration/remote/test_remote.py
    • tests/flytekit/integration/remote/workflows/basic/deep_child_workflow.py
  • Files skipped - 3
    • .github/workflows/build_image.yml - Reason: Filter setting
    • .github/workflows/pythonbuild.yml - Reason: Filter setting
    • .github/workflows/pythonpublish.yml - Reason: Filter setting
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful

AI Code Review powered by Bito
