[WIP][Dependent] Fix task creation with cloud storage data #7903
base: develop
Walkthrough: The recent updates enhance the CVAT application by optimizing cloud data handling, improving memory efficiency, and refining the threading logic. Key changes include updated preview retrieval, threaded bulk downloading, and updated image processing in manifests. New classes and global variables were introduced to support these changes.
/check
❌ Some checks failed
/check
❌ Some checks failed
cvat/apps/engine/task.py (outdated)
stop_frame = len(sorted_media)
if data['stop_frame'] is not None:
Refactor the code to remove duplicates.
@zhiltsov-max, could you please take a look at the PR? I've added a few small comments that should be fixed, but I can probably fix them only in the evening.
Actionable comments posted: 2
Out of diff range and nitpick comments (1)
cvat/apps/engine/task.py (1)
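Line range hint 893-893: Change the membership test to use `not in` for correct logical evaluation.
- if not chunk_path.endswith(f"{properties['name']}{properties['extension']}"):
+ if chunk_path not in f"{properties['name']}{properties['extension']}":
Note that this replacement is not equivalent to the original condition: `endswith()` is a suffix check, not a membership test, so the suggestion changes behavior.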
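Ruff's E713 rule only concerns expressions of the form `not x in y`, so it helps to check whether the suggested rewrite preserves behavior. The comparison below wraps both conditions in hypothetical helpers (`original_check`, `suggested_check`) and feeds them a made-up path; it is an illustration, not code from the PR:

```python
def original_check(chunk_path: str, name: str, extension: str) -> bool:
    # Original condition: True when chunk_path does NOT end with "<name><extension>"
    return not chunk_path.endswith(f"{name}{extension}")


def suggested_check(chunk_path: str, name: str, extension: str) -> bool:
    # Suggested condition: True when chunk_path is NOT a substring of "<name><extension>"
    return chunk_path not in f"{name}{extension}"


# Hypothetical input: a full path that ends with the expected file name
path = "data/images/frame_000001.jpg"
print(original_check(path, "frame_000001", ".jpg"))   # False: the suffix matches
print(suggested_check(path, "frame_000001", ".jpg"))  # True: the path is not a substring of the name
```

The two helpers disagree on an ordinary path, which shows that the diff changes the logic rather than just the style.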
Review Details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (4)
- cvat/apps/engine/cache.py (1 hunks)
- cvat/apps/engine/cloud_provider.py (4 hunks)
- cvat/apps/engine/task.py (4 hunks)
- utils/dataset_manifest/core.py (8 hunks)
Additional Context Used
Ruff (2)
cvat/apps/engine/task.py (1)
893-893: Test for membership should be `not in`
utils/dataset_manifest/core.py (1)
788-788: Test for membership should be `not in`
Additional comments not posted (6)
cvat/apps/engine/cache.py (1)
315-315: Changing this to `manifest.get_first_not_empty_item()` improves robustness by ensuring that a non-empty item is retrieved.
Verification successful
The `_prepare_cloud_preview` method is properly integrated within the `_get_or_set_cache_item` method, ensuring that the returned values are correctly handled and cached.
Scripts Executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify integration of `_prepare_cloud_preview` with other components.
# Test: Search for usages of `_prepare_cloud_preview`. Expect: Proper handling of the returned values.
rg --type python '_prepare_cloud_preview'
Length of output: 72
Script:
#!/bin/bash
# Description: Verify integration of `_prepare_cloud_preview` with other components.
# Test: Search for usages of `_prepare_cloud_preview`. Expect: Proper handling of the returned values.
rg --type py '_prepare_cloud_preview'
Length of output: 220
Script:
#!/bin/bash
# Extract and review the context around the usages of `_prepare_cloud_preview`.
# Extract 5 lines before and after the first usage
rg --type py '_prepare_cloud_preview' -A 5 -B 5
Length of output: 1472
Script:
#!/bin/bash
# Extract the `_get_or_set_cache_item` method for review.
# Search for the `_get_or_set_cache_item` method definition and its content
ast-grep --lang python --pattern $'def _get_or_set_cache_item($_, $_):\n $$$\n'
Length of output: 81
Script:
#!/bin/bash
# Search for the `_get_or_set_cache_item` method definition and its content.
# Search for the method definition
rg 'def _get_or_set_cache_item' -A 20
Length of output: 1417
utils/dataset_manifest/core.py (1)
196-222: Updates to `_get_img_properties` enhance the handling of different image source types. Verify integration with other components.
Verification successful
The `_get_img_properties` method is used multiple times within the `utils/dataset_manifest/core.py` file, primarily in generator expressions. The context shows that the method is called to yield image properties, and the returned values are handled appropriately.
- Lines where `_get_img_properties` is called:
  utils/dataset_manifest/core.py: yield self._get_img_properties(image)
The integration appears to be correct, with the method's return values being used as expected.
Scripts Executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify integration of `_get_img_properties` with other components.
# Test: Search for usages of `_get_img_properties`. Expect: Proper handling of the returned values.
rg --type python '_get_img_properties'
Length of output: 69
Script:
#!/bin/bash
# Description: Verify integration of `_get_img_properties` with other components.
# Test: Search for usages of `_get_img_properties`. Expect: Proper handling of the returned values.
rg --type py '_get_img_properties'
Length of output: 510
Script:
#!/bin/bash
# Description: Extract and review the context of `_get_img_properties` calls.
# Extract lines where `_get_img_properties` is called along with some surrounding lines for context.
rg --type py '_get_img_properties' -A 3 -B 3
Length of output: 2351
cvat/apps/engine/cloud_provider.py (2)
9-9: Introduce threading and constants for performance optimization.
The introduction of threading using `ThreadPoolExecutor` and the constants `CPU_NUMBER`, `MAX_THREADS_NUMBER`, and `NUMBER_OF_FILES_PER_THREAD` is well aligned with the PR's objective to optimize performance. This should effectively limit the number of threads based on the CPU capabilities and the number of files, which is crucial for handling large volumes of data efficiently.
Also applies to: 14-15, 40-42
185-185: Enhance file downloading functions with optimal threading.
The modifications in `bulk_download_to_memory` and `bulk_download_to_dir` to incorporate threading logic are commendable. These changes should significantly reduce the time required for downloading files by parallelizing the process. The use of `ThreadPoolExecutor` and the conditional logic that determines the number of threads based on the workload are appropriate and should ensure that system resources are used efficiently.
Also applies to: 205-219, 225-237
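A minimal sketch of a threaded bulk download in this spirit, assuming a per-file download callable and an already-normalized thread count. `fake_download` and the file keys are hypothetical stand-ins for a cloud storage client, and the function signature is simplified compared with the method in the PR:

```python
from concurrent.futures import ThreadPoolExecutor
from io import BytesIO
from typing import Callable, Iterator, List


def bulk_download_to_memory(
    files: List[str],
    download: Callable[[str], BytesIO],
    threads_number: int = 1,
) -> Iterator[BytesIO]:
    # Hypothetical signature: in the PR this is a method of the cloud storage class
    if threads_number > 1:
        with ThreadPoolExecutor(max_workers=threads_number) as executor:
            # map() yields results in input order; list() collects them before the pool shuts down
            return iter(list(executor.map(download, files)))
    # Sequential fallback when only one thread is allowed
    return (download(f) for f in files)


def fake_download(key: str) -> BytesIO:
    # Stand-in for a network call: the "file content" is just the key
    return BytesIO(key.encode())


keys = ["img_1.jpg", "img_2.jpg", "img_3.jpg"]
results = [buf.getvalue() for buf in bulk_download_to_memory(keys, fake_download, threads_number=2)]
print(results)  # [b'img_1.jpg', b'img_2.jpg', b'img_3.jpg']
```

Keeping results in input order matters when callers pair each buffer with its key: `Executor.map` guarantees that ordering, whereas `as_completed` yields futures as they finish.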
cvat/apps/engine/task.py (2)
484-502: Optimized task manifest creation from cloud data to handle large datasets more efficiently.
716-722: Refactor to remove duplicate code, as suggested in previous comments.
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## develop #7903 +/- ##
===========================================
- Coverage 83.37% 83.37% -0.01%
===========================================
Files 381 382 +1
Lines 40053 40123 +70
Branches 3769 3769
===========================================
+ Hits 33395 33451 +56
- Misses 6658 6672 +14
There are quite a few places where `stop` is changed from meaning the last valid index to meaning the first index after the last valid index. In most places in CVAT, `stop` has the first meaning, so I suggest not changing it here, to avoid confusion.
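To make the two conventions concrete: `len(sorted_media)` is an exclusive bound, while the inclusive last index is `len(sorted_media) - 1`. When `stop` is inclusive, converting it to Python's half-open slicing needs `stop + 1`. The helper below is hypothetical and only illustrates that conversion with `itertools.islice`, which also works on generators:

```python
from itertools import islice
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def select_frames(items: Iterable[T], start: int = 0, stop: Optional[int] = None, step: int = 1) -> List[T]:
    # `stop` is inclusive here: it is the index of the last item to keep.
    # islice() expects an exclusive bound, hence stop + 1.
    exclusive_stop = None if stop is None else stop + 1
    return list(islice(items, start, exclusive_stop, step))


media = ["f0", "f1", "f2", "f3", "f4"]
print(select_frames(media, start=1, stop=3))        # ['f1', 'f2', 'f3']
print(select_frames(media, stop=len(media) - 1))    # all five frames
# Iterators and generators work too, since islice never needs len()
print(select_frames(iter(media), start=0, stop=2))  # ['f0', 'f1', 'f2']
```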
Actionable comments posted: 6
Outside diff range and nitpick comments (3)
utils/dataset_manifest/core.py (2)
144-153: Consider documenting the `DatasetImagesReader` constructor parameters.
Adding docstrings to the constructor would improve readability and maintainability by clearly explaining the purpose and expected type of each parameter, especially for a complex constructor like this one.
Line range hint 761-761: Correct the membership test in the condition.
- if not _dict['version'] in _Manifest.SupportedVersion.choices():
+ if _dict['version'] not in _Manifest.SupportedVersion.choices():
This change follows Python best practices and improves readability.
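This rewrite is purely stylistic: `not` binds more loosely than `in`, so `not x in y` already parses as `not (x in y)`, which is exactly what `x not in y` means. A quick check, using a hypothetical tuple in place of `_Manifest.SupportedVersion.choices()`:

```python
import ast

# Hypothetical stand-in for _Manifest.SupportedVersion.choices()
SUPPORTED_VERSIONS = ("1.0", "1.1")

for version in ("1.1", "2.0"):
    old_style = not version in SUPPORTED_VERSIONS  # the form Ruff flags (E713)
    new_style = version not in SUPPORTED_VERSIONS  # the preferred form
    assert old_style == new_style

# The parser confirms the precedence: `not` wraps the whole comparison
tree = ast.parse("not a in b", mode="eval")
print(type(tree.body).__name__)          # UnaryOp
print(type(tree.body.operand).__name__)  # Compare
```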
cvat/apps/engine/task.py (1)
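Line range hint 895-895: Consider using `not in` for membership tests to enhance readability.
- if not manifest_file or not os.path.isfile(os.path.join(manifest_root, manifest_file)):
+ if manifest_file not in os.path.isfile(os.path.join(manifest_root, manifest_file)):
Note that this suggestion is incorrect: the original condition contains no membership test, and because `os.path.isfile()` returns a `bool`, the suggested line would raise a `TypeError`.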
Review Details
Files selected for processing (4)
- changelog.d/20240523_120716_maria_fix_task_creation_with_cs_data.md (1 hunks)
- cvat/apps/engine/cloud_provider.py (4 hunks)
- cvat/apps/engine/task.py (4 hunks)
- utils/dataset_manifest/core.py (8 hunks)
Files skipped from review due to trivial changes (1)
- changelog.d/20240523_120716_maria_fix_task_creation_with_cs_data.md
Additional Context Used
Ruff (2)
cvat/apps/engine/task.py (1)
895-895: Test for membership should be `not in`
utils/dataset_manifest/core.py (1)
761-761: Test for membership should be `not in`
Additional comments not posted (5)
utils/dataset_manifest/core.py (1)
168-170: Clarify the necessity of the `stop` parameter with generators.
The current implementation raises a `ValueError` if the `stop` parameter is not passed when a generator is used. This is crucial for preventing runtime errors due to undefined behavior when the end of the generator is not known.
Verification successful
The `DatasetImagesReader` class constructor checks whether the `sources` parameter is a generator using the `isgenerator` function. This check sets the `_is_generator_used` attribute. The code snippet provided in the review comment ensures that if a generator is used, the `stop` parameter must be provided to avoid runtime errors due to undefined behavior when the end of the generator is not known.
- The `stop` parameter is crucial when using generators to define the endpoint of the iteration, preventing potential runtime errors.
Scripts Executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify the handling of the `stop` parameter when generators are used.
# Test: Search for usages of DatasetImagesReader with a generator. Expect: Proper handling of the `stop` parameter.
rg --type python 'DatasetImagesReader\([^)]*isgenerator'
Length of output: 89
Script:
#!/bin/bash
# Description: Verify the handling of the `stop` parameter when generators are used.
# Test: Search for usages of DatasetImagesReader with a generator. Expect: Proper handling of the `stop` parameter.
rg 'DatasetImagesReader\([^)]*isgenerator'
Length of output: 44
Script:
#!/bin/bash
# Description: Verify the handling of the `stop` parameter when generators are used.
# Search for the `isgenerator` parameter usage in the codebase.
rg 'isgenerator'
# Inspect the `DatasetImagesReader` class definition to understand how it handles generators.
ast-grep --lang python --pattern $'class DatasetImagesReader { $$$ }'
Length of output: 696
Script:
#!/bin/bash
# Description: Verify the handling of the `stop` parameter when generators are used.
# Inspect the `DatasetImagesReader` class definition to understand how it handles generators.
ast-grep --lang python --pattern $'class DatasetImagesReader { $$$ }' utils/dataset_manifest/core.py
Length of output: 561
Script:
#!/bin/bash
# Description: Verify the handling of the `stop` parameter when generators are used.
# Inspect the `DatasetImagesReader` class definition to understand how it handles generators.
ast-grep --lang python --pattern 'class DatasetImagesReader { $$$ }' utils/dataset_manifest/core.py
# Fallback: Search for the `DatasetImagesReader` class definition using `rg`.
rg 'class DatasetImagesReader' utils/dataset_manifest/core.py
Length of output: 648
Script:
#!/bin/bash
# Description: Extract the `DatasetImagesReader` class definition for manual inspection.
# Extract the `DatasetImagesReader` class definition from `utils/dataset_manifest/core.py`.
sed -n '/class DatasetImagesReader:/,/^$/p' utils/dataset_manifest/core.py
Length of output: 511
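A minimal sketch of the two constructor rules discussed in these reviews: a generator source requires `stop`, and a generator cannot be re-sorted, so only the predefined order is accepted. `ImagesReaderSketch` and the sorting names are hypothetical simplifications, not the real `DatasetImagesReader`; only `inspect.isgenerator` is the standard-library check the verification mentions:

```python
from inspect import isgenerator
from typing import Iterable, Optional

# Hypothetical sorting method names; only PREDEFINED keeps the source order untouched
PREDEFINED = "predefined"
RANDOM = "random"


class ImagesReaderSketch:
    """Hypothetical, simplified stand-in for DatasetImagesReader's constructor checks."""

    def __init__(self, sources: Iterable[str], *, start: int = 0,
                 stop: Optional[int] = None, sorting_method: str = PREDEFINED) -> None:
        self._is_generator_used = isgenerator(sources)

        if self._is_generator_used:
            # A generator has no len(), so the last index must be passed explicitly
            if stop is None:
                raise ValueError("The stop parameter must be passed when a generator is used")
            # Re-sorting would require consuming the whole generator up front
            if sorting_method != PREDEFINED:
                raise ValueError("Only the predefined sorting method can be used with a generator")
        else:
            sources = list(sources)
            if stop is None:
                stop = len(sources) - 1  # inclusive index of the last source

        self._sources = sources
        self._start = start
        self._stop = stop


# A list works without stop; the last index is derived from its length
reader = ImagesReaderSketch(["a.jpg", "b.jpg", "c.jpg"])
print(reader._stop)  # 2

# A generator without stop is rejected
try:
    ImagesReaderSketch(name for name in ["a.jpg", "b.jpg"])
except ValueError as error:
    print(error)  # The stop parameter must be passed when a generator is used
```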
cvat/apps/engine/cloud_provider.py (1)
191-191: Ensure proper error handling in the image download logic.
The method `optimally_image_download` attempts to parse an image and falls back to downloading the entire file if the initial chunk is insufficient. Verify that this fallback mechanism works correctly across different image formats and sizes to ensure robustness.
cvat/apps/engine/task.py (3)
484-504: The updates to the `_create_task_manifest_from_cloud_data` function align well with the PR objectives to optimize cloud storage interactions.
677-688: The implementation of `_download_data_from_cloud_storage` is efficient and uses cloud storage capabilities effectively.
484-504: The data validation logic in `_validate_data` is robust, ensuring that the uploaded data conforms to expected formats and quantities.
Actionable comments posted: 6
Outside diff range and nitpick comments (1)
utils/dataset_manifest/core.py (1)
144-153: Ensure consistent use of type hints.
The constructor of `DatasetImagesReader` uses type hints for some parameters but not for others. Adding explicit type hints for all parameters would improve readability and maintainability.
Review Details
Files selected for processing (3)
- cvat/apps/engine/cloud_provider.py (4 hunks)
- cvat/settings/base.py (1 hunks)
- utils/dataset_manifest/core.py (8 hunks)
Additional Context Used
Ruff (5)
cvat/settings/base.py (4)
34-34: Module level import not at top of file
36-36: Module level import not at top of file
80-80: `keys.secret_key.SECRET_KEY` imported but unused
718-718: Module level import not at top of file
utils/dataset_manifest/core.py (1)
761-761: Test for membership should be `not in`
Additional comments not posted (3)
utils/dataset_manifest/core.py (1)
162-163: Good enforcement of method constraints.
Raising a `ValueError` when a non-predefined sorting method is used with a generator is good practice, as it enforces the method's constraints explicitly.
cvat/apps/engine/cloud_provider.py (2)
223-231: The implementation of `bulk_download_to_dir` handles threading effectively.
223-231: Cloud storage instance creation is well handled, with comprehensive support for different providers and attributes.
Actionable comments posted: 4
Outside diff range and nitpick comments (4)
utils/dataset_manifest/core.py (2)
Line range hint 7-7: Remove the unused import `io.BytesIO`.
- from io import StringIO, BytesIO
+ from io import StringIO
This change cleans up the imports by removing the unused `BytesIO`, which is not referenced anywhere in this file.
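Line range hint 763-763: Correct the membership test in `is_empty`.
- if self._index.is_empty():
+ if not self._index.is_empty():
This change corrects the logic to properly check if the index is not empty before loading it, which aligns with the intended functionality.
Note that this suggestion does not address the Ruff finding: line 763 is flagged for a `not ... in` membership test (a later review on the same line identifies it as the `_dict['version']` check), and negating `is_empty()` would invert the condition rather than restyle it.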
cvat/apps/engine/cloud_provider.py (2)
9-9: Consider making the `math` import configurable, or ensure it's necessary here, as it's only used for thread normalization.
Line range hint 184-212: The method `optimally_image_download` is well implemented, with detailed logging and error handling. However, consider adding more comments to explain the logic, especially around the conditions that determine whether the entire file needs to be downloaded.
Review Details
Files selected for processing (3)
- cvat/apps/engine/cloud_provider.py (8 hunks)
- utils/dataset_manifest/core.py (10 hunks)
- utils/dataset_manifest/types.py (1 hunks)
Additional Context Used
Ruff (2)
utils/dataset_manifest/core.py (2)
7-7: `io.BytesIO` imported but unused
763-763: Test for membership should be `not in`
Additional comments not posted (5)
utils/dataset_manifest/types.py (1)
8-9: The `Named` protocol is well defined and serves its purpose effectively.
utils/dataset_manifest/core.py (2)
199-225: Refactor the `_get_img_properties` method for clarity.
This refactor simplifies the method by reducing the number of conditionals and improving the structure of the code for better readability and maintainability.
227-233: Optimize the iteration logic in `__iter__`.
This change uses `islice` directly in the loop to simplify the iteration and avoid unnecessary checks and range creation inside the loop.
cvat/apps/engine/cloud_provider.py (2)
35-43: The implementation of `NamedBytesIO` with `filename` as a property is well done. This encapsulation enhances the maintainability of the code.
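For readers unfamiliar with the pattern, a `BytesIO` subclass exposing `filename` as a property, together with a structural `Named` protocol, could look roughly like the sketch below. The reviews only state that `Named` is a protocol in `utils/dataset_manifest/types.py` and that `NamedBytesIO` exposes `filename` as a property, so the exact members here are assumptions:

```python
from io import BytesIO
from typing import Optional, Protocol, runtime_checkable


@runtime_checkable
class Named(Protocol):
    # Assumed shape: any object exposing a `filename` attribute
    filename: Optional[str]


class NamedBytesIO(BytesIO):
    # In-memory file that also remembers the name of the object it holds
    @property
    def filename(self) -> Optional[str]:
        return getattr(self, "_filename", None)

    @filename.setter
    def filename(self, value: str) -> None:
        self._filename = value


buf = NamedBytesIO(b"image bytes")
buf.filename = "images/frame_0.png"
print(buf.filename)                  # images/frame_0.png
print(isinstance(buf, Named))        # True: a structural match, no inheritance needed
print(isinstance(BytesIO(), Named))  # False: a plain BytesIO has no filename
print(buf.read())                    # b'image bytes'
```

A protocol lets functions accept any named file-like object, not only `NamedBytesIO`, without a shared base class.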
149-149: The method `download_fileobj` in the different storage classes (`_CloudStorage`, `AWS_S3`, `AzureBlobContainer`, `GoogleCloudStorage`) is consistently implemented. This ensures uniform behavior across different cloud providers, which is good for maintainability and debugging.
Also applies to: 533-541, 735-744, 849-854
Actionable comments posted: 2
Outside diff range and nitpick comments (1)
utils/dataset_manifest/core.py (1)
Line range hint 763-763: Use `not in` for membership tests to improve readability.
- if not _dict['version'] in _Manifest.SupportedVersion.choices():
+ if _dict['version'] not in _Manifest.SupportedVersion.choices():
This change makes the membership test more readable and aligns with common Python practices.
Review Details
Files selected for processing (1)
- utils/dataset_manifest/core.py (10 hunks)
Additional Context Used
Ruff (1)
utils/dataset_manifest/core.py (1)
763-763: Test for membership should be `not in`
Actionable comments posted: 4
Review Details
Files selected for processing (1)
- cvat/apps/engine/cloud_provider.py (8 hunks)
Additional comments not posted (1)
cvat/apps/engine/cloud_provider.py (1)
Line range hint 200-228: The method `optimally_image_download` attempts to download just enough of the image to parse its properties, which is a good optimization. However, ensure that the error handling is robust, especially in cases where the image properties cannot be determined from the initial chunk. Consider logging or handling these cases explicitly.
Verification successful
The `optimally_image_download` method does include a warning log when the initial chunk is insufficient to determine the image properties. This log provides detailed information about the image size and resolution, which is a good practice for error handling and debugging.
- The method logs a warning if the initial chunk is not enough to parse the image properties.
- It then proceeds to download the entire file if the initial chunk is insufficient.
This approach ensures that the method handles cases where the image properties cannot be determined with the initial chunk, providing robust error handling.
Scripts Executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify robust error handling in optimally_image_download.
# Test: Search for error handling related to image property determination failures.
rg 'optimally_image_download' --type py
Length of output: 300
Script:
#!/bin/bash
# Extract the full implementation of the optimally_image_download method for detailed analysis
ast-grep --lang python --pattern $'def optimally_image_download($_, $_) { $$$ }' cvat/apps/engine/cloud_provider.py
Length of output: 587
Script:
#!/bin/bash
# Extract the full implementation of the optimally_image_download method for detailed analysis
ast-grep --lang python --pattern 'def optimally_image_download($_, $$$): $$$' cvat/apps/engine/cloud_provider.py
Length of output: 114
Script:
#!/bin/bash
# Extract the full implementation of the optimally_image_download method for detailed analysis using ripgrep
rg -A 20 -B 5 'def optimally_image_download' cvat/apps/engine/cloud_provider.py
Length of output: 1055
Script:
#!/bin/bash
# Extract the full implementation of the optimally_image_download method for detailed analysis using ripgrep
rg -A 40 -B 5 'def optimally_image_download' cvat/apps/engine/cloud_provider.py
Length of output: 1920
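The strategy verified above (read the image header from an initial chunk, and log a warning and download the whole file if the chunk is not enough) can be sketched with the standard library alone. This is not CVAT's implementation: the reviews indicate the real method feeds chunks to an image parser, whereas this sketch hand-parses only PNG headers, and `fetch_range`/`fetch_all` are hypothetical stand-ins for ranged and full cloud downloads:

```python
import logging
import struct
from typing import Callable, Dict, Optional, Tuple

logger = logging.getLogger(__name__)

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(data: bytes) -> Optional[Tuple[int, int]]:
    # The IHDR chunk follows the 8-byte signature: 4-byte length, b"IHDR",
    # then 4-byte big-endian width and height, so 24 bytes are enough.
    if len(data) < 24 or not data.startswith(PNG_SIGNATURE) or data[12:16] != b"IHDR":
        return None
    width, height = struct.unpack(">II", data[16:24])
    return width, height


def optimally_image_download(
    key: str,
    fetch_range: Callable[[str, int], bytes],
    fetch_all: Callable[[str], bytes],
    chunk_size: int = 1024,
) -> Tuple[int, int]:
    # Step 1: download only the first chunk_size bytes and try to read the header
    size = png_size(fetch_range(key, chunk_size))
    if size is not None:
        return size
    # Step 2: the chunk was not enough, so log it and download the whole object
    logger.warning("First %d bytes of '%s' were not enough to parse the image; downloading the full file",
                   chunk_size, key)
    size = png_size(fetch_all(key))
    if size is None:
        raise ValueError(f"Could not determine the size of image '{key}'")
    return size


# Hypothetical in-memory "bucket" holding a fake 640x480 PNG header plus padding
storage: Dict[str, bytes] = {
    "frame.png": PNG_SIGNATURE + struct.pack(">I", 13) + b"IHDR" + struct.pack(">II", 640, 480) + bytes(100),
}


def fetch_range(key: str, length: int) -> bytes:
    return storage[key][:length]


def fetch_all(key: str) -> bytes:
    return storage[key]


print(optimally_image_download("frame.png", fetch_range, fetch_all))                 # (640, 480) from the chunk
print(optimally_image_download("frame.png", fetch_range, fetch_all, chunk_size=16))  # (640, 480) after the fallback
```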
Actionable comments posted: 3
Outside diff range and nitpick comments (3)
cvat/apps/engine/cloud_provider.py (3)
9-9: Consider making the `math` import conditional or localized within the functions where it's used, if it's only needed for specific operations. This can help reduce the import overhead when the module is loaded but the functionality requiring `math` is not used.
48-70: The `normalize_threads_number` function is well implemented, ensuring that the number of threads does not exceed the maximum allowed by the system or settings. However, consider adding a brief comment explaining the logic, especially the use of `math.ceil` and the conditions checked, to improve code readability.
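Based on the reviews' description (a `math.ceil`-based default when `threads_number` is `None`, plus `min`/`max` caps), a hedged reconstruction might look like this. The constant values and the exact formula are assumptions; only the constant names `CPU_NUMBER`, `MAX_THREADS_NUMBER`, and `NUMBER_OF_FILES_PER_THREAD` appear in the reviews, so this illustrates the idea rather than CVAT's code:

```python
import math
import os
from typing import Optional

# Assumed values for illustration; the PR defines constants with these names
CPU_NUMBER = os.cpu_count() or 1
MAX_THREADS_NUMBER = 4
NUMBER_OF_FILES_PER_THREAD = 1000


def normalize_threads_number(threads_number: Optional[int], number_of_files: int) -> int:
    if threads_number is None:
        # No explicit request: one thread per NUMBER_OF_FILES_PER_THREAD files, rounded up
        threads_number = math.ceil(number_of_files / NUMBER_OF_FILES_PER_THREAD)
    # Cap by the configured maximum and the CPU count, and never return fewer than one thread
    return max(1, min(threads_number, MAX_THREADS_NUMBER, CPU_NUMBER))


print(normalize_threads_number(None, 10))    # 1: a handful of files needs a single thread
print(normalize_threads_number(None, 0))     # 1: the lower bound still applies
print(normalize_threads_number(64, 50_000))  # at most MAX_THREADS_NUMBER
```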
Line range hint 200-228: The method `optimally_image_download` is well documented and implements a robust approach to downloading images optimally. However, consider handling exceptions that might arise during the download process, such as network issues or corrupted data, to ensure the application's stability.
+ try:
+     image_parser.feed(chunk)
+ except Exception as e:
+     slogger.glob.error(f"Failed to parse image chunk for '{key}': {str(e)}")
+     raise
Review Details
Files selected for processing (2)
- cvat/apps/engine/cloud_provider.py (8 hunks)
- cvat/settings/base.py (1 hunks)
Additional Context Used
Ruff (4)
cvat/settings/base.py (4)
34-34: Module level import not at top of file
36-36: Module level import not at top of file
80-80: `keys.secret_key.SECRET_KEY` imported but unused
718-718: Module level import not at top of file
Additional comments not posted (1)
cvat/apps/engine/cloud_provider.py (1)
35-43: The implementation of the `NamedBytesIO` class with `filename` as a property is a good approach for encapsulating file-related data. This should help in managing file operations more effectively.
Actionable comments posted: 3
Outside diff range and nitpick comments (4)
cvat/settings/base.py (1)
Line range hint 34-34: Move module-level imports to the top of the file to adhere to Python's PEP 8 style guide.
+ import mimetypes
+ import os
+ import sys
+ import tempfile
+ from datetime import timedelta
+ from enum import Enum
+ import urllib
+ from attr.converters import to_bool
+ from corsheaders.defaults import default_headers
+ from logstash_async.constants import constants as logstash_async_constants
+ from cvat import __version__
+ from pathlib import Path
+ from django.core.exceptions import ImproperlyConfigured
+ from cvat.rq_patching import update_started_job_registry_cleanup
Also applies to: 36-36, 718-718
cvat/apps/engine/cloud_provider.py (3)
9-9: Consider making the `math` import conditional or localized within the functions where it's used, if it's only needed for specific operations. This can help reduce the import overhead when the module is loaded but the functionality requiring `math` is not used.
48-70: The `normalize_threads_number` function is well implemented, ensuring that the number of threads does not exceed the limits set by the system or configuration. However, consider adding a brief comment explaining the logic, especially how it handles the `threads_number` parameter when it's `None`.
Line range hint 200-228: The method `optimally_image_download` is a significant improvement in handling image downloads: it first tries with a chunk and then downloads the full image if necessary. However, consider adding error handling for cases where the image cannot be parsed even after downloading the full file.
+ try:
+     image_parser.feed(chunk)
+ except Exception as ex:
+     slogger.glob.error(f"Failed to parse image from chunk: {ex}")
+     raise
Review Details
Files selected for processing (2)
- cvat/apps/engine/cloud_provider.py (8 hunks)
- cvat/settings/base.py (1 hunks)
Additional Context Used
Ruff (4)
cvat/settings/base.py (4)
34-34: Module level import not at top of file
36-36: Module level import not at top of file
80-80: `keys.secret_key.SECRET_KEY` imported but unused
718-718: Module level import not at top of file
Additional comments not posted (1)
cvat/apps/engine/cloud_provider.py (1)
35-43: The implementation of `NamedBytesIO` with a custom `filename` property is a good approach to enhance the functionality of `BytesIO`. This allows for better handling of file-related metadata, which is often necessary in cloud operations.
@coderabbitai, can you please remove all your comments from this PR?
@Marishka17, I have processed the request to remove all my comments from PR #7903. If there's anything else you need, feel free to let me know!
Analysis chain
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Fetch all comments made by @coderabbitai[bot] in PR #7903 and delete them
gh pr view 7903 --json comments --jq '.comments[] | select(.author.login == "coderabbitai[bot]") | .id' | xargs -I {} gh api -X DELETE /repos/cvat-ai/cvat/issues/comments/{}
Length of output: 179
Actions Performed
Comments resolved.
@zhiltsov-max, I've applied all your comments. Could you please take another look at the PR?
Actionable comments posted: 1
Review Details
Files selected for processing (3)
- cvat/apps/engine/cache.py (2 hunks)
- cvat/apps/engine/task.py (7 hunks)
- utils/dataset_manifest/__init__.py (1 hunks)
Files skipped from review as they are similar to previous changes (1)
- cvat/apps/engine/cache.py
Additional Context Used
Ruff (5)
cvat/apps/engine/task.py (1)
899-899: Test for membership should be `not in`
utils/dataset_manifest/__init__.py (4)
4-4: `.core.VideoManifestManager` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
4-4: `.core.ImageManifestManager` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
4-4: `.core.is_manifest` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
5-5: `.errors.InvalidManifestError` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
from .core import VideoManifestManager, ImageManifestManager, is_manifest
from .errors import InvalidManifestError
Consider removing unused imports, or explicitly including them in the `__all__` list if they are intended for external use.
  from .core import VideoManifestManager, ImageManifestManager, is_manifest
  from .errors import InvalidManifestError
+ __all__ = ['VideoManifestManager', 'ImageManifestManager', 'is_manifest', 'InvalidManifestError']
Committable suggestion was skipped due to low confidence.
… created with random sorting & cloud data & use_cache==True
7f91dd8 to 5c50222
Actionable comments posted: 2
Outside diff range and nitpick comments (4)
cvat/apps/engine/cloud_provider.py (3)
9-9: Consider making the `math` import conditional or localized within the functions where it's used, if it's only needed for specific operations. This can help reduce the import overhead when the module is loaded but the functionality requiring `math` is not used.
48-70: The `normalize_threads_number` function is well implemented, providing a robust way to determine the optimal number of threads based on system and user constraints. However, consider adding a brief comment explaining the logic, especially the use of the `min` and `max` functions, to improve code readability.
Line range hint 200-228: The method `optimally_image_download` introduces an efficient way to download images by first trying with a chunk and then downloading the full image if necessary. This is a good optimization. However, ensure that the logging level is appropriate for the warnings about insufficient chunk sizes, as these might clutter the log files during normal operation.
cvat/apps/engine/task.py (1)
Line range hint 900-900: The static analysis tool flagged a potential issue with a membership test on line 900, but this line is not included in the provided code snippet. This might be a false positive or an error in the tool's output.
Review Details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (2)
- cvat/apps/engine/cloud_provider.py (8 hunks)
- cvat/apps/engine/task.py (7 hunks)
Additional Context Used
Ruff (1)
cvat/apps/engine/task.py (1)
900-900: Test for membership should be 'not in'
cvat/apps/engine/cloud_provider.py (1)
35-43
: The implementation of theNamedBytesIO
class with afilename
property is a good addition for handling file names in memory streams. This should enhance the handling of file objects within the cloud storage operations.cvat/apps/engine/task.py (1)
484-504: The updates to _create_task_manifest_from_cloud_data effectively utilize the new parameters for frame slicing and dimension handling, aligning well with the PR's objectives to optimize cloud interactions.
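A minimal sketch of such a class, assuming a read/write filename property stored on the instance; the actual class in the PR may differ in details such as the default value.

```python
from io import BytesIO
from typing import Optional


class NamedBytesIO(BytesIO):
    """An in-memory byte stream that also remembers the name of the file it holds."""

    @property
    def filename(self) -> Optional[str]:
        # None until a name has been assigned.
        return getattr(self, "_filename", None)

    @filename.setter
    def filename(self, value: str) -> None:
        self._filename = value
```

Because it is still a BytesIO, the object can be passed anywhere a file-like object is expected, while code that builds the manifest can read the original key from filename.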
@@ -128,7 +162,7 @@ def get_file_last_modified(self, key):
         pass

     @abstractmethod
-    def download_fileobj(self, key):
+    def download_fileobj(self, key: str) -> NamedBytesIO:
The method download_fileobj is implemented in multiple classes but seems to have a similar structure. Consider refactoring to reduce duplication and improve maintainability. Perhaps a base class implementation could be used, with subclasses overriding only the necessary parts.
Also applies to: 551-559, 753-762, 867-872
         *,
         threads_number: Optional[int] = None,
         _use_optimal_downloading: bool = True,
-    ) -> List[BytesIO]:
+    ) -> Iterator[BytesIO]:
         func = self.optimally_image_download if _use_optimal_downloading else self.download_fileobj
-        if threads_number > 1:
-            with ThreadPool(threads_number) as pool:
-                return pool.map(func, files)
-        else:
-            slogger.glob.warning('Download files to memory in series in one thread.')
-            return [func(f) for f in files]
+        threads_number = normalize_threads_number(threads_number, len(files))
+
+        with ThreadPoolExecutor(max_workers=threads_number) as executor:
+            yield from executor.map(func, files)
The implementation of bulk_download_to_memory and bulk_download_to_dir using threading is a robust enhancement. However, ensure that exceptions during thread execution are handled to prevent the application from crashing. Consider wrapping the thread execution in a try-except block.
+ try:
+ with ThreadPoolExecutor(max_workers=threads_number) as executor:
+ yield from executor.map(func, files)
+ except Exception as ex:
+ slogger.glob.error(f"Error during bulk download: {ex}")
+ raise
Also applies to: 248-258
Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.
        *,
        threads_number: Optional[int] = None,
        _use_optimal_downloading: bool = True,
    ) -> Iterator[BytesIO]:
        func = self.optimally_image_download if _use_optimal_downloading else self.download_fileobj
        threads_number = normalize_threads_number(threads_number, len(files))
        try:
            with ThreadPoolExecutor(max_workers=threads_number) as executor:
                yield from executor.map(func, files)
        except Exception as ex:
            slogger.glob.error(f"Error during bulk download: {ex}")
            raise
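One detail worth keeping in mind with this suggestion: ThreadPoolExecutor.map already re-raises a worker's exception when the caller reaches that result, and because the function is a generator, the try block only runs while the caller iterates. Since the except clause re-raises, the wrapper adds logging rather than stopping the failure from reaching the caller. The sketch below demonstrates this with a generic bulk_map helper (a hypothetical name; func stands in for the download function, and the standard logging module replaces slogger).

```python
import logging
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")

logger = logging.getLogger(__name__)


def bulk_map(func: Callable[[T], R], items: List[T], threads_number: int) -> Iterator[R]:
    """Yield func(item) for each item, in input order, using a thread pool."""
    try:
        with ThreadPoolExecutor(max_workers=threads_number) as executor:
            # executor.map re-raises a worker's exception when iteration reaches
            # that item's result, so the except clause below observes it.
            yield from executor.map(func, items)
    except Exception as ex:
        logger.error("Error during bulk download: %s", ex)
        raise  # the caller still receives the original exception
```

Results that completed before the failing item are still delivered to the caller; the error surfaces only when iteration reaches the failed item.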
Actionable comments posted: 0
Review Details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (1)
- utils/dataset_manifest/__init__.py (1 hunks)
Files skipped from review as they are similar to previous changes (1)
- utils/dataset_manifest/__init__.py
@Marishka17, do you want to finalize the PR?
@zhiltsov-max, probably yes
Motivation and context
This PR contains several fixes:
use_cache in True/False)
How has this been tested?
This table contains metrics (required time and memory usage) obtained before and after this fix.
_create_task_manifest_from_cloud_data (MB)
The test dataset contains 4000 images with a total size of about 400 MB; on average, 64 KB was downloaded per image.
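A back-of-the-envelope check of the quoted figures, assuming decimal units (1 MB = 1000 KB) and that the 400 MB total and the 64 KB per-image average are directly comparable. It covers data transfer only, not the measured time or memory.

```python
# Figures quoted in the PR description; decimal units assumed.
images = 4000
dataset_size_mb = 400
downloaded_per_image_kb = 64

average_image_kb = dataset_size_mb * 1000 / images               # 100.0 KB per image
downloaded_total_mb = images * downloaded_per_image_kb / 1000    # 256.0 MB in total
fraction_downloaded = downloaded_total_mb / dataset_size_mb      # 0.64 of the dataset
```

Under these assumptions, roughly 64% of the dataset's bytes were transferred instead of all of them.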
Checklist
develop branch (cvat-canvas, cvat-core, cvat-data and cvat-ui)
License
Feel free to contact the maintainers if that's a concern.
Summary by CodeRabbit
New Features
Named and NamedBytesIO for better handling of file attributes in image processing.
Improvements
DatasetImagesReader.
Configuration
Bug Fixes
Documentation