⚡️ Speed up function _match_cell_ids_by_similarity by 33%
#626
📄 33% (0.33x) speedup for `_match_cell_ids_by_similarity` in `marimo/_utils/cell_matching.py`
⏱️ Runtime: 570 milliseconds → 428 milliseconds (best of 21 runs)
📝 Explanation and details
This optimization achieves a 33% speedup through several targeted micro-optimizations that reduce overhead in computationally intensive functions:
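As a quick check on the headline number, the sketch below recomputes the relative speedup from the two best-of-21 runtimes. It also shows one way to take a best-of-N measurement with the standard-library `timeit` module. The helper names `speedup` and `best_of` are hypothetical and belong to neither marimo nor codeflash. This illustrates the arithmetic only and is not codeflash's own benchmarking harness.

```python
import timeit
from typing import Callable


def speedup(before_s: float, after_s: float) -> float:
    """Hypothetical helper: relative gain of `after` over `before` (0.33 means 33%)."""
    return before_s / after_s - 1.0


def best_of(fn: Callable[[], object], repeat: int = 21, number: int = 1) -> float:
    """Hypothetical helper: fastest per-call time in seconds over `repeat` timing runs.

    Taking the minimum filters out noise from other processes. This is why
    benchmark reports often quote a "best of N runs" figure.
    """
    return min(timeit.repeat(fn, repeat=repeat, number=number)) / number


# Runtimes reported in this PR: 570 ms before, 428 ms after.
print(f"{speedup(0.570, 0.428):.0%}")  # 33%

# Timing an arbitrary callable the same way (a toy workload, not marimo code).
print(best_of(lambda: sorted(range(10_000), reverse=True)) > 0)
```

Put differently, 570 / 428 ≈ 1.33, so the optimized code runs roughly 1.33 times as fast. That appears to be what the "(0.33x)" label expresses.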
Key Optimizations:
- `similarity_score` (74.6% of the original runtime): eliminated expensive string operations by replacing the `s1[::-1]` and `s2[::-1]` string reversals with direct index-based suffix scanning. This avoids creating new string objects and uses tight `while` loops instead of slower `zip()` iteration.
- `pop_local`: replaced `min()` with a lambda key function (which had high per-call overhead) with a direct `for` loop that tracks the best match manually. This is significantly faster for the small list sizes typically encountered.
- `_hungarian_algorithm`: added local variable caching (`score_matrix_i = score_matrix[i]`) to avoid repeated list lookups in nested loops. It also optimizes uncovered-cell detection by pre-computing masks rather than checking conditions repeatedly.
- `group_lookup` and `extract_order`: minor optimizations, including caching `setdefault` as a local variable and pre-allocating lists of the correct size.

Why This Matters:
The function is called from `match_cell_ids_by_similarity()`, which appears to be used for matching cells in notebook operations, likely during cell reordering, copying, or merging. The test results show consistent 30-35% speedups across all scenarios. The optimizations are most effective for workloads involving many cells or frequent cell matching operations, where the cumulative effect of these micro-optimizations adds up to substantial performance gains.
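The `similarity_score` change matters most because that function accounted for 74.6% of the original runtime. This PR does not show marimo's actual scoring formula. The sketch below therefore uses a hypothetical measure, the length of the common prefix plus the common suffix, purely to contrast the two techniques:

- The reference version builds reversed copies with `s1[::-1]` / `s2[::-1]` and walks them with `zip()`.
- The optimized version scans both strings in place with index-based `while` loops.

Both function names are illustrative and are not marimo's API.

```python
def _matching_run(pairs) -> int:
    """Count leading pairs whose two elements are equal."""
    n = 0
    for a, b in pairs:
        if a != b:
            break
        n += 1
    return n


def affix_overlap_reversed(s1: str, s2: str) -> int:
    """Reference version: reversal + zip(), allocating two new strings per call."""
    limit = min(len(s1), len(s2))
    prefix = _matching_run(zip(s1, s2))
    suffix = _matching_run(zip(s1[::-1], s2[::-1]))
    # Do not count characters twice when prefix and suffix would overlap.
    return prefix + min(suffix, limit - prefix)


def affix_overlap_indexed(s1: str, s2: str) -> int:
    """Optimized version: index-based scanning, no intermediate strings."""
    limit = min(len(s1), len(s2))
    prefix = 0
    while prefix < limit and s1[prefix] == s2[prefix]:
        prefix += 1
    # Walk backwards from both ends. The bound keeps the suffix from
    # re-counting characters already matched by the prefix.
    i1, i2 = len(s1) - 1, len(s2) - 1
    suffix = 0
    while suffix < limit - prefix and s1[i1] == s2[i2]:
        i1 -= 1
        i2 -= 1
        suffix += 1
    return prefix + suffix


print(affix_overlap_indexed("abcXYZ", "abcQYZ"))  # 5
print(affix_overlap_indexed("x = 1\ny = 2", "x = 1\ny = 3"))  # 10
```

The two versions return the same value for every input. The indexed one simply skips the per-call allocations, which is the kind of saving described for `similarity_score`.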
✅ Correctness verification report:
⚙️ Existing Unit Tests and Runtime
- `_ast/test_cell_manager.py::TestCellMatching.test_completely_different_codes`
- `_ast/test_cell_manager.py::TestCellMatching.test_empty_lists`
- `_ast/test_cell_manager.py::TestCellMatching.test_empty_strings`
- `_ast/test_cell_manager.py::TestCellMatching.test_exact_matches`
- `_ast/test_cell_manager.py::TestCellMatching.test_fewer_next_cells`
- `_ast/test_cell_manager.py::TestCellMatching.test_left_inexact_matches_with_dupes`
- `_ast/test_cell_manager.py::TestCellMatching.test_more_next_cells`
- `_ast/test_cell_manager.py::TestCellMatching.test_outer_inexact_matches`
- `_ast/test_cell_manager.py::TestCellMatching.test_outer_inexact_matches_with_dupes`
- `_ast/test_cell_manager.py::TestCellMatching.test_reordered_codes`
- `_ast/test_cell_manager.py::TestCellMatching.test_right_inexact_matches_with_dupes`
- `_ast/test_cell_manager.py::TestCellMatching.test_similar_but_not_exact_matches`
- `_ast/test_cell_manager.py::TestCellMatching.test_similar_but_not_exact_matches_with_dupes`
- `_ast/test_cell_manager.py::TestCellMatchingEdgeCases.test_all_codes_being_substrings`
- `_ast/test_cell_manager.py::TestCellMatchingEdgeCases.test_completely_different_codes_edge_case`
- `_ast/test_cell_manager.py::TestCellMatchingEdgeCases.test_empty_strings_edge_case`
- `_ast/test_cell_manager.py::TestCellMatchingEdgeCases.test_identical_codes`
- `_ast/test_cell_manager.py::TestCellMatchingEdgeCases.test_maximum_length_differences`
- `_ast/test_cell_manager.py::TestCellMatchingEdgeCases.test_mixed_case_sensitivity`
- `_ast/test_cell_manager.py::TestCellMatchingEdgeCases.test_multiple_identical_codes_in_next`
- `_ast/test_cell_manager.py::TestCellMatchingEdgeCases.test_multiple_identical_codes_in_prev`
- `_ast/test_cell_manager.py::TestCellMatchingEdgeCases.test_similar_reduction`
- `_ast/test_cell_manager.py::TestCellMatchingEdgeCases.test_special_python_syntax`
- `_ast/test_cell_manager.py::TestCellMatchingEdgeCases.test_unicode_and_special_characters`
- `_ast/test_cell_manager.py::TestCellMatchingEdgeCases.test_very_long_common_prefixes_suffixes`
- `_ast/test_cell_manager.py::TestCellMatchingEdgeCases.test_whitespace_variations`

🌀 Generated Regression Tests and Runtime
To edit these changes, `git checkout codeflash/optimize-_match_cell_ids_by_similarity-mhwr4v88` and push.