Bug: Wrong memory overwrite leads to a violation of the HNSW algorithm, which decreases the recall rate. #576

@ycw66

Describe the bug

Problem statement

A critical memory corruption occurs in form_reverse_links_ when new_neighbors becomes invalidated during iteration. new_neighbors is a non-owning view of context.top (context.top_candidates in the code below, aliased as top inside the function). While the loop traverses new_neighbors, the function clears and refills top to prepare the input for refine_, which overwrites the memory that new_neighbors points to.

Every access to new_neighbors after the first pass through the refine_ path therefore reads a stale, overwritten memory range, which violates the HNSW algorithm.

In short, the following code has a correctness problem: it creates links between nodes that the original HNSW algorithm would never produce.
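The aliasing described above can be reproduced in isolation. The following is a minimal sketch, not USearch code: `scratch_t`, `view_t`, and `visit_while_reusing_scratch` are hypothetical stand-ins for the reusable candidates buffer, the non-owning view, and the loop that reuses the buffer while the view is still being traversed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a reusable, fixed-capacity scratch buffer.
struct scratch_t {
    int slots[8];
    std::size_t count = 0;
    void clear() { count = 0; }
    void insert(int slot) {
        assert(count < 8 && "capacity must be reserved up front");
        slots[count++] = slot;
    }
};

// Hypothetical stand-in for a non-owning view: a pointer plus a length.
struct view_t {
    int const* data;
    std::size_t count;
};

// Traverses a view over `scratch`, but reuses `scratch` as temporary
// storage after visiting the first element.
std::vector<int> visit_while_reusing_scratch() {
    scratch_t scratch;
    scratch.insert(10);
    scratch.insert(20);
    scratch.insert(30);
    view_t neighbors{scratch.slots, scratch.count};

    std::vector<int> visited;
    for (std::size_t i = 0; i != neighbors.count; ++i) {
        visited.push_back(neighbors.data[i]);
        if (i == 0) {
            // The buffer is cleared and refilled while the view is still live.
            scratch.clear();
            scratch.insert(77);
            scratch.insert(88);
            scratch.insert(99);
        }
    }
    return visited;
}
```

In this sketch the loop visits 10, 88, 99 instead of 10, 20, 30: the view still points at the same storage, but the storage now holds unrelated values.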

form_reverse_links_ function is as follows:

    template <typename value_at, typename metric_at>
    void form_reverse_links_( //
        metric_at&& metric, compressed_slot_t new_slot, candidates_view_t new_neighbors, value_at&& value,
        level_t level, context_t& context) usearch_noexcept_m {

        top_candidates_t& top = context.top_candidates;
        std::size_t const connectivity_max = level ? config_.connectivity : config_.connectivity_base;

        // Reverse links from the neighbors:
        for (auto new_neighbor : new_neighbors) {
            compressed_slot_t close_slot = new_neighbor.slot;
            if (close_slot == new_slot)
                continue;
            node_lock_t close_lock = node_lock_(close_slot);
            node_t close_node = node_at_(close_slot);
            neighbors_ref_t close_header = neighbors_(close_node, level);

            // The node may have no neighbors only in one case, when it's the first one in the index,
            // but that is problematic to track in multi-threaded environments, where the order of insertion
            // is not guaranteed.
            // usearch_assert_m(close_header.size() || new_slot == 1, "Possible corruption - isolated node");
            usearch_assert_m(close_header.size() <= connectivity_max, "Possible corruption - overflow");
            usearch_assert_m(close_slot != new_slot, "Self-loops are impossible");
            usearch_assert_m(level <= close_node.level(), "Linking to missing level");

            // If `new_slot` is already present in the neighboring connections of `close_slot`
            // then no need to modify any connections or run the heuristics.
            if (close_header.size() < connectivity_max) {
                close_header.push_back(new_slot);
                continue;
            }

            // To fit a new connection we need to drop an existing one.
            top.clear();
            usearch_assert_m((top.capacity() >= (close_header.size() + 1)),
                             "The memory must have been reserved in `add`");
            top.insert_reserved({context.measure(value, citerator_at(close_slot), metric), new_slot});
            for (compressed_slot_t successor_slot : close_header)
                top.insert_reserved(
                    {context.measure(citerator_at(close_slot), citerator_at(successor_slot), metric), successor_slot});

            // Export the results:
            close_header.clear();
            candidates_view_t top_view =
                refine_(metric, connectivity_max, top, context, context.computed_distances_in_reverse_refines);
            usearch_assert_m(top_view.size(), "This would lead to isolated nodes");
            for (std::size_t idx = 0; idx != top_view.size(); idx++)
                close_header.push_back(top_view[idx].slot);
        }
    }
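One possible mitigation, sketched here under assumptions rather than taken from the project, is to copy the neighbor slots out of the shared buffer before the loop starts, so that later reuse of the buffer cannot affect the traversal. `candidate_t`, `scratch_t`, and `link_with_snapshot` are hypothetical names, and the body of the loop only imitates the clear-and-refill step.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical candidate record: a distance and a node slot.
struct candidate_t {
    float distance;
    unsigned slot;
};

// Hypothetical scratch buffer shared between search and refinement.
using scratch_t = std::vector<candidate_t>;

// Visits every neighbor stored in `scratch`, even though `scratch` is
// cleared and refilled inside the loop.
std::vector<unsigned> link_with_snapshot(scratch_t& scratch) {
    // Snapshot the slots first; the loop never reads `scratch` again.
    std::vector<unsigned> neighbor_slots;
    neighbor_slots.reserve(scratch.size());
    for (candidate_t const& candidate : scratch)
        neighbor_slots.push_back(candidate.slot);
    assert(neighbor_slots.size() == scratch.size());

    std::vector<unsigned> visited;
    for (unsigned slot : neighbor_slots) {
        visited.push_back(slot);
        // Stand-in for the refinement path that reuses the buffer.
        scratch.clear();
        scratch.push_back({0.5f, slot + 100});
    }
    return visited;
}
```

The snapshot costs one small allocation per call in this sketch. A real fix would more likely use a second preallocated buffer in the context so that the hot path stays allocation-free.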

Steps to reproduce

  1. Write a test program that adds vectors to a usearch index one by one, about 1000 vectors in total.
  2. Set a breakpoint in the function form_reverse_links_, at this exact code position: link
  3. Run the test program until it hits the breakpoint.
  4. Inspect the elements of new_neighbors at the breakpoint.
  5. Step through the program. The contents of new_neighbors change, although they should stay the same.

Expected behavior

During the execution of the function form_reverse_links_, the contents of new_neighbors should never change.
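This expectation can also be checked in code instead of by watching values in a debugger. The helper below is a hypothetical sketch, not part of USearch: it snapshots a range of slots, runs an arbitrary piece of work, and reports whether the range still holds the same values afterwards.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Returns true if the `count` values starting at `data` are identical
// before and after `body` runs.
bool unchanged_across(unsigned const* data, std::size_t count, std::function<void()> const& body) {
    assert(data != nullptr || count == 0);
    std::vector<unsigned> before(data, data + count);
    body();
    for (std::size_t i = 0; i != count; ++i)
        if (data[i] != before[i])
            return false;
    return true;
}
```

Wrapping one loop iteration in such a check, in a debug build only, would flag the first iteration that overwrites the viewed memory.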

USearch version

latest

Operating System

Ubuntu 24.04

Hardware architecture

x86

Which interface are you using?

C++ implementation

Contact Details

No response

Are you open to being tagged as a contributor?

  • I am open to being mentioned in the project .git history as a contributor

Is there an existing issue for this?

  • I have searched the existing issues

Code of Conduct

  • I agree to follow this project's Code of Conduct

Labels

bug
