Description
Describe the bug
Problem statement
A critical memory corruption occurs in form_reverse_links_ when new_neighbors is invalidated during iteration. new_neighbors is only a view of context.top_candidates (aliased as top inside the function). While new_neighbors is being traversed, top is cleared and refilled so that refine_ can run on it, which overwrites the memory that new_neighbors points to. Every access to new_neighbors after the first refine_ call therefore reads a stale or corrupted memory range, which violates the HNSW algorithm.
In short, the following code has a correctness problem: it creates wrong links between nodes that would never appear in the original HNSW algorithm.
The form_reverse_links_ function is as follows:
template <typename value_at, typename metric_at>
void form_reverse_links_( //
    metric_at&& metric, compressed_slot_t new_slot, candidates_view_t new_neighbors, value_at&& value,
    level_t level, context_t& context) usearch_noexcept_m {

    top_candidates_t& top = context.top_candidates;
    std::size_t const connectivity_max = level ? config_.connectivity : config_.connectivity_base;

    // Reverse links from the neighbors:
    for (auto new_neighbor : new_neighbors) {
        compressed_slot_t close_slot = new_neighbor.slot;
        if (close_slot == new_slot)
            continue;
        node_lock_t close_lock = node_lock_(close_slot);
        node_t close_node = node_at_(close_slot);
        neighbors_ref_t close_header = neighbors_(close_node, level);

        // The node may have no neighbors only in one case, when it's the first one in the index,
        // but that is problematic to track in multi-threaded environments, where the order of insertion
        // is not guaranteed.
        // usearch_assert_m(close_header.size() || new_slot == 1, "Possible corruption - isolated node");
        usearch_assert_m(close_header.size() <= connectivity_max, "Possible corruption - overflow");
        usearch_assert_m(close_slot != new_slot, "Self-loops are impossible");
        usearch_assert_m(level <= close_node.level(), "Linking to missing level");

        // If `new_slot` is already present in the neighboring connections of `close_slot`
        // then no need to modify any connections or run the heuristics.
        if (close_header.size() < connectivity_max) {
            close_header.push_back(new_slot);
            continue;
        }

        // To fit a new connection we need to drop an existing one.
        top.clear();
        usearch_assert_m((top.capacity() >= (close_header.size() + 1)),
                         "The memory must have been reserved in `add`");
        top.insert_reserved({context.measure(value, citerator_at(close_slot), metric), new_slot});
        for (compressed_slot_t successor_slot : close_header)
            top.insert_reserved(
                {context.measure(citerator_at(close_slot), citerator_at(successor_slot), metric), successor_slot});

        // Export the results:
        close_header.clear();
        candidates_view_t top_view =
            refine_(metric, connectivity_max, top, context, context.computed_distances_in_reverse_refines);
        usearch_assert_m(top_view.size(), "This would lead to isolated nodes");
        for (std::size_t idx = 0; idx != top_view.size(); idx++)
            close_header.push_back(top_view[idx].slot);
    }
}
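The aliasing pattern in the function above can be reproduced in isolation. The following is a minimal sketch, not USearch code: candidate_t, view_t, buffer_t and iterate_while_reusing are hypothetical stand-ins that only mimic the relationship between new_neighbors (a non-owning view) and the buffer it points into.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for illustration only; these are NOT USearch types.
struct candidate_t {
    float distance;
    unsigned slot;
};

// Non-owning view: a pointer plus a length, like a view of a candidates buffer.
struct view_t {
    candidate_t const* data;
    std::size_t count;
    candidate_t const* begin() const { return data; }
    candidate_t const* end() const { return data + count; }
};

// Fixed-capacity buffer that is cleared and refilled in place.
struct buffer_t {
    std::array<candidate_t, 8> items{};
    std::size_t count = 0;
    void clear() { count = 0; }
    void push(candidate_t c) {
        assert(count < items.size());
        items[count++] = c;
    }
    view_t view() const { return {items.data(), count}; }
};

// Iterates a view of `top` while `top` itself is reused as scratch space,
// and returns the slots that the loop actually observed.
std::vector<unsigned> iterate_while_reusing(buffer_t& top) {
    view_t neighbors = top.view(); // aliases the storage of `top`
    std::vector<unsigned> seen;
    bool reused = false;
    for (candidate_t const& neighbor : neighbors) {
        seen.push_back(neighbor.slot);
        if (!reused) {
            // Simulates the "drop an existing connection" branch:
            // the buffer behind the view is cleared and refilled.
            top.clear();
            top.push({0.5f, 100});
            top.push({0.6f, 101});
            top.push({0.7f, 102});
            reused = true;
        }
    }
    return seen;
}
```

If the buffer initially holds slots 1, 2 and 3, the loop observes 1, 101, 102: the first element is read before the buffer is reused, and the remaining ones come from the refilled storage rather than from the original neighbor list.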
Steps to reproduce
- Write a test program that continuously adds vectors to a USearch index one by one, about 1000 vectors in total.
- Set a breakpoint in the function form_reverse_links_. The exact code position: link
- Run the test program until it hits the breakpoint.
- Watch the elements of new_neighbors when you reach the breakpoint.
- Step through the program; you will see the contents of new_neighbors change, although they should not.
Expected behavior
During the execution of form_reverse_links_, the contents of new_neighbors should never change.
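One way to get this behavior, shown here only as a hedged sketch and not as the maintainers' fix, is to copy the viewed elements into storage owned by the loop before the shared buffer is reused. The types and the function iterate_over_snapshot below are hypothetical stand-ins, not USearch APIs.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for illustration only; these are NOT USearch types.
struct candidate_t {
    float distance;
    unsigned slot;
};

// Non-owning view: a pointer plus a length.
struct view_t {
    candidate_t const* data;
    std::size_t count;
    candidate_t const* begin() const { return data; }
    candidate_t const* end() const { return data + count; }
};

// Fixed-capacity buffer that is cleared and refilled in place.
struct buffer_t {
    std::array<candidate_t, 8> items{};
    std::size_t count = 0;
    void clear() { count = 0; }
    void push(candidate_t c) {
        assert(count < items.size());
        items[count++] = c;
    }
    view_t view() const { return {items.data(), count}; }
};

// Copies the view into an owned snapshot first, so reusing `top`
// inside the loop cannot change what the loop iterates over.
std::vector<unsigned> iterate_over_snapshot(buffer_t& top) {
    view_t neighbors = top.view();
    std::vector<candidate_t> snapshot(neighbors.begin(), neighbors.end());
    std::vector<unsigned> seen;
    for (candidate_t const& neighbor : snapshot) {
        seen.push_back(neighbor.slot);
        // The shared buffer is reused on every iteration,
        // but the snapshot does not alias it.
        top.clear();
        top.push({0.5f, 100});
        top.push({0.6f, 101});
        top.push({0.7f, 102});
    }
    return seen;
}
```

With slots 1, 2 and 3 in the buffer, the loop observes exactly 1, 2, 3. The copy costs one small allocation per call; an alternative with the same effect would be to run the refinement on a separate scratch buffer that the view never points into.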
USearch version
latest
Operating System
Ubuntu 24.04
Hardware architecture
x86
Which interface are you using?
C++ implementation
Contact Details
No response
Are you open to being tagged as a contributor?
- I am open to being mentioned in the project .git history as a contributor
Is there an existing issue for this?
- I have searched the existing issues
Code of Conduct
- I agree to follow this project's Code of Conduct