ReadAccessViolation Exception #517

Open
mfranzen0906 opened this issue Nov 5, 2024 · 2 comments

mfranzen0906 commented Nov 5, 2024

I am currently debugging one of our programs, which uses DDS in the background. We get an exception indicating that, in the EntityDelegate obtain_callback_lock() function, this->callback_mutex no longer exists. Is this a known problem?

Version: 0.10.5

When we use a lock_guard in our program before calling the dds functions, it works fine.
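
The workaround described above can be sketched roughly as follows. This is a minimal illustration, not the actual application code: `g_dds_mutex`, `write_sample`, `run_writers` and `fake_write` are hypothetical names, and `fake_write` merely stands in for the real DDS write call so that the sketch is self-contained.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical process-wide mutex guarding every call into the DDS layer.
static std::mutex g_dds_mutex;

// Stand-in for the real DDS write call (an assumption, not a Cyclone DDS API).
// The counter is deliberately non-atomic, so the lock is what protects it.
static int g_written = 0;
static void fake_write(int /*sample*/) { ++g_written; }

// Serialize all writes from all threads behind one lock_guard.
void write_sample(int sample)
{
  std::lock_guard<std::mutex> guard(g_dds_mutex);
  fake_write(sample);
}

// Spawn `threads` writers that each publish `per_thread` samples and
// return the total number of writes that went through.
int run_writers(int threads, int per_thread)
{
  g_written = 0;
  std::vector<std::thread> pool;
  for (int t = 0; t < threads; ++t)
    pool.emplace_back([per_thread] {
      for (int i = 0; i < per_thread; ++i)
        write_sample(i);
    });
  for (auto &th : pool)
    th.join();
  return g_written;
}
```

A global lock like this serializes the writers completely, so it may hide a race rather than explain it.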

I looked into the da_or_dor_cb_invoke function in ddsc.dll and wondered why you unlock a mutex and later lock it again. I changed this to a lock followed by an unlock, which seemed to work better, without crashes. Yet the comments in the code suggest that this was done on purpose.

Here is the debug trace:

Exception: ReadAccessViolation

>	ddscxx.dll!org::eclipse::cyclonedds::core::EntityDelegate::obtain_callback_lock() Line 283	C++
 	ddscxx.dll!callback_on_data_available(int reader, void * arg) Line 178	C++
 	ddsc.dll!da_or_dor_cb_invoke(dds_reader * rd, const dds_listener * const lst, unsigned int status_and_mask, bool async) Line 240	C
 	ddsc.dll!dds_reader_data_available_cb(dds_reader * rd) Line 263	C
 	ddsc.dll!dds_rhc_default_store(ddsi_rhc * rhc_common, const ddsi_writer_info * wrinfo, ddsi_serdata * sample, ddsi_tkmap_instance * tk) Line 1756	C
 	ddsc.dll!ddsi_rhc_store(ddsi_rhc * rhc, const ddsi_writer_info * wrinfo, ddsi_serdata * sample, ddsi_tkmap_instance * tk) Line 67	C
 	ddsc.dll!deliver_locally_fastpath(ddsi_domaingv * gv, ddsi_entity_common * source_entity, bool source_entity_locked, ddsi_local_reader_ary * fastpath_rdary, const ddsi_writer_info * wrinfo, const deliver_locally_ops * ops, void * vsourceinfo) Line 238	C
 	ddsc.dll!deliver_locally_allinsync(ddsi_domaingv * gv, ddsi_entity_common * source_entity, bool source_entity_locked, ddsi_local_reader_ary * fastpath_rdary, const ddsi_writer_info * wrinfo, const deliver_locally_ops * ops, void * vsourceinfo) Line 264	C
 	ddsc.dll!deliver_locally(ddsi_writer * wr, ddsi_serdata * payload, ddsi_tkmap_instance * tk) Line 194	C
 	ddsc.dll!deliver_data_any(thread_state * const thrst, ddsi_writer * ddsi_wr, dds_writer * wr, ddsi_serdata_any * d, nn_xpack * xp, bool flush) Line 286	C
 	ddsc.dll!dds_writecdr_impl_common(ddsi_writer * ddsi_wr, nn_xpack * xp, ddsi_serdata_any * din, bool flush, dds_writer * wr) Line 321	C
 	ddsc.dll!dds_write_impl_plain(dds_writer * wr, ddsi_writer * ddsi_wr, bool writekey, const void * data, __int64 tstamp, dds_write_action action) Line 572	C
 	ddsc.dll!dds_write_impl(dds_writer * wr, const void * data, __int64 tstamp, dds_write_action action) Line 599	C
 	ddsc.dll!dds_write(int writer, const void * data) Line 55	C
 	ddscxx.dll!org::eclipse::cyclonedds::pub::AnyDataWriterDelegate::write(int writer, const void * data, const dds::core::TInstanceHandle<org::eclipse::cyclonedds::core::InstanceHandleDelegate> & handle, const dds::core::Time & timestamp) Line 243	C++

eboasson commented Nov 5, 2024

Hi @mfranzen0906, it is not a known problem. I know the C++ binding just doesn't get the care that the core C code gets, mostly because I am not a C++ user myself, so I naturally gravitate to thinking that the crash is due to something in the C++ binding itself. Not that it helps you much.

Is there anything interesting in the circumstances under which this crash happens, for example, that it is always correlated with deleting a reader? That would make some sense, and at least gives a starting point.

> I looked into the ddsc.dll da_or_dor_cb_invoke function and wondered why you unlock a mutex and later lock it again.

If I am thinking of the right thing, it is so that you can read the status word from inside a listener without deadlocking, and so that unrelated operations that need to set a bit in there don't have to wait for the callback to complete. It goes through quite some effort to serialize all listener callbacks (and setting listeners) on an entity, so there is not much else to be gained.
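
The reasoning above can be illustrated with a simplified sketch. It is not the Cyclone DDS implementation: `Entity`, `get_status` and `notify_data_available` are hypothetical names, a `std::mutex` plays the role of `m_observers_lock`, and the separate mechanism that serializes listener invocations is left out.

```cpp
#include <cstdint>
#include <functional>
#include <mutex>

// Hypothetical, simplified stand-in for an entity with a status word.
struct Entity
{
  std::mutex observers_lock;   // plays the role of m_observers_lock
  std::uint32_t status = 0;    // status word, protected by observers_lock
  std::function<void(Entity &)> on_data_available;
};

// Read the status word. This is safe to call from inside a listener only
// because the invoking code releases observers_lock before the call.
std::uint32_t get_status(Entity &e)
{
  std::lock_guard<std::mutex> guard(e.observers_lock);
  return e.status;
}

// Set a status bit, then invoke the listener with the lock released.
void notify_data_available(Entity &e, std::uint32_t bit)
{
  std::unique_lock<std::mutex> lock(e.observers_lock);
  e.status |= bit;
  if (e.on_data_available)
  {
    lock.unlock();            // holding a non-recursive mutex here would deadlock
    e.on_data_available(e);   // the listener may call get_status()
    lock.lock();              // re-acquire before touching shared state again
  }
}
```

If the order were reversed (lock before the callback, unlock after it), a listener calling `get_status` would try to take a mutex its own thread already holds.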

I might remember something else, though ... if so, I'll add another comment


mfranzen0906 commented Nov 6, 2024

Hi @eboasson,
Our program has multiple threads in which writers are created to send data. Maybe there is a race condition.

In the debugger I can see that the memory is indeed corrupt and cannot be read.

[Image: debugger screenshot]

If you go one frame up the call stack to ddscxx.dll!callback_on_data_available(int reader, void * arg) Line 178 C++, the debugger shows that la->cpp_reference is already corrupt (this is the pointer that becomes this in ddscxx.dll!org::eclipse::cyclonedds::core::EntityDelegate::obtain_callback_lock()).

In dds_reader_data_available_cb, something is locked, then da_or_dor_cb_invoke is called, and afterwards it is unlocked again.

```c
else
{
  // "lock" listener object so we can look at "lst" without holding m_observers_lock
  data_avail_cb_enter_listener_exclusive_access (&rd->m_entity);
  signal = da_or_dor_cb_invoke(rd, lst, status_and_mask, true);
  data_avail_cb_leave_listener_exclusive_access (&rd->m_entity);
}
```

This excerpt from da_or_dor_cb_invoke calls on_data_available, where a reinterpret_cast happens. At that point cpp_reference no longer exists:

```c
else if(rd->m_entity.m_listener.on_data_available)
{
  if (!(lst->reset_on_invoke & DDS_DATA_AVAILABLE_STATUS))
    signal = data_avail_cb_set_status (&rd->m_entity, status_and_mask);
  ddsrt_mutex_unlock (&rd->m_entity.m_observers_lock);
  lst->on_data_available (rd->m_entity.m_hdllink.hdl, lst->on_data_available_arg);
  ddsrt_mutex_lock (&rd->m_entity.m_observers_lock);
}
```

```cpp
DDS_FN_EXPORT void callback_on_data_available (dds_entity_t reader, void* arg)
{
  org::eclipse::cyclonedds::core::ListenerArg *la =
    reinterpret_cast<org::eclipse::cyclonedds::core::ListenerArg *>(arg);
  // ...
}
```

As I mentioned above, we have multiple threads that send data at startup. Could the problem be that enter_listener_exclusive_access and ddsrt_mutex_unlock happen at a bad moment, when a second thread is also trying to send data? It is just strange that this happens rather often.
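
To make the suspected failure mode concrete, here is a hedged sketch of a callback that receives its C++ object through an untyped pointer. It does not describe how ddscxx actually manages ListenerArg: `Delegate`, `ListenerArgSketch` and `callback_sketch` are hypothetical names, and the `std::weak_ptr` is an assumption chosen only to show how a destroyed delegate could be detected instead of dereferenced.

```cpp
#include <memory>
#include <mutex>

// Hypothetical stand-in for a C++ delegate that owns a callback mutex.
struct Delegate
{
  std::mutex callback_mutex;
  int callbacks_seen = 0;
};

// Hypothetical listener argument: it holds a weak reference instead of a raw
// pointer, so the callback can tell whether the delegate is still alive.
struct ListenerArgSketch
{
  std::weak_ptr<Delegate> cpp_reference;
};

// C-style callback: the argument arrives as an untyped pointer.
// Returns true if the delegate was still alive and the callback ran.
bool callback_sketch(void *arg)
{
  auto *la = static_cast<ListenerArgSketch *>(arg);
  std::shared_ptr<Delegate> self = la->cpp_reference.lock();
  if (!self)
    return false;   // delegate already destroyed: skip instead of crashing
  std::lock_guard<std::mutex> guard(self->callback_mutex);
  ++self->callbacks_seen;
  return true;
}
```

With a raw pointer in place of the `std::weak_ptr`, the second kind of call (after the delegate is gone) would read freed memory, which would look much like the access violation in the trace. The sketch still assumes that the `ListenerArgSketch` object itself outlives every callback.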
