Open
Labels
kind/bug: Something isn't working
Description
A race condition exists during process shutdown when Prometheus metrics are enabled. With my very rusty C++ debugging skills, I believe it is caused by the following:
The HTTP server thread iterates the thread table via threadinfo_map_t::loop() while the event processing thread removes entries via sinsp_thread_manager::remove_thread(). The underlying std::unordered_map is not thread safe, so in what appears to be a rare case this can result in a nullptr dereference (sinsp_threadinfo::get_fd_table(this=0x0)), which causes a SEGFAULT and crashes the process.
This results in noisy process monitoring metrics, and false alarms during normal k8s cluster churn at scale.
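To make the suspected race concrete, here is a minimal sketch of one way a table like this could be guarded: a reader-writer lock, so that iteration and removal can no longer overlap. It is not the libsinsp implementation or a proposed patch. `thread_entry`, `guarded_thread_table`, and `run_concurrent_demo` are hypothetical names invented for illustration, and the sketch assumes C++17 for `std::shared_mutex`.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <memory>
#include <mutex>
#include <shared_mutex>
#include <thread>
#include <unordered_map>

// Hypothetical stand-in for sinsp_threadinfo; not the real libsinsp type.
struct thread_entry {
    int64_t tid;
    int open_fds;
};

// Hypothetical guarded table. This only illustrates the locking pattern;
// it does not mirror the real threadinfo_map_t.
class guarded_thread_table {
public:
    void put(int64_t tid, int open_fds) {
        // Exclusive lock: no reader can be iterating while the map is modified.
        std::unique_lock<std::shared_mutex> lock(m_mutex);
        m_threads[tid] = std::make_shared<thread_entry>(thread_entry{tid, open_fds});
    }

    void remove(int64_t tid) {
        std::unique_lock<std::shared_mutex> lock(m_mutex);
        m_threads.erase(tid);
    }

    // Visit every entry under a shared lock; stop early if the callback
    // returns false. The callback must not call put() or remove(), since
    // that would try to take the exclusive lock while the shared lock is held.
    bool loop(const std::function<bool(const thread_entry&)>& callback) const {
        std::shared_lock<std::shared_mutex> lock(m_mutex);
        for (const auto& kv : m_threads) {
            if (kv.second && !callback(*kv.second)) {
                return false;
            }
        }
        return true;
    }

private:
    mutable std::shared_mutex m_mutex;
    std::unordered_map<int64_t, std::shared_ptr<thread_entry>> m_threads;
};

// Runs a "remover" thread (playing the event processing thread) while a
// "scraper" thread (playing the HTTP metrics handler) repeatedly walks the
// table. Returns the number of entries left once both threads finish.
inline int run_concurrent_demo() {
    guarded_thread_table table;
    for (int64_t tid = 0; tid < 1000; ++tid) {
        table.put(tid, 3);
    }

    std::thread remover([&table] {
        for (int64_t tid = 0; tid < 1000; ++tid) {
            table.remove(tid);
        }
    });
    std::thread scraper([&table] {
        for (int i = 0; i < 100; ++i) {
            long total_fds = 0;
            table.loop([&total_fds](const thread_entry& t) {
                total_fds += t.open_fds;
                return true;
            });
            assert(total_fds % 3 == 0);  // every visited entry was intact
        }
    });
    remover.join();
    scraper.join();

    int remaining = 0;
    table.loop([&remaining](const thread_entry&) {
        ++remaining;
        return true;
    });
    return remaining;
}
```

Calling `run_concurrent_demo()` from a test or a `main` exercises both threads at once. A lock on the hot event path has a cost, so other designs may be preferable in practice, for example collecting the snapshot on the event processing thread and serving a cached copy from the HTTP thread, or stopping the webserver before the thread table is torn down.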
Steps to reproduce
- Using falco 0.43.0 amd64 container on kubernetes
- Launch a large daemonset with the following falco.yml webserver and metrics configuration:
    webserver:
      enabled: true
      threadiness: 0
      listen_address: 0.0.0.0
      listen_port: 8765
      ssl_enabled: false
      k8s_healthz_endpoint: /healthz
      prometheus_metrics_enabled: true
    metrics:
      enabled: true
      interval: 1m
      output_rule: false
      output_file: /dev/stdout
      rules_counters_enabled: true
      resource_utilization_enabled: true
      state_counters_enabled: true
      kernel_event_counters_enabled: true
      libbpf_stats_enabled: true
      plugins_metrics_enabled: true
      convert_memory_to_mb: true
      include_empty_values: false
- In a loop continually rollout the daemon set:
    while true; do kubectl rollout restart ds/falco; sleep 10; done
- Monitor pod process exit status for SEGFAULTs.
Expected Behaviour
We expect a clean shutdown, with no SEGFAULT.
Additional information
(gdb) bt
#0 sinsp_threadinfo::get_fd_table (this=0x0) at /home/runner/work/falco/falco/build/falcosecurity-libs-repo/falcosecurity-libs-prefix/src/falcosecurity-libs/userspace/libsinsp/threadinfo.h:438
#1 libs::metrics::libs_state_counters::libs_state_counters(std::__1::shared_ptr<sinsp_stats_v2> const&, sinsp_thread_manager*)::$_0::operator()(sinsp_threadinfo&) const (this=0x78a4743efe08, tinfo=...)
at /home/runner/work/falco/falco/build/falcosecurity-libs-repo/falcosecurity-libs-prefix/src/falcosecurity-libs/userspace/libsinsp/metrics_collector.cpp:260
#2 std::__1::__invoke[abi:ne200100]<libs::metrics::libs_state_counters::libs_state_counters(std::__1::shared_ptr<sinsp_stats_v2> const&, sinsp_thread_manager*)::$_0&, sinsp_threadinfo&>(libs::metrics::libs_state_counters::libs_state_counters(std::__1::shared_ptr<sinsp_stats_v2> const&, sinsp_thread_manager*)::$_0&, sinsp_threadinfo&) (__f=..., __args=...) at /home/runner/work/falco/falco/zig/lib/libcxx/include/__type_traits/invoke.h:179
#3 std::__1::__invoke_void_return_wrapper<bool, false>::__call[abi:ne200100]<libs::metrics::libs_state_counters::libs_state_counters(std::__1::shared_ptr<sinsp_stats_v2> const&, sinsp_thread_manager*)::$_0&, sinsp_threadinfo&>(libs::metrics::libs_state_counters::libs_state_counters(std::__1::shared_ptr<sinsp_stats_v2> const&, sinsp_thread_manager*)::$_0&, sinsp_threadinfo&) (__args=..., __args=...)
at /home/runner/work/falco/falco/zig/lib/libcxx/include/__type_traits/invoke.h:243
#4 std::__1::__invoke_r[abi:ne200100]<bool, libs::metrics::libs_state_counters::libs_state_counters(std::__1::shared_ptr<sinsp_stats_v2> const&, sinsp_thread_manager*)::$_0&, sinsp_threadinfo&>(libs::metrics::libs_state_counters::libs_state_counters(std::__1::shared_ptr<sinsp_stats_v2> const&, sinsp_thread_manager*)::$_0&, sinsp_threadinfo&) (__args=..., __args=...) at /home/runner/work/falco/falco/zig/lib/libcxx/include/__type_traits/invoke.h:273
#5 std::__1::__function::__alloc_func<libs::metrics::libs_state_counters::libs_state_counters(std::__1::shared_ptr<sinsp_stats_v2> const&, sinsp_thread_manager*)::$_0, std::__1::allocator<libs::metrics::libs_state_counters::libs_state_counters(std::__1::shared_ptr<sinsp_stats_v2> const&, sinsp_thread_manager*)::$_0>, bool (sinsp_threadinfo&)>::operator()[abi:ne200100](sinsp_threadinfo&) (this=0x78a4743efe08, __arg=...)
at /home/runner/work/falco/falco/zig/lib/libcxx/include/__functional/function.h:167
#6 std::__1::__function::__func<libs::metrics::libs_state_counters::libs_state_counters(std::__1::shared_ptr<sinsp_stats_v2> const&, sinsp_thread_manager*)::$_0, std::__1::allocator<libs::metrics::libs_state_counters::libs_state_counters(std::__1::shared_ptr<sinsp_stats_v2> const&, sinsp_thread_manager*)::$_0>, bool (sinsp_threadinfo&)>::operator()(sinsp_threadinfo&) (this=0x78a4743efe00, __arg=...)
at /home/runner/work/falco/falco/zig/lib/libcxx/include/__functional/function.h:319
#7 0x0000000001c89bb7 in std::__1::__function::__value_func<bool (sinsp_threadinfo&)>::operator()[abi:ne200100](sinsp_threadinfo&) const (this=0x78a4743efe00, __args=...)
at /home/runner/work/falco/falco/zig/lib/libcxx/include/__functional/function.h:436
#8 std::__1::function<bool (sinsp_threadinfo&)>::operator()(sinsp_threadinfo&) const (this=0x78a4743efe00, __arg=...) at /home/runner/work/falco/falco/zig/lib/libcxx/include/__functional/function.h:995
#9 threadinfo_map_t::loop(std::__1::function<bool (sinsp_threadinfo&)>) (this=<optimized out>, callback=...)
at /home/runner/work/falco/falco/build/falcosecurity-libs-repo/falcosecurity-libs-prefix/src/falcosecurity-libs/userspace/libsinsp/threadinfo.h:612
#10 libs::metrics::libs_state_counters::libs_state_counters (this=<optimized out>, sinsp_stats_v2=..., thread_manager=<optimized out>)
at /home/runner/work/falco/falco/build/falcosecurity-libs-repo/falcosecurity-libs-prefix/src/falcosecurity-libs/userspace/libsinsp/metrics_collector.cpp:259
#11 0x0000000001c8b887 in libs::metrics::libs_metrics_collector::snapshot (this=0x78a4743efe90)
at /home/runner/work/falco/falco/build/falcosecurity-libs-repo/falcosecurity-libs-prefix/src/falcosecurity-libs/userspace/libsinsp/metrics_collector.cpp:425
#12 0x000000000179e4b6 in falco_metrics::sources_to_text_prometheus (state=..., prometheus_metrics_converter=..., additional_wrapper_metrics=...) at /home/runner/work/falco/falco/userspace/falco/falco_metrics.cpp:316
#13 0x00000000017a0976 in falco_metrics::to_text_prometheus (state=...) at /home/runner/work/falco/falco/userspace/falco/falco_metrics.cpp:534
#14 0x0000000001775f4b in falco_webserver::enable_prometheus_metrics(falco::app::state const&)::$_0::operator()(httplib::Request const&, httplib::Response&) const (this=<optimized out>, res=...)
at /home/runner/work/falco/falco/userspace/falco/webserver.cpp:108
#15 std::__1::__invoke[abi:ne200100]<falco_webserver::enable_prometheus_metrics(falco::app::state const&)::$_0&, httplib::Request const&, httplib::Response&>(falco_webserver::enable_prometheus_metrics(falco::app::state const&)::$_0&, httplib::Request const&, httplib::Response&) (__f=..., __args=..., __args=...) at /home/runner/work/falco/falco/zig/lib/libcxx/include/__type_traits/invoke.h:179
#16 std::__1::__invoke_void_return_wrapper<void, true>::__call[abi:ne200100]<falco_webserver::enable_prometheus_metrics(falco::app::state const&)::$_0&, httplib::Request const&, httplib::Response&>(falco_webserver::enable_prometheus_metrics(falco::app::state const&)::$_0&, httplib::Request const&, httplib::Response&) (__args=..., __args=..., __args=...) at /home/runner/work/falco/falco/zig/lib/libcxx/include/__type_traits/invoke.h:251
#17 std::__1::__invoke_r[abi:ne200100]<void, falco_webserver::enable_prometheus_metrics(falco::app::state const&)::$_0&, httplib::Request const&, httplib::Response&>(falco_webserver::enable_prometheus_metrics(falco::app::state const&)::$_0&, httplib::Request const&, httplib::Response&) (__args=..., __args=..., __args=...) at /home/runner/work/falco/falco/zig/lib/libcxx/include/__type_traits/invoke.h:273
#18 std::__1::__function::__alloc_func<falco_webserver::enable_prometheus_metrics(falco::app::state const&)::$_0, std::__1::allocator<falco_webserver::enable_prometheus_metrics(falco::app::state const&)::$_0>, void (httplib::Request const&, httplib::Response&)>::operator()[abi:ne200100](httplib::Request const&, httplib::Response&) (this=<optimized out>, __arg=..., __arg=...) at /home/runner/work/falco/falco/zig/lib/libcxx/include/__functional/function.h:167
#19 std::__1::__function::__func<falco_webserver::enable_prometheus_metrics(falco::app::state const&)::$_0, std::__1::allocator<falco_webserver::enable_prometheus_metrics(falco::app::state const&)::$_0>, void (httplib::Request const&, httplib::Response&)>::operator()(httplib::Request const&, httplib::Response&) (this=<optimized out>, __arg=..., __arg=...) at /home/runner/work/falco/falco/zig/lib/libcxx/include/__functional/function.h:319
#20 0x000000000177db26 in std::__1::__function::__value_func<void (httplib::Request const&, httplib::Response&)>::operator()[abi:ne200100](httplib::Request const&, httplib::Response&) const (this=<optimized out>, __args=...,
__args=...) at /home/runner/work/falco/falco/zig/lib/libcxx/include/__functional/function.h:436
#21 std::__1::function<void (httplib::Request const&, httplib::Response&)>::operator()(httplib::Request const&, httplib::Response&) const (this=<optimized out>, __arg=..., __arg=...)
at /home/runner/work/falco/falco/zig/lib/libcxx/include/__functional/function.h:995
#22 httplib::Server::dispatch_request(httplib::Request&, httplib::Response&, std::__1::vector<std::__1::pair<std::__1::unique_ptr<httplib::detail::MatcherBase, std::__1::default_delete<httplib::detail::MatcherBase> >, std::__1::function<void (httplib::Request const&, httplib::Response&)> >, std::__1::allocator<std::__1::pair<std::__1::unique_ptr<httplib::detail::MatcherBase, std::__1::default_delete<httplib::detail::MatcherBase> >, std::__1::function<void (httplib::Request const&, httplib::Response&)> > > > const&) const (this=<optimized out>, req=..., res=..., handlers=...) at /home/runner/work/falco/falco/build/_deps/cpp-httplib-src/httplib.h:7894
#23 0x000000000177db26 in httplib::Server::routing (this=0x78a4743efe88, this@entry=0xa, req=..., res=..., strm=...)
#24 0x000000000177b0d2 in httplib::Server::process_request(httplib::Stream&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int, bool, bool&, std::__1::function<void (httplib::Request&)> const&) (this=0x78a4cb13c900, strm=..., remote_addr=..., remote_port=<optimized out>, local_addr=..., local_port=8765,
close_connection=<optimized out>, connection_closed=@0x78a4743f18c7: false, setup_request=...) at /home/runner/work/falco/falco/build/_deps/cpp-httplib-src/httplib.h:8145
#25 0x000000000177a2f3 in httplib::Server::process_and_close_socket(int)::{lambda(httplib::Stream&, bool, bool&)#1}::operator()(httplib::Stream&, bool, bool&) const (this=<optimized out>, strm=..., close_connection=false,
connection_closed=@0x78a4743efe90: 216) at /home/runner/work/falco/falco/build/_deps/cpp-httplib-src/httplib.h:8239
#26 httplib::detail::process_server_socket<httplib::Server::process_and_close_socket(int)::{lambda(httplib::Stream&, bool, bool&)#1}>(std::__1::atomic<int> const&, int, unsigned long, long, long, long, long, long, httplib::Server::process_and_close_socket(int)::{lambda(httplib::Stream&, bool, bool&)#1})::{lambda(bool, bool&)#1}::operator()(bool, bool&) const (this=this@entry=0x78a4743f1948, close_connection=false, connection_closed=@0x78a4743efe90: 216)
at /home/runner/work/falco/falco/build/_deps/cpp-httplib-src/httplib.h:3452
#27 0x0000000001778244 in httplib::detail::process_server_socket_core<httplib::detail::process_server_socket<httplib::Server::process_and_close_socket(int)::{lambda(httplib::Stream&, bool, bool&)#1}>(std::__1::atomic<int> const&, int, unsigned long, long, long, long, long, long, httplib::Server::process_and_close_socket(int)::{lambda(httplib::Stream&, bool, bool&)#1})::{lambda(bool, bool&)#1}>(std::__1::atomic<int> const&, int, unsigned long, long, httplib::detail::process_server_socket<httplib::Server::process_and_close_socket(int)::{lambda(httplib::Stream&, bool, bool&)#1}>(std::__1::atomic<int> const&, int, unsigned long, long, long, long, long, long, httplib::Server::process_and_close_socket(int)::{lambda(httplib::Stream&, bool, bool&)#1})::{lambda(bool, bool&)#1}) (svr_sock=..., sock=1950285456, keep_alive_max_count=<optimized out>, keep_alive_timeout_sec=132647720256536, callback=...) at /home/runner/work/falco/falco/build/_deps/cpp-httplib-src/httplib.h:3433
#28 httplib::detail::process_server_socket<httplib::Server::process_and_close_socket(int)::{lambda(httplib::Stream&, bool, bool&)#1}>(std::__1::atomic<int> const&, int, unsigned long, long, long, long, long, long, httplib::Server::process_and_close_socket(int)::{lambda(httplib::Stream&, bool, bool&)#1}) (svr_sock=..., sock=26, keep_alive_max_count=<optimized out>, keep_alive_timeout_sec=132647720256536, read_timeout_sec=5, read_timeout_usec=0, write_timeout_sec=5, write_timeout_usec=0, callback=...) at /home/runner/work/falco/falco/build/_deps/cpp-httplib-src/httplib.h:3447
#29 httplib::Server::process_and_close_socket (this=<optimized out>, sock=1950285456) at /home/runner/work/falco/falco/build/_deps/cpp-httplib-src/httplib.h:8234
#30 0x000000000179ac58 in std::__1::__function::__value_func<void ()>::operator()[abi:ne200100]() const (this=0x78a4743f1da0) at /home/runner/work/falco/falco/zig/lib/libcxx/include/__functional/function.h:436
#31 std::__1::function<void ()>::operator()() const (this=0x78a4743f1da0) at /home/runner/work/falco/falco/zig/lib/libcxx/include/__functional/function.h:995
#32 httplib::ThreadPool::worker::operator() (this=0x78a4743efe88, this@entry=0x78a4743efe98) at /home/runner/work/falco/falco/build/_deps/cpp-httplib-src/httplib.h:927
#33 0x000000000179a9de in std::__1::__invoke[abi:ne200100]<httplib::ThreadPool::worker>(httplib::ThreadPool::worker&&) (__f=...) at /home/runner/work/falco/falco/zig/lib/libcxx/include/__type_traits/invoke.h:179
#34 _ZNSt3__116__thread_executeB8ne200100INS_10unique_ptrINS_15__thread_structENS_14default_deleteIS2_EEEEN7httplib10ThreadPool6workerEJETpTnmJEEEvRNS_5tupleIJT_T0_DpT1_EEENS_15__tuple_indicesIJXspT2_EEEE (__t=...) at /home/runner/work/falco/falco/zig/lib/libcxx/include/__thread/thread.h:199
#35 std::__1::__thread_proxy[abi:ne200100]<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, httplib::ThreadPool::worker> >(void*) (__vp=0x78a4743efe90) at /home/runner/work/falco/falco/zig/lib/libcxx/include/__thread/thread.h:208
#36 0x000078a4cb60b51d in start_thread () from /debug/usr/lib/libc.so.6
#37 0x000078a4cb690f6c in __clone3 () from /debug/usr/lib/libc.so.6
(gdb) frame 0
#0 sinsp_threadinfo::get_fd_table (this=0x0) at /home/runner/work/falco/falco/build/falcosecurity-libs-repo/falcosecurity-libs-prefix/src/falcosecurity-libs/userspace/libsinsp/threadinfo.h:438
438 /home/runner/work/falco/falco/build/falcosecurity-libs-repo/falcosecurity-libs-prefix/src/falcosecurity-libs/userspace/libsinsp/threadinfo.h: No such file or directory.
(gdb) info args
this = 0x0
(gdb) info locals
root = <optimized out>
(gdb) frame 1
#1 libs::metrics::libs_state_counters::libs_state_counters(std::__1::shared_ptr<sinsp_stats_v2> const&, sinsp_thread_manager*)::$_0::operator()(sinsp_threadinfo&) const (this=0x78a4743efe08, tinfo=...)
at /home/runner/work/falco/falco/build/falcosecurity-libs-repo/falcosecurity-libs-prefix/src/falcosecurity-libs/userspace/libsinsp/metrics_collector.cpp:260
260 /home/runner/work/falco/falco/build/falcosecurity-libs-repo/falcosecurity-libs-prefix/src/falcosecurity-libs/userspace/libsinsp/metrics_collector.cpp: No such file or directory.
(gdb) info args
this = 0x78a4743efe08
tinfo = <error reading variable: Cannot access memory at address 0x0>
(gdb) info locals
fdtable = <optimized out>
(gdb)
Environment
- Falco version: 0.43.0 amd64
- System info:
    {
      "machine": "x86_64",
      "nodename": "gke-trust-staging-us-default-custom-1-e2802b56-mpk9",
      "release": "6.6.113+",
      "sysname": "Linux",
      "version": "#1 SMP Sat Nov 29 10:43:19 UTC 2025"
    }
- Cloud provider or hardware configuration: GCP/GKE
- OS: Multiple
- Kernel: Multiple
- Installation method: Kubernetes