[filter_multiline] [engine] Segmentation fault (SIGSEGV) and/or deadlock in threaded mode #9835

Open
@drbugfinder-work

Bug Report

Describe the bug
When filter_multiline runs in threaded mode, segmentation faults or deadlocks occur at random, especially under high load.
I assume the cause is that the flb_log_event_encoder functions are not thread-safe.
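
To make the suspected failure mode concrete, here is a minimal sketch of two threads mutating one shared scope list, with a mutex serializing each mutation. All names (`demo_encoder`, `demo_scope_enter`, `demo_scope_leave`, `demo_run`) are hypothetical stand-ins and not Fluent Bit's actual structures or API. Whether a lock or a separate encoder per thread would be the right fix is not established here.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for shared encoder state; not Fluent Bit's real struct. */
struct demo_node {
    struct demo_node *next;
};

struct demo_encoder {
    pthread_mutex_t lock;      /* serializes every list mutation */
    struct demo_node *scopes;  /* head of a singly linked scope list */
    size_t depth;              /* number of nodes currently linked */
};

static int demo_encoder_init(struct demo_encoder *enc)
{
    enc->scopes = NULL;
    enc->depth = 0;
    return pthread_mutex_init(&enc->lock, NULL);
}

/* Push one scope; returns 0 on success, -1 on allocation failure. */
static int demo_scope_enter(struct demo_encoder *enc)
{
    struct demo_node *node = malloc(sizeof(*node));

    if (node == NULL) {
        return -1;
    }
    pthread_mutex_lock(&enc->lock);
    node->next = enc->scopes;
    enc->scopes = node;
    enc->depth++;
    pthread_mutex_unlock(&enc->lock);
    return 0;
}

/* Pop the scope at the head of the list, if there is one. */
static void demo_scope_leave(struct demo_encoder *enc)
{
    struct demo_node *node;

    pthread_mutex_lock(&enc->lock);
    node = enc->scopes;
    if (node != NULL) {
        enc->scopes = node->next;
        enc->depth--;
    }
    pthread_mutex_unlock(&enc->lock);
    free(node); /* free(NULL) is a no-op */
}

/* Each worker repeatedly enters and leaves a scope on the shared encoder. */
static void *demo_worker(void *arg)
{
    struct demo_encoder *enc = arg;
    int i;

    for (i = 0; i < 100000; i++) {
        if (demo_scope_enter(enc) == 0) {
            demo_scope_leave(enc);
        }
    }
    return NULL;
}

/* Runs two workers against one encoder; returns the final depth. */
static size_t demo_run(void)
{
    struct demo_encoder enc;
    pthread_t a;
    pthread_t b;
    int rc;

    rc = demo_encoder_init(&enc);
    assert(rc == 0);
    (void) rc;

    pthread_create(&a, NULL, demo_worker, &enc);
    pthread_create(&b, NULL, demo_worker, &enc);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    pthread_mutex_destroy(&enc.lock);
    return enc.depth;
}
```

With the lock in place, every enter is matched by a leave and the list ends up empty. Removing the lock/unlock pairs turns the pointer updates into a data race, which is the kind of list corruption the traces below would be consistent with.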

A similar problem is described in the auto-closed issue #6728 and in the open but outdated PR #6765 from @nokute78; it is evidently still not fixed.

Example deadlock stacktraces:

flb_log_event_encoder_commit_record

Thread 57 (Thread 0x7fbe132dc6c0 (LWP 113) "flb-in-tail.47-"):
#0 futex_wait (private=0, expected=2, futex_word=0x7fbe4ec16708) at ../sysdeps/nptl/futex-internal.h:146
#1 __GI___lll_lock_wait (futex=futex@entry=0x7fbe4ec16708, private=0) at ./nptl/lowlevellock.c:49
#2 0x00007fbe505a90f1 in lll_mutex_lock_optimized (mutex=0x7fbe4ec16708) at ./nptl/pthread_mutex_lock.c:48
#3 ___pthread_mutex_lock (mutex=0x7fbe4ec16708) at ./nptl/pthread_mutex_lock.c:93
#4 0x00005648f0d551d1 in ?? ()
#5 0x00005648f0d625b0 in ?? ()
#6 0x00005648f0cf2417 in ?? ()
#7 0x00005648f0dd4436 in flb_log_event_encoder_dynamic_field_scope_leave ()
#8 0x00005648f0dd465d in flb_log_event_encoder_dynamic_field_flush ()
#9 0x00005648f0dd2ac6 in flb_log_event_encoder_commit_record ()
#10 0x00005648f0db459d in flb_ml_flush_stream_group ()
#11 0x00005648f0dd6627 in flb_ml_rule_process ()
#12 0x00005648f0db4f9b in ?? ()
#13 0x00005648f0db5458 in ?? ()
#14 0x00005648f0db573d in flb_ml_append_object ()
#15 0x00005648f0eb7963 in ?? ()
#16 0x00005648f0da95bb in flb_processor_run ()
#17 0x00005648f0dcc8e7 in ?? ()
#18 0x00005648f0dcca6c in flb_input_log_append_skip_processor_stages ()
#19 0x00005648f0ebe3dc in ?? ()
#20 0x00005648f0da95bb in flb_processor_run ()
#21 0x00005648f0dcc8e7 in ?? ()
#22 0x00005648f0dcca9d in flb_input_log_append_records ()
#23 0x00005648f0e0b516 in flb_tail_file_chunk ()
#24 0x00005648f0e05c57 in in_tail_collect_event ()

flb_log_event_encoder_dynamic_field_reset

Thread 153 (Thread 0x7fbe4f67f6c0 (LWP 17) "flb-pipeline"):
#0 futex_wait (private=0, expected=2, futex_word=0x7fbe4ec16708) at ../sysdeps/nptl/futex-internal.h:146
#1 __GI___lll_lock_wait (futex=futex@entry=0x7fbe4ec16708, private=0) at ./nptl/lowlevellock.c:49
#2 0x00007fbe505a90f1 in lll_mutex_lock_optimized (mutex=0x7fbe4ec16708) at ./nptl/pthread_mutex_lock.c:48
#3 ___pthread_mutex_lock (mutex=0x7fbe4ec16708) at ./nptl/pthread_mutex_lock.c:93
#4 0x00005648f0d551d1 in ?? ()
#5 0x00005648f0d625b0 in ?? ()
#6 0x00005648f0cf2417 in ?? ()
#7 0x00005648f0dd4436 in flb_log_event_encoder_dynamic_field_scope_leave ()
#8 0x00005648f0dd46aa in flb_log_event_encoder_dynamic_field_reset ()
#9 0x00005648f0dd2891 in flb_log_event_encoder_reset_record ()
#10 0x00005648f0dd2979 in flb_log_event_encoder_emit_record ()
#11 0x00005648f0db459d in flb_ml_flush_stream_group ()
#12 0x00005648f0db4cd5 in flb_ml_flush_parser_instance ()
#13 0x00005648f0db4d91 in flb_ml_flush_pending ()
#14 0x00005648f0da0446 in flb_sched_event_handler ()
#15 0x00005648f0d9c7c8 in flb_engine_start ()
#16 0x00005648f0d79268 in ?? ()
#17 0x00007fbe505a5a94 in start_thread (arg=) at ./nptl/pthread_create.c:447
#18 0x00007fbe50632c3c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78

Other flb_log_event_encoder functions show similar stack traces.

Example stack trace for a segmentation fault:

[2025/01/09 08:36:09] [engine] caught signal (SIGSEGV)
[2025/01/09 08:36:09] [engine] caught signal (SIGSEGV)
#0  0x55a8643027c8      in  cfl_list_add_before() at lib/cfl/include/cfl/cfl_list.h:130
#1  0x55a864302832      in  cfl_list_prepend() at lib/cfl/include/cfl/cfl_list.h:154
#2  0x55a8643063f2      in  flb_log_event_encoder_dynamic_field_scope_enter() at src/flb_log_event_encoder_dynamic_field.c:67
#3  0x55a864306524      in  flb_log_event_encoder_dynamic_field_begin_array() at src/flb_log_event_encoder_dynamic_field.c:124
#4  0x55a8642fbab2      in  flb_log_event_encoder_emit_record() at src/flb_log_event_encoder.c:168
#5  0x55a8642fbd1c      in  flb_log_event_encoder_commit_record() at src/flb_log_event_encoder.c:267
#6  0x55a8642806a0      in  flb_ml_flush_stream_group() at src/multiline/flb_ml.c:1505
#7  0x55a86427d92a      in  flb_ml_flush_parser_instance() at src/multiline/flb_ml.c:117
#8  0x55a86427d9e0      in  flb_ml_flush_pending() at src/multiline/flb_ml.c:137
#9  0x55a86427da93      in  cb_ml_flush_timer() at src/multiline/flb_ml.c:163  
#10 0x55a864225b73      in  flb_sched_event_handler() at src/flb_scheduler.c:624
#11 0x55a864216cf7      in  flb_engine_start() at src/flb_engine.c:1044
#12 0x55a8641ae5d4      in  flb_lib_worker() at src/flb_lib.c:763
#13 0x7f2ac7abaa93      in  start_thread() at c:447
#14 0x7f2ac7b47c3b      in  clone3() at inux/x86_64/clone3.S:78
#15 0xffffffffffffffff  in  ???() at ???:0

@nokute78 (cc @edsiper) Was there a reason #6765 was not merged (and updated to the current code base)?

To Reproduce

  • Use the tail input plugin (we use globs to match multiple files)
  • Use the multiline filter with threaded mode enabled
  • Put enough load on it and watch it crash, or inspect the deadlock in gdb (e.g. gdb -p <pid> --batch -ex "thread apply all bt" -ex "detach" -ex "quit")

Your Environment

  • Version used: 3.2.4 (but the issue has existed for many versions)

Maybe related:

According to the v2.0.2 announcement, the memory ring buffer limit mem_buf_limit should be no less than 20M. As far as I understand the code, in_emitter is used with memrb when the multiline filter runs in threaded mode.
However, as I already mentioned in #8473, there is this strange (and most probably wrong) assignment:

ctx->ring_buffer_size = DEFAULT_EMITTER_RING_BUFFER_FLUSH_FREQUENCY;

The default value for the flush frequency is 2000, so I assume this sets the ring buffer size to only 2k. Can you please verify this, @nokute78 @edsiper @leonardo-albertovich @pwhelan?
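
To illustrate the size of the gap, here is a small sketch under two loudly stated assumptions: that ring_buffer_size is interpreted in bytes (it could equally be a slot count, which would change the picture), and that "20M" means 20 MiB. Only the constant name and its value of 2000 come from this report; `demo_emitter`, `DEMO_RECOMMENDED_MIN_BYTES`, and both functions are hypothetical.

```c
#include <stddef.h>

/* Value quoted in this report for the flush frequency default. */
#define DEFAULT_EMITTER_RING_BUFFER_FLUSH_FREQUENCY 2000

/* Assumption: the 20M recommendation read as 20 MiB, in bytes. */
#define DEMO_RECOMMENDED_MIN_BYTES ((size_t) 20 * 1024 * 1024)

/* Hypothetical stand-in for the emitter context; not the real struct. */
struct demo_emitter {
    size_t ring_buffer_size;
};

/* Mirrors the assignment quoted above: the size takes the frequency default. */
static size_t demo_default_ring_buffer_size(void)
{
    struct demo_emitter ctx;

    ctx.ring_buffer_size = DEFAULT_EMITTER_RING_BUFFER_FLUSH_FREQUENCY;
    return ctx.ring_buffer_size;
}

/* Returns 1 if the size is below the assumed recommended minimum, else 0. */
static int demo_ring_buffer_is_undersized(size_t size_in_bytes)
{
    return size_in_bytes < DEMO_RECOMMENDED_MIN_BYTES;
}
```

Under those assumptions the default is roughly four orders of magnitude below the recommendation; if the field is actually a slot count, the comparison does not apply as written.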