[CRASH] Valkey 7.2 Cluster Mode crashes with activedefrag on #1872

Open
dmitrypol opened this issue Mar 22, 2025 · 0 comments · May be fixed by #1873
dmitrypol commented Mar 22, 2025

Crash report

69307:C 22 Mar 2025 08:34:55.718 # WARNING: Changing databases number from 16 to 1 since we are in cluster mode
69308:C 22 Mar 2025 08:34:55.720 * oO0OoO0OoO0Oo Valkey is starting oO0OoO0OoO0Oo
69308:C 22 Mar 2025 08:34:55.738 * Valkey version=7.2.8, bits=64, commit=c12d5fc3, modified=1, pid=69308, just started
69308:C 22 Mar 2025 08:34:55.739 * Configuration loaded
69308:M 22 Mar 2025 08:34:55.740 * Increased maximum number of open files to 10032 (it was originally set to 256).
69308:M 22 Mar 2025 08:34:55.741 * monotonic clock: POSIX clock_gettime
69308:M 22 Mar 2025 08:34:55.760 # Failed to write PID file: Permission denied
69308:M 22 Mar 2025 08:34:55.777 * Running mode=cluster, port=30002.
69308:M 22 Mar 2025 08:34:55.778 # WARNING: The TCP backlog setting of 511 cannot be enforced because kern.ipc.somaxconn is set to the lower value of 128.
69308:M 22 Mar 2025 08:34:55.780 * No cluster configuration found, I'm c62c8421cadb0968a6bed80f40d72535eb898b88
69308:M 22 Mar 2025 08:34:55.820 * Server initialized
69308:M 22 Mar 2025 08:34:55.821 * Creating AOF base file appendonly-30002.aof.1.base.rdb on server start
69308:M 22 Mar 2025 08:34:55.913 * Creating AOF incr file appendonly-30002.aof.1.incr.aof on server start
69308:M 22 Mar 2025 08:34:55.914 * Ready to accept connections tcp
69308:M 22 Mar 2025 08:34:56.232 * configEpoch set to 2 via CLUSTER SET-CONFIG-EPOCH
69308:M 22 Mar 2025 08:34:56.388 * IP address for this node updated to 127.0.0.1
69308:M 22 Mar 2025 08:34:56.388 * Successfully completed handshake with 5a5aa254fc93c574e7923a4f33d193865a9f56cf ()
69308:M 22 Mar 2025 08:34:57.965 * Cluster state changed: ok
69308:M 22 Mar 2025 08:34:58.382 * Replica 127.0.0.1:30004 asks for synchronization
69308:M 22 Mar 2025 08:34:58.383 * Partial resynchronization not accepted: Replication ID mismatch (Replica asked for 'd2044eb7c373e479f30c4c426076099f14475179', my replication IDs are '1447dd05ff60864c2f9ba1fb00a41b4e7a493436' and '0000000000000000000000000000000000000000')
69308:M 22 Mar 2025 08:34:58.383 * Replication backlog created, my new replication IDs are 'c15fc6c78fadad7f3f0444016662fce6131a6bdd' and '0000000000000000000000000000000000000000'
69308:M 22 Mar 2025 08:34:58.384 * Delay next BGSAVE for diskless SYNC
69308:M 22 Mar 2025 08:35:03.167 * Starting BGSAVE for SYNC with target: replicas sockets
69308:M 22 Mar 2025 08:35:03.169 * Background RDB transfer started by pid 69542
69542:C 22 Mar 2025 08:35:03.170 * Fork CoW for RDB: current 0 MB, peak 0 MB, average 0 MB
69308:M 22 Mar 2025 08:35:03.171 * Diskless rdb transfer, done reading from pipe, 1 replicas still up.
69308:M 22 Mar 2025 08:35:03.217 * Background RDB transfer terminated with success
69308:M 22 Mar 2025 08:35:03.218 * Streamed RDB transfer with replica 127.0.0.1:30004 succeeded (socket). Waiting for REPLCONF ACK from replica to enable streaming
69308:M 22 Mar 2025 08:35:03.218 * Synchronization with replica 127.0.0.1:30004 succeeded
69308:M 22 Mar 2025 08:35:04.688 * Starting automatic rewriting of AOF on 6903659200% growth
69308:M 22 Mar 2025 08:35:04.785 * Creating AOF incr file appendonly-30002.aof.2.incr.aof on background rewrite
69308:M 22 Mar 2025 08:35:04.804 * Background append only file rewriting started by pid 69600
69600:C 22 Mar 2025 08:35:08.485 * Successfully created the temporary AOF base file temp-rewriteaof-bg-69600.aof
69600:C 22 Mar 2025 08:35:08.486 * Fork CoW for AOF rewrite: current 0 MB, peak 0 MB, average 0 MB
69308:M 22 Mar 2025 08:35:08.557 * Background AOF rewrite terminated with success
69308:M 22 Mar 2025 08:35:08.594 * Successfully renamed the temporary AOF base file temp-rewriteaof-bg-69600.aof into appendonly-30002.aof.2.base.rdb
69308:M 22 Mar 2025 08:35:08.643 * Removing the history file appendonly-30002.aof.1.incr.aof in the background
69308:M 22 Mar 2025 08:35:08.666 * Removing the history file appendonly-30002.aof.1.base.rdb in the background
69308:M 22 Mar 2025 08:35:08.739 * Background AOF rewrite finished successfully
69308:M 22 Mar 2025 08:35:09.829 # Client id=4 addr=127.0.0.1:59318 laddr=127.0.0.1:30002 fd=23 name= age=11 idle=0 flags=S db=0 sub=0 psub=0 ssub=0 multi=-1 qbuf=0 qbuf-free=20474 argv-mem=0 multi-mem=0 rbs=1024 rbp=0 obl=0 oll=5062 omem=268450448 tot-mem=268472848 events=rw cmd=replconf user=default redir=-1 resp=2 lib-name= lib-ver= scheduled to be closed ASAP for overcoming of output buffer limits.
69308:M 22 Mar 2025 08:35:09.831 * Connection with replica 127.0.0.1:30004 lost.
69308:M 22 Mar 2025 08:35:09.940 * Replica 127.0.0.1:30004 asks for synchronization
69308:M 22 Mar 2025 08:35:09.941 * Unable to partial resync with replica 127.0.0.1:30004 for lack of backlog (Replica request was: 299358614).
69308:M 22 Mar 2025 08:35:09.942 * Delay next BGSAVE for diskless SYNC


=== REDIS BUG REPORT START: Cut & paste starting from here ===
69308:M 22 Mar 2025 08:35:10.242 # valkey 7.2.8 crashed by signal: 11, si_code: 1
69308:M 22 Mar 2025 08:35:10.245 # Accessing address: 0x50
69308:M 22 Mar 2025 08:35:10.246 # Crashed running the instruction at: 0x109e82510

------ STACK TRACE ------
EIP:
0   valkey-server                       0x0000000109e82510 dbDictAfterReplaceEntry + 576

Backtrace:
0   libsystem_platform.dylib            0x00007ff801d12e1d _sigtramp + 29
1   ???                                 0x0000000000000000 0x0 + 0
2   valkey-server                       0x0000000109e81782 dictDefragBucket + 354
3   valkey-server                       0x0000000109e813ec dictScanDefrag + 732
4   valkey-server                       0x0000000109faa520 activeDefragCycle + 592
5   valkey-server                       0x0000000109e86b8e serverCron + 4974
6   valkey-server                       0x0000000109e7e466 aeProcessEvents + 1094
7   valkey-server                       0x0000000109ea29ad main + 24781
8   dyld                                0x00007ff80194d2cd start + 1805

------ REGISTERS ------
69308:M 22 Mar 2025 08:35:10.251 # 
RAX:0000000000000000 RBX:0000000000000006
RCX:000000010a033a40 RDX:0000000000002672
RDI:000000010a366200 RSI:000000010a370f90
RBP:00007ff7b60ee9e0 RSP:00007ff7b60ee9d0
R8 :0000000000000001 R9 :0000000000000004
R10:0000000000000005 R11:000000010a3b9b56
R12:0000000000000000 R13:0000000000000000
R14:0000000000000000 R15:000000010a370f90
RIP:0000000109e82510 EFL:0000000000010206
CS :000000000000002b FS:0000000000000000  GS:0000000000000000
69308:M 22 Mar 2025 08:35:10.252 # (00007ff7b60ee9df) -> 0000000000000001
69308:M 22 Mar 2025 08:35:10.253 # (00007ff7b60ee9de) -> 000000010a383360
69308:M 22 Mar 2025 08:35:10.255 # (00007ff7b60ee9dd) -> 0000000109e813ec
69308:M 22 Mar 2025 08:35:10.256 # (00007ff7b60ee9dc) -> 00007ff7b60eeaa0
69308:M 22 Mar 2025 08:35:10.257 # (00007ff7b60ee9db) -> 0000000109fa7220
69308:M 22 Mar 2025 08:35:10.258 # (00007ff7b60ee9da) -> 0000000000000001
69308:M 22 Mar 2025 08:35:10.259 # (00007ff7b60ee9d9) -> 000000010a366200
69308:M 22 Mar 2025 08:35:10.260 # (00007ff7b60ee9d8) -> 0000000000000084
69308:M 22 Mar 2025 08:35:10.260 # (00007ff7b60ee9d7) -> 0000000000000084
69308:M 22 Mar 2025 08:35:10.262 # (00007ff7b60ee9d6) -> 0000000109fa5790
69308:M 22 Mar 2025 08:35:10.264 # (00007ff7b60ee9d5) -> 000000010a366200
69308:M 22 Mar 2025 08:35:10.265 # (00007ff7b60ee9d4) -> 0000000000000000
69308:M 22 Mar 2025 08:35:10.266 # (00007ff7b60ee9d3) -> 0000000109e81782
69308:M 22 Mar 2025 08:35:10.267 # (00007ff7b60ee9d2) -> 00007ff7b60eea30
69308:M 22 Mar 2025 08:35:10.269 # (00007ff7b60ee9d1) -> 000000010a364c20
69308:M 22 Mar 2025 08:35:10.271 # (00007ff7b60ee9d0) -> 000000010a370f90

------ INFO OUTPUT ------
# Server
redis_version:7.2.4
server_name:valkey
valkey_version:7.2.8
redis_git_sha1:c12d5fc3
redis_git_dirty:1
redis_build_id:4a5c5e9ebb374439
redis_mode:cluster
os:Darwin 24.3.0 x86_64
arch_bits:64
monotonic_clock:POSIX clock_gettime
multiplexing_api:kqueue
atomicvar_api:c11-builtin
gcc_version:4.2.1
process_id:69308
process_supervised:no
run_id:3a0e9d6ae5412afca7924edd12f87fd9dca7eb6a
tcp_port:30002
server_time_usec:1742657710241363
uptime_in_seconds:15
uptime_in_days:0
hz:10
configured_hz:10
lru_clock:14604462
executable:/Users/dpolyako/github/valkey-io/valkey/utils/create-cluster/../../src//valkey-server
config_file:
io_threads_active:0
listener0:name=tcp,bind=*,bind=-::*,port=30002

# Clients
connected_clients:33
cluster_connections:10
maxclients:10000
client_recent_max_input_buffer:114688
client_recent_max_output_buffer:261381680
blocked_clients:0
tracking_clients:0
clients_in_timeout_table:0
total_blocking_keys:0
total_blocking_keys_on_nokey:0

# Memory
used_memory:27264664
used_memory_human:26.00M
used_memory_rss:359575552
used_memory_rss_human:342.92M
used_memory_peak:796230136
used_memory_peak_human:759.34M
used_memory_peak_perc:3.42%
used_memory_overhead:9919772
used_memory_startup:1797280
used_memory_dataset:17344892
used_memory_dataset_perc:68.11%
allocator_allocated:27431480
allocator_active:27947008
allocator_resident:365920256
total_system_memory:34359738368
total_system_memory_human:32.00G
used_memory_lua:31744
used_memory_vm_eval:31744
used_memory_lua_human:31.00K
used_memory_scripts_eval:0
number_of_cached_scripts:0
number_of_functions:0
number_of_libraries:0
used_memory_vm_functions:32768
used_memory_vm_total:64512
used_memory_vm_total_human:63.00K
used_memory_functions:184
used_memory_scripts:184
used_memory_scripts_human:184B
maxmemory:0
maxmemory_human:0B
maxmemory_policy:noeviction
allocator_frag_ratio:1.02
allocator_frag_bytes:515528
allocator_rss_ratio:13.09
allocator_rss_bytes:337973248
rss_overhead_ratio:0.98
rss_overhead_bytes:-6344704
mem_fragmentation_ratio:13.19
mem_fragmentation_bytes:332313048
mem_not_counted_for_evict:3199416
mem_replication_backlog:1048580
mem_total_replication_buffers:1106424
mem_clients_slaves:57848
mem_clients_normal:3848064
mem_cluster_links:10720
mem_aof_buffer:3145728
mem_allocator:jemalloc-5.3.0
active_defrag_running:1
lazyfree_pending_objects:0
lazyfreed_objects:6070

# Persistence
loading:0
async_loading:0
current_cow_peak:0
current_cow_size:0
current_cow_size_age:0
current_fork_perc:0.00
current_save_keys_processed:0
current_save_keys_total:0
rdb_changes_since_last_save:12288
rdb_bgsave_in_progress:0
rdb_last_save_time:1742657695
rdb_last_bgsave_status:ok
rdb_last_bgsave_time_sec:0
rdb_current_bgsave_time_sec:-1
rdb_saves:0
rdb_last_cow_size:0
rdb_last_load_keys_expired:0
rdb_last_load_keys_loaded:0
aof_enabled:1
aof_rewrite_in_progress:0
aof_rewrite_scheduled:0
aof_last_rewrite_time_sec:4
aof_current_rewrite_time_sec:-1
aof_last_bgrewrite_status:ok
aof_rewrites:1
aof_rewrites_consecutive_failures:0
aof_last_write_status:ok
aof_last_cow_size:0
module_fork_in_progress:0
module_fork_last_cow_size:0
aof_current_size:620412027
aof_base_size:426509257
aof_pending_rewrite:0
aof_buffer_length:1700901
aof_pending_bio_fsync:0
aof_delayed_fsync:0

# Stats
total_connections_received:40
total_commands_processed:6248
instantaneous_ops_per_sec:1212
total_net_input_bytes:622186364
total_net_output_bytes:299441069
total_net_repl_input_bytes:0
total_net_repl_output_bytes:299396219
instantaneous_input_kbps:118117.77
instantaneous_output_kbps:55346.75
instantaneous_input_repl_kbps:0.00
instantaneous_output_repl_kbps:55340.84
rejected_connections:0
sync_full:2
sync_partial_ok:0
sync_partial_err:2
expired_keys:0
expired_stale_perc:0.00
expired_time_cap_reached_count:0
expire_cycle_cpu_milliseconds:0
evicted_keys:0
evicted_clients:0
total_eviction_exceeded_time:0
current_eviction_exceeded_time:0
keyspace_hits:0
keyspace_misses:0
pubsub_channels:0
pubsub_patterns:0
pubsubshard_channels:0
latest_fork_usec:1122
total_forks:2
migrate_cached_sockets:0
slave_expires_tracked_keys:0
active_defrag_hits:2
active_defrag_misses:599
active_defrag_key_hits:1
active_defrag_key_misses:149
total_active_defrag_time:8569
current_active_defrag_time:0
tracking_total_keys:0
tracking_total_items:0
tracking_total_prefixes:0
unexpected_error_replies:0
total_error_replies:0
dump_payload_sanitizations:0
total_reads_processed:6286
total_writes_processed:6534
io_threaded_reads_processed:0
io_threaded_writes_processed:0
reply_buffer_shrinks:35
reply_buffer_expands:0
eventloop_cycles:1004
eventloop_duration_sum:6456406
eventloop_duration_cmd_sum:28691
instantaneous_eventloop_cycles_per_sec:217
instantaneous_eventloop_duration_usec:13554
acl_access_denied_auth:0
acl_access_denied_cmd:0
acl_access_denied_key:0
acl_access_denied_channel:0

# Replication
role:master
connected_slaves:1
slave0:ip=127.0.0.1,port=30004,state=wait_bgsave,offset=0,lag=0
master_failover_state:no-failover
master_replid:c15fc6c78fadad7f3f0444016662fce6131a6bdd
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:622129647
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:621037521
repl_backlog_histlen:1092127

# CPU
used_cpu_sys:1.772137
used_cpu_user:0.817246
used_cpu_sys_children:0.149947
used_cpu_user_children:0.758357

# Modules

# Commandstats
cmdstat_replconf:calls=11,usec=1626,usec_per_call=147.82,rejected_calls=0,failed_calls=0
cmdstat_ping:calls=2,usec=1,usec_per_call=0.50,rejected_calls=0,failed_calls=0
cmdstat_flushdb:calls=2,usec=198,usec_per_call=99.00,rejected_calls=0,failed_calls=0
cmdstat_set:calls=6218,usec=19727,usec_per_call=3.17,rejected_calls=0,failed_calls=0
cmdstat_psync:calls=2,usec=5733,usec_per_call=2866.50,rejected_calls=0,failed_calls=0
cmdstat_cluster|set-config-epoch:calls=1,usec=403,usec_per_call=403.00,rejected_calls=0,failed_calls=0
cmdstat_cluster|addslots:calls=1,usec=205,usec_per_call=205.00,rejected_calls=0,failed_calls=0
cmdstat_cluster|info:calls=1,usec=42,usec_per_call=42.00,rejected_calls=0,failed_calls=0
cmdstat_cluster|meet:calls=1,usec=19,usec_per_call=19.00,rejected_calls=0,failed_calls=0
cmdstat_cluster|nodes:calls=5,usec=448,usec_per_call=89.60,rejected_calls=0,failed_calls=0
cmdstat_config|get:calls=2,usec=19,usec_per_call=9.50,rejected_calls=0,failed_calls=0
cmdstat_info:calls=2,usec=270,usec_per_call=135.00,rejected_calls=0,failed_calls=0

# Errorstats

# Latencystats
latency_percentiles_usec_replconf:p50=1.003,p99=1613.823,p99.9=1613.823
latency_percentiles_usec_ping:p50=0.001,p99=1.003,p99.9=1.003
latency_percentiles_usec_flushdb:p50=22.015,p99=176.127,p99.9=176.127
latency_percentiles_usec_set:p50=3.007,p99=13.055,p99.9=55.039
latency_percentiles_usec_psync:p50=2146.303,p99=3604.479,p99.9=3604.479
latency_percentiles_usec_cluster|set-config-epoch:p50=403.455,p99=403.455,p99.9=403.455
latency_percentiles_usec_cluster|addslots:p50=205.823,p99=205.823,p99.9=205.823
latency_percentiles_usec_cluster|info:p50=42.239,p99=42.239,p99.9=42.239
latency_percentiles_usec_cluster|meet:p50=19.071,p99=19.071,p99.9=19.071
latency_percentiles_usec_cluster|nodes:p50=84.479,p99=159.743,p99.9=159.743
latency_percentiles_usec_config|get:p50=6.015,p99=13.055,p99.9=13.055
latency_percentiles_usec_info:p50=68.095,p99=202.751,p99.9=202.751

# Cluster
cluster_enabled:1

# Keyspace
db0:keys=148,expires=0,avg_ttl=0

# Cluster info
cluster_state:ok
cluster_slots_assigned:16384
cluster_slots_ok:16384
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:6
cluster_size:3
cluster_current_epoch:6
cluster_my_epoch:2
cluster_stats_messages_ping_sent:64
cluster_stats_messages_pong_sent:64
cluster_stats_messages_meet_sent:1
cluster_stats_messages_sent:129
cluster_stats_messages_ping_received:64
cluster_stats_messages_pong_received:65
cluster_stats_messages_received:129
total_cluster_links_buffer_limit_exceeded:0

------ CLUSTER NODES OUTPUT ------
c62c8421cadb0968a6bed80f40d72535eb898b88 127.0.0.1:30002@40002,,tls-port=0,shard-id=e1dcac11d446f0006ad37ea007fd3a9d87697e30 myself,master - 0 1742657709000 2 connected 5461-10922
017f76daa3fc8f49fd7adbbbbfda09e6c1f01919 127.0.0.1:30005@40005,,tls-port=0,shard-id=315f88d69217e7a0505989d0db1891277bfddd39 slave a846d13071d58486bbdd291f800ae72f6e67f646 0 1742657709165 3 connected
4fe4f0e9c7cfefb6222a80caa01cb73dd868d7b8 127.0.0.1:30004@40004,,tls-port=0,shard-id=e1dcac11d446f0006ad37ea007fd3a9d87697e30 slave c62c8421cadb0968a6bed80f40d72535eb898b88 0 1742657709245 2 connected
a846d13071d58486bbdd291f800ae72f6e67f646 127.0.0.1:30003@40003,,tls-port=0,shard-id=315f88d69217e7a0505989d0db1891277bfddd39 master - 0 1742657710207 3 connected 10923-16383
5a5aa254fc93c574e7923a4f33d193865a9f56cf 127.0.0.1:30001@40001,,tls-port=0,shard-id=b1147ea935213a64a8ef4f9b40fa47edf63a0f6d master - 0 1742657709909 1 connected 0-5460
7f68769a5da7e458aad51822bebb675ab8fdb0e1 127.0.0.1:30006@40006,,tls-port=0,shard-id=b1147ea935213a64a8ef4f9b40fa47edf63a0f6d slave 5a5aa254fc93c574e7923a4f33d193865a9f56cf 0 1742657709362 1 connected

------ CLIENT LIST OUTPUT ------
id=18 addr=127.0.0.1:59366 laddr=127.0.0.1:30002 fd=34 name= age=6 idle=0 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 qbuf=0 qbuf-free=114678 argv-mem=0 multi-mem=0 rbs=1024 rbp=5 obl=0 oll=0 omem=0 tot-mem=116608 events=r cmd=set user=default redir=-1 resp=2 lib-name= lib-ver=
id=19 addr=127.0.0.1:59369 laddr=127.0.0.1:30002 fd=35 name= age=6 idle=0 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 qbuf=0 qbuf-free=114678 argv-mem=0 multi-mem=0 rbs=1024 rbp=5 obl=5 oll=0 omem=0 tot-mem=116608 events=r cmd=set user=default redir=-1 resp=2 lib-name= lib-ver=
id=20 addr=127.0.0.1:59372 laddr=127.0.0.1:30002 fd=36 name= age=6 idle=0 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 qbuf=0 qbuf-free=114678 argv-mem=0 multi-mem=0 rbs=1024 rbp=5 obl=5 oll=0 omem=0 tot-mem=116608 events=r cmd=set user=default redir=-1 resp=2 lib-name= lib-ver=
id=21 addr=127.0.0.1:59375 laddr=127.0.0.1:30002 fd=37 name= age=6 idle=0 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 qbuf=0 qbuf-free=114678 argv-mem=0 multi-mem=0 rbs=1024 rbp=5 obl=0 oll=0 omem=0 tot-mem=116608 events=r cmd=set user=default redir=-1 resp=2 lib-name= lib-ver=
id=22 addr=127.0.0.1:59378 laddr=127.0.0.1:30002 fd=38 name= age=6 idle=0 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 qbuf=0 qbuf-free=114678 argv-mem=0 multi-mem=0 rbs=1024 rbp=5 obl=5 oll=0 omem=0 tot-mem=116608 events=r cmd=set user=default redir=-1 resp=2 lib-name= lib-ver=
id=23 addr=127.0.0.1:59381 laddr=127.0.0.1:30002 fd=39 name= age=6 idle=0 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 qbuf=0 qbuf-free=114678 argv-mem=0 multi-mem=0 rbs=1024 rbp=5 obl=5 oll=0 omem=0 tot-mem=116608 events=r cmd=set user=default redir=-1 resp=2 lib-name= lib-ver=
id=24 addr=127.0.0.1:59384 laddr=127.0.0.1:30002 fd=40 name= age=6 idle=0 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 qbuf=0 qbuf-free=114678 argv-mem=0 multi-mem=0 rbs=1024 rbp=5 obl=5 oll=0 omem=0 tot-mem=116608 events=r cmd=set user=default redir=-1 resp=2 lib-name= lib-ver=
id=25 addr=127.0.0.1:59387 laddr=127.0.0.1:30002 fd=41 name= age=6 idle=0 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 qbuf=0 qbuf-free=114678 argv-mem=0 multi-mem=0 rbs=1024 rbp=5 obl=5 oll=0 omem=0 tot-mem=116608 events=r cmd=set user=default redir=-1 resp=2 lib-name= lib-ver=
id=26 addr=127.0.0.1:59390 laddr=127.0.0.1:30002 fd=42 name= age=6 idle=0 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 qbuf=0 qbuf-free=114678 argv-mem=0 multi-mem=0 rbs=1024 rbp=5 obl=0 oll=0 omem=0 tot-mem=116608 events=r cmd=set user=default redir=-1 resp=2 lib-name= lib-ver=
id=27 addr=127.0.0.1:59393 laddr=127.0.0.1:30002 fd=43 name= age=6 idle=0 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 qbuf=0 qbuf-free=114678 argv-mem=0 multi-mem=0 rbs=1024 rbp=5 obl=0 oll=0 omem=0 tot-mem=116608 events=r cmd=set user=default redir=-1 resp=2 lib-name= lib-ver=
id=28 addr=127.0.0.1:59396 laddr=127.0.0.1:30002 fd=44 name= age=6 idle=0 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 qbuf=0 qbuf-free=114678 argv-mem=0 multi-mem=0 rbs=1024 rbp=5 obl=0 oll=0 omem=0 tot-mem=116608 events=r cmd=set user=default redir=-1 resp=2 lib-name= lib-ver=
id=7 addr=127.0.0.1:59333 laddr=127.0.0.1:30002 fd=12 name= age=6 idle=0 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 qbuf=0 qbuf-free=114678 argv-mem=0 multi-mem=0 rbs=1024 rbp=5 obl=0 oll=0 omem=0 tot-mem=116608 events=r cmd=set user=default redir=-1 resp=2 lib-name= lib-ver=
id=8 addr=127.0.0.1:59336 laddr=127.0.0.1:30002 fd=24 name= age=6 idle=0 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 qbuf=0 qbuf-free=114678 argv-mem=0 multi-mem=0 rbs=1024 rbp=5 obl=0 oll=0 omem=0 tot-mem=116608 events=r cmd=set user=default redir=-1 resp=2 lib-name= lib-ver=
id=41 addr=127.0.0.1:59438 laddr=127.0.0.1:30002 fd=11 name= age=1 idle=1 flags=S db=0 sub=0 psub=0 ssub=0 multi=-1 qbuf=0 qbuf-free=20474 argv-mem=0 multi-mem=0 rbs=16384 rbp=16384 obl=0 oll=0 omem=0 tot-mem=37760 events=r cmd=psync user=default redir=-1 resp=2 lib-name= lib-ver=
id=9 addr=127.0.0.1:59339 laddr=127.0.0.1:30002 fd=25 name= age=6 idle=0 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 qbuf=0 qbuf-free=114678 argv-mem=0 multi-mem=0 rbs=1024 rbp=5 obl=5 oll=0 omem=0 tot-mem=116608 events=r cmd=set user=default redir=-1 resp=2 lib-name= lib-ver=
id=10 addr=127.0.0.1:59342 laddr=127.0.0.1:30002 fd=26 name= age=6 idle=0 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 qbuf=0 qbuf-free=114678 argv-mem=0 multi-mem=0 rbs=1024 rbp=5 obl=5 oll=0 omem=0 tot-mem=116608 events=r cmd=set user=default redir=-1 resp=2 lib-name= lib-ver=
id=29 addr=127.0.0.1:59399 laddr=127.0.0.1:30002 fd=45 name= age=6 idle=0 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 qbuf=0 qbuf-free=114678 argv-mem=0 multi-mem=0 rbs=1024 rbp=5 obl=0 oll=0 omem=0 tot-mem=116608 events=r cmd=set user=default redir=-1 resp=2 lib-name= lib-ver=
id=30 addr=127.0.0.1:59402 laddr=127.0.0.1:30002 fd=46 name= age=6 idle=0 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 qbuf=0 qbuf-free=114678 argv-mem=0 multi-mem=0 rbs=1024 rbp=5 obl=5 oll=0 omem=0 tot-mem=116608 events=r cmd=set user=default redir=-1 resp=2 lib-name= lib-ver=
id=31 addr=127.0.0.1:59405 laddr=127.0.0.1:30002 fd=47 name= age=6 idle=0 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 qbuf=0 qbuf-free=114678 argv-mem=0 multi-mem=0 rbs=1024 rbp=5 obl=0 oll=0 omem=0 tot-mem=116608 events=r cmd=set user=default redir=-1 resp=2 lib-name= lib-ver=
id=32 addr=127.0.0.1:59408 laddr=127.0.0.1:30002 fd=48 name= age=6 idle=0 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 qbuf=0 qbuf-free=114678 argv-mem=0 multi-mem=0 rbs=1024 rbp=5 obl=0 oll=0 omem=0 tot-mem=116608 events=r cmd=set user=default redir=-1 resp=2 lib-name= lib-ver=
id=33 addr=127.0.0.1:59411 laddr=127.0.0.1:30002 fd=49 name= age=6 idle=0 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 qbuf=0 qbuf-free=114678 argv-mem=0 multi-mem=0 rbs=1024 rbp=5 obl=0 oll=0 omem=0 tot-mem=116608 events=r cmd=set user=default redir=-1 resp=2 lib-name= lib-ver=
id=34 addr=127.0.0.1:59414 laddr=127.0.0.1:30002 fd=50 name= age=6 idle=0 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 qbuf=0 qbuf-free=114678 argv-mem=0 multi-mem=0 rbs=1024 rbp=5 obl=0 oll=0 omem=0 tot-mem=116608 events=r cmd=set user=default redir=-1 resp=2 lib-name= lib-ver=
id=35 addr=127.0.0.1:59417 laddr=127.0.0.1:30002 fd=51 name= age=6 idle=0 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 qbuf=0 qbuf-free=114678 argv-mem=0 multi-mem=0 rbs=1024 rbp=5 obl=0 oll=0 omem=0 tot-mem=116608 events=r cmd=set user=default redir=-1 resp=2 lib-name= lib-ver=
id=36 addr=127.0.0.1:59420 laddr=127.0.0.1:30002 fd=52 name= age=6 idle=0 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 qbuf=0 qbuf-free=114678 argv-mem=0 multi-mem=0 rbs=1024 rbp=5 obl=5 oll=0 omem=0 tot-mem=116608 events=r cmd=set user=default redir=-1 resp=2 lib-name= lib-ver=
id=37 addr=127.0.0.1:59423 laddr=127.0.0.1:30002 fd=53 name= age=6 idle=0 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 qbuf=0 qbuf-free=114678 argv-mem=0 multi-mem=0 rbs=1024 rbp=5 obl=5 oll=0 omem=0 tot-mem=116608 events=r cmd=set user=default redir=-1 resp=2 lib-name= lib-ver=
id=38 addr=127.0.0.1:59426 laddr=127.0.0.1:30002 fd=54 name= age=6 idle=0 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 qbuf=0 qbuf-free=114678 argv-mem=0 multi-mem=0 rbs=1024 rbp=5 obl=5 oll=0 omem=0 tot-mem=116608 events=r cmd=set user=default redir=-1 resp=2 lib-name= lib-ver=
id=39 addr=127.0.0.1:59429 laddr=127.0.0.1:30002 fd=55 name= age=6 idle=0 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 qbuf=0 qbuf-free=114678 argv-mem=0 multi-mem=0 rbs=1024 rbp=5 obl=0 oll=0 omem=0 tot-mem=116608 events=r cmd=set user=default redir=-1 resp=2 lib-name= lib-ver=
id=11 addr=127.0.0.1:59345 laddr=127.0.0.1:30002 fd=27 name= age=6 idle=0 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 qbuf=0 qbuf-free=114678 argv-mem=0 multi-mem=0 rbs=1024 rbp=5 obl=0 oll=0 omem=0 tot-mem=116608 events=r cmd=set user=default redir=-1 resp=2 lib-name= lib-ver=
id=12 addr=127.0.0.1:59348 laddr=127.0.0.1:30002 fd=28 name= age=6 idle=0 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 qbuf=0 qbuf-free=114678 argv-mem=0 multi-mem=0 rbs=1024 rbp=5 obl=0 oll=0 omem=0 tot-mem=116608 events=r cmd=set user=default redir=-1 resp=2 lib-name= lib-ver=
id=13 addr=127.0.0.1:59351 laddr=127.0.0.1:30002 fd=29 name= age=6 idle=0 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 qbuf=0 qbuf-free=114678 argv-mem=0 multi-mem=0 rbs=1024 rbp=5 obl=5 oll=0 omem=0 tot-mem=116608 events=r cmd=set user=default redir=-1 resp=2 lib-name= lib-ver=
id=14 addr=127.0.0.1:59354 laddr=127.0.0.1:30002 fd=30 name= age=6 idle=0 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 qbuf=0 qbuf-free=114678 argv-mem=0 multi-mem=0 rbs=1024 rbp=5 obl=5 oll=0 omem=0 tot-mem=116608 events=r cmd=set user=default redir=-1 resp=2 lib-name= lib-ver=
id=15 addr=127.0.0.1:59357 laddr=127.0.0.1:30002 fd=31 name= age=6 idle=0 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 qbuf=0 qbuf-free=114678 argv-mem=0 multi-mem=0 rbs=1024 rbp=5 obl=5 oll=0 omem=0 tot-mem=116608 events=r cmd=set user=default redir=-1 resp=2 lib-name= lib-ver=
id=16 addr=127.0.0.1:59360 laddr=127.0.0.1:30002 fd=32 name= age=6 idle=0 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 qbuf=0 qbuf-free=114678 argv-mem=0 multi-mem=0 rbs=1024 rbp=5 obl=5 oll=0 omem=0 tot-mem=116608 events=r cmd=set user=default redir=-1 resp=2 lib-name= lib-ver=
id=17 addr=127.0.0.1:59363 laddr=127.0.0.1:30002 fd=33 name= age=6 idle=0 flags=N db=0 sub=0 psub=0 ssub=0 multi=-1 qbuf=0 qbuf-free=114678 argv-mem=0 multi-mem=0 rbs=1024 rbp=5 obl=5 oll=0 omem=0 tot-mem=116608 events=r cmd=set user=default redir=-1 resp=2 lib-name= lib-ver=

------ MODULES INFO OUTPUT ------

------ CONFIG DEBUG OUTPUT ------
activedefrag yes
list-compress-depth 0
repl-diskless-sync yes
lazyfree-lazy-user-del no
io-threads 1
lazyfree-lazy-expire no
lazyfree-lazy-user-flush no
repl-diskless-load disabled
sanitize-dump-payload no
client-query-buffer-limit 1gb
io-threads-do-reads no
replica-read-only yes
lazyfree-lazy-server-del no
proto-max-bulk-len 512mb
slave-read-only yes
lazyfree-lazy-eviction no

------ DUMPING CODE AROUND EIP ------
Symbol: dbDictAfterReplaceEntry (base: 0x109e822d0)
Module: /Users/dpolyako/github/valkey-io/valkey/src/valkey-server (base 0x109e10000)
$ xxd -r -p /tmp/dump.hex /tmp/dump.bin
$ objdump --adjust-vma=0x109e822d0 -D -b binary -m i386:x86-64 /tmp/dump.bin
------
69308:M 22 Mar 2025 08:35:10.279 # dump of function (hexdump of 704 bytes):
554889e55350833de3804200000f843f02000040f6c6070f853c020000488b4618488b4e204885c9740df6c1070f8526020000488971184885c07411a8070f85150200004883c020e9020200004889f040f6c6017503488b060fb648ff89ca83e20783fa040f87610100004c8d0502020000496314904c01c2ffe248c1e903eb140fb648fdeb0e0fb748fbeb088b48f7eb038b48ef31d285c97e150f1f440000803c107b740a48ffc24839d175f2eb0439ca756785c90f8e1001000083f9010f84030100004189c84183e0fe31d24c8d0d9b161b0066662e0f1f8400000000000fb7d289d3c1e308c1ea08440fb6104131d2430fb7145131d3c1e2080fb6df440fb650014883c0024131da66433314514183c0fe75cae9fb0000004189d14d8d51014139ca0f8d940000004e8d1c104531c066666666662e0f1f84000000000043803c037d740d49ffc0438d1c1039cb7ceeeb6bf7d201ca4439c274624585c0745d4585c07e654c01c848ffc04183f80174554489c183e1fe31d24c8d0dee151b000fb7d289d3c1e308c1ea08440fb6104131d2430fb7145131d3c1e2080fb6df440fb650014883c0024131da664333145183c1fe75cb41f6c001755eeb7785c97e0983f901750831d2eb4f31d2eb664189c84183e0fe31d24c8d0d90151b000fb7d289d3c1e308c1ea08440fb6104131d2430fb7145131d3c1e2080fb6df440fb650014883c0024131da66433314514183c0fe75caf6c101741b0fb7ca89cac1e208c1e9080fb60031c8488d0d3e151b006633144181e2ff3f0000488b4738488b405048c1e2044801d04883c0084889304883c4085b5dc3488d3d7e741700488d35522a1700ba19030000e8dff309000f1f0007feffff0dfeffff13feffff19feffff1efeffff0f1f840000000000554889e50fb677ff89f083e00783f804771a488d0d43000000486304814801c8ffe048c1ee035de9b4cbffff31f65de9

=== REDIS BUG REPORT END. Make sure to include from START to END. ===

       Please report the crash by opening an issue on github:

           http://github.com/valkey-io/valkey/issues

  If a module was involved, please open in the module's repo instead.

  Suspect RAM error? Use valkey-server --test-memory to verify it.

  Some other issues could be detected by valkey-server --check-system

Additional information

This was on macOS 15.3.2.

Check out the Valkey 7.2 branch and compile the code.
Make sure valkey-server is in your PATH.
Modify the utils/create-cluster/create-cluster script:
replace redis-cli with valkey-cli and redis-server with valkey-server, and set
ADDITIONAL_OPTIONS="--activedefrag yes --active-defrag-ignore-bytes 10b --active-defrag-threshold-lower 1"

./create-cluster start && ./create-cluster create -f
Verify defrag:
valkey-cli -3 -p 30001 config get activedefrag

Generate data:
valkey-benchmark -p 30001 -t set -r 10000 -d 100000 -c 100 -n 100000 --cluster -l

Run this a few times to force defrag:
valkey-cli -p 30001 flushdb async && valkey-cli -p 30002 flushdb async && valkey-cli -p 30003 flushdb async

If activedefrag is left off (the default), it does NOT crash. Nor does it crash without Cluster Mode.

I think this is related to a fix that landed in Redis 7.2 AFTER the fork (https://github.com/redis/redis/pull/13315/files) and to this code: https://github.com/valkey-io/valkey/blob/7.2.8/src/cluster.c#L7658-L7675

enjoy-binbin self-assigned this Mar 23, 2025
enjoy-binbin added a commit to enjoy-binbin/valkey that referenced this issue Mar 23, 2025
There is a crash report in valkey-io#1872:
```
=== REDIS BUG REPORT START: Cut & paste starting from here ===
69308:M 22 Mar 2025 08:35:10.242 # valkey 7.2.8 crashed by signal: 11, si_code: 1
69308:M 22 Mar 2025 08:35:10.245 # Accessing address: 0x50
69308:M 22 Mar 2025 08:35:10.246 # Crashed running the instruction at: 0x109e82510

------ STACK TRACE ------
EIP:
0   valkey-server                       0x0000000109e82510 dbDictAfterReplaceEntry + 576

Backtrace:
0   libsystem_platform.dylib            0x00007ff801d12e1d _sigtramp + 29
1   ???                                 0x0000000000000000 0x0 + 0
2   valkey-server                       0x0000000109e81782 dictDefragBucket + 354
3   valkey-server                       0x0000000109e813ec dictScanDefrag + 732
4   valkey-server                       0x0000000109faa520 activeDefragCycle + 592
5   valkey-server                       0x0000000109e86b8e serverCron + 4974
6   valkey-server                       0x0000000109e7e466 aeProcessEvents + 1094
7   valkey-server                       0x0000000109ea29ad main + 24781
8   dyld                                0x00007ff80194d2cd start + 1805
```

The reason is that when doing FLUSHDB ASYNC, emptyDbAsync creates a new dict
but does not call slotToKeyInit to initialize its clusterDictMetadata. Later,
slotToKeyReplaceEntry reads a wrong pointer from that metadata and crashes.

This issue only occurs in 7.2, since the code structure has changed in the
unstable branch. Fixes valkey-io#1872.

Signed-off-by: Binbin <[email protected]>
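
To make the failure mode described in the commit message easier to picture, here is a minimal toy model in C. It is a sketch under loud assumptions, not Valkey's code: `ToyDb`, `ToyDict`, `ToyDictMeta` and every `toy_*` function are hypothetical stand-ins for the roles that emptyDbAsync, slotToKeyInit and slotToKeyReplaceEntry play in the explanation above, and the struct layouts are invented. The toy returns an error where a real unchecked dereference would fault, so it can be run safely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins; these are NOT Valkey's real structures. */
typedef struct ToyDb ToyDb;

/* Per-dict metadata: a back-pointer to the owning db (invented layout). */
typedef struct ToyDictMeta { ToyDb *db; } ToyDictMeta;

typedef struct ToyDict { ToyDictMeta meta; } ToyDict;

struct ToyDb {
    ToyDict *dict;
    void **slot_heads; /* one "first entry" pointer per slot */
    size_t nslots;
};

/* Create a dict whose metadata starts out zeroed (calloc zero-fills). */
static ToyDict *toy_dict_create(void) {
    ToyDict *d = calloc(1, sizeof(*d));
    assert(d != NULL);
    return d;
}

/* Plays the role of the skipped init step: link the metadata back to db. */
static void toy_slot_to_key_init(ToyDb *db) {
    db->dict->meta.db = db;
}

/* Called when defrag moves an entry: repoint the slot head at its new home.
 * Returns 0 on success, -1 if the back-pointer was never initialized.
 * Code without this check would dereference the NULL pointer and fault. */
static int toy_replace_entry(ToyDict *d, size_t slot, void *entry) {
    ToyDb *db = d->meta.db;
    if (db == NULL) return -1;
    assert(slot < db->nslots);
    db->slot_heads[slot] = entry;
    return 0;
}

/* Swap in a fresh dict, as an async flush would. with_init selects whether
 * the new dict's metadata is initialized. Returns the old dict so the
 * caller can free it (the toy has no background thread). */
static ToyDict *toy_empty_db_async(ToyDb *db, int with_init) {
    ToyDict *old = db->dict;
    db->dict = toy_dict_create();
    if (with_init) toy_slot_to_key_init(db);
    return old;
}
```

Calling `toy_empty_db_async(&db, 0)` and then `toy_replace_entry(db.dict, ...)` returns -1, while passing `with_init = 1` lets the later call succeed. A NULL back-pointer dereferenced at a small field offset would be consistent with the low faulting address in the report, although the offsets in this toy are not Valkey's.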