Connection manager repeatedly retries connection for dead process [JIRA: RIAK-2379] #723

@ian-mi

When the calling process exits or crashes before a connection completes, the riak_core_connection process crashes with noproc when it attempts to call the connected callback on that process. The connection manager then repeatedly retries the connection with the same PID, resulting in repeated noproc errors.

Seen at a customer site, where the fssource crashed with:

2016-02-07 21:37:52 =SUPERVISOR REPORT====
     Supervisor: {local,riak_repl2_fssource_sup}
     Context:    child_terminated
     Reason:     {normal,{gen_server,call,[<11363.32378.80>,cluster_name,120000]}}
     Offender:   [{pid,<0.15265.27>},{name,822094670998632891489572718402909198556462055424},{mfargs,{riak_repl2_fssource,start_link,undefined}},{restart_type,temporary},{shutdown,5000},{child_type,worker}]

This was then followed by periodic noproc crashes such as:

2016-02-07 21:37:52 =ERROR REPORT====
** State machine <0.15266.27> terminating 
** Last message in was {tcp,#Port<0.819352>,<<131,104,2,100,0,2,111,107,104,2,100,0,8,102,117,108,108,115,121,110,99,104,3,97,3,97,0,97,0>>}
** When State == wait_for_protocol
**      Data  == {state,ranch_tcp,#Port<0.819352>,fullsync,[{3,0},{2,0},{1,1}],[{keepalive,true},{nodelay,true},{packet,4},{active,false}],riak_repl2_fssource,<0.15265.27>,"riak_tpsrvc_test2_iscc_104",[{clustername,"riak_tpsrvc_test2_iscc_104"},{ssl_enabled,false}],[{clustername,"riak_tpsrvc_test2_corp_104"},{ssl_enabled,false}],{10,253,50,54},9080}
** Reason for termination = 
** {noproc,{gen_server,call,[<0.15265.27>,{connected,#Port<0.819352>,ranch_tcp,{{REDACTED},9080},{fullsync,{3,0},{3,0}},[{clustername,"riak_tpsrvc_test2_corp_104"},{ssl_enabled,false}]},120000]}}
2016-02-07 21:37:52 =CRASH REPORT====
  crasher:
    initial call: riak_core_connection:init/1
    pid: <0.15266.27>
    registered_name: []
    exception exit: {{noproc,{gen_server,call,[<0.15265.27>,{connected,#Port<0.819352>,ranch_tcp,{{REDACTED},9080},{fullsync,{3,0},{3,0}},[{clustername,"riak_tpsrvc_test2_corp_104"},{ssl_enabled,false}]},120000]}},[{gen_fsm,terminate,7,[{file,"gen_fsm.erl"},{line,622}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,239}]}]}
    ancestors: [<0.15251.27>]
    messages: []
    links: [#Port<0.819352>]
    dictionary: []
    trap_exit: false
    status: running
    heap_size: 610
    stack_size: 27
    reductions: 1637
  neighbours:

Every crash references the same PID. This behaviour continues until the node is restarted.
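The retry loop described above can be modelled in a few lines. The following is only a language-neutral sketch in Python, not riak_core code: `Caller`, `ConnectionManager`, `retry_round`, and the `check_alive` flag are all hypothetical names invented for illustration. It shows how a manager that never checks whether the requester is still alive keeps failing on the same dead caller, and how dropping requests from dead callers would end the loop.

```python
from dataclasses import dataclass


@dataclass
class Caller:
    """Hypothetical stand-in for the requesting process (e.g. an fssource worker)."""
    alive: bool = True
    connected_calls: int = 0

    def connected(self):
        if not self.alive:
            # Models the noproc exit raised when calling a dead PID.
            raise RuntimeError("noproc")
        self.connected_calls += 1


class ConnectionManager:
    """Hypothetical manager; not the real riak_core connection manager."""

    def __init__(self):
        self.pending = []        # callers with outstanding connection requests
        self.noproc_errors = 0   # how many times the callback hit a dead caller

    def request(self, caller):
        self.pending.append(caller)

    def retry_round(self, check_alive=True):
        still_pending = []
        for caller in self.pending:
            if check_alive and not caller.alive:
                # Caller is gone: cancel the request instead of retrying it.
                continue
            try:
                caller.connected()   # success completes the request
            except RuntimeError:
                # Behaviour from the report: the failure is counted and the
                # same dead caller is queued for yet another retry.
                self.noproc_errors += 1
                still_pending.append(caller)
        self.pending = still_pending


if __name__ == "__main__":
    mgr = ConnectionManager()
    mgr.request(Caller(alive=False))
    for _ in range(3):
        mgr.retry_round(check_alive=False)
    print(mgr.noproc_errors, len(mgr.pending))
    mgr.retry_round()  # liveness check enabled: the dead request is dropped
    print(mgr.noproc_errors, len(mgr.pending))
```

In an Erlang system the equivalent of the liveness check would more likely be a process monitor on the requester, so that the pending request is cancelled when the requester goes down. That is a suggestion under the same assumptions, not a description of an existing fix.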
