Race condition on fullsync start/stop [JIRA: RIAK-2535] #741

@macintux

Description

The repl_cancel_fullsync test can fail when the sync_worker field of the state record in riak_repl2_fssource is undefined and riak_repl_keylist_server:cancel_fullsync is invoked. This happens if the worker has not started yet.

This is extraordinarily unlikely to happen in production, but the code should handle the situation more gracefully. Some thoughts from @bsparrow435:

so the worker hadn't started yet
and it tried to do a gen_fsm:send_event to an undefined pid

2016-04-29 00:25:13.232 [info]  ---riak_test--- Starting fullsync.
2016-04-29 00:25:13.588 [info]  ---riak_test--- Stopping fullsync.
maybe we should wait for worker start here
instead of checking if the fscoordinator is running

honestly cancel_fullsync should be able to catch this in a guard and return no workers started
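
The guard idea could look roughly like the sketch below. It is an illustration only, not the actual riak_repl code: the module name `cancel_guard_sketch`, the function's argument shape, and the `{error, no_workers_started}` return value are all assumptions, and a plain message send stands in for the `gen_fsm:send_event` call so the example stays self-contained.

```erlang
-module(cancel_guard_sketch).
-export([cancel_fullsync/1]).

%% Hypothetical sketch: takes the value held in the sync_worker field.

%% Worker not started yet: report that instead of sending an event
%% to an undefined pid.
cancel_fullsync(undefined) ->
    {error, no_workers_started};
%% Worker is running: forward the cancel request to it.
%% The plain send below is a stand-in for gen_fsm:send_event/2.
cancel_fullsync(Pid) when is_pid(Pid) ->
    Pid ! cancel_fullsync,
    ok.
```

With something like this, a caller that cancels before the worker has started gets a tagged error it can match on rather than a crash, and the test could treat that result as "nothing to cancel yet".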
