Speeding up copying of snapshots #362

kishorenc · 2022-06-03T02:44:51Z

I have a cluster whose nodes are far apart geographically, so network latency between them is high. In this setup, snapshot installation from the leader to a follower is quite slow for large datasets.

When I increased the FLAGS_raft_max_byte_count_per_rpc value, the transfer became much faster.
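A rough way to see why a larger per-RPC byte count helps on a high-latency link is to count round trips. The sketch below is only a back-of-the-envelope model, not braft code: it assumes one request is in flight at a time and ignores bandwidth. The helper names `rpc_count` and `latency_bound_ms` are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Number of requests needed to move file_bytes when each RPC carries at most
// max_bytes_per_rpc bytes (ceiling division).
inline int64_t rpc_count(int64_t file_bytes, int64_t max_bytes_per_rpc) {
    assert(file_bytes >= 0 && max_bytes_per_rpc > 0);
    return (file_bytes + max_bytes_per_rpc - 1) / max_bytes_per_rpc;
}

// Latency-only lower bound in milliseconds, ASSUMING requests are issued one
// after another so that every chunk pays one full round trip.
inline double latency_bound_ms(int64_t file_bytes, int64_t max_bytes_per_rpc,
                               double rtt_ms) {
    return static_cast<double>(rpc_count(file_bytes, max_bytes_per_rpc)) * rtt_ms;
}
```

Under this model, with a hypothetical 100 ms round trip, a 1 GiB file sent in 128 KiB chunks needs 8192 round trips (about 819 s of pure latency), while 4 MiB chunks need 256 round trips (about 26 s).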

Are there any other flags that I can tweak to increase the speed of snapshot transfer?
If a snapshot consists of multiple files, are the files transferred sequentially or in parallel? Is there a way to transfer files in parallel to max out the available network bandwidth between nodes?


PFZheng · 2022-06-06T00:51:33Z

I don't think there is any other way to speed up snapshot transfer; the files in a snapshot are transferred sequentially.

kishorenc · 2022-06-06T00:54:35Z

Thanks for the clarification. Would you accept a patch that possibly parallelized this operation?

PFZheng · 2022-06-06T00:59:08Z

> Thanks for the clarification. Would you accept a patch that possibly parallelized this operation?

Of course!

kishorenc · 2023-01-18T07:48:47Z

@PFZheng

Here's a proposed approach that I want to run by you before making changes:

Make LocalSnapshotCopier::copy_file take a vector of filenames:

braft/src/braft/snapshot.cpp

Line 946 in 2c9f611

void LocalSnapshotCopier::copy_file(const std::string& filename) {

Make this file-copying code block run in parallel on multiple threads (the rest of the code is fast, so it can remain sequential):

braft/src/braft/snapshot.cpp

Lines 979 to 987 in 2c9f611

    
           scoped_refptr<RemoteFileCopier::Session> session 
        
               = _copier.start_to_copy_to_file(filename, file_path, NULL); 
        
           if (session == NULL) { 
        
               LOG(WARNING) << "Fail to copy " << filename 
        
                            << " path: " << _writer->get_path(); 
        
               set_error(-1, "Fail to copy %s", filename.c_str()); 
        
               return; 
        
           } 
        
           _cur_session = session.get();

Wrap _copier and _cur_session into a struct and use a vector of structs so that each copy thread can have a separate copy state.
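The three steps above could be sketched roughly as follows. This is a standalone illustration, not braft code: `CopyTask` is a hypothetical stand-in for the struct that would hold `_copier` and `_cur_session`, and `copy_one` is a hypothetical callback standing in for the real copy call. It uses `std::thread` purely for simplicity; whether bthread is preferable is a separate question.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical per-file copy state. In the real code this would hold the
// copier and the current session; here it only records the outcome.
struct CopyTask {
    explicit CopyTask(std::string name)
        : filename(std::move(name)), error_code(0) {}
    std::string filename;
    int error_code;  // 0 on success, non-zero on failure
};

// Hypothetical stand-in for the actual copy of one file; returns 0 on success.
using CopyFn = std::function<int(const std::string&)>;

// Starts one thread per file. Each thread writes only to its own CopyTask,
// so the tasks themselves need no locking. Returns the first non-zero error
// code in file order, or 0 if every copy succeeded.
inline int copy_files_parallel(std::vector<CopyTask>& tasks,
                               const CopyFn& copy_one) {
    assert(static_cast<bool>(copy_one));
    std::vector<std::thread> threads;
    threads.reserve(tasks.size());
    for (CopyTask& task : tasks) {
        threads.emplace_back([&task, &copy_one] {
            task.error_code = copy_one(task.filename);
        });
    }
    for (std::thread& t : threads) {
        t.join();
    }
    for (const CopyTask& task : tasks) {
        if (task.error_code != 0) {
            return task.error_code;
        }
    }
    return 0;
}
```

One thread per file is the simplest form of the idea; a snapshot with many files would presumably need a cap on the number of threads.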

Please let me know if this sounds like a good approach. Also, is it okay to use std::thread? If not, please point me to some examples in the code where threads are managed via bthread that I can use as a reference.

kishorenc · 2023-01-24T06:08:40Z

@PFZheng

Took a stab at this but ran into concurrency issues with FileServiceImpl::get_file, which seems to have trouble serving multiple files in parallel. Specifically, we got an error here when we tried to download multiple snapshot files in parallel:

https://github.com/baidu/braft/blob/master/src/braft/file_service.cpp#L70

Here's the rough diff of the changes we've attempted: master...krunal1313:braft:master

Any guidance here is appreciated.

cc @chenzhangyi

kishorenc · 2023-02-13T05:29:34Z

@PFZheng @chenzhangyi

Sorry to follow up again: I would really appreciate any pointers you can provide here.

chenzhangyi · 2023-02-13T05:53:03Z

@kishorenc Which kind of error did you get?

kishorenc · 2023-02-13T06:01:54Z

This is the error we get with these changes.

W0123 12:03:47.506687 80182 external/com_github_brpc_braft/src/braft/snapshot.cpp:786] 
Fail to copy, error_code 22 error_msg [E22][10.13.87.194:8107][E22]Fail to read from path=/var/lib/app/state/snapshot/snapshot_00000000000000000369 filename=db_snapshot/000444.sst : 
Invalid argument writer path /var/lib/app/state/snapshot/temp

It seems like, on the remote end, multiple files cannot be accessed at the same time.

Currently, snapshot transfer happens by sending a GetFileRequest for every file known to be in the remote snapshot. This happens sequentially for each file. The only real configurations that allow tuning the throughput of this transfer are the throttle, which can be set when initializing the braft::Node, and the runtime configuration raft_max_byte_count_per_rpc, which determines how many chunks a large file will be broken into during the transfer. The default is 128KiB, so a 1MiB file will be transferred in about 8 GetFileRequests.

This works great for snapshots that have a handful of large files. But if a snapshot has hundreds or thousands of small files, transferring it can be pretty slow. For example, a snapshot with 100k files that I created locally on my development machine can take up to 30 minutes to transfer. Even though the latency per transfer is low, there is a full round trip plus a flush of __raft_meta on the receiving end for each file.

This patch adds concurrency to these transfers. When a remote snapshot is transferred locally, up to raft_max_get_file_request_concurrency GetFileRequests will be sent concurrently. This defaults to 64. With this patch, the 100k file snapshot consistently transfers in under 10 seconds on my development machine.

This should resolve baidu#362.
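The idea of keeping a bounded number of requests in flight can be illustrated with a small worker pool. This is only a sketch of the general technique and makes no claim about how the patch itself is implemented: `fetch_all_bounded` and the `fetch` callback are hypothetical names, and plain `std::thread` workers stand in for whatever mechanism braft would use.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <thread>
#include <vector>

// Fetches every file while keeping at most max_concurrency fetches in flight.
// `fetch` is a hypothetical stand-in for issuing the requests for one file and
// returns 0 on success. Returns the number of files that failed.
inline std::size_t fetch_all_bounded(
        const std::vector<std::string>& files,
        std::size_t max_concurrency,
        const std::function<int(const std::string&)>& fetch) {
    assert(max_concurrency > 0);
    std::atomic<std::size_t> next(0);      // index of the next unclaimed file
    std::atomic<std::size_t> failures(0);  // count of failed fetches

    // Each worker repeatedly claims the next file index until none are left.
    auto worker = [&] {
        for (;;) {
            const std::size_t i = next.fetch_add(1);
            if (i >= files.size()) {
                return;
            }
            if (fetch(files[i]) != 0) {
                failures.fetch_add(1);
            }
        }
    };

    // Never start more workers than there are files.
    const std::size_t n = std::min(max_concurrency, files.size());
    std::vector<std::thread> workers;
    workers.reserve(n);
    for (std::size_t k = 0; k < n; ++k) {
        workers.emplace_back(worker);
    }
    for (std::thread& t : workers) {
        t.join();
    }
    return failures.load();
}
```

Because the number of workers is fixed, the concurrency limit holds regardless of how many files the snapshot contains, which is what makes a default such as 64 safe for a 100k-file snapshot.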

ambroff mentioned this issue Feb 6, 2025

snapshot: Transfer files concurrently to speed up snapshot transfer ambroff/braft#1

Open

ambroff linked a pull request Feb 6, 2025 that will close this issue

snapshot: Transfer files concurrently to speed up snapshot transfer #482

Open
