Commit 3e90af1

add benchmark table for dataflux download (#27)
* add benchmark table for dataflux download
* updated with machine type
1 parent 6337bf9 commit 3e90af1

1 file changed

Lines changed: 11 additions & 0 deletions

README.md

@@ -79,6 +79,17 @@ The `dataflux_download_parallel` function is the most performant stand-alone dow

The `dataflux_download_threaded` function provides a degree of download parallelization for code running inside daemonic processes (e.g. a distributed ML workload leveraging [Ray](https://www.ray.io/)). Daemonic processes are not permitted to spawn child processes, so threading must be used in these cases. Threaded download performance is similar to that of multiprocessing for most use cases, but falls behind it as the thread/process count increases. Additionally, threading does not support signal interruption, so SIGINT cleanup triggers are disabled when running a threaded download.
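
A minimal sketch of the threaded pattern, not the dataflux implementation: it assumes a hypothetical `download_object` helper in place of real storage calls. A `ThreadPoolExecutor` fans out per-object downloads without creating child processes, and the hypothetical `try_install_sigint_cleanup` helper illustrates the signal constraint, since Python only lets the main thread of the main interpreter install signal handlers.

```python
import signal
import threading
from concurrent.futures import ThreadPoolExecutor


def download_object(name: str) -> bytes:
    """Hypothetical stand-in for fetching one object (not a dataflux API).

    A real version would call a storage client; this one returns the name's
    bytes so the sketch runs offline.
    """
    return name.encode("utf-8")


def threaded_download(names: list[str], threads: int = 48) -> list[bytes]:
    """Fetch `names` concurrently with a thread pool, preserving input order.

    Threads are used instead of processes so this also works inside a
    daemonic process, which is not allowed to start child processes.
    """
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # Executor.map yields results in the same order as `names`.
        return list(pool.map(download_object, names))


def try_install_sigint_cleanup(cleanup) -> bool:
    """Hypothetical helper: register a SIGINT handler that runs `cleanup`.

    Python only allows signal.signal() in the main thread of the main
    interpreter and raises ValueError anywhere else, so this returns False
    when called from a worker thread.
    """

    def handler(signum, frame):
        cleanup()
        raise KeyboardInterrupt

    try:
        signal.signal(signal.SIGINT, handler)
        return True
    except ValueError:
        return False


if __name__ == "__main__":
    names = [f"object-{i}" for i in range(111)]
    print(len(threaded_download(names, threads=8)))  # 111

    # From a worker thread, registering the cleanup handler is refused.
    result: list[bool] = []
    worker = threading.Thread(
        target=lambda: result.append(try_install_sigint_cleanup(lambda: None))
    )
    worker.start()
    worker.join()
    print(result)  # [False]
```

`Executor.map` is used so results come back in input order, which keeps downloaded bytes aligned with their object names.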

### Dataflux Download Benchmark Results

These benchmarks were performed on an n2-standard-48 (48 vCPU) virtual machine, using files of approximately 10 KB each.

|Number of Objects|Standard Linear Download (s)|Dataflux Compose Download (s)|Dataflux Threaded Compose Download, 48 Threads (s)|Dataflux Parallel Compose Download, 48 Processes (s)|
|----------------:|---------------------------:|----------------------------:|-------------------------------------------------:|---------------------------------------------------:|
|111              |18.27                       |5.17                         |3.94                                              |2.06                                                |
|1111             |176.22                      |61.78                        |5.21                                              |3.14                                                |
|11098            |1396.98                     |392.23                       |16.85                                             |14.88                                               |
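
The table can also be read as speedups over the standard linear download: divide the linear time by each Dataflux time for the same object count. The sketch simply does that arithmetic on the figures in the table; the `TIMES` layout and the `speedups` function are illustrative names, not part of the dataflux library.

```python
# Download times in seconds, copied from the benchmark table.
# Keys are object counts; this dict layout is illustrative only.
TIMES = {
    111: {"linear": 18.27, "compose": 5.17, "threaded": 3.94, "parallel": 2.06},
    1111: {"linear": 176.22, "compose": 61.78, "threaded": 5.21, "parallel": 3.14},
    11098: {"linear": 1396.98, "compose": 392.23, "threaded": 16.85, "parallel": 14.88},
}


def speedups(row: dict[str, float]) -> dict[str, float]:
    """Return each method's speedup relative to the standard linear download."""
    baseline = row["linear"]
    return {name: baseline / t for name, t in row.items() if name != "linear"}


if __name__ == "__main__":
    for count, row in TIMES.items():
        ratios = ", ".join(f"{name} {r:.1f}x" for name, r in speedups(row).items())
        print(f"{count} objects: {ratios}")
```

For example, at 11098 objects the parallel compose download works out to roughly 94x faster than the linear download and the threaded variant to roughly 83x, while compose download alone stays below 4x at every size.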

## Getting Started

To get started with the dataflux client library, we encourage you to begin with the [Dataflux Dataset for PyTorch](https://github.com/GoogleCloudPlatform/dataflux-pytorch). For an example of a client-specific implementation, see the [benchmark code](dataflux_core/benchmarking/dataflux_client_bench.py).
