Performance comparison between Dolphin and other frameworks #265

yunseong · 2015-11-11T12:16:47Z

We've run experiments by using same LR algorithm with URL reputation dataset on multiple frameworks: Dolphin, Vortex. As @beomyeol mentioned at the meeting, we've seen some performance issues such as vector computation, data loading, etc. We can also take a look at Spark because it can run LR algorithm and the performance turned out to be much faster than Vortex (not sure compared to Dolphin yet).

This issue aims to investigate the performance of both frameworks as we can run the same algorithm on the same data set. It would be great if we can find some points to improve in performance.

As a first step, I'll run the experiment on Microsoft YARN cluster which consists of 20 machines (8core CPU, 8GB RAM, YARN 2.7.1).

yunseong · 2015-11-12T02:54:28Z

@jsjason, @johnyangk, @beomyeol , @gyeongin and I had a discussion and it'd be good to check followings:

How scalable are the systems?
In @gyeongin's experiment, he didn't use the full data set (URLreputation, 2.2GB), but rather some partial data (around 80MB). When I tried to run 2.2 GB, the job failed at some moment. I have not investigated the logs yet, but the exception was not OOM since I used SparseVector. Vortex also had some scalability issues, but after we changed to use HDFS, we can run LR smoothly with at least 2.2 GB dataset. This might NOT weaken our argument ("elastic memory management is crucial for the performance"), but I think it'd be good to check which made the LR implementation not scalable. How does it sound?
The performance of vector computation
All systems use Sparse Vector for vector computation, but we used different implementations: Dolphin used Mahout's SparseVector and Vortex built a custom vector based on Array and Map (@jsjason volunteered to check the one in Spark). As @beomyeol and @gyeongin mentioned the vector computation appeared quite slow, it'd be good to check 1) how the implementations are different and 2) how the computation takes differently. For 2), I think we can compare the ComputeTasks' computation time per iteration.
The accuracy
The algorithm computes accuracy in each iteration, but the value seems a bit confusing. Here I found in the log in Dolphin LR:

0-th iteration accuracy: 0%
1-th iteration accuracy: 65.96%
2-th iteration accuracy: 65.96%
3-th iteration accuracy: 65.96%

The result were almost same (similar accuracy value, fixed after "1-th iteration"(2nd actually)) in Vortex as I referred most part of the algorithm from Dolphin's. Even when using the full data set, the result was still similar. Spark, on the other hand, works differently: 1) the accuracy grows as iteration goes, and 2) the ultimate accuracy is higher with the same iteration. It'd be worth taking a look at algorithm for correctness.

If you have more things to want to check or ask, please feel free to add. After that, I think this issue can be split into multiple items.

gyeongin · 2015-12-15T06:35:47Z

I pushed a branch named gy-lr-test which uses scala library breeze instead of mahout. I made some changes to save memory and improve performance. With the entire URL reputation dataset and R730-02 machine(48core cpu, 128GB memory), the job took 263 seconds.
This is the command I used for test:
./run_logistic.sh -dim 3231961 -maxIter 20 -stepSize 1.0 -lambda 0.01 -local false -split 8 -input /total -output output_logistic -maxNumEvalLocal 5 -isDense false -evalSize 1000 -timeout 1200000
The final model accuracy was 93.421%.

yunseong · 2015-12-15T07:24:55Z

@gyeongin Thanks for sharing the result. This looks awesome!
Could you kindly give us a very short summary what changes you made for better performance and memory usage?

johnyangk · 2015-12-15T07:31:01Z

@gyeongin Great!

gyeongin · 2015-12-15T08:57:51Z

Changes I made:

Use breeze instead of mahout (mahout does not support inplace update)
Change vector computations to inplace update to not allocate redundant new vectors
Use DenseVector for model and cumGradient (dv + sv is much faster than sv + sv)
Each ComputeTasks update the model only once in each iteration

yunseong added the cruise-ps label Nov 11, 2015

This was referenced Nov 18, 2015

Debug Dolphin's algorithm correctness #273

Open

Profile memory usage in Dolphin #274

Open

Test performance of vector computation #275

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Performance comparison between Dolphin and other frameworks #265

Performance comparison between Dolphin and other frameworks #265

yunseong commented Nov 11, 2015

yunseong commented Nov 12, 2015

gyeongin commented Dec 15, 2015

yunseong commented Dec 15, 2015

johnyangk commented Dec 15, 2015

gyeongin commented Dec 15, 2015

Performance comparison between Dolphin and other frameworks #265

Performance comparison between Dolphin and other frameworks #265

Comments

yunseong commented Nov 11, 2015

yunseong commented Nov 12, 2015

gyeongin commented Dec 15, 2015

yunseong commented Dec 15, 2015

johnyangk commented Dec 15, 2015

gyeongin commented Dec 15, 2015