Hi,
I am creating a DataFrame with 3.5M records and 25 vectors, and it takes over a minute.
```ruby
require 'benchmark'
require 'daru'

# Construct data for 3.5M records; every hash has the same ~25 keys.
data = [
  { m: 'abc', a: 1.2, b: 2.1, c: 2.3 },
  { m: 'xyz', a: 1.1, b: 22.1, c: 223.3 },
  # ...
]

# Convert from an array of hashes to a hash of arrays
vc = {}
data.first.keys.each do |ky|
  vc[ky] = data.map { |dt| dt[ky] }
end

Benchmark.bm do |x|
  x.report('df array_of_hash: ') { Daru::DataFrame.new(data, clone: false) }
  x.report('df hash_of_array: ') { Daru::DataFrame.new(vc, clone: false) }
end

#                         user     system      total        real
# df array_of_hash:  86.398855   0.311986  86.710841 ( 86.850770)
# df hash_of_array:  21.745897   0.027261  21.773158 ( 21.814447)
```
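The per-key `map` loop above walks all 3.5M rows once per key (about 25 passes, each doing a hash lookup per row). Below is a minimal sketch of an alternative that calls `Hash#values` once per row and then does a single `Array#transpose`. It assumes every hash has exactly the same keys in the same insertion order. `rows_to_columns` is a hypothetical helper name, and a speedup is not guaranteed, so it is worth benchmarking against the original loop on your data.

```ruby
# Hypothetical helper: converts an array of hashes that all share the same
# keys (in the same insertion order) into a hash of column arrays.
def rows_to_columns(rows)
  return {} if rows.empty?

  keys = rows.first.keys
  # Guard the assumption that every row has identical keys in identical order;
  # transpose would otherwise silently misalign the columns.
  unless rows.all? { |row| row.keys == keys }
    raise ArgumentError, 'all rows must have the same keys in the same order'
  end

  # One Hash#values call per row, then a single transpose turns the
  # row-major matrix into one array per column.
  keys.zip(rows.map(&:values).transpose).to_h
end

data = [
  { m: 'abc', a: 1.2, b: 2.1, c: 2.3 },
  { m: 'xyz', a: 1.1, b: 22.1, c: 223.3 }
]

vc = rows_to_columns(data)
p vc
```

The key check costs one extra pass and allocates a keys array per row; if the input is known to be uniform, it can be removed.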
After converting the data to a hash of arrays (the conversion itself took about a minute), construction is faster, but 21 seconds is still a long time to create a DataFrame.
Any ideas on how to speed this up?
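If you control how `data` is constructed, one option worth trying is to skip the array-of-hashes stage entirely and append values straight into column arrays as records are produced. This is a minimal sketch under assumptions: `COLUMNS`, `each_record` and `build_columns` are hypothetical names, and `each_record` is a stand-in for wherever the 3.5M records actually come from. It only removes the conversion step; it does not change the time spent inside `Daru::DataFrame.new`.

```ruby
# Hypothetical column names; the real data has about 25 keys.
COLUMNS = %i[m a b c].freeze

# Hypothetical record source standing in for a file, database cursor, etc.
# It yields one row as an Array whose values follow the order of COLUMNS.
def each_record
  yield ['abc', 1.2, 2.1, 2.3]
  yield ['xyz', 1.1, 22.1, 223.3]
end

# Append each value directly to its column array, so no per-row Hash is
# allocated and no separate conversion pass is needed afterwards.
def build_columns
  columns = COLUMNS.to_h { |name| [name, []] }
  arrays = columns.values # same Array objects as stored in `columns`
  each_record do |row|
    row.each_with_index { |value, i| arrays[i] << value }
  end
  columns
end

vc = build_columns
# vc has the same hash-of-arrays shape that is passed to
# Daru::DataFrame.new(vc, clone: false) in the benchmark above.
```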
Unfortunately, daru currently has no active maintainer.
I recommend either creating your own fork, giving it another name (such as daru2), and taking over the project, or using one of the following alternatives:
The former is recommended for general use.
The latter is a newer data frame library with Apache Arrow as its backend; its functionality may improve in the future.