
uproot4 to uproot5 TTree to DataFrame arrays slower #1257

Open
masterfelu opened this issue Jul 26, 2024 · 1 comment
Labels
bug (unverified) The problem described would be a bug, but needs to be triaged

Comments


masterfelu commented Jul 26, 2024

Hello.

I ran a script that loads a TTree with 5 million rows and 3 columns into a pandas DataFrame and measured the time taken. Here's an MWE for IPython:

%timeit uproot.open('tree.root')['time'].arrays(['year','month','day'],library='pd')
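For anyone reproducing this outside IPython, here is a minimal sketch that times the same call with the standard-library `timeit` module. `time_call` and `load_dataframe` are hypothetical helper names. The file name, tree name and branches are taken from the MWE above. The `with` block closes the file after each load, which the one-liner does not do. The defaults roughly mirror `%timeit -r7 -n1`, that is, 7 runs of 1 loop each.

```python
import statistics
import timeit


def time_call(fn, repeat=7, number=1):
    """Time fn() over `repeat` runs of `number` calls each.

    Returns (mean, population standard deviation) in seconds per call.
    """
    # timeit.repeat returns one total time per run, each covering `number` calls
    totals = timeit.repeat(fn, repeat=repeat, number=number)
    per_call = [t / number for t in totals]
    return statistics.mean(per_call), statistics.pstdev(per_call)


def load_dataframe(path="tree.root"):
    """The MWE call: read three branches of the 'time' tree into a DataFrame."""
    import uproot  # imported lazily so time_call can be used without uproot

    with uproot.open(path) as f:
        return f["time"].arrays(["year", "month", "day"], library="pd")


# Usage (needs uproot, pandas and the original tree.root):
# mean, std = time_call(load_dataframe)
# print(f"{mean:.2f} s ± {std * 1e3:.0f} ms per loop (7 runs, 1 loop each)")
```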

For uproot 5.3.7 with pandas 2.2.2, 19.2 s ± 177 ms per loop
For uproot 4.1.9 with pandas 1.3.5, 1.61 s ± 9.27 ms per loop

Each value is the mean ± std. dev. of 7 runs, 1 loop each. uproot5 is roughly 12 times slower than uproot4 (19.2 s vs 1.61 s). The gap grows further as more columns are loaded, to the point where a load that used to take seconds now takes more than ten minutes.
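The slowdown factor and its uncertainty can be computed directly from the two reported timings. The sketch below uses first-order error propagation for a ratio. It assumes the two measurements are independent, which this thread does not state.

```python
import math

# Mean ± std. dev. per loop from the two runs reported above, in seconds
uproot5_mean, uproot5_std = 19.2, 0.177    # uproot 5.3.7 + pandas 2.2.2
uproot4_mean, uproot4_std = 1.61, 0.00927  # uproot 4.1.9 + pandas 1.3.5

slowdown = uproot5_mean / uproot4_mean
# First-order propagation for a ratio: relative errors add in quadrature
rel_err = math.hypot(uproot5_std / uproot5_mean, uproot4_std / uproot4_mean)

print(f"slowdown: {slowdown:.1f}x ± {slowdown * rel_err:.1f}")  # slowdown: 11.9x ± 0.1
```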

I would like to know whether there has been any major change that could cause such an increase in load time when converting a large TTree to a DataFrame. Also, if there are any further checks I should do before drawing a conclusion, that would be super helpful.
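One check that might narrow this down is to time the same read with `library="np"` and with `library="pd"` in both environments. This is only a suggestion and has not been confirmed in this thread. If NumPy output is similarly fast under uproot 4 and uproot 5, the extra time is more likely spent building the DataFrame than reading and decompressing the baskets. `split_timing` is a hypothetical helper. It opens a fresh tree for each measurement so that uproot's in-memory caches from the first read do not carry over. The operating system's file cache may still be warm.

```python
import time


def split_timing(open_tree, branches=("year", "month", "day")):
    """Time reading the same branches as NumPy arrays vs. a pandas DataFrame.

    `open_tree` is a zero-argument callable returning a freshly opened TTree,
    e.g. lambda: uproot.open("tree.root")["time"]. Opening the file is not
    included in the measured time.
    """
    results = {}
    for library in ("np", "pd"):
        tree = open_tree()  # fresh tree per library, outside the timed region
        start = time.perf_counter()
        tree.arrays(list(branches), library=library)
        results[library] = time.perf_counter() - start
    return results


# Usage, run once under uproot 4 and once under uproot 5:
# import uproot
# print(split_timing(lambda: uproot.open("tree.root")["time"]))
```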

Thanks a lot for your time.

masterfelu added the bug (unverified) label Jul 26, 2024

vvsagar commented Jul 26, 2024

It might also be important to point out the difference in pandas versions (2.2.2 vs 1.3.5) so that the results can be reproduced.
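To make the comparison reproducible, the exact versions of everything involved could be recorded in each environment. Below is a small sketch using `importlib.import_module`, which also works on older Pythons such as the ones pandas 1.3.5 supports. The module list is an assumption. awkward and numpy are included because uproot builds on them. `report_versions` is a hypothetical helper name.

```python
import importlib
import platform


def report_versions(modules=("uproot", "awkward", "pandas", "numpy")):
    """Return {name: version} for the given modules (None if not importable)."""
    found = {"python": platform.python_version()}
    for name in modules:
        try:
            module = importlib.import_module(name)
        except ImportError:
            found[name] = None  # not installed in this environment
        else:
            found[name] = getattr(module, "__version__", None)
    return found


for name, ver in report_versions().items():
    print(f"{name}: {ver}")
```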
