Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Faster deepcopy #237

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

martinuncountable
Copy link

Profiling my setup I noticed I was spending a lot of time in this deepcopy call.

Looking at this: https://stackoverflow.com/questions/24756712/deepcopy-is-extremely-slow#answer-29385667
Using pickle to copy objects is simple approach to getting a faster copy of objects.

I tested this and it gave a 50% faster time when dumping out one table. From 2.6-2.8s to 1.1-1.2s.

@jmafc
Copy link
Member

jmafc commented Dec 21, 2021

@martinuncountable I'm hesitant to merging this. For one thing, the SO issue appears to indicate the pickle is worse than deepcopy in Python 2, and although that is supposed to be "on the way out", the last issue reported to Pyrseas was from someone using 2.7. Second, IIRC the deepcopy was added first to fix issue #182, then @feikesteenbergen discovered that we were spending too much time deepcopying columns, when it wasn't really necessary, so I merged PR #187. Since table columns are likely the most frequently occurring object in a typical database, my gut feel is that the other deepcopy's are not that critical.
Since I've been away from Pyrseas, and not coding anything in Python for a while, I would like a second opinion, perhaps from @feikesteenbergen or @dvarrazzo?

@dvarrazzo
Copy link
Contributor

If you could do with a shallow copy, that would be better, yes. Using pickle as a deepcopy optimisation is quite a wart to see so it's great if its reason to do so could disappear 😄

Python 2.7 is not on its way out: it is totally a thing of the past and should be deeply buried, so it should be of no concern whatsoever today (a.k.a., if you maintain a Free Software project and someone asks you to support Python 2.7, it is your chance to ask for money to do so, because you don't owe it to anyone).

@jmafc
Copy link
Member

jmafc commented Dec 21, 2021

@dvarrazzo Thanks for the comments, Daniele. With all due respect, I'm on Debian 10 and the default Python is 2.7, so it's not entirely a ghost of Xmas past , yet ... And if you look at #226, you'll see a stack trace from 2.7 from 2 Nov 2021.
To make deepcopy disappear, would entail reinvestigating #182 to determine where exactly a deep copy is absolutely needed, and possibly implementing a specialized copy (the SO issue above has an example) to deal with those cases. FWIW, I believe the main problem in #182 was with foreign keys being visited before the referred table or vice versa.

@dvarrazzo
Copy link
Contributor

No problem for me, @jmafc :) it goes without saying that you are free to choose whatever deprecation policy you prefer for the projects you maintain.

@jmafc
Copy link
Member

jmafc commented Dec 22, 2021

@martinuncountable I may merge this after testing against something more representative and substantial (probably the pagila database under tests/functional) to see if the overall runtime is significantly affected or not.

@martinuncountable
Copy link
Author

@jmafc sounds good! Thanks for responding so quickly.

FYI, for dumping my full schema it dropped the time from ~2-3 mins down to 1 min.

@jmafc
Copy link
Member

jmafc commented Dec 22, 2021

@martinuncountable I'm curious now: how many tables and other objects in your full schema? In my experience, dbtoyaml never runs beyond one minute.

@martinuncountable
Copy link
Author

martinuncountable commented Dec 22, 2021

@jmafc There are ~300 tables, ~100 types and ~10 functions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants