I am building a generic query executor, and my current dilemma is getting the query results from Athena into a Spark DataFrame. Currently I am using a cursor, fetching rows into a small DataFrame, and appending that to a final one. The reason for this approach is that I run on Glue, so something like the pandas implementation wouldn't work: it would load the results into the memory of a single executor only rather than using distributed memory, so if the results are bigger than memory it fails. I have something like this:
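(Roughly along these lines; a sketch of the batched fetch-and-union approach, assuming a DB-API style pyathena cursor named `cursor` on which the query has already been executed, and a `SparkSession` named `spark` from the Glue context. `BATCH_SIZE` and the variable names are just placeholders.)

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

BATCH_SIZE = 1000  # illustrative batch size

# Column names come from the DB-API cursor description
columns = [desc[0] for desc in cursor.description]

final_df = None
while True:
    # Fetch the next batch of rows from the Athena cursor
    rows = cursor.fetchmany(BATCH_SIZE)
    if not rows:
        break
    # Build a small Spark DataFrame from this batch and append it to the result
    batch_df = spark.createDataFrame(rows, schema=columns)
    final_df = batch_df if final_df is None else final_df.union(batch_df)
```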
At this point, `.execute()` has already happened. This takes about 45 seconds for 50k rows. So my question is: is there an implementation similar to the pandas one that I can use for this? Or alternatively, do you have any suggestions to improve this?
Also, I admit that one of the reasons I'm taking this approach is to preserve the data types coming from the cursor, as opposed to reading the results from the results CSV. I find it annoying to have to deal with that separately. :)