
Conversation

@marijnvanderhorst
Member

@marijnvanderhorst marijnvanderhorst commented Jan 31, 2023

Proposed changes

Recall that, when querying transactions, the dataset service returns a paginated response; at most 10,000 transactions can be queried at a time (which we call one page). Currently, the SDK abstracts this away by requesting all pages one by one until all transactions have been received.

However, in some cases it can be useful to start processing a page of transactions while waiting for the next one to be returned. This PR adds that functionality via a callback parameter on the get_transactions function, which is called after each page of transactions is received.
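The callback-driven pagination described above can be sketched as follows. This is a minimal, self-contained illustration, not the SDK implementation: `fetch_page` is a hypothetical stand-in for one paginated API request, and the page size is shrunk for readability. The `page_cb(page, page_nr, last_page)` signature matches the manual-testing example below.

```python
import asyncio

PAGE_SIZE = 3  # small pages for illustration; the real service uses 10,000


async def fetch_page(page_nr):
    """Hypothetical stand-in for one paginated API request."""
    await asyncio.sleep(0)  # simulate network latency
    return [f'tx{page_nr}-{i}' for i in range(PAGE_SIZE)]


async def get_transactions(last_page, page_cb=None):
    """Fetch all pages; invoke page_cb after each page is received."""
    transactions = []
    for page_nr in range(1, last_page + 1):
        page = await fetch_page(page_nr)
        if page_cb is not None:
            # The caller can start working on this page before the
            # remaining pages have arrived.
            await page_cb(page, page_nr, last_page)
        transactions.extend(page)
    return transactions


async def demo():
    seen = []

    async def cb(page, page_nr, last_page):
        seen.append((page_nr, len(page)))

    all_tx = await get_transactions(3, page_cb=cb)
    return seen, len(all_tx)


seen, total = asyncio.run(demo())
print(seen, total)  # callback fired once per page; all pages collected
```

The key property is that the callback fires as each page arrives, instead of only after the full result set has been assembled.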

Types of changes

What types of changes does your code introduce to this repository?

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

Testing

Explain here how to run your unit tests, and explain how to execute manual testing.

Unit testing

n/a

Manual testing

Install the snapshot version using

pip install pricecypher-sdk==0.6.0rc0.dev0

Then run, for instance, something like the following. It should print each page number and its size (always 10,000) as the pages come in.

import logging
import asyncio
from pricecypher import Datasets

logging.basicConfig(level=logging.DEBUG)

BEARER_TOKEN = '...'  # replace with a valid bearer token


async def handle_page(page, page_nr, last_page):
    print(f'Handling page {page_nr}/{last_page}')
    print(f'Nr of transactions in page: {len(page)}')
    print(f'Handling page {page_nr}/{last_page} done')


async def main():
    async with Datasets(BEARER_TOKEN) as ds:
        # Specify desired columns for the transactions dataframe
        columns = [
            {'representation': 'net_sales', 'key': 'net_sales'},
            {'representation': 'cost_price', 'key': 'cost_price'}
        ]

        index = asyncio.create_task(ds.index())
        transactions = asyncio.create_task(ds.get_transactions(58, False, columns, page_cb=handle_page))
        meta = asyncio.create_task(ds.get_meta(58))
        summary = asyncio.create_task(ds.get_transaction_summary(58))

        # Wait for the scopes to be returned, so we can use one to request its scope values.
        scopes = await asyncio.create_task(ds.get_scopes(58))

        print('scopes', scopes)

        values = asyncio.create_task(ds.get_scope_values(58, scopes.where_type('enum')[0].id))

        print('datasets', await index)
        print('trans', await transactions)
        print('meta', await meta)
        print('summary', await summary)
        print('scope values', await values)

asyncio.run(main())

Further comments

Uses Python's built-in asyncio to handle concurrency. This allows us to process a page of transactions while waiting for the API to return the next page.
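The fetch/process overlap this enables can be sketched with plain asyncio primitives. This is an illustrative pattern under assumed helper names (`fetch`, `process`), not the SDK's code: the next page's request is started with `asyncio.create_task` before the current page is processed, so the network wait and the processing run concurrently.

```python
import asyncio


async def fetch(page_nr):
    """Hypothetical network call returning one page."""
    await asyncio.sleep(0.01)  # simulated request latency
    return [page_nr] * 2


async def process(page, log):
    """Hypothetical per-page processing step."""
    await asyncio.sleep(0.01)  # simulated work
    log.append(page)


async def fetch_all(last_page):
    log = []
    next_task = asyncio.create_task(fetch(1))
    for page_nr in range(1, last_page + 1):
        page = await next_task
        if page_nr < last_page:
            # Kick off the next request before processing this page,
            # so fetching and processing overlap.
            next_task = asyncio.create_task(fetch(page_nr + 1))
        await process(page, log)
    return log


log = asyncio.run(fetch_all(3))
print(log)  # pages processed in order while later fetches were in flight
```

With sequential fetching, total time would be roughly fetch + process per page; with this overlap, it approaches max(fetch, process) per page.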

levykort1
levykort1 previously approved these changes Jan 31, 2023
sonalif
sonalif previously approved these changes Jan 31, 2023
JorisMarcelis
JorisMarcelis previously approved these changes Feb 1, 2023
levykort1
levykort1 previously approved these changes Feb 1, 2023
Contributor

@EmielSchmeink EmielSchmeink left a comment


Tested locally and seems to work.

@Eline2020 Eline2020 removed their request for review August 4, 2023 10:42
@marijnvanderhorst
Member Author

Status update:

This is no longer needed at the moment. It was only used by Data Quality scripts when running-time limits were very tight (Azure Functions). Since we have switched to Argo Workflows, those limits no longer apply.

If this is picked up later, we need to think carefully about how to integrate async into the current SDK without breaking existing contracts, i.e. while keeping backward compatibility.
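One possible shape for that backward-compatible integration, sketched with hypothetical names (`get_transactions_async` is not an existing SDK function): keep the public synchronous signature and have it drive an async core to completion, so existing callers are unaffected while async-aware callers can await the core directly.

```python
import asyncio


async def get_transactions_async(dataset_id):
    """Hypothetical async core; async-aware callers await this directly."""
    await asyncio.sleep(0)  # placeholder for paginated API requests
    return [f'tx-{dataset_id}-{i}' for i in range(3)]


def get_transactions(dataset_id):
    """Existing sync contract preserved: runs the async core to completion."""
    return asyncio.run(get_transactions_async(dataset_id))


result = get_transactions(58)
print(result)
```

A caveat with this approach: `asyncio.run` fails if the caller is already inside a running event loop, so the sync wrapper would need to detect that case (or document it as unsupported).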

@marijnvanderhorst marijnvanderhorst marked this pull request as draft September 29, 2023 10:13