
Conversation

@marijnvanderhorst
Member

@marijnvanderhorst marijnvanderhorst commented Jan 31, 2023

Proposed changes

Recall that, when querying transactions, the dataset service returns a paginated response; at most 10,000 transactions can be queried at a time (which we call one page). Currently, the SDK abstracts this away by requesting all pages one by one until all transactions have been received.

However, in some cases it can be useful to start processing a page of transactions while waiting for the next one to be returned. This PR adds that functionality via a callback parameter on the get_transactions function, which is called after each page of transactions is received.
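The callback-driven pagination described above can be sketched as follows. This is a minimal, self-contained illustration, not the SDK implementation: `fetch_page` is a hypothetical stand-in for one paginated API request, and the page size is shrunk for readability. The `page_cb(page, page_nr, last_page)` signature matches the manual-testing example below.

```python
import asyncio

PAGE_SIZE = 3  # small pages for illustration; the real service uses 10,000


async def fetch_page(page_nr):
    """Hypothetical stand-in for one paginated API request."""
    await asyncio.sleep(0)  # simulate network latency
    return [f'tx{page_nr}-{i}' for i in range(PAGE_SIZE)]


async def get_transactions(last_page, page_cb=None):
    """Fetch all pages; invoke page_cb after each page is received."""
    transactions = []
    for page_nr in range(1, last_page + 1):
        page = await fetch_page(page_nr)
        if page_cb is not None:
            # The caller can start working on this page before the
            # remaining pages have arrived.
            await page_cb(page, page_nr, last_page)
        transactions.extend(page)
    return transactions


async def demo():
    seen = []

    async def cb(page, page_nr, last_page):
        seen.append((page_nr, len(page)))

    all_tx = await get_transactions(3, page_cb=cb)
    return seen, len(all_tx)


seen, total = asyncio.run(demo())
print(seen, total)  # callback fired once per page; all pages collected
```

The key property is that the callback fires as each page arrives, instead of only after the full result set has been assembled.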

Types of changes

What types of changes does your code introduce to this repository?

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

Testing

Explain here how to run your unit tests, and explain how to execute manual testing.

Unit testing

n/a

Manual testing

Install the snapshot version using

pip install pricecypher-sdk==0.6.0rc0.dev0

Then run, for instance, something like the following. It should print each page number and its size (always 10,000) as the pages come in.

import logging
import asyncio
from pricecypher import Datasets

logging.basicConfig(level=logging.DEBUG)

BEARER_TOKEN = '...'  # replace with a valid bearer token


async def handle_page(page, page_nr, last_page):
    print(f'Handling page {page_nr}/{last_page}')
    print(f'Nr of transactions in page: {len(page)}')
    print(f'Handling page {page_nr}/{last_page} done')


async def main():
    async with Datasets(BEARER_TOKEN) as ds:
        # Specify desired columns for the transactions dataframe
        columns = [
            {'representation': 'net_sales', 'key': 'net_sales'},
            {'representation': 'cost_price', 'key': 'cost_price'}
        ]

        index = asyncio.create_task(ds.index())
        transactions = asyncio.create_task(ds.get_transactions(58, False, columns, page_cb=handle_page))
        meta = asyncio.create_task(ds.get_meta(58))
        summary = asyncio.create_task(ds.get_transaction_summary(58))

        # Wait for the scopes to be returned, so we can use one to request its scope values.
        scopes = await asyncio.create_task(ds.get_scopes(58))

        print('scopes', scopes)

        values = asyncio.create_task(ds.get_scope_values(58, scopes.where_type('enum')[0].id))

        print('datasets', await index)
        print('trans', await transactions)
        print('meta', await meta)
        print('summary', await summary)
        print('scope values', await values)

asyncio.run(main())

Further comments

Uses Python's built-in asyncio to handle concurrency. This allows us to process a page of transactions while waiting for the API to return the next page.
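The fetch/process overlap this enables can be sketched with plain asyncio primitives. This is an illustrative pattern under assumed helper names (`fetch`, `process`), not the SDK's code: the next page's request is started with `asyncio.create_task` before the current page is processed, so the network wait and the processing run concurrently.

```python
import asyncio


async def fetch(page_nr):
    """Hypothetical network call returning one page."""
    await asyncio.sleep(0.01)  # simulated request latency
    return [page_nr] * 2


async def process(page, log):
    """Hypothetical per-page processing step."""
    await asyncio.sleep(0.01)  # simulated work
    log.append(page)


async def fetch_all(last_page):
    log = []
    next_task = asyncio.create_task(fetch(1))
    for page_nr in range(1, last_page + 1):
        page = await next_task
        if page_nr < last_page:
            # Kick off the next request before processing this page,
            # so fetching and processing overlap.
            next_task = asyncio.create_task(fetch(page_nr + 1))
        await process(page, log)
    return log


log = asyncio.run(fetch_all(3))
print(log)  # pages processed in order while later fetches were in flight
```

With sequential fetching, total time would be roughly fetch + process per page; with this overlap, it approaches max(fetch, process) per page.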

levykort1
levykort1 previously approved these changes Jan 31, 2023
sonalif
sonalif previously approved these changes Jan 31, 2023
JorisMarcelis
JorisMarcelis previously approved these changes Feb 1, 2023
levykort1
levykort1 previously approved these changes Feb 1, 2023
Contributor

@EmielSchmeink EmielSchmeink left a comment


Tested locally and seems to work.

@Eline2020 Eline2020 removed their request for review August 4, 2023 10:42
@marijnvanderhorst
Member Author

Status update:

This is no longer needed at the moment. It was only used by Data Quality scripts when running-time limits were very tight (Azure Functions). Since we have switched to Argo Workflows, those limits no longer apply.

If this is picked up later, we need to think carefully about how to integrate async into the current SDK without breaking existing contracts, i.e. while keeping backward compatibility.
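One possible shape for that backward-compatible integration, sketched with hypothetical names (`get_transactions_async` is not an existing SDK function): keep the public synchronous signature and have it drive an async core to completion, so existing callers are unaffected while async-aware callers can await the core directly.

```python
import asyncio


async def get_transactions_async(dataset_id):
    """Hypothetical async core; async-aware callers await this directly."""
    await asyncio.sleep(0)  # placeholder for paginated API requests
    return [f'tx-{dataset_id}-{i}' for i in range(3)]


def get_transactions(dataset_id):
    """Existing sync contract preserved: runs the async core to completion."""
    return asyncio.run(get_transactions_async(dataset_id))


result = get_transactions(58)
print(result)
```

A caveat with this approach: `asyncio.run` fails if the caller is already inside a running event loop, so the sync wrapper would need to detect that case (or document it as unsupported).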

@marijnvanderhorst marijnvanderhorst marked this pull request as draft September 29, 2023 10:13