Develop workaround for pubmed API #201

Open
10 tasks
tubamos opened this issue Jun 18, 2024 · 0 comments
Labels
  • data pipeline: Items that are related to the scrapers of the data pipeline
  • external API: Items related to accessing and utilising 3rd-party APIs

Domain

data pipeline

Description

Develop a strategy and write a workaround for the PubMed API's limitation that caps free users' results at 10K. The solution might involve segmenting API calls by publication year to avoid hitting the cap.
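
A minimal sketch of the year-segmentation idea, assuming the pipeline talks to the NCBI E-utilities ESearch endpoint and filters by publication date (`datetype=pdat`). The function name `yearly_esearch_urls` and its arguments are hypothetical and not part of the existing scrapers. The code only builds the request URLs and does not send them.

```python
from urllib.parse import urlencode

# Public E-utilities ESearch endpoint (assumed to be the one the pipeline uses).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def yearly_esearch_urls(term, first_year, last_year, retmax=10000):
    """Return one ESearch URL per publication year, oldest first.

    Hypothetical helper: each URL restricts the same search term to a
    single publication year, so each request has its own result set.
    """
    urls = []
    for year in range(first_year, last_year + 1):
        params = {
            "db": "pubmed",
            "term": term,
            "datetype": "pdat",    # filter on publication date
            "mindate": str(year),
            "maxdate": str(year),
            "retmax": retmax,
            "retmode": "json",
        }
        urls.append(f"{ESEARCH_URL}?{urlencode(params)}")
    return urls
```

For example, `yearly_esearch_urls("diabetes", 2020, 2022)` yields three URLs, one each for 2020, 2021 and 2022. A single year can still exceed the cap for broad search terms, so this on its own may not be sufficient.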

User Story

  • As a user,
  • I want as much of the latest medical research information available as possible,
  • to populate the custom context that will improve the responses I get from querying the LLMs.

Acceptance Criteria

  • The system can handle PubMed API queries without hitting the 10K limit by segmenting searches by year or other criteria.
  • The implementation includes a method for dynamically adjusting query parameters to stay within the API limits.
  • Testing confirms that the workaround allows for complete data retrieval from PubMed without errors related to the limits.
  • The solution is integrated into the existing data acquisition pipeline and works correctly in tandem with the rest of the scrapers.
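
One possible way to meet the "dynamically adjusting query parameters" criterion is to bisect a date window until every sub-window reports a hit count at or below the cap. The sketch below assumes a hypothetical callable `count_fn(start, end)` that returns the number of PubMed hits for a date range (for instance by reading the count from a search response); that callable, the function name `split_until_under_cap`, and the `PUBMED_CAP` constant are illustrative and do not exist in the pipeline.

```python
from datetime import date, timedelta

PUBMED_CAP = 10_000  # the 10K limit described in this issue


def split_until_under_cap(start, end, count_fn, cap=PUBMED_CAP):
    """Return (start, end) date windows whose hit counts are each <= cap.

    count_fn is a hypothetical callable: count_fn(start, end) -> int.
    A single-day window is returned as-is even if it is still over the
    cap, because it cannot be split further by date alone.
    """
    if start == end or count_fn(start, end) <= cap:
        return [(start, end)]
    # Bisect the window; the second half starts the day after the midpoint.
    mid = start + (end - start) // 2
    return (
        split_until_under_cap(start, mid, count_fn, cap)
        + split_until_under_cap(mid + timedelta(days=1), end, count_fn, cap)
    )
```

The returned windows are contiguous and non-overlapping, so fetching each one in turn covers the original range exactly once. Injecting `count_fn` keeps the splitting logic testable without network access.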

Definition of Done

  • The feature has been fully implemented.
  • The feature has been manually tested and works as expected without critical bugs.
  • The feature code is documented with clear explanations of its functionality and usage.
  • The feature code has been reviewed and approved by at least one team member.
  • The feature branches have been merged into the main branch and closed.
  • The feature's utility, function and usage have been documented in the respective project wiki on GitHub.

tubamos converted this from a draft issue Jun 18, 2024
tubamos added the external API and data pipeline labels Jun 18, 2024
tubamos added this to the Part A: Data acquisition milestone Jun 18, 2024
tubamos added and removed the sprint-09 label Jun 19, 2024
Projects
Status: Product Backlog
Development

No branches or pull requests

1 participant