[Feature: New model] API Generation #6

Merged 7 commits on Jan 18, 2024

@chucheria (Collaborator) commented Jan 15, 2024

Introducing Two New Code Generation Models:

API Generation Models

This PR adds two models designed to streamline code generation: the API Generator and the API Formatter.

1. API Generator:

  • A Supervised model that uses a few-shot technique to generate a REST API in YAML format. It draws on the instructions and examples provided during training, and users can add specific instructions that are later included in the API creation prompt. Predictions are made from a user query that describes the intended goal of the API.

2. API Formatter:

  • An Unsupervised model that takes a YAML string as input and corrects it according to the provided instructions. Unlike the earlier models, it does not rely on a vectorstore. It accepts user instructions, and its prediction is an improved version of the input API.
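
As a rough illustration of the difference between the two models, the sketch below assembles a prompt with and without examples. The function name `build_prompt`, the `Input:`/`Output:` labels, and the sample strings are assumptions made for this example; they are not taken from the promptmeteo code.

```python
from typing import Sequence


def build_prompt(
    instructions: str,
    query: str,
    examples: Sequence[tuple[str, str]] = (),
) -> str:
    """Assemble a prompt; with no examples it reduces to a zero-shot prompt."""
    parts = [instructions.strip()]
    # Each example is rendered as an input/output pair ahead of the real query.
    for example_input, example_output in examples:
        parts.append(f"Input: {example_input}\nOutput:\n{example_output.strip()}")
    # The final block leaves the output open for the model to complete.
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)


# Generator-style call (hypothetical data): examples are injected, few-shot.
few_shot = build_prompt(
    "Write a REST API definition in YAML.",
    "An API to list books",
    examples=[("An API to list users", "paths:\n  /users:\n    get: {}")],
)

# Formatter-style call (hypothetical data): only instructions and input YAML.
zero_shot = build_prompt(
    "Rename every path to kebab-case and return only YAML.",
    "paths:\n  /userList:\n    get: {}",
)
```

In a real setup the examples would typically be retrieved from a store of training samples rather than passed in by hand, which is the part the Formatter does without.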

Both models build on the idea that the information contained in examples should be used for effective code generation. Both were tested with GPT-3.5, and a new parser is introduced to ensure the output is exclusively in YAML format.

Noteworthy Changes:

  • A dedicated parser restricts the output to YAML.
  • The codebase has been adjusted to integrate these models. The API Formatter differs from the others in that it does not need a vectorstore.
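
A minimal sketch of what restricting a response to YAML could look like, assuming the model sometimes wraps its answer in a fenced block or prefixes it with a sentence. `extract_yaml` and its heuristics are hypothetical; this is not the parser added in promptmeteo/parsers/api_parser.py.

```python
import re

# Built programmatically so the marker does not appear literally in this block.
FENCE = "`" * 3
_FENCED_BLOCK = re.compile(
    FENCE + r"(?:ya?ml)?[ \t]*\n(.*?)" + FENCE,
    re.DOTALL | re.IGNORECASE,
)
_TOP_LEVEL_KEY = re.compile(r"^[A-Za-z_][\w-]*:(\s|$)")


def extract_yaml(response: str) -> str:
    """Return only the YAML part of a model response (heuristic)."""
    # Prefer the first fenced block if the model produced one.
    match = _FENCED_BLOCK.search(response)
    if match:
        return match.group(1).strip()
    # Otherwise drop leading prose until a line that looks like `key:`.
    lines = response.strip().splitlines()
    for index, line in enumerate(lines):
        if _TOP_LEVEL_KEY.match(line):
            return "\n".join(lines[index:]).strip()
    raise ValueError("no YAML content found in response")
```

The unfenced fallback only trims leading prose, so trailing commentary would survive it; validating the result with a YAML library would be a natural next step.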

Note that test_prompt_spelling currently fails for the API Formatter because its prompt does not contain the {__EXAMPLES__} label.
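
One way to make such a requirement explicit per model is to check each template against its own set of required labels. The sketch below is an assumption about how that could be done, not the actual test_prompt_spelling logic; only `{__EXAMPLES__}` comes from this PR, while `{__INSTRUCTION__}`, `{__QUERY__}`, and both template strings are made up for illustration.

```python
import re

# Matches labels of the form {__NAME__} and captures __NAME__.
PLACEHOLDER = re.compile(r"\{(__[A-Z_]+__)\}")


def missing_placeholders(template: str, required: set[str]) -> set[str]:
    """Return the required placeholder names that the template lacks."""
    found = set(PLACEHOLDER.findall(template))
    return required - found


# Hypothetical templates: the formatter one has no examples section.
generator_template = "{__INSTRUCTION__}\n{__EXAMPLES__}\n{__QUERY__}"
formatter_template = "{__INSTRUCTION__}\n{__QUERY__}"

# Requiring {__EXAMPLES__} everywhere flags the formatter template...
flagged = missing_placeholders(formatter_template, {"__EXAMPLES__"})
# ...while a per-model requirement set lets it pass.
accepted = missing_placeholders(formatter_template, {"__INSTRUCTION__", "__QUERY__"})
```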

Model Deprecation: Transition to gpt-3.5-turbo-instruct

Because OpenAI has deprecated the instruct model text-davinci, we have updated our model to use gpt-3.5-turbo-instruct. The transition includes the corresponding modifications, including updated examples. Please review the results, as they may differ from those of the previous model.

@chucheria chucheria marked this pull request as draft January 15, 2024 15:26
chucheria and others added 2 commits January 15, 2024 16:59
- Change in base.py from prompts, formatting the prompt
- Change in test_prompts adding a new symbol to delete
@chucheria chucheria marked this pull request as ready for review January 16, 2024 10:49
@codecov-commenter commented Jan 16, 2024

Codecov Report

Attention: 151 lines in your changes are missing coverage. Please review.

Comparison is base (1df8463) 93.74% compared to head (dd49458) 83.68%.
Report is 1 commits behind head on main.

| Files | Patch % | Lines |
|---|---|---|
| promptmeteo/api_formatter.py | 24.13% | 88 Missing ⚠️ |
| promptmeteo/api_generator.py | 27.69% | 47 Missing ⚠️ |
| promptmeteo/parsers/api_parser.py | 30.76% | 9 Missing ⚠️ |
| promptmeteo/tasks/task.py | 81.25% | 3 Missing ⚠️ |
| promptmeteo/models/base.py | 33.33% | 2 Missing ⚠️ |
| promptmeteo/models/google_vertexai.py | 66.66% | 2 Missing ⚠️ |


Additional details and impacted files
@@             Coverage Diff             @@
##             main       #6       +/-   ##
===========================================
- Coverage   93.74%   83.68%   -10.07%     
===========================================
  Files          35       38        +3     
  Lines        1103     1330      +227     
===========================================
+ Hits         1034     1113       +79     
- Misses         69      217      +148     
| Flag | Coverage Δ |
|---|---|
| pytest | 83.68% <47.20%> (-10.07%) ⬇️ |
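
The figures in this report can be cross-checked with a few lines of arithmetic. The sketch below only uses numbers quoted in the report; the helper name `pct` is an assumption, and the last step assumes the patch percentage is covered lines divided by all changed lines.

```python
# Missing lines per file, as listed in the report.
missing_by_file = {
    "promptmeteo/api_formatter.py": 88,
    "promptmeteo/api_generator.py": 47,
    "promptmeteo/parsers/api_parser.py": 9,
    "promptmeteo/tasks/task.py": 3,
    "promptmeteo/models/base.py": 2,
    "promptmeteo/models/google_vertexai.py": 2,
}
total_missing = sum(missing_by_file.values())


def pct(hits: int, lines: int) -> float:
    """Coverage as a percentage rounded to two decimals."""
    return round(100 * hits / lines, 2)


base_coverage = pct(1034, 1103)   # main
head_coverage = pct(1113, 1330)   # this PR
base_misses = 1103 - 1034
head_misses = 1330 - 1113

# Inference, not a reported figure: if 151 missing lines correspond to a
# 47.20% patch coverage, the patch touches roughly this many lines.
approx_patch_lines = round(total_missing / (1 - 0.4720))
```

Most of the drop comes from the two new model files, which together account for 135 of the uncovered lines.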


@ehooo (Collaborator) left a comment

Thank you very much for the contribution, this is a huge piece of work.
I've left you some comments to improve a few things I noticed.
Thanks a lot for collaborating.

promptmeteo/api_formatter.py (outdated, resolved)
promptmeteo/api_formatter.py (outdated, resolved)
promptmeteo/api_formatter.py (outdated, resolved)
promptmeteo/api_formatter.py (outdated, resolved)
promptmeteo/api_formatter.py (outdated, resolved)
promptmeteo/api_generator.py (outdated, resolved)
promptmeteo/api_generator.py (resolved)
promptmeteo/api_generator.py (outdated, resolved)
promptmeteo/base.py (outdated, resolved)
promptmeteo/api_generator.py (outdated, resolved)
@chucheria chucheria merged commit 8ceaf0d into main Jan 18, 2024
9 checks passed
@chucheria chucheria deleted the feature/api-generation branch January 18, 2024 15:10
chucheria added a commit that referenced this pull request Jan 31, 2024
* Add new models: OpenAI GPT3.5-Turbo and Azure OpenAI. Azure OpenAI allows for the embeddings model to be from a different endpoint

* Parser for the API generation and correction response

* add models, prompts and add to tests

* Changes:
- Change in base.py from prompts, formatting the prompt
- Change in test_prompts adding a new symbol to delete

* add data for examples

---------

Co-authored-by: Miguel Lopez <[email protected]>
chucheria added a commit that referenced this pull request Feb 2, 2024
* Add new models: OpenAI GPT3.5-Turbo and Azure OpenAI. Azure OpenAI allows for the embeddings model to be from a different endpoint

* Parser for the API generation and correction response

* add models, prompts and add to tests

* Changes:
- Change in base.py from prompts, formatting the prompt
- Change in test_prompts adding a new symbol to delete

* add data for examples

---------

Co-authored-by: Miguel Lopez <[email protected]>
chucheria added a commit that referenced this pull request Mar 1, 2024
* initial commit

* [Feature: New model] API Generation (#6)

* Add new models: OpenAI GPT3.5-Turbo and Azure OpenAI. Azure OpenAI allows for the embeddings model to be from a different endpoint

* Parser for the API generation and correction response

* add models, prompts and add to tests

* Changes:
- Change in base.py from prompts, formatting the prompt
- Change in test_prompts adding a new symbol to delete

* add data for examples

---------

Co-authored-by: Miguel Lopez <[email protected]>

* task_builder.py:
- New task added.

New prompts added:
- json-summarizer in spanish and english

json_summarizer.py:
- New model class for the JSON summarization task. TODO: Fix the example and delete commented code

__init__.py:
- Added the new model in the init script

promptmeteo/parsers/__init__.py:
- New option for new type of task (json summarization).
- TODO: We must change it to use a new parser for JSON treatment.

* JSON parser added

* Changes:
prompts. New prompt for Anthropic Claude added and prompts for GPT-3.5 models renamed.

pyproject.toml. Added boto3 in the requirements to connect with aws.

__init__.py. Name changing of the task

json_info_extraction.py:
- New file replacing json_summarizer.py with the new naming

/models/__init__.py:
- New model provider added for Bedrock

/models/bedrock.py:
- Class for Bedrock models.
- Only anthropic claude v2 integrated
- For now, HuggingFace embeddings are used

/parsers/__init__.py:
- Name changing of the parser

/tasks/task_builder.py:
- Name changing and summarization task included for new development.

* __init__.py:
- Added Summarizer class

json_info_extraction.py:
- Changes in the example and code comments.
- json_fields parameter removed because it is implicit in fields_description

summarizer.py:
- New class for Summarization task

parsers/__init__.py:
- New parser added for summarization task (dummy parser)

new prompts for anthropic claude  summarization and minor changes in prompts

tests/tools/dictionary_checker:
- 'sample' word added to the Spanish dictionary because it is a keyword used for injection.

* Minor changes in example documentation

* Removing comments

* Minor change:
- Change for region selection

* Changes:
- test_models.py: Added model Bedrock for unit test

- test_parsers.py: Added unit test for json parser.

- json_parser.py: Change in description

* Merge branch 'main' into integracion-summary

* Changes:
json_info_extraction.py:
- Adapted to new Base model building structure

summarizer.py:
- Adapted to new Base model building structure.

pyproject.toml:
- boto3 added

* prompts added and readme

* Changes:
prompts/base.py:
- Change to allow the 'domain' field in the prompt to be filled even if there is no 'prompt domain' param (same for prompt detail)
- Removal of JSON info extraction task

* Changes in vocabulary in prompts and minor bug in prompts/base

* Minor changes in prompt/base, new words added to spanish dictionary and .prompts modification

* api_formatter no changes

* Changes:
- Added a kwargs option to the BedrockLLM model so that different arguments can be passed to the boto3 client
- Changed the region argument in the model tests

* minor error in test model

* minor

---------

Co-authored-by: Angel Delgado Panadero <[email protected]>
Co-authored-by: Bea <[email protected]>
Co-authored-by: Miguel Lopez <[email protected]>
Co-authored-by: Bea <[email protected]>
3 participants