Once you have an error uploading a dataset, your account (web and API) gets corrupted and the Dataset/Model pages no longer work #632

Open
littlejohn-ai opened this issue Jan 10, 2025 · 0 comments


littlejohn-ai commented Jan 10, 2025

Using your example with your CSV file:

import cohere

co = cohere.Client()

# upload a dataset
my_dataset = co.datasets.create(
    name="datasettest",
    data=open("./Arts.Class.1000.csv", "rb"),
    type="single-label-classification-finetune-input",
)

# wait for validation to complete
response = co.wait(my_dataset)

print(response)

This is working fine.

But when you try to create a huge dataset (still well under the 5 GB limit stated in your documentation, but a large one), the dataset create/upload call crashes. From then on, the Datasets page stops working, the Datasets API stops working, and no Model page works either, since your web frontend loads datasets when it loads models (I can see the API call in the developer console's Network tab).

I have a JSONL file of about 800 MB (roughly 180,000 lines).
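
For context, the call is essentially the same as the CSV example above, just pointed at the JSONL file. This is a rough sketch of what I run (the file name and dataset name are placeholders from my local setup, and I am reusing the same dataset type as the example; yours may differ):

import cohere

co = cohere.Client()

# same pattern as the working CSV example above, but with the ~800 MB JSONL file
my_dataset = co.datasets.create(
    name="bigdatasettest",  # placeholder name
    data=open("./my_large_dataset.jsonl", "rb"),  # ~180,000 lines of JSONL
    type="single-label-classification-finetune-input",  # same type as the example
)

# wait for validation to complete -- this is the call where the error below is raised
response = co.wait(my_dataset)

print(response)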
Each time I try to upload it (each time with a new account; I have tested with many new accounts), the Python client gives me this error after a few minutes:

...
Traceback (most recent call last):
  File "import.py", line 13, in <module>
    response = co.wait(my_dataset)
               ^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/cohere/utils.py", line 108, in wait
    job = get_job(cohere, awaitable)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/cohere/utils.py", line 46, in get_job
    return cohere.datasets.get(id=get_id(awaitable))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/cohere/datasets/client.py", line 727, in get
    raise TooManyRequestsError(
cohere.errors.too_many_requests_error.TooManyRequestsError: status_code: 429, body: {'message': 'grpc: received message larger than max (24069399 vs. 4194304)'}

(For what it's worth, 4194304 bytes is gRPC's default 4 MiB maximum message size, so some internal message of roughly 24 MB is exceeding it on your side.)

And from this point on:

  1. The Datasets web page no longer works ... it just shows, in red, "Error loading datasets. Please try again later."
  2. The Datasets API no longer works ... nothing uploads anymore, even your Arts.Class.1000.csv example or a dataset of just 2 lines ... it does not work ANYMORE (see the sketch after this list).
  3. No Model page on your website works ANYMORE ... because, as you know, the frontend loads datasets through your API.
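
To rule out the frontend, even a plain list call fails on an affected account. A minimal sketch of the check I use (the only Cohere call here is the documented datasets.list):

import cohere

co = cohere.Client()

# On an affected account even listing datasets fails,
# so it is not just the web frontend that is broken.
try:
    print(co.datasets.list())
except Exception as e:
    print("Datasets API is still broken:", e)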

Something must be happening in the Cohere backend such that, after a big dataset upload crashes, the account becomes useless (since everything depends on the Datasets API). Model inference still works, but browsing datasets or models does not, whether through the API or the website.

Once you hit that error trying to upload a very large dataset (whether through the web or the API), the account is completely FROZEN: it no longer lets you add datasets, view datasets, or view models (web or API).

It's a BIG BUG!!!

Not only does this bug need to be fixed, but please also explain how to upload an 800 MB / 180,000-line JSONL dataset for fine-tuning (neither the web frontend nor the API handles it right now), given that your documentation says the limit is 5 GB.
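
In the meantime, the only workaround I can think of is splitting the JSONL into smaller files and uploading each piece as its own dataset. Below is a rough sketch of the splitting side (plain Python, nothing Cohere-specific; the 20,000-line chunk size is an arbitrary guess, not a documented limit, and I do not know whether your fine-tuning flow can combine several datasets):

# split a large JSONL file into smaller chunks so each individual upload stays small
CHUNK_LINES = 20_000

with open("./my_large_dataset.jsonl", "r", encoding="utf-8") as src:
    part, lines = 0, []
    for line in src:
        lines.append(line)
        if len(lines) >= CHUNK_LINES:
            with open(f"./my_large_dataset.part{part}.jsonl", "w", encoding="utf-8") as out:
                out.writelines(lines)
            part, lines = part + 1, []
    if lines:  # write the final partial chunk
        with open(f"./my_large_dataset.part{part}.jsonl", "w", encoding="utf-8") as out:
            out.writelines(lines)

Each resulting part could then be passed to co.datasets.create() exactly like the examples above, but some guidance on the intended way to handle files of this size would be appreciated.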
