Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add model artifact and registry info to huggingface examples #451

Draft
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

parambharat
Copy link
Contributor

This PR updates the HuggingFace colab examples to include model registry and artifact info and turns on model checkpoint logging be default.

@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@github-actions
Copy link

github-actions bot commented Aug 18, 2023

Thanks for contributing to wandb/examples!
We appreciate your efforts in opening a PR for the examples repository. Our goal is to ensure a smooth and enjoyable experience for you 😎.

Guidelines

The examples repo is regularly tested against the ever-evolving ML stack. To facilitate our work, please adhere to the following guidelines:

  • Notebook naming: You can use a combination of snake_case and CamelCase for your notebook name. Avoid using spaces (replace them with _) and special characters (&%$?). For example:
Cool_Keras_integration_example_with_weights_and_biases.ipynb 

is acceptable, but

Cool Keras Example with W&B.ipynb

is not. Avoid spaces and the & character. To refer to W&B, you can use: weights_and_biases or just wandb (it's our library, after all!)

  • Managing dependencies within the notebook: You may need to set up dependencies to ensure that your code works. Please avoid the following practices:

    • Docker-related activities. If Docker installation is required, consider adding a full example with the corresponding Dockerfile to the wandb/examples/examples folder (where non-Colab examples reside).
    • Using pip install as the primary method to install packages. When calling pip in a cell, avoid performing other tasks. We automatically filter these types of cells, and executing other actions might break the automatic testing of the notebooks. For example,
    pip install -qU wandb transformers gpt4
    

    is acceptable, but

    pip install -qU wandb
    import wandb

    is not.

    • Installing packages from a GitHub branch. Although it's acceptable 😎 to directly obtain the latest bleeding-edge libraries from GitHub, did you know that you can install them like this:
    !pip install -q git+https://github.com/huggingface/transformers

    You don't need to clone, then cd into the repo and install it in editable mode.

    • Avoid referencing specific Colab directories. Google Colab has a /content directory where everything resides. Avoid explicitly referencing this directory because we test our notebooks with pure Jupyter (without Colab). Instead, use relative paths to make the notebook reproducible.
  • The Jupyter notebook file .ipynb is nothing more than a JSON file with primarily two types of cells: markdown and code. There is also a bunch of other metadata specific to Google Colab. We have a set of tools to ensure proper notebook formatting. These tools can be found at wandb/nb_helpers.

Before merging, wait for a maintainer to clean and format the notebooks you're adding. You can tag @tcapelle.

Before marking the PR as ready for review, please run your notebook one more time. Restart the Colab and run all. We will provide you with links to open the Colabs below

The following colabs were changed
-colabs/huggingface/Huggingface_wandb.ipynb
-colabs/huggingface/Optimize_Hugging_Face_models_with_Weights_&_Biases.ipynb
-colabs/huggingface/Visualize_your_Hugging_Face_data_with_Weights_&_Biases.ipynb

"\n",
"The Model Registry is a central page that lives above individual W&B projects. It houses **Registered Models**, portfolios that store \"links\" to the valuable checkpoints living in individual W&B Projects.\n",
"\n",
"The model registry offers a centralized place to house the best checkpoints for all your model tasks. Any `model` artifact you log can be \"linked\" to a Registered Model. Please refer to the following [tutorial](https://colab.research.google.com/github/wandb/examples/blob/master/colabs/wandb-model-registry/Model_Registry_E2E.ipynb) for more information related to the model registry\n",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should link the Model Registry docs rather than jumping straight into the colab? https://docs.wandb.ai/guides/models:

Pleaser refer to [this](https://docs.wandb.ai/guides/models) guide where you can find a [walkthrough](https://docs.wandb.ai/guides/models/walkthrough) of the Model Registry to learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated as suggested.

"\n",
"### 📈 Track key information effortlessly by default\n",
"You will also be able to view your model checkpoints in the W&B [Artifacts](https://docs.wandb.ai/guides/artifacts) page of the run above. These artifacts store all the metadata associated with this model, the W&B runs consuming it, and the whole lineage of upstream and downstream artifacts!\n",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(see an example model checkpoint and its in the UI here)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated as suggested

"### 📈 Track key information effortlessly by default\n",
"You will also be able to view your model checkpoints in the W&B [Artifacts](https://docs.wandb.ai/guides/artifacts) page of the run above. These artifacts store all the metadata associated with this model, the W&B runs consuming it, and the whole lineage of upstream and downstream artifacts!\n",
"\n",
"Here's an example of the artifacts generated from the above run\n",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change text to

This image shows a summary of all the checkpoints (artifact) generated from the above run.

We should embed the image rather than link to gdrive, but ideally can we just link to this view (if project is public) so people can see output summary in the UI?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added link to workspace artifacts page instead of image.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants