-
Notifications
You must be signed in to change notification settings - Fork 291
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adds new artifacts colab. #526
Conversation
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
Thanks for contributing to GuidelinesThe examples repo is regularly tested against the ever-evolving ML stack. To facilitate our work, please adhere to the following guidelines:
is acceptable, but
is not. Avoid spaces and the
Before marking the PR as ready for review, please run your notebook one more time. Restart the Colab and run all. We will provide you with links to open the Colabs belowThe following colabs were changed |
Hey @katjacksonWB - some feedback from reviewing the whole thing:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
left a comment with some requested changes!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
left another round of comments!
Ping me when ready for final review/merge |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why we don't show the classic:
# you can log using the one-liner:
wandb.log_artifact("file.csv", name="my_artifact", type="data")
# or
at = Artifact(name="my_artifact", type="data")
at.add_file("file.csv")
# add_dir(...)
wandb.log_artifact(at)
- Shouldn't this live in
wandb-artifacts
? - Please also replace the "old outdated one"
is referring to a specific line? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
last request on lineage
"source": [ | ||
"You can also manage your Artifacts via the W&B platform. This can give you insight into your model's performance or dataset versioning. To navigate to the relevant information, click this [link](https://wandb.ai/wandb/artifact-basics/overview), then click on the **Artifacts** tab.\n", | ||
"\n", | ||
"Navigating to the **Lineage** section in the tab will show the dependency graph formed by calling `run.use_artifact()` when an Artifact is an input to a run, and `run.log_artifact()` when an Artifact is output to a run. This helps visualize the relationship between different model versions and other objects like datasets and jobs in your project. Click [this](https://wandb.ai/wandb/artifact-basics/artifacts/dataset/my_first_artifact/v0/lineage) link to navigate to the project's lineage page." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you make sure we include a screenshot of a more complex lineage for a user to explore, and also link the relevant project (probably the artifact workflow project)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeap, I am missing some screenshots. Add those or from the docs or upload them as files in the same folder.
" inplace=True)\n", | ||
"csvData.to_csv(\"/content/sample_data/california_housing_test.csv\") # overwrites file with the sorted data\n", | ||
"# adds the new file to the artifact\n", | ||
"run = wandb.init(project=\"artifact-basics\")\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think would be better here to init the run at the start of the code block.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- maybe split in 2 cells.
csv_data
instead ofcsvData
.
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"Now the sorted file will be logged in `my_first_artifact`. Any changes you log to an artifact will overwrite any older version. \n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm the wording of overwriting old versions may be confusing to users (line 228 and 236) as we don't really overwrite the old version, we create a new version instead. Overwriting to me implies it replaces the previous.
"artifact = run.use_artifact(artifact_or_name=\"my_first_artifact:latest\")\n", | ||
"# This will download the specified artifact to where your code is running\n", | ||
"datadir = artifact.download()\n", | ||
"run.finish()\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest moving the run.finish() to the end of the block as it is usually better practice (e.g., in this case, it would capture what is printed out)
"source": [ | ||
"You can also manage your Artifacts via the W&B platform. This can give you insight into your model's performance or dataset versioning. To navigate to the relevant information, click this [link](https://wandb.ai/wandb/artifact-basics/overview), then click on the **Artifacts** tab.\n", | ||
"\n", | ||
"Navigating to the **Lineage** section in the tab will show the dependency graph formed by calling `run.use_artifact()` when an Artifact is an input to a run, and `run.log_artifact()` when an Artifact is output to a run. This helps visualize the relationship between different model versions and other objects like datasets and jobs in your project. Click [this](https://wandb.ai/wandb/artifact-basics/artifacts/dataset/my_first_artifact/v0/lineage) link to navigate to the project's lineage page." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the following wording here may be a little confusing to some users:
"run.log_artifact()
when an Artifact is output to a run"
I think "to a run" should be "of a run"?
I would also use this PR to remove/replace old stuff in wandb-artifacts (and put this file in there as a getting started) |
"outputs": [], | ||
"source": [ | ||
"!pip install wandb\n", | ||
"import wandb\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
split pip install from import wandb
please
"The general workflow for creating an Artifact is:\n", | ||
"\n", | ||
"\n", | ||
"1. Intialize a run.\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just feel like we are not following with code this plan.
" inplace=True)\n", | ||
"csvData.to_csv(\"/content/sample_data/california_housing_test.csv\") # overwrites file with the sorted data\n", | ||
"# adds the new file to the artifact\n", | ||
"run = wandb.init(project=\"artifact-basics\")\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- maybe split in 2 cells.
csv_data
instead ofcsvData
.
"source": [ | ||
"run = wandb.init(project=\"artifact-basics\")\n", | ||
"artifact = run.use_artifact(artifact_or_name=\"my_first_artifact:latest\")\n", | ||
"# This will download the specified artifact to where your code is running\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add blanck line before the comment for readability
"source": [ | ||
"You can also manage your Artifacts via the W&B platform. This can give you insight into your model's performance or dataset versioning. To navigate to the relevant information, click this [link](https://wandb.ai/wandb/artifact-basics/overview), then click on the **Artifacts** tab.\n", | ||
"\n", | ||
"Navigating to the **Lineage** section in the tab will show the dependency graph formed by calling `run.use_artifact()` when an Artifact is an input to a run, and `run.log_artifact()` when an Artifact is output to a run. This helps visualize the relationship between different model versions and other objects like datasets and jobs in your project. Click [this](https://wandb.ai/wandb/artifact-basics/artifacts/dataset/my_first_artifact/v0/lineage) link to navigate to the project's lineage page." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeap, I am missing some screenshots. Add those or from the docs or upload them as files in the same folder.
Hey @rymc - would you be up to revising the changes you proposed directly? Katherine is out on medical leave so I am trying to get some support with wrapping up her in-flight docs PR so we can get the new and improved Artifacts colab out. If it is easy enough to make those fixes directly that would be a huge help so I can focus on some of the other docs work. |
Hey @noaleetz done. Addressed comments and confirmed working on Colab. |
Ryan you are awesome, thank you so so much. I will give the colab a final run myself, but we should be good to merge. @ngrayluna can I ask you to give your review and stamp as well? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Blocker: Notebook needs to be executable
"outputs": [], | ||
"source": [ | ||
"run = wandb.init(project=\"artifact-basics\")\n", | ||
"run.log_artifact(artifact_or_path=\"/content/sample_data/mnist_test.csv\", name=\"my_first_artifact\", type=\"dataset\")\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Notebooks need to be executable...we'll want to use a real dataset before merging this in.
Ah, good point @ngrayluna. I've pushed a new version that makes the Colab notebook executable regardless of where it runs. |
PR for small nits: #548 |
can you make both wandbcode consistent? |
Not sure I follow? |
Adds a new Artifacts colab to replace the old, outdated one linked on the Artifacts landing page.