Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Kaggle Notebook | Is it Chipa | Version 6
Showing 1 changed file with 1 addition and 0 deletions.
@@ -0,0 +1 @@
{"metadata":{"kernelspec":{"language":"python","display_name":"Python 3","name":"python3"},"language_info":{"pygments_lexer":"ipython3","nbconvert_exporter":"python","version":"3.6.4","file_extension":".py","codemirror_mode":{"name":"ipython","version":3},"name":"python","mimetype":"text/x-python"}},"nbformat_minor":4,"nbformat":4,"cells":[
{"source":"<a href=\"https://www.kaggle.com/code/fernandoleguizamon/is-it-chipa?scriptVersionId=143397402\" target=\"_blank\"><img align=\"left\" alt=\"Kaggle\" title=\"Open in Kaggle\" src=\"https://kaggle.com/static/images/open-in-kaggle.svg\"></a>","metadata":{},"cell_type":"markdown"},
{"cell_type":"markdown","source":"## Is it Chipa?","metadata":{}},
{"cell_type":"code","source":"#NB: Kaggle requires phone verification to use the internet or a GPU. If you haven't done that yet, this cell will fail\n# This code is only here to check that your internet is enabled. It doesn't do anything else.\n# Here's a help thread on getting your phone number verified: https://www.kaggle.com/product-feedback/135367\n\nimport socket,warnings\ntry:\n    socket.setdefaulttimeout(1)\n    socket.socket(socket.AF_INET, socket.SOCK_STREAM).connect(('1.1.1.1', 53))\nexcept socket.error as ex: raise Exception(\"STOP: No internet. Click '>|' in top right and set 'Internet' switch to on\")","metadata":{"execution":{"iopub.status.busy":"2023-09-18T10:41:20.355347Z","iopub.execute_input":"2023-09-18T10:41:20.355648Z","iopub.status.idle":"2023-09-18T10:41:20.363478Z","shell.execute_reply.started":"2023-09-18T10:41:20.355616Z","shell.execute_reply":"2023-09-18T10:41:20.362479Z"},"trusted":true},"execution_count":null,"outputs":[]},
{"cell_type":"code","source":"# It's a good idea to ensure you're running the latest version of any libraries you need.\n# `!pip install -Uqq <libraries>` upgrades to the latest version of <libraries>\n# NB: You can safely ignore any warnings or errors pip spits out about running as root or incompatibilities\nimport os\niskaggle = os.environ.get('KAGGLE_KERNEL_RUN_TYPE', '')\n\nif iskaggle:\n    !pip install -Uqq fastai duckduckgo_search","metadata":{"_kg_hide-input":true,"_kg_hide-output":true,"execution":{"iopub.status.busy":"2023-09-18T10:41:22.73596Z","iopub.execute_input":"2023-09-18T10:41:22.736252Z","iopub.status.idle":"2023-09-18T10:41:34.795882Z","shell.execute_reply.started":"2023-09-18T10:41:22.73622Z","shell.execute_reply":"2023-09-18T10:41:34.794875Z"},"trusted":true},"execution_count":null,"outputs":[]},
{"cell_type":"markdown","source":"## Step 1: Download images of Chipa and Donuts","metadata":{}},
{"cell_type":"code","source":"from duckduckgo_search import ddg_images\nfrom fastcore.all import *\n\ndef search_images(term, max_images=30):\n    print(f\"Searching for '{term}'\")\n    return L(ddg_images(term, max_results=max_images)).itemgot('image')","metadata":{"_kg_hide-input":true,"execution":{"iopub.status.busy":"2023-09-18T05:25:20.488275Z","iopub.execute_input":"2023-09-18T05:25:20.489027Z","iopub.status.idle":"2023-09-18T05:25:20.493996Z","shell.execute_reply.started":"2023-09-18T05:25:20.488987Z","shell.execute_reply":"2023-09-18T05:25:20.493121Z"},"trusted":true},"execution_count":null,"outputs":[]},
{"cell_type":"markdown","source":"Let's start by searching for a chipa photo and seeing what kind of result we get. We'll start by getting URLs from a search:","metadata":{}},
{"cell_type":"code","source":"#NB: `search_images` depends on duckduckgo.com, which doesn't always return correct responses.\n# If you get a JSON error, just try running it again (it may take a couple of tries).\nurls = search_images('chipa paraguay', max_images=1)\nurls[0]","metadata":{"execution":{"iopub.status.busy":"2023-09-18T05:36:54.354178Z","iopub.execute_input":"2023-09-18T05:36:54.354964Z","iopub.status.idle":"2023-09-18T05:36:54.893032Z","shell.execute_reply.started":"2023-09-18T05:36:54.354924Z","shell.execute_reply":"2023-09-18T05:36:54.892012Z"},"trusted":true},"execution_count":null,"outputs":[]},
{"cell_type":"markdown","source":"...and then download a URL and take a look at it:","metadata":{}},
{"cell_type":"code","source":"from fastdownload import download_url\ndest = 'chipa.jpg'\ndownload_url(urls[0], dest, show_progress=False)\n\nfrom fastai.vision.all import *\nim = Image.open(dest)\nim.to_thumb(256,256)","metadata":{"execution":{"iopub.status.busy":"2023-09-18T05:37:30.713945Z","iopub.execute_input":"2023-09-18T05:37:30.714245Z","iopub.status.idle":"2023-09-18T05:37:31.104984Z","shell.execute_reply.started":"2023-09-18T05:37:30.714204Z","shell.execute_reply":"2023-09-18T05:37:31.104155Z"},"trusted":true},"execution_count":null,"outputs":[]},
{"cell_type":"markdown","source":"Now let's do the same with \"donut photos\":","metadata":{}},
{"cell_type":"code","source":"download_url(search_images('donut photos', max_images=1)[0], 'donut.jpg', show_progress=False)\nImage.open('donut.jpg').to_thumb(256,256)","metadata":{"execution":{"iopub.status.busy":"2023-09-18T05:38:02.976397Z","iopub.execute_input":"2023-09-18T05:38:02.97673Z","iopub.status.idle":"2023-09-18T05:38:03.684898Z","shell.execute_reply.started":"2023-09-18T05:38:02.976699Z","shell.execute_reply":"2023-09-18T05:38:03.684041Z"},"trusted":true},"execution_count":null,"outputs":[]},
{"cell_type":"markdown","source":"Our searches seem to be giving reasonable results, so let's grab a few examples of each of \"chipa\" and \"donut\" photos, and save each group of photos to a different folder (I'm also trying to grab a range of lighting conditions here):","metadata":{}},
{"cell_type":"code","source":"searches = 'chipa','donut'\npath = Path('chipa_or_not')\nfrom time import sleep\n\nfor o in searches:\n    dest = (path/o)\n    dest.mkdir(exist_ok=True, parents=True)\n    download_images(dest, urls=search_images(f'{o} photo'))\n    sleep(10)  # Pause between searches to avoid over-loading server\n    download_images(dest, urls=search_images(f'{o} dark photo'))\n    sleep(10)\n    download_images(dest, urls=search_images(f'{o} light photo'))\n    sleep(10)\n    resize_images(path/o, max_size=400, dest=path/o)","metadata":{"execution":{"iopub.status.busy":"2023-09-18T05:38:52.428458Z","iopub.execute_input":"2023-09-18T05:38:52.429015Z","iopub.status.idle":"2023-09-18T05:40:29.15418Z","shell.execute_reply.started":"2023-09-18T05:38:52.428974Z","shell.execute_reply":"2023-09-18T05:40:29.153055Z"},"trusted":true},"execution_count":null,"outputs":[]},
{"cell_type":"markdown","source":"## Step 2: Train our model","metadata":{}},
{"cell_type":"markdown","source":"Some photos might not download correctly, which could cause our model training to fail, so we'll remove them:","metadata":{}},
{"cell_type":"code","source":"failed = verify_images(get_image_files(path))\nfailed.map(Path.unlink)\nlen(failed)","metadata":{"execution":{"iopub.status.busy":"2023-09-18T05:41:13.801461Z","iopub.execute_input":"2023-09-18T05:41:13.801772Z","iopub.status.idle":"2023-09-18T05:41:14.362894Z","shell.execute_reply.started":"2023-09-18T05:41:13.801739Z","shell.execute_reply":"2023-09-18T05:41:14.362008Z"},"trusted":true},"execution_count":null,"outputs":[]},
{"cell_type":"markdown","source":"To train a model, we'll need `DataLoaders`, which is an object that contains a *training set* (the images used to create a model) and a *validation set* (the images used to check the accuracy of a model -- not used during training). In `fastai` we can create that easily using a `DataBlock`, and view sample images from it:","metadata":{}},
{"cell_type":"code","source":"dls = DataBlock(\n    blocks=(ImageBlock, CategoryBlock),\n    get_items=get_image_files,\n    splitter=RandomSplitter(valid_pct=0.2, seed=42),\n    get_y=parent_label,\n    item_tfms=[Resize(192, method='squish')]\n).dataloaders(path, bs=32)\n\ndls.show_batch(max_n=6)","metadata":{"execution":{"iopub.status.busy":"2023-09-18T05:48:00.192221Z","iopub.execute_input":"2023-09-18T05:48:00.192872Z","iopub.status.idle":"2023-09-18T05:48:00.851899Z","shell.execute_reply.started":"2023-09-18T05:48:00.192831Z","shell.execute_reply":"2023-09-18T05:48:00.85127Z"},"trusted":true},"execution_count":null,"outputs":[]},
{"cell_type":"markdown","source":"Here's what each of the `DataBlock` parameters means:\n\n    blocks=(ImageBlock, CategoryBlock),\n\nThe inputs to our model are images, and the outputs are categories (in this case, \"chipa\" or \"donut\").\n\n    get_items=get_image_files,\n\nTo find all the inputs to our model, run the `get_image_files` function (which returns a list of all image files in a path).\n\n    splitter=RandomSplitter(valid_pct=0.2, seed=42),\n\nSplit the data into training and validation sets randomly, using 20% of the data for the validation set. The fixed `seed` means the same split is produced every time the notebook runs.\n\n    get_y=parent_label,\n\nThe label (`y` value) of each file is the name of its parent folder, which will be *chipa* or *donut*.\n\n    item_tfms=[Resize(192, method='squish')]\n\nBefore training, resize each image to 192x192 pixels by \"squishing\" it (as opposed to cropping it).","metadata":{}},
{"cell_type":"markdown","source":"Now we're ready to train our model. One of the fastest widely used computer vision models is `resnet18`. You can train it in a few minutes, even on a CPU! (On a GPU, it generally takes under 10 seconds...)\n\n`fastai` comes with a helpful `fine_tune()` method which automatically applies best practices for fine-tuning a pre-trained model, so we'll use that. `fine_tune(10)` first trains only the newly added head for one epoch, then unfreezes the whole network and trains it for 10 more epochs.","metadata":{}},
{"cell_type":"code","source":"learn = vision_learner(dls, resnet18, metrics=error_rate)\nlearn.fine_tune(10)","metadata":{"execution":{"iopub.status.busy":"2023-09-18T05:47:04.108884Z","iopub.execute_input":"2023-09-18T05:47:04.109486Z","iopub.status.idle":"2023-09-18T05:47:17.545921Z","shell.execute_reply.started":"2023-09-18T05:47:04.109445Z","shell.execute_reply":"2023-09-18T05:47:17.545036Z"},"trusted":true},"execution_count":null,"outputs":[]},
{"cell_type":"markdown","source":"## Step 3: Use our model","metadata":{}},
{"cell_type":"markdown","source":"Let's see what our model thinks about the chipa photo we downloaded at the start:","metadata":{}},
{"cell_type":"code","source":"pred,_,probs = learn.predict(PILImage.create('chipa.jpg'))\nprint(f\"This is a: {pred}.\")\n# The category vocab is sorted alphabetically (['chipa', 'donut']), so index 0 is chipa\nprint(f\"Probability it's chipa: {probs[0]:.4f}\")","metadata":{"execution":{"iopub.status.busy":"2023-09-18T05:47:22.502388Z","iopub.execute_input":"2023-09-18T05:47:22.503207Z","iopub.status.idle":"2023-09-18T05:47:22.726324Z","shell.execute_reply.started":"2023-09-18T05:47:22.50316Z","shell.execute_reply":"2023-09-18T05:47:22.72544Z"},"trusted":true},"execution_count":null,"outputs":[]}]}
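The `get_y=parent_label` and `splitter=RandomSplitter(valid_pct=0.2, seed=42)` settings in the notebook can be illustrated without fastai. The sketch below is a plain-Python approximation under stated assumptions and is not fastai's implementation. `parent_label_sketch` and `random_split_sketch` are hypothetical helper names, and the file list is made up to mirror the `chipa_or_not/<label>/` folders. The split shuffles with Python's `random` module rather than PyTorch, so the indices it picks will not match fastai's split for the same seed. The sketch also assumes the category vocabulary is sorted, which is why `probs[0]` in the prediction cell would refer to chipa.

```python
import random
from pathlib import PurePosixPath


def parent_label_sketch(path):
    """Label a file with the name of the folder that contains it (the idea behind parent_label)."""
    return PurePosixPath(path).parent.name


def random_split_sketch(n_items, valid_pct=0.2, seed=42):
    """Return (train_idx, valid_idx) after a seeded shuffle, holding out valid_pct of the items.

    Same idea as RandomSplitter, but it uses Python's `random` module, so the
    chosen indices differ from fastai's for the same seed.
    """
    idxs = list(range(n_items))
    random.Random(seed).shuffle(idxs)  # a fixed seed gives the same split on every run
    cut = int(valid_pct * n_items)     # number of validation items
    return idxs[cut:], idxs[:cut]


# Hypothetical file list mirroring the chipa_or_not/<label>/ folder layout
files = (
    [f"chipa_or_not/chipa/{i}.jpg" for i in range(5)]
    + [f"chipa_or_not/donut/{i}.jpg" for i in range(5)]
)

labels = [parent_label_sketch(f) for f in files]
train_idx, valid_idx = random_split_sketch(len(files), valid_pct=0.2, seed=42)

# Assumption: a sorted vocabulary, so 'chipa' maps to index 0 and 'donut' to index 1
vocab = sorted(set(labels))
label_to_idx = {label: i for i, label in enumerate(vocab)}

print(len(train_idx), len(valid_idx))  # 8 2
print(vocab)                           # ['chipa', 'donut']
print(label_to_idx["chipa"])           # 0
```

Calling `random_split_sketch` again with `seed=42` returns exactly the same indices. That is why a fixed seed keeps the validation set stable across reruns. With 10 files and `valid_pct=0.2`, 2 files are held out for validation.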