Kaggle Notebook | Is it Chipa | Version 6

Gords · Sep 18, 2023 · 92ee3b5
1 parent 51a545e
commit 92ee3b5
Showing 1 changed file with 1 addition and 0 deletions.
diff --git a/is-it-chipa.ipynb b/is-it-chipa.ipynb
@@ -0,0 +1 @@
+{"metadata":{"kernelspec":{"language":"python","display_name":"Python 3","name":"python3"},"language_info":{"pygments_lexer":"ipython3","nbconvert_exporter":"python","version":"3.6.4","file_extension":".py","codemirror_mode":{"name":"ipython","version":3},"name":"python","mimetype":"text/x-python"}},"nbformat_minor":4,"nbformat":4,"cells":[{"source":"<a href=\"https://www.kaggle.com/code/fernandoleguizamon/is-it-chipa?scriptVersionId=143397402\" target=\"_blank\"><img align=\"left\" alt=\"Kaggle\" title=\"Open in Kaggle\" src=\"https://kaggle.com/static/images/open-in-kaggle.svg\"></a>","metadata":{},"cell_type":"markdown"},{"cell_type":"markdown","source":"## Is it Chipa?","metadata":{}},{"cell_type":"code","source":"#NB: Kaggle requires phone verification to use the internet or a GPU. If you haven't done that yet, the cell below will fail\n#    This code is only here to check that your internet is enabled. It doesn't do anything else.\n#    Here's a help thread on getting your phone number verified: https://www.kaggle.com/product-feedback/135367\n\nimport socket,warnings\ntry:\n    socket.setdefaulttimeout(1)\n    socket.socket(socket.AF_INET, socket.SOCK_STREAM).connect(('1.1.1.1', 53))\nexcept socket.error as ex: raise Exception(\"STOP: No internet. 
Click '>|' in top right and set 'Internet' switch to on\")","metadata":{"execution":{"iopub.status.busy":"2023-09-18T10:41:20.355347Z","iopub.execute_input":"2023-09-18T10:41:20.355648Z","iopub.status.idle":"2023-09-18T10:41:20.363478Z","shell.execute_reply.started":"2023-09-18T10:41:20.355616Z","shell.execute_reply":"2023-09-18T10:41:20.362479Z"},"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"code","source":"# It's a good idea to ensure you're running the latest version of any libraries you need.\n# `!pip install -Uqq <libraries>` upgrades to the latest version of <libraries>\n# NB: You can safely ignore any warnings or errors pip spits out about running as root or incompatibilities\nimport os\niskaggle = os.environ.get('KAGGLE_KERNEL_RUN_TYPE', '')\n\nif iskaggle:\n    !pip install -Uqq fastai duckduckgo_search","metadata":{"_kg_hide-input":true,"_kg_hide-output":true,"execution":{"iopub.status.busy":"2023-09-18T10:41:22.73596Z","iopub.execute_input":"2023-09-18T10:41:22.736252Z","iopub.status.idle":"2023-09-18T10:41:34.795882Z","shell.execute_reply.started":"2023-09-18T10:41:22.73622Z","shell.execute_reply":"2023-09-18T10:41:34.794875Z"},"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":"## Step 1: Download images of Chipa and Donuts","metadata":{}},{"cell_type":"code","source":"from duckduckgo_search import ddg_images\nfrom fastcore.all import *\n\ndef search_images(term, max_images=30):\n    print(f\"Searching for '{term}'\")\n    return L(ddg_images(term, max_results=max_images)).itemgot('image')","metadata":{"_kg_hide-input":true,"execution":{"iopub.status.busy":"2023-09-18T05:25:20.488275Z","iopub.execute_input":"2023-09-18T05:25:20.489027Z","iopub.status.idle":"2023-09-18T05:25:20.493996Z","shell.execute_reply.started":"2023-09-18T05:25:20.488987Z","shell.execute_reply":"2023-09-18T05:25:20.493121Z"},"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":"Let's start by 
searching for a chipa photo and seeing what kind of result we get. We'll start by getting URLs from a search:","metadata":{}},{"cell_type":"code","source":"#NB: `search_images` depends on duckduckgo.com, which doesn't always return correct responses.\n#    If you get a JSON error, just try running it again (it may take a couple of tries).\nurls = search_images('chipa paraguay', max_images=1)\nurls[0]","metadata":{"execution":{"iopub.status.busy":"2023-09-18T05:36:54.354178Z","iopub.execute_input":"2023-09-18T05:36:54.354964Z","iopub.status.idle":"2023-09-18T05:36:54.893032Z","shell.execute_reply.started":"2023-09-18T05:36:54.354924Z","shell.execute_reply":"2023-09-18T05:36:54.892012Z"},"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":"...and then download a URL and take a look at it:","metadata":{}},{"cell_type":"code","source":"from fastdownload import download_url\ndest = 'chipa.jpg'\ndownload_url(urls[0], dest, show_progress=False)\n\nfrom fastai.vision.all import *\nim = Image.open(dest)\nim.to_thumb(256,256)","metadata":{"execution":{"iopub.status.busy":"2023-09-18T05:37:30.713945Z","iopub.execute_input":"2023-09-18T05:37:30.714245Z","iopub.status.idle":"2023-09-18T05:37:31.104984Z","shell.execute_reply.started":"2023-09-18T05:37:30.714204Z","shell.execute_reply":"2023-09-18T05:37:31.104155Z"},"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":"Now let's do the same with \"donut photos\":","metadata":{}},{"cell_type":"code","source":"download_url(search_images('donut photos', max_images=1)[0], 'donut.jpg', 
show_progress=False)\nImage.open('donut.jpg').to_thumb(256,256)","metadata":{"execution":{"iopub.status.busy":"2023-09-18T05:38:02.976397Z","iopub.execute_input":"2023-09-18T05:38:02.97673Z","iopub.status.idle":"2023-09-18T05:38:03.684898Z","shell.execute_reply.started":"2023-09-18T05:38:02.976699Z","shell.execute_reply":"2023-09-18T05:38:03.684041Z"},"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":"Our searches seem to be giving reasonable results, so let's grab a few examples of each of \"chipa\" and \"donut\" photos, and save each group of photos to a different folder (I'm also trying to grab a range of lighting conditions here):","metadata":{}},{"cell_type":"code","source":"searches = 'chipa','donut'\npath = Path('chipa_or_not')\nfrom time import sleep\n\nfor o in searches:\n    dest = (path/o)\n    dest.mkdir(exist_ok=True, parents=True)\n    download_images(dest, urls=search_images(f'{o} photo'))\n    sleep(10)  # Pause between searches to avoid over-loading server\n    download_images(dest, urls=search_images(f'{o} dark photo'))\n    sleep(10)\n    download_images(dest, urls=search_images(f'{o} light photo'))\n    sleep(10)\n    resize_images(path/o, max_size=400, dest=path/o)","metadata":{"execution":{"iopub.status.busy":"2023-09-18T05:38:52.428458Z","iopub.execute_input":"2023-09-18T05:38:52.429015Z","iopub.status.idle":"2023-09-18T05:40:29.15418Z","shell.execute_reply.started":"2023-09-18T05:38:52.428974Z","shell.execute_reply":"2023-09-18T05:40:29.153055Z"},"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":"## Step 2: Train our model","metadata":{}},{"cell_type":"markdown","source":"Some photos might not download correctly which could cause our model training to fail, so we'll remove them:","metadata":{}},{"cell_type":"code","source":"failed = 
verify_images(get_image_files(path))\nfailed.map(Path.unlink)\nlen(failed)","metadata":{"execution":{"iopub.status.busy":"2023-09-18T05:41:13.801461Z","iopub.execute_input":"2023-09-18T05:41:13.801772Z","iopub.status.idle":"2023-09-18T05:41:14.362894Z","shell.execute_reply.started":"2023-09-18T05:41:13.801739Z","shell.execute_reply":"2023-09-18T05:41:14.362008Z"},"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":"To train a model, we'll need `DataLoaders`, which is an object that contains a *training set* (the images used to create a model) and a *validation set* (the images used to check the accuracy of a model -- not used during training). In `fastai` we can create that easily using a `DataBlock`, and view sample images from it:","metadata":{}},{"cell_type":"code","source":"dls = DataBlock(\n    blocks=(ImageBlock, CategoryBlock), \n    get_items=get_image_files, \n    splitter=RandomSplitter(valid_pct=0.2, seed=42),\n    get_y=parent_label,\n    item_tfms=[Resize(192, method='squish')]\n).dataloaders(path, bs=32)\n\ndls.show_batch(max_n=6)","metadata":{"execution":{"iopub.status.busy":"2023-09-18T05:48:00.192221Z","iopub.execute_input":"2023-09-18T05:48:00.192872Z","iopub.status.idle":"2023-09-18T05:48:00.851899Z","shell.execute_reply.started":"2023-09-18T05:48:00.192831Z","shell.execute_reply":"2023-09-18T05:48:00.85127Z"},"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":"Here's what each of the `DataBlock` parameters means:\n\n    blocks=(ImageBlock, CategoryBlock),\n\nThe inputs to our model are images, and the outputs are categories (in this case, \"chipa\" or \"donut\").\n\n    get_items=get_image_files, \n\nTo find all the inputs to our model, run the `get_image_files` function (which returns a list of all image files in a path).\n\n    splitter=RandomSplitter(valid_pct=0.2, seed=42),\n\nSplit the data into training and validation sets randomly, using 20% of the data for the validation 
set. The fixed `seed=42` makes the split reproducible across runs.\n\n    get_y=parent_label,\n\nThe label (`y` value) of each file is the name of its `parent` (i.e. the name of the folder it's in, which will be *chipa* or *donut*).\n\n    item_tfms=[Resize(192, method='squish')]\n\nBefore training, resize each image to 192x192 pixels by \"squishing\" it (as opposed to cropping it).","metadata":{}},{"cell_type":"markdown","source":"Now we're ready to train our model. One of the fastest widely used computer vision models is `resnet18`. You can train this in a few minutes, even on a CPU! (On a GPU, it generally takes under 10 seconds...)\n\n`fastai` comes with a helpful `fine_tune()` method which automatically uses best practices for fine-tuning a pre-trained model, so we'll use that.","metadata":{}},{"cell_type":"code","source":"learn = vision_learner(dls, resnet18, metrics=error_rate)\nlearn.fine_tune(10)","metadata":{"execution":{"iopub.status.busy":"2023-09-18T05:47:04.108884Z","iopub.execute_input":"2023-09-18T05:47:04.109486Z","iopub.status.idle":"2023-09-18T05:47:17.545921Z","shell.execute_reply.started":"2023-09-18T05:47:04.109445Z","shell.execute_reply":"2023-09-18T05:47:17.545036Z"},"trusted":true},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":"## Step 3: Use our model","metadata":{}},{"cell_type":"markdown","source":"Let's see what our model thinks about that chipa we downloaded at the start:","metadata":{}},{"cell_type":"code","source":"is_chipa,_,probs = learn.predict(PILImage.create('chipa.jpg'))\nprint(f\"This is a: {is_chipa}.\")\n# `probs` follows the order of `dls.vocab` (sorted: chipa, donut), so index 0 is chipa\nprint(f\"Probability this is a chipa: {probs[0]:.4f}\")","metadata":{"execution":{"iopub.status.busy":"2023-09-18T05:47:22.502388Z","iopub.execute_input":"2023-09-18T05:47:22.503207Z","iopub.status.idle":"2023-09-18T05:47:22.726324Z","shell.execute_reply.started":"2023-09-18T05:47:22.50316Z","shell.execute_reply":"2023-09-18T05:47:22.72544Z"},"trusted":true},"execution_count":null,"outputs":[]}]}