Rework notebooks to use the static self-hosted fake job board #350

Open · wants to merge 4 commits into master
1,735 changes: 22 additions & 1,713 deletions build-a-web-scraper/01_inspect.ipynb

Large diffs are not rendered by default.

59 changes: 25 additions & 34 deletions build-a-web-scraper/02_scrape.ipynb

Large diffs are not rendered by default.

2,121 changes: 57 additions & 2,064 deletions build-a-web-scraper/03_parse.ipynb

Large diffs are not rendered by default.

34 changes: 24 additions & 10 deletions build-a-web-scraper/04_pipeline.ipynb
@@ -12,15 +12,28 @@
"- Target & Save Specific Information You Want"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## ⚠️ Durabilty Warning ⚠️\n",
"\n",
"Like [mentioned in the course](https://realpython.com/lessons/challenge-of-durability/), websites frequently change. Unfortunately the job board that you'll see in the course, indeed.com, has started to block scraping of their site since the recording of the course.\n",
"\n",
"Just like in the associated written tutorial on [web scraping with beautiful soup](https://realpython.com/beautiful-soup-web-scraper-python/#scrape-the-fake-python-job-site), you can instead use [Real Python's fake jobs site](https://realpython.github.io/fake-jobs/) to practice scraping a static website.\n",
"\n",
"All the concepts discussed in the course lessons are still accurate. Translating what you see onto a different website will be a good learning opportunity where you'll have to synthesize the information and apply it practically."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Your Tasks:\n",
"\n",
"- Scrape the first 100 available search results\n",
"- Scrape all 100 available job postings\n",
"- Generalize your code to allow searching for different locations/jobs\n",
"- Pick out information about the URL, job title, and job location\n",
"- Pick out information about the apply URL, job title, and job location\n",
"- Save the results to a file"
]
},
@@ -40,8 +53,7 @@
"source": [
"### Part 1: Inspect\n",
"\n",
"- How do the URLs change when you navigate to the next results page?\n",
"- How do the URLs change when you use a different location and/or job title search?\n",
"- How do the URLs change when you navigate to a job detail?\n",
"- Which HTML elements contain the link, title, and location of each job?"
]
},
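To make the inspection tasks concrete, here's a minimal sketch (not from the original notebook) that fetches the fake jobs page and prints the first job card so you can see which elements hold the title, location, and apply link. The `card-content` class name is an assumption based on the associated written tutorial's description of the site's markup:

```python
import requests
from bs4 import BeautifulSoup

URL = "https://realpython.github.io/fake-jobs/"
response = requests.get(URL)
soup = BeautifulSoup(response.content, "html.parser")

# Print the first job card's HTML to inspect its structure.
# The "card-content" class name is taken from the written
# tutorial and assumes the site's current markup hasn't changed.
first_card = soup.find("div", class_="card-content")
print(first_card.prettify())
```

Comparing the prettified output against your browser's developer tools view is a good way to confirm which tags and classes to target.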
@@ -58,8 +70,9 @@
"source": [
"### Part 2: Scrape\n",
"\n",
"- Build the code to fetch the first 100 search results. This means you will need to automatically navigate to multiple results pages\n",
"- Write functions that allow you to specify the job title, location, and amount of results as arguments"
"- Build the code to fetch all 100 available job postings.\n",
"- Write functions that allow you to specify the job title, location, and amount of results as arguments\n",
"- Also fetch the information provided on each job details page. For this, you'll need to automatically follow URLs that you've fetched when getting the job postings."
]
},
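For the scraping tasks above, a minimal sketch, assuming the fake jobs site, where all 100 postings live on a single static page and each card's "Apply" link is an absolute URL pointing to that job's detail page:

```python
import requests
from bs4 import BeautifulSoup

URL = "https://realpython.github.io/fake-jobs/"
soup = BeautifulSoup(requests.get(URL).content, "html.parser")

# Collect the "Apply" links; on the fake jobs site these are
# assumed to be absolute URLs to each job's detail page.
apply_urls = [
    link["href"] for link in soup.find_all("a") if link.text.strip() == "Apply"
]

# Follow each collected URL to fetch the detail pages.
detail_pages = {url: requests.get(url).content for url in apply_urls}
print(f"Fetched {len(detail_pages)} job detail pages")
```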
{
@@ -75,8 +88,9 @@
"source": [
"### Part 3: Parse\n",
"\n",
"- Sieve through your HTML soup to pick out only the job title, link, and location\n",
"- Format the results in a readable format (e.g. JSON)\n",
"- Sieve through your HTML soup to pick out only the job title, link, and location from the main page\n",
"- Sieve through the HTML of each details page to get the job description and combine it with the other information\n",
"- Format the results in a readable format (e.g. JSON, TXT, TOML, ...)\n",
"- Save the results to a file"
]
},
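For the parsing tasks above, a minimal sketch; the element names (`card-content`, `title`, `location`) and the exact "Apply" link text are assumptions based on the written tutorial:

```python
import json

import requests
from bs4 import BeautifulSoup

URL = "https://realpython.github.io/fake-jobs/"
soup = BeautifulSoup(requests.get(URL).content, "html.parser")

jobs = []
for card in soup.find_all("div", class_="card-content"):
    jobs.append(
        {
            "title": card.find("h2", class_="title").text.strip(),
            "location": card.find("p", class_="location").text.strip(),
            # The "Apply" link is assumed to sit in the card's footer,
            # which follows the card content in document order.
            "url": card.find_next("a", string="Apply")["href"],
        }
    )

# Save the results in a readable format (JSON).
with open("jobs.json", "w", encoding="utf-8") as file:
    json.dump(jobs, file, indent=2)
```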
@@ -90,7 +104,7 @@
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
@@ -104,7 +118,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.0"
"version": "3.11.0"
}
},
"nbformat": 4,
500 changes: 0 additions & 500 deletions build-a-web-scraper/05_pipeline_solution.ipynb

This file was deleted.

13 changes: 0 additions & 13 deletions build-a-web-scraper/Pipfile

This file was deleted.

446 changes: 0 additions & 446 deletions build-a-web-scraper/Pipfile.lock

This file was deleted.

46 changes: 41 additions & 5 deletions build-a-web-scraper/README.md
@@ -1,16 +1,52 @@
# Code Repository for Web Scraping Course

This repository contains Jupyter Notebooks with code examples relating to the Real Python video course on Building a Web Scraper with `requests` and Beautiful Soup.
This repository contains Jupyter Notebooks with code examples relating to the Real Python video course on [Building a Web Scraper with `requests` and Beautiful Soup](https://realpython.com/courses/web-scraping-beautiful-soup/).

The notebooks 01-03 represent the **web scraping pipeline** discussed in the course:
## Setup

Create and activate a virtual environment, then install the dependencies (`requests`, `beautifulsoup4`, and `jupyter`) from the pinned requirements file:

```bash
$ python -m venv venv
$ source venv/bin/activate
# PS> venv\Scripts\activate # on Windows
(venv) $ python -m pip install -r requirements.txt
```

Once all the dependencies are installed, you can start the Jupyter notebook server:

```bash
(venv) $ jupyter notebook
```

Now you can open the notebook that you want to work on.

## Notebook Files

The notebooks 01-03 represent the **web scraping pipeline** discussed in the course:

- **Part 1: Inspect** `01_inspect.ipynb`
- **Part 2: Scrape** `02_scrape.ipynb`
- **Part 3: Parse** `03_parse.ipynb`

The notebooks 04-05 contain tasks to work on individually for each learner to keep practicing the discussed concepts and personalize the project for themselves:
The notebook 04 contains tasks to work on individually so you can keep practicing the discussed concepts and personalize the project for yourself:

- **Tasks** `04_pipeline.ipynb`
- **Solution** `05_pipeline_solution.ipynb`

Attempt to build out your individual pipeline by yourself and use the solution document only if you get stuck. All the best, and keep learning! :)
Attempt to build out your pipeline by yourself. When you're done with the suggested practice website, try repeating the process with a different website. All the best, and keep learning! :)

## ⚠️ Durability Warning ⚠️

As [mentioned in the course](https://realpython.com/lessons/challenge-of-durability/), websites frequently change. Unfortunately, the job board that you'll see in the course, indeed.com, has started to block scraping of its site since the course was recorded.

Just like in the associated written tutorial on [web scraping with Beautiful Soup](https://realpython.com/beautiful-soup-web-scraper-python/#scrape-the-fake-python-job-site), you can instead use [Real Python's fake jobs site](https://realpython.github.io/fake-jobs/) to practice scraping a static website.

All the concepts discussed in the course lessons are still accurate. Translating what you see onto a different website will be a good learning opportunity: you'll have to synthesize the information and apply it in practice.
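If you want to verify that the practice site is up before working through the notebooks, here's a quick check (a sketch, not part of the course material):

```python
import requests

response = requests.get("https://realpython.github.io/fake-jobs/")
print(response.status_code)  # A 200 means the site is reachable.
```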

## About the Author

Martin Breuss - Email: martin@realpython.com

## License

Distributed under the MIT license. See `LICENSE` in the root directory of this `materials` repo for more information.
3 changes: 3 additions & 0 deletions build-a-web-scraper/requirements.in
@@ -0,0 +1,3 @@
jupyter
requests
beautifulsoup4
332 changes: 292 additions & 40 deletions build-a-web-scraper/requirements.txt
@@ -1,51 +1,303 @@
appnope==0.1.0
attrs==19.3.0
backcall==0.1.0
#
# This file is autogenerated by pip-compile with Python 3.10
# by the following command:
#
# pip-compile requirements.in
#
anyio==3.6.2
# via jupyter-server
appnope==0.1.3
# via
# ipykernel
# ipython
argon2-cffi==21.3.0
# via
# jupyter-server
# nbclassic
# notebook
argon2-cffi-bindings==21.2.0
# via argon2-cffi
arrow==1.2.3
# via isoduration
asttokens==2.2.1
# via stack-data
attrs==22.2.0
# via jsonschema
backcall==0.2.0
# via ipython
beautifulsoup4==4.9.1
bleach==3.1.5
# via
# -r requirements.in
# nbconvert
bleach==5.0.1
# via nbconvert
certifi==2020.4.5.2
# via requests
cffi==1.15.1
# via argon2-cffi-bindings
chardet==3.0.4
decorator==4.4.2
defusedxml==0.6.0
entrypoints==0.3
# via requests
comm==0.1.2
# via ipykernel
debugpy==1.6.4
# via ipykernel
decorator==5.1.1
# via ipython
defusedxml==0.7.1
# via nbconvert
entrypoints==0.4
# via jupyter-client
executing==1.2.0
# via stack-data
fastjsonschema==2.16.2
# via nbformat
fqdn==1.5.1
# via jsonschema
idna==2.9
ipykernel==5.3.0
ipython==7.15.0
# via
# anyio
# jsonschema
# requests
ipykernel==6.19.4
# via
# ipywidgets
# jupyter
# jupyter-console
# nbclassic
# notebook
# qtconsole
ipython==8.7.0
# via
# ipykernel
# ipywidgets
# jupyter-console
ipython-genutils==0.2.0
jedi==0.17.0
Jinja2==2.11.2
json5==0.9.5
jsonschema==3.2.0
jupyter-client==6.1.3
jupyter-core==4.6.3
jupyterlab==2.1.4
jupyterlab-server==1.1.5
MarkupSafe==1.1.1
mistune==0.8.4
nbconvert==5.6.1
nbformat==5.0.6
notebook==6.0.3
packaging==20.4
pandocfilters==1.4.2
parso==0.7.0
# via
# nbclassic
# notebook
# qtconsole
ipywidgets==8.0.3
# via jupyter
isoduration==20.11.0
# via jsonschema
jedi==0.18.2
# via ipython
jinja2==3.1.2
# via
# jupyter-server
# nbclassic
# nbconvert
# notebook
jsonpointer==2.3
# via jsonschema
jsonschema[format-nongpl]==4.17.3
# via
# jupyter-events
# nbformat
jupyter==1.0.0
# via -r requirements.in
jupyter-client==7.4.8
# via
# ipykernel
# jupyter-console
# jupyter-server
# nbclassic
# nbclient
# notebook
# qtconsole
jupyter-console==6.4.4
# via jupyter
jupyter-core==5.1.0
# via
# jupyter-client
# jupyter-server
# nbclassic
# nbclient
# nbconvert
# nbformat
# notebook
# qtconsole
jupyter-events==0.5.0
# via jupyter-server
jupyter-server==2.0.2
# via
# nbclassic
# notebook-shim
jupyter-server-terminals==0.4.3
# via jupyter-server
jupyterlab-pygments==0.2.2
# via nbconvert
jupyterlab-widgets==3.0.4
# via ipywidgets
markupsafe==2.1.1
# via
# jinja2
# nbconvert
matplotlib-inline==0.1.6
# via
# ipykernel
# ipython
mistune==2.0.4
# via nbconvert
nbclassic==0.4.8
# via notebook
nbclient==0.7.2
# via nbconvert
nbconvert==7.2.7
# via
# jupyter
# jupyter-server
# nbclassic
# notebook
nbformat==5.7.1
# via
# jupyter-server
# nbclassic
# nbclient
# nbconvert
# notebook
nest-asyncio==1.5.6
# via
# ipykernel
# jupyter-client
# nbclassic
# notebook
notebook==6.5.2
# via jupyter
notebook-shim==0.2.2
# via nbclassic
packaging==22.0
# via
# ipykernel
# jupyter-server
# nbconvert
# qtpy
pandocfilters==1.5.0
# via nbconvert
parso==0.8.3
# via jedi
pexpect==4.8.0
# via ipython
pickleshare==0.7.5
prometheus-client==0.8.0
prompt-toolkit==3.0.5
ptyprocess==0.6.0
Pygments==2.6.1
pyparsing==2.4.7
pyrsistent==0.16.0
python-dateutil==2.8.1
pyzmq==19.0.1
# via ipython
platformdirs==2.6.0
# via jupyter-core
prometheus-client==0.15.0
# via
# jupyter-server
# nbclassic
# notebook
prompt-toolkit==3.0.36
# via
# ipython
# jupyter-console
psutil==5.9.4
# via ipykernel
ptyprocess==0.7.0
# via
# pexpect
# terminado
pure-eval==0.2.2
# via stack-data
pycparser==2.21
# via cffi
pygments==2.13.0
# via
# ipython
# jupyter-console
# nbconvert
# qtconsole
pyrsistent==0.19.2
# via jsonschema
python-dateutil==2.8.2
# via
# arrow
# jupyter-client
python-json-logger==2.0.4
# via jupyter-events
pyyaml==6.0
# via jupyter-events
pyzmq==24.0.1
# via
# ipykernel
# jupyter-client
# jupyter-server
# nbclassic
# notebook
# qtconsole
qtconsole==5.4.0
# via jupyter
qtpy==2.3.0
# via qtconsole
requests==2.23.0
Send2Trash==1.5.0
six==1.15.0
# via -r requirements.in
rfc3339-validator==0.1.4
# via jsonschema
rfc3986-validator==0.1.1
# via jsonschema
send2trash==1.8.0
# via
# jupyter-server
# nbclassic
# notebook
six==1.16.0
# via
# asttokens
# bleach
# python-dateutil
# rfc3339-validator
sniffio==1.3.0
# via anyio
soupsieve==2.0.1
terminado==0.8.3
testpath==0.4.4
tornado==6.0.4
traitlets==4.3.3
# via beautifulsoup4
stack-data==0.6.2
# via ipython
terminado==0.17.1
# via
# jupyter-server
# jupyter-server-terminals
# nbclassic
# notebook
tinycss2==1.2.1
# via nbconvert
tornado==6.2
# via
# ipykernel
# jupyter-client
# jupyter-server
# nbclassic
# notebook
# terminado
traitlets==5.8.0
# via
# comm
# ipykernel
# ipython
# ipywidgets
# jupyter-client
# jupyter-core
# jupyter-events
# jupyter-server
# matplotlib-inline
# nbclassic
# nbclient
# nbconvert
# nbformat
# notebook
# qtconsole
uri-template==1.2.0
# via jsonschema
urllib3==1.25.9
wcwidth==0.2.4
# via requests
wcwidth==0.2.5
# via prompt-toolkit
webcolors==1.12
# via jsonschema
webencodings==0.5.1
# via
# bleach
# tinycss2
websocket-client==1.4.2
# via jupyter-server
widgetsnbextension==4.0.4
# via ipywidgets