homework_1 #13

@alexanderquispe

📚 Data Science Homework 1: Web Scraping

🔧 Instructions

Dear all,

Please follow the instructions below to complete Homework 1:


1. Git Branch and Folder Structure

  • Each student must create a new branch named:

    [UPID]_hw1_2025_1
    

    Example: 123456_hw1_2025_1

  • On this branch, create a folder with exactly the same name as your branch inside the folder hw1:

    123456_hw1_2025_1/
    
  • Inside your folder, include the following:

    • requirements.txt
    • Your scraping code as a Jupyter notebook, named after your branch, e.g. 123456_hw1_2025_1.ipynb
    • Your resulting CSV file
  • Save everything under the main homework1/ directory in the repo.
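
The naming rules above can be checked with a small helper before pushing. This is a minimal sketch, not part of the assignment: the function name `expected_layout`, the digits-only UPID pattern (inferred from the example 123456), and the CSV file name `jobs.csv` are all assumptions.

```python
import re
from pathlib import PurePosixPath

# Pattern for "[UPID]_hw1_2025_1". A digits-only UPID is an assumption
# based on the example "123456_hw1_2025_1".
BRANCH_PATTERN = re.compile(r"\d+_hw1_2025_1")


def expected_layout(branch, csv_name="jobs.csv"):
    """Return the relative paths the submission folder should contain."""
    if not BRANCH_PATTERN.fullmatch(branch):
        raise ValueError(f"unexpected branch name: {branch!r}")
    folder = PurePosixPath(branch)  # the folder is named exactly like the branch
    return [
        str(folder / "requirements.txt"),
        str(folder / f"{branch}.ipynb"),  # notebook is named after the branch
        str(folder / csv_name),           # hypothetical name; the task does not fix one
    ]


for path in expected_layout("123456_hw1_2025_1"):
    print(path)
```

The paths are relative to the parent homework directory, so the helper works regardless of where the repository is cloned.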


2. Task Description

Scrape all Data Science job offers from the Bumeran platform that match the following filters (using code, not by hand!):

Filters Image


3. Suggested Scraping Strategy (Two Stages)

✅ Stage 1: Extract Job Posting Links

  • Scrape all the job listing URLs based on the filters above.
  • Navigate across all pages if necessary.
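
One way to structure Stage 1 is to separate fetching a results page from extracting links out of its HTML. The sketch below uses only the standard library and makes several assumptions: the site root `https://www.bumeran.com.pe`, the `/empleos/` prefix for job links, and a caller-supplied `fetch_page(page_number)` function that returns the HTML of one results page. If the listing is rendered with JavaScript, `fetch_page` would likely need a browser-automation tool rather than a plain HTTP request.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

BASE_URL = "https://www.bumeran.com.pe"  # assumed site root
JOB_PATH_PREFIX = "/empleos/"            # assumed pattern for job-detail links


class JobLinkParser(HTMLParser):
    """Collect href values of <a> tags that look like job postings."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        if href.startswith(JOB_PATH_PREFIX):
            url = urljoin(BASE_URL, href)
            if url not in self.links:  # keep order, drop duplicates
                self.links.append(url)


def extract_job_links(page_html):
    """Return the job-posting URLs found in one results page."""
    parser = JobLinkParser()
    parser.feed(page_html)
    return parser.links


def collect_all_links(fetch_page, max_pages=100):
    """Call fetch_page(1), fetch_page(2), ... until a page adds no new links."""
    all_links = []
    for page_number in range(1, max_pages + 1):
        found = extract_job_links(fetch_page(page_number))
        new_links = [url for url in found if url not in all_links]
        if not new_links:
            break
        all_links.extend(new_links)
    return all_links


# Demo with fake HTML instead of real network calls.
fake_pages = {
    1: '<a href="/empleos/data-scientist-1.html">A</a><a href="/login">x</a>',
    2: '<a href="/empleos/data-analyst-2.html">B</a>',
}
print(collect_all_links(lambda n: fake_pages.get(n, "")))
```

Stopping when a page yields no new links avoids hard-coding the number of result pages; `max_pages` is only a safety limit.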

Stage 1 Image

✅ Stage 2: Scrape Job Details

  • For each job URL collected in Stage 1, extract the following:
    • Job Title
    • Description (up to the "Benefits" section)
    • District
    • Work Mode (e.g., on-site, remote, hybrid)
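
Cutting the description at the "Benefits" section can be done on the extracted text, independently of how the page is fetched. The following is a minimal sketch under stated assumptions: the heading appears at the start of a line, it may be written in Spanish as "Beneficios", and the names `trim_description` and `build_record` are hypothetical helpers, not part of the assignment.

```python
import re

# Heading that ends the description. "Beneficios" is included on the
# assumption that postings may be in Spanish; adjust to the real page text.
BENEFITS_MARKER = re.compile(
    r"^\s*(benefits|beneficios)\b", re.IGNORECASE | re.MULTILINE
)


def trim_description(text):
    """Return the description up to (not including) the Benefits heading."""
    match = BENEFITS_MARKER.search(text)
    kept = text[:match.start()] if match else text
    return kept.strip()


def build_record(title, raw_description, district, work_mode):
    """Assemble one row using the four fields requested in the task."""
    return {
        "Job Title": title.strip(),
        "Description": trim_description(raw_description),
        "District": district.strip(),
        "Work Mode": work_mode.strip(),
    }


sample = "We need a data scientist.\nRequirements: Python\nBeneficios\n- Seguro"
print(build_record(" Data Scientist ", sample, "Lima", "remote"))
```

Matching only at the start of a line keeps sentences such as "this role benefits from..." inside the description.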

Stage 2 Image


4. Output

  • Your final output must be a CSV file with the following columns:
    Job Title | Description | District | Work Mode
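
A minimal sketch of writing these columns with the standard `csv` module follows. The function name `write_jobs_csv` and the sample row are illustrative assumptions; the demo writes to an in-memory buffer so it runs without touching the disk.

```python
import csv
import io

COLUMNS = ["Job Title", "Description", "District", "Work Mode"]


def write_jobs_csv(records, file_obj):
    """Write a header row plus one row per record; extra keys are ignored."""
    writer = csv.DictWriter(file_obj, fieldnames=COLUMNS, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(records)


# Demo row (made-up values, only to show the shape of the data).
rows = [
    {
        "Job Title": "Data Scientist",
        "Description": "Line one\nLine two, with a comma",
        "District": "Lima",
        "Work Mode": "hybrid",
    }
]
buffer = io.StringIO()
write_jobs_csv(rows, buffer)
print(buffer.getvalue())
```

To write a real file, pass a handle opened with `open(path, "w", newline="", encoding="utf-8")`. The `csv` module quotes fields containing commas or line breaks, so multi-line descriptions stay in a single cell.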
    

5. 📹 Short Explanation Video

  • Create a 3-minute video explaining your work.

  • Your video should include:

    • A short explanation of your environment setup.
    • A walk-through of your code and any specific functions/classes you used.
    • A sample run showing the output.
  • Add your video link to the following Google Sheet.


Deadline: April 2, 23:59. NO EXTENSIONS!
Let us know if you have any questions. Good luck!
