homework_2 #70

Open
@RodrigoGrijalba

📊 Homework 2: Codeforces Data Analysis

For this homework, you’ll analyze data from the Codeforces API. The goal is to apply skills in data extraction, wrangling, visualization, and interpretation—all while exploring behavior on one of the largest competitive programming platforms.


✅ Objectives

Each student must:

  1. Extract Data from Codeforces

    • Use the Codeforces API to gather relevant contest, user, or submission data.
    • You must restrict your analysis to contests and user activity that occurred between July and December of 2024.
    • Also, only use contests whose title contains at least one of the strings "Hello", "Round", or "Good Bye".
    • Steps:
      • First, you will have to create a list of the contests using the contest.list endpoint
      • Second, using the information you got in the first step, you can extract:
      • It can be helpful to create a folder for each one of the contests that contains tables with the results from the endpoints
  2. Describe the Dataset

    • In your notebook, include a markdown cell that describes:
      • The API endpoints used
      • The structure of your dataset
      • A short explanation of each variable you’re analyzing
  3. Data Wrangling

    • Clean and transform your data:
      • Handle missing or nested fields
      • Convert timestamps
      • Create derived variables (explained further in the example table section):
        • finished_n
        • relative_time_n
        • time_to_answer_n: computed from relative_time_n by sorting each user's solve times within a contest and taking the lagged differences between consecutive values.
        • rating_achieved
  4. Descriptive Figures and Analysis

    • Histogram for submission times
    • Kernel density plot of users' maximum ratings
    • Boxplots of language vs. time_to_answer
    • Binscatter for rating vs time_to_answer
    • Basic linear regression for rating vs rating_achieved (with scatter plot and regression line)
    • For each plot, include an interpretation
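
The binscatter in step 4 may be the least familiar of these figures. Here is a minimal sketch, assuming equal-sized bins along the x variable and plain Python lists as input. The function name `binscatter` and the numbers in the example are made up for illustration, not real Codeforces data.

```python
from statistics import mean


def binscatter(xs, ys, n_bins=10):
    """Return one (mean_x, mean_y) point per equal-sized bin of x."""
    # Sort the observations by x so each bin covers a contiguous range of x.
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    n_bins = min(n_bins, n)  # never create more bins than observations
    points = []
    for b in range(n_bins):
        # Integer arithmetic splits the sorted data into near-equal chunks.
        lo = b * n // n_bins
        hi = (b + 1) * n // n_bins
        chunk = pairs[lo:hi]
        points.append((mean(x for x, _ in chunk), mean(y for _, y in chunk)))
    return points


# Toy values only: user rating vs. seconds needed to answer.
toy_rating = [800, 1000, 1200, 1400, 1600, 1800]
toy_time = [900, 700, 650, 500, 400, 300]
points = binscatter(toy_rating, toy_time, n_bins=3)
```

Each returned point is the average rating and average time_to_answer within one bin. Drawing these points as a scatter plot gives the binscatter.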

🧾 Example of Cleaned Dataset

Your cleaned dataset may look something like this:

| | author_handle | finished_1 | finished_2 | finished_3 | ... | 1_language | 2_language | ... | relative_time_1 | relative_time_2 | ... | time_to_answer_1 | ... | rating_1 | rating_2 | ... | rating_achieved | contest_id | contest_name | contest_start_time | country | city | rating | max_rating |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 78442 | ---0_0--- | True | True | False | ... | C++20 (GCC 13-64) | C++20 (GCC 13-64) | ... | 286 | 1965 | ... | 286 | ... | 800 | 1100 | ... | 1900 | 1995 | Codeforces Round 961 (Div. 2) | 1721745300 | India | nan | 1615 | 1670 |
| 78443 | --Accepted-- | True | True | False | ... | C++17 (GCC 7-32) | C++17 (GCC 7-32) | ... | 675 | 5186 | ... | 675 | ... | 800 | 1100 | ... | 1900 | 1995 | Codeforces Round 961 (Div. 2) | 1721745300 | nan | nan | 1260 | 1260 |

Explanation:

  • finished_n: whether the user solved problem n in the contest.
  • n_language: language used to solve problem n.
  • relative_time_n: time in seconds from contest start to user's submission for problem n.
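  • time_to_answer_n: time in seconds between the user's solution to problem n and the solution they submitted immediately before it, whichever problem that was (for the first problem solved, the time since contest start).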
  • rating_n: difficulty rating of problem n.
  • rating_achieved: sum of the rating of the problems the user was able to solve.
  • contest_name, contest_start_time: contextual info for labeling.
  • country, city, rating, max_rating: profile metadata.
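
The two derived variables that need some computation, time_to_answer_n and rating_achieved, can be sketched as follows. This is a minimal illustration for a single user in a single contest. It assumes the solve times and problem ratings are already in plain dictionaries keyed by problem number. The function names are hypothetical, and the input values are taken from the first example row above.

```python
def time_to_answer(relative_times):
    """Seconds between each solved problem and the previous solve.

    relative_times maps problem -> seconds from contest start to the
    accepted submission, with None for problems the user did not solve.
    """
    # Order the solved problems by when they were solved, not by number.
    solved = sorted((t, p) for p, t in relative_times.items() if t is not None)
    result = {}
    previous = 0  # the first solve is measured from the contest start
    for t, p in solved:
        result[p] = t - previous  # lagged difference
        previous = t
    return result


def rating_achieved(finished, ratings):
    """Sum of the ratings of the problems the user solved."""
    return sum(ratings[p] for p, ok in finished.items() if ok)


gaps = time_to_answer({1: 286, 2: 1965, 3: None})
total = rating_achieved({1: True, 2: True, 3: False}, {1: 800, 2: 1100})
print(gaps)   # {1: 286, 2: 1679}
print(total)  # 1900
```

With pandas, the same lagged difference can be obtained by sorting each user's solve times and calling `diff()` within each group.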

Use this structure as a guiding example, but feel free to adapt based on the focus of your analysis.


📌 Deliverables

  • Submit a Jupyter or Colab notebook with:
    • Code for API access and analysis
    • Markdown explanations for each section
    • Clear, labeled plots
  • If working in a group: include names + a short section on what each person contributed

💡 Tips for Success

  • Start small! Focus on one user group or contest division.
  • Use requests and pandas for API access and processing.
  • Think of economic or behavioral interpretations of what you're seeing (e.g., trade-offs, learning curves, productivity).
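
As a starting point for the API access, here is a minimal sketch of the contest filter from the extraction step. It makes three assumptions: the time window is interpreted in UTC, the title rule is read as "contains at least one of the keywords", and each contest is a dictionary with the `name` and `startTimeSeconds` fields returned by contest.list. The function name `select_contests` and the two "Hypothetical" entries are made up for illustration.

```python
from datetime import datetime, timezone

KEYWORDS = ("Hello", "Round", "Good Bye")
# July through December 2024, interpreted in UTC (an assumption).
WINDOW_START = datetime(2024, 7, 1, tzinfo=timezone.utc).timestamp()
WINDOW_END = datetime(2025, 1, 1, tzinfo=timezone.utc).timestamp()


def select_contests(contests):
    """Keep contests inside the time window whose name has a keyword."""
    selected = []
    for contest in contests:
        start = contest.get("startTimeSeconds")
        if start is None:  # skip entries without a start time
            continue
        in_window = WINDOW_START <= start < WINDOW_END
        has_keyword = any(k in contest["name"] for k in KEYWORDS)
        if in_window and has_keyword:
            selected.append(contest)
    return selected


sample = [
    {"id": 1995, "name": "Codeforces Round 961 (Div. 2)",
     "startTimeSeconds": 1721745300},
    # Made-up entries to exercise the filter:
    {"id": 1, "name": "Hypothetical Marathon", "startTimeSeconds": 1721745300},
    {"id": 2, "name": "Hypothetical Round", "startTimeSeconds": 1700000000},
]
kept_ids = [c["id"] for c in select_contests(sample)]
print(kept_ids)  # [1995]
```

In the notebook, the input list would come from the API itself, for example `requests.get("https://codeforces.com/api/contest.list", timeout=30).json()["result"]`.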

Deadline: April 17, 23:59.
