📊 Homework 2: Codeforces Data Analysis
For this homework, you’ll analyze data from the Codeforces API. The goal is to apply skills in data extraction, wrangling, visualization, and interpretation—all while exploring behavior on one of the largest competitive programming platforms.
✅ Objectives
Each student must:
- **Extract Data from Codeforces**
  - Use the Codeforces API to gather relevant contest, user, or submission data.
  - You must restrict your analysis to contests and user activity that occurred between July and December of 2024.
  - Also, only use contests whose titles contain one of the strings "Hello", "Round", or "Good Bye".
  - Steps (a minimal extraction sketch follows this list):
    - First, create a list of the contests using the contest.list endpoint.
    - Second, using the information from the first step, extract:
      - The contest's information and problems using the contest.standings endpoint
      - The users who participated in the contest using the user.ratedList endpoint
      - The submissions using the contest.status endpoint
      - The users' rating changes using the contest.ratingChanges endpoint
    - It can be helpful to create a folder for each contest that contains tables with the results from these endpoints.
- **Describe the Dataset**
  - In your notebook, include a markdown cell that describes:
    - The API endpoints used
    - The structure of your dataset
    - A short explanation of each variable you're analyzing
- **Data Wrangling** (a wrangling sketch follows this list)
  - Clean and transform your data:
    - Handle missing or nested fields
    - Convert timestamps
    - Create derived variables (explained further in the example table section):
      - finished_n
      - relative_time_n
      - time_to_answer_n: can be calculated by sorting each user's relative_time_n values and taking the lagged differences
      - rating_achieved
- **Descriptive Figures and Analysis** (a plotting sketch follows this list)
  - Histogram of submission times
  - Kernel density figure of users' maximum ratings
  - Boxplots of language vs. time_to_answer
  - Binscatter of rating vs. time_to_answer
  - Basic linear regression of rating vs. rating_achieved (with scatter plot and regression line)
  - For each plot, include an interpretation
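For the extraction step, the sketch below shows one possible way to filter contest.list and download the per-contest endpoints. The `call` helper, the `data/` folder layout, and the choice of query parameters are illustrative assumptions, not requirements.

```python
# Minimal sketch of step 1: list contests, filter them, and save raw endpoint output.
import json
import time
from datetime import datetime, timezone
from pathlib import Path

import requests

BASE = "https://codeforces.com/api"
START = datetime(2024, 7, 1, tzinfo=timezone.utc).timestamp()
END = datetime(2025, 1, 1, tzinfo=timezone.utc).timestamp()
KEYWORDS = ("Hello", "Round", "Good Bye")


def call(method, **params):
    """Call one Codeforces API method and return its 'result' payload."""
    r = requests.get(f"{BASE}/{method}", params=params, timeout=30)
    r.raise_for_status()
    body = r.json()
    if body["status"] != "OK":
        raise RuntimeError(body.get("comment", "API error"))
    return body["result"]


# Finished contests in the July-December 2024 window whose title matches a keyword.
contests = [
    c
    for c in call("contest.list", gym="false")
    if c["phase"] == "FINISHED"
    and START <= c.get("startTimeSeconds", 0) < END
    and any(k in c["name"] for k in KEYWORDS)
]

# One folder per contest, one JSON file per endpoint.
for c in contests:
    folder = Path("data") / str(c["id"])
    folder.mkdir(parents=True, exist_ok=True)
    for method, params in [
        ("contest.standings", {"contestId": c["id"]}),
        ("contest.status", {"contestId": c["id"]}),
        ("contest.ratingChanges", {"contestId": c["id"]}),
        ("user.ratedList", {"contestId": c["id"], "activeOnly": "true"}),
    ]:
        (folder / f"{method}.json").write_text(json.dumps(call(method, **params)))
        time.sleep(2)  # be gentle with the API's rate limit
```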
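For the wrangling step, the sketch below assumes the submissions returned by contest.status have already been loaded into a list of dicts named `subs` (an assumption, as is every variable name). It shows one way to derive time_to_answer (lagged differences of sorted relative times) and rating_achieved in long format.

```python
import pandas as pd

# Flatten the nested submission objects into a long table: one row per submission.
rows = []
for s in subs:
    rows.append(
        {
            "author_handle": s["author"]["members"][0]["handle"],
            "problem": s["problem"]["index"],
            "problem_rating": s["problem"].get("rating"),
            "language": s["programmingLanguage"],
            "verdict": s.get("verdict"),
            "relative_time": s["relativeTimeSeconds"],
            "submitted_at": pd.to_datetime(s["creationTimeSeconds"], unit="s"),
        }
    )
long_df = pd.DataFrame(rows)

# Keep each user's first accepted submission per problem.
solved = (
    long_df[long_df["verdict"] == "OK"]
    .sort_values("relative_time")
    .drop_duplicates(["author_handle", "problem"])
    .copy()
)

# time_to_answer: lagged difference of relative times within each user after sorting,
# so the first solved problem keeps its own relative time.
solved = solved.sort_values(["author_handle", "relative_time"])
solved["time_to_answer"] = (
    solved.groupby("author_handle")["relative_time"]
    .diff()
    .fillna(solved["relative_time"])
)

# rating_achieved: sum of the ratings of the problems each user solved.
rating_achieved = (
    solved.groupby("author_handle")["problem_rating"].sum().rename("rating_achieved")
)
```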
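For the figures, the sketch below covers two of the requested plots (the histogram of submission times and the rating vs. rating_achieved regression); the others follow the same pattern. It reuses `solved` and `rating_achieved` from the wrangling sketch, and `users` is a hypothetical frame with columns `author_handle` and `rating` built, for example, from user.ratedList.

```python
import matplotlib.pyplot as plt
import numpy as np

# Histogram of submission times (seconds since contest start).
plt.figure()
plt.hist(solved["relative_time"], bins=50)
plt.xlabel("Seconds since contest start")
plt.ylabel("Accepted submissions")
plt.title("Submission times")
plt.show()

# Scatter plot plus fitted line: pre-contest rating vs. rating_achieved.
merged = users.merge(rating_achieved.reset_index(), on="author_handle")
slope, intercept = np.polyfit(merged["rating"], merged["rating_achieved"], deg=1)
xs = np.linspace(merged["rating"].min(), merged["rating"].max(), 100)

plt.figure()
plt.scatter(merged["rating"], merged["rating_achieved"], s=8, alpha=0.4)
plt.plot(xs, intercept + slope * xs, color="red", label=f"slope = {slope:.2f}")
plt.xlabel("User rating before the contest")
plt.ylabel("rating_achieved")
plt.legend()
plt.show()
```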
🧾 Example of Cleaned Dataset
Your cleaned dataset may look something like this:
| | author_handle | finished_1 | finished_2 | finished_3 | ... | 1_language | 2_language | ... | relative_time_1 | relative_time_2 | ... | time_to_answer_1 | ... | rating_1 | rating_2 | ... | rating_achieved | contest_id | contest_name | contest_start_time | country | city | rating | max_rating |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 78442 | ---0_0--- | True | True | False | ... | C++20 (GCC 13-64) | C++20 (GCC 13-64) | ... | 286 | 1965 | ... | 286 | ... | 800 | 1100 | ... | 1900 | 1995 | Codeforces Round 961 (Div. 2) | 1721745300 | India | nan | 1615 | 1670 |
| 78443 | --Accepted-- | True | True | False | ... | C++17 (GCC 7-32) | C++17 (GCC 7-32) | ... | 675 | 5186 | ... | 675 | ... | 800 | 1100 | ... | 1900 | 1995 | Codeforces Round 961 (Div. 2) | 1721745300 | nan | nan | 1260 | 1260 |
Explanation:
- finished_n: whether the user solved problem n in the contest.
- n_language: language used to solve problem n.
- relative_time_n: time in seconds from contest start to the user's submission for problem n.
- time_to_answer_n: difference between the time to answer question n and whichever question was answered before it.
- rating_n: difficulty rating of problem n.
- rating_achieved: sum of the ratings of the problems the user was able to solve.
- contest_name, contest_start_time: contextual info for labeling.
- country, city, rating, max_rating: profile metadata.
Use this structure as a guiding example, but feel free to adapt based on the focus of your analysis.
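If you build the long table from the wrangling sketch first, one way to reach a wide layout like the example above is a pivot, as sketched below. Note the flattened column names come out as `language_A`, `relative_time_A`, etc. rather than the numbered names in the table, so rename them however suits your analysis.

```python
# Pivot the long `solved` table (one row per user-problem) into one row per user.
wide = solved.pivot(
    index="author_handle",
    columns="problem",
    values=["language", "relative_time", "time_to_answer", "problem_rating"],
)
# Flatten the (value, problem) MultiIndex columns, e.g. ("language", "A") -> "language_A".
wide.columns = [f"{value}_{prob}" for value, prob in wide.columns]

# finished_<problem>: True where the user has an accepted submission for that problem.
for prob in sorted(solved["problem"].unique()):
    wide[f"finished_{prob}"] = wide[f"relative_time_{prob}"].notna()

wide["rating_achieved"] = rating_achieved  # aligns on the author_handle index
wide = wide.reset_index()
```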
📌 Deliverables
- Submit a Jupyter or Colab notebook with:
  - Code for API access and analysis
  - Markdown explanations for each section
  - Clear, labeled plots
- If working in a group: include names + a short section on what each person contributed
💡 Tips for Success
- Start small! Focus on one user group or contest division.
- Use `requests` and `pandas` for API access and processing.
- Think of economic or behavioral interpretations of what you're seeing (e.g., trade-offs, learning curves, productivity).
Deadline: April 17, 23:59.