
homework_2 #70

RodrigoGrijalba opened this issue Apr 9, 2025 · 0 comments

📊 Homework 2: Codeforces Data Analysis

For this homework, you’ll analyze data from the Codeforces API. The goal is to apply skills in data extraction, wrangling, visualization, and interpretation—all while exploring behavior on one of the largest competitive programming platforms.


✅ Objectives

Each student must:

  1. Extract Data from Codeforces

    • Use the Codeforces API to gather relevant contest, user, or submission data.
    • You must restrict your analysis to contests and user activity that occurred between July and December of 2024.
    • Also, only use contests whose title contains at least one of the strings "Hello", "Round", or "Good Bye".
    • Steps:
      • First, you will have to create a list of the contests using the contest.list endpoint
      • Second, using the information you got in the first step, you can extract:
      • It can be helpful to create one folder per contest, each containing tables with the results from the endpoints.
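
The first step above can be sketched roughly as follows. This is only an illustration, not a required implementation: it assumes that a title qualifies when it contains any one of the three strings, that the July to December 2024 window is measured in UTC, and it uses the standard library's `urllib` so that it runs without extra packages (`requests` would work just as well). The helper names `fetch_contests` and `select_contests` are hypothetical.

```python
import json
import urllib.request
from datetime import datetime, timezone

API_URL = "https://codeforces.com/api/contest.list?gym=false"
KEYWORDS = ("Hello", "Round", "Good Bye")
# Assumption: the window is interpreted in UTC; the end bound is exclusive.
WINDOW_START = datetime(2024, 7, 1, tzinfo=timezone.utc)
WINDOW_END = datetime(2025, 1, 1, tzinfo=timezone.utc)


def fetch_contests(url=API_URL):
    """Download the contest list; the API wraps the payload in 'result'."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        payload = json.load(resp)
    if payload.get("status") != "OK":
        raise RuntimeError(payload.get("comment", "Codeforces API error"))
    return payload["result"]


def select_contests(contests):
    """Keep contests inside the window whose name contains any keyword."""
    selected = []
    for contest in contests:
        start = contest.get("startTimeSeconds")
        if start is None:  # the start time can be absent
            continue
        started_at = datetime.fromtimestamp(start, tz=timezone.utc)
        in_window = WINDOW_START <= started_at < WINDOW_END
        has_keyword = any(k in contest["name"] for k in KEYWORDS)
        if in_window and has_keyword:
            selected.append(contest)
    return selected
```

Calling `select_contests(fetch_contests())` would then give the list of contests whose `id` values can be passed to the other endpoints.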
  2. Describe the Dataset

    • In your notebook, include a markdown cell that describes:
      • The API endpoints used
      • The structure of your dataset
      • A short explanation of each variable you’re analyzing
  3. Data Wrangling

    • Clean and transform your data:
      • Handle missing or nested fields
      • Convert timestamps
      • Create derived variables (explained further in the example table section):
        • finished_n
        • relative_time_n
        • time_to_answer_n: can be calculated by sorting each user's relative_time_n values within a contest and taking the lagged differences between consecutive values.
        • rating_achieved
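
A minimal sketch of the derived variables, assuming that for one user in one contest you already have the relative time of the accepted submission for each problem (or `None` if it was not solved) and each problem's rating. The function name `derive_row` and the input layout are hypothetical, and unrated problems are counted as 0 here, which is a choice you may want to change.

```python
def derive_row(relative_times, ratings):
    """Build the derived columns for one user in one contest.

    relative_times: {problem number: seconds from contest start, or None}
    ratings:        {problem number: problem rating, or None if unrated}
    """
    # Sort the solved problems by the moment they were solved.
    solved = sorted((t, n) for n, t in relative_times.items() if t is not None)

    # Lagged differences: each solve time minus the previous solve time.
    # The first solved problem is measured from the contest start (0).
    gaps = {}
    previous = 0
    for t, n in solved:
        gaps[n] = t - previous
        previous = t

    row = {}
    for n, t in relative_times.items():
        row[f"finished_{n}"] = t is not None
        row[f"relative_time_{n}"] = t
        row[f"time_to_answer_{n}"] = gaps.get(n)  # None if unsolved
    # Sum of the ratings of the solved problems (unrated counted as 0).
    row["rating_achieved"] = sum(ratings.get(n) or 0 for n in gaps)
    return row


# Values for problems 1 and 2 follow the example table in this assignment;
# the rating of problem 3 is a made-up placeholder.
row = derive_row({1: 286, 2: 1965, 3: None}, {1: 800, 2: 1100, 3: 1500})
print(row["time_to_answer_2"], row["rating_achieved"])  # 1679 1900
```

With pandas, the same idea is usually written as a sort followed by a grouped `diff()` on a long table with one row per user and problem.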
  4. Descriptive Figures and Analysis

    • Histogram for submission times
    • Kernel density plot of users' maximum ratings
    • Boxplots of language vs. time_to_answer
    • Binscatter of rating vs. time_to_answer
    • Basic linear regression of rating vs. rating_achieved (with scatter plot and regression line)
    • For each plot, include an interpretation
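
The binscatter and the regression line are the two figures above that need a small computation before plotting. The sketch below shows one possible way to get the numbers, assuming equal-count bins for the binscatter and ordinary least squares for the line; `binscatter_points` and `ols_line` are hypothetical helper names, and the plotting itself (for example with matplotlib) is left out.

```python
from statistics import mean


def binscatter_points(x, y, n_bins=10):
    """Sort by x, split into equal-count bins, return (mean x, mean y) per bin."""
    pairs = sorted(zip(x, y))
    n_bins = min(n_bins, len(pairs))  # never more bins than observations
    points = []
    for b in range(n_bins):
        lo = b * len(pairs) // n_bins
        hi = (b + 1) * len(pairs) // n_bins
        chunk = pairs[lo:hi]
        points.append((mean(p[0] for p in chunk), mean(p[1] for p in chunk)))
    return points


def ols_line(x, y):
    """Return (slope, intercept) of the least-squares line y = slope*x + intercept."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx  # fails if all x values are identical
    return slope, my - slope * mx
```

Each binscatter point is then drawn as a single marker, and the regression line is drawn over the raw scatter of rating against rating_achieved.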

🧾 Example of Cleaned Dataset

Your cleaned dataset may look something like this:

|  | author_handle | finished_1 | finished_2 | finished_3 | ... | 1_language | 2_language | ... | relative_time_1 | relative_time_2 | ... | time_to_answer_1 | ... | rating_1 | rating_2 | ... | rating_achieved | contest_id | contest_name | contest_start_time | country | city | rating | max_rating |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 78442 | ---0_0--- | True | True | False | ... | C++20 (GCC 13-64) | C++20 (GCC 13-64) | ... | 286 | 1965 | ... | 286 | ... | 800 | 1100 | ... | 1900 | 1995 | Codeforces Round 961 (Div. 2) | 1721745300 | India | nan | 1615 | 1670 |
| 78443 | --Accepted-- | True | True | False | ... | C++17 (GCC 7-32) | C++17 (GCC 7-32) | ... | 675 | 5186 | ... | 675 | ... | 800 | 1100 | ... | 1900 | 1995 | Codeforces Round 961 (Div. 2) | 1721745300 | nan | nan | 1260 | 1260 |

Explanation:

  • finished_n: whether the user solved problem n in the contest.
  • n_language: language used to solve problem n.
  • relative_time_n: time in seconds from contest start to user's submission for problem n.
  • time_to_answer_n: difference between relative_time_n and the relative time of whichever problem the user solved immediately before it; for the first problem solved, it equals relative_time_n.
  • rating_n: difficulty rating of problem n.
  • rating_achieved: sum of the ratings of the problems the user was able to solve.
  • contest_name, contest_start_time: contextual info for labeling.
  • country, city, rating, max_rating: profile metadata.

Use this structure as a guiding example, but feel free to adapt based on the focus of your analysis.
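
To get from raw API records to columns like the ones above, nested fields have to be flattened first. The sketch below assumes submission records shaped like the Submission objects of the Codeforces API (nested `author` and `problem` objects, with verdict `"OK"` meaning accepted); check the field names against the responses you actually receive. `flatten_submission` and `first_accepted` are hypothetical helper names. Note that the API identifies problems by an index such as "A" or "B", so mapping them to the numbers 1, 2, 3 is a further step.

```python
def flatten_submission(sub):
    """Pull the nested fields of one submission into a flat dict."""
    members = sub.get("author", {}).get("members", [])
    problem = sub.get("problem", {})
    return {
        "author_handle": members[0]["handle"] if members else None,
        "problem_index": problem.get("index"),
        "rating": problem.get("rating"),  # missing for unrated problems
        "language": sub.get("programmingLanguage"),
        "verdict": sub.get("verdict"),
        "relative_time": sub.get("relativeTimeSeconds"),
    }


def first_accepted(submissions):
    """Map (handle, problem index) to the earliest accepted flat record."""
    best = {}
    for sub in submissions:
        flat = flatten_submission(sub)
        if flat["verdict"] != "OK":
            continue
        if flat["author_handle"] is None or flat["relative_time"] is None:
            continue
        key = (flat["author_handle"], flat["problem_index"])
        if key not in best or flat["relative_time"] < best[key]["relative_time"]:
            best[key] = flat
    return best
```

The earliest accepted record per user and problem supplies both relative_time_n and n_language for the wide table.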


📌 Deliverables

  • Submit a Jupyter or Colab notebook with:
    • Code for API access and analysis
    • Markdown explanations for each section
    • Clear, labeled plots
  • If working in a group: include names + a short section on what each person contributed

💡 Tips for Success

  • Start small! Focus on one user group or contest division.
  • Use requests and pandas for API access and processing.
  • Think of economic or behavioral interpretations of what you're seeing (e.g., trade-offs, learning curves, productivity).

Deadline: April 17, 23:59.
