league-eda

This is a project for DSC80 at UCSD

Introduction and Question Identification

Introduction :

This dataset contains professional leagues matches from the year 2022. For each game, there are 10 rows representing 10 players and 2 rows representing 2 teams. And there are various columns for statistic of each game such as goldat10, dragon, assistsat10, etc.

Question:

I personally have harder time playing on the Red side (upper right) than on the Blue side (lower left) because of the difference in perspective. Will this be true for professional players? To test this statistically, does the side have an effect on difference in golds of each side at 10 minutes since a game starts?

Why should you care about the question?

It is because side shouldn't affect fairness of a game by resulting difference in gold amount.

Num of rows

149,400 rows

Relevant columns:

gameid, result, side, golddiffat10, golddiffat15

Descriptions of each column:

gameid: unique id for each game
result: whether a team win or not, 0 for lose and 1 for win
side: blue or red, blue for lower left and red for upper right
golddiffat10: goldat10 - opp_goldat10 (=self - opponent goldat10), 10 signifies 10 minutes since game starts
golddiffat15: same as above but 15 minutes since game starts

Cleaning and EDA

Data Cleaning

gameid	result	side	golddiffat10	golddiffat15
ESPORTSTMNT01_2690210	False	Blue	1523	107
ESPORTSTMNT01_2690210	True	Red	-1523	-107
ESPORTSTMNT01_2690219	False	Blue	-1619	-1763
ESPORTSTMNT01_2690219	True	Red	1619	1763
8401-8401_game_1	True	Blue	nan	nan

Since each game is represented with 12 rows (10 players and 2 teams), we dropped players rows to avoid double counting.
Since all statistics from each game are recorded but we are only interested in gameid, result, side, golddiffat10, and golddiffat15, we only selected these columns.
For result column, entries are either 0 for lose and 1 for win. We turned each 0 and 1 to False and True.

Univariate Analysis

It is a histogram of golddiffat10, and it's normally distributed and centered at 0. It goes up to 9000 and low to -9000.

Bivariate Analysis

The plot contains two boxplots of golddiffat10 by Blue and Red team. Both are approximately centered at 0. However, Blue team's median is slightly right of 0, whereas Red team's median is slightly left of 0. Several outliers exist in both Blue and Red team, but Blue team's outliers are skewed more to the right than Red team's. 50% of data points are within -1,000 and 1,000.

Interesting Aggregates

team_lol.pivot_table(index='result', columns='side', values=feature, aggfunc='mean')

result	Blue	Red
Lose	-632.109	-743.577
Win	743.716	632.499

Significance:

If side does not affect golddiffat10, ith entry of two columns should be similar. But since they are not, it shows there might be a possibility that side affects golddiffat10.

Assessment of Missingness

NMAR Analysis

We believe 'url' column is NMAR because a game with lower popularity is less likely to have url. We thought this is a valid reason since people won't likely to search for not popular games. So if we can get amount of prize you get from winning or popularity of different entries in 'league' column, we might be able to explain the missingness.

Missingness Dependency

For each game number, golddiffat10's missingness changes. Thus, it seems like the missingness of golddiffat10 depends on game number.

Since the observed TVD is far right from the right end point of empirical distribution, we reject the null hypothesis. Thus, it suggests that the missingness of golddiffat10 depends on game value.

For each result, golddiffat10's missingness does not change. Thus, it seems like result does not explain the missingness of golddiffat10.

Since the observed TVD is inside the empirical distribution, the area of the empirical distribution to the right of the observed TVD is greater than 0.05. Thus, we keep the null hypothesis, which suggests golddiffat10's missingness does not depend on result values.

Hypothesis Testing

Null hypothesis :

We assume that two sides (Blue and Red) should have about the same gold at 10 minutes since a game starts. Thus, golddiffat10 should have a mean of 0.

Alternative hypothesis :

golddiffat10's mean is larger than 0.

Test statistic :

We chose mean as our test statistic because the distribution of golddiffat10 is roughly normal and we are comparing quantitative difference between two sides. Also, since we are specifically looking for 'larger than 0', we didn't use absolute value to keep direction.

Significance level :

0.01

Resulting p-value :

0.0

Conclusion :

Since p-value (0.0) is less than 0.01, we reject the null hypothesis. This suggests that the Blue side has more goldat10 than the Red side.

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
README.md		README.md
_config.yml		_config.yml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

league-eda

This is a project for DSC80 at UCSD

Introduction and Question Identification

Introduction :

Question:

Why should you care about the question?

Num of rows

Relevant columns:

Descriptions of each column:

Cleaning and EDA

Data Cleaning

Univariate Analysis

Bivariate Analysis

Interesting Aggregates

Significance:

Assessment of Missingness

NMAR Analysis

Missingness Dependency

Hypothesis Testing

Null hypothesis :

Alternative hypothesis :

Test statistic :

Significance level :

Resulting p-value :

Conclusion :

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Folders and files

Latest commit

History

Repository files navigation

league-eda

This is a project for DSC80 at UCSD

Introduction and Question Identification

Introduction :

Question:

Why should you care about the question?

Num of rows

Relevant columns:

Descriptions of each column:

Cleaning and EDA

Data Cleaning

Univariate Analysis

Bivariate Analysis

Interesting Aggregates

Significance:

Assessment of Missingness

NMAR Analysis

Missingness Dependency

Hypothesis Testing

Null hypothesis :

Alternative hypothesis :

Test statistic :

Significance level :

Resulting p-value :

Conclusion :

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Packages