Commit 0feb8af

Merge pull request #1 from Wikia/mvp-library
Add MVP version of the library
2 parents ea0ba9c + e60407b commit 0feb8af

16 files changed, +3376 -0 lines changed

.DS_Store

8 KB
Binary file not shown.

.gitignore

+4
@@ -0,0 +1,4 @@
# IDE
.idea/*
example/.DS_Store
example/.ipynb_checkpoints

requirements.txt

@@ -0,0 +1,5 @@
matplotlib==3.1.2
pandas==0.25.3
numpy==1.17.4
scipy==1.3.3
pathlib==1.0.1

LICENCE

+21
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2019 FANDOM

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

README.md

+116
@@ -1 +1,117 @@
# AlphaB

## Python library for rendering charts and computing statistics for A/B testing

### Current state

This library is at a very early stage. Currently, it supports only A/B tests (two groups).

### AlphaB allows you to:

* Automatically generate charts from A/B tests
* Compute statistics to check whether the difference between groups is statistically significant


## Table of contents

* [Requirements](#requirements)
* [How to use it](#how-to-use-it)

## Requirements

You can install all of the requirements for AlphaB by running `pip install -r requirements.txt` from the root of the repository.

* [Matplotlib](https://matplotlib.org/) - a library to generate charts from data sets
* [Pandas](https://pandas.pydata.org/) - a library providing high-performance, easy-to-use data structures and data analysis tools
* [Numpy](https://numpy.org/) - a library providing support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions
* [Scipy](https://www.scipy.org/) - a library used for scientific and technical computing
* [Pathlib](https://docs.python.org/3/library/pathlib.html) - offers a set of classes to handle filesystem paths

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
from pathlib import Path
```

## How to use it

It is highly recommended to use [Jupyter](https://jupyter.org/) to perform A/B testing analysis in Python, and AlphaB is built to be used in Jupyter Notebooks.

Here is an example of using AlphaB (the empty data frame is a placeholder; this example doesn't include specifying a data set for now):

```python
#!/usr/bin/env python3

from alphab import BucketTest
import pandas as pd


def main():
    df = pd.DataFrame()
    bucket_test = BucketTest(
        df=df,
        variable='impressions',
        group='design',
        x_axis='date',
        custom_title='Impressions by design',
        custom_ylabel='#',
        custom_day_interval=1
    )
    bucket_test.render()
    bucket_test.compute_pvalues()
```

### Arguments

When creating a bucket test, you can specify the following arguments:

* `df` - data frame to be used for the bucket test. It is recommended to group the data frame before passing it (e.g., when doing a bucket test on the group `design`, you should group the data frame by design and date first)
* `variable` - the name of the column whose values are plotted on the y-axis and used for the statistical significance check
* `group` - the name of the column which the data frame is grouped by
* `x_axis` (default: `date`) - the name of the column whose values are plotted on the x-axis
* `custom_title` (default: `{variable} per {group}`) - specifies the title for the chart
* `custom_ylabel` (default: `{variable}`) - specifies the y-axis label for the chart
* `custom_day_interval` (default: 1) - specifies the interval, in days, between the date ticks on the x-axis
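
The recommendation to group the data frame before passing it could look like the following minimal sketch. It assumes raw rows with the columns `date`, `design`, and `impressions` (the names used in the example above); the sample values are made up for illustration, and summing is an assumed aggregation that may not suit every metric.

```python
import pandas as pd

# Hypothetical raw data: several rows per (design, date) pair.
raw = pd.DataFrame({
    'date': pd.to_datetime(['2019-12-01', '2019-12-01', '2019-12-01',
                            '2019-12-02', '2019-12-02', '2019-12-02']),
    'design': ['A', 'A', 'B', 'A', 'B', 'B'],
    'impressions': [10, 5, 12, 7, 3, 9],
})

# Aggregate to one row per design and date. as_index=False keeps
# 'design' and 'date' as regular columns instead of moving them to the index.
grouped = raw.groupby(['design', 'date'], as_index=False)['impressions'].sum()
```

The resulting `grouped` frame has one row per design and day and could then be passed as `df`, with `variable='impressions'` and `group='design'`.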

For the `render()` method, these options can be specified to customize your chart:

* `figure_size_x` (default: 12) - the width of the chart (in inches)
* `figure_size_y` (default: 5) - the height of the chart (in inches)
* `line_width` (default: 3) - the line width in a line chart (in points)
* `title_font_size` (default: 16) - the font size of the title in the figure
* `legend_font_size` (default: 14) - the font size of the legend in the figure
* `rotation` (default: 30) - the rotation of the x ticks (in degrees)

In the `compute_pvalues()` method, you can customize the significance level used to reject the null hypothesis by adjusting the `alpha` value (default: `0.01`).
Recommended values are: `0.01`, `0.05`, `0.1`.
Read more about statistical significance and p-values [here](https://www.statsdirect.com/help/basics/p_values.htm).
[This research paper](http://www.scielo.br/pdf/bpsr/v7n1/02.pdf) is also a good place to start for those who want to better understand these topics.
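
How `alpha` enters the analysis can be sketched without any dependencies. The helpers `choose_test` and `is_significant` below are hypothetical and not part of AlphaB; they only mirror the branching of `compute_pvalues()` as added in this commit, where the same `alpha` is used for the normality check, the variance check, and the final significance decision.

```python
def choose_test(normality_pvalue_a, normality_pvalue_b, f_pvalue, alpha=0.01):
    """Return the name of the test that the branching would select."""
    # Both groups look normal (Shapiro-Wilk p-value above alpha)?
    if normality_pvalue_a > alpha and normality_pvalue_b > alpha:
        # Variances look equal (F-test p-value above alpha)?
        if f_pvalue > alpha:
            return 't-test'
        return "Welch's t-test"
    # At least one group fails the normality check.
    return 'Mann-Whitney U test'


def is_significant(pvalue, alpha=0.01):
    """A result counts as significant when its p-value does not exceed alpha."""
    return pvalue <= alpha
```

For instance, `choose_test(0.40, 0.002, 0.30)` selects the Mann-Whitney U test, because one group falls below the default `alpha` in the normality check.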

### Screenshots

A generated chart and statistical significance analysis example:

<p align="center">
<img width="100%" src="example/example_impressions_by_group.png" />
</p>

## Next steps

* Support a configurable number of groups (A/B/C tests, A/B/C/D tests, A/B/C/D/E tests)
* Render charts and compute p-values for data from more than one data frame
* Create tests for the `render` and `compute_pvalues` methods
* Handle `x_axis` values other than dates
* Customize names of images `plt.savefig(Path(""))`

## How to contribute

You can contribute by forking this repository, looking through the [issues](https://github.com/Wikia/AlphaB/issues) of the repository, and opening a PR from your fork. Please make sure to write a clear PR description and to provide examples of how your new feature works.

## Contributors

* The method for checking statistical significance was highly inspired by the work of **Paulina Gralak [@Loczi94](https://github.com/Loczi94)**.

Thanks a lot!

The creator and maintainer: Julia Jakubczak [@veliona](https://github.com/veliona)

alphab/.DS_Store

6 KB
Binary file not shown.

alphab/__init__.py

+2
@@ -0,0 +1,2 @@
# __init__.py
from .alphab import BucketTest

alphab/alphab.py

+95
@@ -0,0 +1,95 @@
#!/usr/bin/env python3

from pathlib import Path

import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from scipy.stats import f, mannwhitneyu, shapiro, ttest_ind


class BucketTest:
    """ BucketTest class computes and renders charts and statistics for bucket testing """

    def __init__(self, df: pd.DataFrame, variable: str, group: str, x_axis='date', custom_title='',
                 custom_day_interval=1, custom_ylabel=''):
        """ Create a new bucket test with the given attributes """
        self.df = df
        self.variable = variable
        self.x_axis = x_axis
        self.group = group
        self.custom_title = custom_title
        self.custom_day_interval = custom_day_interval
        self.custom_ylabel = custom_ylabel

    def render(self, figure_size_x=12, figure_size_y=5, line_width=3, title_font_size=16, legend_font_size=14,
               rotation=30):
        """ Render renders the charts representing the bucket test """

        fig, ax = plt.subplots(figsize=(figure_size_x, figure_size_y))
        for group_value in self.df[self.group].unique():
            df = self.df[self.df[self.group] == group_value]
            df.set_index(self.x_axis, drop=False, inplace=True)
            ax.plot(df[self.variable], label=group_value, linewidth=line_width)

        # Title customization
        if self.custom_title != '':
            plt.title(self.custom_title, fontsize=title_font_size)
        else:
            plt.title('{} per {}'.format(self.variable, self.group), fontsize=title_font_size)
        plt.legend(bbox_to_anchor=(1.3, 0.8), frameon=False, fontsize=legend_font_size)

        # Y-label customization
        plt.ylabel(self.custom_ylabel or self.variable)

        plt.ylim(0)
        plt.xticks(rotation=rotation)
        self.__set_locator_and_formatter__(ax)
        plt.show()
        plt.savefig(Path('Chart'))

    def __set_locator_and_formatter__(self, ax):
        # Major locator customization
        ax.xaxis.set_major_locator(mdates.DayLocator(interval=self.custom_day_interval))
        ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m-%d'))

    def compute_pvalues(self, alpha=0.01):
        """ ComputePValues computes all pvalues, variance etc. for each combination of categories within
        the bucket test and renders a table containing the results """
        # Create a list with unique values from a data frame
        values_df_group = self.df[self.group].unique()

        # Create variables for group A and B
        group_a = self.df[self.df[self.group] == values_df_group[0]][self.variable]
        group_b = self.df[self.df[self.group] == values_df_group[1]][self.variable]

        # normality
        normality_group_a, normality_pvalue_a = shapiro(group_a)
        normality_group_b, normality_pvalue_b = shapiro(group_b)
        print('Shapiro group A p-value: ', normality_pvalue_a)
        print('Shapiro group B p-value: ', normality_pvalue_b)

        # variance
        F = np.var(group_a) / np.var(group_b)
        critical_value_group_a = len(group_a) - 1
        critical_value_group_b = len(group_b) - 1
        f_pvalue = f.cdf(F, critical_value_group_a, critical_value_group_b)
        print('F test p-value: ', f_pvalue)

        if normality_pvalue_a > alpha and normality_pvalue_b > alpha:
            if f_pvalue > alpha:
                # T-test
                ttest_pvalue = ttest_ind(group_a, group_b).pvalue
                print('T-test p-value: ', ttest_pvalue)
                print('Statistical significance: ', ttest_pvalue <= alpha)
            else:
                # Welch's test
                welch_pvalue = ttest_ind(group_a, group_b, equal_var=False).pvalue
                print('Welch p-value: ', welch_pvalue)
                print('Statistical significance: ', welch_pvalue <= alpha)
        else:
            # Mann-Whitney U test
            mannwhitneyu_pvalue = mannwhitneyu(group_a, group_b).pvalue
            print('Mann-Whitney U test: ', mannwhitneyu_pvalue)
            print('Statistical significance: ', mannwhitneyu_pvalue <= alpha)
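
A note on the variance check in `compute_pvalues()`: `f.cdf(F, ...)` is the lower-tail probability only, so a variance ratio far above 1 yields a value close to 1 and would be read as "equal variances". `np.var` also defaults to the population variance (`ddof=0`). A two-sided variant using sample variances might look like the sketch below; `f_test_pvalue` is a hypothetical helper, not part of this commit, and the doubling of the smaller tail is one common convention rather than the only one.

```python
import numpy as np
from scipy.stats import f


def f_test_pvalue(group_a, group_b):
    """Two-sided p-value for an F-test of equal variances (illustrative sketch)."""
    a = np.asarray(group_a, dtype=float)
    b = np.asarray(group_b, dtype=float)
    # Ratio of sample variances (ddof=1), not population variances.
    ratio = np.var(a, ddof=1) / np.var(b, ddof=1)
    dfn, dfd = len(a) - 1, len(b) - 1
    lower = f.cdf(ratio, dfn, dfd)  # P(F <= ratio)
    upper = f.sf(ratio, dfn, dfd)   # P(F > ratio)
    # Double the smaller tail and cap at 1.
    return min(1.0, 2.0 * min(lower, upper))
```

With this form the result does not depend on which group is placed in the numerator, which the one-sided `f.cdf` value does.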

example/.DS_Store

6 KB
Binary file not shown.

example/Chart.png

1.24 KB