Wikia
diff --git a/.DS_Store b/.DS_Store (8 KB)
diff --git a/.gitignore b/.gitignore (+4)
diff --git a/.ipynb_checkpoints/requirements-checkpoint.txt b/.ipynb_checkpoints/requirements-checkpoint.txt (+5)
diff --git a/LICENCE b/LICENCE (+21)
diff --git a/README.md b/README.md (+116)
diff --git a/alphab/.DS_Store b/alphab/.DS_Store (6 KB)
diff --git a/alphab/__init__.py b/alphab/__init__.py (+2)
diff --git a/alphab/alphab.py b/alphab/alphab.py (+95)
diff --git a/example/.DS_Store b/example/.DS_Store (6 KB)
diff --git a/example/Chart.png b/example/Chart.png (1.24 KB)
@@ -0,0 +1,4 @@
+# IDE
+.idea/*
+example/.DS_Store
+example/.ipynb_checkpoints
@@ -0,0 +1,5 @@
+matplotlib==3.1.2
+pandas==0.25.3
+numpy==1.17.4
+scipy==1.3.3
+pathlib==1.0.1
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2019 FANDOM
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
@@ -1 +1,117 @@
 # AlphaB
+
+## Python library for rendering charts and computing statistics for A/B testing
+
+### Current state
+
+This library is at a very early stage. Currently, it supports only A/B tests (two groups).
+
+### AlphaB allows you to:
+
+* Automatically generate charts from A/B tests
+* Compute statistics to check whether the difference between groups is statistically significant
+
+
+## Table of contents
+
+* [Requirements](#requirements)
+* [How to use it](#how-to-use-it)
+
+## Requirements
+
+You can directly install all of the requirements for AlphaB by running `pip install -r requirements.txt` from the root of the repository.
+
+* [Matplotlib](https://matplotlib.org/) - a library for generating charts from data sets
+* [Pandas](https://pandas.pydata.org/) - a library providing high-performance, easy-to-use data structures and data analysis tools
+* [NumPy](https://numpy.org/) - a library providing support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions
+* [SciPy](https://www.scipy.org/) - a library for scientific and technical computing
+* [Pathlib](https://docs.python.org/3/library/pathlib.html) - part of the Python standard library since Python 3.4; offers a set of classes to handle filesystem paths
+
+```python
+import pandas as pd
+import numpy as np
+import matplotlib.pyplot as plt
+from scipy import stats
+from pathlib import Path
+```
+
+## How to use it
+
+AlphaB is built to be used in [Jupyter](https://jupyter.org/) notebooks, so it is highly recommended to run your A/B testing analysis there.
+
+Here is an example of how to use AlphaB (the data frame is left empty here; replace it with your own data set):
+
+```python
+#!/usr/bin/env python3
+
+from alphab import BucketTest
+import pandas as pd
+
+
+def main():
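+    # Placeholder: replace with your own data frame containing 'date', 'design' and 'impressions' columns
+    df = pd.DataFrame()
+    bucket_test = BucketTest(
+        df=df,
+        variable='impressions',
+        group='design',
+        x_axis='date',
+        custom_title='Impressions by design',
+        custom_ylabel='#',
+        custom_day_interval=1
+    )
+    bucket_test.render()
+    bucket_test.compute_pvalues()
+
+
+if __name__ == '__main__':
+    main()
+```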
+
+### Arguments
+
+When creating a bucket test, you can specify the following arguments:
+
+* `df` - the data frame to use for the bucket test. It is recommended to aggregate the data before passing it in: e.g., for a bucket test on the `design` column, group the data frame by `design` and `date` first, keeping both as regular columns (e.g. with `as_index=False` in `groupby`), because `BucketTest` selects them as columns
+* `variable` - the column whose values are plotted on the y-axis and tested for statistical significance
+* `group` - the name of the column that identifies the groups (e.g. `design`)
+* `x_axis` (default: `date`) - the column plotted on the x-axis; it must contain dates
+* `custom_title` (default: `{variable} per {group}`) - the title of the chart
+* `custom_ylabel` (default: the value of `variable`) - the label of the y-axis
+* `custom_day_interval` (default: 1) - the interval, in days, between the labeled dates on the x-axis
+
+The `render()` method accepts the following options to customize the chart. Besides displaying the chart, `render()` saves it to an image file named `Chart` in the current working directory.
+
+* `figure_size_x` (default: 12) - the width of the chart (in inches)
+* `figure_size_y` (default: 5) - the height of the chart (in inches)
+* `line_width` (default: 3) - the width of the lines (in points)
+* `title_font_size` (default: 16) - the font size of the chart title (in points)
+* `legend_font_size` (default: 14) - the font size of the legend (in points)
+* `rotation` (default: 30) - the rotation of the x-axis tick labels (in degrees)
+
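+In the `compute_pvalues()` method, you can set the significance level used to reject the null hypothesis with the `alpha` argument (default: `0.01`): a p-value less than or equal to `alpha` is reported as statistically significant. The same threshold decides, based on the Shapiro-Wilk normality tests and the F test for equal variances, whether Student's t-test, Welch's t-test or the Mann-Whitney U test is used.
+Common choices are `0.01`, `0.05` and `0.1`.
+Read more about statistical significance and p-values [here](https://www.statsdirect.com/help/basics/p_values.htm).
+[This research paper](http://www.scielo.br/pdf/bpsr/v7n1/02.pdf) is also a good place to start for those who want to better understand these topics.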
+
+### Screenshots
+
+An example of a generated chart and statistical significance analysis:
+
+<p align="center">
+  <img width="100%" src="example/example_impressions_by_group.png" />
+</p>
+
+## Next steps
+
+* Support more than two groups (A/B/C, A/B/C/D and A/B/C/D/E tests)
+* Render charts and compute p-values for data from more than one data frame
+* Create tests for the `render` and `compute_pvalues` methods
+* Support `x_axis` columns other than dates
+* Allow custom names for saved images (`plt.savefig(Path(""))`)
+
+## How to contribute
+
+You can contribute by looking through the [issues](https://github.com/Wikia/AlphaB/issues) of the repository, forking it, and opening a pull request from your fork. Please write a clear PR description and provide examples of how your new feature works.
+
+## Contributors
+
+* The method for checking statistical significance was heavily inspired by the work of **Paulina Gralak [@Loczi94](https://github.com/Loczi94)**.
+
+Thanks a lot!
+
+Creator and maintainer: Julia Jakubczak [@veliona](https://github.com/veliona)
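The README recommends aggregating the data before passing it as `df`, with the group and date kept as ordinary columns. A minimal pandas sketch of that preparation step, assuming a hypothetical raw event log with one row per impression (the `events` frame, its column names and its values are made up for illustration):

```python
import pandas as pd

# Hypothetical raw event log: one row per impression, tagged with the design the user saw
events = pd.DataFrame({
    "date": pd.to_datetime(["2019-12-01", "2019-12-01", "2019-12-01", "2019-12-02", "2019-12-02"]),
    "design": ["A", "B", "A", "A", "B"],
    "impressions": [1, 1, 1, 1, 1],
})

# Aggregate to one row per (design, date). as_index=False keeps both keys as regular
# columns, because BucketTest selects df[group] and df[x_axis] as columns.
daily = events.groupby(["design", "date"], as_index=False)["impressions"].sum()
print(daily)
```

`daily` could then be passed as `BucketTest(df=daily, variable='impressions', group='design', x_axis='date')`. Calling `.reset_index()` after a default `groupby(...)["impressions"].sum()` would produce the same columns.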
@@ -0,0 +1,2 @@
+# __init__.py
+from .alphab import BucketTest
@@ -0,0 +1,95 @@
+#!/usr/bin/env python3
+
+from pathlib import Path
+
+import matplotlib.dates as mdates
+import matplotlib.pyplot as plt
+import numpy as np
+import pandas as pd
+from scipy.stats import f, mannwhitneyu, shapiro, ttest_ind
+
+
+class BucketTest:
+    """ BucketTest class computes and renders charts and statistics for bucket testing """
+
+    def __init__(self, df: pd.DataFrame, variable: str, group: str, x_axis='date', custom_title='',
+                 custom_day_interval=1, custom_ylabel=''):
+        """ Create a new bucket test with the given attributes """
+        self.df = df
+        self.variable = variable
+        self.x_axis = x_axis
+        self.group = group
+        self.custom_title = custom_title
+        self.custom_day_interval = custom_day_interval
+        self.custom_ylabel = custom_ylabel
+
+    def render(self, figure_size_x=12, figure_size_y=5, line_width=3, title_font_size=16, legend_font_size=14,
+               rotation=30):
+        """ Render a line chart of the variable over the x-axis, with one line per group """
+
+        fig, ax = plt.subplots(figsize=(figure_size_x, figure_size_y))
+        for group_value in self.df[self.group].unique():
+            # Build a new frame indexed by the x-axis column rather than modifying the filtered slice in place
+            df = self.df[self.df[self.group] == group_value].set_index(self.x_axis)
+            ax.plot(df[self.variable], label=group_value, linewidth=line_width)
+
+        # Title customization
+        if self.custom_title != '':
+            plt.title(self.custom_title, fontsize=title_font_size)
+        else:
+            plt.title('{} per {}'.format(self.variable, self.group), fontsize=title_font_size)
+        plt.legend(bbox_to_anchor=(1.3, 0.8), frameon=False, fontsize=legend_font_size)
+
+        # Y-label customization
+        plt.ylabel(self.custom_ylabel or self.variable)
+
+        plt.ylim(0)
+        plt.xticks(rotation=rotation)
+        self._set_locator_and_formatter(ax)
+        # Save before showing: once plt.show() has closed the figure, plt.savefig would write a new, empty one.
+        # bbox_inches='tight' keeps the legend, which sits outside the axes, in the saved file
+        fig.savefig(Path('Chart'), bbox_inches='tight')
+        plt.show()
+
+    def _set_locator_and_formatter(self, ax):
+        """ Put a date tick every `custom_day_interval` days, formatted as YYYY-MM-DD """
+        ax.xaxis.set_major_locator(mdates.DayLocator(interval=self.custom_day_interval))
+        ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m-%d'))
+    def compute_pvalues(self, alpha=0.01):
+        """ Compare the two groups of the bucket test and print the p-values: Shapiro-Wilk (normality),
+        F test (equal variances), then Student's t-test, Welch's t-test or the Mann-Whitney U test """
+        # Unique group labels (this library supports exactly two groups)
+        values_df_group = self.df[self.group].unique()
+        if len(values_df_group) != 2:
+            raise ValueError('Expected exactly 2 groups in column {!r}, found {}'.format(
+                self.group, len(values_df_group)))
+
+        # Create variables for group A and B
+        group_a = self.df[self.df[self.group] == values_df_group[0]][self.variable]
+        group_b = self.df[self.df[self.group] == values_df_group[1]][self.variable]
+
+        # Normality (Shapiro-Wilk test)
+        _, normality_pvalue_a = shapiro(group_a)
+        _, normality_pvalue_b = shapiro(group_b)
+        print('Shapiro group A p-value: ', normality_pvalue_a)
+        print('Shapiro group B p-value: ', normality_pvalue_b)
+
+        # Equality of variances: two-sided F test on the sample variances (ddof=1)
+        F = np.var(group_a, ddof=1) / np.var(group_b, ddof=1)
+        dof_a = len(group_a) - 1
+        dof_b = len(group_b) - 1
+        f_pvalue = 2 * min(f.cdf(F, dof_a, dof_b), f.sf(F, dof_a, dof_b))
+        print('F test p-value: ', f_pvalue)
+
+        if normality_pvalue_a > alpha and normality_pvalue_b > alpha:
+            if f_pvalue > alpha:
+                # Student's t-test (equal variances)
+                ttest_pvalue = ttest_ind(group_a, group_b).pvalue
+                print('T-test p-value: ', ttest_pvalue)
+                print('Statistical significance: ', ttest_pvalue <= alpha)
+            else:
+                # Welch's t-test (unequal variances)
+                welch_pvalue = ttest_ind(group_a, group_b, equal_var=False).pvalue
+                print('Welch p-value: ', welch_pvalue)
+                print('Statistical significance: ', welch_pvalue <= alpha)
+        else:
+            # Mann-Whitney U test; request the two-sided p-value explicitly, since older SciPy
+            # releases (such as the pinned 1.3.3) default to a one-sided p-value
+            mannwhitneyu_pvalue = mannwhitneyu(group_a, group_b, alternative='two-sided').pvalue
+            print('Mann-Whitney U test p-value: ', mannwhitneyu_pvalue)
+            print('Statistical significance: ', mannwhitneyu_pvalue <= alpha)
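The test-selection flow of `compute_pvalues` can also be written as a function that returns its results instead of printing them, which would make tests like those listed under "Next steps" easier to write. The sketch below is hypothetical (`compare_two_groups` is not part of AlphaB); it follows the same order of tests as `alphab.py`, uses sample variances with a two-sided F test, and asks SciPy for a two-sided Mann-Whitney p-value. The two-sided F test matters: using `f.cdf(F, ...)` alone as the p-value gives a one-sided test that only flags the case where group A's variance is the smaller one.

```python
import numpy as np
from scipy.stats import f, mannwhitneyu, shapiro, ttest_ind


def compare_two_groups(group_a, group_b, alpha=0.01):
    """Hypothetical helper: same test-selection flow as BucketTest.compute_pvalues,
    but returning the results as a dict instead of printing them."""
    a = np.asarray(group_a, dtype=float)
    b = np.asarray(group_b, dtype=float)

    # 1. Normality of each group (Shapiro-Wilk needs at least 3 observations per group)
    normal = shapiro(a)[1] > alpha and shapiro(b)[1] > alpha

    # 2. Two-sided F test for equal variances, on sample variances (ddof=1)
    ratio = np.var(a, ddof=1) / np.var(b, ddof=1)
    dof_a, dof_b = len(a) - 1, len(b) - 1
    f_pvalue = 2 * min(f.cdf(ratio, dof_a, dof_b), f.sf(ratio, dof_a, dof_b))

    # 3. Pick the comparison test from the two checks above
    if normal and f_pvalue > alpha:
        test, pvalue = "t-test", ttest_ind(a, b).pvalue
    elif normal:
        test, pvalue = "welch", ttest_ind(a, b, equal_var=False).pvalue
    else:
        test, pvalue = "mann-whitney", mannwhitneyu(a, b, alternative="two-sided").pvalue

    return {
        "test": test,
        "f_pvalue": float(f_pvalue),
        "pvalue": float(pvalue),
        "significant": bool(pvalue <= alpha),
    }
```

For example, `compare_two_groups(group_a, group_b, alpha=0.05)` returns the chosen test, the F test p-value, the comparison p-value and whether it is significant at `alpha`. Swapping the two groups leaves the F test p-value unchanged, which the one-sided version does not guarantee.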