Renaming #54

Merged
merged 8 commits into from
Jan 31, 2025
Changes from 7 commits
8 changes: 4 additions & 4 deletions CONTRIBUTING.md
@@ -47,10 +47,10 @@ If you spot small typos or grammatical errors in documentation, you can fix them

## Get Started!

Ready to contribute? Here's how to set up `salesanalyzer` for local development.
Ready to contribute? Here's how to set up `salesanalyzer_mds` for local development.

1. Download a copy of `salesanalyzer` locally.
2. Install `salesanalyzer` using `poetry`:
1. Download a copy of `salesanalyzer_mds` locally.
2. Install `salesanalyzer_mds` using `poetry`:

```console
$ poetry install
@@ -85,5 +85,5 @@ Before you submit a pull request, check that it meets these guidelines:

## Code of Conduct

Please note that the `salesanalyzer` project is released with a
Please note that the `salesanalyzer_mds` project is released with a
Code of Conduct. By contributing to this project you agree to abide by its terms.
44 changes: 29 additions & 15 deletions README.md
@@ -1,32 +1,34 @@
# salesanalyzer
# salesanalyzer_mds

[![Documentation Status](https://readthedocs.org/projects/salesanalyzer/badge/?version=latest)](https://salesanalyzer.readthedocs.io/en/latest/?badge=latest)
[![ci-cd](https://github.com/UBC-MDS/salesanalyzer/actions/workflows/ci-cd.yml/badge.svg)](https://github.com/UBC-MDS/salesanalyzer/actions/workflows/ci-cd.yml)

A Python package that helps with the analysis of sales data. The package will contain functions to be used as tools for identifying market segments, predicting future sales, and analyzing seasonal revenue trends. <br>

The sales_analyzer package will be an addition to the Python ecosystem as a specialized tool for analyzing retail sales data, targeting small to medium-sized businesses that may not have the resources for an in-house data analytics team and who could benefit from ready-to-use functions for common sales-related tasks. While existing packages such as `Pandas` and `Scikit-learn` provide general tools for data manipulation and machine learning predictions, `salesanalyzer` aims to streamline the process by offering a suite of pre-built, retail-specific analytical functions.
The `salesanalyzer_mds` package will be an addition to the Python ecosystem as a specialized tool for analyzing retail sales data, targeting small to medium-sized businesses that may not have the resources for an in-house data analytics team and that could benefit from ready-to-use functions for common sales-related tasks. While existing packages such as `Pandas` and `Scikit-learn` provide general tools for data manipulation and machine learning predictions, `salesanalyzer_mds` aims to streamline the process by offering a suite of pre-built, retail-specific analytical functions.

## Installation

```bash
$ pip install salesanalyzer
$ pip install salesanalyzer_mds
```

## Functions

- `segment_revenue_share`: Segments products into three categories: cheap, medium, expensive, based on price, and calculates their respective share in total revenue.
- `predict_sales`: Predicts future sales based on the provided historical data and the target.
- `sales_summary_statistics`: Calculates a variety of summary statistics that provide insights into overall sales performance,
customer behavior, and product performance.

## Usage

`salesanalyzer` can be used to extract sales data insights from available data.
`salesanalyzer_mds` can be used to extract sales data insights from available data.
1. Set up imports

```
from salesanalyzer.sales_summary_statistics import sales_summary_statistics
from salesanalyzer.segment_revenue_share import segment_revenue_share
from salesanalyzer.predict_sales import predict_sales
from salesanalyzer_mds.sales_summary_statistics import sales_summary_statistics
from salesanalyzer_mds.segment_revenue_share import segment_revenue_share
from salesanalyzer_mds.predict_sales import predict_sales
import pandas as pd # additional import to handle your sales data
```

@@ -35,10 +37,13 @@ import pandas as pd # additional import to handle your sales data
3. Retrieve the insights:

**Summary statistics**

```
sales_summary_statistics(your_sales_data)
```
The `sales_summary_statistics` returns a pandas DataFrame with:

The `sales_summary_statistics()` function returns a pandas DataFrame with:

- 'total_revenue': The total revenue generated by all sales.
- 'unique_customers': The number of unique customers.
- 'average_order_value': The average value of an order (sum of revenue per invoice).
@@ -47,15 +52,22 @@ The `sales_summary_statistics` returns a pandas DataFrame with:
- 'average_revenue_per_customer': The average revenue generated by each customer.
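
To make the arithmetic behind a few of these metrics concrete, here is a minimal, dependency-free sketch. It is not the package's implementation: the helper name `summarize_sales`, the row layout, and the sample values are all hypothetical, and only the metrics named in the list above are covered.

```python
from collections import defaultdict


def summarize_sales(rows):
    """Hypothetical helper. rows: iterable of (invoice_id, customer_id, unit_price, quantity)."""
    revenue_per_invoice = defaultdict(float)
    revenue_per_customer = defaultdict(float)
    for invoice_id, customer_id, unit_price, quantity in rows:
        revenue = unit_price * quantity
        revenue_per_invoice[invoice_id] += revenue
        revenue_per_customer[customer_id] += revenue

    total_revenue = sum(revenue_per_invoice.values())
    n_invoices = len(revenue_per_invoice)
    n_customers = len(revenue_per_customer)
    return {
        "total_revenue": total_revenue,
        "unique_customers": n_customers,
        # Average of the per-invoice revenue sums; 0.0 when there are no rows.
        "average_order_value": total_revenue / n_invoices if n_invoices else 0.0,
        "average_revenue_per_customer": total_revenue / n_customers if n_customers else 0.0,
    }


# Made-up rows: three invoices placed by two customers.
sample_rows = [
    ("A", "c1", 10, 2),
    ("A", "c1", 5, 4),
    ("B", "c2", 20, 3),
    ("C", "c1", 50, 1),
]
stats = summarize_sales(sample_rows)
print(stats["total_revenue"])        # 150.0
print(stats["average_order_value"])  # 50.0
```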

**Segment revenue share**

```
segment_revenue_share(your_sales_data,
price_col='UnitPrice',
quantity_col='Quantity') # replace column names with your data column names
quantity_col='Quantity',
price_thresholds=None) # replace column names with your data column names
```
The `segment_revenue_share` returns a pandas DataFrame showing the total revenue share for each price segment:
'cheap', 'medium', 'expensive'.

The `segment_revenue_share()` function returns a pandas DataFrame showing the total revenue share for each price segment:
'cheap', 'medium', 'expensive'. Price thresholds can be set by the user or determined automatically.

- Custom price thresholds can be set using the `price_thresholds` parameter.
- If not specified, thresholds are automatically determined based on the data.
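
As an illustration of how a pair of price thresholds splits revenue into the three segments, here is a small plain-Python sketch. It is not the package's code: the helper name `revenue_share_by_segment`, the sample rows, and the choice to treat each threshold as an inclusive upper bound are assumptions made for this example.

```python
def revenue_share_by_segment(rows, price_thresholds):
    """Hypothetical helper. rows: iterable of (unit_price, quantity)."""
    low, high = price_thresholds
    totals = {"cheap": 0.0, "medium": 0.0, "expensive": 0.0}
    for unit_price, quantity in rows:
        # Assumption: a price equal to a threshold falls in the lower segment.
        if unit_price <= low:
            segment = "cheap"
        elif unit_price <= high:
            segment = "medium"
        else:
            segment = "expensive"
        totals[segment] += unit_price * quantity

    grand_total = sum(totals.values())
    # Map each segment to (total revenue, share of overall revenue in percent).
    return {
        segment: (revenue, round(100 * revenue / grand_total, 2) if grand_total else 0.0)
        for segment, revenue in totals.items()
    }


# Made-up (unit_price, quantity) rows.
sample_rows = [(100, 5), (250, 2), (400, 3), (450, 4), (600, 5), (1000, 3)]
shares = revenue_share_by_segment(sample_rows, price_thresholds=(300, 500))
print(shares["cheap"])      # (1000.0, 10.0)
print(shares["expensive"])  # (6000.0, 60.0)
```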

**Predict sales**

```
predict_sales(your_sales_data,
new_data, # new sales data to base the predictions on
@@ -64,19 +76,21 @@ predict_sales(your_sales_data,
target = 'Quantity',
date_feature = 'InvoiceDate')
```
The `predict_sales` returns a DataFrame with prediction values, and a printed out MSE score.

The `predict_sales()` function returns a DataFrame with the predicted values and prints the MSE score.
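
For readers unfamiliar with the MSE (mean squared error) score, the sketch below shows the standard formula in plain Python. The helper name `mse` and the numbers are hypothetical; how `predict_sales()` computes its score internally is not shown in this diff.

```python
def mse(actual, predicted):
    """Mean squared error: the average of the squared differences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Made-up quantities: observed values vs. model predictions.
observed = [10, 12, 8, 15]
forecast = [11, 10, 8, 18]
score = mse(observed, forecast)
print(score)  # 3.5
```

A lower score means the predictions sit closer to the observed values; a score of 0 would mean a perfect match on this data.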

## Developer notes:
### Running The Tests

Run the following command in the terminal from the project's root directory to execute the tests:

```bash
pytest tests/
```

To assess the branch coverage for this package:
```bash
pytest --cov=salesanalyzer --cov-branch
pytest --cov=salesanalyzer_mds --cov-branch
```

## Dependencies
@@ -103,8 +117,8 @@ Interested in contributing? Check out the contributing guidelines. Please note t

## License

`salesanalyzer` was created by Yeji Sohn, Daria Khon, Franklin Aryee. It is licensed under the terms of the MIT license.
`salesanalyzer_mds` was created by Yeji Sohn, Daria Khon, Franklin Aryee. It is licensed under the terms of the MIT license.

## Credits

`salesanalyzer` was created with [`cookiecutter`](https://cookiecutter.readthedocs.io/en/latest/) and the `py-pkgs-cookiecutter` [template](https://github.com/py-pkgs/py-pkgs-cookiecutter).
`salesanalyzer_mds` was created with [`cookiecutter`](https://cookiecutter.readthedocs.io/en/latest/) and the `py-pkgs-cookiecutter` [template](https://github.com/py-pkgs/py-pkgs-cookiecutter).
101 changes: 87 additions & 14 deletions docs/example.ipynb
@@ -4,11 +4,11 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Example usage of salesanalyzer package\n",
"# Example Usage\n",
"\n",
"Welcome to the `sales_analyzer` package! This package is designed to help small-sized businesses analyze their retail sales data efficiently, without needing extensive data analytics expertise. If you've ever felt overwhelmed by tools like Pandas or Scikit-learn, or wished for more retail-specific functions, you're in the right place.\n",
"Welcome to the `salesanalyzer_mds` package! This package is designed to help small-sized businesses analyze their retail sales data efficiently, without needing extensive data analytics expertise. If you've ever felt overwhelmed by tools like Pandas or Scikit-learn, or wished for more retail-specific functions, you're in the right place.\n",
"\n",
"In this notebook, we'll walk through how to use the `salesanalyzer` package to extract valuable insights from your sales data. We’ll demonstrate key functionalities using real-world examples, so you can start improving your business decisions right away!"
"In this notebook, we'll walk through how to use the `salesanalyzer_mds` package to extract valuable insights from your sales data. We’ll demonstrate key functionalities using real-world examples, so you can start improving your business decisions right away!"
]
},
{
@@ -17,7 +17,7 @@
"source": [
"## Imports\n",
"\n",
"Let us begin by setting up all our imports for this demonstration, which includes all 3 `salesanalyzer` functions:\n",
"Let us begin by setting up all our imports for this demonstration, which includes all 3 `salesanalyzer_mds` functions:\n",
"- `sales_summary_statistics`: Calculates a variety of summary statistics that provide insights into overall sales performance, customer behavior, and product performance.\n",
"- `segment_revenue_share`: Segments products into three categories: cheap, medium, expensive, based on price, and calculates their respective share in total revenue.\n",
"- `predict_sales`: Predicts future sales based on the provided historical data and the target.\n",
@@ -45,7 +45,7 @@
"\n",
"Next, let us create some sample data to work with. \n",
"> Note:\n",
"> `salesanalyzer` package is not limited to the sample data columns and can be customized to suit your specific requirements."
"> `salesanalyzer_mds` package is not limited to the sample data columns and can be customized to suit your specific requirements."
]
},
{
@@ -179,7 +179,7 @@
"source": [
"## Get Summary Statistics\n",
"\n",
"One of the key features of `salesanalyzer` is its ability to quickly generate sales summary. Use the `analyze_sales_trends()` function to generate insights like total revenue, average order value, and top selling products.\n",
"One of the key features of `salesanalyzer_mds` is its ability to quickly generate a sales summary. Use the `sales_summary_statistics()` function to generate insights like total revenue, average order value, and top selling products.\n",
"> Use help(sales_summary_statistics) for more information about the function"
]
},
@@ -266,7 +266,7 @@
"source": [
"## Get Revenue Share for each Product Category\n",
"\n",
"Another feature of `saleanalyzer`, the `segment_revenue_share()` function, segments products into three categories (cheap < medium < expensive) — based on their price, and calculates the respective share of total revenue contributed by each segment. This function is particularly useful for analyzing product sales data and understanding revenue distribution across different pricing tiers.\n",
"Another feature of `salesanalyzer_mds`, the `segment_revenue_share()` function, segments products into three categories (cheap < medium < expensive) based on their price and calculates the respective share of total revenue contributed by each segment. By default, the price thresholds are set automatically, but users can define custom thresholds to categorize products according to their specific business needs. This function is particularly useful for analyzing product sales data and understanding revenue distribution across different pricing tiers.\n",
"> Use help(segment_revenue_share) for more information about the function"
]
},
@@ -337,10 +337,83 @@
}
],
"source": [
"# Using default price thresholds\n",
"revenue_share = segment_revenue_share(sample_data, price_col='UnitPrice', quantity_col='Quantity')\n",
"revenue_share"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>PriceSegment</th>\n",
" <th>TotalRevenue</th>\n",
" <th>RevenueShare (%)</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>cheap</td>\n",
" <td>1150</td>\n",
" <td>9.24</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>medium</td>\n",
" <td>3600</td>\n",
" <td>28.92</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>expensive</td>\n",
" <td>7700</td>\n",
" <td>61.85</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" PriceSegment TotalRevenue RevenueShare (%)\n",
"0 cheap 1150 9.24\n",
"1 medium 3600 28.92\n",
"2 expensive 7700 61.85"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Using user-defined price thresholds\n",
"revenue_share = segment_revenue_share(sample_data, price_col='UnitPrice', quantity_col='Quantity', price_thresholds=(300, 500))\n",
"revenue_share"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -356,7 +429,7 @@
},
{
"cell_type": "code",
"execution_count": 5,
"execution_count": 6,
"metadata": {},
"outputs": [
{
@@ -409,7 +482,7 @@
"1 1.33"
]
},
"execution_count": 5,
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
@@ -441,7 +514,7 @@
},
{
"cell_type": "code",
"execution_count": 6,
"execution_count": 7,
"metadata": {},
"outputs": [
{
@@ -494,7 +567,7 @@
"1 1.88"
]
},
"execution_count": 6,
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
@@ -507,13 +580,13 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"#### This is the end of the tutorial, where you have seen how to get sales data insights using our package."
"This is the end of the tutorial, where you have seen how to get sales data insights using our package."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "salesanalyzser",
"display_name": "salesanalyzer",
"language": "python",
"name": "python3"
},
@@ -527,7 +600,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.2"
"version": "3.11.9"
}
},
"nbformat": 4,
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -1,7 +1,7 @@
[tool.poetry]
name = "salesanalyzer_mds"
version = "2.0.1"
description = "A package for doing great things!"
description = "A Python package for sales forecasting, statistical analysis, and data-driven insights"
authors = ["Daria Khon, Franklin Aryee, Yeji Sohn"]
license = "MIT"
readme = "README.md"
Binary file not shown.