From 72e50f7f998b55be6531fb83c2757bd80a8fbfff Mon Sep 17 00:00:00 2001 From: Franklin Aryee Date: Thu, 30 Jan 2025 16:21:49 -0800 Subject: [PATCH 1/8] Changed all instances of salesanalyzer to salesanalyzer_mds --- CONTRIBUTING.md | 8 ++++---- README.md | 20 ++++++++++---------- docs/example.ipynb | 19 ++++++++++++------- 3 files changed, 26 insertions(+), 21 deletions(-) diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 394a988..b07fa0d 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -47,10 +47,10 @@ If you spot small typos or grammatical errors in documentation, you can fix them ## Get Started! -Ready to contribute? Here's how to set up `salesanalyzer` for local development. +Ready to contribute? Here's how to set up `salesanalyzer_mds` for local development. -1. Download a copy of `salesanalyzer` locally. -2. Install `salesanalyzer` using `poetry`: +1. Download a copy of `salesanalyzer_mds` locally. +2. Install `salesanalyzer_mds` using `poetry`: ```console $ poetry install @@ -85,5 +85,5 @@ Before you submit a pull request, check that it meets these guidelines: ## Code of Conduct -Please note that the `salesanalyzer` project is released with a +Please note that the `salesanalyzer_mds` project is released with a Code of Conduct. By contributing to this project you agree to abide by its terms. diff --git a/README.md b/README.md index 4965e9c..7b44ef8 100644 --- a/README.md +++ b/README.md @@ -1,15 +1,15 @@ -# salesanalyzer +# salesanalyzer_mds [![Documentation Status](https://readthedocs.org/projects/salesanalyzer/badge/?version=latest)](https://salesanalyzer.readthedocs.io/en/latest/?badge=latest) A python package that helps with the analysis on a sales data. The packagage will contain functions to be used as tools for identifying market segment, predicting future sales and analyzing seasonal revenue trends.
-The sales_analyzer package will be an addition to the Python ecosystem as a specialized tool for analyzing retail sales data, targeting small to medium-sized businesses that may not have the resources for an in-house data analytics team and who could benefit from ready-to-use functions for common sales-related tasks. While existing packages such as `Pandas` and `Scikit-learn` provide general tools for data manipulation and machine learning predictions, `salesanalyzer` aims to streamline the process by offering a suite of pre-built, retail-specific analytical functions. +The `salesanalyzer_mds` package will be an addition to the Python ecosystem as a specialized tool for analyzing retail sales data, targeting small to medium-sized businesses that may not have the resources for an in-house data analytics team and who could benefit from ready-to-use functions for common sales-related tasks. While existing packages such as `Pandas` and `Scikit-learn` provide general tools for data manipulation and machine learning predictions, `salesanalyzer_mds` aims to streamline the process by offering a suite of pre-built, retail-specific analytical functions. ## Installation ```bash -$ pip install salesanalyzer +$ pip install salesanalyzer_mds ``` ## Functions @@ -20,13 +20,13 @@ $ pip install salesanalyzer ## Usage -`salesanalyzer` can be used to extract sales data insights from available data. +`salesanalyzer_mds` can be used to extract sales data insights from available data. 1. 
Set up imports ``` -from salesanalyzer.sales_summary_statistics import sales_summary_statistics -from salesanalyzer.segment_revenue_share import segment_revenue_share -from salesanalyzer.predict_sales import predict_sales +from salesanalyzer_mds.sales_summary_statistics import sales_summary_statistics +from salesanalyzer_mds.segment_revenue_share import segment_revenue_share +from salesanalyzer_mds.predict_sales import predict_sales import pandas as pd # additional import to handle your sales data ``` @@ -76,7 +76,7 @@ pytest tests/ To assess the branch coverage for this package: ```bash -pytest --cov=salesanalyzer --cov-branch +pytest --cov=salesanalyzer_mds --cov-branch ``` ## Dependencies @@ -103,8 +103,8 @@ Interested in contributing? Check out the contributing guidelines. Please note t ## License -`salesanalyzer` was created by Yeji Sohn, Daria Khon, Franklin Aryee. It is licensed under the terms of the MIT license. +`salesanalyzer_mds` was created by Yeji Sohn, Daria Khon, Franklin Aryee. It is licensed under the terms of the MIT license. ## Credits -`salesanalyzer` was created with [`cookiecutter`](https://cookiecutter.readthedocs.io/en/latest/) and the `py-pkgs-cookiecutter` [template](https://github.com/py-pkgs/py-pkgs-cookiecutter). +`salesanalyzer_mds` was created with [`cookiecutter`](https://cookiecutter.readthedocs.io/en/latest/) and the `py-pkgs-cookiecutter` [template](https://github.com/py-pkgs/py-pkgs-cookiecutter). diff --git a/docs/example.ipynb b/docs/example.ipynb index af06b4e..bda0d37 100644 --- a/docs/example.ipynb +++ b/docs/example.ipynb @@ -4,11 +4,11 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Example usage of salesanalyzer package\n", + "# Example Usage\n", "\n", - "Welcome to the `sales_analyzer` package! This package is designed to help small-sized businesses analyze their retail sales data efficiently, without needing extensive data analytics expertise. 
If you've ever felt overwhelmed by tools like Pandas or Scikit-learn, or wished for more retail-specific functions, you're in the right place.\n", + "Welcome to the `salesanalyzer_mds` package! This package is designed to help small-sized businesses analyze their retail sales data efficiently, without needing extensive data analytics expertise. If you've ever felt overwhelmed by tools like Pandas or Scikit-learn, or wished for more retail-specific functions, you're in the right place.\n", "\n", - "In this notebook, we'll walk through how to use the `salesanalyzer` package to extract valuable insights from your sales data. We’ll demonstrate key functionalities using real-world examples, so you can start improving your business decisions right away!" + "In this notebook, we'll walk through how to use the `salesanalyzer_mds` package to extract valuable insights from your sales data. We’ll demonstrate key functionalities using real-world examples, so you can start improving your business decisions right away!" ] }, { @@ -17,7 +17,7 @@ "source": [ "## Imports\n", "\n", - "Let us begin by setting up all our imports for this demonstration, which includes all 3 `salesanalyzer` functions:\n", + "Let us begin by setting up all our imports for this demonstration, which includes all 3 `salesanalyzer_mds` functions:\n", "- `sales_summary_statistics`: Calculates a variety of summary statistics that provide insights into overall sales performance, customer behavior, and product performance.\n", "- `segment_revenue_share`: Segments products into three categories: cheap, medium, expensive, based on price, and calculates their respective share in total revenue.\n", "- `predict_sales`: Predicts future sales based on the provided historical data and the target.\n", @@ -45,7 +45,7 @@ "\n", "Next, let us create a sample data to work with. \n", "> Note:\n", - "> `salesanalyzer` package is not limited to the sample data columns and can be customized to suit your specific requirements." 
+ "> `salesanalyzer_mds` package is not limited to the sample data columns and can be customized to suit your specific requirements." ] }, { @@ -179,7 +179,7 @@ "source": [ "## Get Summary Statistics\n", "\n", - "One of the key features of `salesanalyzer` is its ability to quickly generate sales summary. Use the `analyze_sales_trends()` function to generate insights like total revenue, average order value, and top selling products.\n", + "One of the key features of `salesanalyzer_mds` is its ability to quickly generate a sales summary. Use the `sales_summary_statistics()` function to generate insights like total revenue, average order value, and top-selling products.\n", "> Use help(sales_summary_statistics) for more information about the function" ] }, @@ -507,8 +507,13 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "#### This is the end of the tutorial, where you have seen how to get sales data insights using our package." + "This is the end of the tutorial, where you have seen how to get sales data insights using our package." ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [] } ], "metadata": { From ee324adfcaedbea3584df5f4e3f2739c0a4a59f4 Mon Sep 17 00:00:00 2001 From: Franklin Aryee Date: Thu, 30 Jan 2025 16:25:03 -0800 Subject: [PATCH 2/8] Updated description --- pyproject.toml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/pyproject.toml b/pyproject.toml index 4ef8eff..0135d15 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -1,7 +1,7 @@ [tool.poetry] name = "salesanalyzer_mds" version = "2.0.1" -description = "A package for doing great things!" 
+description = "A Python package for sales forecasting, statistical analysis, and data-driven insights" authors = ["Daria Khon, Franklin Aryee, Yeji Sohn"] license = "MIT" readme = "README.md" From ff46ad63b2c48245b55eafef28bdd96a528fcca1 Mon Sep 17 00:00:00 2001 From: Franklin Aryee Date: Thu, 30 Jan 2025 16:33:08 -0800 Subject: [PATCH 3/8] Added ci-cd badge --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 7b44ef8..291fb34 100644 --- a/README.md +++ b/README.md @@ -1,6 +1,7 @@ # salesanalyzer_mds [![Documentation Status](https://readthedocs.org/projects/salesanalyzer/badge/?version=latest)](https://salesanalyzer.readthedocs.io/en/latest/?badge=latest) +[![ci-cd](https://github.com/UBC-MDS/salesanalyzer/actions/workflows/ci-cd.yml/badge.svg)](https://github.com/UBC-MDS/salesanalyzer/actions/workflows/ci-cd.yml) A python package that helps with the analysis on a sales data. The packagage will contain functions to be used as tools for identifying market segment, predicting future sales and analyzing seasonal revenue trends.
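Reviewer note on [PATCH 4/8] below, which adds an optional `price_thresholds` argument to `segment_revenue_share`: the patch also drops the docstring example, so the following is a minimal sketch of the intended behavior. It is a pure-Python mirror of the thresholding logic, not the pandas implementation in the package; the helper name `segment_revenue_share_sketch` and the use of plain lists are assumptions made for illustration. The input values and thresholds are the ones used by `test_custom_thresholds` in [PATCH 5/8].

```python
# Illustrative sketch only: mirrors the threshold-based segmentation in plain
# Python. The real function operates on a pandas DataFrame; this helper and
# its name are hypothetical.

def segment_revenue_share_sketch(prices, quantities, price_thresholds):
    """Return {segment: (total_revenue, revenue_share_percent)}."""
    cheap_threshold, expensive_threshold = price_thresholds

    # Start every segment at zero so empty segments still appear in the result.
    totals = {"cheap": 0.0, "medium": 0.0, "expensive": 0.0}

    for price, quantity in zip(prices, quantities):
        # Boundaries are inclusive on the lower segment, as in the patch.
        if price <= cheap_threshold:
            segment = "cheap"
        elif price <= expensive_threshold:
            segment = "medium"
        else:
            segment = "expensive"
        totals[segment] += price * quantity

    grand_total = sum(totals.values())

    # Guard against division by zero when nothing was sold.
    return {
        segment: (
            round(revenue, 2),
            round(revenue / grand_total * 100, 2) if grand_total > 0 else 0.0,
        )
        for segment, revenue in totals.items()
    }


result = segment_revenue_share_sketch(
    prices=[1.50, 2.50, 3.50, 4.50, 5.50],
    quantities=[10, 20, 30, 40, 50],
    price_thresholds=(3.0, 4.0),
)
print(result)
```

With thresholds `(3.0, 4.0)`, prices up to 3.0 count as cheap, prices above 3.0 and up to 4.0 as medium, and everything above 4.0 as expensive. Note that the patched function does not appear to validate the tuple (length or ordering), so callers may want to check that themselves.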
From c216537942ad3c7dbdc3d5574c38eb91c4ce01e1 Mon Sep 17 00:00:00 2001 From: Franklin Aryee Date: Thu, 30 Jan 2025 18:12:51 -0800 Subject: [PATCH 4/8] feat: Updated segment function to include user-defined thresholds --- .../__pycache__/__init__.cpython-311.pyc | Bin 0 -> 375 bytes .../__pycache__/predict_sales.cpython-311.pyc | Bin 0 -> 5852 bytes .../sales_summary_statistics.cpython-311.pyc | Bin 0 -> 5684 bytes .../segment_revenue_share.cpython-311.pyc | Bin 0 -> 4805 bytes .../segment_revenue_share.py | 98 +++++++++--------- 5 files changed, 47 insertions(+), 51 deletions(-) create mode 100644 src/salesanalyzer_mds/__pycache__/__init__.cpython-311.pyc create mode 100644 src/salesanalyzer_mds/__pycache__/predict_sales.cpython-311.pyc create mode 100644 src/salesanalyzer_mds/__pycache__/sales_summary_statistics.cpython-311.pyc create mode 100644 src/salesanalyzer_mds/__pycache__/segment_revenue_share.cpython-311.pyc diff --git a/src/salesanalyzer_mds/__pycache__/__init__.cpython-311.pyc b/src/salesanalyzer_mds/__pycache__/__init__.cpython-311.pyc new file mode 100644 index 0000000000000000000000000000000000000000..c21efb05acf76fd4698c0acf8c9fad0059cf9d62 GIT binary patch literal 375 zcmZ3^%ge<81lNVh1hVlKcYHilXMFrhhR-0=ezoa`76VPwPb*5y%g)KnODw8P zP1X0wOezAh^nLS_GILTDT=J7kb5rw5iuIH8bL$cd3UX5GoOAL^Q-Bi1`o1o~`c66d z$=SLl`YyrFp7EweCi*BY(=RScMil|MRX;vHGcU6wK3=b&@)w6qZhlH>PO4oIDA+(= pFV+SUAD9^#86PmHT|h-Q_(VI>FNkSgf<+ua^#E)5ZrA_- literal 0 HcmV?d00001 diff --git a/src/salesanalyzer_mds/__pycache__/predict_sales.cpython-311.pyc b/src/salesanalyzer_mds/__pycache__/predict_sales.cpython-311.pyc new file mode 100644 index 0000000000000000000000000000000000000000..37b9e671e18e26e741a2b10ad7ee597869ae83b7 GIT binary patch literal 5852 zcmbstU2oggmDHChiIQyD&WEkUjFZ+@BTN2>)7Ep`IQ~fU;k0WOCvfX(j7Zx|C{iP- zrgo`9C<-j=0@QnI2DpGXw3&yrKpysxpRh+VBM^arfdE4=^vy-Omp<)WQWPa6@zR#K zzB>1wd(YQB=iGDsw@4&}!S}FzV_9m&uz#b0^LeVkllKAm0%I@+=dlGGem%S=?_KcX 
zXwJ*~^8N+C1^f6wzGb1sg8e*^4=w~PIKYSU;f1gTxA0^>vJe3{!~{RZ7orS)5le=@ zL(DEDakG_^vqCY?N?b0>D60h)%_$Pi30Z}em8@LgIR)nS=IQHfHYf7MJYXbRke5UW zwgD>gmcYI#Dl0Nt&PpWqD? z@t;A_vnxf9LOAkb*shX!cctTfVLHBD>G%}O?`)ZNmB-&qNA*l&i!Ya2l!&X5im5I8 zYs1DJVEI@Lg)xCf>?ioGJ{8-nvyQv=4uE%S!xETvUkhU8vzdQB!`sr30GAoA9k_>t?6B!iChZ8fnx7?B2S$$6et< zPs!J&Wa_X{wb>@|=Td5xzg-z}DKYKJh&xi_Oh==HjU&7i@<$}B~L*{U0;8=SI2RmoJ@K*3^D&(R9IY+EktZ1|ZRaw}l!99X)UheAustTIN) zic}N(0+<+yjZV?VS2eK30&}vCJkQ?v|MW!^6b_i6lc7+BW>dC~lwG2WJVrrN#VfZKw6uYNL^)-kr@hVISOqBmw1Xg^#RQnS&EaX!Sslw zx7EGF!YK+(&S+ukz0dYlBSm^(#PXP9PW7NBSJY7)4LSKsN%ZmaB zLD#nkFtE;9_@TkU6g50Jh$f+46%xqI$#Xf?F|vtMW7NptU_FUy z`R%wZkb1W&$YPDpv;5myXes2btNP%HI;m?kss_(mk>2knj1=W)0Z@}LDe76AQ&nHL z?SPIKol`Xo^6(7c$Ecz7;Bd;~jqvEJCPFw_`@Cz&K|l^48%-Y@w5HaET-m6CllLfj zMJ)0R@>y^Z@DxZ5(@vIK%Qz~$=!Qd1BB`0Zo<2%VYcJKQ?q;EpUmW(kjbl&U05K zfHHI9RgPz=X)y;rE+}#)C+0t<3y?`Zp5nzK0|+@YH+?a4iWhU&2aaW?FHW7wjt(Eo zSnfy*G#m^oWn?MWIFrpYa;71;(uLI@L*vMafJb3s?NB3j92n5NJdt*nTn7zg3c50; zC8q3luk*`Go1tPO&=1% z^n%k!=mBH~e*-s&jS4VuCkV6|Rz#F6ty{+QFiMjfnF01gg%ubH6}l1>HJF6X(Tpr{ ziu<1c@G|G00up+8n_fVhEmgfG^x!l7R$`a-0-WqLLvE0pgcH4HYjeby9d%LYT&1!Z zu!GJdk>t*_gkGLzz`jyV{ECUMn*J+UAa1r?L9%j!82|uoI*-JGLXt$P>f0GtnuIy@@|1rKM^VQ9 z`!RO}kP#k2L{8a6PU_J~ zBRZ)OlZesMmpMJ$YlM5zisKfIV|sMdh>mK+=%YAHd;FJiyT2LOj`eJYwvHRrupS#R zVk59Acp2Z0cWa4DdVJo9&uirTju%bt1h9DGVO#HmwqCvMh|zYWeD+aG`}bJLKeip) ze>?T9SC6HPSPGGt1QH4Dz#Dq}q!B-1*OFJ&a((7R3=s7)DNrvcshhggN!_mJp$K=f$TYj)F#*A3!zjd=YL5!c!eYcG%K z?PCV*x1$=(NK@8+mZCJi!Ko_S0j z+LASLL?=fKa-=*{@dbZ>=94owm)CFTz8=HZqxpIsMdF_?ezth~xE|>>BE99=?NH+O z{N}tKN*SS4dFoND z!q9BB2Wh&%rBMwn+u3lhe4U4DR7hLVo|Rdi&7m6E%5`SU!J;HC!p(w`pV{fmy7>aG zdOIh!IyhF4+5A(~kTT4)G@IMASkwxqCCSI9f@jFSoDvAHv* zB=n2ZCQ#nQbnYF+P72`X~n+mx%1%}n`BG0 zJNDeUALsn;J!gLR+G;&n(sNYU_fMjDQE`8iV?>#kmzBoL zA}@c4`6Q<_P39yyljiVrTxpyZZ)BkIVn*@AIVp+xtjuRbrAcB0PD-$h%qY#*b&y0d zBPfko0-|P{lRA)E^6(mzTJW+am|Ee)oSegA(rH)(PZ!)axL@A2Q`GOk2QqE8u2XW= zdWteVaUW_QS6d{nOi=WTrSAj6<0{G*be_&P5;)7-YIUs~>$s*`g8ee_gMOK| 
z@>H$-9JNm6=`KT%lp4&^ii3h0);SLC$To|9qbX_T>OAPK z>r-XEDXrydb@n>0)y8#d!{S*JJ@2Ts!0M%qw;4&ZiDl+$N4whew!PQ*{{PjxXV>2C zwYKlM_ZHJ)GgoJO^RB(y{jWW;8~h^&cB3I>ecorawSM5DiGliJ-G zyda%abN%E}bHqe*1wegS0A!?S9PBxVnKTzZiO!&E2_-Y4%+064 zs5xH4!$cAFNiqiGKZZey;0KgRAu?cXm{vcKT0;{_8G(&f9a|mz-q3(8vSzr^tIh)x z*=pn7zzt0_9XH`DPTCuqu8723dY;23V4aiq2C7?&EFe0AFR5D#D~YWCa1KqTb+Hj*Y;~B)cC8sD`+7R$ewIk8T5DR4OMX z$?gkyl+Na4PExsLP6ji^=<^kY)QFPMr+7#sW=)hq3woqtNwi=DK^vm>PWn+ui$MwF z{YyHA_T~uD>Vs5OQ;#nRHB2ScFr_OSkw>i~7+|#)%=#c)k_saxk#MAwC0qrDP?yj{ zT|$5931cehJs4b2LkxLui~PbO2T{h5vrIWKawo)EpX~}*$0Qf45lLsj)J-Ox6*x;MHds6{F@e|=I)$>V8G`$Ms|FO(My}9D`X(K^|MlQN z1dT+{Py`K*T#eMqT{g}EnjlCXAAn?Vgd_t4_2QG&vw|uQ(jW%V^>Wk3If96VrY~L@ z7#Ivikd`}M%?(v^!=ZYJxOrqyK|a)$Idq2T7=^ReLQTtL=xWu{)o?YdFVG3qf!dm8 zRGIUsk4>2x5T8YvFqJIqpr^Ie%rvA=i|R|4kg?uc=Ytj^2DArBhfa(PL@i~#Wyq9J zb0>Z|79D|{wm?H>CjwWgzEh*YIu-7X4h?5cs?CD3EB+{&;)g%nQnWT(nmtQxH3$Uvs z>%qiIWHp337a;SUoBwA^J<%}F#71t$gCmVJuEj80a@zpOYN}ItV%W|P~ z??hdL*GgcgR-M>Nw}Gcu)P!KgafR%f@jEceuuo~?!4&YT2p>NhvaHfr-MK1Fs^ApH z8LZfsGQ6nR1Wr^OyvTAl6(?+5m$RJWNC_E6R$QWHmSTgKrMMPwCYPOGR-8P0GtJyo z>}ie>75ZZZzIBkzVR)a)F-#86n4AJJIl5wU(!}JzqxcNxlC34#<|^K_=60h32Z-Q+?Y$Q%dSYcy3^?VuNdM5?tOkrziocPH;m-J5!P?9s7@eNXy`4uU}jj&2@^lnz8H z?l$Nt&<5K%7OQu zPnQGdHUk$*feXdJOgS)9uyerjLaADeIgs^~jfCjZc;FIw_NpVQCBp5+U^ zXwerh`{HZUTOL2S7Iy;u!VXNrG$vHU@QsVYPLhysT49Ic)D{ffOX`A4hgW2rYlri!T5tacv_|_m#$WZ&ulwbnr`2Qud{Uf wgLj^EYq{^|v}a+S(=wHJquVwjQgp zbLZZ3&OP7f%WfzwVDTlv5-gb}7D)$z|5zt$;mBL2 z1~6;o%&ZN5cFw|5?I%l4*72!z(aMqwM8x$1Bg|=z4+UPDM|_ggsB3vfka%fDbBauw z6B8^WF`6TX4-?63T65p21L$wQ+JUr;IB_|fW<|}0I4O^Wq@}?N{I|i+48K>mtOW6y z>ofP4M1i=~1V9}UKCKUyd_WS!L-^Ik&&dK=FfE(bECq`M_>BP+npj4RRnX#^wO~bl z$x!HR7+G&hL{uzc8jEn<203?qyf1-hpHKV1}T*HwZ1 zieSwJ6U>%xEW0I>ye%egQH}MjrcR2ww>9<~|Bd=E(iHttR|Ad!y}`ZKy&CwKCd& z%~r4#+w|XpV=3cpo31xPI(9+2H(VR;B@^rXlv=aDowP;BBYQ`|ZlwIkWqbz>5Jlf! 
zJcuH<5t8+^`S^kTFpPRB`riVkb3Yi?!+Lr0QHSyVHL5_d-v5q=HAlh0`flqP3VDtA z1;;96TJ`|;3xQ<)aN0Qm>=dYCU|+5RY=8}Zy(_D%Q7M8A{f1m~0$y-myj^-MhW3Lh zI9cDLt_R?U51YCDZ8V6r0{K0hr$I<34P${zWjH|+=^VO6f0XX8xh-Bbofg!nb8(&FMb5~czW)Ozo#)OYltnFqAUfuS z7jk*2jZd&Um&r*hH1LDHn_uEV(a;uw--r#zp)kk4=sp*1~4NX>q|mKf$zhko&oYfpOr zGL^dXfwTPE+~22?&2|vdBu<66M7$b+>DMOQOx`1J>k!C<8(VlcHQ!AO&>}%Hw9IK% z-Ay!G%}F#z(?$M3M9i914;ZauSCnX-O|Vv!h`9Bn*VkYFDvo1Lj4vTZxSi$&2CZ;h z{4#$N!76?wdy`Lt<+4c-6|!$UnazB{3H0~ratQfC{L86{d6MC8*%&lm?MnYfZ61q^UZ-AYEYnik- zIk+3)5{M^8VES&IsJMJvS4*xw+0|F+?AaQvc>HoGQu0J(Po&a$aO?E*&fcvvuK<_; zBf95tNvM|_gu6wAKV3BU4V?Sy7RJV|dlbUfe8Iw$f}cKK`|4l|dfgZ}7I}ezA|@1u zf9N1yn-Olp5bm%LZgGI3bBRx<6S}WOx-_dEsG74C-I`hCB!pW9jcNp>X5mGF(aaDw znkmP^BFAf1JvcNUFDB|Pnqcvk=4i&8W@8}6Q-bCavq-94Dr!_CQEFCfm*?#padRbv1Uq1_az~UzEFl2+z($?LW+lcTx2&%GBaNZG_u<|J417vTLB^8rbQ1 zd{K=|Dy}KjHH8VB117t+$K>$Y+M{@9RPT&T%{+B^3}~0t$XUfTqq=4=+65hLLJrT? z9>sfJ^`4ih^B|l5o$`Srr2|JQzMkzd)pu;i`FKJdm{xq#RWsr5dr6r5&TsLweyQpt zsIYABtHh@iD!S1jcSqrQ9zIeWQ^T=xc&rp2Q^IG|@RH^WeRA@4x%Z(XkoY?k>*B z;gcI7#WAcphN}+3>aTi<;F02>8aPo7jFbW+N?=qCjILh+@jL0_;gY{k_V>ZXuJE>3 zb@uPX6z6Eg<$Y-;ogJWUXGbLxty+PQ>_4hlj#b@2t{mtu1^S;HsC+P4HItEP5&&U- zNsz&_Dmi z!rx;$1w8~`Qxh5JI)sggy{vBvH!oi literal 0 HcmV?d00001 diff --git a/src/salesanalyzer_mds/segment_revenue_share.py b/src/salesanalyzer_mds/segment_revenue_share.py index 4753f93..88fd890 100644 --- a/src/salesanalyzer_mds/segment_revenue_share.py +++ b/src/salesanalyzer_mds/segment_revenue_share.py @@ -3,10 +3,11 @@ def segment_revenue_share(sales_data: pd.DataFrame, price_col: str = 'UnitPrice', - quantity_col: str = 'Quantity') -> pd.DataFrame: + quantity_col: str = 'Quantity', + price_thresholds: tuple = None) -> pd.DataFrame: """ Segments products into three categories—cheap, medium, and expensive— - based on price, and calculates their respective share in total revenue. + based on price and calculates their respective share in total revenue. 
Parameters: ----------- @@ -16,6 +17,9 @@ def segment_revenue_share(sales_data: pd.DataFrame, Column containing product prices. Default is 'UnitPrice'. quantity_col : str Column containing quantities sold. Default is 'Quantity'. + price_thresholds : tuple, optional + User-defined price thresholds (cheap_threshold, expensive_threshold). + If None, quantiles (0.33, 0.67) are used. Returns: -------- @@ -26,63 +30,58 @@ def segment_revenue_share(sales_data: pd.DataFrame, Raises: ------- ValueError: - If the input DataFrame is empty or specified columns contain - any missing data + If the input DataFrame is empty or specified columns contain missing data. KeyError: If any of the specified columns are missing in the DataFrame. TypeError: If any of the columns contain invalid data types. - - Example: - -------- - >>> sales_data = pd.DataFrame({ - ... 'UnitPrice': [10, 20, 50, 70, 100, 30, 40], - ... 'Quantity': [2, 3, 1, 5, 4, 6, 3] - ... }) - - >>> result = segment_revenue_share(sales_data) - >>> print(result) - PriceSegment TotalRevenue RevenueShare (%) - 0 cheap 80 6.78 - 1 medium 350 29.66 - 2 expensive 750 63.56 """ + # Check if input dataframe is empty if sales_data.empty: raise ValueError("Input DataFrame is empty.") - # Check if required columns are missing + # Check if required columns exist required_columns = {price_col, quantity_col} missing_columns = required_columns - set(sales_data.columns) if missing_columns: raise KeyError(f"Missing columns in input DataFrame: {missing_columns}") - # Check if required columns contain any missing data + # Check for missing values if sales_data[price_col].isna().any() or sales_data[quantity_col].isna().any(): raise ValueError(f"{price_col} or {quantity_col} column contains missing values.") - # Check if invalid data types are present in required columns + # Check for valid numeric types if not pd.api.types.is_numeric_dtype(sales_data[price_col]): raise TypeError(f"{price_col} must contain numeric data.") if not 
pd.api.types.is_numeric_dtype(sales_data[quantity_col]): raise TypeError(f"{quantity_col} must contain numeric data.") # Calculate revenue as price * quantity - sales_data['Revenue'] = sales_data[price_col] * sales_data[quantity_col] - - # Sort the prices - sorted_prices = sales_data[price_col].sort_values() - - # Calculate price thresholds for segmentation - cheap_threshold = sorted_prices.quantile(0.33) - expensive_threshold = sorted_prices.quantile(0.67) + sales_data = sales_data.assign( + Revenue=sales_data[price_col] * sales_data[quantity_col] + ) - # Categorize prices by threshold - sales_data['PriceSegment'] = sales_data[price_col].apply( - lambda price: 'cheap' if price <= cheap_threshold else - 'medium' if price <= expensive_threshold else - 'expensive' - ) + # Determine price thresholds + if price_thresholds is not None: + cheap_threshold, expensive_threshold = price_thresholds + else: + sorted_prices = sales_data[price_col].sort_values() + cheap_threshold = sorted_prices.quantile(0.33) + expensive_threshold = sorted_prices.quantile(0.67) + + # Categorize prices based on price_thresholds + def categorize_price(price): + if price <= cheap_threshold: + return 'cheap' + elif price <= expensive_threshold: + return 'medium' + else: + return 'expensive' + + sales_data = sales_data.assign( + PriceSegment=sales_data[price_col].apply(categorize_price) + ) # Calculate revenue share for each segment revenue_share = ( @@ -91,26 +90,23 @@ def segment_revenue_share(sales_data: pd.DataFrame, .reset_index() .rename(columns={'Revenue': 'TotalRevenue'}) ) + total_revenue = revenue_share['TotalRevenue'].sum() - # Handle cases where total revenue is 0 - if total_revenue == 0: - revenue_share['RevenueShare (%)'] = 0.0 - else: - revenue_share['RevenueShare (%)'] = ( - (revenue_share['TotalRevenue'] / total_revenue) * 100 - ) + # Prevent division by zero + revenue_share['RevenueShare (%)'] = ( + ((revenue_share['TotalRevenue'] / total_revenue) + * 100 if total_revenue > 0 
else 0.0) + ) - # Round to 2 decimal places - revenue_share = revenue_share.round( - {'TotalRevenue': 2, 'RevenueShare (%)': 2} - ) + # Round values for better readability + revenue_share = revenue_share.round({'TotalRevenue': 2, + 'RevenueShare (%)': 2}) - # Ensure segments are in order: cheap, medium, expensive + # Ensure all segments are included, even if they have zero revenue segment_order = ['cheap', 'medium', 'expensive'] - revenue_share['PriceSegment'] = pd.Categorical( - revenue_share['PriceSegment'], categories=segment_order, ordered=True) - - revenue_share = revenue_share.sort_values(by='PriceSegment').reset_index(drop=True) + revenue_share = (revenue_share.set_index('PriceSegment') + .reindex(segment_order, fill_value=0) + .reset_index()) return revenue_share From 52de6b206004b1a308684bf39301341dedb75d30 Mon Sep 17 00:00:00 2001 From: Franklin Aryee Date: Thu, 30 Jan 2025 18:13:45 -0800 Subject: [PATCH 5/8] test: Updated associated test to account for price thresholds --- ...predict_sales.cpython-311-pytest-8.3.4.pyc | Bin 0 -> 10934 bytes ...ry_statistics.cpython-311-pytest-8.3.4.pyc | Bin 0 -> 5928 bytes ...revenue_share.cpython-311-pytest-8.3.4.pyc | Bin 0 -> 10920 bytes tests/test_segment_revenue_share.py | 103 +++++++++++++----- 4 files changed, 76 insertions(+), 27 deletions(-) create mode 100644 tests/__pycache__/test_predict_sales.cpython-311-pytest-8.3.4.pyc create mode 100644 tests/__pycache__/test_sales_summary_statistics.cpython-311-pytest-8.3.4.pyc create mode 100644 tests/__pycache__/test_segment_revenue_share.cpython-311-pytest-8.3.4.pyc diff --git a/tests/__pycache__/test_predict_sales.cpython-311-pytest-8.3.4.pyc b/tests/__pycache__/test_predict_sales.cpython-311-pytest-8.3.4.pyc new file mode 100644 index 0000000000000000000000000000000000000000..313932e27da71e7fe4a9f5a9f4fb2c14ee43547e GIT binary patch literal 10934 zcmeHNU2Gf25xygjKSxqOku1fst>m1(s%iatbu5x{U#IE7#AT1W#J1q!4uMV;oac_Is{6%Ghcpa{@B6mXyv4gA!Z
zUEcAIB5gTQ;yn0B4)|j`(j9m4+@uio@R;LDdMABRA5HT~f7DN<#Av|9se$uBvq;5#gXg$+;K#0L zNDW=9G4JgEH{j+SixI6=1jRGqiqpEUN)Tw15(3(+ z@F%%&%@?SDJj@$48KNqQxE|Bwq^e~bbyd@~4x}+f*5ycCyI_PeB&|$=OiG?q4I!S; zXNE7Z#ROf36s(XLpzUGel(N|zHU$u!*Zs+3M8X9%ox z9Kj_f95lR{8C+BiFOd_Psu_XPa&k%?Bm}&V>!MLZ*Id(QM#BX;nNVU-vQ@3ggLapA zAtuKfX-Z&|Oc6u07IQ;91wBsdT;x`8sNbFM_^GB6tpoZj^-404lF5v!b_^%R2&6iW zrN*Z`N{8COl zn-|X(#ItuemuHw~^fP}`u6g^rtxG?8{#fkkH*h2;p2&+Q3gQVgaENyE5Kn!0y>+Sc@N)B0*N2z72e6liXfLA-rtd&6 zoAOK-dl_ZQ`Sz0S>0&QC3tIAhmGNW+C|S=C-A-=g`7IY9Y#C z8Z9*CnJ!w0GUa@G$@X;7LT4dNz-HkKIdM2I4j05>%sN0x$%C|y-}9h(2kyi`QGp6f&OJnWT4soL9;Nh$@f7k zkF+l`*z5kVR~X#q`>>w}`q7rik!JVD&BBpQzK>gZqixa1uWktCh)a_P;HP-EHS;-exiYvFmsbN`R z?>(p8V2|Kvp``F{$EDVI}9@m%#}VSLTG_1029I&|wy{%qNPhO?w!@QU{$KgZ9x z6yIwwwiJJhsmKj*&T%s*=v%E!ZdP!XG{?{K+GRxqbhEZTcY4wCwl0WvNr8GcZxJQ>rV|*@6Cv<>hIx=n0@j5^QIzQpYmosH+Xd#rfjJ|P&=M(4ms7CM=LJkQmDCMa zMp^V2?#xUlk}f3ONcJGzfqrB)7;PUvv?idJNBaGA-+iF2fb+G&}%;rc@1AL`fnzliU^iCe)OAF+=ue zBNSnP#{l@oQfazNX`-V@A0QVwsqK$g@?D_5vJ=R^;WN+WYDeIEb8sR1=Cy!!(H#-F!kIr_>$>gG&;A}^pM)C|0hdXgWr|x_i z5|%sLR=E>{hT=5~+tN;#W?_MLP74bE3Y9K8JFK|C+3qjUp%V-L`NOsUeROL-xE>4~ z(+m$CpJWGQ+Fed|0pmfP-3>Y`eLzfSkLGGeZ%r*|mwMjrxz==(f9t^21Np`s`MUN( zUHeVtZ*|=uQ1+$H4?jk6G--mu2=4O|3|rv0ovr^}wxqhXXM(Tk3EWAi-*IXMc# zIHI*bLXQHZuG#eV-s_V~Jx4y>G z!FIUK1>5sL>32zJF9>)G)XrJhXYrVbjbWFwq=3l;FO6HgRg8OUzCBi^X=7N$Lp!Lk ztZdd4%XY|Og2i03WmdV&07jSTTyX1juJ3rE=S6t_^R*)d;0YrOyYVhKQV8N*j!{Sm zBTK?aK^VzXidiOSVYkI&A~uF4goRQ~CIaPJN+1JiCRxlitFg;~0Q)Y}$L7!8+>tYd zgHd>b^0lJ{;0dE>`)EEmS_tA@j!{Smqf5eQK^VK2Ql~ zsNz*R{&R-B6e^V>Vq1t}KQwaA;Si?UH7q-;YZRf<+U(ijD7_Rim8~QIl`2|M1uCtr zZfDD^y6t_jMqgN7DPI98RH|%wSx{J|ebu27E(> zG!mm>A(4Rh9pVksLyT(K$H`WL0S0 zde6=w!=TfHdVi`ETrc&%i1GCaJO=>!D8hzFn#I@QIe zrjep_Nv~3ho_F0#RKx9&UvEC-a^H3dhrGVqA`dkDxZyXqL+Gq&w8Yq6NQCWHz(z)d zsM90?Z~BQA%kUL>bbd9SbR7T3CKWB>IQM5WI64^Ki=eNRu-zFyQDr3tuN*QL3-NSj zhVA9hZ5+CxL#KDTp@T09(5($R1ydiF^~GI(-z2=bm{j+ZGa!sRj@l9sa4gR+V^386 z>6XXx=?9e(|V+{Rt`P^1uw%!|wQEmvyasLj_)g?eeeX1V#9 
zYvZ}5T{*4+j;$}fjQ>qfBdu$h58d%@;qf@sM)!O9mw;oV4YoAnH@JBiE~^P0g8v63 CP3Io~ literal 0 HcmV?d00001 diff --git a/tests/__pycache__/test_sales_summary_statistics.cpython-311-pytest-8.3.4.pyc b/tests/__pycache__/test_sales_summary_statistics.cpython-311-pytest-8.3.4.pyc new file mode 100644 index 0000000000000000000000000000000000000000..5d887c3bc3e763ea569d91f684dee49e75861e47 GIT binary patch literal 5928 zcmd5=TWr+W89tuzjO|G#8A3=HAmDHbnJmkYaK9`B64=7RuoV}Hw8%1JlNdagJ@x>J z!YWcxwQ5l}s@f7&d5HFb!a}Qk>O-YU1@@tD^`Ma$SzW18#RG3aEE3|W|9?F9B%oHR zw8!@6<8%9epZ{|HbN&zxhX|x+%V$UTLczpq?seScqLpOT;p$%yYTU#TtG%zw7@Mg z(y9dkBborXOcMdyGzoCI#+@Ot&LuMwYL4_4RHGC$1 z#LOhGez+q(d}iplvU6ZZoC#A+tLFHGPUE(2L1#o|DJgHwj=N2knu4(v4Y;K1fxRF+ z)^UI-^1La0`#`aM{d8hFQCtyo*j?;cKW$E%#T6ep>@IeG;;?(ZDHsjT6M&Z_*eraH z0G|PY$}n1;?jQ}OJAa1EyLIQGuUW^Yz)bej%uU2oK!7+pL{fxwa4X(yjeBv1d8`LS4d{eTqTgM8mV^cr+F1%@OEFX>+!ttqNuY)8Y+&b}$?|#a6 z&-MIq{lhKa(0t#%Lf^hTzrVomcgnXX#EPK;6*M#^#K(}S7vdsfsfYr^QkhDP8|5?~ zJbrli(i4Dpa5h*k*!PK26ExA_9!mASZu)F=5~VcBP5QKu*7UIXF3j|Uib{Y$!ywR> ziJeShXF(eBb;BN$(;1plO&Lu??o~}wF3C{Nf?U8hW32rs8~&ev8f31g^!2Mj+5zRG zC!|MTfBoy<4BijAVLM~1rb6{=dOD}OkzCsNET=2Uk|VL)V(BLMiBTo%c`OB-vFDLiFIHQou%1K6jJl=EFi4cq4$9M2LVlLTDkP^kv5{I%by5d_2?qMMqxRRFF2!w~|2k zbsGtEPG6h3@y)rv*$#g)&z~&tCn4kA?@<2;7{whjN#=mlg-PS;RdTM@geuMJSMi%% zosVl60hC&(bR<_QYRn9&)}YX+(75;o##Hg3bLfc|BrTu?!I*@4&Tsl{T!%|((l_bX zL``}aT3}2~U`)*uy$!~s$`+_FtZX(cdlu*yERmjv#AVwvo53#!*#b;IfFh=$bO3Xn ze$wrbPsBu*AJbLM<&&B0c!O5bUD$j#0?tc}XG-a|d8CyZ(o9k59%y7?OVH7hl5Tsq zk}fjRLFkVRTPVdTGB1)+yc{}*)zb}Ol}ByR-AQ18vF;hc?ARZ1EMOEa{sE{r<~ zM`~3Q!afN2>Xit5Ess}yPmZLUHjsW0#dY5e{{?cLy6h<`0yTR-@+BYm+j6FA0<<4^@>!2HR_TG~9!MnXd-SG4n^c{r9 z3V~g~q7@>c$nCD*PB^ijI?~0wbg>{^oF`nMpTT#cJhv9Fytu6(Zga$K^Zs%jYw!}A zeqU){?b(of^X5%M3HyW?aB=d7u=)^^@4J`A%jy0oFCHw22OaSs^58XKt&oR9$U{vT zt7w1{4~?ys+2^FwUy z!$lns;$%Q+w!sh8Z>y@&wvBX3Neew}s`uN>v^B6MI)d*_t(UsQz>q;V6P10Cu^{aSoJ}$Drcv ztlSl>5auL9DQqB0#dZ7N_jqJ)AVtUZ#-g79E!+`XUx1Y`kG#5Tc1y7%Ie#PY6H63r_?NFxKvhmvV*) 
zXCs!|JeVCTgSJ0hm^3D*h=;5)Ub6|sua0Gxhw>o*+Q;Tx3BJ)Oi5|h)}_wJ zyodnR7vcsMrZBB`TfC30IAriU(3~ z2x;mE>3L`eN=mGM0KoinToF}v@yj8n8}1?rPqC{=mOJIUNIrDx?|*T^t+n$$ffHbF w%n2|3#1|BFK{6i>MAb|;lKbX`}rHUvV6fN>6LHwgYQMACUuz?l>7)VhBhR{DW6a$8T z^_)Aivpf561t(||w6im3@8jG%cV@nG&OPVezv1~H1IK}h?WKothWQW7)DD-Gc>1@H zc%P9OnN2ebEd6(7+({S9WVi(nYu0!dytKx*;D>)#I*szS8HB9SsmLitWXN<(Q}RhBYJR!b1&mXa+f33XW_O41$( zBs_4r;d*)o$oq`O!k=}l*y$yBAwOn;x_->c?rqmSsCj7D!a%s<;p7_N-L@$ zOPVAs=7^vvs+P(wiJm31)6ZY}#7uCrR!O*)=&2?4&5E8U__3bEkUWQE1jz{?4_rpz z&1_2hCP^g~!~c3g%4#WXO>`Nqylez6!Hr+RckqCD3f#;W?~T6+FGd}Q7n6M_oytmN zO;N_Lrfw2QjbF>%Ou?Wp<&p(_PwIFwmw8vp!-Ks0aynO#p+X(McIo=~#dI$D&gk^` zrRy)xB~G509;eSKWu^35Q6b~_m@1`-wy%6Fzh;C~`i2ttZU{b^4=!~Y$OiM7mkCGi zo~y>Ebb3}}BRW0zJnsG8JqF09j63Z6EfUCje53~&Chczk882M)OTGZ5&M#SFoL_96 z)sWLVR%-BP8ox87-SD-!ZFf^!SIM>0(7M(O!^oapuakF&tJl_3vTxV#_{HsgTwL>k~>^Dwh>TK$ZYvX=U`L zq$;xbz-@Sv%ZijYyctDK6*5LZSgSta;V9=Itm z4KGx{i!l87WJyhtRjd^Rf%IZ0T?qF?Hb*j2I+298LSBGk>|gy5$OiKpeqiTpnIEt4 z;~TI1Cfa?sRP8^b)3e(Df=Cye&!KB z^UF&*KU3!CD*PObg4?%wYisqstaDRkZmPmfLBc%0M3knfD9J1cOoOPEm=e3|x(8zO z&=#F_S4WWoF#=)YDD{9yd0RxvU2=DTe~YMe5GjN;P6d#n48gU*fwL0w0JpspjqjaNPkGF?4? 
z3^D*uV;{CT`9~;nnv?p(KxSpvl54lwL@*joE39M1R=1kJR>m|Lq>~KGIIz`J4~m+@ zpHt?r^g1)hOu~1&?OJue&D>^L=56T9tPwrM>2GA!0;tTzLN-a&IgwH-Oi*W$IIA>C-3Nu~`A2J1F zn-@J~0RG7-B&UHG{#rjKR$QiHidan(&>g}CZA(oPnYd|SKuD68fa5G&>NXJ2Lvi78 zeCSbp=zj5jT8|Hv<7X@Jvm5g!a8}C!p%M`AYmQNZz{!^VqZR+C?jJ?qnB}w*fis4{ zX(+2VZAw|d>C8%Slu5Yk!&Wc9fTB+7rExym>UXUF;i#FD9q>}u3PNg8n3v`SbLu51 zt0`3z-Ajw~;AZESj(qPwU;o*2p;^)W1m6w00A!woDLn}kG)i~v`I_^-B`<8_6ptVY%V*R1(y}3Pt?~I-3BWW#Erow0YPYS4EE*|G`%K<#0Kltpos!@ zzcsTJOl1LHDY*$X|U!j2%qdJthfzKtt2fUT7SM=OD&h>zoxpki7n`^Ac1)cqpj%Pgmrh_B;_uZFUE z)25V#ugm_h-gO)?~?)sdDZ>Qldb#~dm8*uVbFX8vyr%SVAUtp5KOK35_K`^qR;)~5n zb6LvD7(GIlQUD7J)>KNT1=9i7Qc!~i7UC^Ag0|5#vFHhi?oVn@NnXPmO4>OE^PSx`+<}j9wNQLt@H?=~BP<7qfNQ zs0FH+w;KxEdeNruOhkk+S`*thHVuL`#71ZOP2gHa_J&x{FZ*@=N}0b>;jgrUTh#mU ztR5QG>G{78a2UOP*b`OL9yG>*8vWoU^n}|4$DNbZHi1|))z(q(&$a|02$1%IJ@B>I zgU;g6qUU$o>-gYqsL3954#a--KL2}h2!lP~i>I3HK{7{RwL(j;)i9naq7A44KTmRU zA*q@0W@57@`f8#c^^J_ibE*$x&o*=X{x;tnibq zGBcw0>3h&eYcV6>PKQ*K^vmYWXqC`(>cVOhF`L04R? z1M5Ofk3EArHcP&RB7#J8lgse;gif}uU}>vx%|DxPkw2yV8NDeP*4a`?SU8~fRlh`z z;-<^d)PQ^&db$gjY5)OgI&f(Fg=+k`PS0xW@aEj+9LUrN%G3zT)IpS~gW%HkQK8ZY zPnG%875?-en^d7)M@94WUx0QBnaJ!NM_5F5HHW`0HLXVD`gEgL@(e9J+s&anM#HIx zb+oZ^jX^rFXZvfc;9tqUdaek5ORGha{dErTSllbNHCyLgOo?mq#XKcX$@LC`mRa|f zyz0*zu{E*K?gAh%8EA_k?R~!eCR4 z!R0T4=h6_3gsl(xnc^T-y7XRjr^47d*!du()fq_CnEo2NQ~QBzFnWC4Jhv`vN_K|2 ziKgvPfbFY|l}LCS6k{t0PGm?w(*Qrfmev4cV>&&zCO2OBC=n& z&Yu_A;>Dqn{Mt!T4Sv^32X=%{1lLEz;JR?$9oq*fYg{0s*jY_~4 z7-&0zHs?g!eQbf1)vSqA@ewm^o!;%v+r?2W@619X(7o zHkxW|G_4{Qz_+yzbEktjm2Vj4_ke8&E>!?v+R;vwfuF>8zH@)@M~6Q+44HC#yb>R; zQsDi>vm@Q_{Sp5G55OzO#wxL~j`2{K^1oQ-$1407Oip5bTfv>PmHzP$kN<3;a^iBi z|8hAtSBcGST>EvXdvmN3I;wML^w7~tXox0;^w3Z_G+YS{)9mp5^N&JjFiFo}ac3&E zv{e`a7D$$9ja45y>dX!~v%_W7j%0Z&65a2YnP-iOHOi%`Cv&L*- zmNaz^g^efGGU5TlUy~qG0O%O|6Q=l$nIH#|7$eGU0s$T)U>6xatBeq6LnHQvHxB_) zNu{D~M9j<>>T6tYL>piXJGP}IGBRkUc3S{VW#C&(t!l73PM06)9v(7@H^b#y>?caQQdh~nK|8g?s*xP|ITxJE)NU);(&bWW?Wshf-at&-eYVcbYs>gBOD7W 
z=r)OPEG%H!B;sRXS;{6c9}BApHi<=9agVXdF&?uvfg3^ACL?x=mW_niLwigsp>X*x DEJ$=W literal 0 HcmV?d00001 diff --git a/tests/test_segment_revenue_share.py b/tests/test_segment_revenue_share.py index 14b0e19..9b608fe 100644 --- a/tests/test_segment_revenue_share.py +++ b/tests/test_segment_revenue_share.py @@ -6,6 +6,7 @@ @pytest.fixture def sample_data(): + """Sample sales data for testing""" return pd.DataFrame({ 'UnitPrice': [1.50, 2.50, 3.50, 4.50, 5.50], 'Quantity': [10, 20, 30, 40, 50] @@ -13,35 +14,57 @@ def sample_data(): def test_normal_case(sample_data): + """Test default segmentation (quantile-based)""" result = segment_revenue_share(sample_data) + + expected = pd.DataFrame({ + 'PriceSegment': ['cheap', 'medium', 'expensive'], + 'TotalRevenue': [65.0, 105.0, 455.0], + 'RevenueShare (%)': [10.4, 16.8, 72.8] + }) + + expected = expected.sort_values(by="PriceSegment").reset_index(drop=True) + result = result.sort_values(by="PriceSegment").reset_index(drop=True) + + assert_frame_equal(result, expected, atol=1e-2) + + +def test_custom_thresholds(sample_data): + """Test segmentation with user-defined price thresholds""" + custom_thresholds = (3.0, 4.0) + result = segment_revenue_share( + sample_data, + price_thresholds=custom_thresholds + ) + expected = pd.DataFrame({ 'PriceSegment': ['cheap', 'medium', 'expensive'], 'TotalRevenue': [65.0, 105.0, 455.0], 'RevenueShare (%)': [10.4, 16.8, 72.8] }) - expected['PriceSegment'] = pd.Categorical( - expected['PriceSegment'], categories=['cheap', 'medium', 'expensive'], ordered=True - ) - assert_frame_equal( - result.sort_values('PriceSegment').reset_index(drop=True), - expected.sort_values('PriceSegment').reset_index(drop=True), - atol=1e-2 - ) + + expected = expected.sort_values(by="PriceSegment").reset_index(drop=True) + result = result.sort_values(by="PriceSegment").reset_index(drop=True) + + assert_frame_equal(result, expected, atol=1e-2) def test_empty_dataframe(): + """Ensure function raises ValueError on empty 
DataFrame""" empty_df = pd.DataFrame(columns=['UnitPrice', 'Quantity']) with pytest.raises(ValueError): segment_revenue_share(empty_df) def test_missing_columns(sample_data): + """Ensure function raises KeyError when required columns are missing""" missing_col_df = sample_data.drop(columns=['Quantity']) with pytest.raises(KeyError): segment_revenue_share(missing_col_df) def test_missing_values(): + """Ensure function raises ValueError if NaN values exist""" missing_values_df = pd.DataFrame({ 'UnitPrice': [2.55, None, 3.39], 'Quantity': [6, 6, None] @@ -51,6 +74,7 @@ def test_missing_values(): def test_invalid_data_types(): + """Ensure function raises TypeError for non-numeric data""" invalid_df = pd.DataFrame({ 'UnitPrice': ['a', 2.50, 'c'], 'Quantity': [1, 'b', 3] @@ -60,63 +84,88 @@ def test_invalid_data_types(): def test_zero_quantity(sample_data): + """Ensure function handles case where all quantities are zero""" zero_quantity_df = sample_data.copy() zero_quantity_df['Quantity'] = 0 result = segment_revenue_share(zero_quantity_df) + expected = pd.DataFrame({ 'PriceSegment': ['cheap', 'medium', 'expensive'], 'TotalRevenue': [0.0, 0.0, 0.0], 'RevenueShare (%)': [0.0, 0.0, 0.0] }) - expected['PriceSegment'] = pd.Categorical( - expected['PriceSegment'], categories=['cheap', 'medium', 'expensive'], ordered=True - ) - assert_frame_equal( - result.sort_values('PriceSegment').reset_index(drop=True), - expected.sort_values('PriceSegment').reset_index(drop=True) - ) + + expected = expected.sort_values(by="PriceSegment").reset_index(drop=True) + result = result.sort_values(by="PriceSegment").reset_index(drop=True) + + assert_frame_equal(result, expected) + + +def test_extreme_thresholds(sample_data): + """Test when price thresholds are extreme (all items fall into a single category)""" + result = segment_revenue_share(sample_data, price_thresholds=(10, 20)) + + expected = pd.DataFrame({ + 'PriceSegment': ['cheap', 'medium', 'expensive'], + 'TotalRevenue': [625.0, 0.0, 
0.0], # Everything is "cheap" + 'RevenueShare (%)': [100.0, 0.0, 0.0] + }) + + expected = expected.sort_values(by="PriceSegment").reset_index(drop=True) + result = result.sort_values(by="PriceSegment").reset_index(drop=True) + + assert_frame_equal(result, expected) def test_single_row(): + """Ensure function correctly categorizes a single product""" single_row_df = pd.DataFrame({ 'UnitPrice': [5.00], 'Quantity': [10] }) result = segment_revenue_share(single_row_df) + expected = pd.DataFrame({ - 'PriceSegment': ['cheap'], - 'TotalRevenue': [50.0], - 'RevenueShare (%)': [100.0] + 'PriceSegment': ['cheap', 'medium', 'expensive'], + 'TotalRevenue': [50.0, 0.0, 0.0], + 'RevenueShare (%)': [100.0, 0.0, 0.0] }) - expected['PriceSegment'] = pd.Categorical( - expected['PriceSegment'], categories=['cheap', 'medium', 'expensive'], ordered=True - ) + + expected = expected.sort_values(by="PriceSegment").reset_index(drop=True) + result = result.sort_values(by="PriceSegment").reset_index(drop=True) + assert_frame_equal(result, expected) def test_identical_prices(): + """Ensure function handles case where all products have the same price""" identical_prices_df = pd.DataFrame({ 'UnitPrice': [10.00, 10.00, 10.00], 'Quantity': [1, 2, 3] }) result = segment_revenue_share(identical_prices_df) + expected = pd.DataFrame({ - 'PriceSegment': ['cheap'], - 'TotalRevenue': [60.0], - 'RevenueShare (%)': [100.0] + 'PriceSegment': ['cheap', 'medium', 'expensive'], + 'TotalRevenue': [60.0, 0.0, 0.0], + 'RevenueShare (%)': [100.0, 0.0, 0.0] }) - expected['PriceSegment'] = pd.Categorical( - expected['PriceSegment'], categories=['cheap', 'medium', 'expensive'], ordered=True - ) + + expected = expected.sort_values(by="PriceSegment").reset_index(drop=True) + result = result.sort_values(by="PriceSegment").reset_index(drop=True) + assert_frame_equal(result, expected) def test_large_data(): + """Ensure function correctly processes large datasets""" large_data = pd.DataFrame({ 'UnitPrice': [i for i in 
range(1, 101)], 'Quantity': [i for i in range(1, 101)] }) result = segment_revenue_share(large_data) + total_revenue = sum(large_data['UnitPrice'] * large_data['Quantity']) + assert result['TotalRevenue'].sum() == total_revenue assert not result.empty From 1b2c650bdc186ca02cdd2250e7ab9efe67961b63 Mon Sep 17 00:00:00 2001 From: Franklin Aryee Date: Thu, 30 Jan 2025 18:31:42 -0800 Subject: [PATCH 6/8] doc: Updated example to include user-defined price thresholds --- docs/example.ipynb | 92 ++++++++++++++++++++++++++++++++++++++++------ 1 file changed, 80 insertions(+), 12 deletions(-) diff --git a/docs/example.ipynb b/docs/example.ipynb index bda0d37..4361d41 100644 --- a/docs/example.ipynb +++ b/docs/example.ipynb @@ -266,7 +266,7 @@ "source": [ "## Get Revenue Share for each Product Category\n", "\n", - "Another feature of `saleanalyzer`, the `segment_revenue_share()` function, segments products into three categories (cheap < medium < expensive) — based on their price, and calculates the respective share of total revenue contributed by each segment. This function is particularly useful for analyzing product sales data and understanding revenue distribution across different pricing tiers.\n", + "Another feature of `saleanalyzer`, the `segment_revenue_share()` function, segments products into three categories (cheap < medium < expensive) — based on their price, and calculates the respective share of total revenue contributed by each segment. By default, the price thresholds are set automatically, but users can define custom thresholds to categorize products according to their specific business needs. 
This function is particularly useful for analyzing product sales data and understanding revenue distribution across different pricing tiers.\n", "> Use help(sales_summary_statistics) for more information about the function" ] }, @@ -337,10 +337,83 @@ } ], "source": [ + "# Using default price thresholds\n", "revenue_share = segment_revenue_share(sample_data, price_col='UnitPrice', quantity_col='Quantity')\n", "revenue_share" ] }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
PriceSegmentTotalRevenueRevenueShare (%)
0cheap11509.24
1medium360028.92
2expensive770061.85
\n", + "
" + ], + "text/plain": [ + " PriceSegment TotalRevenue RevenueShare (%)\n", + "0 cheap 1150 9.24\n", + "1 medium 3600 28.92\n", + "2 expensive 7700 61.85" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Using user-defined price thresholds\n", + "revenue_share = segment_revenue_share(sample_data, price_col='UnitPrice', quantity_col='Quantity', price_thresholds=(300, 500))\n", + "revenue_share" + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -356,7 +429,7 @@ }, { "cell_type": "code", - "execution_count": 5, + "execution_count": 6, "metadata": {}, "outputs": [ { @@ -409,7 +482,7 @@ "1 1.33" ] }, - "execution_count": 5, + "execution_count": 6, "metadata": {}, "output_type": "execute_result" } @@ -441,7 +514,7 @@ }, { "cell_type": "code", - "execution_count": 6, + "execution_count": 7, "metadata": {}, "outputs": [ { @@ -494,7 +567,7 @@ "1 1.88" ] }, - "execution_count": 6, + "execution_count": 7, "metadata": {}, "output_type": "execute_result" } @@ -509,16 +582,11 @@ "source": [ "This is the end of the tutorial, where you have seen how to get sales data insights using our package." 
] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [] } ], "metadata": { "kernelspec": { - "display_name": "salesanalyzser", + "display_name": "salesanalyzer", "language": "python", "name": "python3" }, @@ -532,7 +600,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.12.2" + "version": "3.11.9" } }, "nbformat": 4, From fb88d525fef50adff3272667313dad4e02f1c028 Mon Sep 17 00:00:00 2001 From: Franklin Aryee Date: Thu, 30 Jan 2025 18:49:18 -0800 Subject: [PATCH 7/8] Updated readme to include price_threshold parameter --- README.md | 23 ++++++++++++++++++----- 1 file changed, 18 insertions(+), 5 deletions(-) diff --git a/README.md b/README.md index 291fb34..a6a042c 100644 --- a/README.md +++ b/README.md @@ -14,6 +14,7 @@ $ pip install salesanalyzer_mds ``` ## Functions + - `segment_revenue_share`: Segments products into three categories: cheap, medium, expensive, based on price, and calculates their respective share in total revenue. - `predictSales`: Predicts future sales based on the provided historical data and the target. - `sales_summary_statistics`: Calculates a variety of summary statistics that provide insights into overall sales performance, @@ -36,10 +37,13 @@ import pandas as pd # additional import to handle your sales data 3. Retrieve the insights: **Summary statistics** + ``` sales_summary_statistics(your_sales_data) ``` -The `sales_summary_statistics` returns a pandas DataFrame with: + +The `sales_summary_statistics()` function returns a pandas DataFrame with: + - 'total_revenue': The total revenue generated by all sales. - 'unique_customers': The number of unique customers. - 'average_order_value': The average value of an order (sum of revenue per invoice). @@ -48,15 +52,22 @@ The `sales_summary_statistics` returns a pandas DataFrame with: - 'average_revenue_per_customer': The average revenue generated by each customer. 
**Segment revenue share** + ``` segment_revenue_share(your_sales_data, price_col='UnitPrice', - quantity_col='Quantity') # replace column names with your data column names + quantity_col='Quantity', + price_thresholds=None) # replace column names with your data column names ``` -The `segment_revenue_share` returns a pandas DataFrame showing the total revenue share for each price segment: -'cheap', 'medium', 'expensive'. + +The `segment_revenue_share()` function returns a pandas DataFrame showing the total revenue share for each price segment: +'cheap', 'medium', 'expensive'. Price thresholds can be set by the user; otherwise they are set automatically. + +- Custom price thresholds can be set using the `price_thresholds` parameter. +- If not specified, thresholds are automatically determined based on the data. **Predict sales** + ``` predict_sales(your_sales_data, new_data, # new sales data to base the predictions on @@ -65,12 +76,14 @@ predict_sales(your_sales_data, target = 'Quantity', date_feature = 'InvoiceDate') ``` -The `predict_sales` returns a DataFrame with prediction values, and a printed out MSE score. + +The `predict_sales()` function returns a DataFrame with prediction values, and a printed out MSE score. ## Developer notes: ### Running The Tests Run the following command in the terminal from the project's root directory to execute the tests: + ```bash pytest tests/ ``` From bd6e2da401ab28ffb6366c6bf9fe1783fbda39f2 Mon Sep 17 00:00:00 2001 From: Franklin Aryee Date: Thu, 30 Jan 2025 19:45:18 -0800 Subject: [PATCH 8/8] Added environment.yml file --- environment.yml | 6 ++++++ 1 file changed, 6 insertions(+) create mode 100644 environment.yml diff --git a/environment.yml b/environment.yml new file mode 100644 index 0000000..8c34e53 --- /dev/null +++ b/environment.yml @@ -0,0 +1,6 @@ +name: salesanalyzer +channels: + - conda-forge +dependencies: + - python==3.10 + - cookiecutter==2.6.0
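
Note on the `price_thresholds` behaviour introduced in this series: the README and tests describe a function that splits products into cheap, medium and expensive segments, either from user-supplied thresholds or from thresholds derived from the data. The following is a minimal sketch of that behaviour, not the package's actual implementation. The function name `revenue_share_by_segment` is hypothetical, and the use of the 1/3 and 2/3 price quantiles as default thresholds is an assumption (the tests only say the default is "quantile-based").

```python
import numpy as np
import pandas as pd

SEGMENTS = ["cheap", "medium", "expensive"]


def revenue_share_by_segment(df, price_col="UnitPrice", quantity_col="Quantity",
                             price_thresholds=None):
    """Hypothetical sketch: revenue and revenue share per price segment."""
    price = df[price_col]
    if price_thresholds is None:
        # Assumption: default thresholds are the 1/3 and 2/3 price quantiles.
        low, high = price.quantile([1 / 3, 2 / 3])
    else:
        low, high = price_thresholds

    # price <= low is "cheap", low < price <= high is "medium", the rest "expensive".
    labels = np.where(price <= low, "cheap",
                      np.where(price <= high, "medium", "expensive"))

    revenue = price * df[quantity_col]
    # reindex keeps all three segments, filling empty ones with zero revenue.
    totals = revenue.groupby(labels).sum().reindex(SEGMENTS, fill_value=0.0)

    grand_total = totals.sum()
    if grand_total:
        share = (totals / grand_total * 100).round(2)
    else:
        share = totals * 0.0  # avoid dividing by zero when nothing was sold

    return pd.DataFrame({
        "PriceSegment": SEGMENTS,
        "TotalRevenue": totals.to_numpy(dtype=float),
        "RevenueShare (%)": share.to_numpy(dtype=float),
    })


if __name__ == "__main__":
    sample = pd.DataFrame({
        "UnitPrice": [1.50, 2.50, 3.50, 4.50, 5.50],
        "Quantity": [10, 20, 30, 40, 50],
    })
    print(revenue_share_by_segment(sample, price_thresholds=(3.0, 4.0)))
```

Comparing prices directly against the two thresholds, rather than binning with `pd.cut`, keeps the sketch working when both thresholds coincide (for example when every product has the same price), a case the test suite in this series covers.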