A comprehensive business intelligence solution featuring both Jupyter notebook analysis and a professional Streamlit dashboard for e-commerce sales data with configurable time periods and reusable business metrics calculations.
This project transforms a basic exploratory data analysis into a professional, maintainable business intelligence framework. The refactored solution provides:
- Configurable Analysis: Easily analyze any time period or compare different years
- Modular Architecture: Reusable data loading and metrics calculation modules
- Professional Visualizations: Business-oriented charts with proper formatting
- Strategic Insights: Automated generation of business recommendations
data_analysis/
├── EDA_Refactored.ipynb # Main analysis notebook
├── dashboard.py # Streamlit dashboard application
├── data_loader.py # Data loading and processing module
├── business_metrics.py # Business metrics calculation module
├── requirements.txt # Python dependencies
├── README.md # This file
└── ecommerce_data/ # Data directory
├── orders_dataset.csv
├── order_items_dataset.csv
├── products_dataset.csv
├── customers_dataset.csv
├── order_reviews_dataset.csv
└── order_payments_dataset.csv
- Set analysis year, comparison year, and month filters
- Flexible time period analysis without code changes
- Automatic handling of missing data periods
- Revenue Analysis: Total revenue, growth rates, average order value
- Product Performance: Category analysis, revenue share, top performers
- Geographic Insights: State-level revenue and order analysis
- Customer Satisfaction: Review scores, satisfaction distribution
- Delivery Performance: Delivery times, speed categorization
- Monthly revenue trend charts
- Product category performance bars
- Interactive geographic heatmaps
- Customer satisfaction distributions
- Consistent color schemes and formatting
- Strategic recommendations based on data patterns
- Performance benchmarking and alerts
- Executive summary generation
- Python 3.8 or higher
- Jupyter Notebook or JupyterLab (for notebook analysis)
-
Clone or download the project files
-
Install required dependencies:
pip install -r requirements.txt
-
Ensure data files are in place:
- Place CSV files in the
ecommerce_data/directory - Verify all required files are present (see Project Structure above)
- Place CSV files in the
-
Run the applications:
For Streamlit Dashboard:
streamlit run dashboard.py
For Jupyter Notebook Analysis:
jupyter notebook EDA_Refactored.ipynb
-
Launch the dashboard:
streamlit run dashboard.py
-
Navigate the interface:
- Use the year filter in the top-right to select analysis period
- View KPI cards showing key metrics with trend indicators
- Explore interactive charts in the 2x2 grid layout
- Monitor customer experience metrics in the bottom row
-
Dashboard Features:
- Real-time filtering: All charts update automatically when year is changed
- Professional styling: Clean, business-ready interface
- Trend indicators: Green/red arrows showing performance changes
- Formatted values: Currency displayed as $300K, $2M for readability
The EDA_Refactored.ipynb notebook provides a comprehensive, well-structured analysis with the following improvements over the original:
- Professional Structure: Clear table of contents and logical flow
- Configurable Framework: Easy parameter adjustment without code changes
- Enhanced Visualizations: Business-oriented charts with proper formatting
- Automated Insights: Data-driven recommendations and summaries
- Modular Code: Reusable functions from custom modules
-
Open the refactored notebook:
EDA_Refactored.ipynb -
Configure analysis parameters in the second code cell:
ANALYSIS_YEAR = 2023 # Primary analysis year COMPARISON_YEAR = 2022 # Comparison year for growth calculations ANALYSIS_MONTH = None # Set to specific month (1-12) or None for full year DATA_PATH = 'ecommerce_data/'
-
Run all cells to generate the complete analysis
- Introduction & Business Objectives - Context and key questions
- Data Loading & Configuration - Environment setup and parameters
- Data Preparation & Transformation - Data quality and processing
- Business Metrics Analysis - Comprehensive performance analysis
- Revenue Analysis (growth, trends, AOV)
- Product Performance (categories, market share)
- Geographic Analysis (state-level insights)
- Customer Experience (satisfaction, delivery performance)
- Key Findings & Summary - Executive summary and insights
- Data Dictionary - Business terms and technical documentation
# Analyze only Q4 2023
for month in [10, 11, 12]:
ANALYSIS_MONTH = month
# Run analysis# Use different data location
DATA_PATH = '/path/to/your/data/'# Modify in data_loader.py create_sales_dataset method
status_filter = 'delivered' # or 'shipped', 'processing', etc.from data_loader import EcommerceDataLoader, load_and_process_data
# Quick start
loader, processed_data = load_and_process_data('ecommerce_data/')
# Advanced usage
loader = EcommerceDataLoader('ecommerce_data/')
loader.load_raw_data()
processed_data = loader.process_all_data()
# Create filtered dataset
sales_data = loader.create_sales_dataset(
year_filter=2023,
month_filter=None,
status_filter='delivered'
)from business_metrics import BusinessMetricsCalculator, MetricsVisualizer
# Calculate metrics
metrics_calc = BusinessMetricsCalculator(sales_data)
report = metrics_calc.generate_comprehensive_report(
current_year=2023,
previous_year=2022
)
# Create visualizations
visualizer = MetricsVisualizer(report)
revenue_fig = visualizer.plot_revenue_trend()
category_fig = visualizer.plot_category_performance()- Total Revenue: Sum of all delivered order item prices
- Revenue Growth Rate: Year-over-year percentage change
- Average Order Value (AOV): Average total value per order
- Monthly Growth Trends: Month-over-month performance
- Category Revenue: Revenue by product category
- Market Share: Percentage of total revenue by category
- Category Diversity: Distribution across product lines
- State Performance: Revenue and order count by state
- Market Penetration: Number of active markets
- Regional AOV: Average order value by geographic region
- Review Scores: Average satisfaction rating (1-5 scale)
- Satisfaction Distribution: Percentage of high/low ratings
- Delivery Performance: Average delivery time and speed metrics
BUSINESS METRICS SUMMARY - 2023
============================================================
REVENUE PERFORMANCE:
Total Revenue: $3,360,294.74
Total Orders: 4,635
Average Order Value: $724.98
Revenue Growth: -2.5%
CUSTOMER SATISFACTION:
Average Review Score: 4.10/5.0
High Satisfaction (4+): 84.2%
DELIVERY PERFORMANCE:
Average Delivery Time: 8.0 days
Fast Delivery (≤3 days): 28.5%
- Monthly revenue trend line charts
- Top product category horizontal bar charts
- Interactive US state choropleth maps
- Customer satisfaction distribution charts
- Extend the
BusinessMetricsCalculatorclass inbusiness_metrics.py - Add visualization methods to
MetricsVisualizerclass - Update the notebook to display new metrics
# Example: Custom visualization
def plot_custom_metric(self, data):
fig, ax = plt.subplots(figsize=(12, 6))
# Your visualization code
return fig- Modify
data_loader.pyto handle different CSV structures - Update column mappings in the
EcommerceDataLoaderclass - Add new data validation rules
-
Module Import Errors:
- Ensure all files are in the same directory
- Check Python path configuration
-
Missing Data Files:
- Verify CSV files are in the
ecommerce_data/directory - Check file naming matches expected patterns
- Verify CSV files are in the
-
Empty Results:
- Verify date filters match available data
- Check order status filtering
-
Visualization Issues:
- Ensure all required packages are installed
- Check Plotly version compatibility for interactive maps
- For large datasets, consider chunked processing
- Use data sampling for initial exploration
- Implement caching for repeated analysis
- Header: Title with year selection filter (applies globally)
- KPI Row: 4 metric cards with trend indicators
- Total Revenue, Monthly Growth, Average Order Value, Total Orders
- Color-coded trends (green for positive, red for negative)
- Charts Grid: 2x2 interactive visualization layout
- Revenue trend (current vs previous year)
- Top 10 product categories bar chart
- US state choropleth map
- Customer satisfaction vs delivery time analysis
- Bottom Row: Customer experience metrics
- Average delivery time with trend
- Review score with star rating
- Real-time Filtering: All visualizations update automatically
- Professional Styling: Business-ready interface with uniform card heights
- Plotly Charts: Interactive, publication-quality visualizations
- Responsive Design: Adapts to different screen sizes
- Error Handling: Graceful handling of missing data
- Real-time data connections
- Predictive analytics and forecasting
- Customer segmentation analysis
- A/B testing framework
- Automated report scheduling
- Export functionality (PDF reports)
- Integration with business intelligence tools
- API endpoints for metrics access
- Machine learning model integration
- Advanced statistical analysis
- Mobile-responsive improvements
To extend this analysis framework:
- Follow the existing code structure and documentation patterns
- Add comprehensive docstrings to new functions
- Include unit tests for new business logic
- Update this README with new features
This project is provided as-is for educational and business analysis purposes.
Note: This framework is designed to be easily maintained and extended for ongoing business intelligence needs. The modular architecture ensures that updates to data sources or metric calculations can be made without affecting the overall analysis structure.