simon200ok/cetm50-laurel-tech

Laurel Technology Solutions – Customer Data Unification (ETL Project)

📌 Project Overview

Laurel Technology Solutions Ltd. is a UK-based small-to-medium enterprise (SME) that has recently experienced significant growth and a rapid influx of new customers. Like many growing organisations, Laurel currently relies on off-the-shelf IT systems to support individual business functions such as Finance and Human Resources.

While these systems function well in isolation, they store customer information in separate silos, making it difficult to compare, analyse, or exploit data across departments. For example, financial data is stored exclusively in finance systems, while employment information is maintained only within HR systems. As a result, Laurel does not currently possess a single, unified customer record.

This project addresses that challenge by designing and implementing a Python-based ETL (Extract, Transform, Load) solution that consolidates multiple heterogeneous data sources into a central organisational database.


🎯 Business Context & Expansion Scenario

Following a highly successful financial year, Laurel Technology Solutions Ltd. is preparing for international expansion.

Key expansion goals include:

  • Establishing a new regional office in Seoul, South Korea
  • Supporting customers across East Asian markets
  • Maintaining the UK headquarters as the primary operational hub
  • Enabling regular data exchange between UK and South Korea offices
  • Hiring local employees in South Korea while retaining UK staff
  • Supporting executives who will travel frequently between offices

To support this growth, Laurel aims to modernise its core data infrastructure by unifying its customer data streams, enabling deeper analysis, improved decision-making, and future data exploitation.


🛠️ Technical Objective

The primary goal of this project is to build a robust ETL pipeline that:

  1. Extracts customer data from multiple structured and semi-structured sources
  2. Transforms and cleans the data, resolving inconsistencies and duplicates
  3. Unifies all data into a single, coherent customer record
  4. Loads the unified data into a central MySQL database

This central data store can then be used as a foundation for future analytics, reporting, and expansion-related operations.
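
The four steps above can be sketched as a small driver that chains the stages together. This is a minimal illustration rather than the project's actual code: `run_pipeline`, `extract_demo`, `transform_demo`, `load_demo`, and the `name` field are all hypothetical names chosen for the example, and an in-memory list stands in for the MySQL database.

```python
from typing import Callable, Iterable


def run_pipeline(
    extractors: Iterable[Callable[[], list]],
    transform: Callable[[list], list],
    load: Callable[[list], int],
) -> int:
    """Run extract -> transform -> load and return the number of rows loaded."""
    raw = []
    for extract in extractors:
        raw.extend(extract())      # step 1: extract from every source
    unified = transform(raw)       # steps 2-3: clean and unify
    return load(unified)           # step 4: persist the unified records


# Hypothetical stand-ins for the real stages.
def extract_demo() -> list:
    return [{"name": " Ada Lovelace "}, {"name": "Ada Lovelace"}]


def transform_demo(rows: list) -> list:
    seen = {}
    for row in rows:
        name = row["name"].strip()     # normalise whitespace before comparing
        seen[name] = {"name": name}    # duplicates collapse onto one key
    return list(seen.values())


store: list = []  # stands in for the central database


def load_demo(rows: list) -> int:
    store.extend(rows)
    return len(rows)


loaded = run_pipeline([extract_demo], transform_demo, load_demo)
```

Keeping each stage behind a plain function makes it possible to test the stages separately and to swap the in-memory store for a real database loader later.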


📂 Data Sources

The ETL pipeline processes the following data formats:

  • CSV – Demographic and vehicle information
  • JSON – Financial and billing details
  • XML – HR-related data such as salary, pension, and employment attributes
  • TXT – Unstructured business rules and data corrections

Each source represents data generated by different organisational systems, reflecting real-world enterprise data fragmentation.
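
As a rough sketch of the extract stage, the three structured formats can be read with the Python standard library alone. The sample data and every field name below (`first_name`, `firstName`, `iban`, `salary`, the `<user>` element) are assumptions made for illustration; the real files may be laid out differently. The TXT source is left out because its content is unstructured and would need project-specific handling.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical sample data; the real files may use other fields and layouts.
CSV_TEXT = "first_name,last_name,age,vehicle_make\nAda,Lovelace,36,Ford\n"
JSON_TEXT = '[{"firstName": "Ada", "lastName": "Lovelace", "iban": "GB00TEST"}]'
XML_TEXT = '<users><user firstName="Ada" lastName="Lovelace" salary="42000"/></users>'


def read_csv(text: str) -> list:
    """Return one dict per CSV row, keyed by the header line."""
    return list(csv.DictReader(io.StringIO(text)))


def read_json(text: str) -> list:
    """Return the parsed JSON document (assumed here to be a list of objects)."""
    return json.loads(text)


def read_xml(text: str) -> list:
    """Return the attributes of every <user> element as a dict."""
    root = ET.fromstring(text)
    return [dict(user.attrib) for user in root.iter("user")]


csv_rows = read_csv(CSV_TEXT)
json_rows = read_json(JSON_TEXT)
xml_rows = read_xml(XML_TEXT)
```

Note that the CSV and XML readers return every value as a string, so numeric fields such as age or salary still need converting during the transform stage.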


🧩 Solution Architecture

  • Programming Language: Python
  • Database: MySQL (via USBWebserver)
  • ORM: PonyORM
  • Design Approach:
    • Modular ETL stages (Extract → Transform → Load)
    • Defensive programming to handle missing or inconsistent data
    • De-duplication to ensure one unified record per customer
    • Reusable and maintainable code structure
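
The defensive handling and de-duplication described above might look like the following sketch. It assumes, purely for illustration, that records are matched on a normalised first and last name and that the first non-empty value seen for a field is kept; `customer_key`, `unify`, and the field names are hypothetical, and the project may use a different matching rule.

```python
def customer_key(record: dict):
    """Build a matching key, or return None if the record cannot be matched."""
    first = (record.get("first_name") or "").strip().lower()
    last = (record.get("last_name") or "").strip().lower()
    if not first or not last:
        return None
    return (first, last)


def unify(sources: list) -> list:
    """Merge rows from several sources into one record per customer."""
    unified = {}
    for rows in sources:
        for row in rows:
            key = customer_key(row)
            if key is None:
                continue  # defensive: skip rows with no usable name
            target = unified.setdefault(key, {})
            for field, value in row.items():
                if value not in (None, ""):
                    target.setdefault(field, value)  # first non-empty value wins
    return list(unified.values())


demographics = [
    {"first_name": "Ada", "last_name": "Lovelace", "age": "36"},
    {"first_name": " ada", "last_name": "LOVELACE ", "age": ""},  # duplicate
]
hr = [
    {"first_name": "Ada", "last_name": "Lovelace", "salary": "42000"},
    {"first_name": None, "last_name": "Orphan"},  # missing name, skipped
]
customers = unify([demographics, hr])
```

Matching on names alone will merge two different people who share a name, which is one reason a dedicated customer ID is listed as a future improvement.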

🚀 Key Features

  • Combines multiple heterogeneous data sources into a single schema
  • Safely handles missing fields and inconsistent records
  • Prevents duplicate customer entries during data unification
  • Automatically creates database tables using ORM mapping
  • Designed to be rerunnable without corrupting existing data
  • Portable database setup using USBWebserver, which runs MySQL without a system-wide installation
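
One way to make a load step rerunnable is to create the table only if it is missing and to key each row on a primary key so that a second run overwrites rather than duplicates. The sketch below uses the standard-library `sqlite3` module as a stand-in so that it runs anywhere; it is not the project's PonyORM/MySQL code, and the `customer` table, its columns, and `load_customers` are assumptions made for the example.

```python
import sqlite3


def load_customers(conn: sqlite3.Connection, customers: list) -> int:
    """Insert or overwrite customers and return the resulting row count."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS customer (
            first_name TEXT NOT NULL,
            last_name  TEXT NOT NULL,
            salary     INTEGER,
            PRIMARY KEY (first_name, last_name)
        )
        """
    )
    # INSERT OR REPLACE overwrites any row with the same primary key,
    # so running the load twice leaves a single copy of each customer.
    conn.executemany(
        "INSERT OR REPLACE INTO customer (first_name, last_name, salary) "
        "VALUES (:first_name, :last_name, :salary)",
        customers,
    )
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]


conn = sqlite3.connect(":memory:")
rows = [{"first_name": "Ada", "last_name": "Lovelace", "salary": 42000}]
first_run = load_customers(conn, rows)
second_run = load_customers(conn, rows)  # rerun with the same data
```

In MySQL the comparable statements are `REPLACE INTO` or `INSERT ... ON DUPLICATE KEY UPDATE`; with an ORM the same effect is usually achieved by looking up the entity before creating it.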

📊 Output

  • A centralised MySQL database containing unified customer records
  • A clean and consistent dataset suitable for:
    • Business intelligence
    • Cross-departmental analysis
    • International expansion planning
  • Exportable CSV output for reporting and assessment submission
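
A CSV export of the unified records can be sketched with `csv.DictWriter`. The function name `export_csv` and the sample fields are hypothetical; the header is built from the union of all field names so that records with missing fields still produce a well-formed row.

```python
import csv
import io


def export_csv(customers: list, out) -> None:
    """Write customers to a text stream, leaving missing fields blank."""
    fieldnames = sorted({field for row in customers for field in row})
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(customers)


buffer = io.StringIO()  # stands in for a real output file
export_csv(
    [{"first_name": "Ada", "salary": 42000}, {"first_name": "Alan"}],
    buffer,
)
lines = buffer.getvalue().splitlines()
```

When writing to a real file, open it with `open(path, "w", newline="")` so the `csv` module controls the line endings itself.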

📝 Reflection & Evaluation

In addition to the technical implementation, this project includes a critical report that:

  • Reflects on the design and implementation of the ETL solution
  • Evaluates the suitability of chosen technologies for international expansion
  • Discusses scalability, data governance, and infrastructure considerations
  • Provides recommendations for future improvements and growth

🔮 Future Improvements

Potential extensions to this project include:

  • Introducing unique customer IDs for stronger entity resolution
  • Splitting the unified schema into multiple relational tables
  • Adding validation rules for financial and personal data
  • Implementing role-based access control for international teams
  • Integrating analytics or visualisation tools for data insights
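
As an illustration of the validation idea listed above, the sketch below checks two fields and returns the names of those that fail. The rules themselves (age between 1 and 129, a finite, non-negative salary) and the field names are assumptions chosen for the example, not requirements taken from the project.

```python
import math


def parse_int(value):
    """Return value as an int, or None if it is not a whole number."""
    try:
        return int(str(value).strip())
    except ValueError:
        return None


def parse_amount(value):
    """Return value as a finite float, or None if it cannot be parsed."""
    try:
        amount = float(str(value).strip())
    except ValueError:
        return None
    return amount if math.isfinite(amount) else None


def validate(record: dict) -> list:
    """Return the names of the fields that break a rule (empty if valid)."""
    errors = []
    age = parse_int(record.get("age"))
    if age is None or not 0 < age < 130:
        errors.append("age")
    salary = parse_amount(record.get("salary"))
    if salary is None or salary < 0:
        errors.append("salary")
    return errors


valid = validate({"age": "36", "salary": "42000"})
invalid = validate({"age": "abc", "salary": "-5"})
```

Returning a list of failing fields, rather than raising on the first problem, lets the pipeline log every issue in a record before deciding whether to skip or repair it.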

👤 Author

Simon Ugochukwu Awaogu
MSc Cybersecurity / Computing
University of Sunderland


📄 License

This project was developed as part of the CETM_50 coursework.
