Laurel Technology Solutions – Customer Data Unification (ETL Project)

📌 Project Overview

Laurel Technology Solutions Ltd. is a UK-based small-to-medium enterprise (SME) that has recently experienced significant growth and a rapid influx of new customers. Like many growing organisations, Laurel currently relies on off-the-shelf IT systems to support individual business functions such as Finance and Human Resources.

While these systems function well in isolation, they store customer information in separate silos, making it difficult to compare, analyse, or exploit data across departments. For example, financial data is stored exclusively in finance systems, while employment information is maintained only within HR systems. As a result, Laurel does not currently possess a single, unified customer record.

This project addresses that challenge by designing and implementing a Python-based ETL (Extract, Transform, Load) solution that consolidates multiple heterogeneous data sources into a central organisational database.

🎯 Business Context & Expansion Scenario

Following a highly successful financial year, Laurel Technology Solutions Ltd. is preparing for international expansion.

Key expansion goals include:

Establishing a new regional office in Seoul, South Korea
Supporting customers across East Asian markets
Maintaining the UK headquarters as the primary operational hub
Enabling regular data exchange between the UK and South Korean offices
Hiring local employees in South Korea while retaining UK staff
Supporting executives who will travel frequently between offices

To support this growth, Laurel aims to modernise its core data infrastructure by unifying its customer data streams, enabling deeper analysis, improved decision-making, and future data exploitation.

🛠️ Technical Objective

The primary goal of this project is to build a robust ETL pipeline that:

Extracts customer data from multiple structured and semi-structured sources
Transforms and cleans the data, resolving inconsistencies and duplicates
Unifies all data into a single, coherent customer record
Loads the unified data into a central MySQL database

This central data store can then be used as a foundation for future analytics, reporting, and expansion-related operations.
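The stages above can be kept modular by writing each one as a small function and chaining them. The sketch below is illustrative only and is not taken from `the_laurel_proj.py`: the names `extract`, `transform`, `load` and `run_pipeline` are hypothetical, and a plain Python list stands in for the MySQL database so the example runs on its own. Unifying duplicates into a single record would slot in between `transform` and `load`.

```python
from typing import Callable, Iterable

# A record is a flat mapping of field name -> value (an assumption for this sketch).
Record = dict[str, object]
Reader = Callable[[], list[Record]]


def extract(readers: Iterable[Reader]) -> list[Record]:
    """Extract: call each source reader and concatenate the raw records."""
    records: list[Record] = []
    for read in readers:
        records.extend(read())
    return records


def transform(records: list[Record]) -> list[Record]:
    """Transform: normalise field names and trim whitespace from string values."""
    return [
        {
            str(key).strip().lower(): value.strip() if isinstance(value, str) else value
            for key, value in record.items()
        }
        for record in records
    ]


def load(records: list[Record], store: list[Record]) -> int:
    """Load: write records to a target store (a list stands in for MySQL here)."""
    store.extend(records)
    return len(records)


def run_pipeline(readers: Iterable[Reader], store: list[Record]) -> int:
    """Run Extract -> Transform -> Load and return the number of records loaded."""
    return load(transform(extract(readers)), store)


if __name__ == "__main__":
    store: list[Record] = []
    readers = [lambda: [{" First ": " Ada "}], lambda: [{"Last": "Lovelace"}]]
    print(run_pipeline(readers, store))  # 2
    print(store)  # [{'first': 'Ada'}, {'last': 'Lovelace'}]
```

Because each stage only takes and returns plain data, any one of them can be tested or replaced (for example, swapping the list for a database writer) without touching the others.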

📂 Data Sources

The ETL pipeline processes the following data formats:

CSV – Demographic and vehicle information
JSON – Financial and billing details
XML – HR-related data such as salary, pension, and employment attributes
TXT – Unstructured business rules and data corrections

Each source represents data generated by a different organisational system, reflecting the data fragmentation found in real-world enterprises.
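The structured formats can all be read with Python's standard library and turned into the same shape (one dictionary per record), which is what makes later unification possible. This is a minimal sketch under assumptions: the field names (`firstName`, `salary`, and so on) and the XML layout (a `<user>` element per record, with fields as attributes or child elements) are hypothetical and may not match the actual `user_data.*` files. The free-text TXT rules are not covered, because their format is not described here.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET


def read_csv(text: str) -> list[dict[str, str]]:
    """Parse CSV text with a header row into one dict per row."""
    return list(csv.DictReader(io.StringIO(text)))


def read_json(text: str) -> list[dict]:
    """Parse JSON that is either an array of objects or a single object."""
    data = json.loads(text)
    return data if isinstance(data, list) else [data]


def read_xml(text: str, record_tag: str = "user") -> list[dict[str, str]]:
    """Turn each <record_tag> element into a dict.

    Attributes and simple child elements are both read, since the real
    file layout is an assumption here.
    """
    root = ET.fromstring(text)
    records = []
    for elem in root.iter(record_tag):
        record = dict(elem.attrib)
        for child in elem:
            record[child.tag] = (child.text or "").strip()
        records.append(record)
    return records


if __name__ == "__main__":
    print(read_csv("firstName,lastName,age\nAda,Lovelace,36\n"))
    print(read_json('[{"firstName": "Ada", "lastName": "Lovelace", "debt": 0}]'))
    print(read_xml('<users><user firstName="Ada"><salary>52000</salary></user></users>'))
```

When reading the real files, `open(path, newline="", encoding="utf-8")` is the usual way to hand a CSV file to `csv.DictReader`.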

🧩 Solution Architecture

Programming Language: Python
Database: MySQL (via USBWebserver)
ORM: PonyORM
Design Approach:
- Modular ETL stages (Extract → Transform → Load)
- Defensive programming to handle missing or inconsistent data
- De-duplication to ensure one unified record per customer
- Reusable and maintainable code structure
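Defensive handling of missing values and de-duplication can be combined in one merge step. The sketch below is a hypothetical illustration, not the project's confirmed logic: it assumes records are matched on a case-insensitive first and last name (consistent with unique customer IDs being listed as a future improvement), treats blanks and placeholders such as `"N/A"` as missing, and lets the first non-missing value win. Records that cannot be keyed are set aside rather than guessed at.

```python
from typing import Any, Optional, Tuple

# Placeholder strings treated as "missing" (an assumption about the source data).
MISSING_MARKERS = {"", "n/a", "na", "null", "none"}


def clean(value: Any) -> Any:
    """Return None for missing values; otherwise return the value (strings trimmed)."""
    if value is None:
        return None
    if isinstance(value, str):
        value = value.strip()
        if value.lower() in MISSING_MARKERS:
            return None
    return value


def match_key(record: dict) -> Optional[Tuple[str, str]]:
    """Hypothetical matching rule: case-insensitive first name + last name."""
    first, last = clean(record.get("firstName")), clean(record.get("lastName"))
    if first is None or last is None:
        return None
    return (str(first).lower(), str(last).lower())


def unify(records: list[dict]) -> tuple[list[dict], list[dict]]:
    """Merge records that share a key into one; return (unified, unmatchable)."""
    merged: dict[Tuple[str, str], dict] = {}
    unmatchable: list[dict] = []
    for record in records:
        cleaned = {field: clean(value) for field, value in record.items()}
        key = match_key(record)
        if key is None:
            # Without a reliable key we cannot merge safely, so set the record aside.
            unmatchable.append(cleaned)
            continue
        target = merged.setdefault(key, {})
        for field, value in cleaned.items():
            # First non-missing value wins; later sources only fill gaps.
            if value is not None and target.get(field) is None:
                target[field] = value
    return list(merged.values()), unmatchable
```

"First value wins" is only one conflict policy; a real pipeline might instead prefer whichever source is most authoritative for a given field (for example, HR data for salary).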

🚀 Key Features

Combines multiple heterogeneous data sources into a single schema
Safely handles missing fields and inconsistent records
Prevents duplicate customer entries during data unification
Automatically creates database tables using ORM mapping
Designed to be rerunnable without corrupting existing data
Portable database setup using USBWebserver, which runs without a system-wide installation
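Rerunnability and duplicate prevention are usually enforced at the database level with a unique constraint plus an "upsert". The sketch below uses the standard-library `sqlite3` module (SQLite 3.24 or later) as a runnable stand-in for MySQL; the table and column names are hypothetical. In MySQL the equivalent statement is `INSERT ... ON DUPLICATE KEY UPDATE`.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS customer (
    first_name TEXT NOT NULL,
    last_name  TEXT NOT NULL,
    age        INTEGER,
    salary     REAL,
    UNIQUE (first_name, last_name)
)
"""

# Insert new customers; on a key clash, update in place instead of duplicating.
# COALESCE keeps the stored value when the incoming one is NULL.
UPSERT = """
INSERT INTO customer (first_name, last_name, age, salary)
VALUES (:first_name, :last_name, :age, :salary)
ON CONFLICT (first_name, last_name) DO UPDATE SET
    age    = COALESCE(excluded.age, customer.age),
    salary = COALESCE(excluded.salary, customer.salary)
"""


def load(conn: sqlite3.Connection, records: list[dict]) -> None:
    """Idempotently load records: running this twice leaves one row per customer."""
    with conn:  # commit on success, roll back the whole batch on error
        conn.execute(SCHEMA)
        conn.executemany(UPSERT, records)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    rows = [{"first_name": "Ada", "last_name": "Lovelace", "age": 36, "salary": None}]
    load(conn, rows)
    load(conn, rows)  # rerun: no duplicate row is created
    load(conn, [{"first_name": "Ada", "last_name": "Lovelace", "age": None, "salary": 52000.0}])
    print(conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0])  # 1
    print(conn.execute("SELECT age, salary FROM customer").fetchone())  # (36, 52000.0)
```

Wrapping each batch in a transaction means a failure part-way through leaves the existing data untouched, which is what makes a rerun safe.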

📊 Output

A centralised MySQL database containing unified customer records
A clean and consistent dataset suitable for:
- Business intelligence
- Cross-departmental analysis
- International expansion planning
Exportable CSV output for reporting and assessment submission
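A CSV export of the unified records only needs the standard `csv` module. This minimal sketch assumes the records are already available as dictionaries (for example, fetched from the database); the function name `export_csv` and the column set are hypothetical. Columns default to the sorted union of all field names, and missing values are written as empty cells.

```python
import csv
import io
from typing import Optional


def export_csv(records: list[dict], fieldnames: Optional[list[str]] = None) -> str:
    """Render records as CSV text, one row per record, with a header row."""
    if fieldnames is None:
        fieldnames = sorted({key for record in records for key in record})
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=fieldnames,
        restval="",              # empty cell for fields a record lacks
        extrasaction="ignore",   # drop fields not listed in fieldnames
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


if __name__ == "__main__":
    print(export_csv([{"first_name": "Ada", "age": 36}, {"first_name": "Grace"}]), end="")
    # age,first_name
    # 36,Ada
    # ,Grace
```

To write a file, the returned text can be saved with `open(path, "w", newline="", encoding="utf-8")`, where `path` is whatever output filename the project uses.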

📝 Reflection & Evaluation

In addition to the technical implementation, this project includes a critical report that:

Reflects on the design and implementation of the ETL solution
Evaluates the suitability of chosen technologies for international expansion
Discusses scalability, data governance, and infrastructure considerations
Provides recommendations for future improvements and growth

🔮 Future Improvements

Potential extensions to this project include:

Introducing unique customer IDs for stronger entity resolution
Splitting the unified schema into multiple relational tables
Adding validation rules for financial and personal data
Implementing role-based access control for international teams
Integrating analytics or visualisation tools for data insights
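One way to introduce stable customer IDs is to derive them deterministically, so that every rerun of the pipeline assigns the same ID to the same person. The sketch below uses name-based UUIDs (`uuid.uuid5`) from the standard library; the namespace string and the choice of fields (including the `date_of_birth` parameter) are hypothetical assumptions, not part of the current project.

```python
import uuid
from typing import Optional

# Hypothetical project namespace; any fixed UUID works as long as it never changes.
CUSTOMER_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "laurel-customers.example")


def customer_id(first_name: str, last_name: str, date_of_birth: Optional[str] = None) -> str:
    """Derive a repeatable ID from normalised identifying fields."""
    parts = [
        first_name.strip().lower(),
        last_name.strip().lower(),
        (date_of_birth or "").strip(),
    ]
    return str(uuid.uuid5(CUSTOMER_NAMESPACE, "|".join(parts)))


if __name__ == "__main__":
    a = customer_id("Ada", "Lovelace", "1815-12-10")
    b = customer_id(" ada ", "LOVELACE", "1815-12-10")
    print(a == b)  # True: formatting differences do not change the ID
```

A derived ID is only as reliable as the fields behind it; the alternative is a surrogate ID assigned once at first load and stored alongside the record.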

👤 Author

Simon Ugochukwu Awaogu
MSc Cybersecurity / Computing
University of Sunderland

📄 License

This project was developed as part of the CETM_50 coursework.

📁 Repository Files

README.md
the_laurel_proj.py
user_data.csv
user_data.json
user_data.txt
user_data.xml