Large-scale business data integration
- Joins SQL tables that share no designated common primary key
- Uses the database indexes of the two vendor tables to perform a scalable join
- Leverages the natural ordering of records to significantly reduce the number of search iterations
- Combines the data into a single consolidated table that records each row's source
- Uses the Python pandas library to manipulate and join records efficiently in memory
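One way to exploit record ordering in such a join is a sort-merge join. The sketch below makes several assumptions. Both vendor tables are assumed to arrive already sorted by key, for example through an ORDER BY on an indexed column. A composite match key is derived from a company name and a postal code. The key is unique within each table. The column names, sample rows, and the `natural_key` and `sorted_merge_join` helpers are hypothetical and do not come from the original project. Because the key order is known, each table is walked once with its own cursor, so the join costs O(n + m) comparisons instead of the O(n × m) of a nested-loop search. Every output row is tagged with the table it came from.

```python
import pandas as pd


def natural_key(name: pd.Series, zip_code: pd.Series) -> pd.Series:
    """Hypothetical match key: lowercase alphanumerics of the name plus the ZIP code."""
    cleaned = name.str.lower().str.replace(r"[^a-z0-9]", "", regex=True)
    return cleaned + "|" + zip_code


def sorted_merge_join(left: pd.DataFrame, right: pd.DataFrame, key: str) -> pd.DataFrame:
    """Full outer sort-merge join of two frames already sorted by `key`.

    Assumes `key` is unique within each frame (a simplification for this sketch).
    Each output row gets a `source` column: "vendor_a", "vendor_b", or "both".
    """
    lkeys, rkeys = left[key].tolist(), right[key].tolist()
    lrecs, rrecs = left.to_dict("records"), right.to_dict("records")
    rows, i, j = [], 0, 0
    while i < len(lkeys) or j < len(rkeys):
        if j >= len(rkeys) or (i < len(lkeys) and lkeys[i] < rkeys[j]):
            # Left key is smaller (or right is exhausted): it has no partner.
            rows.append({**lrecs[i], "source": "vendor_a"})
            i += 1
        elif i >= len(lkeys) or rkeys[j] < lkeys[i]:
            # Right key is smaller (or left is exhausted): it has no partner.
            rows.append({**rrecs[j], "source": "vendor_b"})
            j += 1
        else:
            # Keys are equal: merge both records and advance both cursors.
            rows.append({**lrecs[i], **rrecs[j], "source": "both"})
            i += 1
            j += 1
    return pd.DataFrame(rows)


# Hypothetical vendor extracts with no shared primary key.
vendor_a = pd.DataFrame({
    "company": ["Acme Corp", "Globex", "Initech"],
    "zip": ["10001", "94105", "73301"],
    "revenue": [120, 340, 55],
})
vendor_b = pd.DataFrame({
    "name": ["ACME CORP.", "Initech", "Umbrella"],
    "postal_code": ["10001", "73301", "60601"],
    "employees": [900, 150, 4000],
})

vendor_a["match_key"] = natural_key(vendor_a["company"], vendor_a["zip"])
vendor_b["match_key"] = natural_key(vendor_b["name"], vendor_b["postal_code"])

# Sorting here stands in for reading each table in index order.
vendor_a = vendor_a.sort_values("match_key", ignore_index=True)
vendor_b = vendor_b.sort_values("match_key", ignore_index=True)

result = sorted_merge_join(vendor_a, vendor_b, "match_key")
print(result[["match_key", "source", "revenue", "employees"]])
```

For comparison, `pd.merge(vendor_a, vendor_b, on="match_key", how="outer", indicator=True)` gives a similar consolidated table. It adds a `_merge` column holding `left_only`, `right_only`, or `both`. However, it does not rely on the inputs being pre-sorted, so the hand-written cursor walk is shown here only to make the role of ordering explicit.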