From 776f31b60b406429108d0d34d96651338c9a1dc0 Mon Sep 17 00:00:00 2001
From: Vesa Alexandru
Date: Mon, 8 Apr 2024 09:39:10 +0300
Subject: [PATCH] readme for module2

---
 course/module-2/Readme.md | 22 ++++++++++++++++++----
 1 file changed, 18 insertions(+), 4 deletions(-)

diff --git a/course/module-2/Readme.md b/course/module-2/Readme.md
index b380d16..7395392 100644
--- a/course/module-2/Readme.md
+++ b/course/module-2/Readme.md
@@ -7,7 +7,21 @@ Change Data Capture, commonly known as CDC, is an efficient way to track changes
 The purpose of CDC is to capture insertions, updates, and deletions applied to a database and to make this change data available in a format easily consumable by downstream applications.
 
 Why do we need CDC pattern?
-- Real-time Data Syncing: CDC facilitates near-real-time data integration and syncing.
-- Efficient Data Pipelines: It allows incremental data loading, which is more efficient than bulk load operations.
-- Minimized System Impact: CDC minimizes the impact on the source system by reducing the need for performance-intensive queries.
-- Event-Driven Architectures: It enables event-driven architectures by streaming database events.
+ 1. Real-time Data Syncing: CDC facilitates near-real-time data integration and syncing.
+ 2. Efficient Data Pipelines: It allows incremental data loading, which is more efficient than bulk load operations.
+ 3. Minimized System Impact: CDC minimizes the impact on the source system by reducing the need for performance-intensive queries.
+ 4. Event-Driven Architectures: It enables event-driven architectures by streaming database events.
+
+
+## Contents
+- data_flow module: Singleton class to manage RabbitMQ connection.
+  - mq.py
+- db module: Singleton class to connect to MongoDB database.
+  - mongo.py
+- cdc.py :
+- **Functionality**: This script sets up a CDC pipeline. It monitors a MongoDB database for changes and forwards these changes to a RabbitMQ message queue.
+- **MongoDB Connection**: It connects to a MongoDB database (specifically the "scrabble" collection), utilizing a class from `db.mongo`.
+- **Change Tracking**: The script watches for insert operations in MongoDB. When a change is detected, it processes and serializes the change details.
+- **Data Processing**: Each change's metadata is extracted and serialized using `json_util`. The `_id` field is converted to a string and additional metadata is added.
+- **RabbitMQ Integration**: After serialization, the script publishes the data to a RabbitMQ queue named "test_queue".
+- test_cdc.py
\ No newline at end of file
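
The `cdc.py` flow described in the README text (watch for inserts, stringify `_id`, attach metadata, serialize, publish) could be sketched roughly as follows. This is not the repository's `cdc.py`: the function names `serialize_change` and `forward_inserts`, the metadata keys `source_collection` and `operation`, and the use of the standard-library `json` module in place of `json_util` are all assumptions made to keep the example self-contained. The change source and the publisher are passed in as plain Python objects so the logic can be run without a MongoDB or RabbitMQ instance.

```python
import json
from typing import Any, Callable, Dict, Iterable


def serialize_change(change: Dict[str, Any]) -> str:
    """Turn one insert change event into a JSON string (illustrative only)."""
    # Copy the inserted document so the original event is not mutated.
    document = dict(change["fullDocument"])
    # An ObjectId is not JSON-serializable, so store _id as a string.
    document["_id"] = str(document["_id"])
    # Hypothetical metadata keys; the real script may add different fields.
    document["source_collection"] = change["ns"]["coll"]
    document["operation"] = change["operationType"]
    # default=str is a stand-in here for json_util's handling of BSON types.
    return json.dumps(document, default=str)


def forward_inserts(
    changes: Iterable[Dict[str, Any]],
    publish: Callable[[str], None],
) -> int:
    """Publish every insert event and return how many were forwarded."""
    forwarded = 0
    for change in changes:
        # The README says only insert operations are tracked.
        if change.get("operationType") != "insert":
            continue
        publish(serialize_change(change))
        forwarded += 1
    return forwarded
```

In a real deployment, `changes` would presumably be a MongoDB change stream and `publish` a thin wrapper that sends the message body to the "test_queue" queue; here any iterable of event dictionaries and any callable (for example `list.append`) will do, which also makes the logic straightforward to cover in something like `test_cdc.py`.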