Commit
readme for module2
alexandruvesa committed Apr 8, 2024
1 parent 495b8d9 commit 776f31b
Showing 1 changed file with 18 additions and 4 deletions.
22 changes: 18 additions & 4 deletions course/module-2/Readme.md
@@ -7,7 +7,21 @@ Change Data Capture, commonly known as CDC, is an efficient way to track changes
The purpose of CDC is to capture insertions, updates, and deletions applied to a database and to make this change data available in a format easily consumable by downstream applications.
Why do we need the CDC pattern?

1. Real-time Data Syncing: CDC facilitates near-real-time data integration and syncing.
2. Efficient Data Pipelines: It allows incremental data loading, which is more efficient than bulk load operations.
3. Minimized System Impact: CDC minimizes the impact on the source system by reducing the need for performance-intensive queries.
4. Event-Driven Architectures: It enables event-driven architectures by streaming database events.


## Contents
- data_flow module: Singleton class to manage RabbitMQ connection.
  - mq.py
- db module: Singleton class to connect to MongoDB database.
  - mongo.py
- cdc.py:
  - **Functionality**: This script sets up a CDC pipeline. It monitors a MongoDB database for changes and forwards these changes to a RabbitMQ message queue.
  - **MongoDB Connection**: It connects to a MongoDB database (specifically the "scrabble" collection), utilizing a class from `db.mongo`.
  - **Change Tracking**: The script watches for insert operations in MongoDB. When a change is detected, it processes and serializes the change details.
  - **Data Processing**: Each change's metadata is extracted and serialized using `json_util`. The `_id` field is converted to a string and additional metadata is added.
  - **RabbitMQ Integration**: After serialization, the script publishes the data to a RabbitMQ queue named "test_queue".
- test_cdc.py
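
The flow described for `cdc.py` can be sketched roughly as below. This is a minimal illustration and not the repository's actual code: the function names `serialize_change` and `stream_inserts`, the added fields `entry_id` and `type`, the connection parameters, and the choice to treat "scrabble" as a database name are all assumptions. The sketch talks to MongoDB and RabbitMQ directly through `pymongo` and `pika` rather than through the singleton classes in `db.mongo` and `data_flow.mq`.

```python
import json
from typing import Any, Callable, Mapping


def serialize_change(change: Mapping[str, Any],
                     default: Callable[[Any], Any] = str) -> str:
    """Turn one insert change event into a JSON string for the queue."""
    # Work on a copy so the original change event is left untouched.
    document = dict(change["fullDocument"])
    # Convert the _id (an ObjectId in MongoDB) to a plain string.
    # "entry_id" is an assumed field name.
    document["entry_id"] = str(document.pop("_id"))
    # Assumed extra metadata: the collection the document was inserted into.
    document["type"] = change["ns"]["coll"]
    return json.dumps(document, default=default)


def stream_inserts(mongo_uri: str, rabbit_host: str,
                   database_name: str = "scrabble",
                   queue_name: str = "test_queue") -> None:
    """Watch a MongoDB database for inserts and publish each one to RabbitMQ."""
    # Imported lazily so serialize_change works without these packages.
    import pika
    from bson import json_util
    from pymongo import MongoClient

    database = MongoClient(mongo_uri)[database_name]
    connection = pika.BlockingConnection(
        pika.ConnectionParameters(host=rabbit_host))
    channel = connection.channel()
    channel.queue_declare(queue=queue_name)

    # Only insert operations are forwarded.
    # Change streams require a replica set or sharded cluster.
    pipeline = [{"$match": {"operationType": "insert"}}]
    try:
        with database.watch(pipeline) as stream:
            for change in stream:
                body = serialize_change(change, default=json_util.default)
                channel.basic_publish(exchange="",
                                      routing_key=queue_name,
                                      body=body.encode("utf-8"))
    finally:
        connection.close()
```

Keeping the serialization step in its own function means it can be exercised with a hand-built change event, without a running MongoDB or RabbitMQ instance.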
