2 changes: 1 addition & 1 deletion blog/2024-09-16-mongodb-etl-challenges.mdx
@@ -9,7 +9,7 @@ tags: [mongodb,etl ]

# Four Critical MongoDB ETL Challenges and How to tackle them for your Data Lake and Data Warehouse?

-![Mongo db logo showing ETL challenges](/img/blog/cover/mongodb-etl-challenges-cover.webp)
+![Monitor with leaf icon on green grid background, representing MongoDB ETL challenges](/img/blog/cover/mongodb-etl-challenges-cover.webp)


Moving data from MongoDB into a data warehouse or lakehouse for analytics and reporting can be a complex process. 
4 changes: 2 additions & 2 deletions blog/2024-09-24-querying-json-in-snowflake.mdx
@@ -118,7 +118,7 @@ In this query, you're flattening the orders array inside the `customer_data` JSO

**Output:**

-![Snowflake query result showing JSON data extraction with proper formatting](/img/blog/2024/09/querying-json-in-snowflake-3.webp)
+![Database query results table with one row for customer ID "C123", first order "O1001", and total orders as 3.](/img/blog/2024/09/querying-json-in-snowflake-3.webp)

* John doesn't have any orders, so he won't appear in the results.

@@ -1308,7 +1308,7 @@ Now, doing a `SELECT * customer_data`

**OUTPUT**:

-![Database query results table showing a single row with John Doe's customer info (name, age 30, and email) as a JSON object field](/img/blog/2024/09/querying-json-in-snowflake-28.webp)
+![Database query results table showing a single row with John Does customer info (name, age 30, and email) as a JSON object field](/img/blog/2024/09/querying-json-in-snowflake-28.webp)

**Querying the OBJECT**:

2 changes: 1 addition & 1 deletion blog/2024-10-18-flatten-array.mdx
@@ -379,7 +379,7 @@ df = json_normalize( data)

and you’ll be good to go.

-![Terminal showing a row from a DataFrame with id, name, nested projects JSON, and individual contact info columns](/img/blog/2024/11/flatten-array-24.webp)
+![Nested JSON data is transformed so each top-level key maps to a column in a flat table](/img/blog/2024/11/flatten-array-24.webp)

## Method 5: Flattening Nested JSON in PySpark

4 changes: 2 additions & 2 deletions blog/2025-01-07-olake-architecture.mdx
@@ -9,7 +9,7 @@ tags: [olake]

# OLake Architecture, How did we do it?

-![Pipeline diagram: source DB data chunked and routed to Amazon S3, then transformed and written to a lakehouse](/img/blog/cover/olake-architecture-cover.webp)
+![Diagram showing database sync flow: snapshot/CDC extraction, chunking, transform, Amazon S3, and writing to lakehouse](/img/blog/cover/olake-architecture-cover.webp)

update: [18.02.2025]
1. We support S3 data partitioning - refer docs [here](https://olake.io/docs/writers/parquet/partitioning)
@@ -184,7 +184,7 @@ These results prove that with chunk-based parallel loading and direct Writer int

To illustrate how concurrency is handled, here’s a more extended ASCII diagram:

-![Pipeline diagram: source DB data chunked and routed to Amazon S3, then transformed and written to a lakehouse](/img/blog/cover/olake-architecture-cover.webp)
+![Diagram showing database sync flow: snapshot/CDC extraction, chunking, transform, Amazon S3, and writing to lakehouse](/img/blog/cover/olake-architecture-cover.webp)

Each driver/writer pair can independently read chunks from MongoDB and write them directly to the target, while the Core monitors everything centrally.

8 changes: 4 additions & 4 deletions blog/2025-04-23-how-to-set-up-postgresql-cdc-on-aws-rds.mdx
@@ -67,7 +67,7 @@ Access is needed to modify following (please contact your DevOps team who has se
AWS RDS already has a default RDS parameter group as given in the below picture, and you won’t be able to edit the parameters from this group.


-![Amazon RDS parameter groups dashboard showing default MySQL and PostgreSQL parameter groups list](/img/blog/2025/04/how-to-set-up-postgresql-cdc-on-aws-rds-1.webp)
+![Apache Airflow logo with text 'with OLake', illustrating integration between Airflow workflow management and OLake platform](/img/blog/2025/04/how-to-set-up-postgresql-cdc-on-aws-rds-1.webp)

Hence it is advised to create a new parameter group as suggested below.

@@ -80,7 +80,7 @@ Hence it is advised to create a new parameter group as suggested below.
2. Choose the required postgres version


-![Create parameter group screen for PostgreSQL in AWS RDS, with CDC-enabled production setup fields shown](/img/blog/2025/04/how-to-set-up-postgresql-cdc-on-aws-rds-2.webp)
+![Create PostgreSQL parameter group in AWS RDS: prod-cdc-paramgroup for postgres14 with CDC enabled](/img/blog/2025/04/how-to-set-up-postgresql-cdc-on-aws-rds-2.webp)

3. Click on Create and parameter group will be created.

@@ -134,7 +134,7 @@ Everything on RDS runs within virtual private networks, which means we need to c
**Backup Retention Period**: Choose a backup retention period of at least 7 days.
:::

-![AWS RDS additional configuration showing selected DB parameter group and backup retention period option](/img/blog/2025/04/how-to-set-up-postgresql-cdc-on-aws-rds-5.webp)
+![Additional configuration for AWS RDS PostgreSQL instance showing DB parameter group selection (pg15) and backup retention period set to 1 day](/img/blog/2025/04/how-to-set-up-postgresql-cdc-on-aws-rds-5.webp)

* At the bottom, Continue -> Apply immediately -> Modify DB instance.

@@ -149,7 +149,7 @@ select * from pg_settings where name in ('wal_level', 'rds.logical_replication')
```
You should see results like below ( settings , on and logical )

-![SQL query results showing rds.logical_replication set to on and wal_level as logical, with descriptions](/img/blog/2025/04/how-to-set-up-postgresql-cdc-on-aws-rds-4.webp)
+![SQL query showing rds.logical_replication set to on and wal_level set to logical, enabling logical decoding for PostgreSQL](/img/blog/2025/04/how-to-set-up-postgresql-cdc-on-aws-rds-4.webp)

Now we could connect to this database using our Postgres root user. However, best practices are to use a dedicated account which has the minimal set of required privileges for CDC. Use this user credentials to connect to the Postgres source

44 changes: 22 additions & 22 deletions blog/2025-09-04-creating-job-olake-docker-cli.mdx
@@ -36,16 +36,16 @@ We'll take the "job-first" approach. It's straightforward and keeps you in one f
From the left nav, go to **Jobs → Create Job**.
You'll land on a wizard that starts with the **source**.

-![OLake jobs dashboard with the Jobs tab, Create Job button, and Create your first Job button highlighted](/img/docs/getting-started/create-your-first-job/job-create.webp)
+![OLake jobs dashboard for new users with option to create first job highlighted](/img/docs/getting-started/create-your-first-job/job-create.webp)

### 2) Configure the Source (Postgres)

Choose **Set up a new source** → select **Postgres** → keep OLake version at the latest stable.
Name it clearly, fill the Postgres endpoint config, and hit **Test Connection**.

-![OLake Create Job step 2 screen, showing source connector options including Postgres, MongoDB, MySQL, and Oracle, with Postgres highlighted](/img/docs/getting-started/create-your-first-job/job-source-connector.webp)
+![OLake create job interface with new source connector selection for MongoDB, Postgres, MySQL, Oracle](/img/docs/getting-started/create-your-first-job/job-source-connector.webp)

-![OLake Create Job with Postgres source configuration fields and a side help panel with setup steps](/img/docs/getting-started/create-your-first-job/job-source-config.webp)
+![OLake create job screen showing Postgres source endpoint and CDC configuration with setup guide](/img/docs/getting-started/create-your-first-job/job-source-config.webp)

> 📝 **Planning for CDC?**
> Make sure a **replication slot** exists in Postgres.
@@ -56,13 +56,13 @@ Name it clearly, fill the Postgres endpoint config, and hit **Test Connection**.
Now we set where the data will land.
Pick **Apache Iceberg** as the destination, and **AWS Glue** as the catalog.

-![OLake Create Job step 3 destination setup, showing connector selection with Amazon S3 and Apache Iceberg options, and Apache Iceberg highlighted](/img/docs/getting-started/create-your-first-job/job-dest-connector.webp)
+![OLake create job destination step showing Apache Iceberg and Amazon S3 connector selection](/img/docs/getting-started/create-your-first-job/job-dest-connector.webp)

-![OLake Create Job destination setup for Apache Iceberg, with Catalog Type dropdown showing AWS Glue, JDBC, Hive, and REST options](/img/docs/getting-started/create-your-first-job/job-dest-catalog.webp)
+![OLake create job destination endpoint config with catalog type selection AWS Glue JDBC Hive REST](/img/docs/getting-started/create-your-first-job/job-dest-catalog.webp)

Provide the connection details and **Test Connection**.

-![OLake Create Job destination config for Apache Iceberg with AWS Glue; right panel shows AWS Glue Catalog Write Guide with setup and prerequisites](/img/docs/getting-started/create-your-first-job/job-dest-config.webp)
+![OLake create job destination setup with Apache Iceberg, AWS Glue catalog, and S3 configuration form](/img/docs/getting-started/create-your-first-job/job-dest-config.webp)

### 4) Configure Streams

@@ -76,50 +76,50 @@ For this walkthrough, we'll:
- **Partitioning:** by **year** extracted from `dropoff_datetime`
- **Schedule:** every day at **12:00 AM**

-![OLake streams selection, employee_data and other tables checked, sync mode set to Full Refresh + CDC](/img/docs/getting-started/create-your-first-job/job-streams.webp)
+![OLake stream selection UI for Postgres to Iceberg job with Full Refresh + CDC mode](/img/docs/getting-started/create-your-first-job/job-streams.webp)

Select the checkbox for `fivehundred`, then click the stream name to open stream settings.
Pick the sync mode and toggle **Normalization**.

-![OLake streams- only five hundred selected, Full Refresh + CDC mode](/img/docs/getting-started/create-your-first-job/job-stream-select.webp)
+![OLake create job stream selection for Postgres to Iceberg with Full Refresh + CDC on fivehundred](/img/docs/getting-started/create-your-first-job/job-stream-select.webp)

Let's make the destination query-friendly. Open **Partitioning** → choose `dropoff_datetime` → **year**.
Want more? Read the [Partitioning Guide](/docs/writers/parquet/partitioning).

-![OLake: fivehundred stream selected, partition by dropoff_datetime and year](/img/docs/getting-started/create-your-first-job/job-stream-partition.webp)
+![OLake partitioning UI for stream fivehundred using dropoff_datetime and year fields in Iceberg](/img/docs/getting-started/create-your-first-job/job-stream-partition.webp)

Add the **Data Filter** so we only move rows from 2010 onward.

-![OLake: fivehundred stream, filter dropoff_datetime >= 2010-01-01](/img/docs/getting-started/create-your-first-job/job-data-filter.webp)
+![OLake create job with data filter for Postgres to Iceberg pipeline on dropoff_datetime column](/img/docs/getting-started/create-your-first-job/job-data-filter.webp)

Click **Next** to continue.

### 5) Schedule the Job

Give the job a clear name, set **Every Day @ 12:00 AM**, and hit **Create Job**.

-![OLake Create Job page showing step 1, with job name, frequency dropdown (Every Day highlighted), and job start time settings](/img/docs/getting-started/create-your-first-job/job-schedule.webp)
+![OLake create job stream filter UI for Postgres to Iceberg pipeline using dropoff_datetime column and operators](/img/docs/getting-started/create-your-first-job/job-schedule.webp)

You're set! 🎉

-![OLake job created successfully for fivehundred stream, Full Refresh + CDC](/img/docs/getting-started/create-your-first-job/job-creation-success.webp)
+![OLake job creation success dialog for Postgres to Iceberg ETL pipeline](/img/docs/getting-started/create-your-first-job/job-creation-success.webp)

Want results right away? Start a run immediately with **Jobs → (⋮) → Sync Now**.

-![Active jobs screen for OLake with job options menu expanded.](/img/docs/getting-started/create-your-first-job/job-sync-now.webp)
+![OLake jobs dashboard with actions menu for sync, edit streams, pause, logs, settings, delete](/img/docs/getting-started/create-your-first-job/job-sync-now.webp)

You'll see status badges on the right (**Running / Failed / Completed**).
For more details, open **Job Logs & History**.

- Running
-![OLake active jobs screen showing a running job](/img/docs/getting-started/create-your-first-job/job-running.webp)
+![OLake jobs dashboard showing active job status as running for Postgres to Iceberg pipeline](/img/docs/getting-started/create-your-first-job/job-running.webp)

- Completed
-![OLake active jobs screen showing a completed job](/img/docs/getting-started/create-your-first-job/job-success.webp)
+![OLake jobs dashboard showing completed status for Postgres to Iceberg pipeline job](/img/docs/getting-started/create-your-first-job/job-success.webp)

Finally, verify that data landed in S3/Iceberg as configured:

-![Amazon S3 folder view showing two Parquet files under dropoff_datetime_year=2011](/img/docs/getting-started/create-your-first-job/job-data-s3.webp)
+![Amazon S3 browser showing parquet files for dropoff_datetime_year=2011 partition folder](/img/docs/getting-started/create-your-first-job/job-data-s3.webp)

### 6) Manage Your Job (from the Jobs page)

@@ -128,29 +128,29 @@ Finally, verify that data landed in S3/Iceberg as configured:
**Edit Streams** — Change which streams are included and tweak replication settings.
Use the stepper to jump between **Source** and **Destination**.

-![Stream selection screen for OLake Postgres Iceberg job, with S3 folder and sync steps shown](/img/docs/getting-started/create-your-first-job/job-edit-streams-page.webp)
+![OLake Postgres Iceberg job UI with stepper showing Job Config, Source, Destination, Streams steps](/img/docs/getting-started/create-your-first-job/job-edit-streams-page.webp)

> By default, source/destination editing is locked. Click **Edit** to unlock.

-![OLake Postgres Iceberg job destination config with AWS Glue setup and edit option](/img/docs/getting-started/create-your-first-job/job-edit-destination.webp)
+![OLake destination config edit screen for Postgres Iceberg job with AWS Glue write guide](/img/docs/getting-started/create-your-first-job/job-edit-destination.webp)

> 🔄 **Need to change Partitioning / Filter / Normalization for an existing stream?**
> Unselect the stream → **Save** → reopen **Edit Streams** → re-add it with new settings.

**Pause Job** — Temporarily stop runs. You'll find paused jobs under **Inactive Jobs**, where you can **Resume** any time.

-![Inactive jobs tab showing a PostgreSQL job with the option to resume in the OLake UI](/img/docs/getting-started/create-your-first-job/job-resume.webp)
+![OLake inactive jobs list with menu showing resume job option for Postgres Iceberg pipeline](/img/docs/getting-started/create-your-first-job/job-resume.webp)

**Job Logs & History** — See all runs. Use **View Logs** for per-run details.

-![Job log history for a Postgres Iceberg job, showing a completed status and option to view logs.](/img/docs/getting-started/create-your-first-job/view-logs.webp)
+![OLake Postgres Iceberg job logs history screen showing completed run and view logs action](/img/docs/getting-started/create-your-first-job/view-logs.webp)

-![OLake Postgres Iceberg job logs showing system info and sync steps with Iceberg writer and Postgres source.](/img/docs/getting-started/create-your-first-job/logs-page.webp)
+![OLake job logs screen displaying detailed execution logs for Postgres to Iceberg sync job](/img/docs/getting-started/create-your-first-job/logs-page.webp)

**Job Settings** — Rename, change frequency, pause, or delete.
Deleting a job moves its source/destination to **inactive** (if not used elsewhere).

-![Active Postgres Iceberg job settings screen; job runs daily at 12 AM UTC with pause and delete options](/img/docs/getting-started/create-your-first-job/job-settings.webp)
+![OLake job settings screen showing scheduling, pause and delete options for Postgres Iceberg job](/img/docs/getting-started/create-your-first-job/job-settings.webp)

## Option B — OLake CLI (Docker)

2 changes: 1 addition & 1 deletion blog/2025-09-04-deletion-formats-deep-dive.mdx
@@ -38,7 +38,7 @@ This metadata layer consists of:
- **Manifest files** that contain information about data files and their statistics
- **Data files** where your actual data lives in formats like Parquet or Avro

-![OLake architecture diagram with connectors between user, database, and lakehouse](/img/blog/2025/11/architecture.webp)
+![MongoDB operational database to Apache Iceberg analytical lakehouse migration](/img/blog/2025/11/architecture.webp)

This layered architecture is what makes Iceberg so powerful. When you want to query your data, the engine doesn't need to scan directories or enumerate files; it simply reads the metadata to understand exactly which data files contain the information you need.

8 changes: 4 additions & 4 deletions blog/2025-09-07-how-to-set-up-postgres-apache-iceberg.mdx
@@ -164,7 +164,7 @@ Configure your Apache Iceberg destination in the OLake UI:

OLake supports multiple Iceberg catalog implementations including Glue, Nessie, Polaris, Hive, and Unity Catalog. For detailed configuration of other catalogs, refer to the [OLake Catalogs Documentation](https://olake.io/docs/writers/iceberg/catalog/rest/).

-![OLake destination setup UI for Apache Iceberg with AWS Glue catalog configuration form](/img/blog/2025/12/step-4.webp)
+![OLake UI create destination screen for Apache Iceberg AWS Glue catalog configuration](/img/blog/2025/12/step-4.webp)

### Step 5: Create and Configure Your Replication Job

@@ -177,7 +177,7 @@ Once source and destination connections are established:
3. Select your existing source and destination configurations
4. In the schema section, choose tables/streams for Iceberg synchronization

-![OLake create job UI selecting existing Postgres data source for pipeline setup](/img/blog/2025/12/step-5-1.webp)
+![OLake create job wizard selecting MongoDB source from existing connectors](/img/blog/2025/12/step-5-1.webp)

#### Choose Synchronization Mode

@@ -194,7 +194,7 @@ For each stream, select the appropriate sync mode based on your requirements:
- **Partitioning**: Configure regex patterns for Iceberg table partitioning
- **Detailed partitioning strategies**: [Iceberg Partitioning Guide](https://olake.io/docs/writers/iceberg/partitioning)

-![OLake stream selection step with Full Refresh + CDC sync for dz-stag-users table](/img/blog/2025/12/step-5-2.webp)
+![OLake job stream selection UI picking tables and configuring CDC sync mode](/img/blog/2025/12/step-5-2.webp)

### Step 6: Execute Your Synchronization

@@ -233,7 +233,7 @@ To validate your replication setup, configure AWS Athena for querying your Icebe
2. Execute SQL queries against your replicated Iceberg tables
3. Verify data consistency and query performance

-![Amazon Athena editor querying olake_test_table with SQL SELECT and results](/img/blog/2025/12/step-7.webp)
+![Amazon Athena query editor showing SQL SELECT on olake_test_table with results](/img/blog/2025/12/step-7.webp)

## Production-Ready Best Practices for PostgreSQL to Iceberg Replication
