diff --git a/blog/2024-09-16-mongodb-etl-challenges.mdx b/blog/2024-09-16-mongodb-etl-challenges.mdx index 704f3af0..4f9cb2a3 100644 --- a/blog/2024-09-16-mongodb-etl-challenges.mdx +++ b/blog/2024-09-16-mongodb-etl-challenges.mdx @@ -9,7 +9,7 @@ tags: [mongodb,etl ] # Four Critical MongoDB ETL Challenges and How to tackle them for your Data Lake and Data Warehouse? -![Mongo db logo showing ETL challenges](/img/blog/cover/mongodb-etl-challenges-cover.webp) +![Monitor with leaf icon on green grid background, representing MongoDB ETL challenges](/img/blog/cover/mongodb-etl-challenges-cover.webp) Moving data from MongoDB into a data warehouse or lakehouse for analytics and reporting can be a complex process.  diff --git a/blog/2024-09-24-querying-json-in-snowflake.mdx b/blog/2024-09-24-querying-json-in-snowflake.mdx index 9e886b95..e27d1840 100644 --- a/blog/2024-09-24-querying-json-in-snowflake.mdx +++ b/blog/2024-09-24-querying-json-in-snowflake.mdx @@ -118,7 +118,7 @@ In this query, you're flattening the orders array inside the `customer_data` JSO **Output:** -![Snowflake query result showing JSON data extraction with proper formatting](/img/blog/2024/09/querying-json-in-snowflake-3.webp) +![Database query results table with one row for customer ID "C123", first order "O1001", and total orders as 3.](/img/blog/2024/09/querying-json-in-snowflake-3.webp) * John doesn't have any orders, so he won't appear in the results. 
@@ -1308,7 +1308,7 @@ Now, doing a `SELECT * customer_data` **OUTPUT**: -![Database query results table showing a single row with John Doe's customer info (name, age 30, and email) as a JSON object field](/img/blog/2024/09/querying-json-in-snowflake-28.webp) +![Database query results table showing a single row with John Doe’s customer info (name, age 30, and email) as a JSON object field](/img/blog/2024/09/querying-json-in-snowflake-28.webp) **Querying the OBJECT**: diff --git a/blog/2024-10-18-flatten-array.mdx b/blog/2024-10-18-flatten-array.mdx index 771cf6b7..eaa3b2b0 100644 --- a/blog/2024-10-18-flatten-array.mdx +++ b/blog/2024-10-18-flatten-array.mdx @@ -379,7 +379,7 @@ df = json_normalize( data) and you’ll be good to go. -![Terminal showing a row from a DataFrame with id, name, nested projects JSON, and individual contact info columns](/img/blog/2024/11/flatten-array-24.webp) +![Nested JSON data is transformed so each top-level key maps to a column in a flat table](/img/blog/2024/11/flatten-array-24.webp) ## Method 5: Flattening Nested JSON in PySpark diff --git a/blog/2025-01-07-olake-architecture.mdx b/blog/2025-01-07-olake-architecture.mdx index ade0d032..24f40836 100644 --- a/blog/2025-01-07-olake-architecture.mdx +++ b/blog/2025-01-07-olake-architecture.mdx @@ -9,7 +9,7 @@ tags: [olake] # OLake Architecture, How did we do it? -![Pipeline diagram: source DB data chunked and routed to Amazon S3, then transformed and written to a lakehouse](/img/blog/cover/olake-architecture-cover.webp) +![Diagram showing database sync flow: snapshot/CDC extraction, chunking, transform, Amazon S3, and writing to lakehouse](/img/blog/cover/olake-architecture-cover.webp) update: [18.02.2025] 1. 
We support S3 data partitioning - refer docs [here](https://olake.io/docs/writers/parquet/partitioning) @@ -184,7 +184,7 @@ These results prove that with chunk-based parallel loading and direct Writer int To illustrate how concurrency is handled, here’s a more extended ASCII diagram: -![Pipeline diagram: source DB data chunked and routed to Amazon S3, then transformed and written to a lakehouse](/img/blog/cover/olake-architecture-cover.webp) +![Diagram showing database sync flow: snapshot/CDC extraction, chunking, transform, Amazon S3, and writing to lakehouse](/img/blog/cover/olake-architecture-cover.webp) Each driver/writer pair can independently read chunks from MongoDB and write them directly to the target, while the Core monitors everything centrally. diff --git a/blog/2025-04-23-how-to-set-up-postgresql-cdc-on-aws-rds.mdx b/blog/2025-04-23-how-to-set-up-postgresql-cdc-on-aws-rds.mdx index 8ffe2cf4..06b12eed 100644 --- a/blog/2025-04-23-how-to-set-up-postgresql-cdc-on-aws-rds.mdx +++ b/blog/2025-04-23-how-to-set-up-postgresql-cdc-on-aws-rds.mdx @@ -67,7 +67,7 @@ Access is needed to modify following (please contact your DevOps team who has se AWS RDS already has a default RDS parameter group as given in the below picture, and you won’t be able to edit the parameters from this group. -![Amazon RDS parameter groups dashboard showing default MySQL and PostgreSQL parameter groups list](/img/blog/2025/04/how-to-set-up-postgresql-cdc-on-aws-rds-1.webp) +![Amazon RDS parameter groups dashboard listing the default, non-editable MySQL and PostgreSQL parameter groups](/img/blog/2025/04/how-to-set-up-postgresql-cdc-on-aws-rds-1.webp) Hence it is advised to create a new parameter group as suggested below. @@ -80,7 +80,7 @@ Hence it is advised to create a new parameter group as suggested below. 2. 
Choose the required postgres version -![Create parameter group screen for PostgreSQL in AWS RDS, with CDC-enabled production setup fields shown](/img/blog/2025/04/how-to-set-up-postgresql-cdc-on-aws-rds-2.webp) +![Create PostgreSQL parameter group in AWS RDS: prod-cdc-paramgroup for postgres14 with CDC enabled](/img/blog/2025/04/how-to-set-up-postgresql-cdc-on-aws-rds-2.webp) 3. Click on Create and parameter group will be created. @@ -134,7 +134,7 @@ Everything on RDS runs within virtual private networks, which means we need to c **Backup Retention Period**: Choose a backup retention period of at least 7 days. ::: -![AWS RDS additional configuration showing selected DB parameter group and backup retention period option](/img/blog/2025/04/how-to-set-up-postgresql-cdc-on-aws-rds-5.webp) +![Additional configuration for AWS RDS PostgreSQL instance showing DB parameter group selection (pg15) and backup retention period set to 1 day](/img/blog/2025/04/how-to-set-up-postgresql-cdc-on-aws-rds-5.webp) * At the bottom, Continue -> Apply immediately -> Modify DB instance. @@ -149,7 +149,7 @@ select * from pg_settings where name in ('wal_level', 'rds.logical_replication') ``` You should see results like below ( settings , on and logical ) -![SQL query results showing rds.logical_replication set to on and wal_level as logical, with descriptions](/img/blog/2025/04/how-to-set-up-postgresql-cdc-on-aws-rds-4.webp) +![SQL query showing rds.logical_replication set to on and wal_level set to logical, enabling logical decoding for PostgreSQL](/img/blog/2025/04/how-to-set-up-postgresql-cdc-on-aws-rds-4.webp) Now we could connect to this database using our Postgres root user. However, best practices are to use a dedicated account which has the minimal set of required privileges for CDC. 
Use this user credentials to connect to the Postgres source diff --git a/blog/2025-09-04-creating-job-olake-docker-cli.mdx b/blog/2025-09-04-creating-job-olake-docker-cli.mdx index 6ab9390a..38e44fa3 100644 --- a/blog/2025-09-04-creating-job-olake-docker-cli.mdx +++ b/blog/2025-09-04-creating-job-olake-docker-cli.mdx @@ -36,16 +36,16 @@ We'll take the "job-first" approach. It's straightforward and keeps you in one f From the left nav, go to **Jobs → Create Job**. You'll land on a wizard that starts with the **source**. -![OLake jobs dashboard with the Jobs tab, Create Job button, and Create your first Job button highlighted](/img/docs/getting-started/create-your-first-job/job-create.webp) +![OLake jobs dashboard for new users with option to create first job highlighted](/img/docs/getting-started/create-your-first-job/job-create.webp) ### 2) Configure the Source (Postgres) Choose **Set up a new source** → select **Postgres** → keep OLake version at the latest stable. Name it clearly, fill the Postgres endpoint config, and hit **Test Connection**. -![OLake Create Job step 2 screen, showing source connector options including Postgres, MongoDB, MySQL, and Oracle, with Postgres highlighted](/img/docs/getting-started/create-your-first-job/job-source-connector.webp) +![OLake create job interface with new source connector selection for MongoDB, Postgres, MySQL, Oracle](/img/docs/getting-started/create-your-first-job/job-source-connector.webp) -![OLake Create Job with Postgres source configuration fields and a side help panel with setup steps](/img/docs/getting-started/create-your-first-job/job-source-config.webp) +![OLake create job screen showing Postgres source endpoint and CDC configuration with setup guide](/img/docs/getting-started/create-your-first-job/job-source-config.webp) > 📝 **Planning for CDC?** > Make sure a **replication slot** exists in Postgres. @@ -56,13 +56,13 @@ Name it clearly, fill the Postgres endpoint config, and hit **Test Connection**. 
Now we set where the data will land. Pick **Apache Iceberg** as the destination, and **AWS Glue** as the catalog. -![OLake Create Job step 3 destination setup, showing connector selection with Amazon S3 and Apache Iceberg options, and Apache Iceberg highlighted](/img/docs/getting-started/create-your-first-job/job-dest-connector.webp) +![OLake create job destination step showing Apache Iceberg and Amazon S3 connector selection](/img/docs/getting-started/create-your-first-job/job-dest-connector.webp) -![OLake Create Job destination setup for Apache Iceberg, with Catalog Type dropdown showing AWS Glue, JDBC, Hive, and REST options](/img/docs/getting-started/create-your-first-job/job-dest-catalog.webp) +![OLake create job destination endpoint config with catalog type selection AWS Glue JDBC Hive REST](/img/docs/getting-started/create-your-first-job/job-dest-catalog.webp) Provide the connection details and **Test Connection**. -![OLake Create Job destination config for Apache Iceberg with AWS Glue; right panel shows AWS Glue Catalog Write Guide with setup and prerequisites](/img/docs/getting-started/create-your-first-job/job-dest-config.webp) +![OLake create job destination setup with Apache Iceberg, AWS Glue catalog, and S3 configuration form](/img/docs/getting-started/create-your-first-job/job-dest-config.webp) ### 4) Configure Streams @@ -76,21 +76,21 @@ For this walkthrough, we'll: - **Partitioning:** by **year** extracted from `dropoff_datetime` - **Schedule:** every day at **12:00 AM** -![OLake streams selection, employee_data and other tables checked, sync mode set to Full Refresh + CDC](/img/docs/getting-started/create-your-first-job/job-streams.webp) +![OLake stream selection UI for Postgres to Iceberg job with Full Refresh + CDC mode](/img/docs/getting-started/create-your-first-job/job-streams.webp) Select the checkbox for `fivehundred`, then click the stream name to open stream settings. Pick the sync mode and toggle **Normalization**. 
-![OLake streams- only five hundred selected, Full Refresh + CDC mode](/img/docs/getting-started/create-your-first-job/job-stream-select.webp) +![OLake create job stream selection for Postgres to Iceberg with Full Refresh + CDC on fivehundred](/img/docs/getting-started/create-your-first-job/job-stream-select.webp) Let's make the destination query-friendly. Open **Partitioning** → choose `dropoff_datetime` → **year**. Want more? Read the [Partitioning Guide](/docs/writers/parquet/partitioning). -![OLake: fivehundred stream selected, partition by dropoff_datetime and year](/img/docs/getting-started/create-your-first-job/job-stream-partition.webp) +![OLake partitioning UI for stream fivehundred using dropoff_datetime and year fields in Iceberg](/img/docs/getting-started/create-your-first-job/job-stream-partition.webp) Add the **Data Filter** so we only move rows from 2010 onward. -![OLake: fivehundred stream, filter dropoff_datetime >= 2010-01-01](/img/docs/getting-started/create-your-first-job/job-data-filter.webp) +![OLake create job with data filter for Postgres to Iceberg pipeline on dropoff_datetime column](/img/docs/getting-started/create-your-first-job/job-data-filter.webp) Click **Next** to continue. @@ -98,28 +98,28 @@ Click **Next** to continue. Give the job a clear name, set **Every Day @ 12:00 AM**, and hit **Create Job**. -![OLake Create Job page showing step 1, with job name, frequency dropdown (Every Day highlighted), and job start time settings](/img/docs/getting-started/create-your-first-job/job-schedule.webp) +![OLake create job scheduling step with job name, frequency set to Every Day, and 12:00 AM start time](/img/docs/getting-started/create-your-first-job/job-schedule.webp) You're set! 
🎉 -![OLake job created successfully for fivehundred stream, Full Refresh + CDC](/img/docs/getting-started/create-your-first-job/job-creation-success.webp) +![OLake job creation success dialog for Postgres to Iceberg ETL pipeline](/img/docs/getting-started/create-your-first-job/job-creation-success.webp) Want results right away? Start a run immediately with **Jobs → (⋮) → Sync Now**. -![Active jobs screen for OLake with job options menu expanded.](/img/docs/getting-started/create-your-first-job/job-sync-now.webp) +![OLake jobs dashboard with actions menu for sync, edit streams, pause, logs, settings, delete](/img/docs/getting-started/create-your-first-job/job-sync-now.webp) You'll see status badges on the right (**Running / Failed / Completed**). For more details, open **Job Logs & History**. - Running - ![OLake active jobs screen showing a running job](/img/docs/getting-started/create-your-first-job/job-running.webp) + ![OLake jobs dashboard showing active job status as running for Postgres to Iceberg pipeline](/img/docs/getting-started/create-your-first-job/job-running.webp) - Completed - ![OLake active jobs screen showing a completed job](/img/docs/getting-started/create-your-first-job/job-success.webp) + ![OLake jobs dashboard showing completed status for Postgres to Iceberg pipeline job](/img/docs/getting-started/create-your-first-job/job-success.webp) Finally, verify that data landed in S3/Iceberg as configured: -![Amazon S3 folder view showing two Parquet files under dropoff_datetime_year=2011](/img/docs/getting-started/create-your-first-job/job-data-s3.webp) +![Amazon S3 browser showing parquet files for dropoff_datetime_year=2011 partition folder](/img/docs/getting-started/create-your-first-job/job-data-s3.webp) ### 6) Manage Your Job (from the Jobs page) @@ -128,29 +128,29 @@ Finally, verify that data landed in S3/Iceberg as configured: **Edit Streams** — Change which streams are included and tweak replication settings. 
Use the stepper to jump between **Source** and **Destination**. -![Stream selection screen for OLake Postgres Iceberg job, with S3 folder and sync steps shown](/img/docs/getting-started/create-your-first-job/job-edit-streams-page.webp) +![OLake Postgres Iceberg job UI with stepper showing Job Config, Source, Destination, Streams steps](/img/docs/getting-started/create-your-first-job/job-edit-streams-page.webp) > By default, source/destination editing is locked. Click **Edit** to unlock. -![OLake Postgres Iceberg job destination config with AWS Glue setup and edit option](/img/docs/getting-started/create-your-first-job/job-edit-destination.webp) +![OLake destination config edit screen for Postgres Iceberg job with AWS Glue write guide](/img/docs/getting-started/create-your-first-job/job-edit-destination.webp) > 🔄 **Need to change Partitioning / Filter / Normalization for an existing stream?** > Unselect the stream → **Save** → reopen **Edit Streams** → re-add it with new settings. **Pause Job** — Temporarily stop runs. You'll find paused jobs under **Inactive Jobs**, where you can **Resume** any time. -![Inactive jobs tab showing a PostgreSQL job with the option to resume in the OLake UI](/img/docs/getting-started/create-your-first-job/job-resume.webp) +![OLake inactive jobs list with menu showing resume job option for Postgres Iceberg pipeline](/img/docs/getting-started/create-your-first-job/job-resume.webp) **Job Logs & History** — See all runs. Use **View Logs** for per-run details. 
-![Job log history for a Postgres Iceberg job, showing a completed status and option to view logs.](/img/docs/getting-started/create-your-first-job/view-logs.webp) +![OLake Postgres Iceberg job logs history screen showing completed run and view logs action](/img/docs/getting-started/create-your-first-job/view-logs.webp) -![OLake Postgres Iceberg job logs showing system info and sync steps with Iceberg writer and Postgres source.](/img/docs/getting-started/create-your-first-job/logs-page.webp) +![OLake job logs screen displaying detailed execution logs for Postgres to Iceberg sync job](/img/docs/getting-started/create-your-first-job/logs-page.webp) **Job Settings** — Rename, change frequency, pause, or delete. Deleting a job moves its source/destination to **inactive** (if not used elsewhere). -![Active Postgres Iceberg job settings screen; job runs daily at 12 AM UTC with pause and delete options](/img/docs/getting-started/create-your-first-job/job-settings.webp) +![OLake job settings screen showing scheduling, pause and delete options for Postgres Iceberg job](/img/docs/getting-started/create-your-first-job/job-settings.webp) ## Option B — OLake CLI (Docker) diff --git a/blog/2025-09-04-deletion-formats-deep-dive.mdx b/blog/2025-09-04-deletion-formats-deep-dive.mdx index 74f25aac..527cf1da 100644 --- a/blog/2025-09-04-deletion-formats-deep-dive.mdx +++ b/blog/2025-09-04-deletion-formats-deep-dive.mdx @@ -38,7 +38,7 @@ This metadata layer consists of: - **Manifest files** that contain information about data files and their statistics - **Data files** where your actual data lives in formats like Parquet or Avro -![OLake architecture diagram with connectors between user, database, and lakehouse](/img/blog/2025/11/architecture.webp) +![OLake architecture diagram showing connectors between source databases and an Apache Iceberg lakehouse](/img/blog/2025/11/architecture.webp) This layered architecture is what makes Iceberg so powerful. 
When you want to query your data, the engine doesn't need to scan directories or enumerate files; it simply reads the metadata to understand exactly which data files contain the information you need. diff --git a/blog/2025-09-07-how-to-set-up-postgres-apache-iceberg.mdx b/blog/2025-09-07-how-to-set-up-postgres-apache-iceberg.mdx index e07990ef..cb600ae0 100644 --- a/blog/2025-09-07-how-to-set-up-postgres-apache-iceberg.mdx +++ b/blog/2025-09-07-how-to-set-up-postgres-apache-iceberg.mdx @@ -164,7 +164,7 @@ Configure your Apache Iceberg destination in the OLake UI: OLake supports multiple Iceberg catalog implementations including Glue, Nessie, Polaris, Hive, and Unity Catalog. For detailed configuration of other catalogs, refer to the [OLake Catalogs Documentation](https://olake.io/docs/writers/iceberg/catalog/rest/). -![OLake destination setup UI for Apache Iceberg with AWS Glue catalog configuration form](/img/blog/2025/12/step-4.webp) +![OLake UI create destination screen for Apache Iceberg AWS Glue catalog configuration](/img/blog/2025/12/step-4.webp) ### Step 5: Create and Configure Your Replication Job @@ -177,7 +177,7 @@ Once source and destination connections are established: 3. Select your existing source and destination configurations 4. 
In the schema section, choose tables/streams for Iceberg synchronization -![OLake create job UI selecting existing Postgres data source for pipeline setup](/img/blog/2025/12/step-5-1.webp) +![OLake create job wizard selecting Postgres source from existing connectors](/img/blog/2025/12/step-5-1.webp) #### Choose Synchronization Mode @@ -194,7 +194,7 @@ For each stream, select the appropriate sync mode based on your requirements: - **Partitioning**: Configure regex patterns for Iceberg table partitioning - **Detailed partitioning strategies**: [Iceberg Partitioning Guide](https://olake.io/docs/writers/iceberg/partitioning) -![OLake stream selection step with Full Refresh + CDC sync for dz-stag-users table](/img/blog/2025/12/step-5-2.webp) +![OLake job stream selection UI picking tables and configuring CDC sync mode](/img/blog/2025/12/step-5-2.webp) ### Step 6: Execute Your Synchronization @@ -233,7 +233,7 @@ To validate your replication setup, configure AWS Athena for querying your Icebe 2. Execute SQL queries against your replicated Iceberg tables 3. Verify data consistency and query performance -![Amazon Athena editor querying olake_test_table with SQL SELECT and results](/img/blog/2025/12/step-7.webp) +![Amazon Athena query editor showing SQL SELECT on olake_test_table with results](/img/blog/2025/12/step-7.webp) ## Production-Ready Best Practices for PostgreSQL to Iceberg Replication diff --git a/blog/2025-09-09-mysql-to-apache-iceberg-replication.mdx b/blog/2025-09-09-mysql-to-apache-iceberg-replication.mdx index 75b11ed6..cb5315b1 100644 --- a/blog/2025-09-09-mysql-to-apache-iceberg-replication.mdx +++ b/blog/2025-09-09-mysql-to-apache-iceberg-replication.mdx @@ -221,7 +221,7 @@ In the OLake UI, navigate to **Sources → Add Source → MySQL**. OLake automatically optimizes data chunking strategies for MySQL, using primary key-based chunking for maximum performance during initial loads and incremental sync operations. 
-![OLake UI create source configuration screen for MySQL connector and endpoint details](/img/blog/2025/13/step-3.webp) +![OLake platform setup source configuration UI for MySQL connector](/img/blog/2025/13/step-3.webp) ### Step 4: Configure Apache Iceberg Destination (AWS Glue) @@ -240,7 +240,7 @@ Configure your Iceberg destination in the OLake UI for seamless lakehouse integr **Detailed Configuration Guide**: [Glue Catalog Setup](https://olake.io/docs/writers/iceberg/catalog/glue) **Alternative Catalogs**: For REST catalogs (Lakekeeper, Polaris) and other options: [Catalog Configuration Documentation](https://olake.io/docs/connectors) -![OLake destination setup UI for Apache Iceberg with AWS Glue catalog configuration form](/img/blog/2025/13/step-4.webp) +![OLake UI create destination screen for Apache Iceberg AWS Glue catalog configuration](/img/blog/2025/13/step-4.webp) ### Step 5: Create Replication Job and Configure Tables @@ -265,9 +265,9 @@ Once your source and destination connections are established, create and configu **Comprehensive partitioning strategies**: [Iceberg Partitioning Guide](https://olake.io/docs/writers/iceberg/partitioning) -![OLake create job UI selecting existing Postgres data source for pipeline setup](/img/blog/2025/13/step-5-1.webp) +![OLake create job wizard selecting MySQL source from existing connectors](/img/blog/2025/13/step-5-1.webp) -![OLake stream selection step with Full Refresh + CDC sync for dz-stag-users table](/img/blog/2025/13/step-5-2.webp) +![OLake job stream selection UI picking tables and configuring CDC sync mode](/img/blog/2025/13/step-5-2.webp) ### Step 6: Execute Your MySQL to Iceberg Sync @@ -296,11 +296,11 @@ s3://your-bucket/ └── ... 
``` -![OLake jobs dashboard with job actions, sync status, and source-destination mapping](/img/blog/2025/13/step-6-1.webp) +![OLake jobs dashboard showing active sync jobs, sources, destinations, and statuses](/img/blog/2025/13/step-6-1.webp) **Default File Formats**: OLake stores data files as Parquet format with metadata in JSON and Avro formats, following Apache Iceberg specifications for optimal query performance. -![Amazon S3 test-olake-pg bucket UI showing folders and object inventory options](/img/blog/2025/13/step-6-2.webp) +![Amazon S3 test-olake-pg bucket UI showing Iceberg data and metadata folders](/img/blog/2025/13/step-6-2.webp) **Data Organization**: Within the ".db" folder, you'll find tables synced from MySQL source. OLake normalizes column, table, and schema names to ensure compatibility with Glue catalog writing restrictions. @@ -326,7 +326,7 @@ Validate your MySQL to Iceberg migration by configuring AWS Athena for direct qu - Direct Iceberg table access with full metadata support - Integration with BI tools like QuickSight, Tableau, and Power BI -![Amazon Athena editor querying olake_test_table with SQL SELECT and results](/img/blog/2025/13/step-7.webp) +![Amazon Athena query editor showing SQL SELECT on olake_test_table with results](/img/blog/2025/13/step-7.webp) ## Production Best Practices for MySQL to Iceberg Replication diff --git a/blog/2025-09-10-how-to-set-up-mongodb-apache-iceberg.mdx b/blog/2025-09-10-how-to-set-up-mongodb-apache-iceberg.mdx index eb62a25f..ea0d3e4f 100644 --- a/blog/2025-09-10-how-to-set-up-mongodb-apache-iceberg.mdx +++ b/blog/2025-09-10-how-to-set-up-mongodb-apache-iceberg.mdx @@ -91,7 +91,7 @@ These challenges are why many DIY approaches get complicated quickly. 
Tools like ## Step-by-Step MongoDB to Iceberg Migration Workflow with OLake -![OLake architecture diagram with connectors between user, database, and lakehouse](/img/blog/2025/14/architecture.webp) +![MongoDB operational database to Apache Iceberg analytical lakehouse migration](/img/blog/2025/14/architecture.webp) ### How MongoDB to Iceberg Replication Works @@ -186,7 +186,7 @@ In the OLake UI, navigate to **Sources → Add Source → MongoDB**. OLake automatically optimizes data processing strategies for MongoDB, using efficient Change Streams processing for maximum performance during incremental sync operations. -![OLake UI create source configuration screen for MySQL connector and endpoint details](/img/blog/2025/14/step-3.webp) +![OLake platform setup source configuration UI for MongoDB connector](/img/blog/2025/14/step-3.webp) ### Step 4: Configure Apache Iceberg Destination (AWS Glue) @@ -206,7 +206,7 @@ Configure your Iceberg destination in the OLake UI for seamless lakehouse integr **Alternative Catalogs**: For REST catalogs (Lakekeeper, Polaris) and other options: [Catalog Configuration Documentation](https://olake.io/docs/connectors) -![OLake destination setup UI for Apache Iceberg with AWS Glue catalog configuration form](/img/blog/2025/14/step-4.webp) +![OLake UI create destination screen for Apache Iceberg AWS Glue catalog configuration](/img/blog/2025/14/step-4.webp) ### Step 5: Create Replication Job and Configure Collections @@ -218,9 +218,9 @@ Once your source and destination connections are established, create and configu 3. Select existing source and destination configurations 4. 
Choose collections/streams for Iceberg synchronization in schema section -![OLake create job UI selecting existing Postgres data source for pipeline setup](/img/blog/2025/14/step-5-1.webp) +![OLake create job wizard selecting MongoDB source from existing connectors](/img/blog/2025/14/step-5-1.webp) -![OLake stream selection step with Full Refresh + CDC sync for dz-stag-users table](/img/blog/2025/14/step-5-2.webp) +![OLake job stream selection UI picking collections and configuring CDC sync mode](/img/blog/2025/14/step-5-2.webp) **Sync Mode Options for each collection:** - **Full Refresh**: Complete data synchronization on every job execution @@ -244,7 +244,7 @@ After configuring and saving your replication job: - **Manual Trigger**: Use "Sync Now" for immediate execution - **Scheduled Execution**: Wait for automatic execution based on configured frequency - **Monitoring**: Track job progress, error handling, and performance metrics -![OLake jobs dashboard with job actions, sync status, and source-destination mapping](/img/blog/2025/14/step-6-1.webp) +![OLake jobs dashboard showing active sync jobs, sources, destinations, and statuses](/img/blog/2025/14/step-6-1.webp) **Important Considerations**: Ordering during initial full loads is not guaranteed. If data ordering is critical for downstream consumption, implement sorting requirements during query execution or downstream processing stages. @@ -259,7 +259,7 @@ Your MongoDB to Iceberg replication creates a structured hierarchy in S3 object **Data Organization**: Within the ".db" folder, you'll find collections synced from MongoDB source. OLake normalizes collection and field names to ensure compatibility with Glue catalog writing restrictions. 
-![Amazon S3 test-olake-pg bucket UI showing folders and object inventory options](/img/blog/2025/14/step-6-2.webp) +![Amazon S3 test-olake-pg bucket UI showing Iceberg data and metadata folders](/img/blog/2025/14/step-6-2.webp) **File Structure**: Each collection contains respective data and metadata files organized for efficient querying and maintenance operations. @@ -281,7 +281,7 @@ Validate your MongoDB to Iceberg migration by configuring AWS Athena for direct - Direct Iceberg table access with full metadata support - Integration with BI tools like QuickSight, Tableau, and Power BI -![Amazon Athena editor querying olake_test_table with SQL SELECT and results](/img/blog/2025/14/step-7.webp) +![Amazon Athena query editor showing SQL SELECT on olake_test_table with results](/img/blog/2025/14/step-7.webp) ## Production Best Practices for MongoDB to Iceberg Replication diff --git a/docs/connectors/mongodb/cdc_setup.mdx b/docs/connectors/mongodb/cdc_setup.mdx index 53831a02..2075ab90 100644 --- a/docs/connectors/mongodb/cdc_setup.mdx +++ b/docs/connectors/mongodb/cdc_setup.mdx @@ -1,5 +1,6 @@ --- -title: MongoDB and Atlas CDC Setup +title: "MongoDB & Atlas CDC Setup Guide | OLake Change Data Capture" +description: "Step-by-step guide to enable Change Data Capture on self-hosted MongoDB and Atlas. Includes replica set setup, user creation, and oplog verification." 
 ---
 
 # MongoDB and Atlas CDC Setup
diff --git a/docs/drafts/faqs.mdx b/docs/drafts/faqs.mdx
index 537a3e41..0660d023 100644
--- a/docs/drafts/faqs.mdx
+++ b/docs/drafts/faqs.mdx
@@ -1,3 +1,7 @@
+---
+title: "Frequently Asked Questions - OLake Data Replication"
+---
+
 ### FAQs
diff --git a/docs/release/v0.1.0-v0.1.1.mdx b/docs/release/v0.1.0-v0.1.1.mdx
index 06319d27..3a3aa22c 100644
--- a/docs/release/v0.1.0-v0.1.1.mdx
+++ b/docs/release/v0.1.0-v0.1.1.mdx
@@ -1,3 +1,8 @@
+---
+title: "OLake v0.1.0 - v0.1.1 Release Highlights & Bug Fixes"
+description: "OLake v0.1.0-v0.1.1 adds MongoDB, Postgres, MySQL sources, Iceberg and Parquet writers, CDC sync mode, schema discovery improvements, and key bug fixes"
+---
+
 # Olake (v0.1.0 – v0.1.1)
 
 June 13 – June 18, 2025
diff --git a/docs/release/v0.1.2-v0.1.5.mdx b/docs/release/v0.1.2-v0.1.5.mdx
index f05aba67..3f5c92e2 100644
--- a/docs/release/v0.1.2-v0.1.5.mdx
+++ b/docs/release/v0.1.2-v0.1.5.mdx
@@ -1,3 +1,8 @@
+---
+title: "OLake v0.1.2 - v0.1.5 Release Notes & Key Updates"
+description: "OLake v0.1.2-v0.1.5 introduces Oracle source connector, Unity Catalog support, telemetry via Segment IO & Mixpanel, config decryption, and Iceberg deduplication fix"
+---
+
 # OLake (v0.1.2 – v0.1.5)
 
 June 26 – July 01, 2025
diff --git a/docs/release/v0.1.6-v0.1.8.mdx b/docs/release/v0.1.6-v0.1.8.mdx
index 694896fa..2a2f6800 100644
--- a/docs/release/v0.1.6-v0.1.8.mdx
+++ b/docs/release/v0.1.6-v0.1.8.mdx
@@ -1,3 +1,8 @@
+---
+title: "OLake v0.1.6 - v0.1.8 Release Notes & Updates"
+description: "OLake v0.1.6-v0.1.8 adds incremental MongoDB/Oracle sync, Oracle filter and chunking, MySQL binlog permission checks, and fixes Postgres CDC and discovery issues"
+---
+
 # OLake (v0.1.6 – v0.1.8)
 
 July 17 – July 30, 2025
diff --git a/docs/release/v0.1.9-v0.1.11.mdx b/docs/release/v0.1.9-v0.1.11.mdx
index f27105e7..e53e8b2d 100644
--- a/docs/release/v0.1.9-v0.1.11.mdx
+++ b/docs/release/v0.1.9-v0.1.11.mdx
@@ -1,3 +1,8 @@
+---
+title: "OLake v0.1.9 - v0.1.11 Release Notes & Features"
+description: "OLake v0.1.9-v0.1.11 introduces MongoDB multi-cursor, incremental MySQL/Postgres sync, batch size consistency, relational DB normalization, and bug fixes"
+---
+
 # OLake (v0.1.9 – v0.1.11)
 
 August 15 – August 27, 2025
diff --git a/docs/release/v0.2.0-v0.2.1.mdx b/docs/release/v0.2.0-v0.2.1.mdx
index 9f3c9627..80bf828a 100644
--- a/docs/release/v0.2.0-v0.2.1.mdx
+++ b/docs/release/v0.2.0-v0.2.1.mdx
@@ -1,3 +1,8 @@
+---
+title: "OLake v0.2.0 - v0.2.1 Release Highlights & Bug Fixes"
+description: "OLake v0.2.0-v0.2.1 introduces namespace normalization, spec commands, Java writer refactor, AWS IRSA fixes, and gRPC dependency resolutions for stability"
+---
+
 # OLake (v0.2.0 – v0.2.1)
 
 August 15 – August 27, 2025
diff --git a/docs/release/v0.2.10.mdx b/docs/release/v0.2.10.mdx
index 2cb99bdd..28b37146 100644
--- a/docs/release/v0.2.10.mdx
+++ b/docs/release/v0.2.10.mdx
@@ -1,3 +1,8 @@
+---
+title: "OLake v0.2.10 - v0.3.1 Release Notes: Kafka Connector"
+description: "OLake v0.2.10-v0.3.1 introduces Kafka source connector for streaming data ingestion and replication to Apache Iceberg lakehouses."
+---
+
 # OLake (v0.2.10 - v0.3.1)
 
 October 31 – November 12, 2025
diff --git a/docs/release/v0.2.2-v0.2.4.mdx b/docs/release/v0.2.2-v0.2.4.mdx
index 9dea04ae..c15b6bf6 100644
--- a/docs/release/v0.2.2-v0.2.4.mdx
+++ b/docs/release/v0.2.2-v0.2.4.mdx
@@ -1,3 +1,8 @@
+---
+title: "OLake v0.2.2 - v0.2.4 Release Notes & Bug Fixes"
+description: "OLake v0.2.2-v0.2.4 updates include custom DB preservation for new streams, MySQL empty table sync fix, and MongoDB _id type fallback improvements"
+---
+
 # OLake (v0.2.2 – v0.2.4)
 
 September 16 – September 19, 2025
diff --git a/docs/release/v0.2.5-v0.2.7.mdx b/docs/release/v0.2.5-v0.2.7.mdx
index 44e2291c..6ab788b2 100644
--- a/docs/release/v0.2.5-v0.2.7.mdx
+++ b/docs/release/v0.2.5-v0.2.7.mdx
@@ -1,3 +1,8 @@
+---
+title: "OLake v0.2.5 - v0.2.7 Release Notes: PgOutput & IAM Auth"
+description: "OLake v0.2.5-v0.2.7 introduces PgOutput plugin for faster Postgres CDC, MongoDB IAM authentication, and enhanced normalization with improved performance."
+---
+
 # OLake (v0.2.5 - v0.2.7)
 
 September 20 – October 11, 2025
diff --git a/docs/release/v0.2.8.mdx b/docs/release/v0.2.8.mdx
index b0bb9484..9281519f 100644
--- a/docs/release/v0.2.8.mdx
+++ b/docs/release/v0.2.8.mdx
@@ -1,3 +1,8 @@
+---
+title: "OLake v0.2.8 - v0.2.9 Release Notes: Oracle Sync & Fixes"
+description: "OLake v0.2.8-v0.2.9 adds sync progress tracking for Oracle, fixes MongoDB sharded clusters, improves MySQL CDC, and enhances Postgres normalization."
+---
+
 # OLake (v0.2.8 - v0.2.9)
 
 October 11 – October 30, 2025
diff --git a/docs/shared/OlakeCTA.mdx b/docs/shared/OlakeCTA.mdx
index e69de29b..74c11974 100644
--- a/docs/shared/OlakeCTA.mdx
+++ b/docs/shared/OlakeCTA.mdx
@@ -0,0 +1,5 @@
+---
+title: "Get Started with OLake - Data Replication Platform"
+description: "Join OLake community and start replicating your databases to Apache Iceberg. Connect with data engineers building modern data lakehouses."
+---
+
diff --git a/docs/shared/commands/DockerDiscoverMongoDB.mdx b/docs/shared/commands/DockerDiscoverMongoDB.mdx
index 4f1ec59c..8737fc08 100644
--- a/docs/shared/commands/DockerDiscoverMongoDB.mdx
+++ b/docs/shared/commands/DockerDiscoverMongoDB.mdx
@@ -1,3 +1,7 @@
+---
+title: "Docker Discover MongoDB Command - OLake Docker"
+description: "Run OLake discover for MongoDB using Docker. Detect MongoDB collections, schemas, and sync capabilities before replication."
+---
diff --git a/docs/shared/commands/DockerDiscoverMySQL.mdx b/docs/shared/commands/DockerDiscoverMySQL.mdx
index 348157ab..137a0f64 100644
--- a/docs/shared/commands/DockerDiscoverMySQL.mdx
+++ b/docs/shared/commands/DockerDiscoverMySQL.mdx
@@ -1,3 +1,7 @@
+---
+title: "Docker Discover MySQL Command - OLake Docker"
+description: "Run OLake discover command for MySQL using Docker. Detect MySQL tables, columns, and schemas before starting data replication."
+---
diff --git a/docs/shared/commands/DockerDiscoverOracle.mdx b/docs/shared/commands/DockerDiscoverOracle.mdx
index 2275704c..380e827c 100644
--- a/docs/shared/commands/DockerDiscoverOracle.mdx
+++ b/docs/shared/commands/DockerDiscoverOracle.mdx
@@ -1,3 +1,7 @@
+---
+title: "Docker Discover Oracle Command - OLake Docker"
+description: "Run OLake discover command for Oracle Database using Docker. Detect available Oracle tables and schemas for replication setup."
+---
diff --git a/docs/shared/commands/DockerDiscoverPostgres.mdx b/docs/shared/commands/DockerDiscoverPostgres.mdx
index 83811e43..26bd3b86 100644
--- a/docs/shared/commands/DockerDiscoverPostgres.mdx
+++ b/docs/shared/commands/DockerDiscoverPostgres.mdx
@@ -1,3 +1,7 @@
+---
+title: "Docker Discover PostgreSQL Command - OLake Docker"
+description: "Execute OLake discover for PostgreSQL using Docker. Identify available tables, schemas, and sync modes for Postgres replication."
+---
diff --git a/docs/shared/commands/DockerSyncMongoDB.mdx b/docs/shared/commands/DockerSyncMongoDB.mdx
index c6ada681..a5b81e22 100644
--- a/docs/shared/commands/DockerSyncMongoDB.mdx
+++ b/docs/shared/commands/DockerSyncMongoDB.mdx
@@ -1,3 +1,7 @@
+---
+title: "Docker Sync MongoDB Command - OLake Docker"
+description: "Execute OLake sync for MongoDB using Docker. Replicate MongoDB collections to data lakes with containerized deployment."
+---
diff --git a/docs/shared/commands/DockerSyncMySQL.mdx b/docs/shared/commands/DockerSyncMySQL.mdx
index 2d2b577e..c90c02f8 100644
--- a/docs/shared/commands/DockerSyncMySQL.mdx
+++ b/docs/shared/commands/DockerSyncMySQL.mdx
@@ -1,3 +1,7 @@
+---
+title: "Docker Sync MySQL Command - OLake Docker"
+description: "Run OLake sync for MySQL using Docker containers. Replicate MySQL databases to Apache Iceberg with binlog CDC support."
+---
diff --git a/docs/shared/commands/DockerSyncOracle.mdx b/docs/shared/commands/DockerSyncOracle.mdx
index 3ae55968..dc427756 100644
--- a/docs/shared/commands/DockerSyncOracle.mdx
+++ b/docs/shared/commands/DockerSyncOracle.mdx
@@ -1,3 +1,7 @@
+---
+title: "Docker Sync Oracle Command - OLake Docker"
+description: "Execute OLake sync for Oracle Database using Docker. Replicate Oracle tables to data lakes with containerized OLake deployment."
+---
diff --git a/docs/shared/commands/DockerSyncPostgres.mdx b/docs/shared/commands/DockerSyncPostgres.mdx
index 5e3660c8..c98b54f9 100644
--- a/docs/shared/commands/DockerSyncPostgres.mdx
+++ b/docs/shared/commands/DockerSyncPostgres.mdx
@@ -1,3 +1,7 @@
+---
+title: "Docker Sync PostgreSQL Command - OLake Docker"
+description: "Execute OLake sync for PostgreSQL using Docker. Replicate PostgreSQL tables to data lakes with containerized OLake setup."
+---
diff --git a/docs/shared/commands/DockerSyncWithStateMongoDB.mdx b/docs/shared/commands/DockerSyncWithStateMongoDB.mdx
index 4610deb8..f891e9da 100644
--- a/docs/shared/commands/DockerSyncWithStateMongoDB.mdx
+++ b/docs/shared/commands/DockerSyncWithStateMongoDB.mdx
@@ -1,3 +1,7 @@
+---
+title: "Docker Sync with State MongoDB - OLake Docker"
+description: "Run OLake MongoDB sync with state management in Docker. Track oplog positions and maintain CDC state across container runs."
+---
diff --git a/docs/shared/commands/DockerSyncWithStateMySQL.mdx b/docs/shared/commands/DockerSyncWithStateMySQL.mdx
index bd03725f..deb44b0a 100644
--- a/docs/shared/commands/DockerSyncWithStateMySQL.mdx
+++ b/docs/shared/commands/DockerSyncWithStateMySQL.mdx
@@ -1,3 +1,7 @@
+---
+title: "Docker Sync with State MySQL - OLake Docker"
+description: "Run OLake MySQL sync with state management in Docker. Track binlog positions and maintain CDC state for incremental updates."
+---
diff --git a/docs/shared/commands/DockerSyncWithStatePostgres.mdx b/docs/shared/commands/DockerSyncWithStatePostgres.mdx
index e72f6084..636dea41 100644
--- a/docs/shared/commands/DockerSyncWithStatePostgres.mdx
+++ b/docs/shared/commands/DockerSyncWithStatePostgres.mdx
@@ -1,3 +1,7 @@
+---
+title: "Docker Sync with State PostgreSQL - OLake Docker"
+description: "Run OLake PostgreSQL sync with state management in Docker. Track incremental changes and maintain sync state across runs."
+---
diff --git a/docs/shared/commands/LocalDiscoverMongoDB.mdx b/docs/shared/commands/LocalDiscoverMongoDB.mdx
index 697cd584..f0a100a5 100644
--- a/docs/shared/commands/LocalDiscoverMongoDB.mdx
+++ b/docs/shared/commands/LocalDiscoverMongoDB.mdx
@@ -1,3 +1,7 @@
+---
+title: "Local Discover MongoDB Command - OLake CLI"
+description: "Execute OLake discover locally for MongoDB. Identify collections, document schemas, and available sync modes for data replication."
+---
diff --git a/docs/shared/commands/LocalDiscoverMySQL.mdx b/docs/shared/commands/LocalDiscoverMySQL.mdx
index 7f37a0d8..dcec3b1c 100644
--- a/docs/shared/commands/LocalDiscoverMySQL.mdx
+++ b/docs/shared/commands/LocalDiscoverMySQL.mdx
@@ -1,4 +1,9 @@
+---
+title: "Local Discover MySQL Command - OLake CLI"
+description: "Run OLake discover command locally for MySQL to identify available tables, columns, and supported sync modes for replication."
+---
+
diff --git a/docs/shared/commands/LocalDiscoverOracle.mdx b/docs/shared/commands/LocalDiscoverOracle.mdx
index 8281d5fd..deebb2ac 100644
--- a/docs/shared/commands/LocalDiscoverOracle.mdx
+++ b/docs/shared/commands/LocalDiscoverOracle.mdx
@@ -1,3 +1,7 @@
+---
+title: "Local Discover Oracle Command - OLake CLI"
+description: "Run OLake discover locally for Oracle Database. Detect available Oracle tables, columns, and schemas for data replication setup."
+---
diff --git a/docs/shared/commands/LocalDiscoverPostgres.mdx b/docs/shared/commands/LocalDiscoverPostgres.mdx
index e97d635c..48394632 100644
--- a/docs/shared/commands/LocalDiscoverPostgres.mdx
+++ b/docs/shared/commands/LocalDiscoverPostgres.mdx
@@ -1,3 +1,7 @@
+---
+title: "Local Discover PostgreSQL Command - OLake CLI"
+description: "Run OLake discover command locally for PostgreSQL to detect available tables, schemas, and sync modes before data replication."
+---
diff --git a/docs/shared/commands/LocalSyncMongoDB.mdx b/docs/shared/commands/LocalSyncMongoDB.mdx
index 23a014cf..73a27496 100644
--- a/docs/shared/commands/LocalSyncMongoDB.mdx
+++ b/docs/shared/commands/LocalSyncMongoDB.mdx
@@ -1,4 +1,9 @@
+---
+title: "Local Sync MongoDB Command - OLake CLI"
+description: "Execute OLake sync locally for MongoDB collections. Replicate MongoDB data to Apache Iceberg with CDC and incremental sync."
+---
+
diff --git a/docs/shared/commands/LocalSyncMySQL.mdx b/docs/shared/commands/LocalSyncMySQL.mdx
index af5531c9..2f357440 100644
--- a/docs/shared/commands/LocalSyncMySQL.mdx
+++ b/docs/shared/commands/LocalSyncMySQL.mdx
@@ -1,3 +1,7 @@
+---
+title: "Local Sync MySQL Command - OLake CLI"
+description: "Run OLake sync locally for MySQL databases. Replicate MySQL tables to Apache Iceberg with full refresh and CDC support."
+---
diff --git a/docs/shared/commands/LocalSyncOracle.mdx b/docs/shared/commands/LocalSyncOracle.mdx
index dd4a0e2b..8e55b03b 100644
--- a/docs/shared/commands/LocalSyncOracle.mdx
+++ b/docs/shared/commands/LocalSyncOracle.mdx
@@ -1,3 +1,7 @@
+---
+title: "Local Sync Oracle Command - OLake CLI"
+description: "Run OLake sync locally for Oracle Database. Replicate Oracle tables to Apache Iceberg with full refresh and incremental sync."
+---
diff --git a/docs/shared/commands/LocalSyncPostgres.mdx b/docs/shared/commands/LocalSyncPostgres.mdx
index 4a40f747..c61bae40 100644
--- a/docs/shared/commands/LocalSyncPostgres.mdx
+++ b/docs/shared/commands/LocalSyncPostgres.mdx
@@ -1,4 +1,9 @@
+---
+title: "Local Sync PostgreSQL Command - OLake CLI"
+description: "Run OLake sync command locally for PostgreSQL data replication. Sync database tables to Apache Iceberg or Parquet formats."
+---
+
diff --git a/docs/shared/commands/LocalSyncWithStateMongoDB.mdx b/docs/shared/commands/LocalSyncWithStateMongoDB.mdx
index f41d7cac..9c90f8da 100644
--- a/docs/shared/commands/LocalSyncWithStateMongoDB.mdx
+++ b/docs/shared/commands/LocalSyncWithStateMongoDB.mdx
@@ -1,3 +1,8 @@
+---
+title: "Local Sync with State MongoDB Command - OLake CLI"
+description: "Execute OLake sync with state management for MongoDB locally. Maintain sync state for incremental updates and CDC operations."
+---
+
diff --git a/docs/shared/commands/LocalSyncWithStateMySQL.mdx b/docs/shared/commands/LocalSyncWithStateMySQL.mdx
index b7cd6115..20372c9a 100644
--- a/docs/shared/commands/LocalSyncWithStateMySQL.mdx
+++ b/docs/shared/commands/LocalSyncWithStateMySQL.mdx
@@ -1,3 +1,8 @@
+---
+title: "Local Sync with State MySQL - OLake CLI"
+description: "Execute OLake MySQL sync with state management locally. Maintain binlog positions and track incremental sync progress."
+---
+
diff --git a/docs/shared/commands/LocalSyncWithStatePostgres.mdx b/docs/shared/commands/LocalSyncWithStatePostgres.mdx
index c7feece1..9e66176a 100644
--- a/docs/shared/commands/LocalSyncWithStatePostgres.mdx
+++ b/docs/shared/commands/LocalSyncWithStatePostgres.mdx
@@ -1,3 +1,8 @@
+---
+title: "Local Sync with State PostgreSQL - OLake CLI"
+description: "Run OLake PostgreSQL sync with state management locally. Track sync progress and enable incremental updates with state files."
+---
+
diff --git a/docs/shared/config/DockerParquetConfig.mdx b/docs/shared/config/DockerParquetConfig.mdx
index b8b1852b..672b2ebf 100644
--- a/docs/shared/config/DockerParquetConfig.mdx
+++ b/docs/shared/config/DockerParquetConfig.mdx
@@ -1,3 +1,7 @@
+---
+title: "Docker Parquet Configuration - OLake Local Writer"
+description: "Configure local Parquet writer for OLake Docker containers. Setup mounted volumes for Parquet file output in Docker environments."
+---
 ```json title="destination.json"
 {
diff --git a/docs/shared/config/GlueIcebergWriterConfig.mdx b/docs/shared/config/GlueIcebergWriterConfig.mdx
index 55b05074..2599aebf 100644
--- a/docs/shared/config/GlueIcebergWriterConfig.mdx
+++ b/docs/shared/config/GlueIcebergWriterConfig.mdx
@@ -1,3 +1,8 @@
+---
+title: "AWS Glue Catalog Iceberg Writer Configuration - OLake"
+description: "Configure AWS Glue Data Catalog for OLake Iceberg writer. Setup Glue catalog, S3 storage, and IAM credentials for Iceberg tables."
+---
+
 ```json title="destination.json"
 {
   "type": "ICEBERG",
diff --git a/docs/shared/config/HiveIcebergWriterConfig.mdx b/docs/shared/config/HiveIcebergWriterConfig.mdx
index 633e2d8a..6f998541 100644
--- a/docs/shared/config/HiveIcebergWriterConfig.mdx
+++ b/docs/shared/config/HiveIcebergWriterConfig.mdx
@@ -1,3 +1,8 @@
+---
+title: "Hive Metastore Iceberg Writer Configuration - OLake"
+description: "Configure Apache Hive Metastore catalog for OLake Iceberg writer. Setup Hive URI, S3 storage, and catalog parameters for Iceberg tables."
+---
+
 ```json title="destination.json"
 {
   "type": "ICEBERG",
diff --git a/docs/shared/config/MinioJDBCIcebergWriterConfigLocal.mdx b/docs/shared/config/MinioJDBCIcebergWriterConfigLocal.mdx
index 0d1be224..1f8f58bb 100644
--- a/docs/shared/config/MinioJDBCIcebergWriterConfigLocal.mdx
+++ b/docs/shared/config/MinioJDBCIcebergWriterConfigLocal.mdx
@@ -1,3 +1,8 @@
+---
+title: "MinIO JDBC Catalog Configuration - OLake Local Setup"
+description: "Configure MinIO with JDBC catalog for local OLake Iceberg development. Setup PostgreSQL JDBC catalog with MinIO object storage."
+---
+
 ```json title="destination.json"
 {
   "type": "ICEBERG",
diff --git a/docs/shared/config/MongoDBSourceConfig.mdx b/docs/shared/config/MongoDBSourceConfig.mdx
index d405a908..34426f77 100644
--- a/docs/shared/config/MongoDBSourceConfig.mdx
+++ b/docs/shared/config/MongoDBSourceConfig.mdx
@@ -1,3 +1,7 @@
+---
+title: "MongoDB Source Configuration - OLake Config"
+description: "Configure MongoDB source connection for OLake. Setup hosts, replica sets, authentication, and connection parameters for data replication."
+---
 ```json title="OLAKE_DIRECTORY/source.json"
 {
diff --git a/docs/shared/config/MongoDBSourceConfigWithSRV.mdx b/docs/shared/config/MongoDBSourceConfigWithSRV.mdx
index fd520618..50e93c7b 100644
--- a/docs/shared/config/MongoDBSourceConfigWithSRV.mdx
+++ b/docs/shared/config/MongoDBSourceConfigWithSRV.mdx
@@ -1,3 +1,7 @@
+---
+title: "MongoDB Source Configuration with SRV - OLake Config"
+description: "Configure MongoDB source connection with SRV DNS records for OLake. Setup replica sets and MongoDB Atlas connections securely."
+---
 ```json title="OLAKE_DIRECTORY/source.json"
 {
diff --git a/docs/shared/config/MySQLSourceConfig.mdx b/docs/shared/config/MySQLSourceConfig.mdx
index 601e65f0..ce764fc3 100644
--- a/docs/shared/config/MySQLSourceConfig.mdx
+++ b/docs/shared/config/MySQLSourceConfig.mdx
@@ -1,3 +1,8 @@
+---
+title: "MySQL Source Configuration - OLake Config"
+description: "Configure MySQL source connection for OLake. Setup binlog CDC, SSH tunneling, TLS, and connection parameters for MySQL replication."
+---
+
 ````json title="source.json"
 {
   "hosts": "mysql-host",
diff --git a/docs/shared/config/OracleSourceConfig.mdx b/docs/shared/config/OracleSourceConfig.mdx
index 2fc2d80e..5ad528db 100644
--- a/docs/shared/config/OracleSourceConfig.mdx
+++ b/docs/shared/config/OracleSourceConfig.mdx
@@ -1,3 +1,7 @@
+---
+title: "Oracle Database Source Configuration - OLake Config"
+description: "Configure Oracle Database source connection for OLake. Setup service name, SID, port, and SSL parameters for Oracle data replication."
+---
 ```json title="source.json"
 {
diff --git a/docs/shared/config/PostgresSourceConfig.mdx b/docs/shared/config/PostgresSourceConfig.mdx
index 05633c0a..71d05135 100644
--- a/docs/shared/config/PostgresSourceConfig.mdx
+++ b/docs/shared/config/PostgresSourceConfig.mdx
@@ -1,3 +1,8 @@
+---
+title: "PostgreSQL Source Configuration - OLake Config"
+description: "Configure PostgreSQL source connection for OLake. Setup logical replication, CDC, SSH tunneling, and connection parameters."
+---
+
 ```json title="source.json"
 {
   "host": "localhost",
diff --git a/docs/shared/config/RESTIcebergWriterConfig.mdx b/docs/shared/config/RESTIcebergWriterConfig.mdx
index 815da91b..db9563a3 100644
--- a/docs/shared/config/RESTIcebergWriterConfig.mdx
+++ b/docs/shared/config/RESTIcebergWriterConfig.mdx
@@ -1,3 +1,8 @@
+---
+title: "REST Catalog Iceberg Writer Configuration - OLake"
+description: "Configure REST catalog for OLake Iceberg writer. Setup Polaris, Nessie, Unity Catalog, or custom REST endpoints for Apache Iceberg."
+---
+
 ```json title="destination.json"
 {
   "type": "ICEBERG",
diff --git a/docs/shared/config/S3Config.mdx b/docs/shared/config/S3Config.mdx
index 4a311237..c3c14b76 100644
--- a/docs/shared/config/S3Config.mdx
+++ b/docs/shared/config/S3Config.mdx
@@ -1,3 +1,7 @@
+---
+title: "AWS S3 Destination Configuration - OLake Parquet Writer"
+description: "Configure AWS S3 as destination for OLake Parquet writer. Set up S3 bucket, region, access keys for data lake storage."
+---
 ```json title="destination.json"
 {
diff --git a/docs/shared/config/S3ConfigGCS.mdx b/docs/shared/config/S3ConfigGCS.mdx
index 6f6be54c..96f37718 100644
--- a/docs/shared/config/S3ConfigGCS.mdx
+++ b/docs/shared/config/S3ConfigGCS.mdx
@@ -1,3 +1,7 @@
+---
+title: "Google Cloud Storage (GCS) Configuration - OLake Parquet Writer"
+description: "Configure Google Cloud Storage as destination for OLake Parquet writer using S3-compatible API. Setup GCS bucket and credentials."
+---
 ```json title="destination.json"
 {
diff --git a/docs/shared/config/S3ConfigMinIO.mdx b/docs/shared/config/S3ConfigMinIO.mdx
index 2d1d9ad6..ef6f6811 100644
--- a/docs/shared/config/S3ConfigMinIO.mdx
+++ b/docs/shared/config/S3ConfigMinIO.mdx
@@ -1,3 +1,8 @@
+---
+title: "MinIO Configuration for OLake - S3-Compatible Storage"
+description: "Configure MinIO as S3-compatible destination for OLake Parquet writer. Setup local object storage for development and testing."
+---
+
 ```json title="destination.json"
 {
   "type": "PARQUET",
diff --git a/docs/shared/streams/SelectedStreamsOnly.mdx b/docs/shared/streams/SelectedStreamsOnly.mdx
index b847b103..b7bd6cf2 100644
--- a/docs/shared/streams/SelectedStreamsOnly.mdx
+++ b/docs/shared/streams/SelectedStreamsOnly.mdx
@@ -1,3 +1,8 @@
+---
+title: "Selected Streams Configuration - OLake Stream Selection"
+description: "Configure selected streams for OLake data replication. Define specific tables, partitioning, normalization, and filters for sync."
+---
+
 ```json title="streams.json"
 "selected_streams": {
   "my_db": [
diff --git a/docs/shared/streams/StreamsFull.mdx b/docs/shared/streams/StreamsFull.mdx
index a74e2149..2d18efe5 100644
--- a/docs/shared/streams/StreamsFull.mdx
+++ b/docs/shared/streams/StreamsFull.mdx
@@ -1,3 +1,8 @@
+---
+title: "Full Streams Configuration with Partitioning - OLake"
+description: "Configure complete OLake streams with advanced partitioning, chunking, normalization, and sync mode settings for data replication."
+---
+
 ```json title="OLAKE_DIRECTORY/streams.json"
 {
   "selected_streams": {
diff --git a/docs/shared/streams/StreamsOnly.mdx b/docs/shared/streams/StreamsOnly.mdx
index 492b8442..e3521def 100644
--- a/docs/shared/streams/StreamsOnly.mdx
+++ b/docs/shared/streams/StreamsOnly.mdx
@@ -1,3 +1,8 @@
+---
+title: "OLake Streams Configuration - Stream Definition"
+description: "Configure OLake stream definitions for data replication. Define schemas, sync modes, cursor fields, and primary keys for tables."
+---
+
 ```json title="streams.json"
 {
   "stream": {
diff --git a/docusaurus.config.js b/docusaurus.config.js
index af58dcec..ac56e034 100644
--- a/docusaurus.config.js
+++ b/docusaurus.config.js
@@ -956,10 +956,7 @@ const config = {
       // Note: Query parameter URLs (e.g., /docs/features?tab=schema) are handled
       // via canonical tags in the theme files, not through redirects
-      {
-        to: 'https://join.slack.com/t/getolake/shared_invite/zt-2uyphqf69-KQxih9Gwd4GCQRD_XFcuyw',
-        from: '/slack',
-      },
+      // /slack redirect is now handled by src/pages/slack.tsx with proper SEO meta tags
       {
         to: 'https://github.com/datazip-inc/olake',
         from: '/github',
diff --git a/src/components/Iceberg/QueryEngineLayout.tsx b/src/components/Iceberg/QueryEngineLayout.tsx
index 6c0d0ba7..9e1a3eb4 100644
--- a/src/components/Iceberg/QueryEngineLayout.tsx
+++ b/src/components/Iceberg/QueryEngineLayout.tsx
@@ -86,9 +86,9 @@ export const QueryEngineLayout: React.FC = ({
[hunk body mangled in extraction: the JSX elements wrapping {title} and {description} were each replaced (one tag removed, one tag added per element), but the tag markup itself did not survive conversion; only the {title} and {description} context lines remain]
diff --git a/src/components/site/Glace.tsx b/src/components/site/Glace.tsx
index 6c16cde2..7db59e4e 100644
--- a/src/components/site/Glace.tsx
+++ b/src/components/site/Glace.tsx
@@ -96,9 +96,9 @@ const Glace: React.FC = ({
       {/* Main Heading */}
[hunk body mangled in extraction: the heading element wrapping "Migrate to Iceberg In Days, Not Months" was replaced with a new tag, but the JSX markup did not survive conversion]
       {/* Subtitle */}
diff --git a/src/pages/about-us.jsx b/src/pages/about-us.jsx
index f611d08c..f6d68330 100644
--- a/src/pages/about-us.jsx
+++ b/src/pages/about-us.jsx
@@ -34,9 +34,9 @@ const AboutTeam = () => {
       {/* About Us Section */}
[hunk body mangled in extraction: the section heading text "About Us" was replaced with "About OLake - Fastest Open Source Data Replication Tool"; the surrounding JSX tags did not survive conversion]
       OLake is the fastest open-source, Iceberg-first EL engine that removes the pain of brittle scripts and one-off pipelines. We make "database → Apache Iceberg" simple, fast, and observable—with recent benchmarks showing up to 500× faster ingest than common alternatives—so your team can stop handling connectors and start focusing on models, products, and impact.
diff --git a/src/pages/slack.tsx b/src/pages/slack.tsx
new file mode 100644
index 00000000..475863c4
--- /dev/null
+++ b/src/pages/slack.tsx
@@ -0,0 +1,36 @@
+import React, { useEffect } from 'react';
+import Head from '@docusaurus/Head';
+
+export default function SlackRedirect() {
+  useEffect(() => {
+    window.location.href = 'https://join.slack.com/t/getolake/shared_invite/zt-2uyphqf69-KQxih9Gwd4GCQRD_XFcuyw';
+  }, []);
+
+  return (
+    <>
[remainder of the new file mangled in extraction: a Head block carrying the page title "Join OLake Community on Slack - Connect with Data Engineers" and several SEO meta tags, followed by visible markup with the text "Redirecting to OLake Slack Community..." and the fallback link "If you're not redirected automatically, click here."; the JSX tags themselves did not survive conversion]
+  );
+}