Internal linking #241

Akshay-datazip · 2025-10-31T17:48:19Z

Followed the internal backlinking SEO audit report by SEOByte.

Content Audit Improvements:

Blog Posts (3 files):
- Add Key Takeaways sections (5 bullet points each) for quick scanning
- Add comprehensive FAQ sections (10 questions per blog)
- Add 34+ internal links to related OLake documentation
- Improve scannability with bullet points and shorter paragraphs

Query Engine Pages (6 files):
- Add 24 real-world use case examples with concrete scenarios
- Improve descriptions for better clarity and readability
- Add plain-language summaries for technical concepts
- Add intro paragraphs before feature matrices

Files Modified:
- blog/2025-09-07-how-to-set-up-postgres-apache-iceberg.mdx
- blog/2025-09-09-mysql-to-apache-iceberg-replication.mdx
- blog/2025-09-10-how-to-set-up-mongodb-apache-iceberg.mdx
- docs-iceberg-query-engine/bigquery.mdx
- docs-iceberg-query-engine/databricks.mdx
- docs-iceberg-query-engine/snowflake.mdx
- docs-iceberg-query-engine/starburst.mdx
- docs-iceberg-query-engine/starrocks.mdx
- docs-iceberg-query-engine/trino.mdx

Target: Improve Flesch Reading Ease scores (21-30 → more accessible)

Result: More scannable, practical, and user-friendly content

- Add Learn More buttons to Iceberg feature cards (Schema evolution, Schema datatype changes, Partitioning)
- Make Why OLake feature cards clickable (Faster Resumable Full Load, Schema-Aware Logs, CDC Cursor Preservation, near real-time latency)
- Add Quickstart Guide link in FAQ
- Add internal links in docs/intro.mdx (parallelized chunking, binlogs, oplogs, Apache Iceberg)
- Add internal links in docs/features/index.mdx (Parallelised Chunking, Schema Evolution, Iceberg partitioning)
- Add internal links in docs/core/use-cases.mdx (open-source data stack, log-based CDC, ML feature stores)
- Add internal links in docs/benchmarks.mdx (PostgreSQL to Apache Iceberg, MongoDB)

- docs/benchmarks.mdx: Link CDC, Debezium, PostgreSQL
- docs/getting-started/quickstart.mdx: Link OLake UI to architecture
- docs/getting-started/playground.mdx: Link Apache Iceberg, Presto, Docker Compose
- docs/install/olake-ui/index.mdx: Link Create Jobs tutorial
- docs/getting-started/creating-first-pipeline.mdx: Link OLake UI, Postgres, Apache Iceberg

- docs/features/index.mdx: Link Sync Modes
- docs/core/architecture.mdx: Link chunking strategies, concurrency models, state management
- docs/core/use-cases.mdx: Link Apache Iceberg, Iceberg lakehouse

- docs/understanding/compatibility-catalogs.mdx: Link REST catalog, Hive Metastore, JDBC Catalog
- docs/understanding/compatibility-engines.mdx: Link Presto guide

- docs/connectors/postgres: Link Full Refresh + CDC, CDC Only
- docs/connectors/mysql: Link CDC Only, Full Refresh + Incremental, Binary Logging
- docs/connectors/mongodb: Link CDC Only, oplog, Change Data Capture (CDC)

- blog/how-to-set-up-postgres-apache-iceberg: Link Apache Iceberg, AWS Glue Catalog, Quick Start Installation, Trino

- blog/how-to-set-up-mongodb-apache-iceberg: Link Apache Iceberg, oplog, Athena
- blog/mysql-apache-iceberg-replication: Link Apache Iceberg, binlog, Trino

- blog/apache-hive-vs-iceberg-comparison: Link Partitioning, schema evolution, time travel, open table format
- blog/binlogs: Link MySQL
- blog/creating-job-olake-docker-cli: Link Postgres to Apache Iceberg
- blog/iceberg-metadata: Link Hive Metastore Catalog, AWS Glue Catalog, REST Catalog, partitioning
- blog/debezium-vs-olake: Link CDC
- blog/json-vs-bson-vs-jsonb: Link PostgreSQL, MongoDB

Progress: ~12 blog posts completed with 40+ internal links added

- blog/apache-iceberg-vs-delta-lake-guide: Link Deletion Vectors, Apache Polaris
- blog/building-open-data-lakehouse: Link PrestoDB, Iceberg REST catalog
- blog/apache-polaris-lakehouse: Link Trino, Time travel

Progress: All major blog posts completed with 60+ internal links added

- docs-iceberg-query-engine/trino: Link hive_metastore, glue, snowflake, Time Travel, distributed SQL
- docs-iceberg-query-engine/presto: Link Hive Metastore, AWS Glue, AS OF TIMESTAMP, Distributed SQL
- docs-iceberg-query-engine/athena: Link Amazon Athena, AWS Glue Data Catalog, INSERT/UPDATE/DELETE/MERGE
- docs-iceberg-query-engine/spark: Link Hive Metastore, AWS Glue, Timestamp-based Query
- docs-iceberg-query-engine/flink: Link Hive Metastore, AWS Glue, CDC to Iceberg

Progress: 5 major query engine docs completed with 20+ internal links

…stone)
- docs-iceberg-query-engine/hive: Link REST/Nessie, Traditional data warehouse, MERGE
- docs-iceberg-query-engine/duckdb: Link REST catalog
- docs-iceberg-query-engine/clickhouse: Link Time-travel
- docs-iceberg-query-engine/bigquery: Link FOR SYSTEM_TIME AS OF, BigLake external
- docs-iceberg-query-engine/snowflake: Link UniForm, Trino
- docs-iceberg-query-engine/databricks: Link UniForm, time-travel
- docs-iceberg-query-engine/dreamio: Link Time Travel
- docs-iceberg-query-engine/starrocks: Link Time Travel, materialized view
- docs-iceberg-query-engine/impala: Link Hive Metastore, position deletes

Progress: All 15 query engine docs completed with 35+ internal links

- iceberg/olake-iceberg-trino: Link Trino, AWS Glue Data Catalog, trino/athena/spark
- iceberg/olake-iceberg-athena: Link Amazon Athena, Trino

Progress: ALL internal linking tasks completed!

Summary:
- 8 commits with 100+ internal links added
- Homepage & core components (4 files)
- Documentation pages (15 files)
- Connectors (3 files)
- Blog posts (12 files)
- Query engine docs (15 files)
- Iceberg integration docs (2 files)

Total: 51 files updated with comprehensive internal linking

deepanshupal09-datazip

There are a lot of 404 URLs in this PR. I have not flagged every one of them, but there are many.

Please run a check (with AI, for example) to find all the 404 URLs, then replace or remove them.

Here is the list I was able to put together with the help of AI. Please verify it:

Mapping: 404 links -> files containing the link
- ../jobs/create-jobs (resolved to https://olake.io/jobs/create-jobs — 404)
  - olake-ui.mdx
  - index.mdx
  - local.mdx
  - local.mdx

- /blog/iceberg-metadata (resolved to https://olake.io/blog/iceberg-metadata — 404)
  - 2025-09-15-apache-hive-vs-apache-iceberg-comparison.mdx
  - compatibility-catalogs.mdx
  - index.mdx

- http://localhost:8000 (local dev link, unreachable from CI / public check)
  - 2025-09-10-how-to-set-up-mongodb-apache-iceberg.mdx
  - 2025-09-09-mysql-to-apache-iceberg-replication.mdx
  - 2025-09-07-how-to-set-up-postgres-apache-iceberg.mdx
  - creating-first-pipeline.mdx
  - olake-ui.mdx
  - playground.mdx
  - quickstart.mdx
  - docs/playground/olake-iceberg-presto.mdx
  - kubernetes.mdx
  - 2025-08-29-deploying-olake-on-kubernetes.mdx
  - index.mdx
  - offline-environments.mdx
  - blog/2025-10-01-building-complete-open-data-lakehouse-from-scratch.mdx
  - local.mdx
  - local.mdx
  - docs/writers/azure-adls/overview.mdx

- /docs/connectors/glue-catalog (resolved to https://olake.io/docs/connectors/glue-catalog — 404)
  - 2025-09-09-mysql-to-apache-iceberg-replication.mdx
  - 2025-09-10-how-to-set-up-mongodb-apache-iceberg.mdx
  - 2025-09-09-mysql-to-apache-iceberg-replication.mdx (duplicate)

- /docs/intro (resolved to https://olake.io/docs/intro — 404)
  - 2025-09-07-how-to-set-up-postgres-apache-iceberg.mdx
  - 2025-09-09-mysql-to-apache-iceberg-replication.mdx
  - 2025-09-10-how-to-set-up-mongodb-apache-iceberg.mdx
  - kubernetes.mdx
  - starrocks.mdx
  - src/pages/index.js
  - docusaurus.config.ts
  - src/pages/index1.tsx

- /docs/understanding/cdc (resolved to https://olake.io/docs/understanding/cdc — 404)
  - 2025-09-07-how-to-set-up-postgres-apache-iceberg.mdx
  - 2025-09-09-mysql-to-apache-iceberg-replication.mdx
  - 2025-09-10-how-to-set-up-mongodb-apache-iceberg.mdx

- /docs/understanding/iceberg-partitioning (resolved to https://olake.io/docs/understanding/iceberg-partitioning — 404)
  - 2025-09-09-mysql-to-apache-iceberg-replication.mdx
  - 2025-09-10-how-to-set-up-mongodb-apache-iceberg.mdx
  - 2025-09-07-how-to-set-up-postgres-apache-iceberg.mdx

- /docs/understanding/schema-evolution (resolved to https://olake.io/docs/understanding/schema-evolution — 404)
  - 2025-09-07-how-to-set-up-postgres-apache-iceberg.mdx
  - 2025-09-09-mysql-to-apache-iceberg-replication.mdx
  - 2025-09-10-how-to-set-up-mongodb-apache-iceberg.mdx

- /docs/writers/iceberg/catalog/intro (resolved to https://olake.io/docs/writers/iceberg/catalog/intro — 404)
  - 2025-09-09-mysql-to-apache-iceberg-replication.mdx

- /iceberg/intro (resolved to https://olake.io/iceberg/intro — 404)
  - 2025-09-09-mysql-to-apache-iceberg-replication.mdx

- /iceberg/query-engine/intro (resolved to https://olake.io/iceberg/query-engine/intro — 404)
  - 2025-09-07-how-to-set-up-postgres-apache-iceberg.mdx
  - 2025-09-09-mysql-to-apache-iceberg-replication.mdx
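
A mapping like the one above could also be produced mechanically rather than with AI. The following is a minimal sketch, not the check actually used in this PR: it assumes the page sources are available as strings and that a set of valid site routes (`known_routes`, a hypothetical input) can be obtained from the build. It only looks at site-absolute paths, so relative links such as `../jobs/create-jobs` and full URLs such as `http://localhost:8000` would need separate handling.

```python
import re

# Markdown links like [text](/docs/page) and JSX attributes like to="/docs/page".
# Fragments (#section) are dropped so only the route is compared.
MD_LINK = re.compile(r"\[[^\]]*\]\((/[^)\s#]*)")
JSX_LINK = re.compile(r"""(?:to|href)=["'](/[^"'#]*)""")


def internal_links(text: str) -> set[str]:
    """Return the site-absolute paths linked from one MDX source string."""
    return set(MD_LINK.findall(text)) | set(JSX_LINK.findall(text))


def broken_links(files: dict[str, str], known_routes: set[str]) -> dict[str, list[str]]:
    """Map each path that is not a known route to the files that link to it."""
    report: dict[str, list[str]] = {}
    for name, text in files.items():
        for path in internal_links(text):
            route = path.rstrip("/") or "/"  # treat /docs/x/ and /docs/x alike
            if route not in known_routes:
                report.setdefault(path, []).append(name)
    return {path: sorted(names) for path, names in sorted(report.items())}
```

Calling `broken_links(files, known_routes)` returns the same shape as the list above: one entry per unresolved path, each with the files that contain it.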

docs-iceberg-query-engine/athena.mdx

docs/features/index.mdx

docs/getting-started/creating-first-pipeline.mdx

docs/install/olake-ui/index.mdx

docs/understanding/compatibility-catalogs.mdx

blog/2025-09-07-how-to-set-up-postgres-apache-iceberg.mdx

blog/2025-09-09-mysql-to-apache-iceberg-replication.mdx

blog/2025-09-10-how-to-set-up-mongodb-apache-iceberg.mdx

- Fix /blog/iceberg-metadata -> /blog/2025/10/03/iceberg-metadata (3 files)
- Remove /docs/intro links (not in CSV)
- Remove /iceberg/query-engine/intro links (not in CSV)
- Remove /docs/understanding/cdc links (not in CSV)
- Remove /docs/understanding/iceberg-partitioning links (not in CSV)
- Remove /docs/understanding/schema-evolution links (not in CSV)
- Remove /docs/writers/iceberg/catalog/intro links (not in CSV)
- Remove /docs/connectors/glue-catalog links (not in CSV)

All changes now strictly follow the CSV backlinking document
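
The two kinds of fix in this commit (retargeting one path, and removing links whose target is not in the CSV) can be illustrated with a short sketch. This is an assumption about how such a cleanup might be scripted, not the method used in the commit; `fix_links`, `renames`, and `removals` are hypothetical names, and only markdown-style links are handled.

```python
import re

# A markdown link [label](target); the lookbehind skips images such as ![alt](src).
MD_LINK = re.compile(r"(?<!!)\[([^\]]*)\]\(([^)\s]+)\)")


def fix_links(text: str, renames: dict[str, str], removals: set[str]) -> str:
    """Retarget renamed paths and unwrap links whose target should be removed."""

    def replace(match: re.Match) -> str:
        label, target = match.group(1), match.group(2)
        if target in removals:
            return label  # keep the anchor text, drop the link
        return f"[{label}]({renames.get(target, target)})"

    return MD_LINK.sub(replace, text)
```

For example, with `renames = {"/blog/iceberg-metadata": "/blog/2025/10/03/iceberg-metadata"}` and `removals = {"/docs/intro"}`, a link to the old blog path is retargeted, a link to `/docs/intro` becomes plain text, and every other link is left unchanged.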

Fix athena.mdx: Remove invalid markdown link from plain string description
- Remove markdown link from description property (plain string, not MDX)
- Link remains in title where it will properly render

- Added back markdown link as per CSV line 89
- Other query engine files use markdown links in descriptions
- Link: AWS Glue Data Catalog -> /iceberg/olake-iceberg-athena

- apache-polaris-lakehouse: Link 'Trino' in heading (CSV line 122)
- olake-iceberg-athena: Link 'AWS Glue Data Catalog' in intro blurb (CSV line 89)
- iceberg-metadata: Link 'partitioning' in Best Practices (CSV line 74)

All anchor texts from CSV now properly linked in correct sections

- Add internal link 'Trino' -> /iceberg/olake-iceberg-trino in Resources section
- This is an internal OLake blog post, not external documentation

Fix all 404 internal links - update broken paths to correct documentation pages

deepanshupal09-datazip

All the links are working now, at least. Two things I want to highlight:

- I noticed that we are adding or changing content in the blogs and the Iceberg query engine docs; I've marked a few instances. Was this requested by the SEOByte team?
- You’ve added links in places where the text is passed as a plain string, so they render as literal text instead of clickable links. I’ve pointed out a few examples, but not all. Please fix this everywhere.
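
One way to find every remaining instance of the second issue is to scan for markdown link syntax inside quoted string literals. The sketch below is illustrative only and rests on assumptions: it treats any single- or double-quoted literal on one line as a plain string, does not handle escaped quotes or template literals, and the `title`/`description` fields in the usage example are hypothetical sample data.

```python
import re

# A quoted string literal on a single line (escaped quotes are not handled).
STRING_LITERAL = re.compile(r"""(["'])((?:(?!\1).)*)\1""")
# Markdown link syntax: [label](target)
MD_LINK = re.compile(r"\[[^\]]+\]\([^)\s]+\)")


def markdown_in_strings(source: str) -> list[tuple[int, str]]:
    """Return (line number, string contents) for literals containing a markdown link."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in STRING_LITERAL.finditer(line):
            if MD_LINK.search(match.group(2)):
                hits.append((lineno, match.group(2)))
    return hits


sample = '''export const features = [
  { title: "Catalogs", description: "Works with [AWS Glue Data Catalog](/iceberg/olake-iceberg-athena)" },
  { title: "Time travel", description: "Plain text only" },
];'''

for lineno, text in markdown_in_strings(sample):
    print(f"line {lineno}: {text}")
```

Each hit is a candidate for either removing the link or converting the value to a JSX element so that it renders as a clickable link.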

src/pages/index.jsx

blog/2025-09-07-how-to-set-up-postgres-apache-iceberg.mdx

docs-iceberg-query-engine/athena.mdx

docs-iceberg-query-engine/bigquery.mdx

docs-iceberg-query-engine/dreamio.mdx

docs-iceberg-query-engine/spark.mdx

Fix internal links: update colors, remove unwanted links, add time travel links, fix catalog URLs

docs-iceberg-query-engine/athena.mdx

docs-iceberg-query-engine/presto.mdx

- Created .blue-link CSS class to replace repeated className attributes
- Replaced all instances of long blue link className with blue-link class
- Fixed white link in presto.mdx description with inline styles
- Addresses PR review comments for code maintainability

…eData
- Converted [Hive] and [AWS Glue] markdown links to JSX Link components in Flink tableData
- Converted [Hive Metastore] and [AWS Glue] markdown links to JSX Link components in Spark tableData
- Added Link import to spark.mdx
- Links now properly render as clickable elements with blue-link styling

Akshay-datazip added 13 commits October 31, 2025 22:23

Add additional internal linking - Part 2

10bb244

Add additional internal linking - Part 3

3e8bf22

Add internal linking - Part 4: Connectors

1b9736b

Add internal linking - Part 5: PostgreSQL blog post

6322c28

Add internal linking - Part 6: MongoDB & MySQL blog posts

b01eb0b

deepanshupal09-datazip requested changes Nov 5, 2025

Akshay-datazip added 8 commits November 5, 2025 18:24

Fix athena.mdx: Remove invalid markdown link from plain string description

880b838

Restore AWS Glue Data Catalog link in athena.mdx description

dec5159

Add Trino link in Resources section (CSV line 83)

dfb6da8

updated backlinks with the new doc

e0feef8

Merge branch 'master' into internal-linking

5344568

Fix all 404 internal links - update broken paths to correct documentation pages

767614a

deepanshupal09-datazip reviewed Nov 13, 2025

Fix internal links: update colors, remove unwanted links, add time travel links, fix catalog URLs

ac79d49

deepanshupal09-datazip reviewed Nov 19, 2025

docs-iceberg-query-engine/athena.mdx (outdated, resolved)

docs-iceberg-query-engine/presto.mdx (outdated, resolved)

deepanshupal09-datazip previously approved these changes Nov 20, 2025

Akshay-datazip dismissed deepanshupal09-datazip’s stale review via f60bcf5 November 20, 2025 07:41

deepanshupal09-datazip approved these changes Nov 20, 2025

Akshay-datazip merged commit 76c1504 into master Nov 20, 2025
2 checks passed
