
@Akshay-datazip
Collaborator

Followed the internal backlinking SEO audit report by SEOByte.

Content Audit Improvements:

Blog Posts (3 files):
- Add Key Takeaways sections (5 bullet points each) for quick scanning
- Add comprehensive FAQ sections (10 questions per blog)
- Add 34+ internal links to related OLake documentation
- Improve scannability with bullet points and shorter paragraphs

Query Engine Pages (6 files):
- Add 24 real-world use case examples with concrete scenarios
- Improve descriptions for better clarity and readability
- Add plain-language summaries for technical concepts
- Add intro paragraphs before feature matrices

Files Modified:
- blog/2025-09-07-how-to-set-up-postgres-apache-iceberg.mdx
- blog/2025-09-09-mysql-to-apache-iceberg-replication.mdx
- blog/2025-09-10-how-to-set-up-mongodb-apache-iceberg.mdx
- docs-iceberg-query-engine/bigquery.mdx
- docs-iceberg-query-engine/databricks.mdx
- docs-iceberg-query-engine/snowflake.mdx
- docs-iceberg-query-engine/starburst.mdx
- docs-iceberg-query-engine/starrocks.mdx
- docs-iceberg-query-engine/trino.mdx

Target: Improve Flesch Reading Ease scores from the current 21-30 range to a more accessible level
Result: More scannable, practical, and user-friendly content
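The Flesch Reading Ease target above can be checked with the standard formula, 206.835 - 1.015 × (words / sentences) - 84.6 × (syllables / words). On Flesch's scale, scores of 0-30 count as very difficult. The sketch below is a minimal Python approximation: the vowel-group syllable counter and the regex-based sentence and word splitting are simplifying assumptions, so its numbers will differ somewhat from dedicated readability tools.

```python
import re


def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Standard Flesch Reading Ease formula; higher scores read more easily."""
    if words == 0 or sentences == 0:
        raise ValueError("need at least one word and one sentence")
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def count_syllables(word: str) -> int:
    """Approximate syllables as vowel groups, discounting a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # "make" -> 1, but keep "-le" endings such as "table" at 2.
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def score_text(text: str) -> float:
    """Split text with simple regexes and return its approximate score."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    syllables = sum(count_syllables(w) for w in words)
    return flesch_reading_ease(len(words), len(sentences), syllables)
```

Shorter sentences lower the words-per-sentence term and plainer words lower the syllables-per-word term, which is why bullet points and shorter paragraphs push the score up.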
- Add Learn More buttons to Iceberg feature cards (Schema evolution, Schema datatype changes, Partitioning)
- Make Why OLake feature cards clickable (Faster Resumable Full Load, Schema-Aware Logs, CDC Cursor Preservation, near real-time latency)
- Add Quickstart Guide link in FAQ
- Add internal links in docs/intro.mdx (parallelized chunking, binlogs, oplogs, Apache Iceberg)
- Add internal links in docs/features/index.mdx (Parallelised Chunking, Schema Evolution, Iceberg partitioning)
- Add internal links in docs/core/use-cases.mdx (open-source data stack, log-based CDC, ML feature stores)
- Add internal links in docs/benchmarks.mdx (PostgreSQL to Apache Iceberg, MongoDB)
- docs/benchmarks.mdx: Link CDC, Debezium, PostgreSQL
- docs/getting-started/quickstart.mdx: Link OLake UI to architecture
- docs/getting-started/playground.mdx: Link Apache Iceberg, Presto, Docker Compose
- docs/install/olake-ui/index.mdx: Link Create Jobs tutorial
- docs/getting-started/creating-first-pipeline.mdx: Link OLake UI, Postgres, Apache Iceberg
- docs/features/index.mdx: Link Sync Modes
- docs/core/architecture.mdx: Link chunking strategies, concurrency models, state management
- docs/core/use-cases.mdx: Link Apache Iceberg, Iceberg lakehouse
- docs/understanding/compatibility-catalogs.mdx: Link REST catalog, Hive Metastore, JDBC Catalog
- docs/understanding/compatibility-engines.mdx: Link Presto guide
- docs/connectors/postgres: Link Full Refresh + CDC, CDC Only
- docs/connectors/mysql: Link CDC Only, Full Refresh + Incremental, Binary Logging
- docs/connectors/mongodb: Link CDC Only, oplog, Change Data Capture (CDC)
- blog/how-to-set-up-postgres-apache-iceberg: Link Apache Iceberg, AWS Glue Catalog, Quick Start Installation, Trino
- blog/how-to-set-up-mongodb-apache-iceberg: Link Apache Iceberg, oplog, Athena
- blog/mysql-apache-iceberg-replication: Link Apache Iceberg, binlog, Trino
- blog/apache-hive-vs-iceberg-comparison: Link Partitioning, schema evolution, time travel, open table format
- blog/binlogs: Link MySQL
- blog/creating-job-olake-docker-cli: Link Postgres to Apache Iceberg
- blog/iceberg-metadata: Link Hive Metastore Catalog, AWS Glue Catalog, REST Catalog, partitioning
- blog/debezium-vs-olake: Link CDC
- blog/json-vs-bson-vs-jsonb: Link PostgreSQL, MongoDB

Progress: ~12 blog posts completed with 40+ internal links added
- blog/apache-iceberg-vs-delta-lake-guide: Link Deletion Vectors, Apache Polaris
- blog/building-open-data-lakehouse: Link PrestoDB, Iceberg REST catalog
- blog/apache-polaris-lakehouse: Link Trino, Time travel

Progress: All major blog posts completed with 60+ internal links added
- docs-iceberg-query-engine/trino: Link hive_metastore, glue, snowflake, Time Travel, distributed SQL
- docs-iceberg-query-engine/presto: Link Hive Metastore, AWS Glue, AS OF TIMESTAMP, Distributed SQL
- docs-iceberg-query-engine/athena: Link Amazon Athena, AWS Glue Data Catalog, INSERT/UPDATE/DELETE/MERGE
- docs-iceberg-query-engine/spark: Link Hive Metastore, AWS Glue, Timestamp-based Query
- docs-iceberg-query-engine/flink: Link Hive Metastore, AWS Glue, CDC to Iceberg

Progress: 5 major query engine docs completed with 20+ internal links
…stone)

- docs-iceberg-query-engine/hive: Link REST/Nessie, Traditional data warehouse, MERGE
- docs-iceberg-query-engine/duckdb: Link REST catalog
- docs-iceberg-query-engine/clickhouse: Link Time-travel
- docs-iceberg-query-engine/bigquery: Link FOR SYSTEM_TIME AS OF, BigLake external
- docs-iceberg-query-engine/snowflake: Link UniForm, Trino
- docs-iceberg-query-engine/databricks: Link UniForm, time-travel
- docs-iceberg-query-engine/dreamio: Link Time Travel
- docs-iceberg-query-engine/starrocks: Link Time Travel, materialized view
- docs-iceberg-query-engine/impala: Link Hive Metastore, position deletes

Progress: All 15 query engine docs completed with 35+ internal links
- iceberg/olake-iceberg-trino: Link Trino, AWS Glue Data Catalog, trino/athena/spark
- iceberg/olake-iceberg-athena: Link Amazon Athena, Trino

Progress: ALL internal linking tasks completed!
Summary:
- 8 commits with 100+ internal links added
- Homepage & core components (4 files)
- Documentation pages (15 files)
- Connectors (3 files)
- Blog posts (12 files)
- Query engine docs (15 files)
- Iceberg integration docs (2 files)

Total: 51 files updated with comprehensive internal linking
@deepanshupal09-datazip left a comment


There are a lot of 404 URLs in this PR. I have not flagged all of them, but there are many.

Please check with AI to find all the 404 URLs and replace or remove them.

Here is the list I was able to find with the help of AI; please check it:

Mapping: 404 links -> files containing the link
- ../jobs/create-jobs (resolved to https://olake.io/jobs/create-jobs — 404)
  - olake-ui.mdx
  - index.mdx
  - local.mdx
  - local.mdx

- /blog/iceberg-metadata (resolved to https://olake.io/blog/iceberg-metadata — 404)
  - 2025-09-15-apache-hive-vs-apache-iceberg-comparison.mdx
  - compatibility-catalogs.mdx
  - index.mdx

- http://localhost:8000 (local dev link, unreachable from CI / public check)
  - 2025-09-10-how-to-set-up-mongodb-apache-iceberg.mdx
  - 2025-09-09-mysql-to-apache-iceberg-replication.mdx
  - 2025-09-07-how-to-set-up-postgres-apache-iceberg.mdx
  - creating-first-pipeline.mdx
  - olake-ui.mdx
  - playground.mdx
  - quickstart.mdx
  - docs/playground/olake-iceberg-presto.mdx
  - kubernetes.mdx
  - 2025-08-29-deploying-olake-on-kubernetes.mdx
  - index.mdx
  - offline-environments.mdx
  - blog/2025-10-01-building-complete-open-data-lakehouse-from-scratch.mdx
  - local.mdx
  - local.mdx
  - docs/writers/azure-adls/overview.mdx

- /docs/connectors/glue-catalog (resolved to https://olake.io/docs/connectors/glue-catalog — 404)
  - 2025-09-09-mysql-to-apache-iceberg-replication.mdx
  - 2025-09-10-how-to-set-up-mongodb-apache-iceberg.mdx
  - 2025-09-09-mysql-to-apache-iceberg-replication.mdx (duplicate)

- /docs/intro (resolved to https://olake.io/docs/intro — 404)
  - 2025-09-07-how-to-set-up-postgres-apache-iceberg.mdx
  - 2025-09-09-mysql-to-apache-iceberg-replication.mdx
  - 2025-09-10-how-to-set-up-mongodb-apache-iceberg.mdx
  - kubernetes.mdx
  - starrocks.mdx
  - src/pages/index.js
  - docusaurus.config.ts
  - src/pages/index1.tsx

- /docs/understanding/cdc (resolved to https://olake.io/docs/understanding/cdc — 404)
  - 2025-09-07-how-to-set-up-postgres-apache-iceberg.mdx
  - 2025-09-09-mysql-to-apache-iceberg-replication.mdx
  - 2025-09-10-how-to-set-up-mongodb-apache-iceberg.mdx

- /docs/understanding/iceberg-partitioning (resolved to https://olake.io/docs/understanding/iceberg-partitioning — 404)
  - 2025-09-09-mysql-to-apache-iceberg-replication.mdx
  - 2025-09-10-how-to-set-up-mongodb-apache-iceberg.mdx
  - 2025-09-07-how-to-set-up-postgres-apache-iceberg.mdx

- /docs/understanding/schema-evolution (resolved to https://olake.io/docs/understanding/schema-evolution — 404)
  - 2025-09-07-how-to-set-up-postgres-apache-iceberg.mdx
  - 2025-09-09-mysql-to-apache-iceberg-replication.mdx
  - 2025-09-10-how-to-set-up-mongodb-apache-iceberg.mdx

- /docs/writers/iceberg/catalog/intro (resolved to https://olake.io/docs/writers/iceberg/catalog/intro — 404)
  - 2025-09-09-mysql-to-apache-iceberg-replication.mdx

- /iceberg/intro (resolved to https://olake.io/iceberg/intro — 404)
  - 2025-09-09-mysql-to-apache-iceberg-replication.mdx

- /iceberg/query-engine/intro (resolved to https://olake.io/iceberg/query-engine/intro — 404)
  - 2025-09-07-how-to-set-up-postgres-apache-iceberg.mdx
  - 2025-09-09-mysql-to-apache-iceberg-replication.mdx
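The mapping of each 404 link to the files that contain it can also be produced with a script rather than an AI pass. This is a minimal sketch under stated assumptions: `valid_routes` is a hypothetical set of published site paths (for example, taken from the built sitemap), only root-relative links and `localhost` links are checked, and relative links such as `../jobs/create-jobs` are skipped because resolving them requires the page's own URL.

```python
import re
from collections import defaultdict
from pathlib import Path

# Markdown links [text](target), plus href="..." / to="..." attributes in JSX.
LINK_RE = re.compile(r"""\]\(([^)\s]+)\)|\b(?:href|to)=["']([^"']+)["']""")
LOCAL_PREFIXES = ("http://localhost", "http://127.0.0.1")


def load_mdx(root: str) -> dict[str, str]:
    """Read every .mdx file under `root` into a {path: text} mapping."""
    return {str(p): p.read_text(encoding="utf-8") for p in Path(root).rglob("*.mdx")}


def find_broken_links(files: dict[str, str], valid_routes: set[str]) -> dict[str, list[str]]:
    """Map each broken or local-only link to the files that contain it."""
    broken: dict[str, list[str]] = defaultdict(list)
    for path, text in files.items():
        for md_target, attr_target in LINK_RE.findall(text):
            target = md_target or attr_target
            if target.startswith(LOCAL_PREFIXES):
                pass  # local dev link: unreachable from CI or a public check
            elif target.startswith("/"):
                route = target.split("#")[0].rstrip("/") or "/"
                if route in valid_routes:
                    continue
            else:
                continue  # external and relative links are out of scope here
            if path not in broken[target]:
                broken[target].append(path)  # record each file once per link
    return dict(broken)
```

Recording each file once per link also collapses repeats such as the duplicate `2025-09-09-mysql-to-apache-iceberg-replication.mdx` entry in the list above.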

- Fix /blog/iceberg-metadata -> /blog/2025/10/03/iceberg-metadata (3 files)
- Remove /docs/intro links (not in CSV)
- Remove /iceberg/query-engine/intro links (not in CSV)
- Remove /docs/understanding/cdc links (not in CSV)
- Remove /docs/understanding/iceberg-partitioning links (not in CSV)
- Remove /docs/understanding/schema-evolution links (not in CSV)
- Remove /docs/writers/iceberg/catalog/intro links (not in CSV)
- Remove /docs/connectors/glue-catalog links (not in CSV)

All changes now strictly follow the CSV backlinking document
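The fixes in this commit amount to one remap plus unlinking the rest. A minimal sketch, assuming the links use plain Markdown `[text](target)` syntax (JSX `<Link>` elements would need separate handling); `fix_links` is a hypothetical helper, and its two tables mirror the targets listed in the commit message.

```python
import re

# Targets taken from the commit message; everything else is left untouched.
REMAP = {"/blog/iceberg-metadata": "/blog/2025/10/03/iceberg-metadata"}
REMOVE = {
    "/docs/intro",
    "/iceberg/query-engine/intro",
    "/docs/understanding/cdc",
    "/docs/understanding/iceberg-partitioning",
    "/docs/understanding/schema-evolution",
    "/docs/writers/iceberg/catalog/intro",
    "/docs/connectors/glue-catalog",
}
MD_LINK = re.compile(r"\[([^\]]+)\]\(([^)\s]+)\)")


def fix_links(text: str) -> str:
    """Remap or unlink Markdown links whose target (minus any #anchor) is listed."""

    def repl(m: re.Match) -> str:
        anchor, target = m.group(1), m.group(2)
        base, _, fragment = target.partition("#")
        if base in REMOVE:
            return anchor  # keep the anchor text, drop the link
        if base in REMAP:
            suffix = f"#{fragment}" if fragment else ""
            return f"[{anchor}]({REMAP[base]}{suffix})"
        return m.group(0)

    return MD_LINK.sub(repl, text)
```

Unlinking rather than deleting keeps the surrounding sentence intact, so only the broken link disappears.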
…ption

- Remove markdown link from description property (plain string, not MDX)
- Link remains in title where it will properly render
- Added back markdown link as per CSV line 89
- Other query engine files use markdown links in descriptions
- Link: AWS Glue Data Catalog -> /iceberg/olake-iceberg-athena
- apache-polaris-lakehouse: Link 'Trino' in heading (CSV line 122)
- olake-iceberg-athena: Link 'AWS Glue Data Catalog' in intro blurb (CSV line 89)
- iceberg-metadata: Link 'partitioning' in Best Practices (CSV line 74)

All anchor texts from CSV now properly linked in correct sections
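The CSV-driven fixes above (an anchor text plus a target, e.g. "AWS Glue Data Catalog" -> /iceberg/olake-iceberg-athena) can be verified mechanically. This sketch assumes a CSV with hypothetical `file`, `anchor_text` and `target_url` columns; the real backlinking document's layout is not shown in this PR. It accepts either a Markdown link or a `<Link to=...>` element with any extra props.

```python
import csv
import io
import re


def missing_backlinks(csv_text: str, files: dict[str, str]) -> list[tuple[str, str, str]]:
    """Return (file, anchor, target) rows whose link is absent from the file.

    Assumes hypothetical CSV columns: file, anchor_text, target_url.
    """
    missing = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        path, anchor, target = row["file"], row["anchor_text"], row["target_url"]
        text = files.get(path, "")
        markdown = f"[{anchor}]({target})"
        # <Link to="target" ...>anchor</Link>, allowing props such as className.
        jsx = re.compile(
            r"<Link\b[^>]*\bto=[\"']" + re.escape(target) + r"[\"'][^>]*>\s*"
            + re.escape(anchor) + r"\s*</Link>"
        )
        if markdown not in text and not jsx.search(text):
            missing.append((path, anchor, target))
    return missing
```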
- Add internal link 'Trino' -> /iceberg/olake-iceberg-trino in Resources section
- This is an internal OLake blog post, not external documentation
@deepanshupal09-datazip left a comment


At least all the links are working now. Two things I want to highlight:

  1. I noticed that we are adding or changing content in the blogs and Iceberg query engine pages; I've marked a few instances. Was this requested by the SEOByte team?
  2. You’ve added links where the text is being passed as a plain string, so they’re rendering as text instead of clickable links. I’ve pointed out a few examples, but not all. Please fix this everywhere.
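Point 2 describes a Markdown link sitting inside a JavaScript string (for example a `description` prop or a `tableData` cell), which MDX passes through as literal text. A rough sketch for finding such spots across files: it scans for `key: "..."` or `key="..."` string values and reports any Markdown links inside them. The regex is a heuristic, not a JavaScript parser, and may miss strings that span lines or template literals with nested expressions.

```python
import re

# `key: "..."` (object literal) or `key="..."` (JSX prop); \2 reuses the opening quote.
STRING_PROP = re.compile(r"""(\w+)\s*[:=]\s*(["'`])((?:\\.|(?!\2).)*)\2""")
MD_LINK = re.compile(r"\[([^\]]+)\]\(([^)\s]+)\)")


def links_in_plain_strings(text: str) -> list[tuple[int, str, str]]:
    """Return (line number, key, Markdown link) for links stuck inside string values."""
    hits = []
    for prop in STRING_PROP.finditer(text):
        line = text.count("\n", 0, prop.start()) + 1
        for link in MD_LINK.finditer(prop.group(3)):
            hits.append((line, prop.group(1), link.group(0)))
    return hits
```

Links in ordinary Markdown body text are not reported, because they are not preceded by a `key:` or `key=` assignment.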

- Created .blue-link CSS class to replace repeated className attributes
- Replaced all instances of long blue link className with blue-link class
- Fixed white link in presto.mdx description with inline styles
- Addresses PR review comments for code maintainability
…eData

- Converted [Hive] and [AWS Glue] markdown links to JSX Link components in Flink tableData
- Converted [Hive Metastore] and [AWS Glue] markdown links to JSX Link components in Spark tableData
- Added Link import to spark.mdx
- Links now properly render as clickable elements with blue-link styling
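The conversion in this commit replaces Markdown links in `tableData` strings with JSX `<Link>` elements; in Docusaurus, `Link` is imported with `import Link from '@docusaurus/Link';`, which is why `spark.mdx` needed the import. The hypothetical helper below sketches the rewrite as text generation, assuming the string contains no `{`, `}` or `<` characters (JSX would interpret them). The output has to be placed as a JSX expression, not inside quotes.

```python
import re

MD_LINK = re.compile(r"\[([^\]]+)\]\(([^)\s]+)\)")


def to_jsx_fragment(value: str, css_class: str = "blue-link") -> str:
    """Turn Markdown links in a plain string into a JSX fragment of <Link> elements."""
    body = MD_LINK.sub(
        lambda m: f'<Link to="{m.group(2)}" className="{css_class}">{m.group(1)}</Link>',
        value,
    )
    return f"<>{body}</>"
```

Wrapping the result in a fragment (`<>...</>`) lets a table cell mix plain text and several links while remaining a single JSX value.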
@Akshay-datazip merged commit 76c1504 into master on Nov 20, 2025
2 checks passed

3 participants