Internal linking #241
Content Audit Improvements

Blog Posts (3 files):
- Add Key Takeaways sections (5 bullet points each) for quick scanning
- Add comprehensive FAQ sections (10 questions per blog)
- Add 34+ internal links to related OLake documentation
- Improve scannability with bullet points and shorter paragraphs

Query Engine Pages (6 files):
- Add 24 real-world use case examples with concrete scenarios
- Improve descriptions for better clarity and readability
- Add plain-language summaries for technical concepts
- Add intro paragraphs before feature matrices

Files Modified:
- blog/2025-09-07-how-to-set-up-postgres-apache-iceberg.mdx
- blog/2025-09-09-mysql-to-apache-iceberg-replication.mdx
- blog/2025-09-10-how-to-set-up-mongodb-apache-iceberg.mdx
- docs-iceberg-query-engine/bigquery.mdx
- docs-iceberg-query-engine/databricks.mdx
- docs-iceberg-query-engine/snowflake.mdx
- docs-iceberg-query-engine/starburst.mdx
- docs-iceberg-query-engine/starrocks.mdx
- docs-iceberg-query-engine/trino.mdx

Target: Improve Flesch Reading Ease scores (21-30 → more accessible)
Result: More scannable, practical, and user-friendly content
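For reference, the Flesch Reading Ease (FRE) score targeted above depends only on average sentence length and average syllables per word; higher scores mean easier text. The worked numbers below use hypothetical averages, not measurements from these pages.

$$
\text{FRE} = 206.835 - 1.015\left(\frac{\text{total words}}{\text{total sentences}}\right) - 84.6\left(\frac{\text{total syllables}}{\text{total words}}\right)
$$

$$
\begin{aligned}
\text{25 words/sentence, 1.8 syllables/word:}\quad & 206.835 - 1.015(25) - 84.6(1.8) = 206.835 - 25.375 - 152.28 = 29.18 \\
\text{15 words/sentence, 1.8 syllables/word:}\quad & 206.835 - 1.015(15) - 84.6(1.8) = 206.835 - 15.225 - 152.28 = 39.33
\end{aligned}
$$

Under these assumed averages, shortening sentences alone moves a page out of the 21-30 band, and plain-language wording (fewer syllables per word) raises the score further because that term carries the larger coefficient.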
- Add Learn More buttons to Iceberg feature cards (Schema evolution, Schema datatype changes, Partitioning)
- Make Why OLake feature cards clickable (Faster Resumable Full Load, Schema-Aware Logs, CDC Cursor Preservation, near real-time latency)
- Add Quickstart Guide link in FAQ
- Add internal links in docs/intro.mdx (parallelized chunking, binlogs, oplogs, Apache Iceberg)
- Add internal links in docs/features/index.mdx (Parallelised Chunking, Schema Evolution, Iceberg partitioning)
- Add internal links in docs/core/use-cases.mdx (open-source data stack, log-based CDC, ML feature stores)
- Add internal links in docs/benchmarks.mdx (PostgreSQL to Apache Iceberg, MongoDB)

- docs/benchmarks.mdx: Link CDC, Debezium, PostgreSQL
- docs/getting-started/quickstart.mdx: Link OLake UI to architecture
- docs/getting-started/playground.mdx: Link Apache Iceberg, Presto, Docker Compose
- docs/install/olake-ui/index.mdx: Link Create Jobs tutorial
- docs/getting-started/creating-first-pipeline.mdx: Link OLake UI, Postgres, Apache Iceberg

- docs/features/index.mdx: Link Sync Modes
- docs/core/architecture.mdx: Link chunking strategies, concurrency models, state management
- docs/core/use-cases.mdx: Link Apache Iceberg, Iceberg lakehouse

- docs/understanding/compatibility-catalogs.mdx: Link REST catalog, Hive Metastore, JDBC Catalog
- docs/understanding/compatibility-engines.mdx: Link Presto guide

- docs/connectors/postgres: Link Full Refresh + CDC, CDC Only
- docs/connectors/mysql: Link CDC Only, Full Refresh + Incremental, Binary Logging
- docs/connectors/mongodb: Link CDC Only, oplog, Change Data Capture (CDC)

- blog/how-to-set-up-postgres-apache-iceberg: Link Apache Iceberg, AWS Glue Catalog, Quick Start Installation, Trino

- blog/how-to-set-up-mongodb-apache-iceberg: Link Apache Iceberg, oplog, Athena
- blog/mysql-apache-iceberg-replication: Link Apache Iceberg, binlog, Trino

- blog/apache-hive-vs-iceberg-comparison: Link Partitioning, schema evolution, time travel, open table format
- blog/binlogs: Link MySQL
- blog/creating-job-olake-docker-cli: Link Postgres to Apache Iceberg
- blog/iceberg-metadata: Link Hive Metastore Catalog, AWS Glue Catalog, REST Catalog, partitioning
- blog/debezium-vs-olake: Link CDC
- blog/json-vs-bson-vs-jsonb: Link PostgreSQL, MongoDB

Progress: ~12 blog posts completed with 40+ internal links added

- blog/apache-iceberg-vs-delta-lake-guide: Link Deletion Vectors, Apache Polaris
- blog/building-open-data-lakehouse: Link PrestoDB, Iceberg REST catalog
- blog/apache-polaris-lakehouse: Link Trino, Time travel

Progress: All major blog posts completed with 60+ internal links added

- docs-iceberg-query-engine/trino: Link hive_metastore, glue, snowflake, Time Travel, distributed SQL
- docs-iceberg-query-engine/presto: Link Hive Metastore, AWS Glue, AS OF TIMESTAMP, Distributed SQL
- docs-iceberg-query-engine/athena: Link Amazon Athena, AWS Glue Data Catalog, INSERT/UPDATE/DELETE/MERGE
- docs-iceberg-query-engine/spark: Link Hive Metastore, AWS Glue, Timestamp-based Query
- docs-iceberg-query-engine/flink: Link Hive Metastore, AWS Glue, CDC to Iceberg

Progress: 5 major query engine docs completed with 20+ internal links

…stone)
- docs-iceberg-query-engine/hive: Link REST/Nessie, Traditional data warehouse, MERGE
- docs-iceberg-query-engine/duckdb: Link REST catalog
- docs-iceberg-query-engine/clickhouse: Link Time-travel
- docs-iceberg-query-engine/bigquery: Link FOR SYSTEM_TIME AS OF, BigLake external
- docs-iceberg-query-engine/snowflake: Link UniForm, Trino
- docs-iceberg-query-engine/databricks: Link UniForm, time-travel
- docs-iceberg-query-engine/dreamio: Link Time Travel
- docs-iceberg-query-engine/starrocks: Link Time Travel, materialized view
- docs-iceberg-query-engine/impala: Link Hive Metastore, position deletes

Progress: All 15 query engine docs completed with 35+ internal links

- iceberg/olake-iceberg-trino: Link Trino, AWS Glue Data Catalog, trino/athena/spark
- iceberg/olake-iceberg-athena: Link Amazon Athena, Trino

Progress: ALL internal linking tasks completed!

Summary:
- 8 commits with 100+ internal links added
- Homepage & core components (4 files)
- Documentation pages (15 files)
- Connectors (3 files)
- Blog posts (12 files)
- Query engine docs (15 files)
- Iceberg integration docs (2 files)

Total: 51 files updated with comprehensive internal linking
deepanshupal09-datazip
left a comment
There are a lot of 404 URLs in this PR. I haven't flagged all of them, but there are many.
Please run a check (an AI-assisted one is fine) to find all the 404 URLs, then replace or remove them.
Here is the list I was able to find with the help of AI; please review it:
Mapping: 404 links -> files containing the link
- ../jobs/create-jobs (resolved to https://olake.io/jobs/create-jobs, returns 404)
  - olake-ui.mdx
  - index.mdx
  - local.mdx
  - local.mdx
- /blog/iceberg-metadata (resolved to https://olake.io/blog/iceberg-metadata, returns 404)
  - 2025-09-15-apache-hive-vs-apache-iceberg-comparison.mdx
  - compatibility-catalogs.mdx
  - index.mdx
- http://localhost:8000 (local dev link, unreachable from CI / public check)
  - 2025-09-10-how-to-set-up-mongodb-apache-iceberg.mdx
  - 2025-09-09-mysql-to-apache-iceberg-replication.mdx
  - 2025-09-07-how-to-set-up-postgres-apache-iceberg.mdx
  - creating-first-pipeline.mdx
  - olake-ui.mdx
  - playground.mdx
  - quickstart.mdx
  - docs/playground/olake-iceberg-presto.mdx
  - kubernetes.mdx
  - 2025-08-29-deploying-olake-on-kubernetes.mdx
  - index.mdx
  - offline-environments.mdx
  - blog/2025-10-01-building-complete-open-data-lakehouse-from-scratch.mdx
  - local.mdx
  - local.mdx
  - docs/writers/azure-adls/overview.mdx
- /docs/connectors/glue-catalog (resolved to https://olake.io/docs/connectors/glue-catalog, returns 404)
  - 2025-09-09-mysql-to-apache-iceberg-replication.mdx
  - 2025-09-10-how-to-set-up-mongodb-apache-iceberg.mdx
  - 2025-09-09-mysql-to-apache-iceberg-replication.mdx (duplicate)
- /docs/intro (resolved to https://olake.io/docs/intro, returns 404)
  - 2025-09-07-how-to-set-up-postgres-apache-iceberg.mdx
  - 2025-09-09-mysql-to-apache-iceberg-replication.mdx
  - 2025-09-10-how-to-set-up-mongodb-apache-iceberg.mdx
  - kubernetes.mdx
  - starrocks.mdx
  - src/pages/index.js
  - docusaurus.config.ts
  - src/pages/index1.tsx
- /docs/understanding/cdc (resolved to https://olake.io/docs/understanding/cdc, returns 404)
  - 2025-09-07-how-to-set-up-postgres-apache-iceberg.mdx
  - 2025-09-09-mysql-to-apache-iceberg-replication.mdx
  - 2025-09-10-how-to-set-up-mongodb-apache-iceberg.mdx
- /docs/understanding/iceberg-partitioning (resolved to https://olake.io/docs/understanding/iceberg-partitioning, returns 404)
  - 2025-09-09-mysql-to-apache-iceberg-replication.mdx
  - 2025-09-10-how-to-set-up-mongodb-apache-iceberg.mdx
  - 2025-09-07-how-to-set-up-postgres-apache-iceberg.mdx
- /docs/understanding/schema-evolution (resolved to https://olake.io/docs/understanding/schema-evolution, returns 404)
  - 2025-09-07-how-to-set-up-postgres-apache-iceberg.mdx
  - 2025-09-09-mysql-to-apache-iceberg-replication.mdx
  - 2025-09-10-how-to-set-up-mongodb-apache-iceberg.mdx
- /docs/writers/iceberg/catalog/intro (resolved to https://olake.io/docs/writers/iceberg/catalog/intro, returns 404)
  - 2025-09-09-mysql-to-apache-iceberg-replication.mdx
- /iceberg/intro (resolved to https://olake.io/iceberg/intro, returns 404)
  - 2025-09-09-mysql-to-apache-iceberg-replication.mdx
- /iceberg/query-engine/intro (resolved to https://olake.io/iceberg/query-engine/intro, returns 404)
  - 2025-09-07-how-to-set-up-postgres-apache-iceberg.mdx
  - 2025-09-09-mysql-to-apache-iceberg-replication.mdx
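A scripted check makes a list like the one above repeatable. The following is a minimal TypeScript sketch, assuming you already have the set of published route paths (for example, exported from a site build) and the links found in each file; `checkLink`, `mapBrokenLinks`, and `knownRoutes` are hypothetical names, not part of Docusaurus or any OLake tooling.

```typescript
// Hypothetical sketch: classify links found in docs/blog files as OK, 404, or
// local-only. `knownRoutes` is an assumed set of published paths (e.g. exported
// from a site build); it is not an OLake or Docusaurus API.

type LinkStatus = "ok" | "not-found" | "local-only";

interface LinkCheck {
  raw: string; // link exactly as written in the source file
  resolved: string; // absolute URL after resolving against the page URL
  status: LinkStatus;
}

function checkLink(raw: string, pageUrl: string, knownRoutes: Set<string>): LinkCheck {
  // Resolve the way a browser would: relative links depend on the page's URL.
  const url = new URL(raw, pageUrl);
  if (url.hostname === "localhost" || url.hostname === "127.0.0.1") {
    // Dev-server links can never pass a public or CI check.
    return { raw, resolved: url.href, status: "local-only" };
  }
  // Compare paths only, ignoring a trailing slash ("/docs/x/" matches "/docs/x").
  const path = url.pathname.replace(/\/+$/, "") || "/";
  return { raw, resolved: url.href, status: knownRoutes.has(path) ? "ok" : "not-found" };
}

// Build a "broken link -> files containing it" mapping, keyed by resolved URL.
function mapBrokenLinks(
  linksByFile: Record<string, { raw: string; pageUrl: string }[]>,
  knownRoutes: Set<string>,
): Map<string, string[]> {
  const broken = new Map<string, string[]>();
  for (const [file, links] of Object.entries(linksByFile)) {
    for (const { raw, pageUrl } of links) {
      const result = checkLink(raw, pageUrl, knownRoutes);
      if (result.status === "ok") continue;
      const files = broken.get(result.resolved) ?? [];
      files.push(file);
      broken.set(result.resolved, files);
    }
  }
  return broken;
}
```

Keying by the resolved URL matters for relative links: on a page served at the hypothetical path `/docs/install/olake-ui`, `../jobs/create-jobs` resolves to `/docs/jobs/create-jobs`, so the same link text can pass on one page and 404 on another. Docusaurus also has a built-in `onBrokenLinks` option in `docusaurus.config.ts` that reports broken internal links at build time, which may cover much of this without a custom script.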
- Fix /blog/iceberg-metadata -> /blog/2025/10/03/iceberg-metadata (3 files)
- Remove /docs/intro links (not in CSV)
- Remove /iceberg/query-engine/intro links (not in CSV)
- Remove /docs/understanding/cdc links (not in CSV)
- Remove /docs/understanding/iceberg-partitioning links (not in CSV)
- Remove /docs/understanding/schema-evolution links (not in CSV)
- Remove /docs/writers/iceberg/catalog/intro links (not in CSV)
- Remove /docs/connectors/glue-catalog links (not in CSV)

All changes now strictly follow the CSV backlinking document.
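Fixes of this kind (one redirected target, several removed targets) can be applied mechanically. Below is a minimal TypeScript sketch, assuming simple inline `[text](target)` links with no nested brackets or spaces in the target, and assuming "remove" means unlinking while keeping the anchor text; `fixLinks` is a hypothetical helper, and image links (`![alt](src)`) are not handled.

```typescript
// Hypothetical codemod sketch: apply a redirect map to inline markdown links and
// unlink targets that should be removed, keeping the anchor text.
// Assumes simple [text](target) links with no nested brackets or spaces in target.

const MD_LINK = /\[([^\]]+)\]\(([^)\s]+)\)/g;

function fixLinks(source: string, redirects: Map<string, string>, removed: Set<string>): string {
  return source.replace(MD_LINK, (match: string, text: string, target: string) => {
    // Split off an optional "#anchor" so it survives the rewrite.
    const hashIndex = target.indexOf("#");
    const path = hashIndex === -1 ? target : target.slice(0, hashIndex);
    const hash = hashIndex === -1 ? "" : target.slice(hashIndex);
    if (removed.has(path)) return text; // drop the link, keep the visible text
    const replacement = redirects.get(path);
    if (replacement !== undefined) return `[${text}](${replacement}${hash})`;
    return match; // leave all other links untouched
  });
}
```

For example, with the redirect `/blog/iceberg-metadata -> /blog/2025/10/03/iceberg-metadata` and `/docs/intro` in the removed set, `[metadata](/blog/iceberg-metadata#catalogs)` becomes `[metadata](/blog/2025/10/03/iceberg-metadata#catalogs)` and `[OLake](/docs/intro)` becomes plain `OLake`. Links inside JSX props or plain-string data are not matched by this regex and still need a manual pass.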
…ption
- Remove markdown link from description property (plain string, not MDX)
- Link remains in title where it will properly render

- Added back markdown link as per CSV line 89
- Other query engine files use markdown links in descriptions
- Link: AWS Glue Data Catalog -> /iceberg/olake-iceberg-athena

- apache-polaris-lakehouse: Link 'Trino' in heading (CSV line 122)
- olake-iceberg-athena: Link 'AWS Glue Data Catalog' in intro blurb (CSV line 89)
- iceberg-metadata: Link 'partitioning' in Best Practices (CSV line 74)

All anchor texts from CSV now properly linked in correct sections.

- Add internal link 'Trino' -> /iceberg/olake-iceberg-trino in Resources section
- This is an internal OLake blog post, not external documentation
deepanshupal09-datazip
left a comment
All the links are working now. Two things I want to highlight:
- I noticed that we are adding or changing content in the blogs and the Iceberg query engine docs (I've marked a few instances). Was this requested by the SEOByte team?
- You’ve added links where the text is being passed as a plain string, so they’re rendering as text instead of clickable links. I’ve pointed out a few examples, but not all. Please fix this everywhere.
…avel links, fix catalog URLs
- Created .blue-link CSS class to replace repeated className attributes
- Replaced all instances of the long blue link className with the blue-link class
- Fixed white link in presto.mdx description with inline styles
- Addresses PR review comments for code maintainability
…eData
- Converted [Hive] and [AWS Glue] markdown links to JSX Link components in Flink tableData
- Converted [Hive Metastore] and [AWS Glue] markdown links to JSX Link components in Spark tableData
- Added Link import to spark.mdx
- Links now properly render as clickable elements with blue-link styling
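The reason markdown links in tableData showed up as literal text is that those values are plain JavaScript strings, which MDX does not parse as markdown. Writing JSX directly in the data, as this commit does, is the simplest fix. If many strings needed converting, a helper could split them into text and link segments instead; the sketch below is hypothetical (`toSegments` and `Segment` are illustrative names) and assumes simple `[text](target)` links.

```typescript
// Hypothetical sketch: split a plain string such as "[Hive](/x) or [AWS Glue](/y)"
// into segments that a component can render as text or as <Link> elements.

type Segment =
  | { kind: "text"; text: string }
  | { kind: "link"; text: string; to: string };

function toSegments(input: string): Segment[] {
  const pattern = /\[([^\]]+)\]\(([^)\s]+)\)/g;
  const segments: Segment[] = [];
  let last = 0; // end of the previous match
  for (const m of input.matchAll(pattern)) {
    const start = m.index ?? 0;
    // Keep any plain text between the previous link and this one.
    if (start > last) segments.push({ kind: "text", text: input.slice(last, start) });
    segments.push({ kind: "link", text: m[1], to: m[2] });
    last = start + m[0].length;
  }
  // Keep trailing text after the final link.
  if (last < input.length) segments.push({ kind: "text", text: input.slice(last) });
  return segments;
}
```

In a component, assuming the Docusaurus `Link` component from `@docusaurus/Link` and the `blue-link` class added in this PR, the segments could be mapped with `segments.map((s, i) => s.kind === "link" ? <Link key={i} className="blue-link" to={s.to}>{s.text}</Link> : s.text)`. Parsing at render time adds indirection, so hand-written JSX remains preferable when only a few entries are affected.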
Followed the internal backlinking SEO audit report by SEOByte.