feat: Add comprehensive metadata enhancement system #139
base: master
Akshay-datazip
commented
Sep 9, 2025
- Enhanced JSON-LD plugin with AI-specific meta tags
- Added ChatGPT and Claude optimized descriptions
- Implemented automatic content type detection
- Added enhanced Open Graph and Twitter meta tags
- Created PWA support with site.webmanifest
- Added IndexNow protocol for faster indexing
- Implemented LLM.txt for AI understanding
- Added OpenSearch protocol support
- Enhanced author metadata with social links
- Added comprehensive bot directives and SEO meta tags
- Implemented automatic date parsing from filenames
- Created intelligent metadata generation for all blog posts
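For reviewers, a minimal sketch of what "automatic date parsing from filenames" could look like. It assumes the common `YYYY-MM-DD-slug.md` naming convention; the helper name `parseDateFromFilename` is hypothetical and is not taken from this PR.

```javascript
// Hypothetical helper (not from the PR): extract a publish date from a
// blog filename such as "blog/2025-09-09-my-post.md".
function parseDateFromFilename(filename) {
  // Only look at the last path segment.
  const base = filename.split('/').pop();
  const match = /^(\d{4})-(\d{2})-(\d{2})-/.exec(base);
  if (!match) return null;

  const year = Number(match[1]);
  const month = Number(match[2]);
  const day = Number(match[3]);
  const date = new Date(Date.UTC(year, month - 1, day));

  // Date silently rolls over impossible dates (e.g. 2025-02-31), so reject
  // anything that does not round-trip to the same month and day.
  if (date.getUTCMonth() !== month - 1 || date.getUTCDate() !== day) {
    return null;
  }
  return date.toISOString().slice(0, 10);
}

// parseDateFromFilename('blog/2025-09-09-my-post.md') returns '2025-09-09'
// parseDateFromFilename('blog/intro.md') returns null
```

Returning `null` rather than throwing lets the caller fall back to front matter or file timestamps when a filename carries no date.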
- Create BlogBreadcrumbs component with structured data support
- Add Home > Blog navigation path for better UX and SEO
- Integrate breadcrumbs into BlogPostPage and BlogListPage
- Support both /blog and /iceberg routes
- Include proper ARIA labels and schema.org markup
- Improve user navigation experience and search engine understanding
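As an illustration of the schema.org markup mentioned in this commit, here is a minimal sketch of a `BreadcrumbList` object for the Home > Blog path. The function name `buildBreadcrumbJsonLd` and the `https://example.com` site URL are placeholders, not the PR's actual component code.

```javascript
// Hypothetical helper (not the PR's BlogBreadcrumbs component): build a
// schema.org BreadcrumbList from an ordered list of crumbs.
function buildBreadcrumbJsonLd(siteUrl, crumbs) {
  return {
    '@context': 'https://schema.org',
    '@type': 'BreadcrumbList',
    itemListElement: crumbs.map((crumb, index) => ({
      '@type': 'ListItem',
      position: index + 1, // schema.org positions are 1-based
      name: crumb.name,
      item: new URL(crumb.path, siteUrl).href, // absolute URL for each crumb
    })),
  };
}

// Placeholder site URL; swap '/blog' for '/iceberg' on that route.
const breadcrumbJsonLd = buildBreadcrumbJsonLd('https://example.com', [
  { name: 'Home', path: '/' },
  { name: 'Blog', path: '/blog' },
]);
console.log(JSON.stringify(breadcrumbJsonLd, null, 2));
```

The resulting object would typically be serialized into a `<script type="application/ld+json">` tag next to the visible breadcrumb navigation.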
- Move breadcrumbs from top navigation to inside blog post content
- Add breadcrumbs to BlogLayout component for proper positioning
- Remove breadcrumbs from blog listing pages (only show on individual posts)
- Breadcrumbs now appear below main nav but above blog content
- Maintains proper content hierarchy and user experience
check_urls.sh
Outdated
What is the need for this file?
comprehensive_url_analysis.sh
Outdated
What is the need for this file?
detailed_check.sh
Outdated
What is the need for this file? It looks like a duplicate.
Are these images needed for the enhancement?
They might have been pulled in with the recent commit; you may need to pull from master again.
Now submit this to Bing Webmaster Tools.
docusaurus.config.js
Outdated
href: '/site.webmanifest',
},
},
// Declare some json-ld structured data
JSON-LD is usually declared in BlogLayout (one place for all blog posts) and in the index page (for the landing page), not in a config file.
- Remove lakehouse-image.png, step-3-image.png, step-4.png
- Remove step-5-1.png, step-5-2.png, step-7.png, stp-6-1.png
- These images were incorrectly placed in /2025/ instead of proper subdirectories
- Images should be in /2025/12/ or /2025/13/ based on blog post requirements
- Remove Organization JSON-LD from docusaurus.config.js
- Add Organization JSON-LD to BlogLayout component (for all blog pages)
- Add Organization JSON-LD to index page (for landing page)
- Follow senior's feedback: JSON-LD should be in layout components, not config
- Maintains structured data while following proper Docusaurus patterns
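A rough sketch of how a layout component might emit the Organization JSON-LD once it no longer lives in the config file. The `renderJsonLdScript` helper and the organization values (name, URL, logo path) are placeholders assumed for illustration; the PR's actual data is not shown in this conversation.

```javascript
// Hypothetical helper: serialize a JSON-LD object into a script tag string.
// Escaping "<" keeps any "</script>" inside the data from closing the tag early.
function renderJsonLdScript(data) {
  const json = JSON.stringify(data).replace(/</g, '\\u003c');
  return `<script type="application/ld+json">${json}</script>`;
}

// Placeholder values, not the project's real organization metadata.
const organization = {
  '@context': 'https://schema.org',
  '@type': 'Organization',
  name: 'Example Org',
  url: 'https://example.com',
  logo: 'https://example.com/img/logo.png',
};

console.log(renderJsonLdScript(organization));
```

In a React layout the same JSON string would usually be passed to a `<script>` element rendered by the component, so both BlogLayout and the index page can share one definition of the object.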
- Remove check_urls.sh (unused development tool)
- Remove comprehensive_url_analysis.sh (unused development tool)
- These scripts were not integrated into build process
- Docusaurus has built-in link validation capabilities
- Scripts only worked with localhost and were not documented
- Remove detailed_check.sh (duplicate URL analysis script)
- This was a repeated file with same functionality as previously removed scripts
- Not integrated into build process or referenced anywhere
- Docusaurus has built-in link validation capabilities
- Repository now cleaner without redundant development tools
- Enhance BlogBreadcrumbs to include article titles
- Add title truncation for long article names (50 char limit)
- Make current page title non-clickable in breadcrumbs
- Extract article title from document title in BlogLayout
- Improve SEO with more detailed breadcrumb hierarchy
- Example: Home > Blog > What Makes OLake Fast
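A minimal sketch of the 50-character truncation described in this commit. The function name `truncateTitle` is hypothetical, and whether the PR counts the ellipsis toward the limit is not stated; this version assumes it does.

```javascript
// Hypothetical helper: shorten long breadcrumb titles to at most
// maxLength characters, including the trailing ellipsis (an assumption).
function truncateTitle(title, maxLength = 50) {
  if (title.length <= maxLength) return title;
  // Reserve one character for the ellipsis and drop trailing whitespace.
  return title.slice(0, maxLength - 1).trimEnd() + '…';
}

// truncateTitle('What Makes OLake Fast') returns the title unchanged,
// because it is shorter than 50 characters.
```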
- Add fallback title extraction from URL path when document title fails
- Handle common abbreviations (API, UI, ETL, CDC, AWS, MySQL, etc.)
- Convert kebab-case URLs to proper title case
- Filter out generic titles like 'Blogs on OLake'
- Example: /blog/mysql-to-apache-iceberg-replication -> 'MySQL to Apache Iceberg Replication'
- More reliable title display in breadcrumbs
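The fallback described in this commit could be sketched roughly as follows. The abbreviation map covers only the terms named in the commit message, and the list of small words kept lowercase is an assumption inferred from the lowercase "to" in the example; `titleFromPath` is a hypothetical name.

```javascript
// Abbreviations named in the commit message, keyed by lowercase form.
const ABBREVIATIONS = new Map([
  ['api', 'API'], ['ui', 'UI'], ['etl', 'ETL'],
  ['cdc', 'CDC'], ['aws', 'AWS'], ['mysql', 'MySQL'],
]);

// Assumed list of small words that stay lowercase unless they come first.
const SMALL_WORDS = new Set(['a', 'an', 'and', 'for', 'in', 'of', 'on', 'the', 'to', 'with']);

// Hypothetical helper: derive a display title from the last URL segment.
function titleFromPath(path) {
  const slug = path.split('/').filter(Boolean).pop() || '';
  return slug
    .split('-')
    .filter(Boolean)
    .map((word, index) => {
      const lower = word.toLowerCase();
      if (ABBREVIATIONS.has(lower)) return ABBREVIATIONS.get(lower);
      if (index > 0 && SMALL_WORDS.has(lower)) return lower;
      return lower.charAt(0).toUpperCase() + lower.slice(1);
    })
    .join(' ');
}

// titleFromPath('/blog/mysql-to-apache-iceberg-replication')
// returns 'MySQL to Apache Iceberg Replication'
```

A `Map` is used instead of a plain object so that slug words such as `constructor` cannot accidentally match inherited properties.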
- Remove article title extraction from breadcrumbs
- Revert to simple Home > Blog navigation
- Remove complex title extraction logic
- Keep breadcrumbs clean and simple
- Maintain SEO benefits with basic breadcrumb structure
…analysis
- Upgrade to TechArticle schema for technical content
- Add alternative headlines based on content analysis
- Implement rich about/mentions sections with contextual entities
- Add tutorial/HowTo schemas for step-by-step guides
- Enhance FAQ integration with mainEntity structure
- Add blog context and language specifications
- Intelligent content detection for schema type and sections
- Rich keyword generation from title/description/content
- Contextual about/mentions based on content analysis
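To make the TechArticle upgrade concrete, here is a small sketch of the kind of object such a plugin might emit, including the blog context and language fields. The `post` input shape and the function name `buildTechArticleJsonLd` are assumptions; the PR's content-detection and keyword-generation logic is not shown in this conversation and is not reproduced here.

```javascript
// Hypothetical builder: the `post` fields (title, description, language,
// keywords, blogName) are assumed for illustration only.
function buildTechArticleJsonLd(post) {
  return {
    '@context': 'https://schema.org',
    '@type': 'TechArticle',
    headline: post.title,
    description: post.description,
    inLanguage: post.language || 'en', // language specification
    keywords: (post.keywords || []).join(', '),
    isPartOf: { '@type': 'Blog', name: post.blogName }, // blog context
  };
}

const techArticle = buildTechArticleJsonLd({
  title: 'What Makes OLake Fast',
  description: 'Placeholder description.',
  keywords: ['olake', 'performance'],
  blogName: 'Example Blog',
});
console.log(JSON.stringify(techArticle, null, 2));
```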