[Bug] docs: Fix search engine ranking of manual pages (SEO) #4579

Open
neteler opened this issue Oct 23, 2024 · 12 comments · May be fixed by OSGeo/grass-addons#1241

@neteler
Member

neteler commented Oct 23, 2024

Describe the bug

The GRASS GIS manual pages of the different versions have long been published under a hard-to-understand scheme in which pages are hidden, redirected, or shown, and this also strongly affects the search engine ranking.

Note: issue posted here since the core manual pages are affected (while the cronjobs are maintained in the addon repository).

How Python publishes its man pages

The scope of a pull request in preparation is to partially adopt the Python manual pages concept, which looks like this (checked Oct 23, 2024):

A Google search for "Python documentation" returns as the first hit the "3.13.0 Documentation" with the URL https://docs.python.org/. Clicking on this takes the user to https://docs.python.org/3/, which is identical to https://docs.python.org/3.13/.

This means that the same documentation is served at two URLs:

  • https://docs.python.org/3/
  • https://docs.python.org/3.13/

How can GRASS GIS publish its manual pages?

While the situation in the GRASS GIS project is a bit different, we can mimic the Python approach to some extent.

Current GRASS GIS version overview

Label           Version
legacy          7.8
old             8.3
current stable  8.4
preview         8.5

I have started to implement modifications to the cronjobs locally to improve the terrible SEO situation and make more versions properly visible (long overdue).

Now, for a few days we have the following approach deployed on the server for testing purposes (cronjob PR coming soon):

  • https://grass.osgeo.org/grass-devel/manuals/ - now copied in modified cronjob from 8.5.x (i.e., main branch) with <link rel="canonical" href="https://grass.osgeo.org/grass-stable/manuals/index.html">
  • https://grass.osgeo.org/grass-stable/manuals/ - now copied in modified cronjob from 8.4.x (i.e., releasebranch_8_4 branch) - this is the overall main manual
  • https://grass.osgeo.org/grass85/manuals/ - current unreleased development (generated by cronjob, then copied to grass-devel) - with <link rel="canonical" href="https://grass.osgeo.org/grass-stable/manuals/index.html">
  • https://grass.osgeo.org/grass84/manuals/ - current stable release branch (generated by cronjob, then copied to grass-stable) - with <link rel="canonical" href="https://grass.osgeo.org/grass-stable/manuals/index.html">
  • https://grass.osgeo.org/grass78/manuals/ - with <link rel="canonical" href="https://grass.osgeo.org/grass-stable/manuals/index.html"> and red box pointing to grass-stable (generated by cronjob with box URL and canonical version defined)
  • https://grass.osgeo.org/grass65/manuals/ - with <link rel="canonical" href="https://grass.osgeo.org/grass-stable/manuals/index.html"> and red box pointing to grass-stable (old static pages)
  • https://grass.osgeo.org/grass64/manuals/ - with <link rel="canonical" href="https://grass.osgeo.org/grass-stable/manuals/index.html"> and red box pointing to grass-stable (old static pages)
  • ... likewise other old manual versions.
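
The canonical injection described in the list above could be sketched roughly as follows. This is only an illustration in Python, not the actual cronjob code (which is not shown in this issue); `inject_canonical` is a hypothetical helper name, and the sketch assumes each page has a regular `</head>` closing tag.

```python
import re

# Hypothetical helper for illustration only; not the real cronjob code.
CANONICAL_RE = re.compile(r'<link\s+rel="canonical"[^>]*>\s*', re.IGNORECASE)
HEAD_CLOSE_RE = re.compile(r"</head>", re.IGNORECASE)


def inject_canonical(html: str, canonical_url: str) -> str:
    """Return html with exactly one canonical link placed before </head>."""
    # Pages without a head section are left untouched.
    if HEAD_CLOSE_RE.search(html) is None:
        return html
    # Drop any canonical link already present so that reruns are idempotent.
    html = CANONICAL_RE.sub("", html)
    match = HEAD_CLOSE_RE.search(html)
    tag = f'<link rel="canonical" href="{canonical_url}">\n'
    # Insert the new tag right before the first closing head tag.
    return html[: match.start()] + tag + html[match.start():]
```

Removing an existing canonical first means a daily cronjob could run the same step repeatedly without stacking up duplicate tags.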

Sitemaps:

  • These have also been updated; the cronjobs take care of this.

Observations:

  • SEO: Without "canonical" URLs, different versions wipe each other out in search engines. Canonical tags help consolidate duplicate or similar content by specifying the preferred version of a page, ensuring search engines index and rank the desired URL while avoiding duplicate content issues. All older and "devel" manual pages now point to "stable" as the canonical to avoid duplicate content.
  • Very old versions: The static 6.4 and 6.5 versions have recently been reactivated and the associated Apache redirects on grass.osgeo.org removed to reduce the current SEO problems. So they are accessible again, but will not be indexed by search engines due to the "canonical" pointers (see above). Note that e.g. older scientific publications point to these (now) old GRASS GIS manual pages.
  • Rolling stable manual pages: I don't see any point in publishing the manual of a static stable release (e.g. 8.4.0); instead, the daily build of the stable release branch is used so that manual improvements become available immediately.
  • "Devel" vs "Preview": Historically, we have called the development version "grass-devel", which is called "preview" on the website. This should be streamlined.
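
To check that a published page really carries the intended "canonical" pointer, something like the following could be used. It is a minimal sketch using only the Python standard library; `CanonicalFinder` and `find_canonical` are hypothetical names, and fetching the page from the server is left out on purpose.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Remember the href of the first <link rel="canonical"> seen."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        is_canonical = (attr.get("rel") or "").lower() == "canonical"
        if tag == "link" and is_canonical and self.canonical is None:
            self.canonical = attr.get("href")


def find_canonical(html):
    """Return the canonical URL declared in html, or None if there is none."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical
```

A page for which `find_canonical` returns `None` is a candidate for the "duplicate without canonical" situation described above.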

Note that it may even take weeks for Google etc. to "learn" the improved structure. At present, I am feeding Google's search tools and Bing Webmaster Tools with the appropriate updates every few days.

Additional context

TODO:

@neteler neteler added bug Something isn't working manual Documentation related issues labels Oct 23, 2024
@neteler neteler added this to the 8.5.0 milestone Oct 23, 2024
@neteler neteler self-assigned this Oct 23, 2024
@echoix
Member

echoix commented Oct 23, 2024

This is a great project and great ideas!

I don't know if the Python docs are the best example of SEO. When I'm looking for Python docs on a topic/function/class, it happens quite often that I don't find a link to the Python docs before the 2nd page, but probably all the results ahead of it try really hard with their SEO. That said, the plan doesn't seem wrong.

Would having a version selector in the red box (not exhaustive, just like Python docs) help to have an entry point between versions while still keeping the canonical refs?

Is there any guidance on how Read the Docs handles this? Sites hosted there seem to work well in this respect.

@dhdeangelis
Contributor

I am glad to see this proposal. Having access to links to manual pages that use the words "stable" and "devel", mirroring the current stable and development versions respectively, is also a huge bonus when writing tutorials and presentations. This makes the links and documents more durable over time. At the same time, keeping the links with version numbers enables the user to point to a specific version of a command if necessary.

@neteler
Member Author

neteler commented Oct 27, 2024

It seems the new concept is slowly gaining traction: GRASS GIS 8.4 is the first hit again.

https://www.google.com/search?q=grass+r.watershed

image

It will take more time though, to see an improved ranking.

@echoix
Member

echoix commented Oct 27, 2024

I can confirm: searching for the same term in my usual private browser (no tracking, not logged in to Google), it was the first result too.

Do you happen to know if the non-uppercase Grass GIS in the title is expected?
image

@neteler
Member Author

neteler commented Oct 27, 2024

I assume that it will take more days (or weeks) until the updated titles, i.e. all in caps, show up as well. The unexpected lowercase probably still comes from the very old manual pages that previously ranked first.

@neteler neteler changed the title [Bug] docs: Fix search engine ranking of manual pages [Bug] docs: Fix search engine ranking of manual pages (SEO) Oct 28, 2024
@neteler
Member Author

neteler commented Oct 28, 2024

I got automated feedback from the Google Search Console:

  • now G8.4 and G8.4 are colliding: "Page is not indexed: Duplicate without user-selected canonical".

I suggest modifying the approach outlined above as follows:

  • "stable" to be the new canonical (previously G8.4)
  • G8.4 will get injected: <link rel="canonical" href="https://grass.osgeo.org/grass-stable/manuals/index.html">
  • "devel" to be the new canonical (previously G8.5)
  • G8.5 will get injected: <link rel="canonical" href="https://grass.osgeo.org/grass-devel/manuals/index.html">

Thus

  • The two canonicals are no longer moving targets,
  • No more "Page not indexed: Duplicate without user-selected canonical" should occur,
  • as before, easy to reference in tutorials,
  • it is basically the same as done in the Python project.
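
The revised mapping can be sketched as a small lookup. This is an illustration under stated assumptions, not the deployed logic: `canonical_for` is a hypothetical function, the directory name `grass85` for the development version is taken from the URL list earlier in this issue, and mapping every page to the same relative path under the rolling URL is my assumption (the issue only shows `index.html`).

```python
import re

BASE = "https://grass.osgeo.org"
DEVEL_DIR = "grass85"  # assumption: versioned directory of the development manual

# Matches versioned manual paths such as /grass84/manuals/r.watershed.html
VERSIONED_RE = re.compile(r"^/(grass\d+)/manuals/(.*)$")


def canonical_for(path):
    """Map a manual page path to the URL it should declare as canonical."""
    match = VERSIONED_RE.match(path)
    if match is None:
        # Rolling URLs (grass-stable, grass-devel) are their own canonical.
        return BASE + path
    version_dir, page = match.groups()
    # The development version points to "devel", everything else to "stable".
    target = "grass-devel" if version_dir == DEVEL_DIR else "grass-stable"
    return f"{BASE}/{target}/manuals/{page}"
```

Because only the rolling names appear as targets, the canonical URLs stay fixed when a new version is released; only `DEVEL_DIR` would need to change.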

Re: version selector in the old manual pages: why not - code contributions are welcome.

@neteler
Member Author

neteler commented Oct 28, 2024

This is now deployed on the server.

BTW: Page indexing is slowly improving:

image

@echoix
Member

echoix commented Nov 12, 2024

Weird, I searched for something (grass assertRasterFitsUnivar) and the two first results are for grass72:
image

Placing the links as formatted code to not create a link
https://grass.osgeo.org/grass72/manuals/libpython/genindex.html

https://grass.osgeo.org/grass72/manuals/libpython/gunittest.html

@neteler
Member Author

neteler commented Nov 12, 2024

the two first results are for grass72

It is a PITA.
I will now inject "canonical" URLs also into the grass72 manual pages (and other old copies).

@neteler
Member Author

neteler commented Nov 12, 2024

Sidenote: the missing acceptance of OSGeo/grass-addons#1215 blocks my unsubmitted SEO efforts in the cronjobs.

@neteler
Member Author

neteler commented Nov 12, 2024

https://grass.osgeo.org/grass72/manuals/libpython/genindex.html

Ok, I have injected the "canonical" into all old versions, restored the file timestamps accordingly, and requested re-indexing of some pages in the Google Search Console.
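
Editing old static pages while keeping their original timestamps could look roughly like this. It is a sketch only: `rewrite_preserving_mtime` is a hypothetical helper, and it assumes the pages are UTF-8 encoded, which may not hold for very old manual pages.

```python
import os
from pathlib import Path


def rewrite_preserving_mtime(path, transform):
    """Apply transform to a text file and restore its access/modify times.

    Returns True if the file content was changed.
    """
    path = Path(path)
    stat = path.stat()  # remember the timestamps before touching the file
    original = path.read_text(encoding="utf-8")
    updated = transform(original)
    if updated == original:
        return False
    path.write_text(updated, encoding="utf-8")
    # Put the original timestamps back (nanosecond precision).
    os.utime(path, ns=(stat.st_atime_ns, stat.st_mtime_ns))
    return True
```

Here `transform` would be whatever function injects the "canonical" line into the page text.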

Slowly getting there:

image

It will take 1-2 weeks in my experience to propagate to the user side of Google Search, i.e. by end of Nov 2024.

neteler added a commit to neteler/grass-addons that referenced this issue Nov 12, 2024
The GRASS GIS manual pages of the different versions have long been published under a hard-to-understand scheme in which pages are hidden, redirected, or shown, and this also strongly affects the search engine ranking.

SEO: Without "canonical" URLs, different versions wipe each other out in search engines. Canonical tags help consolidate duplicate or similar content by specifying the preferred version of a page, ensuring search engines index and rank the desired URL while avoiding duplicate content issues.

This PR changes the cronjob scripts to
- inject "grass-stable" as the "canonical" into older manual pages under versioned URL
- inject "grass-devel" as the "canonical" into the development manual pages under versioned URL

This way, no "duplicate content" from an SEO perspective should occur.

Also `robots.txt` is updated to reactivate the manual pages of old GRASS GIS versions (which now contain "grass-stable" as the canonical).

Fixes OSGeo/grass#4579
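
A crawler can only see a "canonical" pointer on a page it is allowed to fetch, which is why `robots.txt` matters here. The check below is a small sketch with Python's standard `urllib.robotparser`; the two rule sets are made-up examples, not the actual `robots.txt` of grass.osgeo.org, and `is_crawlable` is a hypothetical helper.

```python
from urllib.robotparser import RobotFileParser


def is_crawlable(robots_txt, url, user_agent="*"):
    """Return True if the given robots.txt text allows fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Made-up rule sets for illustration only.
BLOCKING_RULES = "User-agent: *\nDisallow: /grass72/\n"
OPEN_RULES = "User-agent: *\nDisallow:\n"

OLD_PAGE = "https://grass.osgeo.org/grass72/manuals/libpython/gunittest.html"
```

With `BLOCKING_RULES` the old page is not crawlable and its canonical would never be read; with `OPEN_RULES` it is.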
@neteler
Member Author

neteler commented Nov 13, 2024

I have updated the PR description: https://grass.osgeo.org/grass-stable/manuals/ is now the overall main manual that the older and "devel" manual versions point to via "canonical". This way, no more duplicate content should occur.

See also OSGeo/grass-addons#1241
