lindong28 (Member) commented Jul 19, 2023

The purpose of this PR is to allow outdated content on the nightly-build Flink website to expire.

Currently, flink-web does not explicitly set ExpiresActive in .htaccess, so content expiration is disabled by default. As a result, a CDN or a user's web browser might keep serving outdated content even after that content has been removed.

For example, this Flink ML web link still serves a page that has been deleted from the Flink ML repo even though the URL says "flink-ml-docs-master".

This PR fixes this problem by making the following changes:

  • Enable expiration by setting ExpiresActive on.
  • Set text/html typed content to expire/refresh once every hour.
  • Set all other content (e.g. image/jpg) to expire/refresh once every day.
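
As a rough illustration of the three changes listed above, a .htaccess fragment using Apache's mod_expires could look like the following. This is a sketch, not the exact diff in this PR: the `<IfModule>` guard and the choice of `access` as the base time are assumptions, and the PR may use `modification` as the base instead.

```apache
# Sketch only: the <IfModule> guard and the "access" base time are assumptions.
<IfModule mod_expires.c>
    # Turn on generation of Expires and Cache-Control: max-age headers.
    ExpiresActive On

    # HTML pages expire one hour after the client fetches them.
    ExpiresByType text/html "access plus 1 hour"

    # All other content (e.g. image/jpg) expires one day after it is fetched.
    ExpiresDefault "access plus 1 day"
</IfModule>
```

With `modification` as the base, the expiry time is computed from the file's last-modified time rather than from the time of the request, so a file that has not changed for longer than the interval is served as already expired.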

See [1] for a discussion of similar issues and the suggestion from the Apache infra team. See [2] for documentation of the HTTP directives added in this PR. See [3] for a detailed explanation of the modification directive.

[1] https://issues.apache.org/jira/browse/INFRA-18519
[2] https://github.com/apache/echarts-website/blob/asf-site/.htaccess
[3] https://stackoverflow.com/questions/562802/cache-expire-control-with-last-modification

lindong28 force-pushed the support-expiration branch from bace521 to a359cd6 on July 19, 2023 02:30

lindong28 (Member, Author) commented

@MartijnVisser Do you have time to review this PR?

MartijnVisser (Contributor) commented Jul 19, 2023

> Do you have time to review this PR?

Sure. I do think that there's a different issue. This .htaccess file is only used on the https://flink.apache.org project website, but not for the documentation that's built on https://nightlies.apache.org/flink.

I don't immediately see a workflow for building the flink-ml docs: where is that done?
Edit: I see tools/ci/docs.sh but I don't see any workflow triggering that?

lindong28 (Member, Author) commented

@MartijnVisser Thanks for the comments.

The flink-ml docs are built by this script: https://github.com/apache/infrastructure-bb2/blob/master/flink-ml.py. This script is executed every day by a build bot whose status can be found by searching "flink ml" at https://ci2.apache.org/#/builders.

If you are also not sure where to find/update .htaccess for https://nightlies.apache.org/flink, do you know who might know the answer? If none of us know, maybe I should create a JIRA for the Apache infra team.

MartijnVisser (Contributor) commented

> The flink-ml docs are built by this script: https://github.com/apache/infrastructure-bb2/blob/master/flink-ml.py. This script is executed every day by a build bot whose status can be found by searching "flink ml" at https://ci2.apache.org/#/builders.

@lindong28 I'm wondering if something is wrong in the rsync step that causes the file served at https://nightlies.apache.org/flink/flink-ml-docs-master/docs/try-flink-ml/quick-start/ not to be removed. I think it's best to file a Jira for it. For the Flink repo, we've moved away from buildbot to https://github.com/apache/flink/blob/master/.github/workflows/docs.yml
