Automatic translator updates #1

Open
dstillman opened this issue Jun 11, 2018 · 3 comments

@dstillman
Member

Pull from the translators repo at startup and connect to the streaming server for immediate updates. These should be optional.

@dstillman dstillman self-assigned this Jun 11, 2018
@dstillman dstillman mentioned this issue Jul 26, 2018
@mtrojan-ub

mtrojan-ub commented Aug 16, 2018

It would be great to have the translators repo URL in a config file, so that a fork can be used instead of the original translators repo.
(/edit: or would it be enough to fork the server and update .gitmodules?)

@abaevbog

abaevbog commented May 2, 2023

Summarized some thoughts here:

Main problem: The translators are currently pulled during start up of the lambda environment and not updated after. They can end up being out of date, and if there is a change to a translator, the new version will not be used until AWS decides to drop existing container and create a new one.

Idea 1: Add a middleware to the lambda that pulls metadata, checks the /tmp/ folder for cached translator code, and pulls the code from repo.zotero.org if the file doesn't exist or is too old.
Issue: It's slow, as it will always try to download any translator that has not been used before by that instance of the lambda. Also, it creates a dependency on repo.zotero.org.

Idea 2: Maybe we could use Lambda layers? Translators can be packaged as a layer for the lambda function. The streaming server (correct me if I am wrong about how that piece works) can notify another agent (most likely another lambda function), which fetches the metadata, asks repo.zotero.org for the updated translators, creates a new layer, and updates the translation server's lambda function to use the new layer. That way, fetching/updating of new translators happens in only one place, the translation server is independent of this logic, and repo.zotero.org is not checked for no reason.
Issue: This involves creating more pieces, which makes the whole setup more complex.

edit: In fact, the actual layering may not be needed. This other agent (most likely another lambda) can pull the latest translation-server code with the latest translators and run the deployment script. That way it's one less thing to worry about.

Idea 3: Skip the streaming server and use GitHub Actions CI. On a push to master in the translators repo, we can pull translation-server, move the latest translator files into the right folder of translation-server, and then run the deploy script to update the lambdas.

@dstillman
Member Author

> The translators are currently pulled during start up of the lambda environment and not updated after. They can end up being out of date, and if there is a change to a translator, the new version will not be used until AWS decides to drop existing container and create a new one.

Just to clarify, translators are currently updated when we update the git submodule and redeploy — it's not related to the Lambda execution environment at the moment.

I don't think we need or want to overly focus on Lambda here. That's how we deploy it, but I don't think there's any fundamental reason we can't have the same solution for Docker or straight Node deployments. So most of the logic here should just go in the main logic outside of lambda.js.

(The Lambda part does imply that using the streaming server doesn't make sense, since we can't use a persistent connection. I'm not sure if we were even deploying to Lambda when I opened this ticket, but ignore that in any case.)

I think continuing to use a submodule for the base set of translators is OK — most translators don't change for years at a time, so the server will be able to continue to use hundreds of them without downloading updates, and automatic updates should also be an optional setting.

I think the basic process is:

  1. In Zotero.Translators.init(), load translators from the submodule.
  2. If automatic updates are enabled, also fetch /metadata from the repo and store that in memory with the current timestamp.
  3. If we successfully fetch /metadata, use the target from there in getWebTranslatorsForLocation() instead of webRegexp (which comes from the target in the translator file). Note that /metadata could include new translators that weren't bundled. (There's also a mechanism for deleted translators that we have to check — I'll have to check the details.)
  4. When a request comes in and matches a translator, check whether the cached translator's lastUpdated is earlier than the lastUpdated from /metadata. If it is, make a /code request for the updated translator and cache that in place of the bundled one.
  5. At some random point a bit before desired expiration time, re-fetch /metadata. This needs to be random so that concurrent requests don't all request /metadata after expiration.
  6. If a /metadata or /code request fails, just fall back to the data we already have.
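
To make steps 4 and 6 concrete, here is a minimal sketch of the lookup. It is not code from translation-server: `createTranslatorCache`, `setMetadata`, and `fetchCode` are hypothetical names, `fetchCode` stands in for whatever would issue the /code request, and the sketch assumes `lastUpdated` values are strings in one fixed, sortable format (such as `YYYY-MM-DD HH:MM:SS`) so that plain string comparison orders them correctly.

```javascript
// Hypothetical helper, not part of translation-server.
// bundled: iterable of [translatorID, { lastUpdated, code }] loaded from the submodule
// fetchCode: async (translatorID) => code string (assumed wrapper around a /code request)
function createTranslatorCache(bundled, fetchCode) {
	const cache = new Map(bundled);
	let metadata = new Map(); // translatorID -> { translatorID, lastUpdated, ... }

	return {
		// Store the result of a successful /metadata fetch (step 2)
		setMetadata(entries) {
			metadata = new Map(entries.map(e => [e.translatorID, e]));
		},

		// Step 4: refresh a translator only when /metadata says it is newer.
		// Step 6: if the /code request fails, fall back to what we already have.
		async get(translatorID) {
			const cached = cache.get(translatorID);
			const remote = metadata.get(translatorID);
			// Assumes both lastUpdated strings use the same sortable format
			const stale = remote && (!cached || cached.lastUpdated < remote.lastUpdated);
			if (!stale) {
				return cached;
			}
			try {
				const code = await fetchCode(translatorID);
				const updated = { lastUpdated: remote.lastUpdated, code };
				cache.set(translatorID, updated);
				return updated;
			}
			catch (e) {
				// undefined here means an unbundled translator we could not download
				return cached;
			}
		}
	};
}
```

With this shape, a translator that never changes is served from the bundled set without any network request, and an updated one costs a single /code request per process.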

So after a server deployment, it will request /metadata at startup and then periodically, but the number of translator requests will start at 0 and just increase as translators are updated and added.
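
The randomized re-fetch from step 5 could look roughly like the sketch below. The function names, the 10% default jitter window, and the idea of checking the deadline on each incoming request (rather than with a timer) are all assumptions made for illustration, not something settled in this thread.

```javascript
// Hypothetical helpers, not part of translation-server.

// Pick a refresh time somewhere in the last `jitterFraction` of the TTL window,
// so that processes that fetched /metadata at the same moment do not all
// re-fetch at the same moment.
function computeRefreshAt(fetchedAt, ttlMs, jitterFraction = 0.1, random = Math.random) {
	const windowMs = ttlMs * jitterFraction;
	return fetchedAt + ttlMs - Math.floor(random() * windowMs);
}

// Checked when a request comes in; works the same for Lambda, Docker, or plain Node
function metadataNeedsRefresh(now, refreshAt) {
	return now >= refreshAt;
}
```

A caller would compute `refreshAt` once each time /metadata is stored and compare it against `Date.now()` on later requests. A real implementation would probably also want to avoid starting a second /metadata fetch while one is already in flight.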

For Lambda, since this is all cached outside of the invocation, invocations handled by the same execution environment will share the same set of updated translators. New execution environments will have to start with the submodule set. (We'll be able to see how often that's happening, and that can influence how often we bother updating the submodule, but I don't expect it to be much of a problem, as long as we're only fetching updated translators.)
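
As a rough illustration of "cached outside of the invocation": state held outside the handler function survives between invocations for as long as Lambda keeps the execution environment warm. In the sketch below, `makeHandler` and `loadBundledTranslators` are hypothetical names, and the response body is a placeholder rather than what translation-server returns.

```javascript
// Hypothetical sketch, not the actual lambda.js.
// loadBundledTranslators: async () => Map of translators from the submodule (assumed)
function makeHandler(loadBundledTranslators) {
	// Lives outside the handler, so it is created once per execution environment
	// (on cold start) and reused by every later invocation in that environment.
	let translatorsPromise = null;

	return async function handler(event) {
		if (!translatorsPromise) {
			translatorsPromise = loadBundledTranslators();
		}
		const translators = await translatorsPromise;
		// Placeholder response: report how many translators are loaded
		return { statusCode: 200, body: String(translators.size) };
	};
}
```

Any translators updated in that shared state by one invocation are visible to the next one in the same environment, while a fresh environment begins again from the bundled set.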
