Automatic translator updates #1

dstillman · 2018-06-11T11:59:20Z

Pull from the translators repo at startup and connect to the streaming server for immediate updates. These should be optional.

mtrojan-ub · 2018-08-16T13:17:30Z

It would be great to have the translators repo URL in a config file, so that a fork can be used instead of the original translators repo.
(Edit: Or would it be enough to fork the server and update .gitmodules?)

abaevbog · 2023-05-02T16:31:31Z

Summarized some thoughts here:

Main problem: The translators are currently pulled during startup of the Lambda environment and not updated afterward. They can end up out of date: if a translator changes, the new version will not be used until AWS decides to drop the existing container and create a new one.

Idea 1: Add middleware to the Lambda that pulls the metadata, checks the /tmp/ folder for cached translator code, and pulls the translator from repo.zotero.org if the file doesn't exist or is too old.
Issue: It's slow, as it will always have to download any translator that has not been used before by that Lambda instance. Also, it creates a dependency on repo.zotero.org.
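
A minimal sketch of the cache check in Idea 1, assuming translators are cached as individual files under /tmp and that "too old" means a fixed maximum age. The names `isCacheFresh` and `MAX_AGE_MS` are hypothetical, and the actual download from repo.zotero.org is left out.

```javascript
const fs = require('fs');

// Hypothetical maximum age for a cached translator file (1 hour)
const MAX_AGE_MS = 60 * 60 * 1000;

// Returns true if the file exists and was modified less than maxAgeMs ago
function isCacheFresh(filePath, maxAgeMs = MAX_AGE_MS, now = Date.now()) {
	let stat;
	try {
		stat = fs.statSync(filePath);
	}
	catch (e) {
		// A missing file counts as a cache miss; rethrow anything else
		if (e.code === 'ENOENT') return false;
		throw e;
	}
	return now - stat.mtimeMs < maxAgeMs;
}
```

A caller would download the translator only when `isCacheFresh()` returns false, which is where the slowness for never-before-used translators comes from.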

Idea 2: Maybe we could use Lambda layers? Translators can be packaged as a layer for the Lambda function. The streaming server (correct me if I'm wrong about how that piece works) can notify another agent, most likely another Lambda function, which fetches the metadata, asks repo.zotero.org for the updated translators, creates a new layer, and updates the translation server's Lambda function to use it. That way, fetching and updating translators happens in only one place, the translation server is independent of this logic, and repo.zotero.org is not queried needlessly.
Issue: This involves creating more pieces, which makes the whole setup more complex.

Edit: In fact, the actual layering may not be needed. This other agent (most likely another Lambda) can pull the latest translation-server code with the latest translators and run the deployment script. That way it's one less thing to worry about.

Idea 3: Skip the streaming server and use GitHub Actions CI. On a push to master in the translators repo, we can pull translation-server, move the latest translator files into the right folder of translation-server, and then run the deploy script to update the Lambdas.

dstillman · 2023-05-02T20:37:45Z

> The translators are currently pulled during start up of the lambda environment and not updated after. They can end up being out of date, and if there is a change to a translator, the new version will not be used until AWS decides to drop existing container and create a new one.

Just to clarify, translators are currently updated when we update the git submodule and redeploy — it's not related to the Lambda execution environment at the moment.

I don't think we need or want to overly focus on Lambda here. That's how we deploy it, but I don't think there's any fundamental reason we can't have the same solution for Docker or straight Node deployments. So most of the logic here should go in the main code, outside of lambda.js.

(The Lambda part does imply that using the streaming server doesn't make sense, since we can't use a persistent connection. I'm not sure if we were even deploying to Lambda when I opened this ticket, but ignore that in any case.)

I think continuing to use a submodule for the base set of translators is OK — most translators don't change for years at a time, so the server will be able to continue to use hundreds of them without downloading updates, and automatic updates should also be an optional setting.

I think the basic process is:

1. In Zotero.Translators.init(), load translators from the submodule.
2. If automatic updates are enabled, also fetch /metadata from the repo and store that in memory with the current timestamp.
3. If we successfully fetch /metadata, use the target from there in getWebTranslatorsForLocation() instead of webRegexp (which comes from the target in the translator file). Note that /metadata could include new translators that weren't bundled. (There's also a mechanism for deleted translators that we have to check. I'll have to look up the details.)
4. When a request comes in and matches a translator, check whether the cached translator's lastUpdated is earlier than the lastUpdated from /metadata. If it is, make a /code request for the updated translator and cache that in place of the bundled one.
5. At some random point a bit before the desired expiration time, re-fetch /metadata. This needs to be random so that concurrent requests don't all request /metadata after expiration.
6. If a /metadata or /code request fails, just fall back to the data we already have.
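
The steps above can be sketched roughly as follows. This is an illustration, not the translation-server implementation: the helper names (`loadMetadata`, `pickTarget`, `needsUpdate`, `nextRefreshDelay`), the shape of a metadata entry, and the 10% jitter window are all assumptions. The fetch function is passed in, so no repo URL or response format is implied. The sketch also assumes lastUpdated values are 'YYYY-MM-DD HH:MM:SS' strings, which compare correctly as plain strings.

```javascript
// In-memory metadata cache: translatorID -> { target, lastUpdated }
let metadata = null;
let metadataFetchedAt = 0;

// Fetch /metadata via the supplied function; on failure keep what we have
async function loadMetadata(fetchMetadata, now = Date.now()) {
	try {
		const entries = await fetchMetadata();
		metadata = new Map(entries.map(e => [e.translatorID, e]));
		metadataFetchedAt = now;
	}
	catch (e) {
		// Fall back to existing data (bundled translators or older metadata)
	}
	return metadata;
}

// Prefer the target from /metadata; otherwise use the bundled translator's
function pickTarget(translator) {
	const entry = metadata && metadata.get(translator.translatorID);
	return entry ? entry.target : translator.target;
}

// True if /metadata lists a newer version than the cached translator
function needsUpdate(translator) {
	const entry = metadata && metadata.get(translator.translatorID);
	return !!entry && translator.lastUpdated < entry.lastUpdated;
}

// Delay until the next refresh: a random point in the last 10% of the TTL
// (assumed window), so concurrent processes don't all refresh at once
function nextRefreshDelay(ttlMs, random = Math.random) {
	return ttlMs * (0.9 + 0.1 * random());
}
```

Only translators for which `needsUpdate()` returns true would trigger a /code request, so the number of such requests starts at zero after a deployment.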

So after a server deployment, it will request /metadata at startup and then periodically, but the number of translator requests will start at 0 and just increase as translators are updated and added.

For Lambda, since this is all cached outside of the invocation, invocations within an existing execution environment will share the same set of updated translators. New environments will have to start with the submodule set. (We'll be able to see how often that's happening, and that can influence how often we bother updating the submodule, but I don't expect it to be much of a problem, as long as we're only fetching updated translators.)
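
To illustrate the Lambda point: state held at module scope persists across invocations within the same execution environment, while a new environment starts from scratch. A minimal sketch, in which `loadBundledTranslators` is a hypothetical stand-in for loading the submodule set and the handler and event shapes are simplified for illustration:

```javascript
// Module scope: initialized once per execution environment, not per invocation
let translatorCache = null;
let initCount = 0;

// Hypothetical stand-in for loading the bundled (submodule) translators
async function loadBundledTranslators() {
	initCount++;
	return new Map();
}

// Simplified handler: later invocations reuse whatever the cache holds,
// including translators updated by earlier invocations
async function handler(event) {
	if (!translatorCache) {
		translatorCache = await loadBundledTranslators();
	}
	if (event.update) {
		translatorCache.set(event.update.translatorID, event.update);
	}
	return translatorCache.size;
}
```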

dstillman self-assigned this Jun 11, 2018

dstillman mentioned this issue Jul 26, 2018

Docker #24

Merged

dstillman mentioned this issue Nov 15, 2018

Setting up a transation-server for CSL JSON metadata generation? #51

Closed

dstillman mentioned this issue Mar 11, 2019

Regularly publish releases at npmjs #92

Open

dstillman assigned abaevbog and unassigned dstillman May 2, 2023

This was referenced May 5, 2023

Automatic translator updates #157

Closed

Auto updating translators based on metadata #158

Open
