Job queue, matrix builder, concurrency control #709
base: trh/compilers-in-metadata
Conversation
Main chunks of work to be done at this point:
But the first two bullets can be implemented as separate changes, right? They're not needed for strict parity.
@fsoikin the overall goal is to merge #669 ASAP, so that we can reupload the whole registry. We could be doing all these changes in separate PRs, but I don't see the point, since they will all need to end up in the
```purescript
insertPackageJob :: PackageOperation -> ContT Response (Run _) Response
insertPackageJob operation = do
  lift $ Log.info $ "Enqueuing job for package " <> PackageName.print (Operation.packageName operation)
  jobId <- newJobId
  lift $ Db.insertPackageJob { jobId, payload: operation }
  jsonOk V1.jobCreatedResponseCodec { jobId }
```
The old code checked for running jobs before creating new ones:

```purescript
lift (Db.runningJobForPackage packageName) >>= case _ of
  Right { jobId, jobType: runningJobType } -> ...
```

Looks like the duplicate checking is no longer there. Is that intentional?
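If we do want to restore it, a minimal sketch could look like the following, assuming `Db.runningJobForPackage` still exists with its previous signature and that returning the running job's id is an acceptable response for a duplicate request:

```purescript
-- Sketch only: re-adds the duplicate check before enqueueing.
-- `Db.runningJobForPackage` and its Either result shape are assumed
-- from the old code quoted above.
insertPackageJob :: PackageOperation -> ContT Response (Run _) Response
insertPackageJob operation = do
  let packageName = Operation.packageName operation
  lift (Db.runningJobForPackage packageName) >>= case _ of
    -- A job is already running for this package; report its id
    -- instead of enqueueing a duplicate.
    Right { jobId } ->
      jsonOk V1.jobCreatedResponseCodec { jobId }
    Left _ -> do
      lift $ Log.info $ "Enqueuing job for package " <> PackageName.print packageName
      jobId <- newJobId
      lift $ Db.insertPackageJob { jobId, payload: operation }
      jsonOk V1.jobCreatedResponseCodec { jobId }
```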
```sql
JOIN ${JOB_INFO_TABLE} info ON job.jobId = info.jobId
WHERE info.finishedAt IS NULL
  AND info.startedAt IS NULL
ORDER BY info.createdAt DESC
```
Why DESC here? Don't we want the oldest job to go out first (FIFO), like it previously was? Same for the other selectNext* functions.
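For illustration, a sketch of what FIFO ordering would look like, assuming the query ends up in a helper like this (the table names here are placeholders, not the actual `${...}` constants from the diff):

```purescript
-- Sketch only: oldest unstarted job first. Table names are
-- hypothetical stand-ins for JOB_TABLE / JOB_INFO_TABLE.
selectNextJobSql :: String
selectNextJobSql =
  """
  SELECT job.* FROM jobs job
  JOIN job_info info ON job.jobId = info.jobId
  WHERE info.finishedAt IS NULL
    AND info.startedAt IS NULL
  ORDER BY info.createdAt ASC -- ASC: oldest job goes out first (FIFO)
  LIMIT 1
  """
```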
Also, the tests are failing because of a missing
```purescript
DeleteIncompleteJobs next -> do
  Run.liftEffect $ SQLite.deleteIncompleteJobs env.db
  pure next
```
Are we sure we want to just delete incomplete jobs? Should they instead be re-inserted into the queue? For example, reset startedAt to NULL so the job is picked up again.
We may also want to put more effort into making partially-completed jobs recoverable: make operations as close to idempotent as possible, and have a way to sweep through, catch any partially-complete operations, and decide whether to roll them back or to retry and complete them.
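One way to sketch the "re-insert rather than delete" idea: reset `startedAt` on incomplete jobs so the scheduler picks them up again. `SQLite.resetIncompleteJobs` is a hypothetical FFI binding, not an existing function:

```purescript
-- Sketch only: instead of deleting, make incomplete jobs eligible
-- for retry on restart. `ResetIncompleteJobs` / `SQLite.resetIncompleteJobs`
-- are assumed names; the binding would run something like:
--   UPDATE job_info SET startedAt = NULL
--   WHERE startedAt IS NOT NULL AND finishedAt IS NULL
ResetIncompleteJobs next -> do
  Run.liftEffect $ SQLite.resetIncompleteJobs env.db
  pure next
```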
In addition to your list (some responses to that below), the other point I'm not seeing here is that we need to ensure the GitHub issue module only proxies requests to the server API and posts comments for 'notify' logs, but does not actually execute e.g. the publish operation itself. Otherwise we've defeated the purpose of the pull request, since we can no longer enforce a lock on the git commits.
This makes sense to me: start with no-dependency packages, and on completion we queue dependents that are satisfied by this package existing, propagate outwards.
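The completion hook could be sketched roughly as below; `dependentsOf`, `allDependenciesPublished`, and `enqueuePublishJob` are hypothetical helper names used only to illustrate the shape of the propagation step:

```purescript
-- Sketch only: after a package publishes successfully, enqueue any
-- dependent whose dependencies are now all available, propagating
-- outwards from the no-dependency roots. All three helpers are
-- assumed, not part of the current API.
onPublishSuccess :: PackageName -> Run _ Unit
onPublishSuccess published = do
  dependents <- dependentsOf published
  for_ dependents \dependent -> do
    ready <- allDependenciesPublished dependent
    when ready (enqueuePublishJob dependent)
```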
What do we need to do beyond what we have today? Today, we open an issue. The same approach is used for transfers or unpublishing: open the issue as a
WIP, should fix #696, fix #649