
Webhook for deployments #562

Open
jayantbh opened this issue Oct 1, 2024 · 9 comments
Labels: advanced, enhancement, hacktoberfest

Comments

jayantbh (Contributor) commented Oct 1, 2024

Why do we need this?

Currently, PR merges are assumed to be deployments for a repo, which is a fair assumption for any repo that deploys on merge through some kind of CI.

But for the many repos that don't, we should at least support a webhook-based mechanism that lets me feed my deployment/workflow run data into Middleware.

That will give me a better picture of my DORA metrics, with more accurate Lead Time and Deployment Frequency.

jayantbh added the enhancement, hacktoberfest, and advanced labels on Oct 1, 2024
Kamlesh72 (Contributor) commented
@jayantbh working on this.

jayantbh (Contributor, Author) commented Oct 2, 2024

Sure. Do share your approach before you begin implementation.

jayantbh (Contributor, Author) commented Oct 3, 2024

Important

This issue is tagged advanced. By taking this up you acknowledge that this will be a non-trivial change and may require thorough testing and review.
Of course, this also means that we offer swag for someone who goes out of their way to tackle issues tagged advanced. 🚀
It also means we'll follow up on this regularly, and in case of inactivity the issue will be unassigned.

Kamlesh72 (Contributor) commented

@jayantbh Currently we take PR merges or workflows (like GitHub Actions) as deployments, correct?

I am thinking of creating a route that collects workflow/deployment webhook data.
The captured data will be mapped and pushed into RepoWorkflowRuns.
There would be a separate adapter for each provider (Bitbucket, CircleCI, GitLab, etc.).

This is the basic idea, although more brainstorming is needed.

jayantbh (Contributor, Author) commented Oct 4, 2024

This should ideally happen on the Python backend (the apiserver dir). But yes, you have the right idea. I'll let @adnanhashmi09 explain further.

adnanhashmi09 (Contributor) commented

Keep the following in mind while implementing the workflow:

  1. Use authorization headers or custom headers for authenticating the workflow user. We should create a mechanism for users to create and update API keys. This would also include UI development efforts.

  2. The webhook should never cause the workflow to fail or take an excessively long time. It should return a status of 200 in all cases. In case of an error, the response body should contain the error message and possible ways to fix it.

  3. We need a mechanism to map these workflows to repositories linked with Middleware. Therefore, the webhook should also receive repository data for each workflow run.

  4. The processing of data should be asynchronous and not block the API response. The API request should resolve almost immediately after the request has been sent.

  5. The data should be processed in chunks, and the end user should send data in chunks, i.e., no more than 500 workflow runs data in a single call. This webhook should have the ability to sync large amounts of data and/or a single workflow run. Users can make a call to this webhook at the start and end of their workflow. We can infer the duration of the workflow run using that. Another case could be a user sending a number of their older workflow runs for us to process.

  6. A simple validation of the received data should be performed when someone tries to upload data. If the required fields are not present, we should return a processing error in the response body with a status code of 200. We don't keep erroneous data.

  7. We would also need an API to prune the data synced if someone uploaded incorrect data and wanted to delete it.

  8. An API to revoke/generate API tokens is necessary.

  9. A frontend page to manage API tokens should be developed.

  10. Implement alerting/notification in case of erroneous data.

  11. A data dump for the request type, request body, response and error should be saved in case of an error. The data received from the end-user can be saved here and then later picked up for processing. So this could serve multiple purposes.

  12. We need some event-based system to process workflow runs asynchronously without blocking the main thread. So whenever someone sends a request to our webhook, we register an "event" which is picked up by a listener. When that event is invoked, the listener queries the database for the latest data to process and starts processing.

  13. The request body could look as follows:

{
    "workflow_runs":[
        {          
            "workflow_name":"custom_workflow",
            "repo_names":["middleware"],
            "event_actor":"adnanhashmi09",
            "head_branch":"master",
            "workflow_run_unique_id":"unique_item",
            "status":"SUCCESS",
            "duration":"200", // can be provided, or we can infer this
            "workflow_run_conducted_at":"2024-09-28T20:35:45.123456+00:00"
        }
    ]
}

Read through the workflow sync once to check all the fields required for creating a RepoWorkflowRun.

  14. A RepoWorkflow shall be created based on workflow_name and repo_names if not already present. This shall also be part of the validation, i.e., if a RepoWorkflow cannot be created because the repo_names are wrong or not linked to Middleware, that should result in an error.

So there are a lot of moving parts in this implementation, and it will require a thorough understanding of our system. Please read through the sync and document your approach here before starting to implement. This is a rather comprehensive task and will take longer to implement.
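
For illustration, here is a minimal sketch of how points 2, 4, 5, and 6 could fit together on the Flask apiserver. The route path, the field checks, and the enqueue_webhook_event helper are placeholders for whatever the final design ends up being, not the actual Middleware API:

import uuid

from flask import Blueprint, request, jsonify

MAX_RUNS_PER_CALL = 500  # point 5: cap the number of workflow runs per request

webhook_api = Blueprint("webhook_api", __name__)

def enqueue_webhook_event(request_type: str, request_data: dict) -> str:
    # Placeholder: persist the raw payload and register an event for the async listener.
    return str(uuid.uuid4())

@webhook_api.route("/public/webhook/workflows", methods=["POST"])
def receive_workflow_runs():
    # Point 2: never fail the caller's workflow -- always respond with 200.
    payload = request.get_json(silent=True) or {}
    runs = payload.get("workflow_runs", [])

    errors = []
    if not runs:
        errors.append("workflow_runs is required and must be a non-empty list")
    if len(runs) > MAX_RUNS_PER_CALL:
        errors.append(f"send at most {MAX_RUNS_PER_CALL} workflow runs per call")
    required = ("workflow_name", "repo_names", "workflow_run_unique_id",
                "status", "workflow_run_conducted_at")
    for idx, run in enumerate(runs):
        missing = [field for field in required if field not in run]
        if missing:
            errors.append(f"workflow_runs[{idx}] is missing fields: {missing}")

    if errors:
        # Point 6: reject erroneous data, but still return 200 with the reasons.
        return jsonify({"success": False, "errors": errors}), 200

    # Point 4: hand the payload off to asynchronous processing and return immediately.
    event_id = enqueue_webhook_event(request_type="WORKFLOW", request_data=payload)
    return jsonify({"success": True, "event_id": event_id}), 200

(API key verification, points 1 and 8, is left out of this sketch; it would run before any of the checks above.)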

Kamlesh72 (Contributor) commented

@adnanhashmi09 Providers like GitHub Actions, GitLab, CircleCI, etc. are the sources of user deployment data. What other platforms can send data to our system? The structure of the data will be different for each source, so will we need an adapter to process the data for each provider, or will the user send the structured response?

We can store all the incoming data in Redis, to be picked up later for processing. We can only verify errors like an invalid API key before sending the response, since the data is not yet processed, but we need to return 200 ASAP. So how do we satisfy point 6 if we are not processing the data synchronously?

We also fetch data from the GitHub Actions REST API. So both REST API and webhook data will be stored in the same table, right? Can a user also prune data fetched from the GitHub Actions REST API (point 7)?

Can you please elaborate on point 11?

adnanhashmi09 (Contributor) commented Oct 10, 2024

> @adnanhashmi09 Providers like GitHub Actions, GitLab, CircleCI, etc. are the sources of user deployment data. What other platforms can send data to our system? The structure of the data will be different for each source, so will we need an adapter to process the data for each provider, or will the user send the structured response?

This webhook implementation is platform agnostic. We don't care about the workflow providers as the provider is not responsible for sending data. It is the user who integrates our webhook into their workflow who is responsible for sending the correct data. We will define a set of fields we require in the request body for us to register RepoWorkflow and RepoWorkflowRuns. It is up to the end user to make sure correct values are being sent.

> We can store all the incoming data in Redis, to be picked up later for processing. We can only verify errors like an invalid API key before sending the response, since the data is not yet processed, but we need to return 200 ASAP. So how do we satisfy point 6 if we are not processing the data synchronously?

Well, we can check for a few errors besides API key errors. For instance, the maximum amount of data allowed in one request, and whether the repo_names sent are linked with Middleware or not. These checks are fairly quick to compute.

> We also fetch data from the GitHub Actions REST API. So both REST API and webhook data will be stored in the same table, right? Can a user also prune data fetched from the GitHub Actions REST API (point 7)?

I don't think anybody would get GitHub Actions data from both the integration and the webhook. But yes, in practice we keep both sets of data. We don't give the option to prune GitHub Actions data, as they can always unlink that integration.

> Can you please elaborate on point 11?

We can save the entire request data in a database table, including the data we receive for processing. This way we can check for errors and show alerts to the user by reading from that table. It can also serve as a data dump to check what data our system has received for processing.

Kamlesh72 (Contributor) commented

@adnanhashmi09 The ?? marks fields I'm not sure whether to add or not.

API KEYS

  • Users will be able to create, read, and delete API keys.
  • The API key settings can be accessed as follows:
(image: API Key navigation in Settings)
// APIKeys Table Schema in Postgres
API_KEYS {
    keyname: "",
    secret_key: "",
    expiry_at: "",
    is_deleted: "",
    scope: "" //  [ WORKFLOW, INCIDENT ]
    org_id: "" // ??
}
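
For context, one possible (not prescriptive) way to generate and verify the secret_key would be to store only a hash of the raw secret; the function names below are hypothetical:

import hashlib
import secrets
from datetime import datetime, timezone

def generate_api_key() -> tuple[str, str]:
    # Returns (raw secret shown once to the user, hash stored in API_KEYS.secret_key).
    raw_secret = secrets.token_urlsafe(32)
    secret_hash = hashlib.sha256(raw_secret.encode()).hexdigest()
    return raw_secret, secret_hash

def verify_api_key(raw_secret: str, stored_hash: str, expiry_at: datetime, is_deleted: bool) -> bool:
    # Checks the X-Secret-Key header value against a stored API_KEYS row.
    if is_deleted or datetime.now(timezone.utc) > expiry_at:
        return False
    candidate = hashlib.sha256(raw_secret.encode()).hexdigest()
    return secrets.compare_digest(candidate, stored_hash)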

Receiving Webhook Data

The "/webhook" api will be added to flask server.

{
    event_type: "WORKFLOW", // or "INCIDENT",
    payload: {
        workflow_runs: [{
            workflow_name: "name",
            provider_workflow_id: "", // ??
            repo_name: "middleware",
            event_actor: "githubusername",
            head_branch: "master",
            workflow_run_id: "",
            status: "SUCCESS",
            duration: "200",
            workflow_run_conducted_at: "date",
            html_url: "url"
        }]
    }
}

// Headers: "X-Secret-Key": "secret_key"
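
As a usage illustration, a CI step could post a run to the proposed route roughly like this (the host, secret, and field values are placeholders):

import requests

payload = {
    "event_type": "WORKFLOW",
    "payload": {
        "workflow_runs": [{
            "workflow_name": "deploy-prod",
            "repo_name": "middleware",
            "event_actor": "githubusername",
            "head_branch": "master",
            "workflow_run_id": "run-12345",
            "status": "SUCCESS",
            "duration": "200",
            "workflow_run_conducted_at": "2024-09-28T20:35:45.123456+00:00",
            "html_url": "https://example.com/runs/12345",
        }]
    },
}

resp = requests.post(
    "https://<middleware-host>/webhook",       # proposed Flask route
    json=payload,
    headers={"X-Secret-Key": "<secret_key>"},  # per the header proposed above
    timeout=10,
)
print(resp.json())  # always 200; inspect the body for success / error details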

Pre-Processing Validation

  • Verify API Key
  • Verify size of data
  • Verify required fields
  • Verify repo_name exists in middleware
    If there is an error, send 200 with the error message and notify the user about the erroneous data via email/Slack.
    The notification module can be developed separately and integrated later.

Store the data for processing

  1. Store the data in the Postgres table WebhookEventRequests (which acts as a data dump table).
    The same table will store both Workflow and Incident webhook data.
WebhookEventRequests {
    request_type: "DEPLOYMENT", // Or INCIDENT
    request_data: "{ workflow_runs: [] }",
    status: "",
    error: "",
    created_in_db_at: "",
    processed_at: "date",
    response_data: "",
    retries: 0
}
  2. Call a Celery task to process the data asynchronously. The broker will be Redis.
  3. If there is any error, the WebhookEventRequest will be updated accordingly with the status and error.
    The user will also be notified about the error.
  4. If there is no error, update the WebhookEventRequest and store the data in RepoWorkflow and RepoWorkflowRuns.
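
A rough sketch of steps 2–4, assuming a Celery app with a Redis broker; the persistence helpers are hypothetical stand-ins for the real apiserver code:

from celery import Celery

celery_app = Celery("webhook_processor", broker="redis://localhost:6379/0")  # Redis as broker

# Hypothetical persistence helpers; the real ones would live in the apiserver.
def load_webhook_event_request(event_id: str) -> dict: ...
def upsert_repo_workflow_run(run: dict) -> None: ...
def mark_event_processed(event_id: str) -> None: ...
def mark_event_failed(event_id: str, error: str) -> None: ...
def notify_user_of_error(event_id: str, error: str) -> None: ...

@celery_app.task(bind=True, max_retries=3)
def process_webhook_event(self, event_id: str):
    # Step 2: pick up a stored WebhookEventRequests row off the request path.
    event = load_webhook_event_request(event_id)
    try:
        for run in event["request_data"]["workflow_runs"]:
            upsert_repo_workflow_run(run)  # create the RepoWorkflow if missing, then the run
        mark_event_processed(event_id)     # step 4: set status and processed_at
    except Exception as exc:
        # Step 3: record the failure and notify the user; retry a few times before giving up.
        mark_event_failed(event_id, error=str(exc))
        notify_user_of_error(event_id, error=str(exc))
        raise self.retry(exc=exc, countdown=30)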

UI

There will be 3 pages, namely: Deployments, Incidents, and API Keys.
These pages will show the incoming data in table form, with columns like WebhookEventRequest_id, processed_at, and status.
The data can be sent from the WebhookEventRequests table using Server-Sent Events.
For now, we will just use API calls, so the user needs to refresh to get new data.
Another option is to have 3 tabs (API Keys, Deployments, Incidents) on a single page.

Discussion

  • Should the API Key actually be deleted or just marked is_deleted?
  • Why repo_names and not repo_name?
  • If the incoming data is small (e.g., fewer than 10 workflow runs), it can be processed synchronously, or it can be processed in batches in Celery (to be decided later).
