Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[GH-261] Handle revoked token #308

Merged
merged 27 commits into from
May 9, 2023

Conversation

MatthewDorner
Copy link
Contributor

Summary

If a GitLab API call fails with 401 {error: invalid_token}, disconnect the user and DM the user a message telling how to reconnect their account.

Ticket Link

Fixes #261

Questions:

  • I have not handled the case where the token is both expired and revoked. Here, p.checkAndRefreshToken attempts to refresh the token and the call to src.Token fails. However, if I handle the error and call p.disconnectGitlabAccount, it will call p.checkAndRefreshToken again, creating an infnite loop. How would you suggest to resolve this?

  • In the case where the error occurs during the execution of a slash command, I have not changed the existing slash command error responses. I just send the DM in addition. So if you get the error during /gitlab me, you'll get the new message in the DM channel but the command response will still be "Encountered an error getting your GitLab profile.". Should I change the command response to a similar or identical message to what I'm sending in the DM?

  • What test coverage is needed?

@mattermod
Copy link
Contributor

Hello @MatthewDorner,

Thanks for your pull request! A Core Committer will review your pull request soon. For code contributions, you can learn more about the review process here.

@codecov
Copy link

codecov bot commented Jul 10, 2022

Codecov Report

Patch coverage: 21.77% and project coverage change: -1.18 ⚠️

Comparison is base (de004a9) 32.20% compared to head (865c702) 31.03%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #308      +/-   ##
==========================================
- Coverage   32.20%   31.03%   -1.18%     
==========================================
  Files          21       21              
  Lines        3465     3757     +292     
==========================================
+ Hits         1116     1166      +50     
- Misses       2242     2471     +229     
- Partials      107      120      +13     
Impacted Files Coverage Δ
server/api.go 21.17% <0.00%> (-1.83%) ⬇️
server/flow.go 0.00% <0.00%> (ø)
server/telemetry.go 0.00% <0.00%> (ø)
server/plugin.go 15.22% <18.35%> (-0.48%) ⬇️
server/command.go 24.72% <40.90%> (+1.11%) ⬆️
server/webhook.go 40.30% <42.10%> (+0.18%) ⬆️

... and 1 file with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

@mattermod
Copy link
Contributor

This PR has been automatically labelled "stale" because it hasn't had recent activity.
A core team member will check in on the status of the PR to help with questions.
Thank you for your contribution!

Copy link
Contributor

@mickmister mickmister left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome work on this @MatthewDorner, thanks for implementing this solution 👍

The PR mostly looks good. I have one request to make a common function handleGitlabError to avoid some code duplication

server/api.go Outdated
Comment on lines 519 to 521
if strings.Contains(err.Error(), invalidTokenError) {
p.handleRevokedToken(c.GitlabInfo)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of duplicating this block across the different functions, I'm thinking we should have a function handleGitlabError(err error), which has the above logic. What do you think @MatthewDorner?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think that would be better.

Copy link
Contributor Author

@MatthewDorner MatthewDorner Jul 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the function would need to take a *gitlab.UserInfo as well. Maybe the handleGitlabError should be combined with my handleRevokedToken as a single function such as checkAndHandleRevokedToken(err error, info *gitlab.UserInfo)?

return "Requested resource is private."
case strings.Contains(err.Error(), invalidTokenError):
p.handleRevokedToken(info)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code cleanup is good here, though I think for the purpose of this PR, can we just add a call to handleGitlabError(err) at the top of the if err != nil block?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I'll do that.

@mickmister
Copy link
Contributor

  • I have not handled the case where the token is both expired and revoked. Here, p.checkAndRefreshToken attempts to refresh the token and the call to src.Token fails. However, if I handle the error and call p.disconnectGitlabAccount, it will call p.checkAndRefreshToken again, creating an infnite loop. How would you suggest to resolve this?

What is the side effect of "not handling the case where the token is both expired and revoked"? Note that the plugin's API is called multiple times when the frontend requests data to fill the left hand side GitLab metrics, so it will always be tried more than once during this time. You can refresh your browser to trigger this data refresh.

There may be the case where we end up sending more than one bot DM here, due to more than one request trying to use the access token before it is removed from the kv store. If the server is running in a high availability environment, there may be some lag on the database read for rejecting the token, which could cause this scenario.

I'm thinking that sending multiple DMs is an ok outcome, as it's relatively harmless, and any defense around this is going to be complicated and potentially error-prone.

Also, we will want to make sure the changes in this PR work with the changes from #298

  • In the case where the error occurs during the execution of a slash command, I have not changed the existing slash command error responses. I just send the DM in addition. So if you get the error during /gitlab me, you'll get the new message in the DM channel but the command response will still be "Encountered an error getting your GitLab profile.". Should I change the command response to a similar or identical message to what I'm sending in the DM?

What you describe as the current behavior of this PR seems good to me 👍

  • What test coverage is needed?

I would test the handleGitlabError function to assert that it handles the error properly

@MatthewDorner
Copy link
Contributor Author

What is the side effect of "not handling the case where the token is both expired and revoked"? Note that the plugin's API is called multiple times when the frontend requests data to fill the left hand side GitLab metrics, so it will always be tried more than once during this time. You can refresh your browser to trigger this data refresh.

Each API call will get to checkAndRefreshToken, see that the token needs to be refreshed, but when it tries to refresh, the call to src.Token() will fail because the token is revoked and can't be refreshed. So each request will fail. This is the current behavior in this PR if the token is both expired and revoked. And I'm not sure what exact circumstances a user would revoke a token, but it seems like it might coincide with an expired token (they stop using the plugin for a while or something) so that this can be a realistic scenario.

@MatthewDorner
Copy link
Contributor Author

Also, we will want to make sure the changes in this PR work with the changes from #298

Yeah, I waited for that PR to be merged and this PR is based on code that includes those changes. But the issue with the "revoked and expired" case is how to make this work with the checkAndRefreshToken method from that PR. To clarify:

  • currently in this PR, the case is not handled which means all requests will fail if the token is revoked and expired
  • if I try to handle the error from the src.Token() call within checkAndRefreshToken the same way I have elsewhere, it will cause an infinite loop in the code due to the reasons described in my opening post

There is probably a workaround I'm just not seeing?

@mickmister
Copy link
Contributor

What is the side effect of "not handling the case where the token is both expired and revoked"? Note that the plugin's API is called multiple times when the frontend requests data to fill the left hand side GitLab metrics, so it will always be tried more than once during this time. You can refresh your browser to trigger this data refresh.

Each API call will get to checkAndRefreshToken, see that the token needs to be refreshed, but when it tries to refresh, the call to src.Token() will fail because the token is revoked and can't be refreshed. So each request will fail. This is the current behavior in this PR if the token is both expired and revoked. And I'm not sure what exact circumstances a user would revoke a token, but it seems like it might coincide with an expired token (they stop using the plugin for a while or something) so that this can be a realistic scenario.

@MatthewDorner Is there any risk with this situation? If it's revoked & expired, and this happens, is the kvstore then in an invalid state? It seems like a non-issue to me if we invalidate the token on our side so we don't try to use it again.

Also, we will want to make sure the changes in this PR work with the changes from #298

Yeah, I waited for that PR to be merged and this PR is based on code that includes those changes. But the issue with the "revoked and expired" case is how to make this work with the checkAndRefreshToken method from that PR. To clarify:

  • currently in this PR, the case is not handled which means all requests will fail if the token is revoked and expired
  • if I try to handle the error from the src.Token() call within checkAndRefreshToken the same way I have elsewhere, it will cause an infinite loop in the code due to the reasons described in my opening post

There is probably a workaround I'm just not seeing?

"This case is not handled"

There's something I'm missing here. I don't understand how the case is not handled. I understand that the token won't work in this case. Are we getting the "invalid token" error from GitLab in this case? Shouldn't we just delete it from our database and tell the user about it like usual? Is there something stopping us from doing this? Can we extract some code from the function causing the infinite loop to call in another context?

Also, please mention me in replies here so I get a notification for the mention 👍

@MatthewDorner
Copy link
Contributor Author

MatthewDorner commented Jul 22, 2022

@mickmister In checkAndRefreshToken, we just check the expiration time of the token from the KV store to determine if it needs to be refreshed. If a token is invalid, we only know when we make a request and it returns the error from GitLab. So in order to reproduce this condition:

  • shut down Mattermost server
  • wait 2 hours, or modify the expiry field in the KV store _gitlabtoken record
  • on the GitLab side, revoke the token
  • start server again

From my perspective, I feel like the issue is that checkAndRefreshToken from #298 is called from getGitlabUserInfoByMattermostID, which ensures it's called before every API call, but isn't really part of the purpose of that function. And it means checkAndRefreshToken can't then use those methods such as in my case p.disconnectGitlabAccount without causing a loop.

I can think about it more, just wanted to see if something obvious occurred.

@MatthewDorner
Copy link
Contributor Author

Are we getting the "invalid token" error from GitLab in this case? Shouldn't we just delete it from our database and tell the user about it like usual?

@mickmister We never get to the actual API call, as the call to src.Token() (calls GitLab to refresh the token since it's expired) in checkAndRefreshToken will fail and that error will be returned. If I add code in checkAndRefreshToken to check for that error and eventually call p.disconnectGitlabAccount, that will call getGitlabUserInfoByMattermostID which will call checkAndRefreshToken, which will again try to refresh the token and on and on. Sorry if it's confusing.

@mickmister
Copy link
Contributor

Are we getting the "invalid token" error from GitLab in this case? Shouldn't we just delete it from our database and tell the user about it like usual?

@mickmister We never get to the actual API call, as the call to src.Token() (calls GitLab to refresh the token since it's expired) in checkAndRefreshToken will fail and that error will be returned. If I add code in checkAndRefreshToken to check for that error and eventually call p.disconnectGitlabAccount, that will call getGitlabUserInfoByMattermostID which will call checkAndRefreshToken, which will again try to refresh the token and on and on. Sorry if it's confusing.

@MatthewDorner Yeah it would seem to me that getGitlabUserInfoByMattermostID should be a local operation, and not a "contact GitLab" operation. getGitlabUserInfoByMattermostID should not call checkAndRefreshToken, and any function calling getGitlabUserInfoByMattermostID that also cares about refreshing the token should explicitly call checkAndRefreshToken after the call getGitlabUserInfoByMattermostID has returned. Does this solve the issue?

@MatthewDorner
Copy link
Contributor Author

@mickmister yes, that is one of the solutions that occurred to me. Basically, check for both conditions (expired and revoked) in the same place.

I didn't want to go that route immediately since it would be a larger scope of changes than I did so far, so I will work on it a bit and update. Thanks for the input.

@DHaussermann DHaussermann self-requested a review July 25, 2022 15:25
@DHaussermann DHaussermann added 2: Dev Review Requires review by a core committer 3: QA Review Requires review by a QA tester and removed Lifecycle/1:stale labels Jul 25, 2022
@MatthewDorner
Copy link
Contributor Author

@mickmister Think it would work to use a wrapper or decorator around the GitLab client calls? Otherwise, every method that makes GitLab calls must call checkAndRefreshToken in its setup stage, and then later check for the invalid token after any call it makes, or at least after the first call it will definitely make, even if it's only to call handleGitlabError(err error).

A wrapper function should be cleaner and use less code, but I assume it would hit the KV store for every GitLab call rather than only once per method that makes GitLab calls, but that may be negligible. And may be difficult in other ways I don't foresee.

@mickmister
Copy link
Contributor

mickmister commented Jul 27, 2022

@MatthewDorner Could we have a wrapper function around the http handler functions?

apiRouter.HandleFunc("/user", p.checkAuth(p.attachContext(p.getGitlabUser), ResponseTypeJSON)).Methods(http.MethodPost)

We can rename checkAuth to checkMattermostAuth, then make a similar checkGitlabAuth function for certain routes. Note that since checkGitlabAuth would be a more expensive operation, we want to make that be the outer call when wrapping.

@mickmister
Copy link
Contributor

@mickmister Think it would work to use a wrapper or decorator around the GitLab client calls? Otherwise, every method that makes GitLab calls must call checkAndRefreshToken in its setup stage, and then later check for the invalid token after any call it makes, or at least after the first call it will definitely make, even if it's only to call handleGitlabError(err error).

A wrapper function should be cleaner and use less code, but I assume it would hit the KV store for every GitLab call rather than only once per method that makes GitLab calls, but that may be negligible. And may be difficult in other ways I don't foresee.

I'm not sure how this wrapper function would work, since each of the Gitlab client's methods return different types. To me, handleGitlabError is the wrapper function. We would probably need to abstract out the Gitlab client to include this wrapper behavior.

If the solution in my previous comment won't work, I'm thinking we should just go with the strategy designed in your first paragraph. Explicit error handling for each call to the external library Gitlab client makes sense to me.

@MatthewDorner
Copy link
Contributor Author

@mickmister unless you have a strong preference, I think I like locating the GitLab token error handling closer to the GitLab client calls, and to ensure it will be called in more circumstances such as slash commands, even if that isn't strictly necessary due to the frontend constantly making those requests. Also I already started working on that version haha.

@MatthewDorner
Copy link
Contributor Author

I'm thinking that sending multiple DMs is an ok outcome, as it's relatively harmless, and any defense around this is going to be complicated and potentially error-prone.

@mickmister yes, this will be the case when the revoked token is detected due to the 4 API requests from the frontend. It also spams the error log a bit with messages such as can't get GitLab user info from mattermost id because the user is already disconnected and it's still trying to process the other requests.

@MatthewDorner
Copy link
Contributor Author

MatthewDorner commented Jul 31, 2022

@mickmister Converting to draft because still having issues. The issue is related to the conversation #298 (comment) and case where the 4 API requests happen simultaneously, as well as the previous conversation in this PR about token being both expired and revoked.

While it was OK to have multiple token refreshes in the PR linked above, it is not OK once we are checking for and disconnecting the user based on an invalid token. What happens is one of the 4 requests will refresh the token, and the old token is now invalid. Then the 3 other goroutines try to refresh their (now invalid) tokens and (with the changes in this PR) the user gets disconnected. I'm not sure how the exact timing or concurrency works out, but this condition happens every time for me when I set my token to expire and then refresh the browser.

This is why it's a problem if getGitlabUserInfoByMattermostId is a local operation (and without locking anything): one goroutine can load the token from KV while another has already refreshed it and is about to store it, which means the first goroutine has loaded an invalid token.

I suspect this is also the reason for a user in the community server complaining of their error logs being spammed even after the token refresh fix, which resulted in this issue #313. The first request would refresh the token and the other 3 would fail and write to the error log, every 2 hours. It just wouldn't break any functionality, but if we also disconnect the user every time we detect an invalid token it becomes a bigger issue, and maybe some concurrency control is required now.

@MatthewDorner MatthewDorner marked this pull request as draft July 31, 2022 15:25
@MatthewDorner
Copy link
Contributor Author

The issue is made more difficult by the fact that GitLab's OAuth implementation does not allow a refresh_token to be used multiple times. It issues a new one upon each refresh. I guess the OAuth specification allows it to be done either way.

https://docs.gitlab.com/ee/api/oauth2.html

The Golang oauth2 library apparently auto-refreshes the token, but doesn't give you the new token back, so the auto-refresh doesn't work in cases like this where refresh_token is invalidated in this way and the token needs to be persistent. (and would be subject to the same concurrency issues I suppose). Anyway by detecting the need to refresh the token ahead of time we're avoiding that auto-refresh behavior.

Relevant thread here, although it's confusing because they're talking about a few different issues / cases at the same time:
golang/oauth2#84

Copy link

@DHaussermann DHaussermann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested and passed as per testing above

  • Spamming the logs when trying to fetch data from the API stops on upgrade as the plugin does not treat the affected users as connected
  • Affected users get the DM notice to reconnect
  • Users with a functional authentication continue to us the plugin normally
  • After 2 days of monitoring I see nothing unusual in the logs after the user has re-connected
  • Regression tested Chimera proxy
  • Regression tested authentication for browser and desktop

Created a separate issue to triage and address for multiple DMs #371

LGTM!

Huge thanks @MatthewDorner for the continued efforts on this work.

@DHaussermann DHaussermann added 4: Reviews Complete All reviewers have approved the pull request and removed 2: Dev Review Requires review by a core committer 3: QA Review Requires review by a QA tester labels Apr 26, 2023
@mickmister mickmister requested a review from srkgupta April 26, 2023 20:28
@mickmister
Copy link
Contributor

@srkgupta Do you mind giving this another review? I resolved some merge conflicts regarding the p.API -> p.client migration

@DHaussermann Are you able to give this another smoke test?

@DHaussermann DHaussermann added 2: Dev Review Requires review by a core committer 3: QA Review Requires review by a QA tester and removed 4: Reviews Complete All reviewers have approved the pull request labels Apr 27, 2023
@DHaussermann DHaussermann self-requested a review April 28, 2023 21:19
Copy link

@DHaussermann DHaussermann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested and passed
Sanity checked the functionality now that @mickmister has resolved merge conflicts.

  • Requests for GitLab dada stop when token is known to be invalid
  • DMs are receive
  • Reconnect works and user remains conneted

LGTM!

@DHaussermann DHaussermann added QA Review Done PR has been approved by QA and removed 3: QA Review Requires review by a QA tester labels May 1, 2023
@mickmister mickmister added 4: Reviews Complete All reviewers have approved the pull request and removed 2: Dev Review Requires review by a core committer labels May 2, 2023
@mickmister mickmister requested review from hanzei and removed request for hanzei May 2, 2023 20:16
@mickmister
Copy link
Contributor

Looks like we need @hanzei's approval before this can be merged:

Merging can be performed automatically once the requested changes are addressed.

@mickmister
Copy link
Contributor

@hanzei Can you take another look at this when you have the chance? The merge is currently blocked by the requested changes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
4: Reviews Complete All reviewers have approved the pull request QA Review Done PR has been approved by QA
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Token is not refreshed automatically Plugin should avoid using revoked tokens
10 participants