Skip to content

Conversation

@torcolvin
Copy link
Collaborator

@torcolvin torcolvin commented Oct 10, 2025

CBG-4345 don't panic if role documents can't be read

If a role can't be read, we want to end the connection because we don't know the right role information to use to use. In the case of BLIP, we should exit with a 503 to indicate to the app to retry. At the point we have a auth.User object, we have authenticated and that's probably OK, but there could be an error reading the role document (SDK error, etc).

I changed the handler code to exit with the auth error if we can't read the role document from the user. This is a behavioral change, but I think is the correct one to do, we don't know information about the user. An alternative would be to log an info or error. This is not that dissimilar to the behavior we have now (GetRole would panic, the HTTP handler would exit 500).

This code will need careful review to make sure that the functions and goroutines will terminate correctly.

Pre-review checklist

  • Removed debug logging (fmt.Print, log.Print, ...)
  • Logging sensitive data? Make sure it's tagged (e.g. base.UD(docID), base.MD(dbName))
  • Updated relevant information in the API specifications (such as endpoint descriptions, schemas, ...) in docs/api

Integration Tests

torcolvin and others added 3 commits October 9, 2025 22:10
Refactor tests in prep for CBG-4345 which changes some
return values.

- create helper functions
- replace assert.True(t, x == y) with assert.Equal(t, x, y)
- Create error types to assert on
@torcolvin torcolvin requested a review from Copilot October 10, 2025 14:58
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR addresses panic issues when role documents cannot be read by converting functions that previously panicked into error-returning functions. The changes ensure graceful error handling throughout the authentication and channel access system, particularly when role documents are missing or corrupted.

Key changes:

  • Modified authentication functions to return errors instead of panicking when role documents can't be read
  • Updated all callers to handle the new error returns
  • Added proper error logging and user-friendly error responses

Reviewed Changes

Copilot reviewed 22 out of 22 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
auth/principal.go Updated interface definitions to return errors for channel access methods
auth/user.go Modified user methods to return errors when role loading fails instead of panicking
auth/role.go Updated role methods to return errors for channel access operations
auth/user_collection_access.go Added error handling to collection-specific channel access methods
auth/role_collection_access.go Updated collection role methods to return errors
auth/collection_access.go Modified interface definitions for collection access to include error returns
db/crud.go Updated MakeUserCtx to handle errors from user channel operations
db/changes.go Added error handling throughout changes feed processing
db/design_doc.go Updated view filtering to handle channel access errors
db/functions/function.go Added error handling for channel authorization in functions
db/functions/js_function.go Updated JavaScript function execution to handle user context errors
rest/handler.go Modified audit logging to handle role retrieval errors gracefully
rest/bulk_api.go Updated bulk operations to handle channel access errors properly
rest/admin_api.go Added error handling to principal marshaling
rest/diagnostic_api.go Updated diagnostic API to handle channel access errors
rest/user_api_test.go Updated tests to handle new error returns
rest/role_api_test.go Modified tests to check for errors in channel operations
db/util_testing.go Added error handling to test utilities
db/database_test.go Updated tests to handle new error signatures
auth/user_test.go Modified tests to check for errors in user operations
auth/auth_test.go Updated authentication tests to handle new error returns
auth/collection_access_test.go Added error checking to collection access tests

if availableChannels == nil {
// TODO: CBG-1948
panic("no channels for user?")
base.AssertfCtx(h.ctx(), "User %q has no channels in handleAllDocs", base.UD(h.user.Name()))
Copy link

Copilot AI Oct 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The panic replacement uses base.AssertfCtx followed by returning an error. This is inconsistent - either use the assertion (which may panic in debug builds) or return the error, but not both. Consider removing the assertion and only returning the HTTP error.

Suggested change
base.AssertfCtx(h.ctx(), "User %q has no channels in handleAllDocs", base.UD(h.user.Name()))

Copilot uses AI. Check for mistakes.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do want to assert here, and we still return the same error as a panic but with a better message.

Base automatically changed from CBG-4345-err to main October 20, 2025 19:17
@torcolvin torcolvin self-assigned this Oct 23, 2025
@torcolvin torcolvin force-pushed the CBG-4345-thread-err branch from a45805d to 0b831e5 Compare October 27, 2025 16:27
@torcolvin torcolvin requested a review from Copilot October 27, 2025 16:57
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 24 out of 24 changed files in this pull request and generated 2 comments.

visibleChannels, err = user.InheritedCollectionChannels(base.DefaultScope, base.DefaultCollection)
hasStarChannel = !visibleChannels.Contains("*")
if !applyChannelFiltering {
return // this is an error
Copy link

Copilot AI Oct 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment 'this is an error' is misleading since the function now returns an error value. The early return here should either return an error or the comment should be updated to clarify the intent. Consider returning a proper error if this is truly an error condition.

Suggested change
return // this is an error
// Channel filtering was expected but not applied; returning error
return result, errors.New("channel filtering not applied when expected")

Copilot uses AI. Check for mistakes.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leaving existing code as is.


// Subroutine that creates a response row for a document:
output := make(chan *ChangeEntry, len(explicitDocIds))
defer close(output)
Copy link

Copilot AI Oct 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moving the defer close(output) to the top of the function is incorrect. The channel was previously closed after the loop that populates it completes. If an error occurs during iteration and causes an early return, this defer will close the channel before all values have been sent, potentially causing data loss or goroutine issues.

Copilot uses AI. Check for mistakes.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code uses this channel to signal that this is done, if the channel isn't closed an early return is needed, then this was a latent bug.

@torcolvin torcolvin assigned gregns1 and unassigned torcolvin Oct 27, 2025
@adamcfraser adamcfraser assigned adamcfraser and unassigned gregns1 Oct 28, 2025
Copy link
Collaborator

@adamcfraser adamcfraser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fine. Generally I don't like including the text "Error" in warning or info logging, made a few suggestions to avoid that, and a few other error handling comments.

// base.InfofCtx(user.auth.LogCtx, base.KeyAccess, "User %s role %q = %v", base.UD(user.Name_), base.UD(name), base.UD(role))
if err != nil {
panic(fmt.Sprintf("Error getting user role %q: %v", name, err))
return nil, fmt.Errorf("Error getting user role %q: %w", name, err)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since it's going in logs, I think we want base.UD(name).

previousChannels = col.user.InheritedCollectionChannels(col.ScopeName, col.Name)
previousChannels, err = col.user.InheritedCollectionChannels(col.ScopeName, col.Name)
if err != nil {
base.WarnfCtx(ctx, "Error getting previous channels for user %q: %v", base.UD(col.user.Name()), err)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
base.WarnfCtx(ctx, "Error getting previous channels for user %q: %v", base.UD(col.user.Name()), err)
base.WarnfCtx(ctx, "Unable to retrieve inherited channels for user %q: %v", base.UD(col.user.Name()), err)

changedChannels = col.user.InheritedCollectionChannels(col.ScopeName, col.Name).CompareKeys(previousChannels)
channels, err := col.user.InheritedCollectionChannels(col.ScopeName, col.Name)
if err != nil {
base.WarnfCtx(ctx, "Error getting channels for user %q: %v", base.UD(col.user.Name()), err)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
base.WarnfCtx(ctx, "Error getting channels for user %q: %v", base.UD(col.user.Name()), err)
base.WarnfCtx(ctx, "Unable to retrieve inherited channels for user %q: %v", base.UD(col.user.Name()), err)

var err error
channelsSince, channelsRemoved, err = col.user.FilterToAvailableCollectionChannels(col.ScopeName, col.Name, chans)
if err != nil {
base.WarnfCtx(ctx, "Error filtering to available channels for user %q: %v", base.UD(col.user.Name()), err)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
base.WarnfCtx(ctx, "Error filtering to available channels for user %q: %v", base.UD(col.user.Name()), err)
base.WarnfCtx(ctx, "Unable to filter to available channels for user %q: %v", base.UD(col.user.Name()), err)

newChannelsSince, _ := col.user.FilterToAvailableCollectionChannels(col.ScopeName, col.Name, chans)
newChannelsSince, _, err := col.user.FilterToAvailableCollectionChannels(col.ScopeName, col.Name, chans)
if err != nil {
base.WarnfCtx(ctx, "Error filtering to available channels for user %q: %v", base.UD(col.user.Name()), err)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
base.WarnfCtx(ctx, "Error filtering to available channels for user %q: %v", base.UD(col.user.Name()), err)
base.WarnfCtx(ctx, "Unable to filter to available channels for user %q: %v", base.UD(col.user.Name()), err)

if db.user.CanSeeCollectionChannel(db.ScopeName, db.Name, channel) && (removal == nil || removal.Seq > options.Since.Seq) {
canSee, err := db.user.CanSeeCollectionChannel(db.ScopeName, db.Name, channel)
if err != nil {
return nil, err
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In other places we're wrapping the underlying error with additional context. I think that makes sense to help identify exactly where the failure occurred during changes processing - any reason to not do so here as well?

currentChannels := user.InheritedCollectionChannels(dsName.ScopeName(), dsName.CollectionName())
currentChannels, err := user.InheritedCollectionChannels(dsName.ScopeName(), dsName.CollectionName())
if err != nil {
return nil, fmt.Errorf("error getting user channels: %w", err)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return nil, fmt.Errorf("error getting user channels: %w", err)
return nil, fmt.Errorf("error retrieving inherited user channels: %w", err)

}
roleNames, err := getSGUserRolesForAudit(dbCtx, h.user)
if err != nil {
base.InfofCtx(h.ctx(), base.KeyHTTP, "Error getting user roles for audit logging: %v", err)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
base.InfofCtx(h.ctx(), base.KeyHTTP, "Error getting user roles for audit logging: %v", err)
base.InfofCtx(h.ctx(), base.KeyHTTP, "Unable to retrieve user roles for audit logging, will be omitted from audit entry: %v", err)

@torcolvin torcolvin assigned torcolvin and unassigned adamcfraser Oct 31, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants