-
Notifications
You must be signed in to change notification settings - Fork 65
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor input logic with collection helper #679
base: main
Are you sure you want to change the base?
Conversation
This unnecessary passed a pointer to the server struct around, when we can be more specific to just pass a config around instead. In passing move the isCloudInternalDatabase next to the function since it is the only caller.
These are unused and its not clear they would be required in the near future.
The functions collecting data in input/postgres often need auxilliary information like the Postgres version, the global collection options, or the server configuration. They also sometimes need access to the self test utilities, or the logger, which need to be passed along as well. This has historically caused a lot of churn in the argument lists (which grow very long), as well as inconsistencies in the ordering. To unify all of the above, introduce the CollectionHelper, which is intended to be passed as the second argument (after context) in all input/postgres collection functions.
Previously multiple code paths have been querying the database whether the currently connected user is superuser/quasi-superuser/pg_monitor role, which we really only need to do once. With the new collection helper we now have a better facility for this. Since we already query pg_roles, this removes use of pg_has_role completely, instead relying on the initial pg_roles lookup and finding the members of the connected role via Go code instead. Notably this causes a potential behavior change, since we now at most go from a relationship like "pganalyze" to "my_monitoring" to "pg_monitor", without allowing further intermediary roles like pg_has_role would. In practice this should be acceptable.
slices.Contains(memberOf, roleByName["rds_superuser"].Oid) || | ||
slices.Contains(memberOf, roleByName["azure_pg_admin"].Oid) || | ||
slices.Contains(memberOf, roleByName["cloudsqlsuperuser"].Oid) | ||
connectedAsMonitoringRole := slices.Contains(memberOf, roleByName["pg_monitor"].Oid) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per PR review, preference to keep a database connection check here (vs implementing our own pg_has_role).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think doing this once is great, but doing it in Go does not seem like a good trade-off to me.
@@ -17,40 +16,40 @@ SELECT schemaname, tablename, attname, inherited, null_frac, avg_width, n_distin | |||
WHERE schemaname NOT IN ('pg_catalog', 'information_schema') | |||
` | |||
|
|||
func GetColumnStats(ctx context.Context, logger *util.Logger, db *sql.DB, globalCollectionOpts state.CollectionOpts, systemType string, dbName string, server *state.Server, postgresVersion state.PostgresVersion) (state.PostgresColumnStatsMap, error) { | |||
func GetColumnStats(ctx context.Context, h CollectionHelper, db *sql.DB, dbName string) (state.PostgresColumnStatsMap, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per PR review, prefer using a pointer here
"github.com/pganalyze/collector/util" | ||
) | ||
|
||
type CollectionHelper struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add comment explaining what this does
Also rename to "Collection" or "CollectionState"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As we discussed in PR review, it's not really just state, so I would avoid the State suffix. We could also use a synonym of Collection like Lot (as in "a number of units of an article, a single article, or a parcel of articles offered as one item (as in an auction sale)") or Batch (I think there's too much potential for confusion with that one, but throwing it out there).
ps.CollectedAt = time.Now() | ||
|
||
ts.Version, err = postgres.GetPostgresVersion(ctx, logger, connection) | ||
h, err := postgres.NewCollectionHelper(ctx, logger, server, globalCollectionOpts, connection) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still like just postgres.NewCollection
, as I mentioned in PR Review. Helper
doesn't really say anything, and conceptually I think it makes sense that a Collection is what eventually builds a Snapshot.
version, err := postgres.GetPostgresVersion(ctx, logger, db) | ||
if err != nil { | ||
return "", fmt.Errorf("error collecting Postgres version: %s", err) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these removed because we're moving the version check earlier in later patches?
slices.Contains(memberOf, roleByName["rds_superuser"].Oid) || | ||
slices.Contains(memberOf, roleByName["azure_pg_admin"].Oid) || | ||
slices.Contains(memberOf, roleByName["cloudsqlsuperuser"].Oid) | ||
connectedAsMonitoringRole := slices.Contains(memberOf, roleByName["pg_monitor"].Oid) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think doing this once is great, but doing it in Go does not seem like a good trade-off to me.
See individual commits.
The motivation for doing this, besides making the code easier to work with, is to reduce repeated small queries we can answer from information already collected instead. This PR does that by moving pg_has_role lookups to the collection helper (and a single pg_roles query when the helper is initialized), and a separate follow-up PR will do this for the checks for helper functions using a single pg_proc query (see #680).