Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor input logic with collection helper #679

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

lfittl
Copy link
Member

@lfittl lfittl commented Feb 17, 2025

See individual commits.

The motivation for doing this, besides making the code easier to work with, is to reduce repeated small queries we can answer from information already collected instead. This PR does that by moving pg_has_role lookups to the collection helper (and a single pg_roles query when the helper is initialized), and a separate follow-up PR will do this for the checks for helper functions using a single pg_proc query (see #680).

This unnecessary passed a pointer to the server struct around, when we
can be more specific to just pass a config around instead.

In passing move the isCloudInternalDatabase next to the function since
it is the only caller.
These are unused and its not clear they would be required in the near
future.
The functions collecting data in input/postgres often need auxilliary
information like the Postgres version, the global collection options,
or the server configuration. They also sometimes need access to the
self test utilities, or the logger, which need to be passed along as well.

This has historically caused a lot of churn in the argument lists
(which grow very long), as well as inconsistencies in the ordering.

To unify all of the above, introduce the CollectionHelper, which is
intended to be passed as the second argument (after context) in all
input/postgres collection functions.
Previously multiple code paths have been querying the database whether the currently
connected user is superuser/quasi-superuser/pg_monitor role,
which we really only need to do once. With the new collection helper we
now have a better facility for this.

Since we already query pg_roles, this removes use of pg_has_role
completely, instead relying on the initial pg_roles lookup and finding
the members of the connected role via Go code instead.

Notably this causes a potential behavior change, since we now at most
go from a relationship like "pganalyze" to "my_monitoring" to
"pg_monitor", without allowing further intermediary roles like pg_has_role
would. In practice this should be acceptable.
slices.Contains(memberOf, roleByName["rds_superuser"].Oid) ||
slices.Contains(memberOf, roleByName["azure_pg_admin"].Oid) ||
slices.Contains(memberOf, roleByName["cloudsqlsuperuser"].Oid)
connectedAsMonitoringRole := slices.Contains(memberOf, roleByName["pg_monitor"].Oid)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per PR review, preference to keep a database connection check here (vs implementing our own pg_has_role).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think doing this once is great, but doing it in Go does not seem like a good trade-off to me.

@@ -17,40 +16,40 @@ SELECT schemaname, tablename, attname, inherited, null_frac, avg_width, n_distin
WHERE schemaname NOT IN ('pg_catalog', 'information_schema')
`

func GetColumnStats(ctx context.Context, logger *util.Logger, db *sql.DB, globalCollectionOpts state.CollectionOpts, systemType string, dbName string, server *state.Server, postgresVersion state.PostgresVersion) (state.PostgresColumnStatsMap, error) {
func GetColumnStats(ctx context.Context, h CollectionHelper, db *sql.DB, dbName string) (state.PostgresColumnStatsMap, error) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per PR review, prefer using a pointer here

"github.com/pganalyze/collector/util"
)

type CollectionHelper struct {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add comment explaining what this does

Also rename to "Collection" or "CollectionState"?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we discussed in PR review, it's not really just state, so I would avoid the State suffix. We could also use a synonym of Collection like Lot (as in "a number of units of an article, a single article, or a parcel of articles offered as one item (as in an auction sale)") or Batch (I think there's too much potential for confusion with that one, but throwing it out there).

ps.CollectedAt = time.Now()

ts.Version, err = postgres.GetPostgresVersion(ctx, logger, connection)
h, err := postgres.NewCollectionHelper(ctx, logger, server, globalCollectionOpts, connection)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still like just postgres.NewCollection, as I mentioned in PR Review. Helper doesn't really say anything, and conceptually I think it makes sense that a Collection is what eventually builds a Snapshot.

version, err := postgres.GetPostgresVersion(ctx, logger, db)
if err != nil {
return "", fmt.Errorf("error collecting Postgres version: %s", err)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these removed because we're moving the version check earlier in later patches?

slices.Contains(memberOf, roleByName["rds_superuser"].Oid) ||
slices.Contains(memberOf, roleByName["azure_pg_admin"].Oid) ||
slices.Contains(memberOf, roleByName["cloudsqlsuperuser"].Oid)
connectedAsMonitoringRole := slices.Contains(memberOf, roleByName["pg_monitor"].Oid)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think doing this once is great, but doing it in Go does not seem like a good trade-off to me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants