@pooknull pooknull commented Oct 10, 2025

https://perconadev.atlassian.net/browse/K8SPS-548

DESCRIPTION

This PR originally tried to address rare gr-self-healing test failures, but the initial change did not resolve the issue and was removed.

The PR still contains a small fix for the comparePrimaryPurged function. Previously, the function always returned false: it looked up the key GTID_SUBTRACT in the map returned by runSQL, but that map used keys such as GTID_SUBTRACT(), so the lookup never matched.
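
The failure mode described above can be sketched as follows. This is a minimal illustration, not the operator's code: the row contents, the exact key string, and the `lookupByName` and `singleValue` helpers are assumptions invented for the example. Only the idea that the result map is keyed by something other than the bare function name comes from the description.

```go
package main

import "fmt"

// lookupByName mimics a lookup that expects a fixed column name.
// It reports false when the key is absent from the row.
func lookupByName(row map[string]any, key string) (any, bool) {
	v, ok := row[key]
	return v, ok
}

// singleValue is a hypothetical alternative for one-column results:
// it returns the only value in the row, whatever the column is called.
func singleValue(row map[string]any) (any, bool) {
	if len(row) != 1 {
		return nil, false
	}
	for _, v := range row {
		return v, true
	}
	return nil, false
}

func main() {
	// Assumed shape of a decoded result row; the key is illustrative only.
	row := map[string]any{"GTID_SUBTRACT()": float64(1)}

	// The bare name is not a key of the map, so this lookup always misses.
	_, ok := lookupByName(row, "GTID_SUBTRACT")
	fmt.Println("lookup by bare name found:", ok)

	// Reading the single column by position does not depend on the key.
	v, ok := singleValue(row)
	fmt.Println("single value:", v, ok)
}
```

Another option, also an assumption rather than something this PR states, would be to give the column an explicit alias in the SQL so that the key is predictable.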

CHECKLIST

Jira

  • Is the Jira ticket created and referenced properly?
  • Does the Jira ticket have the proper statuses for documentation (Needs Doc) and QA (Needs QA)?
  • Does the Jira ticket link to the proper milestone (Fix Version field)?

Tests

  • Is an E2E test/test case added for the new feature/change?
  • Are unit tests added where appropriate?

Config/Logging/Testability

  • Are all needed new/changed options added to default YAML files?
  • Are all needed new/changed options added to the Helm Chart?
  • Did we add proper logging messages for operator actions?
  • Did we ensure compatibility with the previous version or cluster upgrade process?
  • Does the change support oldest and newest supported PS version?
  • Does the change support oldest and newest supported Kubernetes version?

Copilot AI review requested due to automatic review settings October 10, 2025 13:20
@pull-request-size pull-request-size bot added the size/L 100-499 lines label Oct 10, 2025
Copilot AI left a comment

Pull Request Overview

This PR fixes rare test failures in the gr-self-healing test by addressing type safety issues and improving error handling in MySQL shell operations. The changes ensure better compatibility with different MySQL shell JSON response formats that could cause intermittent test failures.

  • Changed string type assertions to handle any types from JSON responses
  • Added proper error handling and type checking for SQL result parsing
  • Removed unnecessary blank lines in test files for cleaner formatting

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| cmd/bootstrap/gr/recovery_method_test.go | Updated mock SQL runner to use any type and removed extra blank lines in tests |
| cmd/bootstrap/gr/recovery_method.go | Added type assertions and error handling for SQL results, updated function signatures |
| cmd/bootstrap/gr/group_replication.go | Changed SQL result type from string to any, added new primary partition check method |

Copilot AI review requested due to automatic review settings October 11, 2025 12:43
Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Comment on lines 605 to 618
inPrimary, err := shell.checkIfInPrimaryPartition(ctx)
if err != nil {
	return errors.Wrap(err, "check if member in primary partition")
}
if !inPrimary {
	log.Printf("Instance (%s) is not in primary partition. Starting full cluster crash recovery...", localShell.host)

	if err := handleFullClusterCrash(ctx, mysqlshVer); err != nil {
		return errors.Wrap(err, "handle full cluster crash")
	}

	// force restart container
	os.Exit(1)
}
Contributor

Not sure about this: handleFullClusterCrash is going to create /var/lib/mysql/full-cluster-crash in this pod, which will trigger recovery in the operator. But do we need to reboot the whole cluster in this case? If the pod is not in the primary partition it needs to be restarted, but do we need to touch the other pods? I am probably missing something, since we have the same check in the liveness probe, and that would already be enough if restarting the pod were enough.

Contributor Author

I think this is the only way to handle the situation described in the PR description. Maybe we should store a counter of pod restarts and trigger a full cluster crash if there are too many restarts?

Copilot AI review requested due to automatic review settings October 15, 2025 11:34
Copilot AI left a comment

Pull Request Overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.


Copilot AI review requested due to automatic review settings October 27, 2025 17:19
Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@pooknull pooknull changed the title from "K8SPS-548: fix rare gr-self-healing test failures" to "K8SPS-548: fix comparePrimaryPurged function" Oct 28, 2025
@pooknull pooknull marked this pull request as ready for review October 28, 2025 11:59
Copilot AI review requested due to automatic review settings October 28, 2025 11:59
@pooknull pooknull requested a review from gkech as a code owner October 28, 2025 11:59
Copilot AI left a comment

Pull Request Overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated no new comments.


@pooknull pooknull requested a review from egegunes October 28, 2025 12:18
@JNKPercona
Collaborator

| Test Name | Result | Time |
| --- | --- | --- |
| async-ignore-annotations-8-4 | passed | 00:06:26 |
| async-global-metadata-8-4 | passed | 00:09:08 |
| async-upgrade-8-0 | passed | 00:12:20 |
| async-upgrade-8-4 | passed | 00:12:34 |
| auto-config-8-4 | passed | 00:24:02 |
| config-8-4 | passed | 00:16:53 |
| config-router-8-0 | passed | 00:07:09 |
| config-router-8-4 | passed | 00:07:28 |
| demand-backup-minio-8-0 | passed | 00:20:25 |
| demand-backup-minio-8-4 | passed | 00:20:43 |
| demand-backup-cloud-8-4 | passed | 00:20:15 |
| async-data-at-rest-encryption-8-0 | passed | 00:13:55 |
| async-data-at-rest-encryption-8-4 | passed | 00:13:25 |
| gr-global-metadata-8-4 | passed | 00:13:44 |
| gr-data-at-rest-encryption-8-0 | passed | 00:14:29 |
| gr-data-at-rest-encryption-8-4 | passed | 00:14:38 |
| gr-demand-backup-minio-8-4 | passed | 00:12:25 |
| gr-demand-backup-cloud-8-4 | passed | 00:21:16 |
| gr-demand-backup-haproxy-8-4 | passed | 00:09:42 |
| gr-finalizer-8-4 | passed | 00:06:11 |
| gr-haproxy-8-0 | passed | 00:04:38 |
| gr-haproxy-8-4 | passed | 00:04:01 |
| gr-ignore-annotations-8-4 | passed | 00:05:30 |
| gr-init-deploy-8-0 | passed | 00:09:45 |
| gr-init-deploy-8-4 | passed | 00:08:58 |
| gr-one-pod-8-4 | passed | 00:05:54 |
| gr-recreate-8-4 | passed | 00:17:48 |
| gr-scaling-8-4 | passed | 00:07:40 |
| gr-scheduled-backup-8-4 | passed | 00:16:23 |
| gr-security-context-8-4 | passed | 00:09:52 |
| gr-self-healing-8-4 | passed | 00:28:01 |
| gr-tls-cert-manager-8-4 | passed | 00:09:19 |
| gr-users-8-4 | passed | 00:05:20 |
| gr-upgrade-8-0 | passed | 00:09:08 |
| gr-upgrade-8-4 | passed | 00:09:21 |
| haproxy-8-0 | passed | 00:08:22 |
| haproxy-8-4 | passed | 00:08:13 |
| init-deploy-8-0 | passed | 00:06:48 |
| init-deploy-8-4 | passed | 00:05:47 |
| limits-8-4 | passed | 00:06:27 |
| monitoring-8-4 | passed | 00:17:28 |
| one-pod-8-0 | passed | 00:05:28 |
| one-pod-8-4 | passed | 00:05:20 |
| operator-self-healing-8-4 | passed | 00:11:22 |
| pvc-resize-8-4 | passed | 00:07:50 |
| recreate-8-4 | passed | 00:12:28 |
| scaling-8-4 | passed | 00:10:51 |
| scheduled-backup-8-0 | passed | 00:17:23 |
| scheduled-backup-8-4 | passed | 00:16:19 |
| service-per-pod-8-4 | passed | 00:06:15 |
| sidecars-8-4 | passed | 00:04:28 |
| smart-update-8-4 | passed | 00:09:15 |
| storage-8-4 | passed | 00:04:01 |
| telemetry-8-4 | passed | 00:06:06 |
| tls-cert-manager-8-4 | passed | 00:10:12 |
| users-8-0 | passed | 00:07:36 |
| users-8-4 | passed | 00:07:34 |
| version-service-8-4 | passed | 00:11:56 |

| Summary | Value |
| --- | --- |
| Tests Run | 58/58 |
| Job Duration | 01:53:48 |
| Total Test Time | 10:36:42 |

commit: 703ecc3
image: perconalab/percona-server-mysql-operator:PR-1125-703ecc3e

-	sub, err := strconv.Atoi(v)
-	if err != nil {
-		return false
+	s, ok := v.(float64)
Contributor

Why float and not int? As far as I remember, SELECT GTID_SUBTRACT('%s', '%s') = '' returns 0 or 1.
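
One possible answer to the float-versus-int question, offered as a sketch: if the SQL result reaches the function by way of JSON decoded into `any` (as the review overview earlier in this thread suggests), Go's `encoding/json` stores every JSON number as `float64`, so an `int` assertion would fail even when the value is 0 or 1. Whether runSQL actually takes this path is an assumption here, and the payload, the `sub` key, and the `isOne` helper below are invented for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// isOne decodes a one-column JSON row and reports whether the value
// stored under key equals 1.
func isOne(payload []byte, key string) (bool, error) {
	var row map[string]any
	if err := json.Unmarshal(payload, &row); err != nil {
		return false, err
	}
	// encoding/json decodes JSON numbers into float64 when the target is any.
	f, ok := row[key].(float64)
	if !ok {
		return false, fmt.Errorf("unexpected type %T for %q", row[key], key)
	}
	return f == 1, nil
}

func main() {
	payload := []byte(`{"sub": 1}`) // invented payload, not real mysqlsh output

	var row map[string]any
	if err := json.Unmarshal(payload, &row); err != nil {
		panic(err)
	}
	_, isInt := row["sub"].(int)
	_, isFloat := row["sub"].(float64)
	fmt.Println(isInt, isFloat) // prints "false true"

	ok, err := isOne(payload, "sub")
	fmt.Println(ok, err) // prints "true <nil>"
}
```

If integer semantics are preferred, `json.Decoder.UseNumber` is a standard-library alternative that keeps numbers as `json.Number` instead of `float64`.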
