Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gitlab_rs_init can stay forever waiting for first node to become primary #311

Open
italovalcy opened this issue Jun 11, 2024 · 0 comments
Open

Comments

@italovalcy
Copy link
Collaborator

Hi,

I've been observing that some CI/CD jobs are not finishing and failing on the MongoDB init script with the following error:

Configuring replica set
Wrote 172.17.0.4:27017,172.17.0.5:27018,172.17.0.6:27019 to /tmp/host_seeds.txt
Waiting for node mongo1 to become primary
Waiting for the first node to be PRIMARY, current: None
Waiting for the first node to be PRIMARY, current: SECONDARY
Waiting for the first node to be PRIMARY, current: SECONDARY
Waiting for the first node to be PRIMARY, current: SECONDARY
Waiting for the first node to be PRIMARY, current: SECONDARY
Waiting for the first node to be PRIMARY, current: SECONDARY
Waiting for the first node to be PRIMARY, current: SECONDARY
Waiting for the first node to be PRIMARY, current: SECONDARY
Waiting for the first node to be PRIMARY, current: SECONDARY
Waiting for the first node to be PRIMARY, current: SECONDARY
Waiting for the first node to be PRIMARY, current: SECONDARY
Waiting for the first node to be PRIMARY, current: SECONDARY
Waiting for the first node to be PRIMARY, current: SECONDARY
Waiting for the first node to be PRIMARY, current: SECONDARY
Waiting for the first node to be PRIMARY, current: SECONDARY
Waiting for the first node to be PRIMARY, current: SECONDARY
Waiting for the first node to be PRIMARY, current: SECONDARY
Waiting for the first node to be PRIMARY, current: SECONDARY
Waiting for the first node to be PRIMARY, current: SECONDARY
Waiting for the first node to be PRIMARY, current: SECONDARY
Waiting for the first node to be PRIMARY, current: SECONDARY
Waiting for the first node to be PRIMARY, current: SECONDARY
Waiting for the first node to be PRIMARY, current: SECONDARY
Waiting for the first node to be PRIMARY, current: SECONDARY
Waiting for the first node to be PRIMARY, current: SECONDARY
Waiting for the first node to be PRIMARY, current: SECONDARY
Waiting for the first node to be PRIMARY, current: SECONDARY
Waiting for the first node to be PRIMARY, current: SECONDARY
Waiting for the first node to be PRIMARY, current: SECONDARY
Waiting for the first node to be PRIMARY, current: SECONDARY
Waiting for the first node to be PRIMARY, current: SECONDARY
Waiting for the first node to be PRIMARY, current: SECONDARY
Waiting for the first node to be PRIMARY, current: SECONDARY
Waiting for the first node to be PRIMARY, current: SECONDARY
Waiting for the first node to be PRIMARY, current: SECONDARY
Waiting for the first node to be PRIMARY, current: SECONDARY
Waiting for the first node to be PRIMARY, current: SECONDARY
Waiting for the first node to be PRIMARY, current: SECONDARY
Waiting for the first node to be PRIMARY, current: SECONDARY
Waiting for the first node to be PRIMARY, current: SECONDARY
Waiting for the first node to be PRIMARY, current: SECONDARY
Waiting for the first node to be PRIMARY, current: SECONDARY
Waiting for the first node to be PRIMARY, current: SECONDARY
Waiting for the first node to be PRIMARY, current: SECONDARY
Waiting for the first node to be PRIMARY, current: SECONDARY
Waiting for the first node to be PRIMARY, current: SECONDARY
Waiting for the first node to be PRIMARY, current: SECONDARY
Waiting for the first node to be PRIMARY, current: SECONDARY
Waiting for the first node to be PRIMARY, current: SECONDARY
Waiting for the first node to be PRIMARY, current: SECONDARY
Waiting for the first node to be PRIMARY, current: SECONDARY
Waiting for the first node to be PRIMARY, current: SECONDARY
....

The above log keeps going up to the configured timeout of the pipeline.

Running the CI/CD job manually to troubleshoot, I was able to identify two things:

  1. We were supposing to be configuring the "first node" as the PRIMARY (as far as I could understand from the gitlab_rs_init.py script), however we are using the priority i * 10 == 10 to do that, while mongodb replicaset config suggest the usage of higher values to increase the likelihood of a node to become primary (https://www.mongodb.com/docs/manual/reference/replica-configuration/)

A number that indicates the relative likelihood of a replica set member to become the primary.
To increase the likelihood that a member becomes the primary, specify a higher priority value for that member.
To decrease the likelihood that a member becomes the primary, specify a lower priority value for that member.

  1. Sometimes the primary node won't be the first member of the replSetGetStatus command (probably due to the issue 1 above?):
>>> client = MongoClient(hosts[first_node]["host_port"], directConnection=True)
>>> response = client.admin.command("replSetGetStatus")
>>> response["members"][0]
{'_id': 1, 'name': '172.17.0.4:27017', 'health': 1.0, 'state': 2, 'stateStr': 'SECONDARY', 'uptime': 603, 'optime': {'ts': Timestamp(1718093647, 1), 't': 2}, 'optimeDate': datetime.datetime(2024, 6, 11, 8, 14, 7), 'lastAppliedWallTime': datetime.datetime(2024, 6, 11, 8, 14, 7, 339000), 'lastDurableWallTime': datetime.datetime(2024, 6, 11, 8, 14, 7, 339000), 'syncSourceHost': '172.17.0.5:27018', 'syncSourceId': 2, 'infoMessage': '', 'configVersion': 1, 'configTerm': 2, 'self': True, 'lastHeartbeatMessage': ''}
>>> response["members"][1]
{'_id': 2, 'name': '172.17.0.5:27018', 'health': 1.0, 'state': 2, 'stateStr': 'SECONDARY', 'uptime': 541, 'optime': {'ts': Timestamp(1718093647, 1), 't': 2}, 'optimeDurable': {'ts': Timestamp(1718093647, 1), 't': 2}, 'optimeDate': datetime.datetime(2024, 6, 11, 8, 14, 7), 'optimeDurableDate': datetime.datetime(2024, 6, 11, 8, 14, 7), 'lastAppliedWallTime': datetime.datetime(2024, 6, 11, 8, 14, 7, 339000), 'lastDurableWallTime': datetime.datetime(2024, 6, 11, 8, 14, 7, 339000), 'lastHeartbeat': datetime.datetime(2024, 6, 11, 8, 14, 15, 555000), 'lastHeartbeatRecv': datetime.datetime(2024, 6, 11, 8, 14, 16, 101000), 'pingMs': 0, 'lastHeartbeatMessage': '', 'syncSourceHost': '172.17.0.6:27019', 'syncSourceId': 3, 'infoMessage': '', 'configVersion': 1, 'configTerm': 2}
>>> response["members"][2]
{'_id': 3, 'name': '172.17.0.6:27019', 'health': 1.0, 'state': 1, 'stateStr': 'PRIMARY', 'uptime': 541, 'optime': {'ts': Timestamp(1718093647, 1), 't': 2}, 'optimeDurable': {'ts': Timestamp(1718093647, 1), 't': 2}, 'optimeDate': datetime.datetime(2024, 6, 11, 8, 14, 7), 'optimeDurableDate': datetime.datetime(2024, 6, 11, 8, 14, 7), 'lastAppliedWallTime': datetime.datetime(2024, 6, 11, 8, 14, 7, 339000), 'lastDurableWallTime': datetime.datetime(2024, 6, 11, 8, 14, 7, 339000), 'lastHeartbeat': datetime.datetime(2024, 6, 11, 8, 14, 15, 559000), 'lastHeartbeatRecv': datetime.datetime(2024, 6, 11, 8, 14, 15, 561000), 'pingMs': 0, 'lastHeartbeatMessage': '', 'syncSourceHost': '', 'syncSourceId': -1, 'infoMessage': '', 'electionTime': Timestamp(1718093137, 1), 'electionDate': datetime.datetime(2024, 6, 11, 8, 5, 37), 'configVersion': 1, 'configTerm': 2}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant