Fix syncEgressFirewall (truncate ACL names) and SetupMaster (stop recreating cluster-wide PGs) #3466
This commit adds a test to showcase that since syncEgressFirewall isn't calling libovsdbops.BuildACL directly, we are not truncating ACL names. Note that we really need ovn-org/libovsdb#338 for our test server to start screaming for long names. Signed-off-by: Surya Seetharaman <[email protected]>
This commit ensures we truncate names as a precaution also in CreateOrUpdateACLsOps so that our bases are covered since not all code snippets call BuildACL directly Signed-off-by: Surya Seetharaman <[email protected]>
I think there is yet another bug, perhaps introduced during 4.13 when @npinaeva moved the EF rules to port groups. When I restart OVNK master I see nothing is happening in sync egress firewall:
In the function, it should print every single egress firewall found when it tries to find via predicate:
However, it finds nothing, which means no ACLs actually exist after restart. Even though there was one:
So I notice in the logs this ACL was deleted, because the port group was updated to have no ACLs....
Now I see in the port group create code:
port group ops code includes updating ACL list:
We need to fix this too. Would you mind including it in this PR, @tssurya?
There is one more ACL that needs a unique name and includes the namespace name: https://github.com/ovn-org/ovn-kubernetes/blob/master/go-controller/pkg/ovn/multicast.go#L95-L97
@trozet your bug can be fixed separately, since all ACLs will be recreated on restart; the only problem is that there will be no egress firewall ACLs for some time during the sync.
Yup, fixed this: if clusterRtrPG or clusterPG already exists, there is no need to re-update or re-create it with empty ACLs/ports.
The fix seems to work; I'll add it as a new commit.
In SetupMaster, we always call CreateOrUpdatePortGroupsOps with empty ACLs and ports for the cluster-wide port group and the cluster-wide router port group. This is disruptive during upgrades, since all efw ACLs and multicast ACLs are momentarily wiped out. This commit changes this to first check whether the PG already exists; if it does, there is no need to do anything. Each of those features is responsible for ensuring that the ACLs and ports are correct on the PGs it owns. NOTE: This bug was an issue for multicast and started being an issue for efw from ovn-org@bd29f41. Before that we didn't have ACLs on the cluster-wide PG. Signed-off-by: Surya Seetharaman <[email protected]>
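A minimal sketch of the "create only if missing" behaviour this commit describes, using an in-memory map in place of the OVN northbound database. The `portGroup` type, the `store` type, and `ensurePortGroup` are all hypothetical stand-ins, and the group names are placeholders; none of this is the libovsdbops API.

```go
package main

import "fmt"

// portGroup is a simplified stand-in for an OVN port group row.
type portGroup struct {
	Name  string
	ACLs  []string
	Ports []string
}

// store is a hypothetical in-memory replacement for the NB database.
type store map[string]*portGroup

// ensurePortGroup creates an empty port group only when none exists.
// An existing group is left untouched so the ACLs and ports owned by
// other features survive a restart. It reports whether it created one.
func (s store) ensurePortGroup(name string) bool {
	if _, found := s[name]; found {
		return false
	}
	s[name] = &portGroup{Name: name}
	return true
}

func main() {
	db := store{
		"clusterPortGroup": &portGroup{
			Name: "clusterPortGroup",
			ACLs: []string{"egress-firewall-acl"},
		},
	}

	// Restart case: the group exists, so its ACLs must be preserved.
	fmt.Println(db.ensurePortGroup("clusterPortGroup"), db["clusterPortGroup"].ACLs)

	// Fresh install case: the group is missing, so an empty one is created.
	fmt.Println(db.ensurePortGroup("clusterRtrPortGroup"), len(db["clusterRtrPortGroup"].ACLs))
}
```

The design choice is that setup code only guarantees the group exists, while each feature stays responsible for the contents it owns.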
After talking with Nadia, we decided to add a namespaceKey to the extID func to keep things unique and comply with the isEquivalentACL match criteria. I will do a new commit for this one too.
Keeping the multicast ACL issue separate from EFW in #3470.
if pg == nil {
	// we didn't find an existing clusterPG, let's create a new empty PG (fresh cluster install)
	// Create a cluster-wide port group that all logical switch ports are part of
	pg := libovsdbops.BuildPortGroup(types.ClusterPortGroupName, types.ClusterPortGroupName, nil, nil)
I am not sure, but cleaning up ports from this port group even when it exists may be intended, since there is no cleanup for ports AFAIK. So we may need some new sync function for nodes to delete possibly stale ports?
Yeah, that was also a concern I had, but I couldn't determine whether this was intended or just something we did. I hope the ACLs are synced elsewhere in each of these features.
I'd hope that when node adds/deletes happen, we do remove these ports from these port groups. Also, these mp0 ports and router ports are not supposed to change that frequently in real environments.
Regardless, a cleanup in syncNodes might be a good place to sync things when nodes come up and to ensure we keep a clean list. Since the risk is low, I could do this in a follow-up if it makes sense.
@trozet -> WDYT?
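For illustration only, the kind of cleanup floated above could start from a set difference between the ports currently in the group and the ports expected from the known nodes. `stalePorts` is a hypothetical helper, and the sketch compares port names for readability, whereas a real port group references ports by UUID.

```go
package main

import (
	"fmt"
	"sort"
)

// stalePorts returns the entries of pgPorts that are absent from
// expected, sorted for stable output.
func stalePorts(pgPorts, expected []string) []string {
	want := make(map[string]struct{}, len(expected))
	for _, p := range expected {
		want[p] = struct{}{}
	}
	var stale []string
	for _, p := range pgPorts {
		if _, ok := want[p]; !ok {
			stale = append(stale, p)
		}
	}
	sort.Strings(stale)
	return stale
}

func main() {
	inGroup := []string{"k8s-node1", "k8s-node2", "k8s-node3"}
	expected := []string{"k8s-node1", "k8s-node3"}
	fmt.Println(stalePorts(inGroup, expected)) // prints [k8s-node2]
}
```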
we saw delete node will remove the LSP from the port group so the chances of leaking are small and shouldn't have meaningful impact
@trozet checked a node deletion: we do remove the k8s* ports when a node goes away, which automatically removes them from the PG. So we are good here.
This commit ensures we truncate names as a precaution
also in CreateOrUpdateACLsOps so that our bases are
covered since not all code snippets call BuildACL
directly
NOTE: The first commit adds a test to showcase that we actually weren't truncating. Test server is not smart enough to scream like real ovsdb server so it won't complain :/ See ovn-org/libovsdb#338.
Second commit does the fix and changes the test case