Support IP configuration for multicard instances #2031

Open
wants to merge 2 commits into
base: main

Conversation

nkvetsinski
Contributor

Issue #, if available:

Description of changes:

This PR follows up on a previous one, where we decided to change the approach and have nodeadm generate .network files that are handled by the systemd-networkd service.

In this PR, nodeadm creates an /etc/systemd/network/80-card${index}.network file for each network card with a non-zero index. It skips card 0 and any non-zero card that has no IP address configured from EC2.
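The skip rule above can be sketched as follows. This is a minimal illustration, not the code in this PR: the `networkCard` type, its fields, and the `multicardNetworkFiles` helper are hypothetical names, and only the `80-card%d.network` file name pattern comes from the PR itself.

```go
package main

import "fmt"

// networkCard is a hypothetical stand-in for the per-card data read from EC2.
type networkCard struct {
	CardIndex int
	IPAddress string // empty when EC2 has not configured an IP for this card
}

// multicardNetworkFiles returns the .network file names that would be
// generated: one per card with a non-zero index and a configured IP.
func multicardNetworkFiles(cards []networkCard) []string {
	var names []string
	for _, card := range cards {
		// Card 0 is handled separately; cards without an IP are skipped.
		if card.CardIndex == 0 || card.IPAddress == "" {
			continue
		}
		names = append(names, fmt.Sprintf("80-card%d.network", card.CardIndex))
	}
	return names
}
```

With cards 0, 1, and 2, where card 2 has no IP address, this sketch yields only `80-card1.network`.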

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Testing Done

I tested pod-to-pod networking and also using the non-zero-indexed interfaces from pods running in the host network namespace. Here are some test results:

[root@ip-192-168-159-103 bin]# networkctl
IDX LINK           TYPE     OPERATIONAL SETUP
  1 lo             loopback carrier     unmanaged
  2 ens32          ether    routable    configured <-- card 0
  3 ens65          ether    routable    configured <-- card 1
  4 ens129         ether    routable    configured <-- card 2
  5 ens161         ether    routable    configured <-- card 3
  6 eni5f7fe46578c ether    degraded    unmanaged
  7 ens33          ether    routable    unmanaged
  8 enidbaf41213d1 ether    degraded    unmanaged
  9 enib0d66203618 ether    degraded    unmanaged
 10 enife15ebf23d2 ether    degraded    unmanaged
 11 eniefd28430286 ether    degraded    unmanaged
 12 eni936a70cb9fa ether    degraded    unmanaged

12 links listed.
[root@ip-192-168-159-103 bin]# networkctl status ens65 --no-pager
● 3: ens65                      
                     Link File: /usr/lib/systemd/network/99-default.link
                  Network File: /etc/systemd/network/80-card1.network
                         State: routable (configured)
                  Online state: online
                          Type: ether
                          Path: pci-0000:20:01.0
                        Driver: ena
                        Vendor: Amazon.com, Inc.
                         Model: Elastic Network Adapter (ENA)
             Alternative Names: enp32s1
              Hardware Address: 0e:ff:c1:01:cb:df
                           MTU: 9001 (min: 128, max: 9216)
                         QDisc: mq
  IPv6 Address Generation Mode: eui64
      Number of Queues (Tx/Rx): 96/96
                       Address: 192.168.133.151 (DHCP4 via 192.168.128.1)
                                fe80::cff:c1ff:fe01:cbdf
                           DNS: 192.168.0.2
                Search Domains: us-west-2.compute.internal
             Activation Policy: up
           Required For Online: yes
               DHCP4 Client ID: IAID:0x5430a1e6/DUID

Oct 30 16:10:56 localhost systemd-networkd[24170]: ens65: Configuring with /usr/lib/systemd/network/80-ec2.network.
Oct 30 16:10:57 localhost systemd-networkd[24170]: ens65: Link UP
Oct 30 16:10:57 localhost systemd-networkd[24170]: ens65: Gained carrier
Oct 30 16:10:57 localhost systemd-networkd[24170]: ens65: DHCPv4 address 192.168.133.151/18, gateway 192.168.128.1 acquired from 192.168.128.1
Oct 30 16:10:57 localhost systemd-networkd[24170]: ens65: Gained IPv6LL
Oct 30 16:10:57 localhost systemd-networkd[24170]: ens65: DHCPv6 address 2600:1f14:2322:ad02:a01:6942:ea2a:ea50/128 (valid for 7min 29s, preferred for 2min 19s)
Oct 30 16:11:18 ip-192-168-159-103.us-west-2.compute.internal systemd-networkd[24170]: ens65: Reconfiguring with /etc/systemd/network/80-card1.network.
Oct 30 16:11:18 ip-192-168-159-103.us-west-2.compute.internal systemd-networkd[24170]: ens65: DHCP lease lost
Oct 30 16:11:18 ip-192-168-159-103.us-west-2.compute.internal systemd-networkd[24170]: ens65: DHCPv6 lease lost
Oct 30 16:11:19 ip-192-168-159-103.us-west-2.compute.internal systemd-networkd[24170]: ens65: DHCPv4 address 192.168.133.151/18, gateway 192.168.128.1 acquired from 192.168.128.1

Route tables:

[root@ip-192-168-159-103 bin]# ip route show table main
default via 192.168.128.1 dev ens32 proto dhcp src 192.168.159.103 metric 1024 
192.168.0.2 via 192.168.128.1 dev ens32 proto dhcp src 192.168.159.103 metric 1024 
192.168.128.0/18 dev ens32 proto kernel scope link src 192.168.159.103 metric 1024 
192.168.128.1 dev ens32 proto dhcp scope link src 192.168.159.103 metric 1024 
192.168.146.27 dev enife15ebf23d2 scope link 
192.168.154.58 dev enib0d66203618 scope link 
192.168.168.137 dev eni936a70cb9fa scope link 
192.168.178.167 dev eni5f7fe46578c scope link 
192.168.187.31 dev eniefd28430286 scope link 
192.168.191.152 dev enidbaf41213d1 scope link 

[root@ip-192-168-159-103 bin]# ip rule show
0:    from all lookup local
512:    from all to 192.168.178.167 lookup main
512:    from all to 192.168.191.152 lookup main
512:    from all to 192.168.154.58 lookup main
512:    from all to 192.168.146.27 lookup main
512:    from all to 192.168.187.31 lookup main
512:    from all to 192.168.168.137 lookup main
1024:    from all fwmark 0x80/0x80 lookup main
32763:    from 192.168.133.151 lookup 1003 proto static
32764:    from 192.168.169.102 lookup 1002 proto static
32765:    from 192.168.159.90 lookup 1001 proto static
32766:    from all lookup main
32767:    from all lookup default

[root@ip-192-168-159-103 bin]# ip route show table 1001
default via 192.168.128.1 dev ens129 proto dhcp src 192.168.159.90 metric 1024 
192.168.0.2 via 192.168.128.1 dev ens129 proto dhcp src 192.168.159.90 metric 1024 
192.168.128.0/18 dev ens129 proto dhcp scope link src 192.168.159.90 metric 1024 
192.168.128.1 dev ens129 proto dhcp scope link src 192.168.159.90 metric 1024
 
[root@ip-192-168-159-103 bin]# ip route show table 1002
default via 192.168.128.1 dev ens161 proto dhcp src 192.168.169.102 metric 1024 
192.168.0.2 via 192.168.128.1 dev ens161 proto dhcp src 192.168.169.102 metric 1024 
192.168.128.0/18 dev ens161 proto dhcp scope link src 192.168.169.102 metric 1024 
192.168.128.1 dev ens161 proto dhcp scope link src 192.168.169.102 metric 1024
 
[root@ip-192-168-159-103 bin]# ip route show table 1003
default via 192.168.128.1 dev ens65 proto dhcp src 192.168.133.151 metric 1024 
192.168.0.2 via 192.168.128.1 dev ens65 proto dhcp src 192.168.133.151 metric 1024 
192.168.128.0/18 dev ens65 proto dhcp scope link src 192.168.133.151 metric 1024 
192.168.128.1 dev ens65 proto dhcp scope link src 192.168.133.151 metric 1024 

See this guide for recommended testing for PRs. Some tests may not apply. Completing tests and providing additional validation steps are not required, but it is recommended and may reduce review time and time to merge.

@cartermckinnon
Member

@M00nF1sh can you take a look at this?

@Pavani-Panakanti
Contributor

LGTM

@@ -89,11 +96,79 @@ func (a *networkingAspect) ensureEKSNetworkConfiguration(cfg *api.NodeConfig) er
return nil
}

func (a *networkingAspect) ensureMulticardNetworkConfiguration(cfg *api.NodeConfig) error {
var networkRestartRequired bool
routeTableId := 1001
Member

@M00nF1sh M00nF1sh Nov 11, 2024

How was this routeTableId chosen to start at 1001 for the multi-card ENIs? Was it chosen to align with our AL2 behavior? It may be better to align with AL2023's default behavior if there is no specific reason.

Contributor Author

I chose it to align with AL2.

Member

@M00nF1sh M00nF1sh Nov 13, 2024

I'm OK with the 1000-series route tables, but I prefer to align with AL2023's default behavior whenever possible. For example, what route tables and routes are set for those ENIs when launching on a normal AL2023 AMI instead of the EKS one? I have no idea why 1000 was chosen for EKS AL2.

Member

I'd rather not have a hacky "1000" route table unless there is a reason for it. My main concern is that this hard-coded 1000 might conflict with other products that use secondary route tables (vpc-cni, for example). The closer it aligns with AL2023's default behavior, the fewer surprises for customers.

Contributor Author

As we discussed offline, I put the routes in separate route tables to align with what we have in AL2. However, the CNI now skips non-zero cards, so we'll go with adding the routes for non-zero cards to the main routing table instead.

networkRestartRequired = true
}

if networkRestartRequired {
Member

Nit: maybe we should combine this "reloadNetworkConfigurations" call with the one in ensureEKSNetworkConfiguration so that we reload only once.

Contributor Author

I can restart the network in the Setup function, after we configure both primary and multicard interfaces.
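A rough sketch of the "reload once" idea, under stated assumptions: the `configureStep` type and `setupNetworking` function are hypothetical and do not reflect nodeadm's actual Setup signature. Each step reports whether it changed any configuration, and the reload runs at most once after all steps finish.

```go
package main

// configureStep is a hypothetical signature: a step writes its network
// configuration and reports whether anything changed on disk.
type configureStep func() (changed bool, err error)

// setupNetworking runs every step in order, then calls reload at most once,
// and only if at least one step reported a change.
func setupNetworking(reload func() error, steps ...configureStep) error {
	reloadRequired := false
	for _, step := range steps {
		changed, err := step()
		if err != nil {
			return err
		}
		reloadRequired = reloadRequired || changed
	}
	if !reloadRequired {
		return nil
	}
	return reload()
}
```

In this shape, the primary-interface and multicard steps would both be passed as `steps`, so neither needs to trigger a reload on its own.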

continue
}

networkInterfaceConfName := fmt.Sprintf("80-card%d.network", card.CardIndex)
Member

@M00nF1sh M00nF1sh Nov 11, 2024

IIRC, it should be possible to have multiple ENIs per network card (with different deviceIndex values). Have we tested this behavior?
It seems this won't work if you have multiple ENIs on a network card.

Contributor Author

@nkvetsinski nkvetsinski Nov 13, 2024

I see your point. The problem we face is just the name of the file, right? I can add the deviceIndex to the name: 80-card%d-%d.network, or we can use the MAC address instead. Do you have a preference?

Member

I'd like to avoid coupling to cardIndex and deviceIndex (it's complicated, see https://github.com/amazonlinux/amazon-ec2-net-utils/blob/e01f53f278eeb13bbdc856da921584944c825286/lib/lib.sh#L344).

I prefer to use 70-<eni-xxxx>.network, where eni-xxxx is the ENI ID. You can get it from ec2Metadata.
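To make the naming suggestion concrete, here is a small sketch of deriving the file name from the ENI ID. The `eniNetworkFileName` helper and its validation are hypothetical, and how the ID is fetched from ec2Metadata is left out. The point is that two ENIs on the same card get distinct files, which `80-card%d.network` cannot guarantee.

```go
package main

import (
	"fmt"
	"strings"
)

// eniNetworkFileName builds a per-ENI file name of the form
// 70-<eni-xxxx>.network. The prefix check is an illustrative guard only.
func eniNetworkFileName(eniID string) (string, error) {
	if !strings.HasPrefix(eniID, "eni-") || len(eniID) == len("eni-") {
		return "", fmt.Errorf("unexpected ENI ID %q", eniID)
	}
	return fmt.Sprintf("70-%s.network", eniID), nil
}
```

For a made-up ID such as `eni-0123456789abcdef0`, the helper returns `70-eni-0123456789abcdef0.network`.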
