Skip to content

Add Support for CAPI IPAM Contract #671

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
schrej opened this issue Jul 14, 2022 · 18 comments · Fixed by #769
Closed

Add Support for CAPI IPAM Contract #671

schrej opened this issue Jul 14, 2022 · 18 comments · Fixed by #769
Assignees
Labels
kind/feature Categorizes issue or PR as related to a new feature. triage/accepted Indicates an issue is ready to be actively worked on.

Comments

@schrej
Copy link
Member

schrej commented Jul 14, 2022

User Story

As a operator I would like to use the new CAPI IPAM contract with metal3 in order to integrate with different IPAM solutions.

Detailed Description

The CAPI IPAM contract is now implemented and released as part of CAPI 1.2.0-rc.0.

In order to support it, we'll need to be able to reference IP Pools implemented by various providers. Looking at the current API, I think we can easily do so by extending FromPool in the DataTemplate with apiGroup and kind parameters. The new fields should either default to the metal3-ip-address-manager types, or just stay empty. A new reference would then look like this:

- key: "something"
  name: "some-pool"
  apiGroup: "ipam.cluster.x-k8s.io"
  kind: "InClusterIPPool"

Since only optional fields are added, the change would be fully backwards compatible and doesn't require a new API version.

In code we can then differenciate based on apiGroup and kind, and either create a metal3-ip-address IPAddressClaim, or a CAPI one (ipam.cluster.x-k8s.io).

Anything else you would like to add:

I'm happy to work on this.

/kind feature

@metal3-io-bot metal3-io-bot added kind/feature Categorizes issue or PR as related to a new feature. needs-triage Indicates an issue lacks a `triage/foo` label and requires one. labels Jul 14, 2022
@schrej
Copy link
Member Author

schrej commented Jul 14, 2022

As I had a hunch that it wasn't that simple when I had a look last time, I double checked, and it isn't. For the more relevant parts, e.g. NetworkDataIPv4, the reference to the pool is just a single string field called ipAddressFromIPPool. So here we would need to add a new field that allows to set a reference. The new field should then probably also allow to reference metal3-ip-address-manager pools, so users don't get confused as much.

I'd propose something simple like fromPool, or poolRef.

@Rozzii
Copy link
Member

Rozzii commented Jul 20, 2022

/triage accepted
Thank you for opening this issue @schrej we would really like to include this feature!

@metal3-io-bot metal3-io-bot added triage/accepted Indicates an issue is ready to be actively worked on. and removed needs-triage Indicates an issue lacks a `triage/foo` label and requires one. labels Jul 20, 2022
@Rozzii
Copy link
Member

Rozzii commented Jul 20, 2022

/assign @schrej

@fmuyassarov
Copy link
Member

As discussed in the last community meeting, I'm +1 for the change.
For the new fields discussed in the description, perhaps defaulting them to metal3 ipam types would help to understand it better which ipam is being used instead of leaving it empty and assuming that metal3 ipam is in use.
/cc @kashifest any thoughts on the addition? I don't see any blockers and @schrej is willing to contribute.

@kashifest
Copy link
Member

/cc @kashifest any thoughts on the addition? I don't see any blockers and @schrej is willing to contribute.

+1 for this. @schrej Thanks a lot for working on this.

@schrej
Copy link
Member Author

schrej commented Aug 17, 2022

The current IPAM contract does not support specifying DNS servers. What was the reason to specify that on the IPPool/IPAddress resources? Do we still need that?

From what I can see, the current implementation would allow to reference a pool to get a Gateway from, even if the server machine has no interface that references the same pool. An IP would still be allocated from the pool, and only the Gateway would be used. Same for DNS.
Should we change the logic to only allow to reference pools for Gateway information which are also referenced as part of an interface or as part of metadata?

@Rozzii
Copy link
Member

Rozzii commented Aug 31, 2022

@furkatgofurov7 @lentzi90

@schrej
Copy link
Member Author

schrej commented Aug 31, 2022

Maybe to make this a little clearer: Practically it only makes sense to allocate an IP address from a pool if it's either referenced from m3dt.Spec.MetaData.IPAddressesFromPool or m3dt.Spec.NetworkData.Networks.{IPv4,IPv6}. If no such reference exists, the address itself remains unused.
But with the current implementation, any reference to an IP pool (interestingly even from something like m3dt.Spec.NetworkData.Networks.IPv4DHCP, which is contradicting the ip-address-manager a bit) will result in an IP getting allocated.

My proposal is to change that behaviour to only allocate addresses when the address will be used. Any other references are only valid when the address is actually used.

A workaround for people abusing the ip-address-manager to only manage DNS or Gateways would be to allocate an address for metadata, and then just not use it.

@lentzi90
Copy link
Member

lentzi90 commented Sep 7, 2022

+1 for changing the behavior to only allocate addresses when the address will be used.

Perhaps a bit premature, but any thoughts on how one would "upgrade" an existing cluster to the CAPI IPAM?

@furkatgofurov7
Copy link
Member

The current IPAM contract does not support specifying DNS servers. What was the reason to specify that on the IPPool/IPAddress resources? Do we still need that?

I had a discussion about this with the person (left the project) who mostly worked on IPAM, there seems to be no specific reason for that and it was mostly to be close as possible to a DHCP answer. And also, we do not think we need that.

From what I can see, the current implementation would allow to reference a pool to get a Gateway from, even if the server machine has no interface that references the same pool. An IP would still be allocated from the pool, and only the Gateway would be used. Same for DNS.
Should we change the logic to only allow to reference pools for Gateway information which are also referenced as part of an interface or as part of metadata?

Makes sense to me, +1

@schrej
Copy link
Member Author

schrej commented Sep 7, 2022

Regarding the behaviour change regarding IP allocation:
What's metal3's backwards compatibility policy, and how will we handle that? Can we just change that with a minor version because "it doesn't make sense"? Do we need a new API version? Does that mean we need to wait for v2?

Rough thoughs regarding upgrades (would require a node roll)

  • import existing Claims and IPs to a CAPI compatible ipam provider (e.g. the in-cluster provider)
  • create a new template that references the pool the IPs were imported in
  • that will trigger a node roll which allocates IPs from the new pool
  • remove the imported claims of old nodes to free up the IPs in the new pool

@furkatgofurov7
Copy link
Member

furkatgofurov7 commented Sep 13, 2022

Regarding the behaviour change regarding IP allocation: What's metal3's backwards compatibility policy, and how will we handle that? Can we just change that with a minor version because "it doesn't make sense"? Do we need a new API version? Does that mean we need to wait for v2?

Rough thoughs regarding upgrades (would require a node roll)

* import existing Claims and IPs to a CAPI compatible ipam provider (e.g. the in-cluster provider)

* create a new template that references the pool the IPs were imported in

* that will trigger a node roll which allocates IPs from the new pool

* remove the imported claims of old nodes to free up the IPs in the new pool

We are planning to discuss these in the upcoming community meeting (14/09/22) a bit more in detail and also get feedback from the community.

@schrej
Copy link
Member Author

schrej commented Sep 14, 2022

Decision from from community meeting on 14. Sep. 2022:

  • we're not going to change existing allocation behaviour related to ip-address-manager
  • migration should be done with a transparent proxy mode if ip-address-manager to ensure smooth operation

@furkatgofurov7
Copy link
Member

Maybe for reference and not to forget about it, here is 4d271bc the feature implementation using preAllocation, where claim name constructed from BMH name + ippool name if feature is enabled, otherwise keeps the current behaviour (dataname + ippool name)

@metal3-io-bot
Copy link
Contributor

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues will close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@metal3-io-bot metal3-io-bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 13, 2022
@schrej
Copy link
Member Author

schrej commented Dec 14, 2022

/remove-lifecycle stale

@metal3-io-bot metal3-io-bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 14, 2022
@metal3-io-bot
Copy link
Contributor

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues will close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@metal3-io-bot metal3-io-bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 14, 2023
@furkatgofurov7
Copy link
Member

/remove-lifecycle stale

@metal3-io-bot metal3-io-bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 15, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/feature Categorizes issue or PR as related to a new feature. triage/accepted Indicates an issue is ready to be actively worked on.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants