Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[IBM] VPN enablement and integration with Azure #334

Merged
merged 63 commits into from
Jun 24, 2024

Conversation

Cohen-J-Omer
Copy link
Collaborator

This PR provides VPN support for the IBM cloud plugin, including necessary adjustments to the controller and Azure plugin, to support BGP disabled VPN connections.
This PR is an extension of PR#179, that was automatically closed due to a project name change.
Future PR will make the necessary adjustments to the IBM and GCP plugins, supporting IBM-GCP connectivity.

Discussion Topics:

  • Integration test file still contains the ibm build tag, as Github's workflow wouldn't be able to run it without IBM account credentials. When the issue is handled, I'll move the IBM-Azure test to multicloud_test.go.
  • I aim to add an IBM RunPingConnectivityCheck(), equivalent to Azure's. This method will require ssh login (a key is produced by auth.go), as I'm unaware of a network watcher equivalent on IBM.

@Cohen-J-Omer Cohen-J-Omer requested a review from a team as a code owner May 22, 2024 17:05
Copy link
Collaborator

@seankimkdy seankimkdy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work @Cohen-J-Omer! I have one major thing and a couple minor things overall.

Major

  • Make the plugin_integration_test.go and plugin_manual_test.go run as part of the CI/CD pipeline. This is a must do.

Minor

  • Better error messages instead of returning err blindly (helps debug)
  • Doc updates

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a lot mentions of address spaces. Can you clarify what these "address spaces" are? Is this the address space of the resource, VPN, etc.? And incorporate that into naming protobuf fields.

Copy link
Collaborator Author

@Cohen-J-Omer Cohen-J-Omer May 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@seankimkdy These could be either subnets of the VPC (Vnet on Azure) containing the VM, or the VM's private IP address, when the former is not available. Do you have a more suitable name in mind?

defer azure.TeardownAzureTesting(azureSubscriptionId, azureResourceGroupName)
}

azureNamespace := "test" + uuid.NewString()[:4]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why randomize the namespace?

Copy link
Collaborator Author

@Cohen-J-Omer Cohen-J-Omer May 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Until PR-290 is merged, resource cleanup doesn't take place for users without resource group privileges. Randomized namespace and VM names allows us to run the tests consecutively.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this is no longer needed?

// NOTE: if user doesn't have resource group privileges set azureResourceGroupName to match an existing resource group
var azureResourceGroupName string = "challenge-1377"

func TestMain(m *testing.M) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this function for?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It allows me to perform the necessary setup before tests execution.

@@ -156,6 +158,8 @@ func (c *CloudClient) translateSecurityGroupRule(
return nil, nil
}

// returns SecurityGroupRule object converted from an abstract rule whose concrete
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// returns SecurityGroupRule object converted from an abstract rule whose concrete

if err != nil {
fmt.Println(err.Error())
}
go func(){
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we move this into go routine?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In internal/cli/startup/startup.go : its already within a go routine

	go func() {
		tagservice.Setup(6379, e.tagPort, e.clearKeys)
	}()

Copy link
Collaborator Author

@Cohen-J-Omer Cohen-J-Omer May 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@seankimkdy As discussed on Discord, tagservice and kvstore service now run on separate go routine. What's your opinion on the matter?

@praveingk
Copy link
Collaborator

@Cohen-J-Omer I tried building the PR in my local workspace, It seems to build. I am not sure why build is failing here.

Copy link
Collaborator

@praveingk praveingk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Cohen-J-Omer Let me know when the review comments are fixed, and I can go over them.

@Cohen-J-Omer
Copy link
Collaborator Author

Cohen-J-Omer commented Jun 4, 2024

@Cohen-J-Omer Let me know when the review comments are fixed, and I can go over them.

I believe I addressed all your comments.

Copy link
Collaborator

@smcclure20 smcclure20 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for all the work! Just a few comments.

@@ -152,6 +152,20 @@ func (s *azurePluginServer) AddPermitListRules(ctx context.Context, req *paragli
return nil, err
}

// Get subnet address space
subnetAddressPrefixes := []string{}
for _, subnet := range resourceVnet.Properties.Subnets {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we can take only the default subnet. For instance, kubernetes clusters get their own subnets. I think this should use all address spaces in the virtual network like on line 203.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed @smcclure20.
I'm currently working on the solution. It incorporates an addition of a method to the cloud plugin interface(paraglider.proto), and consequent changes to the controller(orchestartor) and the 3 cloud plugins.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@smcclure20 i've made the necessary changes to support the request.

utils.Log.Printf("Failed to get subnets for VPN deployments with error: %+v", err)
return nil, err
}
if len(subnets) == 0 {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In Azure, we used a design where the vpn gateway had it own subnet in the vpc so that it could be separated from the other resources cleanly. Might that be worth doing here?

// if rule exists, returns false.
// - routing conflict - occurs when 2 rules share the same zone, destination and priority.
// - rule duplication - if specified rule matches a rule on the following fields: destination, zone and nextHopConnection.
func (c *CloudClient) getAvailablePriority(routeData *vpcv1.CreateVPCRoutingTableRouteOptions) (bool, int64, error) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just so I can understand (since priorities in routing tables is not common), what is this used for? Behavior other than longest-prefix match? Can we just use on priority across the board?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • General info: Routing table routes are designated with a default priority value of 2 upon creation, unless provided with a value in the range of [0, 4].
    When a set of rules share the same prefix, i.e., direct traffic to the same destination, the rule with the lowest priority value will be used.
  • In our project's use-case priorities are meant to distinguish two routes that share the same {zone, destination, priority}, but are using a different VPN connection (tunnel).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would that occur? If all the {zone, destination, priority} is the same, why would the tunnel be different?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@smcclure20, IBM and Azure (might be gcp as well) cloud plugins are implementing active-active VPN gateways, by using 2 tunnels for redundancy/high availability.

}

// add unique rules to security group
for _, ibmRule := range ibmRules {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might cause weird behavior. From the user's perspective, I can create 2 rules with all the same parameters but different names. And I will refer to each rule by its respective name. So here, both names would be pointing to the same underlying rule since they get deduplicated. I don't think we need this step.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @smcclure20, I'm strongly against skipping this step.

  • IBM's security group rules don't have a name field, nor are they taggable, so your example creates identical entities from the cloud's perspective. Your example input (two identical rules with different tags) will not lead to the creation of 2 rules, as the second insertion attempt will be refused.
  • Removing the unique property check, will also remove the idempotent property from AddPermitListRules, cluttering the security group with duplicates.
    To avoid this implication, the user will have to manually look up existing permit rules outside paraglider (because it doesn't support resource fetchers), before invoking this function via glide, shifting responsibilities to users and breaking their natural work flow.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Cohen-J-Omer:
Correct me if I am wrong: If I add two rules with same intent of allowing ping between host A and host B, but, with different names. The second rule will fail with Rule %+v already exists for security group ID %v.

@smcclure20 Is this behavior expected? since two identical rules are not allowed. If the above behavior is not expected, we can have two options :

  1. The second rule name would override the first rule name, and only one rule is added in the security group.
  2. The second rule is added, But Both rule names will point to same rule ID (underlying rule in the security group). We have to handle the deletion and modification correctly.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is preferred to fail in that case since the cloud can't support it and option 2 requires too much work on our end imo.

@Cohen-J-Omer : There is no idempotence property any more though, at least not when defining the identity of the rule as it properties. Rules, from the paraglider perspective, are identified by their names. And if an user tries to create a rule with the same name as one that exists, it should update that rule (or leave it the same if nothing has changed). However, if there are two rules with the same parameters but different names, this should fail for the reasons stated above. So I still don't see a need for deduplication on our end.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, That makes sense.
@smcclure20 I have created #389 and assigned myself based on the discussion, since this is a day 0 code, and not related to this PR :)

defer azure.TeardownAzureTesting(azureSubscriptionId, azureResourceGroupName)
}

azureNamespace := "test" + uuid.NewString()[:4]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this is no longer needed?

@Cohen-J-Omer Cohen-J-Omer requested a review from praveingk June 8, 2024 22:34
praveingk and others added 2 commits June 11, 2024 11:27
Copy link
Collaborator

@praveingk praveingk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall Looks good.. Before merging the PR, we would need to fix the test, either disable the integration test until we have IBM Cloud credentials embedded in gihub actions, or wait for it to be merged.
cc: @smcclure20, @seankimkdy

Copy link
Collaborator

@smcclure20 smcclure20 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Cohen-J-Omer, there are still unit tests failing. I am fine with merging with the integration tests failing for now until cloud credits are resolved, but the unit tests should be passing.

Also, could you disable the integration tests some how (ex. remove the integration tag at the top of the file) so that it won't break the general build when we merge?

"github.com/stretchr/testify/require"
)

var resourceGroupID = ibmCommon.GetIBMResourceGroupID()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't used and is causing the test to fail since the env variable isn't set. The unit tests should be able to run regardless of whether the cloud creds, etc are set up

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @smcclure20,

  • resourceGroupID is used in the cleanup function TestCleanup. IDE might have failed to pick it up due to the build tag.
  • TestCleanup was removed to avoid needlessly invoking it via the make test workflow.
  • Unit tests were decoupled from the PARAGLIDER_IBM_RESOURCE_GROUP_ID environment var.

@smcclure20 smcclure20 self-requested a review June 24, 2024 23:52
@smcclure20 smcclure20 merged commit cf8041d into paraglider-project:main Jun 24, 2024
6 of 8 checks passed
aviweit pushed a commit to aviweit/paraglider that referenced this pull request Jul 9, 2024
* Fix: bug when translating icmp SG rule

* [IBM] Initial VPN gateway support

* CreateRouteBasedVPN now returns VPN if already exists in the namespace&region

* [Fix]  set consistent rules values for hash comparison

* Add remoteAddress fields to interface & adjust implementation

* Added flag requirement for project teardown function

* Create custom IBM IKE/IPsec policies compatible with Azure

* [FIX] reintroduce unit test from PR#165 to handle file renaming conflict

* Adjusted manual test for unit test of PR#165

* Support inter-cloud connectivity via `AddPermitListRules`

* Azure adaptations for IBM VPN

* Controller adaptations for IBM VPN

* Interface adaptations for IBM VPN

* Updated docs and test messages

* Modify route priority assignment to accommodate more routes

* Adjust IBM plugin to interface changes

* Added integration tests

* GetNumVpnConnections support IBM-Azure connection

* update mod and sum files

* remove tag search log message

* sdk test adjusted for sgID instead of name

* Accommodate region reliant IBM's route based VPN gateway

* update variables and docs due to project name change

* - Add permit list and vpn integration tests
- Update tests to post rebase implementation

* Adjust plugin's VPN methods to operate without region info

* update mod and sum files

* run tagging and kv servers on a separate go routine

* update README with testing segment

* - Add integration tests
- Update manual tests

* adjustments to interface updates

* removed outdated notes for integration tests

* remove internal testing file

* style: formatting, linting and minor doc updates

* Update README

* Add instructions to prerequisites.rst

* remove manual_test and transfer test variables to integration test

* acquire api key env var PARAGLIDER_IBM_API_KEY instead of file

* integrate use of INVISINETS_TEST_PERSIST

* update logs and docs

* replace hardcoded segment in `GetZonesOfRegion`with an API call

* [Fix] Icmp fields consistency for all rules

* replace `azureResourceGroupName` with  env var `PARAGLIDER_AZURE_RESOURCE_GROUP`

* docs(IBM server&sdk): add log messages

* docs(sdk) update docs of security group rule translation methods

* docs(sdk): replace cluttering service response with transaction id.

* build(sdk) update import of go-sdk-core module on transit_gateway.go

* style: move logs to error clauses

* style(paraglider.proto): replace camel case with underscores

* style(sdk): rename TerminateParagliderDeployments

* style(Azure plugin): move `bgpStatus` outside of loop

* test(IBM): add VPN integration test

* test(IBM): remove resource group privilege verifications thanks to PR#290 and add a flag for Azure Network Watcher

* docs(IBM): update code documentation

* fix(VPN): support changes to IBM's vpc-go-sdk module

* test(multicloud): move & adjust `TestMulticloudIBMAzure` to `mutlcloud_test.go`

* Fix gomod entries

Signed-off-by: Pravein-Govindan-Kannan <[email protected]>

* feat(interface&plugins): incorporate entire vpc/vnet address space to VPN connectivity

* style(Interface): rename method `GetResourceSubnetsAddress` to `GetNetworkAddressSpaces`

* fix(IBM unit tests): replace permit list rule targets with public addresses to avoid error in `GetPermitListRulePeeringCloudInfo`

* test(IBM): temporarily disable IBM integration tests

* refactor: move`GetIBMResourceGroupID` to `TerminateParagliderDeployments`

* test: remove cleanup function from unit tests

* tests: retrieve xCorrelationId header safely (missing in testing)

---------

Signed-off-by: Pravein-Govindan-Kannan <[email protected]>
Co-authored-by: Pravein-Govindan-Kannan <[email protected]>
Signed-off-by: Avi Weit <[email protected]>

Signed-off-by: Avi Weit <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants