Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore: add sshtunneler worker #19317

Open
wants to merge 3 commits into
base: 3.6
Choose a base branch
from
Open

chore: add sshtunneler worker #19317

wants to merge 3 commits into from

Conversation

kian99
Copy link
Contributor

@kian99 kian99 commented Mar 26, 2025

This PR builds on #19285, that PR should land first. It also requires the changes from #19301.

This PR builds on the internal/sshtunneler package and adds it into a worker so that a singleton object can be passed to dependents like the sshserver worker who need to create SSH tunnels to machines.

This PR has roughly 3 components,

  • The worker in internal/worker/sshtunneler that aims to be very simple, providing dependency injection.
  • Changes to the existing sshtunneler facade to support ControllerAddresses and MachineHostKeys methods to retrieve the controller machine's public addresses, and any machines public host keys, respectively. Additionally, some cleanup to this package.
  • A client for the already existing sshtunneler facade in api/controller/sshtunneler.

Additionally some refactoring that was done along the way,

  • To avoid ambiguity between the number of structs/methods that include the term host-key, I've renamed the sshserver's controller facade methods and structs to explicitly mention the word "virtual".
    The client facade for sshclient has the same issues and while I've renamed the RPC structs, I have not changed the naming of the facade to avoid backwards compatibility issues.

In a follow-up PR I will plug the tunneler worker into its dependants.

I suggest reviewing this PR commit-by-commit.

Checklist

  • Code style: imports ordered, good names, simple structure, etc
  • Comments saying why design decisions were made
  • Go unit tests, with comments saying what you're testing
  • Integration tests, with comments saying what you're testing
  • doc.go added or updated in changed packages

QA steps

  1. make install
  2. juju bootstrap lxd test-tunneler --build-agent
  3. Verify logs don't have any errors starting the tunneler worker.

Links

Jira card: JUJU-7549

@jujubot jujubot added the 4.0 label Mar 26, 2025
@kian99 kian99 changed the base branch from main to 3.6 March 26, 2025 10:56
@kian99 kian99 added 3.6 and removed 4.0 labels Mar 26, 2025
@kian99 kian99 marked this pull request as draft March 26, 2025 10:57
@kian99 kian99 force-pushed the tunneler-worker branch 2 times, most recently from e3d4300 to 88fe282 Compare March 26, 2025 13:15
@kian99 kian99 changed the title Tunneler worker chore: add sshTunneler worker Mar 26, 2025
@kian99 kian99 force-pushed the tunneler-worker branch 3 times, most recently from 457ebe9 to 3878b49 Compare March 28, 2025 11:32
@kian99 kian99 mentioned this pull request Mar 28, 2025
5 tasks
@kian99 kian99 changed the title chore: add sshTunneler worker chore: add sshtunneler worker Mar 28, 2025
@kian99 kian99 force-pushed the tunneler-worker branch 4 times, most recently from a2541af to 592009f Compare March 31, 2025 14:38
@kian99 kian99 marked this pull request as ready for review March 31, 2025 15:01
@kian99 kian99 force-pushed the tunneler-worker branch 2 times, most recently from aeb1735 to c7e60fd Compare April 1, 2025 10:18
@kian99 kian99 force-pushed the tunneler-worker branch 3 times, most recently from 8f529b1 to f7fb948 Compare April 8, 2025 16:38
@kian99 kian99 requested a review from hpidcock April 8, 2025 16:39
@@ -45,3 +49,18 @@ func (f *Facade) RemoveSSHConnRequest(arg params.SSHConnRequestRemoveArg) (param
}
return params.ErrorResult{}, nil
}

// ControllerAddresses returns the specified machine's public addresses.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a conceptual question: do we want machine agents to ssh to public addresses or private ones? idk which is guaranteed to be reachable from the unit

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good question. Using ScopeMatchPublic returns the public addresses as the first priority, then ScopeCloudLocal then ScopeFanLocal, ScopeUnknown. Arguably, ScopeCloudLocal might make more sense if that can reduce latency between the controller and the units. I'll leave this comment unresolved for others to chime in.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wallyworld @hpidcock any thoughts on this?

@kian99 kian99 force-pushed the tunneler-worker branch from f7fb948 to 041a970 Compare April 9, 2025 10:03
@@ -253,7 +258,12 @@ func (tt *Tracker) wait(ctx context.Context, recv chan (net.Conn), privateKey go
case conn := <-recv:
// We now have ownership of the connection, so we should close it
// if the SSH dial fails.
sshClient, err := tt.dialer.Dial(conn, defaultUser, privateKey)
//
// Safely ignore the host key since we are connecting through an
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Avoid asserting something you can't prove. We know the inbound connection came from the machine agent, not that the port the machine agent connected us to is the sshd of that machine.

I suggest you change the InsertSSHConnRequest facade method to return the expected host keys for the machine we are expecting to connect.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i agree, but i wouldn't cram it into the InsertSSHConnRequest method response. Better to have a separate method, no?

Then again, we control the machine agent and if somebody compromised it, we have bigger problems, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is one fundamental issue I see here, around how we get the unit's host key. We could have the unit send it to us when it establishes the reverse-tunnel. Then we establish the SSH session using the keys the unit gave us, but that seems pointless.

Normally a client would get a server's host key through some side channel but unless the controller already knows the host keys for machines, we would be relying on the machine telling us its host key right.

@@ -45,3 +49,18 @@ func (f *Facade) RemoveSSHConnRequest(arg params.SSHConnRequestRemoveArg) (param
}
return params.ErrorResult{}, nil
}

// ControllerAddresses returns the specified machine's public addresses.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wallyworld @hpidcock any thoughts on this?

@@ -253,7 +258,12 @@ func (tt *Tracker) wait(ctx context.Context, recv chan (net.Conn), privateKey go
case conn := <-recv:
// We now have ownership of the connection, so we should close it
// if the SSH dial fails.
sshClient, err := tt.dialer.Dial(conn, defaultUser, privateKey)
//
// Safely ignore the host key since we are connecting through an
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i agree, but i wouldn't cram it into the InsertSSHConnRequest method response. Better to have a separate method, no?

Then again, we control the machine agent and if somebody compromised it, we have bigger problems, right?

Create an tunnel tracker object within a sshtunneler worker.
kian99 added 2 commits April 11, 2025 15:10
When dialling the machine, we verify its host key by using the keys known to the Juju controller.

To avoid ambiguity between the number of structs/methods that include the term host-key, I've renamed the sshserver controller facade methods and structs to explicitly mention the word "virtual".

The client facade for `sshclient` has the same issues but I have not changed the naming of the facade there to avoid backwards compatability issues.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants