feat: assign units to machines #19459

Open
wants to merge 29 commits into
base: main
from


SimonRichardson
Member

@SimonRichardson SimonRichardson commented Apr 8, 2025

When creating an application or adding units to an existing application, we now use a different strategy from the legacy state implementation. The charm UUID for a unit is statically known upfront, so the unit can carry a non-nullable foreign key that points to the charm UUID. We no longer treat the implicit setting of a charm UUID as the signal that a unit has come online; if required, we can use the presence table for that. Another advantage of this changeset is that we can remove the unitassigner worker once machines have been fully implemented.

As part of the changeset, this removes the stub service implementation for assigning units, which clears the technical debt of scaffolding units and machines.

Note

We don't yet correctly set the status or status-history of a machine when we create it. We'll need to record that in a future PR.

Warning

The model is not usable once it has reached the migration destination, as the machine sequence is broken. Fixing this requires machines to be correctly migrated, which might end up being a lot of work.


This pull request includes significant changes to the Juju codebase, primarily focusing on the removal of the StubService and associated code cleanup. The most important changes include the removal of the StubService from various files, updates to the AssignUnits method, and cleanup of related tests.

Removal of StubService:

Code Cleanup:

Updates to AssignUnits Method:

These changes streamline the codebase by removing deprecated services and simplifying unit assignment logic.

QA steps

Important

We can place directly onto a machine with --to 0/lxd/0, but it doesn't quite work: the sequencing isn't right because machines aren't yet wired correctly through the domain services.

Note

Removing a parent and child machine created with juju deploy ubuntu --to lxd (the parent machine is created first, then the child machine is placed in the parent) will fail with a foreign key constraint error. This is because the machine domain does not yet have the concept of the machine_parent table. This will be fixed once we have the removal jobs for machines.

$ juju bootstrap lxd test
$ juju add-model default
$ juju deploy ubuntu
$ juju deploy ubuntu ubuntux
$ juju deploy ubuntu ubuntux --to lxd

View the database to see that things are correctly sequenced.

$ juju ssh -m controller 0
$ sudo /var/lib/juju/tools/machine-0/jujud db-repl --machine-id=0
repl (controller)> .switch model-default
repl (model-default)> SELECT * FROM sequence
namespace               value
machine_sequence        1
container_sequence_1    0

repl (model-default)> SELECT * FROM machine
uuid                                    name    net_node_uuid                           life_id base    nonce   password_hash_algorithm_id      password_hash   force_destroyed agent_started_at        hostname        is_controller     keep_instance
7fe5102f-a561-4c38-85b3-dd76e29ccff0    0       462ef0ef-d24f-414e-8d20-35f51be657c3    0       <nil>   <nil>   <nil>                           <nil>           false           <nil>                   <nil>           false    false
296e1dd2-b8df-4c7d-83e0-d2446bb78316    1       2284571a-df99-4760-8cbb-24f7f9fe94be    0       <nil>   <nil>   <nil>                           <nil>           false           <nil>                   <nil>           false    false
1d9a438e-3221-4746-8abd-c7dd60fed92e    1/lxd/0 bbabd5a3-9d2e-4d01-8da5-c0a0b2a00ce6    0       <nil>   <nil>   <nil>                           <nil>           false           <nil>                   <nil>           false    false

repl (model-default)> SELECT * FROM unit
uuid                                    name            life_id application_uuid                        net_node_uuid                           charm_uuid                              password_hash_algorithm_id      password_hash
74956059-4805-4034-806d-4679df73d67e    ubuntu/0        0       5e1ac67a-6939-4ba7-87e7-e5e5e209766e    462ef0ef-d24f-414e-8d20-35f51be657c3    49c19aaa-2356-48d1-8048-00293be0ecb5    0                               wLEAOzdZ+uQhJxHjNyVzckqI
854b54a6-b084-409f-8d22-62bc390d6e3e    ubuntuz/0       0       18b942fe-4793-4f1c-83bf-d06830187496    bbabd5a3-9d2e-4d01-8da5-c0a0b2a00ce6    49c19aaa-2356-48d1-8048-00293be0ecb5    <nil>                           <nil>

repl (model-default)> SELECT * FROM net_node
uuid
462ef0ef-d24f-414e-8d20-35f51be657c3
2284571a-df99-4760-8cbb-24f7f9fe94be
bbabd5a3-9d2e-4d01-8da5-c0a0b2a00ce6

repl (model-default)> SELECT * FROM unit_parent
ERROR failed to execute query: no such table: unit_parent
repl (model-default)> SELECT * FROM machine_parent
machine_uuid                            parent_uuid
1d9a438e-3221-4746-8abd-c7dd60fed92e    296e1dd2-b8df-4c7d-83e0-d2446bb78316
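
The sequence and machine rows above suggest one top-level namespace for machines and one namespace per parent for containers, with each row holding the last value handed out. The following is a minimal in-memory sketch of that pattern, not the Juju sequence domain: the `Sequences` type, `Next`, `NewMachine`, and `NewContainer` are hypothetical names, and the "value is the last number issued" behaviour is an assumption inferred from the output above.

```go
package main

import "fmt"

// Sequences is a hypothetical in-memory stand-in for the sequence table.
// It maps a namespace to the last value handed out in that namespace.
type Sequences map[string]uint64

// Next returns the next value for the namespace. The first call returns 0
// (assumption: the stored value is the last number issued, as the rows
// above appear to show).
func (s Sequences) Next(namespace string) uint64 {
	last, ok := s[namespace]
	if !ok {
		s[namespace] = 0
		return 0
	}
	s[namespace] = last + 1
	return last + 1
}

// machineNamespace matches the namespace seen in the sequence table.
const machineNamespace = "machine_sequence"

// containerNamespace builds the per-parent namespace,
// for example container_sequence_1 for parent machine "1".
func containerNamespace(parent string) string {
	return "container_sequence_" + parent
}

// NewMachine allocates a top-level machine name such as "0" or "1".
func NewMachine(s Sequences) string {
	return fmt.Sprintf("%d", s.Next(machineNamespace))
}

// NewContainer allocates a child machine name such as "1/lxd/0".
func NewContainer(s Sequences, parent, containerType string) string {
	n := s.Next(containerNamespace(parent))
	return fmt.Sprintf("%s/%s/%d", parent, containerType, n)
}
```

Calling `NewMachine` twice and then `NewContainer(s, "1", "lxd")` on an empty `Sequences` yields the names `0`, `1`, and `1/lxd/0`, and leaves `machine_sequence` at 1 and `container_sequence_1` at 0, which lines up with the tables above. Because values are never handed out twice, a deleted machine would leave a gap rather than have its number reused.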

Adding units:

$ juju add-model other
$ juju deploy ubuntu -n 2
$ juju add-unit ubuntu -n 2
repl (controller)> .switch model-other
repl (model-other)> SELECT * FROM sequence;
namespace               value
machine_sequence        3

Assigning to a machine

Note

This doesn't work completely, as we're not yet watching for the unit and machine correctly, but the database rows for the unit are correct.

$ juju add-model test
$ juju deploy ubuntu
$ juju deploy ubuntu ubuntuw --to 0

The net_node_uuid is correct for the machine:

repl (model-test)> SELECT * FROM unit
uuid                                    name            life_id application_uuid                        net_node_uuid                           charm_uuid                              password_hash_algorithm_id      password_hash
a7748731-82e5-436b-8ff9-acfe27cd3dda    ubuntu/0        0       bcbae87e-7359-4832-89ab-ff77baf08fa5    d68f4253-1fa8-4e6e-89d9-b629482a7d00    24881ab4-6567-40c5-8cca-3ed1a741f30b    <nil>                           <nil>

repl (model-test)> SELECT * FROM unit
uuid                                    name            life_id application_uuid                        net_node_uuid                           charm_uuid                              password_hash_algorithm_id      password_hash
a7748731-82e5-436b-8ff9-acfe27cd3dda    ubuntu/0        0       bcbae87e-7359-4832-89ab-ff77baf08fa5    d68f4253-1fa8-4e6e-89d9-b629482a7d00    24881ab4-6567-40c5-8cca-3ed1a741f30b    0                               y0eAW3FxAP7vEq6S9b9bNIbF
99737697-6965-42fd-8291-6d5fe62deb42    ubuntuw/0       0       384b8e9b-5861-4dea-884f-90150709e004    d68f4253-1fa8-4e6e-89d9-b629482a7d00    24881ab4-6567-40c5-8cca-3ed1a741f30b    <nil>                           <nil>

repl (model-test)> SELECT * FROM machine
uuid                                    name    net_node_uuid                           life_id base    nonce   password_hash_algorithm_id      password_hash   force_destroyed agent_started_at        hostname        is_controller   keep_instance
d31a2b07-52f6-4c1b-88e8-ec4fb887169a    0       d68f4253-1fa8-4e6e-89d9-b629482a7d00    0       <nil>   <nil>   <nil>                           <nil>           false           <nil>                   <nil>           false           false

repl (model-test)> SELECT * FROM net_node
uuid
d68f4253-1fa8-4e6e-89d9-b629482a7d00
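
In the rows above, both units carry the same net_node_uuid as machine 0, and the net_node table has a single row. A minimal sketch of that relationship follows. It is an in-memory illustration only: `Model`, `AddMachine`, and `AssignUnit` are hypothetical names and do not reflect the application or machine domain services.

```go
package main

import "fmt"

// Model is a hypothetical in-memory stand-in for the machine and unit tables.
type Model struct {
	machineNetNode map[string]string // machine name -> net_node_uuid
	unitNetNode    map[string]string // unit name -> net_node_uuid
	newUUID        func() string     // injected so the sketch stays deterministic
}

// NewModel returns an empty model that uses newUUID to mint net node UUIDs.
func NewModel(newUUID func() string) *Model {
	return &Model{
		machineNetNode: make(map[string]string),
		unitNetNode:    make(map[string]string),
		newUUID:        newUUID,
	}
}

// AddMachine records a machine together with a freshly created net node.
func (m *Model) AddMachine(name string) string {
	node := m.newUUID()
	m.machineNetNode[name] = node
	return node
}

// AssignUnit places the unit on an existing machine. The unit reuses the
// machine's net node instead of creating a new one.
func (m *Model) AssignUnit(unit, machine string) (string, error) {
	node, ok := m.machineNetNode[machine]
	if !ok {
		return "", fmt.Errorf("machine %q not found", machine)
	}
	m.unitNetNode[unit] = node
	return node, nil
}
```

With this shape, placing `ubuntu/0` and `ubuntuw/0` on machine `0` gives both units the machine's net node, so only one net node exists for the three rows.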

Links

Jira card: JUJU-7699

@jujubot jujubot added the 4.0 label Apr 8, 2025
@SimonRichardson SimonRichardson force-pushed the unit-assignment branch 5 times, most recently from 8c08938 to 7e0b902 Compare April 10, 2025 11:26
@SimonRichardson SimonRichardson marked this pull request as ready for review April 10, 2025 11:50
Member

@nvinuesa nvinuesa left a comment

First pass, looking good. Few questions though.

Comment on lines -48 to -49
// StubService is the interface used to interact with the stub service. A special
// service which collects temporary methods required to wire together together
Member

chef's kiss

Comment on lines +204 to +214
// Try and get the placement for this unit. If it doesn't exist,
// then the default behaviour is to create a new machine for the
// unit.
var placement *instance.Placement
if i < len(spec.Placement) {
	var err error
	placement, err = instance.ParsePlacement(spec.Placement[i])
	if err != nil {
		return params.ControllersChanges{}, errors.Annotate(err, "parsing placement")
	}
}
Member

I know we don't want to improve existing code (outside the domains), but since this is being added now, why not put it directly into one of the service methods in the app domain?

Member Author

I'm actively going to remove instance placement in favour of the domain placement. For now, we need to keep it there, so we go from string on the params, to the instance placement, to the domain placement. Future patches will remove it.
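
To illustrate the string-to-placement step described here, the following is a simplified sketch of turning a `--to` directive into a typed value. It is not `instance.ParsePlacement` and not the domain placement type: `Placement`, `PlacementKind`, `ParseDirective`, and `validateMachineName` are hypothetical, and the sketch assumes `lxd` is the only container type. It also checks that the container type sits in the middle of a parent/child machine name such as `0/lxd/0`.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// PlacementKind is a hypothetical classification of a placement directive.
type PlacementKind int

const (
	KindNewMachine      PlacementKind = iota // no directive given
	KindExistingMachine                      // "0" or "0/lxd/0"
	KindNewContainer                         // "lxd" or "lxd:0"
)

// Placement is a hypothetical parsed form of a --to directive.
type Placement struct {
	Kind          PlacementKind
	ContainerType string // set for KindNewContainer
	Machine       string // target machine name, if any
}

// containerTypes lists the scopes this sketch accepts (assumption: lxd only).
var containerTypes = map[string]bool{"lxd": true}

// ParseDirective converts a raw directive string into a Placement.
func ParseDirective(directive string) (Placement, error) {
	if directive == "" {
		return Placement{Kind: KindNewMachine}, nil
	}
	if scope, target, ok := strings.Cut(directive, ":"); ok {
		if !containerTypes[scope] {
			return Placement{}, fmt.Errorf("unsupported scope %q", scope)
		}
		if err := validateMachineName(target); err != nil {
			return Placement{}, err
		}
		return Placement{Kind: KindNewContainer, ContainerType: scope, Machine: target}, nil
	}
	if containerTypes[directive] {
		return Placement{Kind: KindNewContainer, ContainerType: directive}, nil
	}
	if err := validateMachineName(directive); err != nil {
		return Placement{}, err
	}
	return Placement{Kind: KindExistingMachine, Machine: directive}, nil
}

// validateMachineName accepts names such as "0" or "0/lxd/0": numbers at
// even positions and a known container type between them.
func validateMachineName(name string) error {
	parts := strings.Split(name, "/")
	if len(parts)%2 == 0 {
		return fmt.Errorf("invalid machine name %q", name)
	}
	for i, part := range parts {
		if i%2 == 0 {
			if _, err := strconv.ParseUint(part, 10, 64); err != nil {
				return fmt.Errorf("invalid machine number %q in %q", part, name)
			}
		} else if !containerTypes[part] {
			return fmt.Errorf("unsupported container type %q in %q", part, name)
		}
	}
	return nil
}
```

Validating the scope while parsing means a bad directive such as `0/kvm/0` is rejected before any query is issued, which is the same motivation given for moving the scope check up to the service level.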

@SimonRichardson SimonRichardson requested a review from tlm as a code owner April 10, 2025 14:57
@SimonRichardson SimonRichardson force-pushed the unit-assignment branch 2 times, most recently from 0351a49 to 895979c Compare April 11, 2025 08:03
Member

@manadart manadart left a comment

Got a few suggestions on this one.

Lock the placement parsing logic into tests
The following adds tests to ensure that we're correctly placing
the net node machines with the right sequence number.
This adds machine placement container tests. This does checking of
the gaps after deletions. Another negative test round needs to be
also done i.e. missing machines, etc...
When we have a parent-child relationship, the scope is put in the
middle. Except that we don't correctly verify the scope until we
look at the state level. With this change, we do it at the service
level to prevent additional queries.
This was a bit of an oversight: at the service layer, placement is
per unit, not per request. The state was already modelled
correctly.
Some code was just not exercised as it was in an impossible state.
We can use the bootstrap worker to create the machine for us, no
need for a setup phase.
If we don't pass the right container type (lxd) we can't get the
machine name correct.

Lastly, I fix the sequence name to build a pattern for managing
namespaced sequences. This makes it easier to get right and then there
is always a pattern that people follow.
There was also an extra space in the machine sql which was not
required.
If an application doesn't have a lease holder, don't fail a
migration. One will be selected once imported.
Now that we've wired up unit assignment to machines, we actually
need to create a machine and ensure it's correctly wired up.
We're not bootstrapping a machine in an ad hoc fashion anymore.
Now that we correctly wired up units with machines, ensure that
we correctly associate with the machine, or at least create a machine
if it's missing.
To make the model usable after model migration, we need to
transfer the sequences across. We should be able to preserve the
sequencing thereafter.
The method was becoming too large, so farm some of the parsing out
to a function.
When importing, we want to be very explicit that you're
importing a machine for IAAS. We don't want to mention placement,
as it gives the wrong impression.

Future work will require bifurcating the import application for
IAAS and CAAS, so we can drop the if statements. That's out of
scope for this PR.
We're going to have a deployment domain in the future, move the
placement to the new domain.
Add a comment to indicate why we only want to store provider-defined
scopes. Also suggest how we would implement scripts ;-)
Fix the extra cast; we're already dealing with uint64.
The machine and application sequence names have official constants
in 3.6, so we should respect those when we import. We might need
to add a parsing step on the sequence domain to convert one
sequence to another. 4.0 -> 4.0 is fine; the problem is 3.6 -> 4.0.
Member

@manadart manadart left a comment

I think we still need some changes.

@SimonRichardson
Member Author

It's become clear that the import application logic needs to change. We can bifurcate when we import the model, no need to carry along types that are only for CAAS or only for IAAS. I'll set some time in the future to do that.

Member

@nvinuesa nvinuesa left a comment

Thanks for the updates. Approving, contingent on Joe's changes being applied.

@SimonRichardson
Member Author

/build

4 participants