
Ironic deployment guide documentation #1010

Closed
232 changes: 232 additions & 0 deletions doc/source/configuration/ironic.rst
Contributor

You'd need to reference this doc from configuration/index.rst in order for it to be included in the docs.

@@ -0,0 +1,232 @@
======
Contributor

Thanks for sharing the information you have @assumptionsandg. I think we need to consider what the scope of this doc is. We have some general notes on ironic deployment in https://docs.google.com/document/d/1H3hGYzJzieX8w7phxS3xmD6i6OEZgiHmBjvZ7EJEbU4/edit that are possibly more detailed than this but not public. We might also consider whether a generic description of ironic config might be better placed in kayobe or kolla-ansible upstream docs.

Docs that might belong in SKC are generally either

  1. our opinionated way of doing things, that wouldn't belong upstream
  2. dependent on additional config in SKC

What was the original request for these docs about? Was it to cover the use of overcloud ironic to manage hypervisors? That could be considered under 1.

Contributor Author

I think the purpose was originally managing hypervisors, but I wasn't sure whether to include all of the RAL documentation I had up to this point. I do still need to add the section about managing hypervisors after deployment.

Maybe it would make sense to split the deployment part out and propose that upstream? The openstack-config part might be too opinionated for upstream though.

Ironic
======

Ironic networking
=================

Ironic will require the workload provisioning and cleaning networks to be
configured in ``networks.yml``.

The workload provisioning network will require an allocation pool for
Member

It would helpful to clarify what these pools are used for. I.e. note that one pool is for statically assigned IPs that will be used by OpenStack Ironic services, and the other pools (Neutron ones) are for dynamically assigning IPs to hosts being provisioned / inspected. I like the idea of suggesting a /16 to start with, to leave room for growth.

Ironic inspection and another for Neutron. The static allocation pool is used
for IPs assigned to OpenStack services, while the inspection and Neutron pools
are used to dynamically assign IPs to hosts being inspected or provisioned. An
example configuration is shown below.

.. code-block:: yaml

# Workload provisioning network IP information.
provision_wl_net_cidr: "172.0.0.0/16"
provision_wl_net_allocation_pool_start: "172.0.0.4"
provision_wl_net_allocation_pool_end: "172.0.0.6"
provision_wl_net_inspection_allocation_pool_start: "172.0.1.4"
provision_wl_net_inspection_allocation_pool_end: "172.0.1.250"
provision_wl_net_neutron_allocation_pool_start: "172.0.2.4"
provision_wl_net_neutron_allocation_pool_end: "172.0.2.250"
provision_wl_net_neutron_gateway: "172.0.1.1"

The cleaning network will also require a Neutron allocation pool.

.. code-block:: yaml

# Cleaning network IP information.
cleaning_net_cidr: "172.1.0.0/16"
cleaning_net_allocation_pool_start: "172.1.0.4"
cleaning_net_allocation_pool_end: "172.1.0.6"
cleaning_net_neutron_allocation_pool_start: "172.1.2.4"
cleaning_net_neutron_allocation_pool_end: "172.1.2.250"
cleaning_net_neutron_gateway: "172.1.0.1"

OpenStack Config
================

Overcloud Ironic will require a router to exist between the internal API
Member

I haven't seen a site use a neutron router for this yet. It's an interesting idea. Is there a specific reason for using one? Normally, at least in the past, we have avoided a dedicated router: https://docs.google.com/document/d/1H3hGYzJzieX8w7phxS3xmD6i6OEZgiHmBjvZ7EJEbU4/edit#heading=h.gyl3n9h885m

It would be helpful to start with why a router is required. Specifically, because the TFTP/HTTP server hosting the deploy images is bound exclusively to internal API network, rather than all possible networks the node might PXE boot on. When baremetal nodes PXE boot, they are given the location of the TFTP/HTTP server to fetch the images from. Since the cleaning/provisioning/inspection network is generally not the same as the internal API network, routing is required.

network and the workload provisioning network. This is required because the
TFTP/HTTP server hosting the deploy images is bound to the internal API
network, so nodes PXE booting on the provisioning or cleaning networks must be
able to route to it. One way to achieve this is by using
`OpenStack Config <https://github.com/stackhpc/openstack-config>`__
to define the internal API network in Neutron and set up a router with
Member
@dougszumski Mar 25, 2024

Have you tested this configuration? I imagine you still need policy based routing here to prevent traffic bypassing the router? Eg. Node on cleaning network tries HTTP fetch of IPA image, request is routed to a controller, controller replies not via the router, but directly to the node via the cleaning interface. The node sees that the traffic came back from a different MAC, and ignores it.

a gateway.

It is not necessary to define the provision and cleaning networks in this
configuration as they will be generated during

.. code-block:: console

kayobe overcloud post configure

The openstack config file could resemble the network, subnet and router
configuration shown below:

.. code-block:: yaml

networks:
  - "{{ openstack_network_internal }}"
Member

s/intenral/internal

openstack_network_internal:
  name: "internal-net"
  project: "admin"
  provider_network_type: "vlan"
  provider_physical_network: "physnet1"
  provider_segmentation_id: 458
Member

nit: reference VLAN variable

  shared: false
  external: true

subnets:
  - "{{ openstack_subnet_internal }}"

openstack_subnet_internal:
  name: "internal-net"
  project: "admin"
  cidr: "10.10.3.0/24"
  enable_dhcp: true
  allocation_pool_start: "10.10.3.3"
  allocation_pool_end: "10.10.3.3"

openstack_routers:
  - "{{ openstack_router_ironic }}"

openstack_router_ironic:
  name: ironic
  project: admin
  interfaces:
    - net: "provision-net"
      subnet: "provision-net"
      portip: "172.0.1.1"
    - net: "cleaning-net"
      subnet: "cleaning-net"
      portip: "172.1.0.1"
  network: internal-net

To provision baremetal nodes in Nova you will also need to define a flavor
specific to that type of baremetal host. You will need to replace the custom
Member

s/speciifc/specific

resource ``resources:CUSTOM_<YOUR_BAREMETAL_RESOURCE_CLASS>`` placeholder with
the resource class of your baremetal hosts, upper-cased and with any
punctuation replaced by underscores (for example, a resource class of
``baremetal-a`` becomes ``CUSTOM_BAREMETAL_A``). You will also need the
resource class later when configuring the baremetal-compute inventory.

.. code-block:: yaml

openstack_flavors:
  - "{{ openstack_flavor_baremetal_A }}"

# Bare metal compute node.
openstack_flavor_baremetal_A:
  name: "baremetal-A"
  ram: 1048576
  disk: 480
  vcpus: 256
  extra_specs:
    "resources:CUSTOM_<YOUR_BAREMETAL_RESOURCE_CLASS>": 1
    "resources:VCPU": 0
    "resources:MEMORY_MB": 0
    "resources:DISK_GB": 0

Enabling conntrack
==================

UEFI booting requires conntrack_helper to be configured on the Ironic Neutron
router, because TFTP traffic uses UDP and is otherwise dropped. You will
Member

This is another complexity that would go away if we didn't use the Neutron router. Assuming conntrack is already enabled on the controller.

need to define some extension drivers in ``neutron.yml`` to ensure conntrack
is enabled in the Neutron server.

.. code-block:: yaml

kolla_neutron_ml2_extension_drivers:
  - port_security
  - conntrack_helper
  - dns_domain_ports

The Neutron L3 agent also requires the conntrack_helper extension to be set in
``kolla/config/neutron/l3_agent.ini``.

.. code-block:: ini

[agent]
extensions = conntrack_helper
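
Once these settings are in place, Neutron needs to be redeployed to pick them
up. One way to do this (the tag selection is an assumption; adjust to your
workflow) is:

.. code-block:: console

kayobe overcloud service reconfigure --kolla-tags neutron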

It is also required to load the conntrack kernel modules ``nf_nat_tftp``,
``nf_conntrack`` and ``nf_conntrack_tftp`` on network nodes. You can load
these modules using modprobe or persist them under ``/etc/modules-load.d/``.
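
For example, one way to do both (the ``ironic-tftp.conf`` file name is an
arbitrary choice):

.. code-block:: console

# Load the modules immediately on each network node.
modprobe -a nf_nat_tftp nf_conntrack nf_conntrack_tftp

# Persist the modules across reboots.
cat > /etc/modules-load.d/ironic-tftp.conf << EOF
nf_nat_tftp
nf_conntrack
nf_conntrack_tftp
EOF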

The Ironic neutron router will also need to be configured to use
conntrack_helper.

.. code-block:: json

"conntrack_helpers": {
"protocol": "udp",
"port": 69,
"helper": "tftp"
}

Currently it is not possible to add this helper via the OpenStack CLI. To add
it to the Ironic router you will need to make a request to the Neutron API
directly, for example via cURL.

.. code-block:: console

curl -g -i -X POST \
http://<internal_api_vip>:9696/v2.0/routers/<ironic_router_uuid>/conntrack_helpers \
-H "Accept: application/json" \
-H "User-Agent: openstacksdk/2.0.0 keystoneauth1/5.4.0 python-requests/2.31.0 CPython/3.9.18" \
-H "X-Auth-Token: <issued_token>" \
-d '{ "conntrack_helper": {"helper": "tftp", "protocol": "udp", "port": 69 } }'

TFTP server
===========

By default the Ironic TFTP server (the ``ironic_pxe`` container) ships the UEFI
boot file as ``ipxe-x86_64.efi`` instead of ``ipxe.efi``, meaning no boot file
will be sent during the PXE boot process in the default configuration.

As of now this is solved by a workaround: renaming the boot file in the
``ironic_pxe`` container manually.

.. code-block:: console

docker exec ironic_pxe mv /tftpboot/ipxe-x86_64.efi /tftpboot/ipxe.efi

Baremetal inventory
===================

To begin enrolling nodes you will need to define them in the hosts file.

.. code-block:: ini

[r1]
hv1 ipmi_address=10.1.28.16
hv2 ipmi_address=10.1.28.17

[r1:vars]
ironic_driver=redfish
resource_class=<your_resource_class>
redfish_system_id=<your_redfish_system_id>
redfish_verify_ca=<your_redfish_verify_ca>
redfish_username=<your_redfish_username>
redfish_password=<your_redfish_password>

[baremetal-compute:children]
r1

Baremetal nodes are typically grouped by rack. In the rack 1 example above,
the BMC address is defined per node, while Redfish information such as the
username, password and system ID is defined for the rack as a whole.

You can add more racks to the deployment by replicating the rack 1 example and
adding that as an entry to the baremetal-compute group.
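
For example, a second rack could be added as follows (the hostnames and BMC
addresses here are only illustrative):

.. code-block:: ini

[r2]
hv3 ipmi_address=10.1.28.18
hv4 ipmi_address=10.1.28.19

[r2:vars]
ironic_driver=redfish
resource_class=<your_resource_class>
redfish_system_id=<your_redfish_system_id>
redfish_verify_ca=<your_redfish_verify_ca>
redfish_username=<your_redfish_username>
redfish_password=<your_redfish_password>

[baremetal-compute:children]
r1
r2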

Node enrollment
===============

When nodes are defined in the inventory you can begin enrolling them by
invoking the Kayobe command below. Note that only the Redfish driver is
supported by this command.

.. code-block:: console

kayobe baremetal compute register

Following registration, the baremetal nodes can be inspected and made
available for provisioning by Nova via the Kayobe commands

.. code-block:: console

kayobe baremetal compute inspect
kayobe baremetal compute provide
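
Once these commands have completed, one way to check that the nodes have
reached the ``available`` state and have the expected resource class is via
the baremetal CLI:

.. code-block:: console

openstack baremetal node list --fields name provision_state resource_class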