Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Images Improvements [Breaking Changes] #461

Merged
merged 18 commits into from
Apr 30, 2019

Conversation

BenTheElder
Copy link
Member

@BenTheElder BenTheElder commented Apr 29, 2019

Overall this PR will make cluster boot faster and lighter, however it is a breaking change to the images versus the kind binary, newer images will require a newer kind binary and a newer kind binary will require newer images.

Base Image Changes:

  • switch to Ubuntu 19.04 base (newer dependencies)
  • switch from dockerd to containerd (lighter, faster, easier to debug)
  • remove unnecessary packages
  • add crictl v1.14.0
  • remove more unnecessary files in clean-install & fix shellcheck lint
  • entrypoint is now a small shellscript instead of a go binary
    • easier portability / reproducibility (COPY only, no host building)
    • also smaller, less dependency

Node Image Changes:

  • preload images in to containerd's content store instead of storing archives (smaller & faster!)

Cluster Create Changes:

  • stop waiting for dockerd to be ready (containerd boots ~instantly)
  • stop loading tar archives (already loaded at build time)
  • No more node fixups on the host side (proxy settings, mounts etc.. entrypoint now handles these)
    • No more SIGUSR1
  • product_name is overridden

The entrypoint changes should make #148 much simpler, we only need to tackle the IP issues.

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Apr 29, 2019
@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Apr 29, 2019
@BenTheElder
Copy link
Member Author

Just making this a breaking change cut down complexity quite a bit, will clean this up some more along with some other fixes and get it in to unblock other work like #453

@BenTheElder
Copy link
Member Author

BenTheElder commented Apr 29, 2019 via email

conntrack iptables iproute2 ethtool socat util-linux mount ebtables udev kmod aufs-tools \
bash rsync \
containerd \
conntrack iptables iproute2 ethtool socat util-linux mount ebtables udev kmod \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need gnupg2 in order to build the node images with apt

/kind build node-image --base-image kindest/base:containerd --image kindest/node:containerd --type apt
E: gnupg, gnupg2 and gnupg1 do not seem to be installed, but one of them is required for this operation
(23) Failed writing body
ERRO[22:37:41] Adding Kubernetes apt key failed! exit status 255
ERRO[22:37:41] Image build Failed! exit status 255
Error: error building node image: exit status 255

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

alternatively type=apt can install gnupg2 first if we decide not to get rid of it.

@BenTheElder
Copy link
Member Author

BenTheElder commented Apr 29, 2019 via email

@@ -255,7 +255,7 @@ func (c *BuildContext) buildImage(dir string) error {
}()
}
if err != nil {
log.Errorf("Image build Failed! %v", err)
log.Errorf("Image build Failed! Failed to create build container: %v", err)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wound up adding some more details to these to aid debugging while working on this. They're not strictly related, but it's a small change.

}

// load the docker image artifacts into the docker daemon
node.LoadImages()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the equivilant is now done at image build time

@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Apr 29, 2019
@BenTheElder BenTheElder requested review from amwat and krzyzacy April 30, 2019 00:01
@@ -57,7 +57,7 @@ func (a *Action) Execute(ctx *actions.ActionContext) error {
kubeVersion, err := node.KubeVersion()
if err != nil {
// TODO(bentheelder): logging here
return errors.Wrap(err, "failed to get kubernetes version from node: %v")
return errors.Wrap(err, "failed to get kubernetes version from node")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this formatting was incorrect


// helper that calls `try()`` in a loop until the deadline `until`
// has passed or `try()`returns true, returns wether try ever returned true
func tryUntil(until time.Time, try func() bool) bool {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the same code, but the only user is over here now (directly above) so I moved it.

// Deletes the machine-id embedded in the node image and regenerate a new one.
// This is necessary because both kubelet and other components like weave net
// use machine-id internally to distinguish nodes.
if err := handle.Command("rm", "-f", "/etc/machine-id").Run(); err != nil {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

entrypoint does this now

// we need to change a few mounts once we have the container
// we'd do this ahead of time if we could, but --privileged implies things
// that don't seem to be configurable, and we need that flag
if err := node.FixMounts(); err != nil {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

entrypoint does this now

return err
}

if nodes.NeedProxy() {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

entrypoint does this now

}

// signal the node container entrypoint to continue booting into systemd
if err := node.SignalStart(); err != nil {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's nothing to signal now

// SignalStart sends SIGUSR1 to the node, which signals our entrypoint to boot
// see images/node/entrypoint
func (n *Node) SignalStart() error {
return docker.Kill("SIGUSR1", n.name)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should probably remove the docker.Kill call since this was the only user afaik. that can maybe be another PR though, this PR is already a bit big

@@ -135,105 +125,6 @@ func (n *Node) CopyFrom(source, dest string) error {
return docker.CopyFrom(n.name, source, dest)
}

// WaitForDocker waits for Docker to be ready on the node
// it returns true on success, and false on a timeout
func (n *Node) WaitForDocker(until time.Time) bool {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unnecessary now, containerd starts very very very fast in testing so far (not worth measuring), also we don't need to wait to load images, because we don't need to load images

}

// LoadImages loads image tarballs stored on the node into docker on the node
func (n *Node) LoadImages() {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image loading is at build time now

@BenTheElder
Copy link
Member Author

/hold cancel
will need follow-ups but this should move things forward a lot and help unblock, will follow-up with:

  • code cleanup (mostly removing some now dead-code)
  • HA rework
  • docs updates
  • kind build node-image --type=apt fixes
  • ...

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 30, 2019
Copy link
Contributor

@amwat amwat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/hold

@k8s-ci-robot k8s-ci-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. lgtm "Looks good to me", indicates that a PR is ready to be merged. labels Apr 30, 2019
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: amwat, aojea, BenTheElder

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@BenTheElder
Copy link
Member Author

/hold cancel

if err := n.Command("mount", "-o", "remount,ro", "/sys").Run(); err != nil {
return err
}
// kubernetes needs shared mount propagation

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@BenTheElder these are omitted in fix_mount in the entrypoint. We are adopting kind as a cluster provider in Network Service Mesh https://github.com/networkservicemesh/networkservicemesh/blob/master/docs/guide-kind.md and the latest version is not working for us. The issues is that we are using /var/lib mounts which is relying on / being a shared mount.
What would be the best approach here? Shall we consider updating fix_mounts?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can update fix_mounts, sorry about that!

These were a hold over from some previous docker-in-docker tooling predating kind, I haven't found much (anything?) documenting why you'd need to do this on the kubernetes host and conformance was passing without them, so I removed them as voodoo figuring kubelet must handle this 🤦‍♂
Looking again now I do see: https://kubernetes.io/docs/concepts/storage/volumes/#configuration


side note:

It is worth noting that kind as any other Kubernetes deployment tool would expect that the machine that hosts the Docker has at least 4 CPU cores and 4 GB of RAM. That is specifically pointed for OSX users in the official docs.

Filed #485, we should make those more accurate in our docs and then send y'all an update 🙃

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

filed #486, will send a fix momentarily

@nickolaev
Copy link

Thanks, @BenTheElder, indeed I have found these mentions about the MountFlags=shared all over the place but was puzzled how this maps to kind. Apparently, it needs to be done manually here.

jcsirot pushed a commit to jcsirot/compose-on-kubernetes that referenced this pull request Jul 11, 2019
jcsirot pushed a commit to jcsirot/compose-on-kubernetes that referenced this pull request Jul 11, 2019
jcsirot pushed a commit to jcsirot/compose-on-kubernetes that referenced this pull request Jul 11, 2019
jcsirot pushed a commit to jcsirot/compose-on-kubernetes that referenced this pull request Jul 11, 2019
Now use official kind node images and adapt how the compose-on-kube images are loaded in kind cluster in e2e tests (images archives in /kind/images are not loaded at cluster creation anymore, see kubernetes-sigs/kind#461)

Signed-off-by: Jean-Christophe Sirot <[email protected]>
stg-0 pushed a commit to stg-0/kind that referenced this pull request Feb 22, 2024
…sigs#461)

* version v0.18.0-alpha

* update docs for v0.17.0

* fix kind version in readme

* comments-update-buildcontext

* trying get oci versions

* integrated default behaviour

* added managing for chart version

* fixed getting last version

* updated dependendies

* fixed versions sorting

* formated error

* remove clusterOperatorChart variable

* Added cluster_operator_image_version

* Updated CHANGELOG

---------

Co-authored-by: Benjamin Elder <[email protected]>
Co-authored-by: Daman <[email protected]>
Co-authored-by: Kubernetes Prow Robot <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants