health_status events are too noisy/redundant #24003

Open
harish2704 opened this issue Sep 18, 2024 · 1 comment · May be fixed by #24005
Labels
kind/bug · stale-issue

harish2704 commented Sep 18, 2024

Issue Description

  1. The health_status events emitted by Podman are too noisy.
    • A health_status event is emitted for every health check attempt. It should only be emitted when the health status of a container actually changes.
  2. Docker handles this case in a better way: it emits separate events (exec_create and exec_die) for health check attempts and only emits a health_status event when there is an actual status/state change.
    • This way, applications that implement Docker/Podman service discovery can safely and easily rely on these events.
    • For example, tools like the Traefik proxy implement Docker/Podman service discovery by reloading their configuration on every health_status event.
    • Because of the noisy behaviour, CPU utilization becomes considerable when tools like Traefik are used together with Podman.
    • I am not raising this bug report just because Podman does not mimic Docker's behaviour, but because I believe Docker handles this case better than Podman.

As a secure container runtime, Podman has a very big opportunity, and bugs like this will keep people from switching to it.
I found this issue while using Coolify together with Podman, where considerable CPU time is wasted because of this bug.
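
To make the suggestion concrete, here is a minimal sketch of the change-detection idea in Go. It is purely illustrative and not Podman's actual code; the type and function names are invented. It keeps the last reported status per container and only lets a health_status event through when that status changes:

```go
package main

import "fmt"

// healthEvent is a simplified stand-in for an internal health event; the
// real types in Podman differ, so this only illustrates the idea.
type healthEvent struct {
	Container string
	Status    string // "healthy", "unhealthy", ...
}

// emitter remembers the last status seen per container and forwards an
// event only when that status changes.
type emitter struct {
	last map[string]string
}

func (e *emitter) maybeEmit(ev healthEvent) {
	if e.last[ev.Container] == ev.Status {
		return // same status as the previous check: suppress the duplicate
	}
	e.last[ev.Container] = ev.Status
	fmt.Printf("health_status %s -> %s\n", ev.Container, ev.Status)
}

func main() {
	e := &emitter{last: make(map[string]string)}
	for _, ev := range []healthEvent{
		{"site0", "healthy"},   // first result: emitted
		{"site0", "healthy"},   // unchanged: suppressed
		{"site0", "unhealthy"}, // changed: emitted
	} {
		e.maybeEmit(ev)
	}
}
```

With a guard like this, the repeated healthy results in the reproduction below would produce a single event, and the later flip to unhealthy a second one.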

Steps to reproduce the issue

  1. Clone https://github.com/harish2704/podman-health-check-bug.
  2. Run cd for-podman
  3. Run podman-compose up -d
  4. Watch the events emitted by Podman by running podman events --format json | jq -r '[.time, .Name, .Status, .health_status] | @tsv'
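
Until the events are deduplicated on the Podman side, a consumer can filter them itself. The hypothetical Go sketch below (the file name dedup.go is made up) reads the JSON stream produced in step 4 on stdin and prints a line only when a container's health_status differs from the value it reported last; the JSON field names are assumed to match the ones used by the jq filter above:

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"os"
)

// event mirrors the fields used by the jq filter above; adjust the JSON
// keys if your Podman version names them differently.
type event struct {
	Time         int64  `json:"time"`
	Name         string `json:"Name"`
	Status       string `json:"Status"`
	HealthStatus string `json:"health_status"`
}

func main() {
	last := make(map[string]string)
	sc := bufio.NewScanner(os.Stdin)
	for sc.Scan() {
		var ev event
		if err := json.Unmarshal(sc.Bytes(), &ev); err != nil {
			continue // skip anything that is not a JSON event line
		}
		if ev.Status != "health_status" {
			continue // only interested in health events
		}
		if last[ev.Name] == ev.HealthStatus {
			continue // same status as before: drop the redundant event
		}
		last[ev.Name] = ev.HealthStatus
		fmt.Printf("%d\t%s\t%s\n", ev.Time, ev.Name, ev.HealthStatus)
	}
}
```

Usage: podman events --format json | go run dedup.go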

Describe the results you received

The output I am seeing is given below.

1726677092      for-podman_site1_1      health_status   healthy
1726677092      for-podman_site0_1      health_status   healthy
1726677093      for-podman_site2_1      health_status   healthy
1726677093      for-podman_site1_1      health_status   healthy
1726677093      for-podman_site0_1      health_status   healthy
1726677094      for-podman_site2_1      health_status   healthy
1726677094      for-podman_site1_1      health_status   healthy
1726677094      for-podman_site0_1      health_status   healthy
1726677096      for-podman_site1_1      health_status   healthy
1726677096      for-podman_site0_1      health_status   healthy
1726677096      for-podman_site2_1      health_status   healthy
1726677098      for-podman_site2_1      health_status   healthy
1726677098      for-podman_site0_1      health_status   healthy
1726677098      for-podman_site1_1      health_status   healthy
1726677100      for-podman_site2_1      health_status   healthy
1726677102      for-podman_site0_1      health_status   healthy
1726677102      for-podman_site1_1      health_status   healthy
1726677103      for-podman_site2_1      health_status   healthy
1726677103      for-podman_site0_1      health_status   healthy
1726677103      for-podman_site1_1      health_status   healthy
1726677105      for-podman_site2_1      health_status   unhealthy
1726677105      for-podman_site1_1      health_status   unhealthy
1726677105      for-podman_site0_1      health_status   unhealthy
1726677106      for-podman_site1_1      health_status   unhealthy
1726677106      for-podman_site2_1      health_status   unhealthy
1726677107      for-podman_site0_1      health_status   unhealthy
1726677108      for-podman_site0_1      health_status   unhealthy
1726677108      for-podman_site1_1      health_status   unhealthy
1726677108      for-podman_site2_1      health_status   unhealthy
1726677110      for-podman_site2_1      health_status   unhealthy

In the output above, we can see that most of the health_status events are duplicates; they do not contain any new information.

Describe the results you expected

I expect a health_status event to be emitted only when there is an actual status/state change.
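
For illustration, applying that rule to the output above (and counting the first healthy result as a change) would reduce it to one event per status change per container, roughly:

1726677092      for-podman_site1_1      health_status   healthy
1726677092      for-podman_site0_1      health_status   healthy
1726677093      for-podman_site2_1      health_status   healthy
1726677105      for-podman_site2_1      health_status   unhealthy
1726677105      for-podman_site1_1      health_status   unhealthy
1726677105      for-podman_site0_1      health_status   unhealthy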

podman info output

host:
  arch: amd64
  buildahVersion: 1.37.0
  cgroupControllers:
  - cpu
  - io
  - memory
  - pids
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.1.12-2.fc40.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.12, commit: '
  cpuUtilization:
    idlePercent: 96.27
    systemPercent: 1.64
    userPercent: 2.09
  cpus: 12
  databaseBackend: sqlite
  distribution:
    distribution: fedora
    variant: kde
    version: "40"
  eventLogger: journald
  freeLocks: 2012
  hostname: fedora-desk
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 524288
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 524288
      size: 65536
  kernel: 6.10.9-200.fc40.x86_64
  linkmode: dynamic
  logDriver: journald
  memFree: 337436672
  memTotal: 16680095744
  networkBackend: netavark
  networkBackendInfo:
    backend: netavark
    dns:
      package: aardvark-dns-1.12.2-2.fc40.x86_64
      path: /usr/libexec/podman/aardvark-dns
      version: aardvark-dns 1.12.2
    package: netavark-1.12.2-1.fc40.x86_64
    path: /usr/libexec/podman/netavark
    version: netavark 1.12.2
  ociRuntime:
    name: crun
    package: crun-1.17-1.fc40.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 1.17
      commit: 000fa0d4eeed8938301f3bcf8206405315bc1017
      rundir: /run/user/1000/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +LIBKRUN +WASM:wasmedge +YAJL
  os: linux
  pasta:
    executable: /usr/bin/pasta
    package: passt-0^20240906.g6b38f07-1.fc40.x86_64
    version: |
      pasta 0^20240906.g6b38f07-1.fc40.x86_64
      Copyright Red Hat
      GNU General Public License, version 2 or later
        <https://www.gnu.org/licenses/old-licenses/gpl-2.0.html>
      This is free software: you are free to change and redistribute it.
      There is NO WARRANTY, to the extent permitted by law.
  remoteSocket:
    exists: true
    path: /run/user/1000/podman/podman.sock
  rootlessNetworkCmd: pasta
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: ""
    selinuxEnabled: true
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.2.2-2.fc40.x86_64
    version: |-
      slirp4netns version 1.2.2
      commit: 0ee2d87523e906518d34a6b423271e4826f71faf
      libslirp: 4.7.0
      SLIRP_CONFIG_VERSION_MAX: 4
      libseccomp: 2.5.5
  swapFree: 7665238016
  swapTotal: 8589930496
  uptime: 28h 59m 43.00s (Approximately 1.17 days)
  variant: ""
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries: {}
store:
  configFile: /home/harish/.config/containers/storage.conf
  containerStore:
    number: 20
    paused: 0
    running: 4
    stopped: 16
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /home/harish/.local/share/containers/storage
  graphRootAllocated: 236221104128
  graphRootUsed: 91409645568
  graphStatus:
    Backing Filesystem: btrfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Supports shifting: "false"
    Supports volatile: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 50
  runRoot: /run/user/1000/containers
  transientStore: false
  volumePath: /home/harish/.local/share/containers/storage/volumes
version:
  APIVersion: 5.3.0-dev
  Built: 1726676363
  BuiltTime: Wed Sep 18 21:49:23 2024
  GitCommit: 62c101651ff85daa0370d5031b9e2b3b4c5f16be
  GoVersion: go1.22.6
  Os: linux
  OsArch: linux/amd64
  Version: 5.3.0-dev

Podman in a container

No

Privileged Or Rootless

Rootless

Upstream Latest Release

Yes

Additional environment details

My OS

NAME="Fedora Linux"
VERSION="40 (KDE Plasma)"

Additional information

No response

harish2704 added the kind/bug label Sep 18, 2024
harish2704 added a commit to harish2704/podman that referenced this issue Sep 20, 2024
Emit event only if there is a change in health_status
Fixes containers#24003

Resolves containers#24005 (comment)

Pass additional isChanged flag to event creation function

Fix health check events for docker api
Signed-off-by: Harish Karumuthil <[email protected]>

A friendly reminder that this issue had no activity for 30 days.
