Podman with custom network dns not working #23957

Open
flixman opened this issue Sep 15, 2024 · 22 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. network Networking related issue or feature pasta pasta(1) bugs or features

Comments

@flixman

flixman commented Sep 15, 2024

Issue Description

Similarly to this issue, with a plain podman run I can reach the internet:

podman run --rm -it --name testcontainer <registry>/gitea/runners/podman:extended podman login -u <login> -p <password> <registry>

However, if I create the network separately and then use it, that no longer works:

podman network create --subnet 10.1.0.0/24 --gateway 10.1.0.1 testnet
podman run --rm -it --network testnet --name testcontainer <registry>/gitea/runners/podman:extended podman login -u <login> -p <password> <registry>

returns Error: authenticating creds for "<registry>": pinging container registry <registry>: Get "https://registry/v2/": dial tcp: lookup <registry>: Temporary failure in name resolution

Steps to reproduce the issue

  1. create the network
  2. run the container attached to that network.

Describe the results you received

The container cannot reach the internet

Describe the results you expected

The container works with a custom network the same way it works with the default network.

podman info output

host:
  arch: amd64
  buildahVersion: 1.37.2
  cgroupControllers:
  - cpu
  - memory
  - pids
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-1:2.1.12-1
    path: /usr/bin/conmon
    version: 'conmon version 2.1.12, commit: e8896631295ccb0bfdda4284f1751be19b483264'
  cpuUtilization:
    idlePercent: 96.7
    systemPercent: 1.33
    userPercent: 1.97
  cpus: 16
  databaseBackend: sqlite
  distribution:
    distribution: arch
    version: unknown
  eventLogger: journald
  freeLocks: 2024
  hostname: altair
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
  kernel: 6.10.9-arch1-2
  linkmode: dynamic
  logDriver: journald
  memFree: 3601637376
  memTotal: 15909453824
  networkBackend: netavark
  networkBackendInfo:
    backend: netavark
    dns:
      package: aardvark-dns-1.12.2-1
      path: /usr/lib/podman/aardvark-dns
      version: aardvark-dns 1.12.2
    package: netavark-1.12.2-1
    path: /usr/lib/podman/netavark
    version: netavark 1.12.2
  ociRuntime:
    name: crun
    package: crun-1.17-1
    path: /usr/bin/crun
    version: |-
      crun version 1.17
      commit: 000fa0d4eeed8938301f3bcf8206405315bc1017
      rundir: /run/user/1000/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL
  os: linux
  pasta:
    executable: /usr/bin/pasta
    package: passt-2024_09_06.6b38f07-1
    version: |
      pasta 2024_09_06.6b38f07
      Copyright Red Hat
      GNU General Public License, version 2 or later
        <https://www.gnu.org/licenses/old-licenses/gpl-2.0.html>
      This is free software: you are free to change and redistribute it.
      There is NO WARRANTY, to the extent permitted by law.
  remoteSocket:
    exists: true
    path: /run/user/1000/podman/podman.sock
  rootlessNetworkCmd: pasta
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /etc/containers/seccomp.json
    selinuxEnabled: false
  serviceIsRemote: false
  slirp4netns:
    executable: ""
    package: ""
    version: ""
  swapFree: 0
  swapTotal: 0
  uptime: 22h 55m 6.00s (Approximately 0.92 days)
  variant: ""
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  docker.io:
    Blocked: false
    Insecure: false
    Location: docker.io
    MirrorByDigestOnly: false
    Mirrors:
    - Insecure: false
      Location: <registry>
      PullFromMirror: ""
    Prefix: docker.io
    PullFromMirror: ""
  search:
  - docker.io
store:
  configFile: /home/user/.config/containers/storage.conf
  containerStore:
    number: 2
    paused: 0
    running: 1
    stopped: 1
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /home/user/.local/share/containers/storage
  graphRootAllocated: 500856545280
  graphRootUsed: 246386896896
  graphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Supports shifting: "true"
    Supports volatile: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 89
  runRoot: /run/user/1000/containers
  transientStore: false
  volumePath: /home/user/.local/share/containers/storage/volumes
version:
  APIVersion: 5.2.2
  Built: 1724352649
  BuiltTime: Thu Aug 22 20:50:49 2024
  GitCommit: fcee48106a12dd531702d729d17f40f6e152027f
  GoVersion: go1.23.0
  Os: linux
  OsArch: linux/amd64
  Version: 5.2.2

Podman in a container

No

Privileged Or Rootless

Rootless

Upstream Latest Release

No

Additional environment details

No response

Additional information

With the default network, where it works, the container gets an IP address in the host's address space.

@flixman flixman added the kind/bug Categorizes issue or PR as related to a bug. label Sep 15, 2024
@Luap99 Luap99 added the network Networking related issue or feature label Sep 16, 2024
@Luap99
Member

Luap99 commented Sep 16, 2024

What does "cannot reach the internet" mean? Your error shows a problem resolving a DNS name. Do you actually have no network connectivity, or is just DNS failing?
Does dns/networking work inside podman unshare --rootless-netns?

@flixman
Author

flixman commented Sep 16, 2024

@Luap99 That is interesting! Let's see:

Running in my container with a custom network created through podman network create --subnet 10.1.0.0/24 --gateway 10.1.0.1 testnet:
telnet 10.1.0.1 53, works
telnet 8.8.8.8 53, works
dig www.google.com @8.8.8.8, works
dig www.google.com: error ";; communications error to 10.1.0.1#53: timed out"

Running inside podman unshare --rootless-netns:
dig www.google.com, works

How is it possible that I can telnet, from inside my container, to port 53, but then dig returns an error?

@Luap99
Member

Luap99 commented Sep 16, 2024

OK, thanks for checking. I would guess this means that aardvark-dns is not responding on UDP: telnet uses TCP, not UDP. You could try dig +tcp ... to see if DNS works over TCP.

Can you check that aardvark-dns is running (while the container is running) and, if so, please provide the output of podman unshare --rootless-netns ss -tulpn.
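
To make the TCP/UDP distinction concrete, here is a minimal sketch of how the same DNS question is sent over each transport. It builds a bare A-record query (no EDNS options, unlike what dig sends) and shows the 2-byte length prefix that DNS over TCP adds. The function names and the query ID are illustrative assumptions and come from neither Podman nor aardvark-dns.

```python
import struct

def build_dns_query(name, query_id=0x1234):
    """Build a minimal DNS query for an A record (no EDNS options)."""
    # Header: ID, flags (only RD set), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is prefixed with its length, terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN
    return header + question

def frame_for_tcp(message):
    """DNS over TCP prefixes each message with a 2-byte big-endian length."""
    return struct.pack("!H", len(message)) + message

query = build_dns_query("www.google.com")
print(len(query), "bytes as a single UDP datagram")
print(len(frame_for_tcp(query)), "bytes on a TCP stream (length prefix + message)")
```

A successful telnet to port 53 only proves that the TCP listener accepts connections. It says nothing about whether a query sent over either transport gets an answer.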

@flixman
Author

flixman commented Sep 16, 2024

dig +tcp ... returns the timeout as well, and aardvark-dns is running. The output of podman unshare --rootless-netns ss -tulpn is:

Netid  State   Recv-Q  Send-Q  Local Address:Port  Peer Address:Port  Process
udp    UNCONN  0       0       10.1.0.1:53         0.0.0.0:*          users:(("aardvark-dns",pid=38793,fd=12))
tcp    LISTEN  0       1024    10.1.0.1:53         0.0.0.0:*          users:(("aardvark-dns",pid=38793,fd=13))

Additionally, if I attach strace to the running aardvark-dns and its forks while running dig (with either UDP or TCP), I get similar traces:

[pid 38801] accept4(13, {sa_family=AF_INET, sin_port=htons(35163), sin_addr=inet_addr("10.1.0.3")}, [128 => 16], SOCK_CLOEXEC|SOCK_NONBLOCK) = 5
[pid 38801] epoll_ctl(7, EPOLL_CTL_ADD, 5, {events=EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, data={u32=1342190720, u64=140416707998848}}) = 0
[pid 38801] accept4(13, 0x7fb59b5fb9d0, [128], SOCK_CLOEXEC|SOCK_NONBLOCK) = -1 EAGAIN (Resource temporarily unavailable)
[pid 38801] write(6, "\1\0\0\0\0\0\0\0", 8) = 8
[pid 38801] epoll_wait(4, [{events=EPOLLIN|EPOLLOUT, data={u32=1342190720, u64=140416707998848}}, {events=EPOLLIN, data={u32=0, u64=0}}], 1024, 2956) = 2
[pid 38801] recvfrom(5, "\0007", 2, 0, NULL, NULL) = 2
[pid 38801] recvfrom(5, "\260\356\1 \0\1\0\0\0\0\0\1\3www\6google\3com\0\0\1\0\1"..., 55, 0, NULL, NULL) = 55
[pid 38801] socket(AF_INET, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_IP) = 14
[pid 38801] connect(14, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("169.254.0.1")}, 16) = -1 EINPROGRESS (Operation now in progress)
[pid 38801] epoll_ctl(7, EPOLL_CTL_ADD, 14, {events=EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, data={u32=1946168448, u64=140417311976576}}) = 0
[pid 38801] epoll_wait(4, [], 1024, 3212) = 0
[pid 38801] epoll_wait(4, [], 1024, 1726) = 0
[pid 38801] epoll_wait(4, [], 1024, 59) = 0
[pid 38801] epoll_ctl(7, EPOLL_CTL_DEL, 14, NULL) = 0
[pid 38801] close(14)                   = 0
[pid 38801] socket(AF_INET, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_IP) = 14
[pid 38801] connect(14, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("192.168.178.4")}, 16) = -1 EINPROGRESS (Operation now in progress)
[pid 38801] epoll_ctl(7, EPOLL_CTL_ADD, 14, {events=EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, data={u32=2348821376, u64=140417714629504}}) = 0
[pid 38801] write(6, "\1\0\0\0\0\0\0\0", 8) = 8
[pid 38801] epoll_wait(4, [{events=EPOLLIN, data={u32=0, u64=0}}], 1024, 2306) = 1
[pid 38801] epoll_wait(4, [], 1024, 2306) = 0
[pid 38801] epoll_wait(4, [], 1024, 2686) = 0
[pid 38801] epoll_wait(4, [{events=EPOLLIN, data={u32=3002696448, u64=99616179192576}}], 1024, 4) = 1
[pid 38801] accept4(13, {sa_family=AF_INET, sin_port=htons(34497), sin_addr=inet_addr("10.1.0.3")}, [128 => 16], SOCK_CLOEXEC|SOCK_NONBLOCK) = 15
[pid 38801] epoll_ctl(7, EPOLL_CTL_ADD, 15, {events=EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, data={u32=1946168448, u64=140417311976576}}) = 0
[pid 38801] accept4(13, 0x7fb59b5fb9d0, [128], SOCK_CLOEXEC|SOCK_NONBLOCK) = -1 EAGAIN (Resource temporarily unavailable)
[pid 38801] epoll_wait(4, [{events=EPOLLIN|EPOLLOUT|EPOLLRDHUP, data={u32=1342190720, u64=140416707998848}}, {events=EPOLLIN|EPOLLOUT, data={u32=1946168448, u64=140417311976576}}], 1024, 3) = 2
[pid 38801] recvfrom(15, "\0007", 2, 0, NULL, NULL) = 2
[pid 38801] recvfrom(15, "%\332\1 \0\1\0\0\0\0\0\1\3www\6google\3com\0\0\1\0\1"..., 55, 0, NULL, NULL) = 55
[pid 38801] socket(AF_INET, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_IP) = 17
[pid 38801] connect(17, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("169.254.0.1")}, 16) = -1 EINPROGRESS (Operation now in progress)
[pid 38801] epoll_ctl(7, EPOLL_CTL_ADD, 17, {events=EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, data={u32=1946170880, u64=140417311979008}}) = 0
[pid 38801] epoll_wait(4, [], 1024, 2)  = 0
[pid 38801] epoll_ctl(7, EPOLL_CTL_DEL, 14, NULL) = 0
[pid 38801] close(14)                   = 0
[pid 38801] socket(AF_INET, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_IP) = 14
[pid 38801] connect(14, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("84.116.46.21")}, 16) = -1 EINPROGRESS (Operation now in progress)
[pid 38801] epoll_ctl(7, EPOLL_CTL_ADD, 14, {events=EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, data={u32=1946162816, u64=140417311970944}}) = 0
[pid 38801] epoll_wait(4, [{events=EPOLLOUT, data={u32=1946162816, u64=140417311970944}}], 1024, 1401) = 1
[pid 38801] getsockopt(14, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
[pid 38801] setsockopt(14, SOL_TCP, TCP_NODELAY, [1], 4) = 0
[pid 38801] sendto(14, "\0007", 2, MSG_NOSIGNAL, NULL, 0) = 2
[pid 38801] sendto(14, "^}\1 \0\1\0\0\0\0\0\1\3www\6google\3com\0\0\1\0\1"..., 55, MSG_NOSIGNAL, NULL, 0) = 55
[pid 38801] futex(0x5a99b2f936f8, FUTEX_WAKE_PRIVATE, 1) = 1
[pid 38801] epoll_wait(4,  <unfinished ...>
[pid 38799] <... futex resumed>)        = 0
[pid 38799] futex(0x5a99b2f936f8, FUTEX_WAIT_BITSET_PRIVATE, 12, NULL, FUTEX_BITSET_MATCH_ANY <unfinished ...>
[pid 38801] <... epoll_wait resumed>[{events=EPOLLIN|EPOLLOUT, data={u32=1946162816, u64=140417311970944}}], 1024, 1369) = 1
[pid 38801] recvfrom(14, "\0;", 2, 0, NULL, NULL) = 2
[pid 38801] recvfrom(14, "^}\201\200\0\1\0\1\0\0\0\1\3www\6google\3com\0\0\1\0\1"..., 59, 0, NULL, NULL) = 59
[pid 38801] recvfrom(14, 0x7fb574004cb0, 2, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)

meaning: the request reaches aardvark-dns in both cases, but it seems that aardvark-dns itself is not able to query DNS?
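
The byte strings in the trace can be decoded to check this reading. The sketch below takes two of them as strace printed them (octal escapes rewritten as hex): the 2-byte prefix received from the client, and the first 12 bytes of the reply received on fd 14, the socket connected to 84.116.46.21. strace truncates the payloads, so only the fixed DNS header is decoded. The helper name is an illustrative assumption.

```python
import struct

# Copied from the strace output above, octal escapes rewritten as hex.
length_prefix = b"\x00\x37"  # strace prints "\0007": a NUL byte followed by '7'
response_header = b"^}\x81\x80\x00\x01\x00\x01\x00\x00\x00\x01"

def decode_header(data):
    """Decode the fixed 12-byte DNS header."""
    ident, flags, qd, an, ns, ar = struct.unpack("!HHHHHH", data[:12])
    return {
        "id": ident,
        "is_response": bool(flags & 0x8000),  # QR bit
        "rcode": flags & 0x000F,              # 0 means NOERROR
        "questions": qd,
        "answers": an,
    }

(message_length,) = struct.unpack("!H", length_prefix)
print("TCP length prefix announces", message_length, "bytes")
print(decode_header(response_header))
```

The prefix matches the 55-byte read that follows it, and the header from the third upstream is a response with rcode 0 and one answer. The upstream lookup therefore did succeed eventually, after the attempts against 169.254.0.1 and 192.168.178.4 had stalled.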

@Luap99
Member

Luap99 commented Sep 17, 2024

Do you have any aardvark-dns errors logged in journald?

The strace part shows a TCP request, if I read this right. The async epoll API that we are using makes the strace a bit harder to read, but it seems we try to connect to the upstream servers and then just remove the fd from the epoll set again. I do not see any error logged, or any write to or read from the socket, which seems very odd.
In the end it does seem to succeed when connecting to 84.116.46.21, but I guess by that time the original client had timed out (ref containers/aardvark-dns#482 (comment)).

What is the content of /etc/resolv.conf on the host and inside podman unshare --rootless-netns? And when you say dig worked inside podman unshare --rootless-netns, which upstream server did it use?

@Luap99 Luap99 changed the title Podman with network options cannot reach the internet Podman with custom network dns not working Sep 17, 2024
@flixman
Author

flixman commented Sep 17, 2024

In /etc/resolv.conf I have a bunch of name servers. Inside podman unshare --rootless-netns I have the same ones, but a new entry, nameserver 169.254.0.1, is prepended to the list. When executing dig www.google.com inside podman unshare --rootless-netns I get three timeouts for 169.254.0.1, and then the query succeeds against another server (using UDP, by the way).

With the container running, dig www.google.com results in aardvark-dns on the host writing a number of "dns request got empty response" messages to the log.
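
Because resolvers generally try name servers in the order listed, the prepended entry is the first one queried. A minimal sketch of reading that order out of a resolv.conf follows. The sample content is hypothetical (the thread does not show the real file) and only reuses addresses that appear elsewhere in this issue; parse_nameservers is an illustrative helper.

```python
def parse_nameservers(resolv_conf_text):
    """Return the nameserver addresses in the order they are listed."""
    servers = []
    for line in resolv_conf_text.splitlines():
        # Drop comments ("#" or ";") and surrounding whitespace.
        line = line.split("#", 1)[0].split(";", 1)[0].strip()
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers

# Illustrative content only: the real file is not shown in this thread.
sample = """\
# hypothetical resolv.conf inside the rootless netns
nameserver 169.254.0.1
nameserver 192.168.178.4
nameserver 84.116.46.21
"""

servers = parse_nameservers(sample)
print("first server tried:", servers[0])
print("fallbacks:", servers[1:])
```

With a layout like this, a client pays the timeouts on the first entry before it falls back to a working server. That is consistent with the three timeouts followed by a successful answer described above.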

@Luap99
Member

Luap99 commented Sep 17, 2024

169.254.0.1

This is the special DNS forward address we use for pasta, so this address is expected to work there. If it doesn't, it sounds like a pasta bug. If you look in journald, do you see a warning from pasta that it didn't find nameservers?

You can also just test from the cli with pasta --config-net --dns-forward 169.254.0.1 dig google.com @169.254.0.1. If this fails this is a pasta bug.

@Luap99 Luap99 added the pasta pasta(1) bugs or features label Sep 17, 2024
@flixman
Author

flixman commented Sep 18, 2024

Indeed, it fails:

$ pasta --config-net --dns-forward 169.254.0.1 dig google.com @169.254.0.1
Multiple default IPv4 routes, picked first
Multiple default IPv6 routes, picked first
;; communications error to 169.254.0.1#53: timed out
;; communications error to 169.254.0.1#53: timed out
;; communications error to 169.254.0.1#53: timed out

; <<>> DiG 9.20.1 <<>> google.com @169.254.0.1
;; global options: +cmd
;; no servers could be reached

@Luap99
Member

Luap99 commented Sep 18, 2024

Multiple default IPv4 routes, picked first
Multiple default IPv6 routes, picked first

What do the routes look like in the container (pasta --config-net ip route)? If the routes are fine, then you can use the --pcap option to capture a pcap file so we can have a look at the packets being sent, i.e.
pasta --config-net --pcap /tmp/dns.pcap --dns-forward 169.254.0.1 dig google.com @169.254.0.1

cc @sbrivio-rh @dgibson

@flixman
Author

flixman commented Sep 18, 2024

The routes seem to be fine:

$ pasta --config-net ip route
Multiple default IPv4 routes, picked first
Multiple default IPv6 routes, picked first
default via 192.168.178.1 dev wlp2s0 proto dhcp metric 600 
84.116.46.20 via 192.168.178.1 dev wlp2s0 proto dhcp metric 600 
84.116.46.21 via 192.168.178.1 dev wlp2s0 proto dhcp metric 600 
192.168.178.0/24 dev wlp2s0 proto kernel scope link metric 600 
192.168.178.0/24 dev wlp2s0 proto kernel scope link src 192.168.178.129 metric 600 
192.168.178.1 dev wlp2s0 proto dhcp scope link metric 600 

Please, find attached the trace dns.pcap.txt (remove the .txt suffix. Seems GH does not support .pcap):

@Luap99
Member

Luap99 commented Sep 18, 2024

Please, find attached the trace dns.pcap.txt (remove the .txt suffix. Seems GH does not support .pcap):

    4   0.007273 192.168.178.129 → 169.254.0.1  DNS 93 Standard query 0xaab5 A google.com OPT
   12   5.012949 192.168.178.129 → 169.254.0.1  DNS 93 Standard query 0xaab5 A google.com OPT
   13  10.018316 192.168.178.129 → 169.254.0.1  DNS 93 Standard query 0xaab5 A google.com OPT

The requests were sent out but never got a reply. Can you also do a packet capture on the host, to see if pasta makes an actual request to the upstream server there or if pasta eats it internally and never forwards it? I wonder if pasta somehow failed to parse resolv.conf for the servers, but in that case it should print a warning like the "multiple default routes" one. There is also the --debug pasta option, which logs the internal packet flow, so maybe there is something interesting in there.

But I guess at this point I have to leave it to @sbrivio-rh and @dgibson (the pasta maintainers) if they have a clue here.

@flixman
Author

flixman commented Sep 18, 2024

@sbrivio-rh @dgibson: I have run again the pasta command with the --debug option. Can you guys give me a hand?

$ pasta --debug --config-net --dns-forward 169.254.0.1 dig google.com @169.254.0.1
0.0010: Multiple default IPv4 routes, picked first
0.0010: Multiple default IPv6 routes, picked first
0.0118: Template interface: wlp2s0 (IPv4), wlp2s0 (IPv6)
0.0118: Namespace interface: wlp2s0
0.0118: MAC:
0.0118:     host: 9a:55:9a:55:9a:55
0.0118:     NAT to host 127.0.0.1: 192.168.178.1
0.0118: DHCP:
0.0119:     assign: 192.168.178.129
0.0119:     mask: 255.255.255.0
0.0119:     router: 192.168.178.1
0.0119: DNS:
0.0119:     192.168.178.4
0.0119:     84.116.46.21
0.0119:     84.116.46.20
0.0119:     84.116.46.21
0.0119:     169.254.0.1
0.0119:     192.168.178.1
0.0119:     192.168.178.4
0.0119: DNS search list:
0.0119:     .
0.0119:     NAT to host ::1: fe80::4ad3:43ff:feda:bb88
0.0119: NDP/DHCPv6:
0.0120:     assign: 2001:1c00:1804:b700:f2b3:fadc:4fa3:f578
0.0120:     router: fe80::4ad3:43ff:feda:bb88
0.0120:     our link-local: fe80::4ad3:43ff:feda:bb88
0.0120: DNS:
0.0120:     2001:b88:1002::10
0.0120:     2001:b88:1202::10
0.0120:     2001:730:3e42:1000::53
0.0120:     2001:b88:1002::10
0.0120: DNS search list:
0.0120:     .
0.0186: SO_PEEK_OFF not supported
0.0305: Flow 0 (NEW): FREE -> NEW
0.0305: Flow 0 (INI): NEW -> INI
0.0305: Flow 0 (INI): TAP [192.168.178.129]:39909 -> [169.254.0.1]:53 => ?
0.0306: Flow 0 (TGT): INI -> TGT
0.0306: Flow 0 (TGT): TAP [192.168.178.129]:39909 -> [169.254.0.1]:53 => HOST [0.0.0.0]:39909 -> [192.168.178.4]:53
0.0306: Flow 0 (UDP flow): TGT -> TYPED
0.0306: Flow 0 (UDP flow): TAP [192.168.178.129]:39909 -> [169.254.0.1]:53 => HOST [0.0.0.0]:39909 -> [192.168.178.4]:53
0.0308: Flow 0 (UDP flow): Side 0 hash table insert: bucket: 41306
0.0308: Flow 0 (UDP flow): TYPED -> ACTIVE
0.0308: Flow 0 (UDP flow): TAP [192.168.178.129]:39909 -> [169.254.0.1]:53 => HOST [0.0.0.0]:39909 -> [192.168.178.4]:53
0.0487: ICMP error on UDP socket 179: No route to host
;; communications error to 169.254.0.1#53: timed out
5.0351: Flow 1 (NEW): FREE -> NEW
5.0351: Flow 1 (INI): NEW -> INI
5.0351: Flow 1 (INI): TAP [192.168.178.129]:57747 -> [169.254.0.1]:53 => ?
5.0351: Flow 1 (TGT): INI -> TGT
5.0352: Flow 1 (TGT): TAP [192.168.178.129]:57747 -> [169.254.0.1]:53 => HOST [0.0.0.0]:57747 -> [192.168.178.4]:53
5.0352: Flow 1 (UDP flow): TGT -> TYPED
5.0352: Flow 1 (UDP flow): TAP [192.168.178.129]:57747 -> [169.254.0.1]:53 => HOST [0.0.0.0]:57747 -> [192.168.178.4]:53
5.0353: Flow 1 (UDP flow): Side 0 hash table insert: bucket: 235154
5.0353: Flow 1 (UDP flow): TYPED -> ACTIVE
5.0353: Flow 1 (UDP flow): TAP [192.168.178.129]:57747 -> [169.254.0.1]:53 => HOST [0.0.0.0]:57747 -> [192.168.178.4]:53
5.0498: ICMP error on UDP socket 244: No route to host
;; communications error to 169.254.0.1#53: timed out
10.0406: Flow 2 (NEW): FREE -> NEW
10.0406: Flow 2 (INI): NEW -> INI
10.0407: Flow 2 (INI): TAP [192.168.178.129]:59697 -> [169.254.0.1]:53 => ?
10.0407: Flow 2 (TGT): INI -> TGT
10.0407: Flow 2 (TGT): TAP [192.168.178.129]:59697 -> [169.254.0.1]:53 => HOST [0.0.0.0]:59697 -> [192.168.178.4]:53
10.0407: Flow 2 (UDP flow): TGT -> TYPED
10.0407: Flow 2 (UDP flow): TAP [192.168.178.129]:59697 -> [169.254.0.1]:53 => HOST [0.0.0.0]:59697 -> [192.168.178.4]:53
10.0408: Flow 2 (UDP flow): Side 0 hash table insert: bucket: 10518
10.0408: Flow 2 (UDP flow): TYPED -> ACTIVE
10.0408: Flow 2 (UDP flow): TAP [192.168.178.129]:59697 -> [169.254.0.1]:53 => HOST [0.0.0.0]:59697 -> [192.168.178.4]:53
10.0590: ICMP error on UDP socket 245: No route to host
;; communications error to 169.254.0.1#53: timed out

; <<>> DiG 9.20.1 <<>> google.com @169.254.0.1
;; global options: +cmd
;; no servers could be reached
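
To read the debug output above more easily, the flow lines and the ICMP errors can be pulled out with a small script. This is a sketch that assumes the line format shown in this particular log. The regular expressions are illustrative and are not a stable interface of pasta.

```python
import re

# Matches e.g. "Flow 0 (UDP flow): TAP [a]:p -> [b]:q => HOST [c]:r -> [d]:s"
FLOW_RE = re.compile(
    r"Flow (\d+) \(UDP flow\): TAP \[([^\]]+)\]:(\d+) -> \[([^\]]+)\]:(\d+)"
    r" => HOST \[([^\]]+)\]:(\d+) -> \[([^\]]+)\]:(\d+)"
)
ICMP_RE = re.compile(r"ICMP error on UDP socket (\d+): (.+)")

def summarize(log_text):
    """Collect (address seen in the namespace, host-side target) pairs and ICMP errors."""
    translations = set()
    errors = []
    for line in log_text.splitlines():
        flow = FLOW_RE.search(line)
        if flow:
            translations.add((flow.group(4), flow.group(8)))
        icmp = ICMP_RE.search(line)
        if icmp:
            errors.append(icmp.group(2).strip())
    return translations, errors

# Three lines copied from the debug output above.
sample = """\
0.0306: Flow 0 (UDP flow): TAP [192.168.178.129]:39909 -> [169.254.0.1]:53 => HOST [0.0.0.0]:39909 -> [192.168.178.4]:53
0.0487: ICMP error on UDP socket 179: No route to host
5.0498: ICMP error on UDP socket 244: No route to host
"""

translations, errors = summarize(sample)
print(translations)
print(errors)
```

On this log, every query addressed to 169.254.0.1 is forwarded to 192.168.178.4, the first server in pasta's DNS list, and each attempt is followed by a "No route to host" ICMP error.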

@sbrivio-rh
Collaborator

I was looking into this right now. Quick question: is 2001:b88:1002::10 a valid resolver? What happens if you dig passt.top @2001:b88:1002::10?

@sbrivio-rh
Collaborator

Same for 192.168.178.4: does it work?

@flixman
Author

flixman commented Sep 18, 2024

Thank you for your help! Yes, 192.168.178.4 is valid. As for dig passt.top @2001:b88:1002::10, this also seems to work:

$ pasta --config-net --dns-forward 169.254.0.1 dig passt.top @2001:b88:1002::10
Multiple default IPv4 routes, picked first
Multiple default IPv6 routes, picked first

; <<>> DiG 9.20.1 <<>> passt.top @2001:b88:1002::10
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 40536
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;passt.top.                     IN      A

;; ANSWER SECTION:
passt.top.              300     IN      A       88.198.0.164

;; Query time: 60 msec
;; SERVER: 2001:b88:1002::10#53(2001:b88:1002::10) (UDP)
;; WHEN: Wed Sep 18 20:45:38 CEST 2024
;; MSG SIZE  rcvd: 54

@sbrivio-rh
Collaborator

Weird, because when pasta (and not a process running under pasta) tries to contact 192.168.178.4, it gets an error ("No route to host"). That might be an ICMP error or netfilter (nftables or iptables) blocking it.

What do the routes look like on the host (not the ones pasta copies)? Is there any particular firewall rule pasta could hit?

@flixman
Author

flixman commented Sep 19, 2024

With respect to the routes on the host, this is what they look like:

$ ip route
default via 192.168.178.1 dev wlp2s0 proto dhcp src 192.168.178.129 metric 600 
default via 192.168.178.1 dev eno1 proto dhcp src 192.168.178.213 metric 800 
84.116.46.20 via 192.168.178.1 dev wlp2s0 proto dhcp src 192.168.178.129 metric 600 
84.116.46.20 via 192.168.178.1 dev eno1 proto dhcp src 192.168.178.213 metric 800 
84.116.46.21 via 192.168.178.1 dev wlp2s0 proto dhcp src 192.168.178.129 metric 600 
84.116.46.21 via 192.168.178.1 dev eno1 proto dhcp src 192.168.178.213 metric 800 
192.168.178.0/24 dev wlp2s0 proto kernel scope link src 192.168.178.129 metric 600 
192.168.178.0/24 dev eno1 proto kernel scope link src 192.168.178.213 metric 800 
192.168.178.1 dev wlp2s0 proto dhcp scope link src 192.168.178.129 metric 600 
192.168.178.1 dev eno1 proto dhcp scope link src 192.168.178.213 metric 800 

and about the firewall settings, I do not have any rules in nft that can justify this behavior: nft_rules.txt
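
The listing above contains two default routes, which is what the "Multiple default IPv4 routes, picked first" warning refers to. As a rough aid for comparing them, the sketch below extracts the default routes from ip route output and orders them by metric (among otherwise equal routes, the kernel prefers the lower metric). default_routes is an illustrative helper and assumes the simple "key value" layout seen here. It makes no claim about how pasta itself chooses.

```python
def default_routes(ip_route_output):
    """Return the default routes from `ip route` output, lowest metric first."""
    routes = []
    for line in ip_route_output.splitlines():
        fields = line.split()
        if not fields or fields[0] != "default":
            continue
        # After "default", the line is a sequence of "key value" pairs.
        opts = dict(zip(fields[1::2], fields[2::2]))
        routes.append({
            "via": opts.get("via"),
            "dev": opts.get("dev"),
            "src": opts.get("src"),
            "metric": int(opts.get("metric", 0)),
        })
    return sorted(routes, key=lambda route: route["metric"])

# The two default routes copied from the host listing above.
sample = """\
default via 192.168.178.1 dev wlp2s0 proto dhcp src 192.168.178.129 metric 600
default via 192.168.178.1 dev eno1 proto dhcp src 192.168.178.213 metric 800
192.168.178.0/24 dev wlp2s0 proto kernel scope link src 192.168.178.129 metric 600
"""

routes = default_routes(sample)
for route in routes:
    print(route["dev"], route["src"], route["metric"])
```

Both interfaces sit on the same 192.168.178.0/24 subnet with the same gateway, so the routes differ only in device, source address and metric.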

@dgibson
Collaborator

dgibson commented Sep 20, 2024

Well, I'm deeply baffled. You're able to manually contact the DNS server from the host, but when pasta tries it gets an ICMP error. We could try to get a packet capture on the host - perhaps that would shed some more light on where the error is originating. In fact, even better would be to get two different packet traces on the host: one querying the nameserver directly from the host with dig, the second doing a similar query from the container via pasta. Perhaps we'll see some difference that helps explain things.

@sbrivio-rh
Collaborator

...or maybe it has something to do with us bind()ing and connect()ing UDP sockets (dig doesn't do that) when two sets of almost-identical routes (metrics, interface, and source differ) are present?

I can try and see if it can be reproduced with a dummy interface with similar routes.

@dgibson
Collaborator

dgibson commented Sep 20, 2024

...or maybe it has something to do with us bind()ing and connect()ing UDP sockets (dig doesn't do that)

Doesn't bind() or doesn't connect()? I'm pretty sure it has to do one of them in order to receive anything at all.

when two sets of almost-identical routes (metrics, interface, and source differ) are present?

I can try and see if it can be reproduced with a dummy interface with similar routes.

@sbrivio-rh
Collaborator

sbrivio-rh commented Sep 20, 2024

...or maybe it has something to do with us bind()ing and connect()ing UDP sockets (dig doesn't do that)

Doesn't bind() or doesn't connect()? I'm pretty sure it has to do one of them in order to receive anything at all.

Whoops, sorry, I just assumed. It actually does both:

$ strace -e connect,bind dig root-servers.net @1.1.1.1 >/dev/null
bind(11, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
connect(11, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("1.1.1.1")}, 16) = 0
+++ exited with 0 +++

but it bind()s to 0.0.0.0, port 0, so that's not quite the bind()ing we do.

@dgibson
Collaborator

dgibson commented Sep 20, 2024

I think binding to 0.0.0.0:0 is basically a no-op. Which means I think the kernel will implicitly bind the socket at connect() time to an address and port of the kernel's choosing.
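
The sequence dig uses can be observed directly with a few lines of Python. This is a minimal sketch that assumes a Linux host with the loopback interface up. It binds a UDP socket to the wildcard address and port 0, connects it, and prints what the kernel reports as the local address after each step. connect() on a UDP socket sends no packet, so nothing needs to listen on the target. The function name and the loopback target are illustrative assumptions.

```python
import socket

def local_address_steps(target=("127.0.0.1", 53)):
    """Return the socket's local (address, port) after bind() and after connect()."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind(("0.0.0.0", 0))       # wildcard address, kernel-chosen port
        after_bind = sock.getsockname()
        sock.connect(target)            # no traffic: only fixes the peer
        after_connect = sock.getsockname()
    return after_bind, after_connect

after_bind, after_connect = local_address_steps()
print("after bind():   ", after_bind)
print("after connect():", after_connect)
```

The local address stays at the wildcard until connect(), at which point the kernel fills in a source address based on its route lookup for the destination. A socket bound to a specific address and port beforehand does not leave that choice to the kernel, which is the difference being discussed here.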
