Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

VR: DHCP doesn't work on new instance "not using configured address ... because it is leased to ..." #10436

Open
salfers opened this issue Feb 20, 2025 · 7 comments
Milestone

Comments

@salfers
Copy link

salfers commented Feb 20, 2025

problem

After creating a new instance in a shared network it would not successfully get an IP. DHCP requests were not answered.

Investigating the Virtual Router reveals the following:

# tail /var/log/dnsmasq.log 
Feb 20 10:59:34 dnsmasq-dhcp[1014]: not using configured address 62.113.210.163 because it is leased to 1e:00:cf:00:08:8d
Feb 20 10:59:34 dnsmasq-dhcp[1014]: DHCPDISCOVER(eth0) 1e:00:ac:00:08:8d no address available
Feb 20 10:59:49 dnsmasq-dhcp[1014]: not using configured address 62.113.210.163 because it is leased to 1e:00:cf:00:08:8d
Feb 20 10:59:49 dnsmasq-dhcp[1014]: DHCPDISCOVER(eth0) 1e:00:ac:00:08:8d no address available
Feb 20 10:59:55 dnsmasq-dhcp[1014]: not using configured address 62.113.210.163 because it is leased to 1e:00:cf:00:08:8d
Feb 20 10:59:55 dnsmasq-dhcp[1014]: DHCPDISCOVER(eth0) 1e:00:ac:00:08:8d no address available
Feb 20 11:00:02 dnsmasq-dhcp[1014]: not using configured address 62.113.210.163 because it is leased to 1e:00:cf:00:08:8d
Feb 20 11:00:02 dnsmasq-dhcp[1014]: DHCPDISCOVER(eth0) 1e:00:ac:00:08:8d no address available
Feb 20 11:00:09 dnsmasq-dhcp[1014]: not using configured address 62.113.210.163 because it is leased to 1e:00:cf:00:08:8d
Feb 20 11:00:09 dnsmasq-dhcp[1014]: DHCPDISCOVER(eth0) 1e:00:ac:00:08:8d no address available

1e:00:ac:00:08:8d belongs to the newly created VM (called "installdev") and according to the cloudstack UI it should have IP 62.113.210.163.
A VM with 1e:00:cf:00:08:8d indeed exists but is turned off and has a totally different IP.

The information in dnsmasq's data files looks correct:

# egrep '1e:00:cf:00:08:8d|1e:00:ac:00:08:8d' /etc/dhcphosts.txt 
1e:00:cf:00:08:8d,62.113.210.6,backup-test,infinite
1e:00:ac:00:08:8d,62.113.210.163,installdev,infinite
# egrep '1e:00:cf:00:08:8d|1e:00:ac:00:08:8d' /var/lib/misc/dnsmasq.leases 
0 1e:00:cf:00:08:8d 62.113.210.6 backup-test *
0 1e:00:ac:00:08:8d 62.113.210.163 installdev *

Checking the lease file for the same IP reveals older VMs that had this IP but all were already deleted:

# fgrep 62.113.210.163 /var/lib/misc/dnsmasq.leases 
0 1e:00:60:00:08:8d 62.113.210.163 test1 *
0 1e:00:1c:00:08:8d 62.113.210.163 VM-80afbd8e-3002-4ef6-9daf-ec5018b4a76c *
0 1e:00:da:00:08:8d 62.113.210.163 debian-test123 *
0 1e:00:f3:00:08:8d 62.113.210.163 debian-test11 *
0 1e:00:f9:00:08:8d 62.113.210.163 test2 *
0 1e:00:3d:00:08:8d 62.113.210.163 test3 *
0 1e:00:2f:00:08:8d 62.113.210.163 installdev *
0 1e:00:57:00:08:8d 62.113.210.163 installdev *
0 1e:00:b9:00:08:8d 62.113.210.163 installdev *
0 1e:00:35:00:08:8d 62.113.210.163 VM-b1bfa4e2-cbca-4af2-b073-c3080649e78c *
0 1e:00:ac:00:08:8d 62.113.210.163 installdev *

Given the situation I don't understand why dnsmasq think the IP is leased to different host.

versions

CloudStack 4.19.1.1
KVM virtualization

The steps to reproduce the bug

This happened randomly, so unfortunately I don't have reproducible steps.

What to do about it?

Maybe this check in dnsmasq can be turned off? Because Cloudstack is the sole authority on instance IP assignment.

@salfers
Copy link
Author

salfers commented Feb 20, 2025

One more bit of information:
Simply restarting dnsmasq on the VR has fixed this issue. It has been running for a while:

● dnsmasq.service - dnsmasq - A lightweight DHCP and caching DNS server
     Loaded: loaded (/lib/systemd/system/dnsmasq.service; enabled; vendor preset: enabled)
     Active: active (running) since Mon 2024-11-04 13:01:44 UTC; 3 months 16 days ago
    Process: 125200 ExecReload=/bin/kill -HUP $MAINPID (code=exited, status=0/SUCCESS)

Also according to the dnsmasq log "backup-test" did indeed have this IP once but that was more than a month ago:

Jan 16 11:06:12 dnsmasq-dhcp[1014]: DHCPDISCOVER(eth0) 1e:00:cf:00:08:8d 
Jan 16 11:06:12 dnsmasq-dhcp[1014]: DHCPOFFER(eth0) 62.113.210.163 1e:00:cf:00:08:8d 
Jan 16 11:06:12 dnsmasq-dhcp[1014]: DHCPREQUEST(eth0) 62.113.210.163 1e:00:cf:00:08:8d 
Jan 16 11:06:12 dnsmasq-dhcp[1014]: DHCPACK(eth0) 62.113.210.163 1e:00:cf:00:08:8d backup-test
Jan 16 11:31:45 dnsmasq-dhcp[1014]: DHCPDISCOVER(eth0) 1e:00:cf:00:08:8d 
Jan 16 11:31:45 dnsmasq-dhcp[1014]: DHCPOFFER(eth0) 62.113.210.163 1e:00:cf:00:08:8d 
Jan 16 11:31:45 dnsmasq-dhcp[1014]: DHCPREQUEST(eth0) 62.113.210.163 1e:00:cf:00:08:8d 
Jan 16 11:31:45 dnsmasq-dhcp[1014]: DHCPACK(eth0) 62.113.210.163 1e:00:cf:00:08:8d backup-test
Jan 20 16:13:45 dnsmasq-dhcp[1014]: not using configured address 62.113.210.163 because it is leased to 1e:00:cf:00:08:8d
Jan 20 16:13:48 dnsmasq-dhcp[1014]: not using configured address 62.113.210.163 because it is leased to 1e:00:cf:00:08:8d
[...]
^ apparently the same bug already happened in January

@DaanHoogland
Copy link
Contributor

@salfers , a similar thing was found by a user of redundant networks. Maybe this ticket gives you inspiration: #10182

@weizhouapache
Copy link
Member

@salfers
was the VR stopped before ?

@salfers
Copy link
Author

salfers commented Feb 21, 2025

I don't think so. How do I find out if it was?

@DaanHoogland
Copy link
Contributor

I don't think so. How do I find out if it was?

You can run uptime but the following output you gave suggests it wasn't,

Active: active (running) since Mon 2024-11-04 13:01:44 UTC; 3 months 16 days ago

It would explain that entries were not removed from the leases file, though.
You can browse /var/cache/cloud/processed and look for entries starting with vm_dhcp_entry.json . the entry that should have caused the delete would contain the lease data and a field like "remove":true.

@salfers
Copy link
Author

salfers commented Feb 21, 2025

root@r-378-VM:~# uptime
 09:44:34 up 108 days, 20:42,  1 user,  load average: 0.00, 0.00, 0.00

This matches the 3 month uptime the dnsmasq service had earlier.

You can browse /var/cache/cloud/processed and look for entries starting with vm_dhcp_entry.json

For the MAC that is "blocking" the lease there indeed seems to be no 'remove' entry:

root@r-378-VM:~# zgrep -F "62.113.210.163" /var/cache/cloud/processed/vm_dhcp* | grep -F '1e:00:cf:00:08:8d'
/var/cache/cloud/processed/vm_dhcp_entry.json.30d5c223-a6a5-47be-ad6b-71dad68bc492.gz:{"host_name":"backup-test","mac_address":"1e:00:cf:00:08:8d","ipv4_address":"62.113.210.163","ipv6_address":"2a00:f48:2000:103b:1c00:cfff:fe00:88d","ipv6_duid":"00:03:00:01:1e:00:cf:00:08:8d","dns_addresses":"62.113.210.2","default_gateway":"62.113.210.1","default_entry":true,"remove":false,"type":"dhcpentry","delete_from_processed_cache":false}
/var/cache/cloud/processed/vm_dhcp_entry.json.9fcfa5d8-628f-4ca5-9757-91570248e709.gz:{"host_name":"backup-test","mac_address":"1e:00:cf:00:08:8d","ipv4_address":"62.113.210.163","ipv6_address":"2a00:f48:2000:103b:1c00:cfff:fe00:88d","ipv6_duid":"00:03:00:01:1e:00:cf:00:08:8d","dns_addresses":"62.113.210.2","default_gateway":"62.113.210.1","default_entry":true,"remove":false,"type":"dhcpentry","delete_from_processed_cache":false}
/var/cache/cloud/processed/vm_dhcp_entry.json.beae890c-9192-47f8-9de4-bb74d80dcfa8.gz:{"host_name":"backup-test","mac_address":"1e:00:cf:00:08:8d","ipv4_address":"62.113.210.163","ipv6_address":"2a00:f48:2000:103b:1c00:cfff:fe00:88d","ipv6_duid":"00:03:00:01:1e:00:cf:00:08:8d","dns_addresses":"62.113.210.2","default_gateway":"62.113.210.1","default_entry":true,"remove":false,"type":"dhcpentry","delete_from_processed_cache":false}
/var/cache/cloud/processed/vm_dhcp_entry.json.cd5f98c7-d4ba-425b-abab-42513cd92665.gz:{"host_name":"backup-test","mac_address":"1e:00:cf:00:08:8d","ipv4_address":"62.113.210.163","ipv6_address":"2a00:f48:2000:103b:1c00:cfff:fe00:88d","ipv6_duid":"00:03:00:01:1e:00:cf:00:08:8d","dns_addresses":"62.113.210.2","default_gateway":"62.113.210.1","default_entry":true,"remove":false,"type":"dhcpentry","delete_from_processed_cache":false}

@weizhouapache
Copy link
Member

I did a quick test on a shared network in 4.20.

I do see some commands like below

Executing: dhcp_release $(ip route get 10.200.0.214 | grep eth | head -1 | awk '{print $3}') 10.200.0.214 1e:00:ae:00:03:9e

however, the file /var/lib/misc/dnsmasq.leases is not changed

I've create a draft PR #10440, let's see if it works and no regression

@Pearl1594 Pearl1594 added this to the 4.19.3 milestone Feb 21, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants