debuginfod is ignored if an agent uploads a stripped binary #5465

Open
bobrik opened this issue Jan 26, 2025 · 11 comments

bobrik commented Jan 26, 2025

Here's systemd on Debian Trixie showing some unresolved symbols:

[Image: screenshot of the systemd profile with unresolved symbols]

We can find the corresponding buildid:

$ find projects/parca/data | grep 42a2d21
projects/parca/data/debuginfo/42a2d21759b160fe6556b8c801294dcfd5fc6764
projects/parca/data/debuginfo/42a2d21759b160fe6556b8c801294dcfd5fc6764/metadata
projects/parca/data/debuginfo/42a2d21759b160fe6556b8c801294dcfd5fc6764/debuginfo

It is stripped, so it's not very useful:

$ file projects/parca/data/debuginfo/42a2d21759b160fe6556b8c801294dcfd5fc6764/debuginfo
projects/parca/data/debuginfo/42a2d21759b160fe6556b8c801294dcfd5fc6764/debuginfo: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), statically linked, BuildID[sha1]=42a2d21759b160fe6556b8c801294dcfd5fc6764, stripped

Metadata points at the file being uploaded by an agent:

$ cat projects/parca/data/debuginfo/42a2d21759b160fe6556b8c801294dcfd5fc6764/metadata
{
  "buildId": "42a2d21759b160fe6556b8c801294dcfd5fc6764",
  "source": "SOURCE_UPLOAD",
  "upload": {
    "id": "52a1a4bd-3717-45a0-bd6c-8a19af7aeb2e",
    "hash": "7ad293fe4e10d873162078e376ec14d3",
    "state": "STATE_UPLOADED",
    "startedAt": "2025-01-20T03:38:30.386740609Z",
    "finishedAt": "2025-01-20T03:38:30.391896602Z"
  },
  "quality": {
    "hasDynsym": true
  }
}
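
The metadata above can also be checked mechanically. This is a minimal sketch, assuming the `quality` object may carry flags beyond `hasDynsym`. Only `hasDynsym` appears in this issue; the names `hasDwarf` and `hasSymtab` and the helper `is_incomplete` are assumptions for illustration.

```python
import json

# Flag names other than "hasDynsym" are assumptions for illustration;
# only "hasDynsym" appears in the metadata shown in this issue.
FULL_SYMBOL_FLAGS = ("hasDwarf", "hasSymtab")


def is_incomplete(metadata: dict) -> bool:
    """Return True if an agent upload carries no full symbol information."""
    if metadata.get("source") != "SOURCE_UPLOAD":
        return False
    quality = metadata.get("quality", {})
    return not any(quality.get(flag, False) for flag in FULL_SYMBOL_FLAGS)


metadata = json.loads("""
{
  "buildId": "42a2d21759b160fe6556b8c801294dcfd5fc6764",
  "source": "SOURCE_UPLOAD",
  "upload": {"state": "STATE_UPLOADED"},
  "quality": {"hasDynsym": true}
}
""")

print(is_incomplete(metadata))  # True: only the dynsym flag is set
```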

We can consult debuginfod and it will happily fetch us the proper debug info:

$ DEBUGINFOD_URLS=https://debuginfod.elfutils.org/ debuginfod-find debuginfo 42a2d21759b160fe6556b8c801294dcfd5fc6764
/home/ivan/.cache/debuginfod_client/42a2d21759b160fe6556b8c801294dcfd5fc6764/debuginfo

That is not stripped:

$ file /home/ivan/.cache/debuginfod_client/42a2d21759b160fe6556b8c801294dcfd5fc6764/debuginfo
/home/ivan/.cache/debuginfod_client/42a2d21759b160fe6556b8c801294dcfd5fc6764/debuginfo: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, BuildID[sha1]=42a2d21759b160fe6556b8c801294dcfd5fc6764, with debug_info, not stripped

It would be good for Parca to check debuginfod servers when the stored debuginfo is present but incomplete, as in this case.
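
A rough sketch of that fallback, not Parca's actual code. The URL layout is the one visible in the request logs in this thread; the function names and the 10 second timeout are assumptions for illustration.

```python
import urllib.request


def debuginfod_url(server: str, build_id: str) -> str:
    """Build <server>/buildid/<build_id>/debuginfo, the path seen in the logs."""
    return f"{server.rstrip('/')}/buildid/{build_id}/debuginfo"


def debuginfod_has(server: str, build_id: str, timeout: float = 10.0) -> bool:
    """Return True if the server answers 200 for this build ID.

    The body is not read; the connection is closed once the status is known.
    """
    try:
        with urllib.request.urlopen(debuginfod_url(server, build_id), timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # HTTPError (e.g. 404), connection resets, and refused connections
        # are all OSError subclasses.
        return False


def choose_source(stored_has_debuginfo: bool, lookup) -> str:
    """Prefer the stored file unless it is stripped and debuginfod has the build ID.

    `lookup` is a zero-argument callable so the network call only happens
    when the stored file is incomplete.
    """
    if stored_has_debuginfo:
        return "stored"
    return "debuginfod" if lookup() else "stored"
```

For the build ID above, `choose_source(False, lambda: debuginfod_has("https://debuginfod.elfutils.org/", "42a2d21759b160fe6556b8c801294dcfd5fc6764"))` would pick `"debuginfod"` whenever the server is reachable and has the file.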

Member

brancz commented Jan 28, 2025

Strange! Which version of Parca is this?

What Parca should do is reject the upload in the first place if it can find debuginfos in a debuginfod server.

Author

bobrik commented Jan 29, 2025

This was the latest main at the time of writing. If you could point me to the code that does the rejecting, I can do some debugging to see what's not clicking.

Author

bobrik commented Feb 2, 2025

I think it might be related to debuginfod not being available. Still, it doesn't seem very productive to upload stripped binaries.

Author

bobrik commented Feb 2, 2025

Specifically, debuginfod.elfutils.org does not like many requests at once:

checking debuginfod.elfutils.org 144aa6681b4d21fa0312fe4055b9c0ba1315254e: false request failed: Get "https://debuginfod.elfutils.org/buildid/144aa6681b4d21fa0312fe4055b9c0ba1315254e/debuginfo": read tcp [2601:644:4981:f2e8:eaff:1eff:fed5:f416]:34868->[2600:3c03::f03c:91ff:fe50:73f]:443: read: connection reset by peer

They seem to straight up ban the client IP:

ivan@cube:~$ curl -svo /dev/null https://debuginfod.elfutils.org/buildid/144aa6681b4d21fa0312fe4055b9c0ba1315254e/debuginfo
* Host debuginfod.elfutils.org:443 was resolved.
* IPv6: 2600:3c03::f03c:91ff:fe50:73f
* IPv4: 96.126.110.187
*   Trying [2600:3c03::f03c:91ff:fe50:73f]:443...
* connect to 2600:3c03::f03c:91ff:fe50:73f port 443 from 2601:644:4981:f2e8:eaff:1eff:fed5:f416 port 58934 failed: Connection refused
*   Trying 96.126.110.187:443...
* connect to 96.126.110.187 port 443 from 192.168.1.50 port 38460 failed: Connection refused
* Failed to connect to debuginfod.elfutils.org port 443 after 156 ms: Could not connect to server
* closing connection #0

My laptop is on the same /64 prefix and it can reach it just fine. The machine above also recovers after some time, but it gets banned very easily with 32 concurrent requests:

ivan@cube:~$ wrk -t 1 -c 32 -d 5s https://debuginfod.elfutils.org/buildid/144aa6681b4d21fa0312fe4055b9c0ba1315254e/debuginfo

@fche, is this sort of behavior expected?

fche commented Feb 2, 2025

Several public debuginfod servers apply throttling as a self-defense measure against IP addresses that use them too heavily. Many concurrent connections from the same IP address are just such a trigger. I'll nudge up the limits of this server. If you are a heavy user of these services, please consider installing a local caching proxy.
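
On the client side, one way to stay under such limits is to cap the number of requests in flight. This is a minimal sketch, assuming a caller-supplied `lookup` function; the default of 4 is an arbitrary illustration, since the server's actual thresholds are not stated in this thread.

```python
from concurrent.futures import ThreadPoolExecutor


def lookup_all(build_ids, lookup, max_concurrent=4):
    """Run lookup(build_id) for every ID with at most max_concurrent in flight.

    `lookup` is any callable taking a build ID; the cap of 4 is an assumption,
    not a documented server limit.
    """
    build_ids = list(build_ids)
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        # pool.map preserves input order, so zip pairs each ID with its result.
        return dict(zip(build_ids, pool.map(lookup, build_ids)))
```

The pool size bounds concurrency per process only; several agents behind one NAT address would still add up from the server's point of view.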

fche commented Feb 2, 2025

For example, in the last 8 hours, this server has received about 7000 duplicate queries for nonexistent build-ids from the same IP address, just seconds apart. A properly functioning debuginfod client would cache the negative hits, but this one does not.

172.203.153.37 - - [02/Feb/2025:16:46:32 +0000] debuginfod.elfutils.org "GET /buildid/69fd2f79f443d687ee083f218f57a8947836b95d/debuginfo HTTP/1.1" 404 9 "-" "parca.dev/debuginfod-client/0.21.0" (-%) 875us -
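
Negative caching of this kind can be sketched as follows. The class name, the 10 minute TTL, and the `lookup` callable are assumptions for illustration, not the behaviour of any particular debuginfod client.

```python
import time


class NegativeCache:
    """Remember build IDs a server did not have, for a limited time."""

    def __init__(self, ttl_seconds=600.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._misses = {}  # build ID -> time at which the miss expires

    def known_missing(self, build_id):
        expires = self._misses.get(build_id)
        if expires is None:
            return False
        if self._clock() >= expires:
            # The entry is stale; forget it so the server is asked again.
            del self._misses[build_id]
            return False
        return True

    def record_miss(self, build_id):
        self._misses[build_id] = self._clock() + self._ttl


def cached_lookup(cache, build_id, lookup):
    """Call lookup(build_id) unless a recent miss is already recorded."""
    if cache.known_missing(build_id):
        return False
    found = lookup(build_id)
    if not found:
        cache.record_miss(build_id)
    return found
```

A real client would probably record a miss only for a definitive 404 like the one logged above, not for timeouts or connection resets, so that a temporary ban does not hide debuginfo that actually exists.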

Author

bobrik commented Feb 2, 2025

@fche, a local caching proxy does not help with the initial deluge of requests when debuginfod is queried for everything installed.

Would you consider putting Cloudflare in front (via open-source sponsorships) to absorb the shocks and to do both positive and negative caching? With a proper tiered setup you shouldn't get more than ~one request for anything, no matter how well-behaved the clients are.

fche commented Feb 2, 2025

How about this: If some other organization wishes to arrange for and oversee a public-interest CDN for debuginfod services, I'd be glad to add it to our published list of servers.

Author

bobrik commented Feb 5, 2025

@fche, I started an internal RFC. I'll let you know over email when it makes some progress.

Member

brancz commented Feb 5, 2025

A couple of clarifying statements: since v0.22.0, Parca has had:

  • Negative caching
  • Only requesting debuginfod servers if the build ID is determined to be a GNU build ID

If Cloudflare could help with a cache, that would be super helpful either way, especially because there is little we can do about our users not upgrading!
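
The second point can be illustrated with a simple filter. This is one plausible heuristic, not necessarily the check Parca performs: it assumes the build ID arrives as a hex string and treats only 40-character SHA-1 style IDs (like the ones in this thread) as GNU build IDs.

```python
import re

# Assumption: a GNU build ID is a 40-character lowercase hex string (SHA-1 style).
_SHA1_HEX = re.compile(r"[0-9a-f]{40}")


def looks_like_gnu_build_id(build_id: str) -> bool:
    """Return True if build_id has the shape of a SHA-1 style GNU build ID."""
    return _SHA1_HEX.fullmatch(build_id) is not None


print(looks_like_gnu_build_id("42a2d21759b160fe6556b8c801294dcfd5fc6764"))  # True
print(looks_like_gnu_build_id("not-a-build-id"))  # False
```

Linkers can also emit other build ID styles (for example `md5` or `uuid`, which are 32 hex characters), so this filter would skip those; it trades completeness for fewer pointless requests.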

Member

brancz commented Feb 5, 2025

> Still, it doesn't seem very productive to upload stripped binaries.

The reason this happens at this point is that the agent tries to at least find some symbols to upload. At the point where it does that it has already exhausted all other possibilities and it's the only thing left that might offer non-zero chances at symbolizing.
