Guidance on Generative AI usage in Kubernetes GitHub Orgs #291
> for those of you already part of the …
Unfortunately not; I can't see this. But I don't think we should be driven by non-public, non-kubernetes-org discussions anyhow. There was a recent thread in our Slack, however: https://kubernetes.slack.com/archives/C1TU9EB9S/p1744211437481679
My understanding was that the CNCF communicated the LF's guidance, and that this was the main outcome of the initial flurry of discussions around this space. At least, that's what I recall from discussing previously via our GB rep, cc @cblecker.
For the copyright angle, we fundamentally have to ensure that contributions are CLA-ed and of good quality, even if code is copy-pasted from Stack Overflow instead of produced via genAI. I think https://www.linuxfoundation.org/legal/generative-ai covers the expectations around the CLA.

Regarding the quality bar, spamminess, etc.: generated, copy-pasted, and human-authored content all need to be held to our standards. If there are gaps in our standards, we should address those directly. We cannot reliably know how content was authored (if you could accurately detect AI-generated content, you would have created the perfect oracle for training the next model), but we can expect that it be reasonable, non-spammy, high quality, respectful, and accurately attributed/copyrighted.

I think the most productive approach is to focus on a general content and behavior policy. Looking for example at: https://devguide.python.org/getting-started/generative-ai/
This example seems largely redundant with having a more general reasonable-contribution / moderation policy. We already have a standing practice that spam / poor-quality PRs will be closed and that repeated spam will result in blocking by our GitHub moderation team. I think Kubernetes might need to document those better, though, as I'm having difficulty turning up a document that spells this out for GitHub as opposed to Slack/discuss/... @kubernetes/owners is there a doc I'm missing that actually spells out how we moderate GitHub? cc @kubernetes/steering-committee
Per the thread, which had some tangents: LLMs are just a special case of "repeated bad-faith/inadequate submissions". It's not qualitatively different from Hacktoberfest or Stack Overflow copy-and-paste, or any of the "I removed a single comma" PRs that SIG Docs gets. LLMs may be quantitatively different, but that calls for automation rather than policy. There's also the fact that current automated LLM-detection tools have disappointingly low accuracy.

Also, note that many younger new contributors will not be aware that there's anything wrong with having an LLM generate a PR. They need to be educated. And there are non-problematic uses of LLMs (like grammar checking or auditing). My suggestion is that we do two things:
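As a purely illustrative aside on the "automation rather than policy" point above: expectation-setting can be automated uniformly for all new contributors, which sidesteps the false-positive problem of trying to detect how a change was authored. Here is a minimal sketch in Python with PyGithub (the token handling, repository name, and canned wording are all assumptions, and real Kubernetes automation runs through Prow rather than one-off scripts):

```python
# Hypothetical sketch: greet a first-time contributor's PR with the standing
# contribution guidance, rather than litigating how the content was authored.
# Assumes a bot token with permission to comment; all names are placeholders.
from github import Github

GUIDANCE = (
    "Thanks for the pull request! Please read "
    "https://www.kubernetes.dev/docs/guide/pull-requests/ (including the "
    "'trivial edits' section). Changes that you cannot explain or revise "
    "yourself, however they were produced, are unlikely to be merged."
)


def greet_if_first_pr(token: str, repo_name: str, pr_number: int) -> None:
    gh = Github(token)
    pr = gh.get_repo(repo_name).get_pull(pr_number)
    # The search API counts the author's PRs in this repo, including this one.
    author_prs = gh.search_issues(
        f"repo:{repo_name} is:pr author:{pr.user.login}"
    ).totalCount
    if author_prs <= 1:  # this PR is the author's first in the repo
        pr.create_issue_comment(GUIDANCE)
```

The point is only that guidance can be delivered the same way to everyone, regardless of how their change was produced.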
The LF guidance seems entirely focused on the legal issues, but what about the spamminess issues? I just reviewed a PR that looked good at a high level but has some issues once you look more closely, and I'm pretty sure that the submitter has no understanding of what those issues are and would not be able to address them if I pointed them out (either by rewriting the code themselves or by instructing their AI tool in a way that would result in the correct changes). Meaning that they have just wasted my time, because this PR will never turn into a valid contribution. I think we should say that (for code PRs at least) we do not accept AI-generated PRs from non-org-members. (And make that part of the CLA.)
This is discussed above. Spamminess issues aren't new to the org with genAI; we've received all sorts of generated spam before this, and excessive spammy behavior (especially obviously automated) results in a ban from our org moderators. I don't know if we actually have this in writing for GitHub yet, but it needs to apply to any automated spamminess. Nudge @kubernetes/owners again -- do we have this written down somewhere?
To be clear: the LF operates the CLA system, which is focused on licensing and compliance. I don't think we should cram this in there.
Spammy "change one word" PRs are annoying but take 10 seconds of your time. I spent an hour reviewing a PR from this same person yesterday, who I had assumed was an eager new contributor who just needed some help getting familiar with our codebase and conventions. But now I assume that they'll just copy my review into their AI tool, have it regenerate the patch, and then push an update with a different set of problems that they don't understand. IMO, a single AI-generated code PR should be considered already over the threshold for "excessive spammy behavior".
Ah, well, we can at least get them to update it with their own stance on AI-generated code and licensing/compliance if they haven't already...
@BenTheElder, there's some wording around (umbrella) spam moderation in the moderation guidelines document here, but nothing that specifies the kind(s) of spam – https://github.com/kubernetes/community/blob/master/communication/moderation.md?plain=1#L134-L135
I wasn't actually talking about those in particular; we've been getting spam with everything from advertising to bizarre, inexplicable nonsense and poorly generated noise for as long as Kubernetes has been popular. We moderate those. I think the main difference is that moderating those quickly was easier because they didn't remotely look like authentic good-faith behavior. So we should have a policy for that, but we seem to be enacting a mostly unwritten one. That needs fixing regardless.

"Repeatedly push a hacky patch they don't understand" is common enough with new drive-by contributors even without genAI, and isn't inherently a moderation-worthy "offense". If we make it an "offense" only when AI is "detected", see above: that is fraught with false positives and assumes poorly of contributors.

I don't think we can do that; we also don't ban people for a single instance of mass trivial typo-fix PRs. We just correct the expectations by asking them to help us automatically correct the typos and/or consolidate their PRs. We don't reward the behavior, but we also don't escalate until after we've raised it with them, because it's entirely possible they were actually trying to be helpful.
Right, and that doc also doesn't clearly scope GitHub versus Slack, etc. I think we clearly need written guidance; the rest of the GitHub management program has reasonably detailed docs with clear policies. Let's split that off as a starting point to capture our current approach more clearly. Then we can identify what, if any, changes are necessary.
For PRs we also have https://www.kubernetes.dev/docs/guide/pull-requests/#trivial-edits, and we recently extended it with an explanation about fixing linter issues, as that was another source of unwanted PRs. I'm not sure if or how "don't submit AI-generated code that you don't understand" fits there, but perhaps somewhere above it?
Assuming that this discussion is not specific to code: we have wording specific to SIG Docs localisation contributions here, whereby we caution contributors not to rely on machine-generated translations alone, since they often don't meet our quality standards. There's also an adjacent, ongoing discussion between SIG ContribEx and SIG Docs about setting guardrails around submitting AI-generated content as blogs or docs, given that we've seen a spike in those kinds of submissions since 2023 and the guidelines in our contribution docs haven't been sufficient to address them.
OK, maybe what I want is for us to try to make it clearer that submitting AI-generated PRs is often not going to be helpful.
Yeah, I think we need to make this clear somewhere. Young contributors these days have been subject to a lot of advertising that AI will give them a jump ahead in learning/knowledge/work, some of it from folks they should be able to trust. I think we need to document for them not only that it's problematic, but WHY -- especially since the reasons such contributions are problematic aren't limited to genAI ones. This should probably live in each of the contributor guides (and the contributor tutorial for Kubernetes).
Split out kubernetes/community#8439 regarding documenting GitHub moderation policies, so we can then iterate on that aspect.
Probably best in one of the guides linked from this comment which we post to PRs from new contributors. It looks like we currently link to the non-rendered form at https://git.k8s.io/community/contributors/guide/pull-requests.md; we should update that to https://www.kubernetes.dev/docs/guide/pull-requests. I'll send that cleanup now ...
This topic was discussed in the public steering meeting on 5/07 (recording), and the steering committee agreed that we don't want to explicitly provide any guidance targeting Generative AI usage. We will defer to SIGs to provide such guidance where appropriate. However, we will provide wording (@pohly volunteered to write the initial draft) similar to https://www.kubernetes.dev/docs/guide/pull-requests/#trivial-edits, which reviewers and approvers will be able to use when closing PRs.
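Purely as a hypothetical illustration (this is not @pohly's actual draft), a canned reply modeled on the trivial-edits wording might look something like:

> Thank you for the contribution. However, this PR does not meet the bar described in https://www.kubernetes.dev/docs/guide/pull-requests/: it appears to contain generated or copy-pasted changes that the author cannot explain, review, or revise, which consumes reviewer time without moving the project forward. We are closing this PR; please read the contributor guide before opening another one.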
This primarily came out of the discussion around allowing the use of LLMs (kubernetes/steering#291), but isn't limited to it because other tools (search/replace, linters) can have the same effect. The goal is to clarify expected behavior and to give reviewers something that they can link to when they decide that a PR shouldn't get merged.
LF has guidance here:
https://www.linuxfoundation.org/legal/generative-ai
(CNCF does not have one yet)
Ladybird has some language:
https://github.com/LadybirdBrowser/ladybird/blob/master/CONTRIBUTING.md#on-usage-of-ai-and-llms
Python has some language:
https://devguide.python.org/getting-started/generative-ai/
What is our (wearing my k8s hat!) position here? What can we write down that we can point people to, as we will inevitably see more and more of this?