Setting to disable expensive endpoints for anonymous users #33966
Comments
This feature would be very welcome. I have the same problem: AI scrapers that do not even set an appropriate user-agent string are terrorizing my Gitea instance for a small open source project. The vast majority of accesses go to … Edit: my proposal to block queries with GET parameters seems pointless, as even navigating through the issues needs a …
Or perhaps anonymous users could be given limited access or reduced traffic.
What do you think about this? -> Add a config option to block "expensive" pages #34024
@wxiaoguang isn't that another solution for #33951?
These 2 are different.
These 2 PRs could also co-exist and won't conflict. And since #33951 has made no progress in recent days, I think we can at least implement this issue's proposal as a quick solution to help the users under "AI crawler attack".
…o-gitea#34024) Fix go-gitea#33966

```ini
;; User must sign in to view anything.
;; It could be set to "expensive" to block anonymous users accessing some pages which consume a lot of resources,
;; for example: block anonymous AI crawlers from accessing repo code pages.
;; The "expensive" mode is experimental and subject to change.
;REQUIRE_SIGNIN_VIEW = false
```

# Conflicts:
#	routers/api/v1/api.go
#	tests/integration/api_org_test.go
1.23 nightly is ready (it is a stable release and will be 1.23.7 soon). It has a new config option:

```ini
[service]
;; User must sign in to view anything.
;; It could be set to "expensive" to block anonymous users accessing some pages which consume a lot of resources,
;; for example: block anonymous AI crawlers from accessing repo code pages.
;; The "expensive" mode is experimental and subject to change.
;REQUIRE_SIGNIN_VIEW = false
```

You are welcome to try it and provide feedback.
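For reference, enabling the experimental mode means uncommenting the option and setting it to `expensive` (a minimal `app.ini` sketch; the section name and the `expensive` value are taken from the snippet above):

```ini
; app.ini — enable the experimental "expensive" mode:
; anonymous users are blocked from resource-heavy pages
; such as repo code views, while normal pages stay public
[service]
REQUIRE_SIGNIN_VIEW = expensive
```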
Works wonderfully, thank you. It already reduced both the load and the number of accesses. @wxiaoguang Is it possible to configure which endpoints are considered "expensive"? I would like to give anonymous users access to the source code, because that is already what one can see on the main repository page, but only the root directory. Also, if users are able to check out the complete repo anonymously, it makes little sense to restrict the …
Since the …
Normally I would say this is an expensive endpoint. It's just that in my case the bots hammered on the …
Feature Description
Since AI scrapers are terrorizing the web and flooding innocent Gitea instances, it would make sense to have an option to only allow expensive endpoints (like `/src/commit` or `/blame`) for logged-in users.

What I have observed is that crawlers like Claudebot and Bytespider don't respect my robots.txt and decide to crawl every single file from every single commit. For big repositories this can become a massive performance hit, since Gitea has to run git to be able to serve the requests, which has a lot of overhead. I even enabled a Redis cache, but since they hit new files all the time it didn't help much.
As a workaround I have configured my nginx reverse proxy to redirect these endpoints to an Anubis instance (https://anubis.techaro.lol/), which seems to stop most of the scrapers, or at least wastes their time for long enough to make their DDoS (because that's what it is, really!) less annoying.
However, since this solution works by proxying with nginx, every user sees the Anubis challenge before being able to look at commits, even if they are logged in. Therefore it would be preferable to just have an option to disallow these endpoints. If someone external wants to look at the commits, they can just check out the repository and look at the history there.
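The nginx workaround described above can be sketched roughly as follows. This is a minimal illustration, not the author's actual configuration: the Anubis upstream address (`127.0.0.1:8923`) and the exact path regex are assumptions, and a real setup would also need the usual Gitea proxy settings elsewhere in the server block.

```nginx
# Hypothetical sketch: route Gitea's expensive endpoints through an
# Anubis challenge proxy; everything else goes straight to Gitea.
# Upstream addresses and the path pattern are assumed, not from the thread.
location ~ ^/[^/]+/[^/]+/(src/commit|blame)/ {
    proxy_pass http://127.0.0.1:8923;   # Anubis instance (assumed port)
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
}

location / {
    proxy_pass http://127.0.0.1:3000;   # Gitea itself (default port)
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
}
```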