We apologize for the prolonged performance degradation today.
We have finally identified all of the 'tricks' the AI crawlers found today; they can no longer bypass the Anubis proof-of-work challenges.
What was new to us: the AI crawlers do not just crawl the URLs that our frontend actually presents to them; they also rewrote those URLs into a format that bypassed our filter rules.
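As an illustration only, here is a minimal sketch (in Go, with a made-up rule list and function names, not our actual configuration) of the kind of normalization that closes this class of bypass: decode percent-encoding and collapse dot-segments before matching a request path against filter rules, so rewritten variants of a URL hit the same rule as the canonical form.

```go
package main

import (
	"fmt"
	"net/url"
	"path"
	"strings"
)

// blockedPrefixes is a hypothetical rule list for illustration only;
// the real filter rules live in the infrastructure configuration.
var blockedPrefixes = []string{"/api/v1/repos/"}

// normalize decodes percent-encoding and collapses dot-segments, so that
// encoded or rewritten variants of a path match the same rule as the
// canonical form.
func normalize(raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	// u.Path is already percent-decoded; path.Clean removes "." and ".."
	// segments as well as duplicate slashes.
	return path.Clean("/" + u.Path), nil
}

func isBlocked(raw string) bool {
	p, err := normalize(raw)
	if err != nil {
		return true // fail closed on unparseable requests
	}
	for _, prefix := range blockedPrefixes {
		if strings.HasPrefix(p+"/", prefix) {
			return true
		}
	}
	return false
}

func main() {
	for _, raw := range []string{
		"/api/v1/repos/example",        // canonical form
		"/api/%76%31/repos/example",    // percent-encoded variant
		"/api/v1/../v1//repos/example", // dot-segment and double-slash variant
	} {
		fmt.Println(raw, "blocked:", isBlocked(raw))
	}
}
```

The point is simply that matching has to happen on the canonical form; matching on the raw request line is exactly what such rewritten URLs slip past.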
By the way, you can track the changes we have been making via
https://codeberg.org/Codeberg-Infrastructure/scripted-configuration/compare/51618~1..e4aac
@Codeberg AI companies crawl our websites.
We ask them to stop, using the industry-standard robots.txt.
AI companies ignore those rules.
We start blocking the companies themselves with conventional tools like IP rules.
AI companies start working around those blocks.
We invent ways to make life specifically harder for their crawlers (stuff like Anubis; a sketch of the general idea follows below).
AI companies put considerable resources into circumventing that, too.
This industry seriously needs to implode. Fast.
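For readers unfamiliar with Anubis: the general idea is a proof-of-work challenge. The visitor's browser must burn a bit of CPU before the page is served, while verification on the server stays cheap, which makes mass crawling expensive. A minimal sketch of that general idea follows; it is not Anubis's actual protocol, and the challenge token and difficulty are made up.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strconv"
	"strings"
)

// solve does the client's part: find a nonce so that
// SHA-256(challenge + nonce) starts with `difficulty` zero hex digits.
func solve(challenge string, difficulty int) (int, string) {
	target := strings.Repeat("0", difficulty)
	for nonce := 0; ; nonce++ {
		sum := sha256.Sum256([]byte(challenge + strconv.Itoa(nonce)))
		digest := hex.EncodeToString(sum[:])
		if strings.HasPrefix(digest, target) {
			return nonce, digest
		}
	}
}

// verify does the server's part: a single hash, cheap to check.
func verify(challenge string, nonce, difficulty int) bool {
	sum := sha256.Sum256([]byte(challenge + strconv.Itoa(nonce)))
	return strings.HasPrefix(hex.EncodeToString(sum[:]), strings.Repeat("0", difficulty))
}

func main() {
	challenge := "example-challenge-token" // hypothetical per-visitor token
	nonce, digest := solve(challenge, 4)   // 4 hex zeros ~ 2^16 hashes on average
	fmt.Println("nonce:", nonce, "digest:", digest)
	fmt.Println("verified:", verify(challenge, nonce, 4))
}
```

The asymmetry is the whole point: a human loads one page and never notices the cost, while a crawler hammering millions of URLs has to pay it on every request.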
As a next step, AI companies are now offering "their" own browsers (read: Chromium, ever so slightly themed, with some company bullshit built in).
In part, this is certainly done to have yet another way to crawl the web, but this time user-directed and indistinguishable from actual human requests.
https://notes.vv221.fr/blackhole.xhtml