Le Livre d'Argent

We apologize for the long performance degradation today.
We have finally identified all of the 'tricks' that the AI crawlers found today. They no longer bypass the Anubis proof-of-work challenges.

A novelty for us: the AI crawlers did not only crawl the URLs actually presented to them by our frontend, they also converted those URLs into a format that bypassed our filter rules.
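
To illustrate the class of problem (a generic sketch, not our actual filter configuration; the paths and rules below are invented): a filter that matches on the raw request path can miss a percent-encoded or otherwise re-encoded variant of the same URL unless the path is normalized before the rules are applied.

```python
# Minimal sketch: a naive prefix filter on the raw request path is bypassed
# by re-encoded URLs, while matching against the normalized path is not.
# The blocked prefix is hypothetical.
from urllib.parse import unquote
import posixpath

BLOCKED_PREFIXES = ("/user/repo/archive/",)  # hypothetical filtered paths

def is_blocked_naive(raw_path: str) -> bool:
    # Compares the path exactly as received, so "%61rchive" or "/./archive"
    # variants of the same URL slip through.
    return raw_path.startswith(BLOCKED_PREFIXES)

def is_blocked_normalized(raw_path: str) -> bool:
    # Decode percent-escapes and collapse "." segments first, then apply
    # the same prefix rules to the canonical form.
    canonical = posixpath.normpath(unquote(raw_path))
    return canonical.startswith(BLOCKED_PREFIXES)

print(is_blocked_naive("/user/repo/%61rchive/v1.zip"))       # False: bypassed
print(is_blocked_normalized("/user/repo/%61rchive/v1.zip"))  # True: caught
print(is_blocked_normalized("/user/repo/./archive/v1.zip"))  # True: caught
```

The fix is the same regardless of the tooling: canonicalize the URL first, then apply the rules.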

By the way, you can track the changes we have been making via

https://codeberg.org/Codeberg-Infrastructure/scripted-configuration/compare/51618~1..e4aac

@Codeberg AI companies crawl our websites.

We ask them to stop, using the industry-standard robots.txt.
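
For reference, here is a minimal sketch of what honoring robots.txt looks like on the crawler side, using Python's standard urllib.robotparser (the user agent and target URL are just placeholders):

```python
# Minimal sketch of a crawler that honors robots.txt, using Python's
# standard library. The user agent and target URL are placeholders.
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.set_url("https://codeberg.org/robots.txt")
rp.read()  # fetch and parse the robots.txt rules

user_agent = "ExampleBot"  # hypothetical crawler name
url = "https://codeberg.org/some/repository"

if rp.can_fetch(user_agent, url):
    print("allowed to fetch", url)
else:
    print("disallowed by robots.txt - a well-behaved crawler stops here")
```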

AI companies ignore those rules.

We start blocking the companies themselves with conventional tools like IP rules.

AI companies start working around those blocks.

We invent ways to specifically make life harder for their crawlers (stuff like Anubis).
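
The general idea behind a proof-of-work challenge like Anubis, as a conceptual sketch only (this is not Anubis's actual code): the server hands out a random challenge and a difficulty, and the client has to burn CPU finding a nonce whose hash meets that difficulty before it gets served.

```python
# Conceptual sketch of a proof-of-work challenge: the client must find a
# nonce so that sha256(challenge + nonce) starts with a given number of
# zero bits, which is cheap to verify but costly to compute.
import hashlib
import os

DIFFICULTY_BITS = 16  # example difficulty; real deployments tune this

def meets_difficulty(digest: bytes, bits: int) -> bool:
    # True if the top `bits` bits of the digest are all zero.
    value = int.from_bytes(digest, "big")
    return value >> (len(digest) * 8 - bits) == 0

def solve(challenge: bytes, bits: int) -> int:
    # Brute-force search done by the client.
    nonce = 0
    while True:
        digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
        if meets_difficulty(digest, bits):
            return nonce
        nonce += 1

challenge = os.urandom(16)                 # issued by the server
nonce = solve(challenge, DIFFICULTY_BITS)  # the client's work
digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
assert meets_difficulty(digest, DIFFICULTY_BITS)  # server-side verification
print("solved with nonce", nonce)
```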

AI companies put considerable resources into circumventing that, too.

This industry seriously needs to implode. Fast.

As a next step, AI companies are now offering "their" browser (read: Chromium ever so slightly themed, with some company bullshit built in).

In part, this is certainly done to have yet another way to crawl the web, but this time user-directed and indistinguishable from actual human requests.

@claudius @Codeberg @lebout2canap Indeed, but in the meantime, if it may help, here are some notes about how a friend of mine managed to set up a trap against scraper bots. We used it to protect some sites we're managing, and it seems effective, so better to share:

https://notes.vv221.fr/blackhole.xhtml
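
The general pattern behind such a trap, as a rough sketch of the idea (not necessarily the exact setup described at that link): publish a link that no human would follow and that robots.txt explicitly disallows, then ban every client that requests it anyway. Extracting the offending IPs from an access log could look like this (the trap path, log path, and blocklist path below are placeholders):

```python
# Rough sketch of the trap idea: any client that requests the hidden,
# robots.txt-disallowed trap path is assumed to be a misbehaving bot, and
# its IP is written to a blocklist that the firewall or web server consumes.
import re

TRAP_PATH = "/trap/"                       # hypothetical honeypot URL
ACCESS_LOG = "/var/log/nginx/access.log"   # adjust to your web server
BLOCKLIST = "/etc/nginx/blocklist.txt"     # e.g. read by a deny rule

# Matches the client IP and request path of a common/combined log format line.
LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "(?:GET|POST|HEAD) (\S+)')

banned = set()
with open(ACCESS_LOG, encoding="utf-8", errors="replace") as log:
    for line in log:
        match = LINE.match(line)
        if match and match.group(2).startswith(TRAP_PATH):
            banned.add(match.group(1))

with open(BLOCKLIST, "w", encoding="utf-8") as out:
    out.writelines(ip + "\n" for ip in sorted(banned))

print(f"{len(banned)} addresses written to {BLOCKLIST}")
```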