Recently, Ragithub has been experiencing performance issues. The server runs on an Amazon Lightsail instance, and since last week its CPU usage has consistently been at 100%. As a result, the website has slowed down, and both `rag clone` and `rag push` have become slower.
The problem is attributed to bots: GPT-bot and Claude-bot are making over 100,000 requests per day. However, it's not really their fault. It's largely because of a weird design in Ragithub's file explorer. When you click on a knowledge-base in the "Explore" tab, you see a file explorer and the README of the knowledge-base. The file explorer displays the files and directories within the knowledge-base, and you can expand directories by clicking on them. Each file and directory in the explorer is an `<a>` tag. There is no JavaScript used, and all internal state is stored in a query string.
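To make that concrete, here is a minimal sketch of how such a stateless explorer could be rendered. The `expanded` query parameter, the directory names, and the URL layout are all hypothetical, not Ragithub's actual implementation:

```rust
use std::collections::BTreeSet;

// Render one level of the file explorer. The only state is the set of
// expanded directories, and it lives entirely in the `expanded` query
// parameter of every link; the server itself stays stateless.
fn render_explorer(knowledge_base: &str, dirs: &[&str], expanded: &BTreeSet<String>) -> String {
    let mut html = String::new();
    for dir in dirs {
        // Clicking a directory toggles it in the expanded set...
        let mut next = expanded.clone();
        if !next.remove(*dir) {
            next.insert((*dir).to_string());
        }
        // ...and the whole new set is serialized back into the query string,
        // so every link points at a URL a crawler may never have seen before.
        let query = next.iter().cloned().collect::<Vec<_>>().join(",");
        html.push_str(&format!(
            "<a href=\"/explore/{knowledge_base}?expanded={query}\">{dir}</a>\n"
        ));
    }
    html
}

fn main() {
    let expanded = BTreeSet::from(["src".to_string()]);
    print!("{}", render_explorer("ragit", &["src", "docs", "tests"], &expanded));
}
```

Every link encodes a full, slightly different state, which is exactly why each click produces URLs the bots have never encountered.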
This design has 2 major problems:

1. A bot will click on every `<a>` tag if it has a query string that the bot has not encountered before. When the bot clicks the link, the state changes, and a large number of new, unseen query strings are generated.
2. If there are `n` directories in a knowledge base, there are `2^n` possible states.

As a result, the bots exhaustively explore the website, making over `2^1000` requests before eventually leaving.
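To see how quickly this blows up, here is a small sketch that enumerates every explorer state for a toy knowledge base, using the same hypothetical `expanded` parameter as above:

```rust
// Each directory is independently expanded or collapsed, so every subset of
// directories is a distinct explorer state with its own query string.
fn all_query_strings(dirs: &[&str]) -> Vec<String> {
    let mut out = Vec::new();
    for mask in 0u32..(1 << dirs.len()) {
        let expanded: Vec<&str> = dirs
            .iter()
            .enumerate()
            .filter(|&(i, _)| mask & (1 << i) != 0)
            .map(|(_, d)| *d)
            .collect();
        out.push(format!("?expanded={}", expanded.join(",")));
    }
    out
}

fn main() {
    let dirs = ["src", "docs", "tests"];
    let states = all_query_strings(&dirs);
    println!("{} directories -> {} distinct query strings", dirs.len(), states.len());
    for state in &states {
        println!("{state}");
    }
}
```

Three directories already produce 8 distinct URLs; a knowledge base with around 1,000 directories produces `2^1000`.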
I don't want to restrict bots. It would be nice if GPT and Claude knew what ragit is about and how it works. Instead, I'll implement a cache. Let's see what happens then.
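A minimal sketch of what that cache could look like, assuming an in-memory map keyed by the full URL (path plus query string) with a short TTL; all names here are hypothetical:

```rust
use std::collections::HashMap;
use std::sync::Mutex;
use std::time::{Duration, Instant};

// A tiny in-memory response cache keyed by "path?query". Rendering an
// explorer page depends only on the URL, not on who is asking, so the
// same body can be served to every client while the entry is fresh.
struct PageCache {
    ttl: Duration,
    entries: Mutex<HashMap<String, (Instant, String)>>,
}

impl PageCache {
    fn new(ttl: Duration) -> Self {
        PageCache { ttl, entries: Mutex::new(HashMap::new()) }
    }

    // Return the cached page if it is still fresh; otherwise render it,
    // store it, and return the fresh copy.
    fn get_or_render(&self, url: &str, render: impl FnOnce() -> String) -> String {
        let mut entries = self.entries.lock().unwrap();
        if let Some((stored_at, body)) = entries.get(url) {
            if stored_at.elapsed() < self.ttl {
                return body.clone();
            }
        }
        let body = render();
        entries.insert(url.to_string(), (Instant::now(), body.clone()));
        body
    }
}

fn main() {
    let cache = PageCache::new(Duration::from_secs(60));
    // The expensive render only runs on a cache miss.
    let page = cache.get_or_render("/explore/ragit?expanded=src", || {
        "<html>rendered explorer</html>".to_string()
    });
    println!("{page}");
}
```

Since the rendered page depends only on the URL, repeated requests for the same explorer state can be served straight from memory instead of being re-rendered on every hit.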