2025-06-27
author: baehyunsol
tags: news

Ragithub performance issues

Recently, Ragithub has been experiencing performance issues. The server runs on an Amazon Lightsail instance, and since last week its CPU usage has been pinned at 100%. As a result, the website has slowed down, and both rag clone and rag push have become slower as well.

The problem is attributed to bots: GPTBot and ClaudeBot are making over 100,000 requests per day. It's not really their fault, though; it's largely due to an odd design in Ragithub's file explorer. When you click a knowledge-base in the "Explore" tab, you see a file explorer and the knowledge-base's README. The file explorer displays the files and directories within the knowledge-base, and you can expand a directory by clicking on it. Each file and directory in the explorer is an <a> tag; no JavaScript is used, and all internal state is stored in the query string. This design has two major problems, illustrated by the sketch after the list:

  1. Because the server is stateless, it has to redo the same computation every time you expand a directory. For example, if 5 directories are already expanded and you click a 6th, the query string describes all 6 expanded directories, and the server retrieves the contents of all 6, even though the contents of the first 5 were already computed for the previous request.
  2. The bots appear to follow any <a> tag whose query string they haven't encountered before. But every click changes the state and mints a batch of new, unseen query strings. If a knowledge-base contains n directories, there are 2^n possible states, so a knowledge-base with 1,000 directories exposes on the order of 2^1000 URLs, and the bots will try to exhaust them all before eventually leaving.

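Here is a minimal sketch, in Rust, of the link scheme described above. All names (render_explorer, list_directory, the expanded query parameter) are hypothetical stand-ins, not the actual Ragithub code; the point is just to show how a single click both recomputes every listing and mints a fresh set of unseen URLs.

```rust
use std::collections::BTreeSet;

// Hypothetical handler: the entire explorer state lives in the query
// string, e.g. `?expanded=src,src/api,docs`. No server-side session,
// no JavaScript.
fn render_explorer(query: &str) -> Vec<String> {
    // Recover the set of expanded directories from the query string.
    let expanded: BTreeSet<String> = query
        .strip_prefix("expanded=")
        .map(|dirs| {
            dirs.split(',')
                .filter(|d| !d.is_empty())
                .map(String::from)
                .collect()
        })
        .unwrap_or_default();

    let mut links = vec![];

    // Problem 1: every request re-lists *every* expanded directory from
    // scratch; nothing computed for the previous request is reused.
    for dir in &expanded {
        for entry in list_directory(dir) {
            // Each entry becomes an <a> tag whose href encodes the *next*
            // state: the current set with the clicked entry toggled. With
            // n directories there are 2^n reachable states (problem 2).
            let mut next = expanded.clone();
            if !next.remove(&entry) {
                next.insert(entry.clone());
            }
            let next: Vec<String> = next.into_iter().collect();
            links.push(format!(
                "<a href=\"?expanded={}\">{}</a>",
                next.join(","),
                entry,
            ));
        }
    }

    links
}

// Stand-in for the real directory listing; assumed for this sketch.
fn list_directory(dir: &str) -> Vec<String> {
    vec![format!("{dir}/a"), format!("{dir}/b")]
}

fn main() {
    for link in render_explorer("expanded=src") {
        println!("{link}");
    }
}
```
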
I don't want to restrict the bots; it would be nice if GPT and Claude knew what ragit is about and how it works. Instead, I'll implement a cache. Let's see what happens.
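
One obvious shape for such a cache is to memoize directory listings by path, so that a request expanding a 6th directory reuses the 5 listings computed earlier. Below is a minimal sketch under that assumption, reusing the hypothetical list_directory from the sketch above; the actual fix may be keyed and invalidated quite differently.

```rust
use std::collections::HashMap;
use std::sync::{Mutex, OnceLock};

// Global listing cache: directory path -> entries. Illustrative only.
static LISTING_CACHE: OnceLock<Mutex<HashMap<String, Vec<String>>>> = OnceLock::new();

fn cached_list_directory(dir: &str) -> Vec<String> {
    let cache = LISTING_CACHE.get_or_init(|| Mutex::new(HashMap::new()));
    let mut cache = cache.lock().unwrap();

    if let Some(entries) = cache.get(dir) {
        // Cache hit: this directory was already listed for some earlier
        // request, so no recomputation is needed.
        return entries.clone();
    }

    let entries = list_directory(dir); // the expensive call
    cache.insert(dir.to_string(), entries.clone());
    entries
}

// Same stand-in for the real directory listing as in the sketch above.
fn list_directory(dir: &str) -> Vec<String> {
    vec![format!("{dir}/a"), format!("{dir}/b")]
}

fn main() {
    let _ = cached_list_directory("src"); // miss: computes and stores
    let _ = cached_list_directory("src"); // hit: reuses the stored listing
}
```

Whatever its exact shape, such a cache would also need invalidation: a knowledge-base's entries would have to be dropped whenever a push changes its contents.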