Creating and Cloning Knowledge-Bases with Rag

## 1. Create Knowledge-Base

First, suppose you have a set of text files explaining AI. We'll build a knowledge-base from those text files. The directory should look like below.

```
ai_tutorials/
  |
  *-- ai_tutorial_1.txt
  |
  *-- ai_tutorial_2.txt
  |
  *-- ai_tutorial_3.txt
  |
  *-- ... and many more txt files
```

Run `cd ai_tutorials; rag init`. You'll see a new directory created like below.

```
ai_tutorials/
  |
  *-- .ragit/
  |     |
  |     *-- chunks/
  |     |
  |     *-- configs/
  |     |
  |     *-- files/
  |     |
  |     *-- images/
  |     |
  |     *-- prompts/
  |     |
  |     *-- index.json
  |     |
  |     *-- models.json
  |
  *-- ai_tutorial_1.txt
  |
  *-- ai_tutorial_2.txt
  |
  *-- ai_tutorial_3.txt
  |
  *-- ... and many more txt files
```

`.ragit/` is like the `.git/` directory of a git repository. It stores metadata and chunks.

After `rag init`, the knowledge-base is empty. You have to add files to the staging area with the `rag add` command. Run `rag add --all`.

Now you're ready to build the knowledge-base. Run `rag build` to start the work. The default model is `llama3.3-70b-groq`, which requires `GROQ_API_KEY`. If you want to use gpt-4o-mini instead, run `rag config --set model gpt-4o-mini`. You can see the list of available models with `rag ls-models`. You can also add models manually to `.ragit/models.json`.

```
elapsed time: 00:33
staged files: 15, processed files: 13
errors: 0
committed chunks: 39
buffered files: 8, buffered chunks: 8
flush count: 1
model: gpt-4o-mini
input tokens: 14081 (0.001$), output tokens: 1327 (0.000$)
```

`rag build` can take a long time and cost money (if you're using a proprietary API). It splits the files into chunks and uses AI to add a title and a summary to each chunk. You can press Ctrl+C to pause the process, and resume from where you left off by running `rag build` again. (more on [a dedicated document](./commands/build.txt))

```
ai_tutorials/
  |
  *-- .ragit/
  |     |
  |     *-- chunks/
  |     |     |
  |     |     *-- ... a lot of directories
  |     |
  |     *-- configs/
  |     |
  |     *-- files/
  |     |
  |     *-- images/
  |     |
  |     *-- prompts/
  |     |
  |     *-- index.json
  |     |
  |     *-- models.json
  |
  *-- ai_tutorial_1.txt
  |
  *-- ai_tutorial_2.txt
  |
  *-- ai_tutorial_3.txt
  |
  *-- ... and many more txt files
```

After it's built, you'll see many data files in the `.ragit/` directory. You can now ask queries on the knowledge-base.

NOTE: You can ask queries on an incomplete knowledge-base, too.

## 2. (Optional) Clone Knowledge-Bases from web

This is the key part. You can download knowledge-bases from the internet and extend your own knowledge-base with them. You can also share your knowledge-base with others.

First, let's make a fresh directory. Run `mkdir playground; cd playground`.

```
playground
```

Before downloading knowledge-bases, we have to initialize a rag index. Run `rag init`.

```
playground
  |
  *-- .ragit/
        |
        *-- chunks/
        |
        *-- configs/
        |
        *-- files/
        |
        *-- prompts/
        |
        *-- index.json
        |
        *-- models.json
```

You'll see an empty rag index. Now we have to download knowledge-bases from the web. I have uploaded a few sample knowledge-bases for you. You can `rag clone` them, like `rag clone http://ragit.baehyunsol.com/sample/git`
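
The steps in sections 1 and 2 can be collected into a small shell sketch. It uses only the commands shown above (`rag init`, `rag add --all`, `rag build`, `rag clone`). The function names `build_kb` and `clone_kbs` are hypothetical helpers, not part of the tool, and skipping `rag init` when a `.ragit/` directory already exists is an assumption made for convenience rather than documented behavior.

```shell
#!/bin/sh
# Sketch only: build_kb and clone_kbs are hypothetical helper names.

# Section 1: initialize, stage, and build a knowledge-base in a directory.
build_kb() {
    dir="$1"
    if [ ! -d "$dir" ]; then
        echo "no such directory: $dir" >&2
        return 1
    fi
    (
        cd "$dir" || exit 1
        # Assumption: an existing .ragit/ means the index is already initialized.
        [ -d .ragit ] || rag init || exit 1
        rag add --all || exit 1
        # Safe to re-run: the text above says rag build resumes where it left off.
        rag build
    )
}

# Section 2: create a fresh directory, initialize it, and clone into it.
clone_kbs() {
    dir="$1"
    shift
    mkdir -p "$dir" || return 1
    (
        cd "$dir" || exit 1
        [ -d .ragit ] || rag init || exit 1
        for url in "$@"; do
            rag clone "$url" || exit 1
        done
    )
}

# Usage:
#   build_kb ai_tutorials
#   clone_kbs playground http://ragit.baehyunsol.com/sample/git
```

Each function runs its commands in a subshell, so the caller's working directory is left unchanged, and each stops at the first failing step.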

This section describes how to create a knowledge-base from text files using the Rag tool, including initializing the knowledge-base, adding files, building the index, and cloning knowledge-bases from the web to extend or share existing knowledge-bases.