774e5c41d-linux

total: 66, pass: 61, fail: 5

fix korean-related tests

Some tests ran tfidf on a Korean corpus. That was not a problem
because the Korean tokenizer was always enabled.

Now that the "korean" feature is not enabled by default, those tests behave
differently. I removed the Korean corpus from tests that are not related to
Korean and added a new test dedicated to Korean.
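
The feature gating described in the commit message can be sketched from the test-harness side. This is only an illustration: `build_cargo_args` is a hypothetical helper, not the project's `cargo_run`, and the subcommand is borrowed from a command line that appears later in this report. It assumes the harness passes `--no-default-features` and opts into features explicitly with cargo's `--features` flag.

```python
def build_cargo_args(args, features=None):
    """Hypothetical helper: build a `cargo run` command line with explicit features."""
    cmd = ["cargo", "run", "--release", "--no-default-features"]
    if features:
        # cargo accepts a comma-separated feature list after --features
        cmd += ["--features", ",".join(features)]
    return cmd + ["--"] + list(args)


# Without the feature, the Korean tokenizer would not be compiled in.
without_korean = build_cargo_args(["ls-chunks", "landscape.pdf", "--json"])

# A test dedicated to Korean would opt in explicitly.
with_korean = build_cargo_args(["ls-chunks", "landscape.pdf", "--json"], features=["korean"])

print(without_korean)
print(with_korean)
```

A test that only needs language-agnostic behavior would use the first form, so its results do not depend on whether the optional tokenizer is built.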

commit: 774e5c41dbb1039968bfeb0a8c7350b413abaecf
platform: Linux-6.8.0-1024-aws-x86_64-with-glibc2.39
ragit version: ragit 0.4.0-dev
rustc version: rustc 1.87.0 (17067e9ac 2025-05-09)
cargo version: cargo 1.87.0 (99624be96 2025-05-06)
python version: 3.12.3
tested at: 2025-05-29T21:31:38.721609Z
total elapsed time: 18,484,752 ms

cargo_tests
cargo_features
add_and_rm
add_and_rm2
ignore
recover
clone
clone_empty
pull
server
server_permission
cli
outside
archive
many_chunks
many_jobs
ls
meta
symlink
gh_issue_20
ii
cat_file
generous_file_reader
clean_up_erroneous_chunk
images
markdown_reader
csv_reader
real_repos
real_repos_regression
subdir
tfidf
korean
merge
external_bases
end_to_end dummy
end_to_end llama3.3-70b
audit llama3.3-70b
logs llama3.3-70b
prompts dummy
prompts gpt-4o-mini
prompts gemini-2.0-flash
prompts claude-3.5-sonnet
empty dummy
empty llama3.3-70b
server_chat llama3.3-70b
server_chat gemini-2.0-flash
images2 gpt-4o-mini
images3 gpt-4o-mini
pdl gpt-4o-mini
pdf gpt-4o-mini
svg gpt-4o-mini
web_images gpt-4o-mini
images2 claude-3.5-sonnet
extract_keywords dummy
extract_keywords gpt-4o-mini
orphan_process llama3.3-70b
write_lock llama3.3-70b
ragit_api command-r
query_options llama3.3-70b
query_with_schema llama3.3-70b
models_init
test_home_config_override
config
migrate
migrate2
migrate3

Cases

cargo_tests

elapsed time: 3,783,199 ms

history

cargo_features

elapsed time: 1,909,063 ms

history

add_and_rm

elapsed time: 93,997 ms

history

add_and_rm2

elapsed time: 29,964 ms

history

ignore

elapsed time: 8,699 ms

history

recover

elapsed time: 7,687 ms

history

clone

elapsed time: 575,582 ms

history

clone_empty

elapsed time: 8,015 ms

history

pull

elapsed time: 11,968 ms

history

server

elapsed time: 264,464 ms

history

server_permission

elapsed time: 1,781 ms

Error

'readme'
Traceback (most recent call last):
  File "/home/ubuntu/Documents/ragit/tests/tests.py", line 701, in <module>
    test()
  File "/home/ubuntu/Documents/ragit/tests/server_permission.py", line 46, in server_permission
    assert repo_info1["readme"] == "hello, world"
           ~~~~~~~~~~^^^^^^^^^^
KeyError: 'readme'
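
The bare `'readme'` shown as the error message above is how Python renders a `KeyError`: the dictionary lookup fails before the `==` comparison in the assert is evaluated. A minimal reproduction, using a made-up `repo_info` dictionary rather than the real server response:

```python
# Hypothetical response that lacks the "readme" field.
repo_info = {"name": "sample"}

try:
    # The lookup raises KeyError, so the comparison never runs.
    assert repo_info["readme"] == "hello, world"
    message = None
except KeyError as e:
    # str() of a KeyError is the repr of the missing key.
    message = str(e)

print(message)
```

So the failure says that the key was absent from the response, not that the readme had the wrong content.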

history

cli

elapsed time: 15,265 ms

history

outside

elapsed time: 7,002 ms

history

meta

elapsed time: 4,692 ms

history

symlink

elapsed time: 7,215 ms

history

gh_issue_20

elapsed time: 5,660 ms

history

ii

elapsed time: 721,007 ms

Error

tfidf result on term 'search gpg annot select correspond' is not close enough. error: `approximation[2] not in answer`, answer: ['3e0d93ece16c10490435c08b7b755db9a57e53b818a9e62c0000000100000fa3', 'c5719c769542cb0cde49558784948082703f2da9618c29d80000000100000fb3', '6d1b2eeef26e5ce9672e62a7ca43412c66b86ad0e48d27620000000100000fa0', '606389435f969a017ad1cf63a7a30eba0d1a08c743efea9f0000000100000318', 'f386d96798aad5baf548b6985b367932bdc89483b756b515000000010000081f', 'c66345d5ab119b4cf05a6899472b54a4fd0041ee2b83b9f80000000100000fa2', 'bf8735875031f53ccd50e48e6674d9ac64c90f68bb0c7edb0000000100000fa0', '509b4b369f9f9729365a6947ce43335209d934562feeb7220000000100000fa2', '82ad9747a31109a3ef965e4168a0968cb56a448390416e290000000100000bf5', 'b632241f25a98c9320097079669e1acd10afd534e67ec2600000000100000fa2'], approximation: ['3e0d93ece16c10490435c08b7b755db9a57e53b818a9e62c0000000100000fa3', 'b632241f25a98c9320097079669e1acd10afd534e67ec2600000000100000fa2', '90a25e1efdafffab6369490140eecabb90ab0649108feeff0000000100000cd4', 'bf8735875031f53ccd50e48e6674d9ac64c90f68bb0c7edb0000000100000fa0', '5cdbfe828a4a84a4129bda3cc32bb8376914275561fa6a1a0000000100000da8', 'c5719c769542cb0cde49558784948082703f2da9618c29d80000000100000fb3', '0833e100c47da17ca6a2d202310483ed3c08f75ec2cfbf4a0000000100000c67', '1ff3d753fa4b857385f748c5d02a7371332241a8579211f9000000010000075c', '6f305111c4ab2bb2243ce34889afb4f72dff498303da56890000000100000c1e', '6d1b2eeef26e5ce9672e62a7ca43412c66b86ad0e48d27620000000100000fa0']
Traceback (most recent call last):
  File "/home/ubuntu/Documents/ragit/tests/ii.py", line 103, in ii_worker
    raise AssertionError(f"approximation[{i}] not in answer")
AssertionError: approximation[2] not in answer

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ubuntu/Documents/ragit/tests/tests.py", line 701, in <module>
    test()
  File "/home/ubuntu/Documents/ragit/tests/ii.py", line 49, in ii
    ii_worker()
  File "/home/ubuntu/Documents/ragit/tests/ii.py", line 116, in ii_worker
    raise AssertionError(f"tfidf result on term '{term}' is not close enough. error: `{e}`, answer: {answer}, approximation: {approximation}")
AssertionError: tfidf result on term 'search gpg annot select correspond' is not close enough. error: `approximation[2] not in answer`, answer: ['3e0d93ece16c10490435c08b7b755db9a57e53b818a9e62c0000000100000fa3', 'c5719c769542cb0cde49558784948082703f2da9618c29d80000000100000fb3', '6d1b2eeef26e5ce9672e62a7ca43412c66b86ad0e48d27620000000100000fa0', '606389435f969a017ad1cf63a7a30eba0d1a08c743efea9f0000000100000318', 'f386d96798aad5baf548b6985b367932bdc89483b756b515000000010000081f', 'c66345d5ab119b4cf05a6899472b54a4fd0041ee2b83b9f80000000100000fa2', 'bf8735875031f53ccd50e48e6674d9ac64c90f68bb0c7edb0000000100000fa0', '509b4b369f9f9729365a6947ce43335209d934562feeb7220000000100000fa2', '82ad9747a31109a3ef965e4168a0968cb56a448390416e290000000100000bf5', 'b632241f25a98c9320097079669e1acd10afd534e67ec2600000000100000fa2'], approximation: ['3e0d93ece16c10490435c08b7b755db9a57e53b818a9e62c0000000100000fa3', 'b632241f25a98c9320097079669e1acd10afd534e67ec2600000000100000fa2', '90a25e1efdafffab6369490140eecabb90ab0649108feeff0000000100000cd4', 'bf8735875031f53ccd50e48e6674d9ac64c90f68bb0c7edb0000000100000fa0', '5cdbfe828a4a84a4129bda3cc32bb8376914275561fa6a1a0000000100000da8', 'c5719c769542cb0cde49558784948082703f2da9618c29d80000000100000fb3', '0833e100c47da17ca6a2d202310483ed3c08f75ec2cfbf4a0000000100000c67', '1ff3d753fa4b857385f748c5d02a7371332241a8579211f9000000010000075c', '6f305111c4ab2bb2243ce34889afb4f72dff498303da56890000000100000c1e', '6d1b2eeef26e5ce9672e62a7ca43412c66b86ad0e48d27620000000100000fa0']
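
The error above reports that the third approximate result does not appear in the exact answer. A rough sketch of that kind of membership check follows. The error string is taken from the traceback, but the function name, the short IDs, and the `top_n` cutoff are assumptions made for illustration; the actual logic in `ii.py` may differ.

```python
def check_approximation(answer, approximation, top_n):
    """Raise AssertionError if any of the first `top_n` approximate
    results is missing from the exact `answer` list."""
    for i in range(min(top_n, len(approximation))):
        if approximation[i] not in answer:
            raise AssertionError(f"approximation[{i}] not in answer")


# Toy chunk IDs standing in for the long hashes in the report.
answer = ["a", "b", "c", "d"]
approximation = ["a", "d", "x", "b"]

try:
    check_approximation(answer, approximation, top_n=3)
    error = None
except AssertionError as e:
    error = str(e)

print(error)
```

In the toy data, as in the report, the first two approximate results are present in the answer and the item at index 2 is the first one that is not.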

history

cat_file

elapsed time: 52,765 ms

history

generous_file_reader

elapsed time: 1,579,155 ms

history

clean_up_erroneous_chunk

elapsed time: 3,277 ms

history

images

elapsed time: 8,430 ms

history

markdown_reader

elapsed time: 10,638 ms

history

csv_reader

elapsed time: 8,442 ms

history

real_repos

elapsed time: 1,135,854 ms

Error

Command '['git', 'clone', 'https://git.postgresql.org/git/postgresql.git', '--depth=1']' returned non-zero exit status 128.
Traceback (most recent call last):
  File "/home/ubuntu/Documents/ragit/tests/tests.py", line 701, in <module>
    test()
  File "/home/ubuntu/Documents/ragit/tests/real_repos.py", line 161, in real_repos
    subprocess.run(["git", "clone", r["git-url"], "--depth=1"], check=True)
  File "/usr/lib/python3.12/subprocess.py", line 571, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['git', 'clone', 'https://git.postgresql.org/git/postgresql.git', '--depth=1']' returned non-zero exit status 128.
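
The traceback shows only the exit status, because `subprocess.run(..., check=True)` without output capture leaves git's own message on the terminal rather than in the exception. One way to keep the reason with the error is to capture stderr, as sketched here. The child process is a stand-in that exits with status 128 (the status git normally uses for fatal errors), not a real `git clone`, and the "fatal: simulated" text is invented.

```python
import subprocess
import sys

# Stand-in for the failing `git clone`: a child that writes to stderr and exits 128.
cmd = [
    sys.executable,
    "-c",
    "import sys; sys.stderr.write('fatal: simulated\\n'); sys.exit(128)",
]

try:
    subprocess.run(cmd, check=True, capture_output=True, text=True)
    returncode, stderr = 0, ""
except subprocess.CalledProcessError as e:
    # With capture_output=True, the exception carries the child's stderr.
    returncode, stderr = e.returncode, e.stderr

print(returncode, stderr.strip())
```

With the captured text, a report could distinguish a network or remote-side failure from a local problem such as an existing target directory.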

history

real_repos_regression

elapsed time: 16,908 ms

history

subdir

elapsed time: 20,640 ms

history

tfidf

elapsed time: 22,798 ms

history

korean

elapsed time: 6,157 ms

history

merge

elapsed time: 35,577 ms

history

external_bases

elapsed time: 333,060 ms

history

end_to_end dummy

elapsed time: 87,369 ms

history

end_to_end llama3.3-70b

elapsed time: 80,303 ms

history

audit llama3.3-70b

elapsed time: 11,171 ms

history

logs llama3.3-70b

elapsed time: 6,720 ms

history

prompts dummy

elapsed time: 8,383 ms

history

prompts gpt-4o-mini

elapsed time: 59,941 ms

history

prompts gemini-2.0-flash

elapsed time: 39,047 ms

history

prompts claude-3.5-sonnet

elapsed time: 72,866 ms

history

empty dummy

elapsed time: 6,991 ms

history

empty llama3.3-70b

elapsed time: 7,952 ms

history

server_chat llama3.3-70b

elapsed time: 25,055 ms

history

server_chat gemini-2.0-flash

elapsed time: 41,433 ms

history

images2 gpt-4o-mini

elapsed time: 9,996 ms

history

images3 gpt-4o-mini

elapsed time: 9,892 ms

history

pdl gpt-4o-mini

elapsed time: 17,721 ms

history

pdf gpt-4o-mini

elapsed time: 2,096 ms

Error

Command '['cargo', 'run', '--release', '--no-default-features', '--', 'ls-chunks', 'landscape.pdf', '--json']' returned non-zero exit status 1.
Traceback (most recent call last):
  File "/home/ubuntu/Documents/ragit/tests/tests.py", line 701, in <module>
    test()
  File "/home/ubuntu/Documents/ragit/tests/tests.py", line 650, in <lambda>
    ("pdf gpt-4o-mini", lambda: pdf(test_model="gpt-4o-mini")),
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/Documents/ragit/tests/pdf.py", line 41, in pdf
    chunks = json.loads(cargo_run(["ls-chunks", pdf["name"], "--json"], stdout=True))
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/Documents/ragit/tests/utils.py", line 87, in cargo_run
    result = subprocess.run(args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/subprocess.py", line 571, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['cargo', 'run', '--release', '--no-default-features', '--', 'ls-chunks', 'landscape.pdf', '--json']' returned non-zero exit status 1.

history

svg gpt-4o-mini

elapsed time: 1,920 ms

Error


Traceback (most recent call last):
  File "/home/ubuntu/Documents/ragit/tests/tests.py", line 701, in <module>
    test()
  File "/home/ubuntu/Documents/ragit/tests/tests.py", line 651, in <lambda>
    ("svg gpt-4o-mini", lambda: svg(test_model="gpt-4o-mini")),
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/Documents/ragit/tests/svg.py", line 48, in svg
    assert stat["staged files"] == len(broken_files)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError

history

web_images gpt-4o-mini

elapsed time: 48,604 ms

history

images2 claude-3.5-sonnet

elapsed time: 14,068 ms

history

extract_keywords dummy

elapsed time: 2,891 ms

history

extract_keywords gpt-4o-mini

elapsed time: 10,861 ms