Home Explore Blog CI



rustc

1st chunk of `src/rustdoc-internals/search.md`
3c94eb0e3e20553c4733cdb1e19029038dc05cc51d75d9040000000100000fba
# Rustdoc search

Rustdoc Search is two programs: `search_index.rs`
and `search.js`. The first generates a nasty JSON
file with a full list of items and function signatures
in the crates in the doc bundle, and the second reads
it, turns it into some in-memory structures, and
scans them linearly to search.

<!-- toc -->

## Search index format

`search.js` calls this Raw, because it turns it into
a more normal object tree after loading it.
For space savings, it's also written without newlines or spaces.

```json
[
    [ "crate_name", {
        // name
        "n": ["function_name", "Data"],
        // type
        "t": "HF",
        // parent module
        "q": [[0, "crate_name"]],
        // parent type
        "i": [2, 0],
        // type dictionary
        "p": [[1, "i32"], [1, "str"], [5, "Data", 0]],
        // function signature
        "f": "{{gb}{d}}`", // [[3, 1], [2]]
        // impl disambiguator
        "b": [],
        // deprecated flag
        "c": "OjAAAAAAAAA=", // empty bitmap
        // empty description flag
        "e": "OjAAAAAAAAA=", // empty bitmap
        // aliases
        "a": [["get_name", 0]],
        // description shards
        "D": "g", // 3
        // inlined re-exports
        "r": [],
    }]
]
```

[`src/librustdoc/html/static/js/rustdoc.d.ts`]
defines an actual schema in a TypeScript `type`.

| Key | Name                 | Description  |
| --- | -------------------- | ------------ |
| `n` | Names                | Item names   |
| `t` | Item Type            | One-char item type code |
| `q` | Parent module        | `Map<index, path>` |
| `i` | Parent type          | list of indexes |
| `f` | Function signature   | [encoded](#i-f-and-p) |
| `b` | Impl disambiguator   | `Map<index, string>` |
| `c` | Deprecation flag     | [roaring bitmap](#roaring-bitmaps) |
| `e` | Description is empty | [roaring bitmap](#roaring-bitmaps) |
| `p` | Type dictionary      | `[[item type, path]]` |
| `a` | Alias                | `Map<string, index>` |
| `D` | description shards   | [encoded](#how-descriptions-are-stored) |

The above index defines a crate called `crate_name`
with a free function called `function_name` and a struct called `Data`,
with the type signature `Data, i32 -> str`,
and an alias, `get_name`, that equivalently refers to `function_name`.


The search index needs to fit the needs of the `rustdoc` compiler,
the `search.js` frontend,
and also be compact and fast to decode.
It makes a lot of compromises:

* The `rustdoc` compiler runs on one crate at a time,
  so each crate has an essentially separate search index.
  It [merges] them by having each crate on one line
  and looking at the first quoted string.
* Names in the search index are given
  in their original case and with underscores.
  When the search index is loaded,
  `search.js` stores the original names for display,
  but also folds them to lowercase and strips underscores for search.
  You'll see them called `normalized`.
* The `f` array stores types as offsets into the `p` array.
  These types might actually be from another crate,
  so `search.js` has to turn the numbers into names and then
  back into numbers to deduplicate them if multiple crates in the
  same index mention the same types.
* It's a JSON file, but not designed to be human-readable.
  Browsers already include an optimized JSON decoder,
  so this saves on `search.js` code and performs better for small crates,
  but instead of using objects like normal JSON formats do,
  it tries to put data of the same type next to each other
  so that the sliding window used by [DEFLATE] can find redundancies.
  Where `search.js` does its own compression,
  it's designed to save memory when the file is finally loaded,
  not just size on disk or network transfer.


### Parallel arrays and indexed maps

Abstractly, Rustdoc Search data is a table, stored in column-major form.
Most data in the index represents a set of parallel arrays
(the "columns") which refer to the same data if they're at the same position.

Title: Rustdoc Search Index Format
Summary
The Rustdoc Search uses two programs, `search_index.rs` and `search.js`, to generate and utilize a search index for Rust documentation. The search index is stored as a compact JSON file and is designed to be space-efficient and fast to decode. The index contains information about items, function signatures, and aliases in the documented crates. It employs various compression techniques, including parallel arrays, indexed maps, and roaring bitmaps, to minimize the index size. The index is structured in a way that balances the needs of the `rustdoc` compiler, the `search.js` frontend, and overall performance.