Home Explore Blog CI



rustc

5th chunk of `src/rustdoc-internals/search.md`
1d8e2ae6d4859a1482cea759605ab3298cd7cf942d5f8e2d0000000100000f6b
  If it's not within the threshold, the entry is rejected,
  even if the first two pass.
  If it's within the threshold, the path distance is stored
  for ranking.
* [`checkType`] is used only if there's a type filter,
  like the struct in `struct:vec`. If it fails,
  the entry is rejected.

If all four criteria pass
(plus the crate filter, which isn't technically part of the query),
the results are sorted by [`sortResults`].


## Searching by type

Searching by type can be divided into two phases,
and the second phase has two sub-phases.

* Turn names in the query into numbers.
* Loop over each entry in the search index:
   * Quick rejection using a bloom filter.
   * Slow rejection using a recursive type unification algorithm.

In the names->numbers phase, if the query has only one name in it,
the editDistance function is used to find a near match if the exact match fails,
but if there's multiple items in the query,
non-matching items are treated as generics instead.
This means `hahsmap` will match hashmap on its own, but `hahsmap, u32`
is going to match the same things `T, u32` matches
(though rustdoc will detect this particular problem and warn about it).

Then, when actually looping over each item,
the bloom filter will probably reject entries that don't have every
type mentioned in the query.
For example, the bloom query allows a query of `i32 -> u32` to match
a function with the type `i32, u32 -> bool`,
but unification will reject it later.

The unification filter ensures that:

* Bag semantics are respected. If you query says `i32, i32`,
  then the function has to mention *two* i32s, not just one.
* Nesting semantics are respected. If your query says `vec<option>`,
  then `vec<option<i32>>` is fine, but `option<vec<i32>>` *is not* a match.
* The division between return type and parameter is respected.
  `i32 -> u32` and `u32 -> i32` are completely different.

The bloom filter checks none of these things,
and, on top of that, can have false positives.
But it's fast and uses very little memory, so the bloom filter helps.

## Re-exports

[Re-export inlining] allows the same item to be found by multiple names.
Search supports this by giving the same item multiple entries and tracking a canonical path
for any items where that differs from the given path.

For example, this sample index has a single struct exported from two paths:

```json
[
    [ "crate_name", {
        "doc": "Documentation",
        "n": ["Data", "Data"],
        "t": "FF",
        "d": ["The data struct", "The data struct"],
        "q": [[0, "crate_name"], [1, "crate_name::submodule"]],
        "i": [0, 0],
        "p": [],
        "f": "``",
        "b": [],
        "c": [],
        "a": [],
        "r": [[0, 1]],
    }]
]
```

The important part of this example is the `r` array,
which indicates that path entry 1 in the `q` array is
the canonical path for item 0.
That is, `crate_name::Data` has a canonical path of `crate_name::submodule::Data`.

This might sound like a strange design, since it has the duplicate data.
It's done that way because inlining can happen across crates,
which are compiled separately and might not all be present in the docs.

```json
[
  [ "crate_name", ... ],
  [ "crate_name_2", { "q": [[0, "crate_name::submodule"], [5, "core::option"]], ... }]
]
```

In the above example, a canonical path actually comes from a dependency,
and another one comes from an inlined standard library item:
the canonical path isn't even in the index!
The canonical path might also be private.
In either case, it's never shown to the user, and is only used for deduplication.

Associated types, like methods, store them differently.
These types are connected with an entry in `p` (their "parent")
and each one has an optional third tuple element:

    "p": [[5, "Data", 0, 1]]

That's:

- 5: It's a struct
- "Data": Its name
- 0: Its display path, "crate_name"
- 1: Its canonical path, "crate_name::submodule"

Title: Rustdoc Search Index: Type Search, Re-exports, and Canonical Paths
Summary
This section delves into the type search functionality, highlighting the role of the bloom filter for quick rejections and the recursive type unification algorithm for more thorough filtering. The unification filter ensures bag semantics, nesting semantics, and the distinction between return types and parameters are respected. Additionally, it discusses re-exports and how the search index handles them by providing multiple entries for the same item and tracking a canonical path to deduplicate items, especially when inlining occurs across crates. It explains how canonical paths are stored in the `r` array and utilized for deduplication. Finally, it describes how associated types, like methods, store their relationships with parent items and canonical paths.