3rd chunk of `src/profiling/with_perf.md`
would analyze NLL performance.

### Installing `perf-focus`

You can install perf-focus using `cargo install`:

```bash
cargo install perf-focus
```

### Example: How much time is spent in MIR borrowck?

Let's say we've gathered the NLL data for a test. We'd like to know
how much time it is spending in the MIR borrow-checker. The "main"
function of the MIR borrowck is called `do_mir_borrowck`, so we can
run this command:

```bash
$ perf focus '{do_mir_borrowck}'
Matcher    : {do_mir_borrowck}
Matches    : 228
Not Matches: 542
Percentage : 29%
```

The `'{do_mir_borrowck}'` argument is called the **matcher**. It
specifies the test to be applied to each backtrace. Here, the
`{X}` syntax indicates that there must be *some* function on the
backtrace whose name matches the regular expression `X`. That regex is
just the name of the function we want (in fact, it's only part of the
name; the full name includes a bunch of other stuff, like the module
path). In this mode, perf-focus just prints out the percentage of
samples where `do_mir_borrowck` was on the stack: in this case, 29%.
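
To make the `{X}` semantics concrete, here is a minimal Python sketch of what such a query computes. It is not how perf-focus is implemented; the helper name `count_matching`, the sample backtraces, and the full function paths in them are all hypothetical, and each sample is simply modeled as a list of frame names.

```python
import re

def count_matching(samples, pattern):
    """Return (matches, not_matches) for a `{pattern}`-style query.

    Each sample is one backtrace, given as a list of function names.
    A sample matches if *some* frame contains a match for the regex.
    """
    regex = re.compile(pattern)
    matches = sum(
        1 for frames in samples if any(regex.search(f) for f in frames)
    )
    return matches, len(samples) - matches

# Hypothetical backtraces (made-up data, not real perf output).
samples = [
    ["main", "rustc_mir::borrow_check::do_mir_borrowck", "rustc::traits::select"],
    ["main", "rustc::ty::query::plumbing"],
    ["main", "rustc_mir::borrow_check::do_mir_borrowck"],
    ["main", "syntax::parse::parser"],
]

matches, not_matches = count_matching(samples, "do_mir_borrowck")
# Integer division, so the percentage is truncated rather than rounded.
percentage = 100 * matches // (matches + not_matches)
print(f"Matches: {matches}  Not Matches: {not_matches}  Percentage: {percentage}%")
```

Applying the same arithmetic to the counts in the output above, 228 / (228 + 542) is about 29.6%, which lines up with the reported 29% if the tool truncates rather than rounds (an inference from the numbers shown, not documented behavior).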

**A note about c++filt.** To get the data from `perf`, `perf focus`
  currently executes `perf script` (perhaps there is a better
  way...). I've sometimes found that `perf script` outputs C++ mangled
  names. This is annoying. You can tell by running `perf script |
  head` yourself — if you see names like `5rustc6middle` instead of
  `rustc::middle`, then you have the same problem. You can solve this
  by doing:

```bash
perf script | c++filt | perf focus --from-stdin ...
```

This will pipe the output from `perf script` through `c++filt` and
should mostly convert those names into a more friendly format. The
`--from-stdin` flag to `perf focus` tells it to get its data from
stdin, rather than executing `perf script`. We should make this more
convenient (at worst, maybe add a `c++filt` option to `perf focus`, or
just always use it, since it's pretty harmless).

### Example: How much time does MIR borrowck spend solving traits?

Perhaps we'd like to know how much time MIR borrowck spends in the
trait checker. We can ask this using a more complex matcher:

```bash
$ perf focus '{do_mir_borrowck}..{^rustc::traits}'
Matcher    : {do_mir_borrowck},..{^rustc::traits}
Matches    : 12
Not Matches: 1311
Percentage : 0%
```

Here we used the `..` operator to ask "how often do we have
`do_mir_borrowck` on the stack and then, later, some function whose
name begins with `rustc::traits`?" (basically, code in that module). It
turns out the answer is "almost never" — only 12 samples fit that
description (if you ever see *no* samples, that often indicates your
query is messed up).
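
As a rough illustration of the `..` operator, the Python sketch below checks whether a backtrace contains a frame matching one regex followed, later on, by a frame matching another. This is only a model of the behavior described above, not perf-focus's actual code; the helper name `matches_then` and the backtraces are hypothetical, and frames are assumed to be listed from caller to callee.

```python
import re

def matches_then(frames, first, later):
    """True if some frame matches `first` and a later frame matches `later`.

    `frames` is one backtrace, assumed to be ordered caller to callee.
    """
    first_re, later_re = re.compile(first), re.compile(later)
    for i, frame in enumerate(frames):
        if first_re.search(frame):
            # Checking from the earliest match of `first` is enough: any
            # frame after a later match also comes after this one.
            return any(later_re.search(f) for f in frames[i + 1:])
    return False

# Hypothetical backtraces (made-up data, not real perf output).
in_traits = ["main", "rustc_mir::borrow_check::do_mir_borrowck",
             "rustc::traits::select::SelectionContext"]
traits_first = ["main", "rustc::traits::fulfill",
                "rustc_mir::borrow_check::do_mir_borrowck"]
not_anchored = ["main", "rustc_mir::borrow_check::do_mir_borrowck",
                "other::rustc::traits"]

for frames in (in_traits, traits_first, not_anchored):
    print(matches_then(frames, "do_mir_borrowck", "^rustc::traits"))
```

Only the first backtrace matches: in the second, the `rustc::traits` frame comes before `do_mir_borrowck`, and in the third the `^` anchor rejects a name that merely contains `rustc::traits`.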

If you're curious, you can find out exactly which samples matched by
using the `--print-match` option. This will print out the full
backtrace for each matching sample. The `|` at the front of a line
indicates the part that the matcher matched.

### Example: Where does MIR borrowck spend its time?

Often we want to do more exploratory queries. For example, we know
that MIR borrowck accounts for 29% of the time, but where does that
time get spent? For that, the `--tree-callees` option is often the
best tool. You usually also want to give `--tree-min-percent` or
`--tree-max-depth`. The result looks like this:
