Understanding `--tree-min-percent`, `--tree-max-depth`, and Relative Percentages in `perf focus`

```bash
Matcher    : {do_mir_borrowck}
Matches    : 577
Not Matches: 746
Percentage : 43%

Tree
| matched `{do_mir_borrowck}` (43% total, 0% self)
: | rustc_borrowck::nll::compute_regions (20% total, 0% self)
: : | rustc_borrowck::nll::type_check::type_check_internal (13% total, 0% self)
: : : | core::ops::function::FnOnce::call_once (5% total, 0% self)
: : : : | rustc_borrowck::nll::type_check::liveness::generate (5% total, 3% self)
: : : | <rustc_borrowck::nll::type_check::TypeVerifier<'a, 'b, 'tcx> as rustc::mir::visit::Visitor<'tcx>>::visit_mir (3% total, 0% self)
: | rustc::mir::visit::Visitor::visit_mir (8% total, 6% self)
: | <rustc_borrowck::MirBorrowckCtxt<'cx, 'tcx> as rustc_mir_dataflow::DataflowResultsConsumer<'cx, 'tcx>>::visit_statement_entry (5% total, 0% self)
: | rustc_mir_dataflow::do_dataflow (3% total, 0% self)
```

What happens with `--tree-callees` is that

- we find each sample matching the regular expression
- we look at the code that occurs *after* the regex match and try to build up a call tree

The `--tree-min-percent 3` option says "only show me things that take more than 3% of the time". Without this, the tree often gets really noisy and includes random stuff like the innards of malloc. `--tree-max-depth` can be useful too; it limits how many levels we print.

For each line, we display the percent of time in that function altogether ("total") and the percent of time spent in **just that function and not some callee of that function** ("self"). Usually "total" is the more interesting number, but not always.

### Relative percentages

By default, all percentages in perf-focus are relative to the **total program execution**. This is useful to help you keep perspective: often, as we drill down to find hot spots, we can lose sight of the fact that, in terms of overall program execution, this "hot spot" is actually not important. It also ensures that percentages from different queries can be easily compared against one another.
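As a quick check of what "relative to the total program execution" means, the `Percentage : 43%` header can be reproduced from the `Matches` and `Not Matches` counts in the output above. This is only a sketch in Python; the truncation to a whole percent is an assumption inferred from the displayed value (577 of 1323 samples is about 43.6%), not something the tool documents here.

```python
# Sample counts copied from the perf focus output above.
matches = 577
not_matches = 746

# Every sample in the profile either matches the matcher or does not.
total_samples = matches + not_matches

# Assumption: the displayed value is truncated to a whole percent,
# since 43.6% appears as 43% in the output.
percentage = 100 * matches // total_samples

print(f"Percentage : {percentage}%")
```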
That said, sometimes it's useful to get relative percentages, so `perf focus` offers a `--relative` option. In this case, the percentages are computed relative to only the samples that match (versus all samples). So, for example, we could get our percentages relative to the borrowck itself like so:

```bash
$ perf focus '{do_mir_borrowck}' --tree-callees --relative --tree-max-depth 1 --tree-min-percent 5
Matcher    : {do_mir_borrowck}
Matches    : 577
Not Matches: 746
Percentage : 100%

Tree
| matched `{do_mir_borrowck}` (100% total, 0% self)
: | rustc_borrowck::nll::compute_regions (47% total, 0% self) [...]
: | rustc::mir::visit::Visitor::visit_mir (19% total, 15% self) [...]
: | <rustc_borrowck::MirBorrowckCtxt<'cx, 'tcx> as rustc_mir_dataflow::DataflowResultsConsumer<'cx, 'tcx>>::visit_statement_entry (13% total, 0% self) [...]
: | rustc_mir_dataflow::do_dataflow (8% total, 1% self) [...]
```

Here you see that `compute_regions` came up as "47% total", which means that 47% of the time in `do_mir_borrowck` is spent in that function. Before, we saw 20%; that's because `do_mir_borrowck` itself is only 43% of the total time (and `.47 * .43 = .20`).
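The same arithmetic applies to every line of the `--relative` tree. The sketch below, in Python, converts the relative "total" figures back into shares of the whole program. The helper `to_absolute` and the shortened function labels are hypothetical names used only for illustration, and because the printed percentages are already whole numbers, the results are approximate.

```python
def to_absolute(relative_percent: float, matcher_percent: float) -> float:
    """Convert a --relative percentage into a share of the whole program."""
    return relative_percent * matcher_percent / 100.0


# `Percentage` of {do_mir_borrowck} when run without --relative.
matcher_percent = 43.0

# "total" figures copied from the --relative output above (labels shortened).
relative_totals = {
    "compute_regions": 47.0,
    "visit_mir": 19.0,
    "visit_statement_entry": 13.0,
    "do_dataflow": 8.0,
}

absolute_totals = {
    name: to_absolute(pct, matcher_percent)
    for name, pct in relative_totals.items()
}

for name, pct in absolute_totals.items():
    print(f"{name}: {pct:.2f}% of total execution")
```

Dropping the fractional part of each result gives 20, 8, 5, and 3, which lines up with the "total" column of the earlier, non-relative tree.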

This section explains additional options for `--tree-callees` in `perf focus`, including `--tree-min-percent` to filter out minor functions, `--tree-max-depth` to limit the call tree depth, and `--relative` to show percentages relative to the matched samples rather than the total program execution time. The example demonstrates calculating percentages relative to the borrowck itself using `perf focus '{do_mir_borrowck}' --tree-callees --relative --tree-max-depth 1 --tree-min-percent 5` and explains the difference in total percentages when using the `--relative` option.
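To make the interplay of these options concrete, here is a toy model in Python. It is not how `perf focus` is implemented; the `Node` class, the `build_callee_tree` and `render` functions, the sample stacks, and the frame names are all hypothetical. It also assumes that percentages are truncated to whole numbers, that a line is kept when its "total" is at least the minimum percent, and that a maximum depth of 1 prints the matched root plus its direct callees.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    total: int = 0  # samples whose stack passes through this frame
    self_: int = 0  # samples whose stack ends at this frame
    children: dict = field(default_factory=dict)


def build_callee_tree(samples, matched):
    """Build a tree from the frames that come after `matched` in each stack."""
    root = Node()
    for stack in samples:
        if matched not in stack:
            continue  # a "Not Matches" sample
        node = root
        node.total += 1
        for frame in stack[stack.index(matched) + 1:]:
            node = node.children.setdefault(frame, Node())
            node.total += 1
        node.self_ += 1  # the innermost frame gets the "self" time
    return root


def render(node, name, denom, min_percent=0, max_depth=None, depth=0):
    """Return tree lines; `denom` selects absolute or relative percentages."""
    total_pct = 100 * node.total // denom
    self_pct = 100 * node.self_ // denom
    if total_pct < min_percent:
        return []
    lines = [f"{': ' * depth}| {name} ({total_pct}% total, {self_pct}% self)"]
    if max_depth is not None and depth >= max_depth:
        return lines
    ordered = sorted(node.children.items(), key=lambda kv: -kv[1].total)
    for child_name, child in ordered:
        lines += render(child, child_name, denom, min_percent, max_depth, depth + 1)
    return lines


# Ten made-up samples, outermost frame first.
samples = (
    [["main", "borrowck", "regions", "type_check"]] * 3
    + [["main", "borrowck", "regions"]]
    + [["main", "borrowck", "visit"]]
    + [["main", "parse"]] * 5
)

root = build_callee_tree(samples, "borrowck")

# Default: percentages relative to all samples, hiding anything under 20%.
absolute = render(root, "matched `borrowck`", len(samples), min_percent=20)

# Like --relative: percentages relative to the matching samples only.
relative = render(root, "matched `borrowck`", root.total, max_depth=1)

print("\n".join(absolute))
print("\n".join(relative))
```

In this toy data, `visit` disappears from the first tree because it accounts for only 10% of all samples, while `type_check` disappears from the second because it sits below the depth limit.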