Procedural Macro Debugging, Source File Checksums, and Future Debugging Improvements

1. Get sign-off from the LLVM maintainers that this is a good idea.
2. Change the DWARF extension.
3. Update the debuggers.
   Update DWARF readers and expression evaluators.
4. Update the Rust compiler.
   Change it to emit this new information.

### Procedural macro stepping

A fundamental question is how you actually debug a procedural macro. What location do you emit for a macro expansion? Consider the following options:

* You can emit the location of the invocation of the macro.
* You can emit the location of the definition of the macro.
* You can emit the locations of the content of the macro.

RFC: <https://github.com/rust-lang/rfcs/pull/2117>

The focus is to let macros decide what to do. This can be achieved by having some kind of attribute that lets the macro tell the compiler where the line marker should be. This affects where you set breakpoints and what happens when you step through the macro.

## Source file checksums in debug info

Both DWARF and CodeView (PDB) support embedding a cryptographic hash of each source file that contributed to the associated binary.

The cryptographic hash can be used by a debugger to verify that the source file matches the executable. If the source file does not match, the debugger can warn the user.

The hash can also be used to prove that a given source file has not been modified since it was used to compile an executable. Because MD5 and SHA1 both have demonstrated vulnerabilities, using SHA256 is recommended for this application.

The Rust compiler stores the hash for each source file in the corresponding `SourceFile` in the `SourceMap`. The hashes of input files to external crates are stored in `rlib` metadata.

A default hashing algorithm is set in the target specification. This allows the target to specify the best hash available, since not all targets support all hash algorithms.

The hashing algorithm for a target can also be overridden with the `-Z source-file-checksum=` command-line option.
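The selection rule above (a per-target default that a command-line option can override) can be sketched in a few lines. This is a hypothetical model for illustration only: `ChecksumAlgorithm` and `effective_algorithm` are not rustc's actual types or functions, and only the option spelling comes from the text.

```rust
use std::str::FromStr;

/// The hash algorithms discussed in this section. Hypothetical type that only
/// models the selection logic, not rustc's internal representation.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ChecksumAlgorithm {
    Md5,
    Sha1,
    Sha256,
}

impl FromStr for ChecksumAlgorithm {
    type Err = String;

    /// Parses the value part of an option such as `-Z source-file-checksum=sha256`.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "md5" => Ok(Self::Md5),
            "sha1" => Ok(Self::Sha1),
            "sha256" => Ok(Self::Sha256),
            other => Err(format!("unknown checksum algorithm `{other}`")),
        }
    }
}

/// Returns the algorithm to use: an explicit command-line value wins,
/// otherwise the target specification's default applies.
fn effective_algorithm(
    target_default: ChecksumAlgorithm,
    cli_value: Option<&str>,
) -> Result<ChecksumAlgorithm, String> {
    match cli_value {
        Some(value) => value.parse(),
        None => Ok(target_default),
    }
}

fn main() {
    // No override: the target default is used.
    println!("{:?}", effective_algorithm(ChecksumAlgorithm::Md5, None));
    // Override present: it replaces the target default.
    println!("{:?}", effective_algorithm(ChecksumAlgorithm::Md5, Some("sha256")));
    // Unknown values are rejected rather than silently ignored.
    println!("{:?}", effective_algorithm(ChecksumAlgorithm::Sha256, Some("crc32")));
}
```

Rejecting unknown values with an error, instead of falling back to the default, is a design choice of this sketch; it keeps a typo in the option from silently producing a different checksum than the user asked for.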
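The debugger-side check described in this section (hash the source file on disk and compare it with the checksum recorded in the debug info) can be sketched without any dependencies. This is a minimal sketch under stated assumptions: the recorded digest is assumed to be a SHA256 value (as CodeView/PDB can store) already extracted as hex text, reading it out of the PDB or the DWARF line table is not shown, and the helper names `sha256`, `to_hex`, and `source_matches` are hypothetical. A real tool would use a vetted hashing library rather than this hand-rolled SHA-256.

```rust
// SHA-256 round constants (FIPS 180-4).
const K: [u32; 64] = [
    0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
    0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
    0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
    0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
    0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
    0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
    0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
    0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2,
];

/// Computes the SHA-256 digest of `data`.
fn sha256(data: &[u8]) -> [u8; 32] {
    let mut state: [u32; 8] = [
        0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a, 0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19,
    ];
    // Padding: a 0x80 byte, zeros up to 56 mod 64, then the bit length as big-endian u64.
    let bit_len = (data.len() as u64).wrapping_mul(8);
    let mut msg = data.to_vec();
    msg.push(0x80);
    while msg.len() % 64 != 56 {
        msg.push(0);
    }
    msg.extend_from_slice(&bit_len.to_be_bytes());

    for block in msg.chunks_exact(64) {
        // Message schedule: 16 words from the block, 48 derived words.
        let mut w = [0u32; 64];
        for i in 0..16 {
            w[i] = u32::from_be_bytes([block[4 * i], block[4 * i + 1], block[4 * i + 2], block[4 * i + 3]]);
        }
        for i in 16..64 {
            let s0 = w[i - 15].rotate_right(7) ^ w[i - 15].rotate_right(18) ^ (w[i - 15] >> 3);
            let s1 = w[i - 2].rotate_right(17) ^ w[i - 2].rotate_right(19) ^ (w[i - 2] >> 10);
            w[i] = w[i - 16].wrapping_add(s0).wrapping_add(w[i - 7]).wrapping_add(s1);
        }
        // 64 compression rounds over copies of the current state.
        let [mut a, mut b, mut c, mut d, mut e, mut f, mut g, mut h] = state;
        for (&k, &wi) in K.iter().zip(w.iter()) {
            let s1 = e.rotate_right(6) ^ e.rotate_right(11) ^ e.rotate_right(25);
            let ch = (e & f) ^ (!e & g);
            let t1 = h.wrapping_add(s1).wrapping_add(ch).wrapping_add(k).wrapping_add(wi);
            let s0 = a.rotate_right(2) ^ a.rotate_right(13) ^ a.rotate_right(22);
            let maj = (a & b) ^ (a & c) ^ (b & c);
            let t2 = s0.wrapping_add(maj);
            h = g;
            g = f;
            f = e;
            e = d.wrapping_add(t1);
            d = c;
            c = b;
            b = a;
            a = t1.wrapping_add(t2);
        }
        for (s, v) in state.iter_mut().zip([a, b, c, d, e, f, g, h]) {
            *s = (*s).wrapping_add(v);
        }
    }
    let mut out = [0u8; 32];
    for (i, word) in state.iter().enumerate() {
        out[4 * i..4 * i + 4].copy_from_slice(&word.to_be_bytes());
    }
    out
}

/// Formats a digest as lowercase hex.
fn to_hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{:02x}", b)).collect()
}

/// True if `contents` hashes to the hex digest recorded in the debug info.
fn source_matches(contents: &[u8], recorded_hex: &str) -> bool {
    to_hex(&sha256(contents)).eq_ignore_ascii_case(recorded_hex)
}

fn main() {
    let source = b"fn main() {}\n";
    // Stand-in for a digest that would be read from the debug info.
    let recorded = to_hex(&sha256(source));
    let edited = b"fn main() { println!(\"hi\"); }\n";
    println!("unchanged file matches: {}", source_matches(source, &recorded));
    println!("edited file matches: {}", source_matches(edited, &recorded));
}
```

When `source_matches` returns `false`, a debugger following the approach described in this section would warn the user instead of silently showing source that no longer corresponds to the executable.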
#### DWARF 5

DWARF version 5 supports embedding an MD5 hash to validate the source file version in use. See DWARF 5, Section 6.2.4.1, content type code `DW_LNCT_MD5`.

#### LLVM

LLVM IR supports MD5 and SHA1 (and SHA256 in LLVM 11+) source file checksums in the DIFile node.

[LLVM DIFile documentation](https://llvm.org/docs/LangRef.html#difile)

#### Microsoft Visual C++ Compiler /ZH option

The MSVC compiler supports embedding MD5, SHA1, or SHA256 hashes in the PDB using the `/ZH` compiler option.

[MSVC /ZH documentation](https://docs.microsoft.com/en-us/cpp/build/reference/zh)

#### Clang

Clang always embeds an MD5 checksum, though this does not appear in its documentation.

## Future work

#### Name mangling changes

* New demangler in `libiberty` (gcc source tree).
* New demangler in LLVM or LLDB.

**TODO**: Check the location of the demangler source. [#1157](https://github.com/rust-lang/rustc-dev-guide/issues/1157)

#### Reuse Rust compiler for expressions

This is an important idea because debuggers by and large do not try to implement type inference. You need to be much more explicit when you type into the debugger than in your actual source code, so you cannot just copy and paste an expression from your source code into the debugger and expect the same answer. Being able to do so would be nice, and reusing the compiler could help. It is certainly doable, but it is a large project. You would certainly need a bridge to the debugger, because only the debugger has access to the memory of the program being debugged. Both GDB and LLDB already do this for C and C++: LLDB uses Clang to JIT-compile expressions, and GDB can do the same with GCC.

Both debuggers' Rust expression evaluators implement both a superset and a subset of Rust. They implement just the expression language, but they also add some extensions; GDB, for example, has convenience variables. Therefore, if you take this route, you not only need to build this bridge, but may also have to add a mode that lets the compiler understand these extensions.
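To illustrate the type-inference gap described above, the Rust sketch below computes the same result twice. In `parse_and_sum`, the generic methods `parse`, `collect`, and `sum` get their concrete types from the surrounding `let` annotations, so an expression such as `numbers.iter().sum()` copied out on its own has no result type for anything to resolve (the compiler itself reports "type annotations needed" without context). `parse_and_sum_explicit` spells every type out at the call site. Both function names are illustrative, and no claim is made about which of these forms a particular debugger accepts.

```rust
/// Parses each word and sums the results, relying on inference:
/// each generic call takes its concrete type from the surrounding context.
fn parse_and_sum(words: &[&str]) -> (Vec<u64>, u64) {
    // `Vec<u64>` on the binding fixes the types of both `collect` and `parse`.
    let numbers: Vec<u64> = words.iter().map(|w| w.parse().unwrap()).collect();
    // `sum` is generic over its result; the `u64` annotation resolves it.
    let total: u64 = numbers.iter().sum();
    (numbers, total)
}

/// The same computation with every type written at the call site
/// (turbofish syntax), so no expression depends on its surroundings.
fn parse_and_sum_explicit(words: &[&str]) -> (Vec<u64>, u64) {
    let numbers = words
        .iter()
        .map(|w| w.parse::<u64>().unwrap())
        .collect::<Vec<u64>>();
    let total = numbers.iter().sum::<u64>();
    (numbers, total)
}

fn main() {
    let words = ["3", "4", "5"];
    println!("{:?}", parse_and_sum(&words));
    println!("{:?}", parse_and_sum_explicit(&words));
}
```

Reusing the compiler for debugger expressions would, in principle, let the first style work in a debugger too, because the compiler could supply the missing types from context.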

This section discusses debugging procedural macros, source file checksums in debug info, and future improvements for debugging Rust. It covers where to emit location information for macro expansions, how DWARF and CodeView (PDB) support embedding cryptographic hashes of source files for verification, and the recommendation to use SHA256. Future work includes name mangling changes (new demanglers) and reusing the Rust compiler to evaluate debugger expressions, so that debuggers can benefit from type inference and accept expressions copied directly from source code.