POSTED: Dec 10, 2020
TAGS: LLVM Developers Meeting, Toolchain
Event: 2020 LLVM Virtual Developers Meeting
LLVM Debug Information
LLVM supports several debug information formats (DWARF, COFF, CodeView) in different binary formats (ELF, PDB, Mach-O). Understanding the source-to-DWARF and source-to-COFF mappings can be complex, and this is a problem commonly encountered when triaging LLVM’s debug information issues. SN Systems have developed the llvm-diva tool to help understand those mappings using logical views.
Figure 1 - Example use case (left), GDB debug session (right)
Figure 1 shows an example bug which can be analyzed more easily with llvm-diva than existing tools. At line 3, the typedef 'TYPE' should be defined as 'unsigned'. Due to a bug in the compiler, GDB incorrectly reports the definition as 'float'.
LLDB has no direct equivalent of GDB’s ‘whatis’ command for printing a resolved type. Using its scripting facility (script), we can display only the declared type of a variable:
- lldb.frame.FindVariable("Var_1").GetType().GetCanonicalType() -> float
- lldb.frame.FindVariable("Var_2").GetType().GetCanonicalType() -> unsigned int
LLDB shows the correct type for both ‘Var_1’ and ‘Var_2’.
Figure 2 - DWARF (left) and CodeView (right) debug information dumps for the example code
Figure 2 shows the low-level debug information dumps for the example code in figure 1. These dumps are a close representation of the internal debug information format, and understanding them requires a good knowledge of those formats, which limits who can triage and address such bugs. Even for experts, triaging debug information issues can take a lot of time and effort because of the complexity of these formats.
An additional problem is how to compare the debug information generated by different toolchain versions. And when developing across multiple platforms, what are the differences between the debug information generated by Clang and MS Visual Studio on Windows?
llvm-diva is a command line tool that processes the debug information (DWARF, COFF, CodeView) contained in binary files (ELF, PDB, Mach-O) and produces a logical view, which is a high-level representation of the debug information.
The logical view is composed of elements such as: scopes, types, symbols and lines. These elements can display additional information, such as a variable coverage factor, lexical block level, template argument encoding, etc.
Its wide range of command line options enables the creation of very rich logical views, which can include low-level debug information details such as disassembly code, the runtime location of variables, and the internal offsets of the elements in the binary file.
Figure 3 - Example use case - output from llvm-diva
Using the example code in figure 1, the output from llvm-diva is shown in figure 3. It contains a logical view for the debug information independent of the low-level format representation (DWARF, COFF, CodeView). The logical elements represent the common concepts from a programming language: scopes, symbols, types, lines, etc.
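To make the idea of a logical view more concrete, here is a minimal Python sketch of a tree of logical elements and an indented text rendering of it. It is not llvm-diva's implementation or output format: the `Element` class, the `render` function, the file name and the `Var` symbol are all hypothetical, and only the typedef 'TYPE' at line 3 is taken from the example described above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Element:
    """One node of a hypothetical logical view (not llvm-diva's real classes)."""
    kind: str                          # "scope", "symbol", "type" or "line"
    name: str
    line: Optional[int] = None         # source line, if known
    type_name: Optional[str] = None    # referenced type, if any
    children: List["Element"] = field(default_factory=list)


def render(element: Element, level: int = 0) -> List[str]:
    """Return the indented text lines for an element and its children."""
    text = f"{'  ' * level}[{level}] {element.kind} '{element.name}'"
    if element.type_name:
        text += f" -> '{element.type_name}'"
    if element.line is not None:
        text += f" @ line {element.line}"
    lines = [text]
    for child in element.children:
        lines.extend(render(child, level + 1))
    return lines


# Invented sample data; only the typedef 'TYPE' at line 3 comes from the text.
view = Element("scope", "test.cpp", children=[
    Element("scope", "foo", line=1, children=[
        Element("type", "TYPE", line=3, type_name="unsigned"),
        Element("symbol", "Var", line=4, type_name="TYPE"),
    ]),
])

print("\n".join(render(view)))
```

The point of the sketch is that nothing in the tree mentions DWARF or CodeView: the same scopes, types and symbols could have been produced from either format.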
Figure 4 - Architecture of the llvm-diva implementation
Figure 4 shows the llvm-diva architecture. It uses only the LLVM libraries (debug info, targets, support, etc.) and is divided into the following modules:
- Driver: Processes the command line options.
- Readers: Parse a specific debug information format using the LLVM libraries and create the logical view by generating requests to the Core module. They also handle logical view printing and comparison.
- Core: Creates the logical elements, in a common format for all supported debug information formats.
- Printing: Displays the logical elements in two formats: text and JSON.
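The division of work between the modules can be sketched in a few lines of Python. This is only an illustration under stated assumptions: every function name and the dictionary shape are hypothetical, the "parsed" records are invented stand-ins for real reader output, and only the record names (`DW_TAG_typedef`, `DW_TAG_variable`, `S_UDT`, `S_LOCAL`) are genuine DWARF and CodeView identifiers.

```python
import json
from typing import Dict, List

# Every name here is hypothetical; the functions only mirror the module roles.


def core_create(kind: str, name: str, line: int) -> Dict:
    """Core: build a logical element in one format-independent shape."""
    return {"kind": kind, "name": name, "line": line, "children": []}


def read_records(records: List[Dict], kinds: Dict[str, str]) -> Dict:
    """Reader: map format-specific records onto requests to the Core."""
    root = core_create("scope", "compile unit", 0)
    for record in records:
        element = core_create(kinds[record["tag"]], record["name"], record["line"])
        root["children"].append(element)
    return root


# Simplified stand-ins for the record kinds a DWARF or CodeView reader sees.
DWARF_KINDS = {"DW_TAG_typedef": "type", "DW_TAG_variable": "symbol"}
CODEVIEW_KINDS = {"S_UDT": "type", "S_LOCAL": "symbol"}


def print_text(element: Dict, level: int = 0) -> str:
    """Printing: indented text form."""
    lines = [f"{'  ' * level}{element['kind']} '{element['name']}'"]
    lines += [print_text(child, level + 1) for child in element["children"]]
    return "\n".join(lines)


def print_json(element: Dict) -> str:
    """Printing: JSON form."""
    return json.dumps(element, indent=2)


# Invented sample input for both formats.
dwarf_view = read_records(
    [{"tag": "DW_TAG_typedef", "name": "TYPE", "line": 3},
     {"tag": "DW_TAG_variable", "name": "Var_1", "line": 4}], DWARF_KINDS)
codeview_view = read_records(
    [{"tag": "S_UDT", "name": "TYPE", "line": 3},
     {"tag": "S_LOCAL", "name": "Var_1", "line": 4}], CODEVIEW_KINDS)

print(print_text(dwarf_view))
print(print_json(codeview_view))
```

Because both readers go through the same `core_create` call, the two views end up identical, which is what makes format-independent printing and comparison possible in this sketch.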
With llvm-diva, we aim to address the following points:
- Which variables are dropped due to optimization?
- Why can't I stop at a particular line?
- Which lines are associated with a specific code range?
- Does the debug information represent the original source?
- What are the semantic differences between different toolchain versions?
The size of the logical view created by llvm-diva depends on the size of the debug information contained in the binary file being processed. It can range from just a few lines, up to thousands of logical elements, which can make the printed output difficult to parse.
llvm-diva supports command line options (including regular expressions) to select the kind of logical elements to be included in the output. The output can include the whole logical view, partial logical views or just specific logical elements, selected by name, pattern or kind.
Figure 5 - Selected Printing of logical view elements
Figure 5 shows the output for different select options: (top) the full logical view; (middle) the portions of the logical view containing a specific pattern (TYPE), including the parent hierarchy; (bottom) only the logical elements that match that pattern.
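The difference between the last two selection modes can be sketched in Python. This is an illustrative approximation, not llvm-diva's selection logic: the element shape, the sample tree and both function names are hypothetical, and matching is reduced to a regular expression search on the element name.

```python
import re
from typing import Dict, List, Optional

# Hypothetical element shape: {"kind", "name", "children"}; invented sample tree.
VIEW = {"kind": "scope", "name": "test.cpp", "children": [
    {"kind": "scope", "name": "foo", "children": [
        {"kind": "type", "name": "TYPE", "children": []},
        {"kind": "symbol", "name": "Var_1", "children": []},
    ]},
    {"kind": "scope", "name": "bar", "children": [
        {"kind": "symbol", "name": "Count", "children": []},
    ]},
]}


def select_elements(element: Dict, pattern: str) -> List[Dict]:
    """Flat list of the elements whose name matches the pattern."""
    found = [element] if re.search(pattern, element["name"]) else []
    for child in element["children"]:
        found.extend(select_elements(child, pattern))
    return found


def select_with_parents(element: Dict, pattern: str) -> Optional[Dict]:
    """Pruned copy of the tree: matches plus the scopes that lead to them."""
    pruned = (select_with_parents(child, pattern) for child in element["children"])
    kept = [child for child in pruned if child is not None]
    if kept or re.search(pattern, element["name"]):
        return {**element, "children": kept}
    return None


print([e["name"] for e in select_elements(VIEW, "TYPE")])
print(select_with_parents(VIEW, "TYPE"))
```

Here `select_elements` corresponds to printing only the matching elements, while `select_with_parents` keeps the enclosing scopes ('test.cpp' and 'foo') so the match can still be read in context; the unrelated scope 'bar' is dropped.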
In comparison mode, llvm-diva compares two logical views and reports the logical elements that are missing or added. This is a powerful aid for finding semantic differences in debug information produced by different toolchain versions, or even by different debug information formats.
Figure 6 - Logical View comparison (left) and Logical Element comparison (right)
There are two comparison methods: logical view and logical element. For both methods, the comparison criteria are the name, source code location, type and lexical scope level. Logical view comparison compares the logical view as a whole unit: for a match, both parents and children must be the same. Logical element comparison compares individual elements without considering whether their parents are the same.
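A rough Python sketch can show how the two methods differ. It is a simplification under stated assumptions, not llvm-diva's algorithm: all names and the sample trees are hypothetical, each element is reduced to a key built from the criteria listed above, and the "parents must match" rule of logical view comparison is approximated by adding the chain of parent names to that key.

```python
from typing import Dict, Set, Tuple


def node(name: str, line: int, type_name: str = "", children=None) -> Dict:
    """Hypothetical logical element carrying the comparison criteria."""
    return {"name": name, "line": line, "type": type_name,
            "children": children or []}


def keys(element: Dict, with_parents: bool, level: int = 0,
         parents: Tuple[str, ...] = ()) -> Set[Tuple]:
    """Collect one comparison key per element in the tree."""
    key = (element["name"], element["line"], element["type"], level)
    if with_parents:
        key += (parents,)   # logical view comparison: parents must match too
    found = {key}
    for child in element["children"]:
        found |= keys(child, with_parents, level + 1, parents + (element["name"],))
    return found


def compare(reference: Dict, target: Dict, with_parents: bool):
    """Return (missing, added) keys of the target relative to the reference."""
    ref, tgt = keys(reference, with_parents), keys(target, with_parents)
    return ref - tgt, tgt - ref


# Contrived sample: 'Var' keeps its name, line, type and level but changes parent.
reference = node("test.cpp", 0, children=[
    node("foo", 1, children=[node("Var", 2, "int")]),
    node("bar", 5),
])
target = node("test.cpp", 0, children=[
    node("foo", 1),
    node("bar", 5, children=[node("Var", 2, "int")]),
])

print(compare(reference, target, with_parents=False))  # element comparison
print(compare(reference, target, with_parents=True))   # view comparison
```

With this contrived input, the element comparison reports no differences, because 'Var' has the same name, line, type and level in both views. The view comparison reports 'Var' as missing under 'foo' and added under 'bar', since its parent has changed.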
llvm-diva helps engineers understand the debug information in their programs at a high level. We are contributing llvm-diva to the LLVM open source project so that it can be used more widely.
Carlos presented this work at the 2020 LLVM Virtual Developers’ Meeting; the presentation can be viewed here.
Duration: 5 mins