22.1 Introduction
What do you do when R code throws an unexpected error? What tools do you have to find and fix the problem? This chapter will teach you the art and science of debugging, starting with a general strategy, then following up with specific tools.
I’ll show the tools provided by both R and the RStudio IDE. I recommend using RStudio’s tools if possible, but I’ll also show you the equivalents that work everywhere. You may also want to refer to the official RStudio debugging documentation which always reflects the latest version of RStudio.
NB: You shouldn’t need to use these tools when writing new functions. If you find yourself using them frequently with new code, reconsider your approach. Instead of trying to write one big function all at once, work interactively on small pieces. If you start small, you can quickly identify why something doesn’t work, and don’t need sophisticated debugging tools.
Outline
- Section 22.2 outlines a general strategy for finding and fixing errors.
- Section 22.3 introduces you to the traceback() function, which helps you locate exactly where an error occurred.
- Section 22.4 shows you how to pause the execution of a function and launch an environment where you can interactively explore what’s happening.
- Section 22.5 discusses the challenging problem of debugging when you’re running code non-interactively.
- Section 22.6 discusses a handful of non-error problems that occasionally also need debugging.
22.2 Overall approach
Finding your bug is a process of confirming the many things that you believe are true, until you find one which is not true.
—Norm Matloff
Finding the root cause of a problem is always challenging. Most bugs are subtle and hard to find because if they were obvious, you would’ve avoided them in the first place. A good strategy helps. Below I outline a four-step process that I have found useful:
Google!
Whenever you see an error message, start by googling it. If you’re lucky, you’ll discover that it’s a common error with a known solution. When googling, improve your chances of a good match by removing any variable names or values that are specific to your problem.
You can automate this process with the errorist (James Balamuta, Errorist: Automatically Search Errors or Warnings, 2018, https://github.com/coatless/errorist) and searcher (James Balamuta, Searcher: Query Search Interfaces, 2018, https://github.com/coatless/searcher) packages. See their websites for more details.
Make it repeatable
To find the root cause of an error, you’re going to need to execute the code many times as you consider and reject hypotheses. To make that iteration as quick as possible, it’s worth some upfront investment to make the problem both easy and fast to reproduce.
Start by creating a reproducible example (Section 1.7). Next, make the example minimal by removing code and simplifying data. As you do this, you may discover inputs that don’t trigger the error. Make note of them: they will be helpful when diagnosing the root cause.
If you’re using automated testing, this is also a good time to create an automated test case. If your existing test coverage is low, take the opportunity to add some nearby tests to ensure that existing good behaviour is preserved. This reduces the chances of creating a new bug.
Figure out where it is
If you’re lucky, one of the tools in the following section will help you to quickly identify the line of code that’s causing the bug. Usually, however, you’ll have to think a bit more about the problem. It’s a great idea to adopt the scientific method. Generate hypotheses, design experiments to test them, and record your results. This may seem like a lot of work, but a systematic approach will end up saving you time. I often waste a lot of time relying on my intuition to solve a bug (“oh, it must be an off-by-one error, so I’ll just subtract 1 here”), when I would have been better off taking a systematic approach.
If this fails, you might need to ask for help from someone else. If you’ve followed the previous step, you’ll have a small example that’s easy to share with others. That makes it much easier for other people to look at the problem, and more likely to help you find a solution.
Fix it and test it
Once you’ve found the bug, you need to figure out how to fix it and to check that the fix actually worked. Again, it’s very useful to have automated tests in place. Not only does this help to ensure that you’ve actually fixed the bug, it also helps to ensure you haven’t introduced any new bugs in the process. In the absence of automated tests, make sure to carefully record the correct output, and check against the inputs that previously failed.
22.3 Locating errors
Once you’ve made the error repeatable, the next step is to figure out where it comes from. The most important tool for this part of the process is traceback(), which shows you the sequence of calls (also known as the call stack, Section 7.5) that lead to the error.
Here’s a simple example: you can see that f() calls g() calls h() calls i(), which checks if its argument is numeric:
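A minimal sketch of such a chain of functions is given below. The error message text and the `d + 10` return value are assumptions made for illustration; only the calling structure and the numeric check come from the description above.

```r
# f() calls g() calls h() calls i(); only i() inspects the value.
f <- function(a) g(a)
g <- function(b) h(b)
h <- function(c) i(c)
i <- function(d) {
  # Reject non-numeric input (message text is illustrative)
  if (!is.numeric(d)) {
    stop("`d` must be numeric", call. = FALSE)
  }
  d + 10
}

ok <- f(1)

# Capture the error instead of aborting, so a script keeps running
msg <- tryCatch(f("a"), error = function(e) conditionMessage(e))
```

In an interactive session you would call f("a") directly and let the error surface, so that the traceback tools have something to show.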
When we run f('a') in RStudio, we see the error message with two options to its right: “Show Traceback” and “Rerun with Debug”. Clicking “Show Traceback” displays the call stack.
If you’re not using RStudio, you can use traceback() to get the same information (sans pretty formatting).
NB: You read the traceback() output from bottom to top: the initial call is f(), which calls g(), then h(), then i(), which triggers the error. If you’re calling code that you source()d into R, the traceback will also display the location of the function, in the form filename.r#linenumber. These are clickable in RStudio, and will take you to the corresponding line of code in the editor.
22.3.1 Lazy evaluation
One drawback to traceback() is that it always linearises the call tree, which can be confusing if there is much lazy evaluation involved (Section 7.5.2). For example, take the following example where the error happens when evaluating the first argument to f():
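A minimal sketch of such a case follows. The f() chain is redefined so the block is self-contained; the bodies of j() and k() and the error text are illustrative assumptions. The argument j() is a promise that is only forced when i() finally checks is.numeric(d), so the error surfaces deep inside the f() call chain even though it belongs to j() and k().

```r
f <- function(a) g(a)
g <- function(b) h(b)
h <- function(c) i(c)
i <- function(d) {
  if (!is.numeric(d)) stop("`d` must be numeric", call. = FALSE)
  d + 10
}

# The failing code lives in the argument, not in f() itself
j <- function() k()
k <- function() stop("Oops!", call. = FALSE)

# j() is not evaluated here; it is forced inside i()
msg <- tryCatch(f(j()), error = function(e) conditionMessage(e))
```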
You can use rlang::with_abort() and rlang::last_trace() to see the call tree. Here, I think it makes it much easier to see the source of the problem. Look at the last branch of the call tree to see that the error comes from j() calling k().
NB: rlang::last_trace() is ordered in the opposite way to traceback(). We’ll come back to that issue in Section 22.4.2.4.
22.4 Interactive debugger
Sometimes, the precise location of the error is enough to let you track it down and fix it. Frequently, however, you need more information, and the easiest way to get it is with the interactive debugger which allows you to pause execution of a function and interactively explore its state.
If you’re using RStudio, the easiest way to enter the interactive debugger is through RStudio’s “Rerun with Debug” tool. This reruns the command that created the error, pausing execution where the error occurred. Otherwise, you can insert a call to browser() where you want to pause, and re-run the function. For example, we could insert a call to browser() in g().
browser() is just a regular function call, which means that you can run it conditionally by wrapping it in an if statement:
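A sketch of the conditional form is shown below. The stand-in h() and the condition b < 0 are assumptions for illustration; the unconditional variant is shown as a comment because it would pause every call.

```r
# Stand-in for the rest of the call chain (hypothetical)
h <- function(c) c + 10

# Unconditional variant: pauses every time g() is called
# g <- function(b) {
#   browser()
#   h(b)
# }

# Conditional variant: pauses only for the suspicious input
g <- function(b) {
  if (b < 0) {
    browser()
  }
  h(b)
}

res <- g(10)  # condition is FALSE, so execution is not paused
```

Calling g(-1) in an interactive session would drop you into the debugger just before h(b) runs.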
In either case, you’ll end up in an interactive environment inside the function where you can run arbitrary R code to explore the current state. You’ll know when you’re in the interactive debugger because you get a special prompt of the form Browse[1]>.
In RStudio, you’ll see the corresponding code in the editor (with the statement that will be run next highlighted), objects in the current environment in the Environment pane, and the call stack in the Traceback pane.
22.4.1 browser() commands
As well as allowing you to run regular R code, browser() provides a few special commands. You can use them by either typing short text commands, or by clicking a button in the RStudio toolbar, Figure 22.1:
- Next, n: executes the next step in the function. If you have a variable named n, you’ll need print(n) to display its value.
- Step into, or s: works like next, but if the next step is a function, it will step into that function so you can explore it interactively.
- Finish, or f: finishes execution of the current loop or function.
- Continue, c: leaves interactive debugging and continues regular execution of the function. This is useful if you’ve fixed the bad state and want to check that the function proceeds correctly.
- Stop, Q: stops debugging, terminates the function, and returns to the global workspace. Use this once you’ve figured out where the problem is, and you’re ready to fix it and reload the code.
There are two other slightly less useful commands that aren’t available in the toolbar:
- Enter: repeats the previous command. I find this too easy to activate accidentally, so I turn it off using options(browserNLdisabled = TRUE).
- where: prints stack trace of active calls (the interactive equivalent of traceback()).
22.4.2 Alternatives
There are three alternatives to using browser(): setting breakpoints in RStudio, options(error = recover), and debug() and other related functions.
22.4.2.1 Breakpoints
In RStudio, you can set a breakpoint by clicking to the left of the line number, or pressing Shift + F9. Breakpoints behave similarly to browser() but they are easier to set (one click instead of nine key presses), and you don’t run the risk of accidentally including a browser() statement in your source code. There are two small downsides to breakpoints:
- There are a few unusual situations in which breakpoints will not work. Read breakpoint troubleshooting for more details.
- RStudio currently does not support conditional breakpoints.
22.4.2.2 recover()
Another way to activate browser() is to use options(error = recover). Now when you get an error, you’ll get an interactive prompt that displays the traceback and gives you the ability to interactively debug inside any of the frames.
You can return to default error handling with options(error = NULL).
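A small sketch of switching the handler on and off is shown below. Saving the previous setting in `old` is a convenience assumed here so that whatever handler was active before is restored, rather than always resetting to NULL.

```r
# options() returns the previous values, so they can be restored later
old <- options(error = recover)

# In an interactive session, any error now opens a numbered list of
# frames to choose from, for example:
# f("x")

# Restore the previous error handler
options(old)
```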
22.4.2.3 debug()
Another approach is to call a function that inserts the browser() call for you:
- debug() inserts a browser statement in the first line of the specified function. undebug() removes it. Alternatively, you can use debugonce() to browse only on the next run.
- utils::setBreakpoint() works similarly, but instead of taking a function name, it takes a file name and line number and finds the appropriate function for you.
These two functions are both special cases of trace(), which inserts arbitrary code at any position in an existing function. trace() is occasionally useful when you’re debugging code that you don’t have the source for. To remove tracing from a function, use untrace(). You can only perform one trace per function, but that one trace can call multiple functions.
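As a rough illustration of the debug() and undebug() pair, the sketch below flags a function for debugging and then clears the flag, using base R's isdebugged() to check the state. The function `area()` is a hypothetical example, and it is deliberately never called while flagged, since that would open the debugger.

```r
# Hypothetical function to be debugged
area <- function(r) pi * r^2

debug(area)                 # next call to area() would start in the debugger
flag_on <- isdebugged(area)

undebug(area)               # back to normal execution
flag_off <- isdebugged(area)

res <- area(1)              # runs without pausing
```

debugonce(area) would behave like debug(area), except that the flag clears itself after the next call.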
22.4.2.4 Call stack
Unfortunately, the call stacks printed by traceback(), browser() & where, and recover() are not consistent. The following table shows how the call stacks from a simple nested set of calls are displayed by the three tools. The numbering is different between traceback() and where, and recover() displays calls in the opposite order.
traceback() | where | recover() | rlang functions |
---|---|---|---|
5: stop('...') | | | |
4: i(c) | where 1: i(c) | 1: f() | 1. └─global::f(10) |
3: h(b) | where 2: h(b) | 2: g(a) | 2. └─global::g(a) |
2: g(a) | where 3: g(a) | 3: h(b) | 3. └─global::h(b) |
1: f('a') | where 4: f('a') | 4: i('a') | 4. └─global::i('a') |
RStudio displays calls in the same order as traceback(). rlang functions use the same ordering and numbering as recover(), but also use indenting to reinforce the hierarchy of calls.
22.4.3 Compiled code
It is also possible to use an interactive debugger (gdb or lldb) for compiled code (like C or C++). Unfortunately that’s beyond the scope of this book, but there are a few resources that you might find useful:
22.5 Non-interactive debugging
Debugging is most challenging when you can’t run code interactively, typically because it’s part of some pipeline run automatically (possibly on another computer), or because the error doesn’t occur when you run the same code interactively. This can be extremely frustrating!
This section will give you some useful tools, but don’t forget the general strategy in Section 22.2. When you can’t explore interactively, it’s particularly important to spend some time making the problem as small as possible so you can iterate quickly. Sometimes callr::r(f, list(1, 2)) can be useful; this calls f(1, 2) in a fresh session, and can help to reproduce the problem.
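A hedged sketch of that call is given below. It assumes the callr package is installed (the guard skips the call otherwise), and `add()` is a hypothetical self-contained function standing in for f.

```r
# Hypothetical function; it must not rely on objects in the current session
add <- function(x, y) x + y

result <- if (requireNamespace("callr", quietly = TRUE)) {
  # Runs add(1, 2) in a fresh R process and returns the value
  callr::r(add, list(1, 2))
} else {
  NA  # callr not available; nothing was run
}
```

Because the function runs in a new process, anything it needs has to be passed in as an argument or loaded inside the function.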
You might also want to double check for these common issues:
- Is the global environment different? Have you loaded different packages? Are objects left from previous sessions causing differences?
- Is the working directory different?
- Is the PATH environment variable, which determines where external commands (like git) are found, different?
- Is the R_LIBS environment variable, which determines where library() looks for packages, different?
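One way to work through the checklist above is to record the same facts in both environments and compare them. The helper below is a hypothetical sketch built only from base R functions; the name `session_snapshot()` and the choice of fields are assumptions.

```r
# Collect the settings named in the checklist above
session_snapshot <- function() {
  list(
    wd        = getwd(),                 # working directory
    path      = Sys.getenv("PATH"),      # where external commands are found
    r_libs    = Sys.getenv("R_LIBS"),    # library search setting
    lib_paths = .libPaths(),             # library paths actually in use
    attached  = search(),                # attached packages and environments
    globals   = ls(globalenv())          # objects left in the global environment
  )
}

snap <- session_snapshot()
str(snap)
```

Saving the result with saveRDS() in the failing run and comparing it against an interactive run with all.equal() can quickly show which of these differs.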
22.5.1 dump.frames()
dump.frames() is the equivalent to recover() for non-interactive code; it saves a last.dump.rda file in the working directory. Later, in an interactive session, you can run load('last.dump.rda'); debugger() to enter an interactive debugger with the same interface as recover(). This lets you “cheat”, interactively debugging code that was run non-interactively.
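A sketch of how this might be wired up in a batch script is shown below. The handler name `dump_and_quit()` and the exit status of 1 are assumptions; the later interactive steps are commented out because the dump file only exists after a failed batch run.

```r
# In the batch R process ----
dump_and_quit <- function() {
  # Save the call frames to last.dump.rda in the working directory
  dump.frames(to.file = TRUE)
  # Quit R with a non-zero status so the pipeline notices the failure
  q(status = 1)
}
options(error = dump_and_quit)

# In a later interactive session ----
# load("last.dump.rda")
# debugger()
```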
22.5.2 Print debugging
If dump.frames() doesn’t help, a good fallback is print debugging, where you insert numerous print statements to precisely locate the problem, and see the values of important variables. Print debugging is slow and primitive, but it always works, so it’s particularly useful if you can’t get a good traceback. Start by inserting coarse-grained markers, and then make them progressively more fine-grained as you determine exactly where the problem is.
Print debugging is particularly useful for compiled code because it’s not uncommon for the compiler to modify your code to such an extent you can’t figure out the root problem even when inside an interactive debugger.
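As an illustration of print debugging in R code, the sketch below marks each stage of a small function with message(), which writes to standard error and so tends to show up in batch logs. The function `mean_without_na()` and its markers are hypothetical.

```r
# Hypothetical function instrumented with progress markers
mean_without_na <- function(x) {
  message("start: length(x) = ", length(x))

  x <- x[!is.na(x)]
  message("after NA removal: length(x) = ", length(x))

  total <- sum(x)
  message("total = ", total)

  total / length(x)
}

res <- mean_without_na(c(1, 2, NA, 3))
```

Once the markers show which stage goes wrong, you can add finer-grained messages inside that stage and remove the rest.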
22.5.3 RMarkdown
Debugging code inside RMarkdown files requires some special tools. First, if you’re knitting the file using RStudio, switch to calling rmarkdown::render('path/to/file.Rmd') instead. This runs the code in the current session, which makes it easier to debug. If doing this makes the problem go away, you’ll need to figure out what makes the environments different.
If the problem persists, you’ll need to use your interactive debugging skills. Whatever method you use, you’ll need an extra step: in the error handler, you’ll need to call sink(). This removes the default sink that knitr uses to capture all output, and ensures that you can see the results in the console. For example, to use recover() with RMarkdown, you’d put the following code in your setup block:
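A minimal sketch of such a handler, following the two steps just described (remove the sink, then start recover()):

```r
options(error = function() {
  # Remove knitr's output sink so results are visible in the console
  sink()
  # Then offer the usual interactive choice of frames
  recover()
})
```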
This will generate a “no sink to remove” warning when knitr completes; you can safely ignore this warning.
If you simply want a traceback, the easiest option is to use rlang::trace_back(), taking advantage of the rlang_trace_top_env option. This ensures that you only see the traceback from your code, instead of all the functions called by RMarkdown and knitr.
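A hedged sketch of that setup is shown below. It assumes rlang is installed and that the installed version's print method for traces accepts simplify = "none"; the guard skips everything otherwise. The use of sys.frame(-1) as the bottom of the trace is also an assumption about where the handler is called from.

```r
if (requireNamespace("rlang", quietly = TRUE)) {
  # Treat the current environment as the top of the trace, hiding the
  # RMarkdown and knitr frames above it
  options(rlang_trace_top_env = rlang::current_env())

  options(error = function() {
    sink()  # remove knitr's output sink first
    print(rlang::trace_back(bottom = sys.frame(-1)), simplify = "none")
  })
}
```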
22.6 Non-error failures
There are other ways for a function to fail apart from throwing an error:
- A function may generate an unexpected warning. The easiest way to track down warnings is to convert them into errors with options(warn = 2) and use the regular debugging tools. When you do this you’ll see some extra calls in the call stack, like doWithOneRestart(), withOneRestart(), withRestarts(), and .signalSimpleWarning(). Ignore these: they are internal functions used to turn warnings into errors.
- A function may generate an unexpected message. You can use rlang::with_abort() to turn these messages into errors.
- A function might never return. This is particularly hard to debug automatically, but sometimes terminating the function and looking at the traceback() is informative. Otherwise, use print debugging, as in Section 22.5.2.
- The worst scenario is that your code might crash R completely, leaving you with no way to interactively debug your code. This indicates a bug in compiled (C or C++) code.
  If the bug is in your compiled code, you’ll need to follow the links in Section 22.4.3 and learn how to use an interactive C debugger (or insert many print statements).
  If the bug is in a package or base R, you’ll need to contact the package maintainer. In either case, work on making the smallest possible reproducible example (Section 1.7) to help the developer help you.
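The first two cases in the list above can be sketched in base R as follows. The functions `noisy()` and `chatty()` are hypothetical. For messages, the sketch swaps in a base R calling handler that raises an error, as a stand-in for the rlang helper mentioned above rather than a demonstration of it.

```r
# Unexpected warning: promote warnings to errors with options(warn = 2)
noisy <- function() {
  warning("something looks off")
  "done"
}
old <- options(warn = 2)
warn_msg <- tryCatch(noisy(), error = function(e) conditionMessage(e))
options(old)  # restore the previous warning level

# Unexpected message: raise an error from a calling handler
chatty <- function() {
  message("unexpected note")
  "done"
}
msg_err <- tryCatch(
  withCallingHandlers(
    chatty(),
    message = function(m) {
      stop("message promoted: ", conditionMessage(m), call. = FALSE)
    }
  ),
  error = function(e) conditionMessage(e)
)
```

In both cases the condition is now an error, so traceback(), recover() and the other tools in this chapter can be applied to it.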
Original author(s) | Julian Seward |
---|---|
Developer(s) | Valgrind Development Team[1] |
Stable release | 3.16.1 (June 22, 2020)[2] |
Operating system | Linux, macOS, Solaris, Android[3] |
Type | Profiler, Memory debugger |
License | GNU General Public License |
Website | www.valgrind.org |
Valgrind (/ˈvælɡrɪnd/) is a programming tool for memory debugging, memory leak detection, and profiling.
Valgrind was originally designed to be a free memory debugging tool for Linux on x86, but has since evolved to become a generic framework for creating dynamic analysis tools such as checkers and profilers.
The name Valgrind is a reference to the main entrance of Valhalla from Norse Mythology. During development (before release) the project was named Heimdall; however, the name would have conflicted with a security package.
Overview
Valgrind is in essence a virtual machine using just-in-time (JIT) compilation techniques, including dynamic recompilation. Nothing from the original program ever gets run directly on the host processor. Instead, Valgrind first translates the program into a temporary, simpler form called Intermediate Representation (IR), which is a processor-neutral, SSA-based form. After the conversion, a tool (see below) is free to do whatever transformations it would like on the IR, before Valgrind translates the IR back into machine code and lets the host processor run it. Valgrind recompiles binary code to run on host and target (or simulated) CPUs of the same architecture. It also includes a GDB stub to allow debugging of the target program as it runs in Valgrind, with 'monitor commands' that allow querying the Valgrind tool for various information.
A considerable amount of performance is lost in these transformations (and usually, the code the tool inserts); usually, code run with Valgrind and the 'none' tool (which does nothing to the IR) runs at 20% to 25% of the speed of the normal program.[4][5]
Tools
Memcheck
There are multiple tools included with Valgrind (and several external ones). The default (and most used) tool is Memcheck. Memcheck inserts extra instrumentation code around almost all instructions, which keeps track of the validity (all unallocated memory starts as invalid or 'undefined', until it is initialized into a deterministic state, possibly from other memory) and addressability (whether the memory address in question points to an allocated, non-freed memory block), stored in the so-called V bits and A bits respectively. As data is moved around or manipulated, the instrumentation code keeps track of the A and V bits, so they are always correct on a single-bit level.
In addition, Memcheck replaces the standard C memory allocator with its own implementation, which also includes memory guards around all allocated blocks (with the A bits set to 'invalid'). This feature enables Memcheck to detect off-by-one errors where a program reads or writes outside an allocated block by a small amount. The problems Memcheck can detect and warn about include the following:
- Use of uninitialized memory
- Reading/writing memory after it has been free'd
- Reading/writing off the end of malloc'd blocks
The price of this is lost performance. Programs running under Memcheck usually run 20–30 times slower[6] than running outside Valgrind and use more memory (there is a memory penalty per allocation). Thus, few developers run their code under Memcheck (or any other Valgrind tool) all the time. They most commonly use such tools either to track down some specific bug, or to verify that there are no latent bugs (of the kind Memcheck can detect) in the code.
Other tools
In addition to Memcheck, Valgrind has several other tools:[7]
- None, runs the code in the virtual machine without performing any analysis and thus has the smallest possible CPU and memory overhead of all tools. Since Valgrind itself provides a traceback from a segmentation fault, the none tool provides this traceback at minimal overhead.
- Addrcheck, similar to Memcheck but with much smaller CPU and memory overhead, thus catching fewer types of bugs. Addrcheck has been removed as of version 3.2.0.[8]
- Massif, a heap profiler. The separate GUI massif-visualizer visualizes output from Massif.
- Helgrind and DRD, detect race conditions in multithreaded code
- Cachegrind, a cache profiler. The separate GUI KCacheGrind visualizes output from Cachegrind.
- Callgrind, a callgraph analyzer created by Josef Weidendorfer was added to Valgrind as of version 3.2.0. KCacheGrind can visualize output from Callgrind.
- DHAT, dynamic heap analysis tool which analyzes how much memory is allocated and for how long as well as patterns of memory usage.
- exp-sgcheck (named exp-ptrcheck prior to version 3.7), an experimental tool to find stack and global array overrun errors which Memcheck cannot find.[9] Some code results in false positives from this tool.[10]
- exp-bbv, a performance simulator that extrapolates performance from a small sample set.
There are also several externally developed tools available. One such tool is ThreadSanitizer, another detector of race conditions.[11][12]
Platforms supported
As of version 3.4.0, Valgrind supports Linux on x86, x86-64 and PowerPC. Support for OS X was added in version 3.5.0.[13] Support for Linux on ARMv7 (used for example in certain smartphones) was added in version 3.6.0.[14] Support for the ARM/Android platform was added in version 3.7.0.[3] Support for Solaris was added in version 3.11.0.[3] There are unofficial ports to other UNIX-like platforms (like FreeBSD,[15] OpenBSD,[16] and NetBSD[17]).
Since version 3.9.0 there is support for Linux on MIPS64 little and big endian, for MIPS DSP ASE on MIPS32, for s390x Decimal Floating Point instructions, for POWER8 (Power ISA 2.07) instructions, for Intel AVX2 instructions, for Intel Transactional Synchronization Extensions, both RTM and HLE and initial support for Hardware Transactional Memory on POWER.[2]
History and development
It is named after the main entrance to Valhalla in Norse mythology.[18]
The original author of Valgrind is Julian Seward, who in 2006 won a Google-O'Reilly Open Source Award for his work on Valgrind.[19][20]
Several others have also made significant contributions, including Cerion Armour-Brown, Jeremy Fitzhardinge, Tom Hughes, Nicholas Nethercote, Paul Mackerras, Dirk Mueller, Bart Van Assche, Josef Weidendorfer, and Robert Walsh.[21]
It is used by a number of Linux-based projects.[22]
Limitations of Memcheck
In addition to the performance penalty, an important limitation of Memcheck is its inability to detect all cases of bounds errors in the use of static or stack-allocated data.[23] The following code will pass the Memcheck tool in Valgrind without incident, despite containing the errors described in the comments:
The experimental valgrind tool exp-sgcheck has been written to address this limitation in Memcheck. It will detect array overrun errors, provided the first access to an array is within the array bounds. Note that exp-sgcheck will not detect the array overrun in the code above, since the first access to an array is out of bounds, but it will detect the array overrun error in the following code.
The inability to detect all errors involving the access of stack-allocated data is especially noteworthy since certain types of stack errors make software vulnerable to the classic stack smashing exploit.
See also
- AddressSanitizer et al.
Notes
1. https://valgrind.org/info/developers.html
2. Valgrind News
3. Valgrind release notes
4. Valgrind homepage
5. Valgrind Manual
6. https://valgrind.org/docs/manual/quick-start.html#quick-start.mcrun
7. Valgrind main tool list
8. [1]
9. Section on exp-sgcheck in the Valgrind user manual
10. [2]
11. https://valgrind.org/downloads/variants.html
12. K Serebryany, T Iskhodzhanov, ThreadSanitizer: data race detection in practice, Proceedings of the Workshop on Binary Instrumentation and Applications WBIA'09
13. OS X port
14. ARM/Linux port
15. Valgrind FreeBSD port
16. Valgrind OpenBSD port
17. 'Valgrind NetBSD port'. Archived from the original on 2006-02-09. Retrieved 2006-01-28.
18. Valgrind FAQ
19. valgrind.org's list of awards
20. Google-O'Reilly Open Source Awards, Hall of Fame
21. The Valgrind Developers
22. valgrind.org's list of users
23. Valgrind FAQ