RSH 16: Debugging

Debugging: we all do it, we are never taught it. How we approach it and some tools we use

Video

Collaborative notes taken during the session

Ice-breaker question

What was the most difficult bug you have chased/solved? What was the longest debugging session/period for you?

couple of days for one line of code
couple of hours (around 10h) and learning that it was not straightforward to put simple text into an HDF5 file
couple of days, most difficult was to find the bug in a part where I was absolutely sure it was right
couple of days, gave up for a few months (on that new module), solved it in a few hours next time I looked at it. The issue with locating the bug was that debugging/print statements led me to believe that the error was in the wrong part of the code, due to asynchronous execution. In the end, the error was a simple index error.
over a week. A bug that was very hard to reproduce and trigger. Software had poor test coverage.

Questions and comments

Is this related? How to keep an overview about the code that gets more and more complex?
- definitely related. the more modular and "well structured" the code is, the easier it may be to locate/corner the bug.
Python logging module: https://docs.python.org/3/library/logging.html
Git bisect exercise: https://github.com/coderefinery/git-bisect-exercise/
Error message:

Code:

import ipyparallel as ipp
rc = ipp.Client()
rc.ids
executor = rc.become_dask(ncores=4)
executor

Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/ipyparallel/controller/hub.py", line 559, in dispatch_query
    handler(idents, msg)
  File "/opt/conda/lib/python3.7/site-packages/ipyparallel/controller/hub.py", line 1467, in become_dask
    self.distributed_scheduler = scheduler = Scheduler(**kwargs)
  File "/opt/conda/lib/python3.7/site-packages/distributed/scheduler.py", line 885, in __init__
    self.bandwidth = parse_bytes(dask.config.get("distributed.scheduler.bandwidth"))
  File "/opt/conda/lib/python3.7/site-packages/dask/utils.py", line 1164, in parse_bytes
    s = s.replace(" ", "")
AttributeError: 'int' object has no attribute 'replace'

Do you not start at the bottom and read up?
- this is what I most often do in these cases
  - oops :-) Python prints the most recent call last, while a gdb backtrace of Fortran/C code lists the innermost frame first
- Would say depends a little on the kind of program you are debugging. When using frameworks sometimes the answer is in the middle of the traceback.
- I usually
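
To make the reading order concrete, here is a minimal sketch that reproduces the same kind of error as the traceback above and inspects the frames programmatically. The functions parse_size and configure are hypothetical stand-ins, not the real dask or ipyparallel code. In Python the frames are ordered oldest call first, so the last frame is where the exception was raised, and the frames above it show how the bad value got there.

```python
import traceback


def parse_size(s):
    # Hypothetical stand-in for the failing library call: it assumes a string.
    return s.replace(" ", "")


def configure(value):
    # Passes the value along without checking its type.
    return parse_size(value)


try:
    configure(100)  # an int where a string was expected
except AttributeError as exc:
    frames = traceback.extract_tb(exc.__traceback__)
    # Frames are ordered oldest call first; the last one raised the error.
    call_chain = [frame.name for frame in frames]
    message = str(exc)
    print("call chain:", " -> ".join(call_chain))
    print("raised in:", frames[-1].name, "at line", frames[-1].lineno)
    print("message:", message)
```

Here the last frame (parse_size) only shows the symptom; the actual mistake, passing an int, is one frame up in the caller. That is why the answer is sometimes in the middle of the traceback.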

Check changelog as well. Might be that a fix is mentioned.
- great point!
- changelogs, issues, pull requests

"Sleep on it" :smile: :+1:

Ask somebody else to have a look?
- Also explain it to someone else - See Rubber duck debugging
Also just trying to explain to the little rubber duck what is happening and what should actually happen.
- Often I have found the solution by explaining the problem to somebody. So this is real :-)
The Zen of Python: https://www.python.org/dev/peps/pep-0020/
- or import this
set -e
set -euo pipefail
set -x
-q -v -vv, -vvv

"debuggers do not remove bugs, they run your code in slow motion" [citation needed]

TIL: https://en.wikipedia.org/wiki/Heisenbug

The example that we use gdb on (later also Valgrind): https://github.com/ResearchSoftwareHour/demo-debugging :+1:

fortran example: fortran/bugs/
- -g -ffpe-trap=zero,invalid,overflow,underflow
Would it be easier to do this in an IDE (?), e.g. like Spyder, where one could see the variable and their values?
- yes I think so, it's a lot easier to follow the position and see the variables
Is there a way to print all variables and their values currently in memory? +1
- yes (I don't remember off-hand how, though)
- try help all in gdb
  - how about in the python (was it pdb?)
    - import pdb ; pdb.set_trace()
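
For reference: in gdb, info locals prints the local variables of the current frame, and inside pdb, pp locals() does the same for Python. The sketch below shows the same idea without a debugger; snapshot_locals is a hypothetical helper written for this example, not a standard library function.

```python
import inspect


def snapshot_locals():
    """Return a copy of the caller's local variables (hypothetical helper)."""
    caller = inspect.currentframe().f_back
    return dict(caller.f_locals)


def mean(values):
    total = sum(values)
    count = len(values)
    # Same information that `pp locals()` would show at this point inside pdb.
    for name, value in sorted(snapshot_locals().items()):
        print(f"{name} = {value!r}")
    return total / count


result = mean([1.0, 2.0, 3.0])
```

This is only for quick inspection; a debugger or an IDE variable inspector gives the same view interactively.
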
I remember in Matlab I could put a command into the code and it would stop exactly there and give me access to the values saved at that moment.
- yes - a "breakpoint". In Python 3.7 and later this is the built-in breakpoint(), no import needed; on older versions use import pdb ; pdb.set_trace()
- in visual studio code you can also click next to a line and it sets a breakpoint at that line (never used though, just did it by accident and wondered what was wrong :D) - Thank you! - See also debug adapter protocol and a list of supported languages
For Rust developers, I was happy to find out about the macro dbg!()
Have you heard about sourcery?
- I haven't, please tell us more ...
- https://sourcery.ai/ ?
  - Yes, I heard about it in a podcast. I thought it checks your code and makes suggestions, maybe another sort of debugging?
awesome! I missed the variable inspector from RStudio when starting with Python. Now I will look into whether there is something similar in VSCode, do you know? -> seems there is, under the run/debug view (play sign with bug)
In Matlab the very similar variable inspector could handle the changing variables when one goes into the functions.
anyone with debugging advice on Haskell? Would like to hear/read any stories.
- I would also like to know :-)
https://valgrind.org/
@bast any chance of showing how to approach a segfault? +2
even describing what it is. It's still a bit vague to me.
- https://www.geeksforgeeks.org/core-dump-segmentation-fault-c-cpp/
- Thanks!
Python takes care of memory leaks?
- Not really. You can still have them, but they take a different form. Python's garbage collector releases memory once no references to it remain, so memory that is still referenced (often unintentionally) is never freed. For instance, a mutable default argument is created once and keeps growing across calls:

def func(mylist=[]):
    mylist.append("hello")
    print(mylist)

func()
func()
...

any advice on how to detect memory leaks (that are small when testing but may increase later when running with real data) in python?
- Test the code in loops, repeatedly calling it.
- Plot your memory usage over time
  - yes, memory profiling. ideally at the end it should be back to zero
    - are there specific tools for memory profiling?
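
One tool that ships with Python is the tracemalloc module in the standard library. The sketch below is a minimal example, assuming the leak comes from a mutable default argument as in the func example above. It calls the function in a loop and compares the traced memory before and after. The payload size and the loop count are arbitrary choices for illustration.

```python
import tracemalloc


def func(mylist=[]):
    # The default list is created once and shared by every call, so it keeps growing.
    mylist.append(bytearray(1000))
    return len(mylist)


tracemalloc.start()
before, _ = tracemalloc.get_traced_memory()

# Call the suspect code repeatedly so that a small leak becomes visible.
for _ in range(1000):
    func()

after, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

growth = after - before
print(f"memory still held after the loop: {growth} bytes (peak {peak})")
```

If the growth scales with the number of iterations, something is holding on to references. tracemalloc.take_snapshot() can then be used to see which source lines allocated the memory.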
Any suggestion on detecting memory leaks in CUDA? I know about cuda-memcheck (run with cuda-memcheck [memcheck_options] app_name [app_options]). Are there other/better alternatives?
- Looking around, I found Cudagrind, it hasn't been updated for many years though. Would be nice to see what else can be used.
Is pair programming sitting down together and working it out together? If yes, it was super fun! A scientist and a software engineer. I guess we learned a lot from each other.
- Can be but usually refers to "collaborating" in solving problems or implementing solutions (my understanding)
  - My understanding of the original meaning: two people, one set of code, one editor.
- there is also "mob programming": one person types, many others (Twitch :P) are in the same room and navigate, and people give the keyboard to others after some time
  - haha, yes
RSH live from Nordic RSE: https://nordic-rse.org/events/2020-online-get-together/

Thank you! :) Learned a lot and will start debugging now

Thanks for another great episode of RSHour! :heart: +1 :+1:

Thank you :-)

Notes we used when planning this session

Introduction

This is a really challenging topic to talk about, since there are
- So many ways to go wrong
- So many ways to fix it
- No one right answer, it's more of an art than a science
https://en.wikipedia.org/wiki/List_of_software_bugs

Types of bugs

syntax errors
runtime errors
- Hard errors that are clearly reported as errors
results are wrong (like the kelvin example)
Heisenbug: trying to study the bug changes it, e.g. adding print statements changes timings. Memory locations change. etc
- Compiled with or without optimizations. Stepped through with debugger to eliminate race conditions.
- you add a print statement and bug goes away: memory bugs
schrödinbug: it never worked, but you never noticed it.
Local (clearly identifiable in one line) vs systematic (a property of the whole system)

Approach to debugging

Reality not as fancy as you might think
Take a break
Reducing size of the problem
- Make it run faster, but still produce the bug (smaller input data, fewer iterations, etc)
- Removing degrees of freedom: Disable optional features until you get straight to it
- I thought efficient debugging is like an efficient tree-search of possibilities "inside our head": eliminating as big branches as possible as early as possible.
Finding the point of the problem
- bisection
  - git bisect: when you have good version control, when was it introduced?
  - Turn off optional features and see if the problem still occurs
  - "Deactivating code"/skipping code to locate memory problems: making the result wrong but making the code not crash
- git grep
- it can be useful to make the code produce "not scientifically meaningful" results for the sake of debugging
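
The bisection idea can be sketched in a few lines. This is not how git bisect is implemented, only the binary search it is based on. It assumes the revisions are ordered oldest to newest, the first one is good, the last one is bad, and the bug stays once it is introduced. The revision numbers and the check function are hypothetical.

```python
def first_bad(revisions, is_bad):
    """Return the first revision for which is_bad() is true.

    Assumes revisions[0] is good, revisions[-1] is bad, and every
    revision after the first bad one is also bad.
    """
    low, high = 0, len(revisions) - 1  # low is known good, high is known bad
    while high - low > 1:
        mid = (low + high) // 2
        if is_bad(revisions[mid]):
            high = mid  # bug was introduced at mid or earlier
        else:
            low = mid  # bug was introduced after mid
    return revisions[high]


# Hypothetical history of 100 revisions where revision 73 introduced the bug.
tested = []


def check(revision):
    tested.append(revision)
    return revision >= 73


culprit = first_bad(list(range(100)), check)
print(culprit, "found after", len(tested), "checks")
```

Each check halves the remaining range, so even a long history needs only a handful of test runs. The same halving works for disabling features or shrinking input data.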
Now, you roughly know the point. What now?
- Reading error messages
  - How to approach stack traces and finding the problem
  - How to pick the interesting part out of error message: read from both the top and the bottom
  - internet-searching solutions with the right error message
Turn it into a unit test or example script, that is self-contained
- Can you make the example portable to someone else to try (git, conda, containers, etc?)
- This is basically a prerequisite to asking someone for help
Asking for help
- When to ask for help (after you have narrowed it down)
- What to include when asking (not "it doesn't work")
- what to do if it crashes/fails in somebody else's code (library or package)

Preparing code for debugging

various points from Zen of Python, for example "Errors should never pass silently / unless explicitly silenced", but there is more relevant in it too
print debugging
writing good error messages, catching errors the right way
- don't trap all errors and ignore them
logging and verbosity
- shell: set -x
- -v, -q
- stdout vs stderr for printing stuff
- logging module show example (rkdarst)
  - Everything defined in levels: debug, info, warning, error, critical
  - https://github.com/NordicHPC/envkernel/blob/master/envkernel.py
assertions, programming for safety
- shell script strict modes
  - set -e ; set -u ; set -o pipefail . often done as set -euo pipefail
debug compile flags
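
A minimal sketch of the logging points above, using Python's standard logging module. The logger name demo and the function safe_divide are hypothetical. It shows the levels in use, output going to stderr by default, and catching only the expected error instead of trapping everything silently.

```python
import logging

# basicConfig attaches a handler that writes to stderr by default.
logging.basicConfig(format="%(levelname)s %(name)s: %(message)s")
log = logging.getLogger("demo")  # hypothetical logger name
# Messages below this level are dropped; switch to logging.DEBUG for more detail.
log.setLevel(logging.INFO)


def safe_divide(a, b):
    log.debug("safe_divide called with a=%r b=%r", a, b)
    try:
        return a / b
    except ZeroDivisionError:
        # Catch only the error we expect, and report it instead of hiding it.
        log.error("division by zero for a=%r", a)
        return None


log.info("starting")
print(safe_divide(6, 3))
print(safe_divide(1, 0))
```

Unlike print statements, the debug messages can stay in the code and be switched on with a verbosity option (for example -v) when a problem shows up.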

Debuggers and debugging tools

"debuggers do not remove bugs, they run your code in slow motion"
gdb/pdb type interface (AF), let it run until error happens
- demo-debugging/fortran/bugs
- explicit error to make it start
print statements
- Useful as starting point
- Maybe use logging module instead?
jupyter: %%debug (RD)
import IPython ; IPython.embed() or from code import interact; interact()
Valgrind and memory bugs (Radovan)
- C/C++ example with 3 memory bugs (use after free, memory leak, out of bounds access)
debugging through IDE; remote debugging
variable inspector (RD)

Bonus if we have time

git bisect example