RSH 5: Interview with a research software engineer
Video
Collaborative notes taken during the session
What did you work on in the past week?
- started to package one of my old research code (clean up, more tests, better pipeline, compatibility with sklearn) --> got inspired here
- ah, pipeline is more like: better usage from data to predictions
- Q: Maybe we could once talk about how to design test-code / -cases?
- I started to extensively test my program, this is important!
Interview with Sarah Gibson
- I like that, "two ends of the same spectrum"
- [name=Sarah] Terms origins: https://www.software.ac.uk/research-software-engineers
- this is an important aspect. As you know science first hand, you understand the needs and can talk to the community
- The book: https://the-turing-way.netlify.app/index.html
- Binder: https://mybinder.org/
- Q: example or live demo?
- yes we should
- yes - binderize a repo
- [name=Sarah] bit.ly/zero-to-binder-tutorial or bit.ly/zero-to-binderhub-workshop
- [name=Kirstie] Workshop that Sarah is instructing at on June 11 - open for everyone :sparkles: https://www.eventbrite.co.uk/e/workshop-boost-research-reproducibility-with-binder-registration-105228216428
- in CodeRefinery we could use repo2docker to convert the requirements.txt to an image
- What do you know now, that you would have liked known when you started?
- [name=Sarah] Version control is the magic time machine I wish I'd had during my PhD and everything broke :joy:
Snakemake files
- I guess they are also more often read than programmed, so readability is more important, right?
- Snakemake seems a very nice option. I think I may switch to it for our course as well. Make is very useful for keeping our data files updated (there are all sorts of preprocessing steps after which one make sure everything down the line is updated), but its syntax is super old and students don't really like fighting with it :) This seems much more easy to get through, thanks for the idea :) (from chat)
- the comment is Make syntax is super old (from chat)
- Are there noteworthy alternatives to snakemake? Somebody mentioned its syntax is super old...
Versioning data
- Data Version Control: https://dvc.org/
- [name=Richard] apparently dvc even added telemetry without adding notifications in advance: https://github.com/iterative/dvc/issues/1449
- Alternatives:
- https://git-annex.branchable.com/
- https://git-lfs.github.com/
- Autoformatting with clang-format
- [name=lnw] the 'Allman' is useful but I think for all options one does not set, a default is used.
Smaller demonstrations
- A couple of tools I started using which reimplemented classic tools
- [name=Radovan] find→fd, grep→ripgrep, ls→exa, cat→bat
Content ideas
- More advanced topics, please :)
- Debugging segfaults!
- More advanced is good, but of course you'll lose a subsection of people ...
- I for one cannot follow well the more advanced parts, but I do not mind them either. I think the mix has been good, and I like that I always learn something unexpected! :)
- I'm thinking that focusing on a topic for longer can make it easier for people to recognize what they want to know and ask questions. I do like the short mentions of several topics, but I think topic is changing a bit too fast for people to ask questions. I think this is the difference with the previous sessions (where you held the focus for a longer time). That said, I do enjoy the random idea drop very much, so please do keep doing that too :)
- advanced usage of GDB
- How to write proper unit tests for more complicated functions. (+)
- I was quite happy about snakemake. Any tool that makes dealing with pipelines easier would be useful to my students, so would be interested in finding out more.