RSH 14: How to tame the cluster
You've got code, you've got the cluster. Now we connect them.
Video
Collaborative notes taken during the session
Icebreaker question
Something you know/use now on a supercomputer that you wish somebody had told/shown you years ago:
- documentation
- safe settings for bash scripts to stop at undefined variables (see the sketch after this list)
- submit scripts don't have to be in bash
- how to use scp (example after this list)
- minimalism (minimum environment, minimum options/specification, no cargo culting)
- containers, venv and other means of isolation
- messy setup -> non-reproducible mess
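The "safe settings" point can be made concrete with the `set` builtin. A minimal sketch of a submit-script header using these options (the resource values and the OUTDIR variable are only placeholders):

```bash
#!/bin/bash
#SBATCH --time=00:10:00
#SBATCH --mem=1G

# Fail fast: -e exits on any command error, -u treats unset variables
# as errors, -o pipefail makes a pipeline fail if any stage fails.
set -euo pipefail

# With -u, using an unset variable aborts the script here instead of
# silently expanding to an empty string.
echo "Output directory: ${OUTDIR}"
```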
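For copying files to and from the cluster, `scp` follows the usual `scp source destination` pattern. A couple of illustrative commands (the user name, host name, and paths are made up):

```bash
# Copy a local script to a project directory on the cluster
scp submit.sh username@cluster.example.org:~/project/

# Copy a results directory from the cluster back to the current
# local directory (-r copies recursively)
scp -r username@cluster.example.org:~/project/results ./
```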
Questions
- Maybe I missed it, but do you sometimes check the different nodes to see whether each of them is being used?
    - The SLURM scheduler should only be giving you an open node to run your tasks.
- Does this cluster have hyperthreading enabled?
    - This is really good to check (one way to check is sketched after this list).
- Is it always like that, #cpus = #threads?
    - For purely shared-memory jobs like this one, yes.
    - For hybrid OpenMP+MPI calculations the number of MPI tasks and the number of threads per task are requested separately, so the two counts no longer coincide (sketch after this list).
- --mem vs --mem-per-cpu
    - The former allocates memory per node, the latter per allocated CPU core (example after this list).
- How do you determine whether you should use OpenMP/MPI or array jobs for your task?
    - If the tasks are independent and don't need to communicate, array jobs are simpler and don't require changing the code. MPI might be good if the tasks need to communicate. Hybrid OpenMP/MPI might be good if we want to use shared memory but at the same time go beyond one node. (A minimal array-job sketch follows this list.)
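On the hyperthreading question: one way to check is to run `lscpu` on a compute node and look at the threads-per-core figure; 1 means no hyperthreading, 2 means it is enabled. The partition name and time limit below are placeholders for whatever your cluster uses:

```bash
# Run lscpu on a compute node through the scheduler and inspect the
# threads-per-core and cores-per-socket lines.
srun --partition=batch --time=00:02:00 lscpu | grep -E 'Thread\(s\) per core|Core\(s\) per socket'
```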
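For the #cpus vs. #threads question: in a hybrid OpenMP+MPI job the MPI ranks and the threads per rank are requested with separate options, so the total CPU count is ranks × threads. A sketch of the relevant Slurm directives (the program name and the sizes are placeholders):

```bash
#!/bin/bash
#SBATCH --nodes=2                # spread the job over two nodes
#SBATCH --ntasks-per-node=2      # 2 MPI ranks per node
#SBATCH --cpus-per-task=4        # 4 OpenMP threads per MPI rank

# Tell OpenMP how many threads each rank may use.
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}

# Launch the MPI program; total CPUs = 2 nodes x 2 ranks x 4 threads = 16.
srun ./hybrid_program
```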
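For --mem vs --mem-per-cpu, the difference shows up directly in the batch script. Both requests below ask for the same total on 4 CPUs; the numbers and the program name are only illustrative:

```bash
#!/bin/bash
#SBATCH --cpus-per-task=4
#SBATCH --mem=8G             # per-node: 8 GB for the whole allocation on this node

# The equivalent per-core request would be (use one or the other, not both):
##SBATCH --mem-per-cpu=2G    # per-core: 2 GB x 4 CPUs = 8 GB total

srun ./my_program            # placeholder program
```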
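For the array-jobs question, a minimal sketch of an array job, assuming an existing serial program that takes the task index as an argument (the script name, range, and resources are placeholders):

```bash
#!/bin/bash
#SBATCH --array=0-9          # ten independent tasks, indices 0..9
#SBATCH --time=00:30:00
#SBATCH --mem=2G

# Each array task gets its own value of SLURM_ARRAY_TASK_ID, so the
# same submit script runs ten independent copies of the program.
srun python analyze.py --index "${SLURM_ARRAY_TASK_ID}"
```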
References
- https://github.com/AaltoSciComp/hpc-examples
- MPI example can be found here: https://aaltoscicomp.github.io/python-for-scicomp/parallel/#mpi