RSH 13: Cluster etiquette
How and why computer clusters are used for work
Collaborative notes taken during the session
Icebreaker question: What stops you from using a cluster?
- Primarily, my lack of knowledge about using a cluster for daily programming work. For example, I would not think of using a HPC cluster unless I’m working on a project that involves some kind of HPC in it or that deals with large scale distributed computing. Although I do think that you can do some nice experiments and benchmarking of your software on a cluster. The obstacle for me in that case is that I need to be familiar with the parallel programming techniques and libraries to be able to create chunks of programs that will run parallely on the cluster.This may be a bit of a steep learning curve.
Does this diagram represent a standard structure of a cluster?, or at least maybe most setups are like this?
- many clusters are like this (many more compute nodes). sometimes more than one login node, sometimes compute nodes carry also local disks for fast disk access for temporary files.
Can you explain the .slrm extension for the script you used?
- it was personal convention, no special meaning. Some people like to call these scripts job.sh or run.sh.
If I understand correctly, when you ran the file as a python script (the very first time you ran the code on the cluster), you ended up sharing computing resources with other people running their jobs and it took 11 seconds. But when you ran it as .slrm bash script, you entered a queue and got dedicated computing resources for yourself, hence it took 6 sec instead of 11. Did I interpret this right?
- Ok, that clarifies. Thanks!
Asking support questions: