seisflows.system.slurm
The Simple Linux Utility for Resource Management (SLURM) is a workload manager commonly used on high performance computers and clusters. The Slurm system class provides generalized utilities for interacting with Slurm systems.
- Useful commands for figuring out system-specific required parameters
$ sinfo --Node --long # Determine the cores-per-node for partitions
Note
SLURM was the main development system for SeisFlows. The other system classes may therefore not be up to date until access to those systems is granted. This rosetta stone for converting from SLURM to other workload management tools will be useful: https://slurm.schedmd.com/rosetta.pdf
Note
SLURM systems expect walltime/tasktime in one of the formats “minutes”, “minutes:seconds”, or “hours:minutes:seconds”. SeisFlows uses the last of these and converts task times and walltimes from an input in minutes to a time string.
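The conversion described in this note can be sketched as follows. This is a minimal illustration, not the SeisFlows implementation: the function name `minutes_to_slurm_time` is hypothetical, and rounding to the nearest whole second is an assumption.

```python
def minutes_to_slurm_time(minutes):
    """Convert a duration in minutes to an 'hours:minutes:seconds' string.

    Hypothetical helper; rounding to whole seconds is an assumption.
    """
    total_seconds = int(round(float(minutes) * 60))
    hours, remainder = divmod(total_seconds, 3600)
    mins, secs = divmod(remainder, 60)
    # Zero-pad each field to two digits, e.g. 90 minutes gives '01:30:00'
    return f"{hours:02d}:{mins:02d}:{secs:02d}"


print(minutes_to_slurm_time(90))   # 01:30:00
print(minutes_to_slurm_time(0.5))  # 00:00:30
```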
- TODO
Create ‘slurm_singularity’, a child class for singularity-based runs which loads and runs programs through singularity, OR add a parameter option which will change the run and/or submit calls
Classes
System Slurm |
Module Contents
- class seisflows.system.slurm.Slurm(ntask_max=100, slurm_args='', **kwargs)
Bases: seisflows.system.cluster.Cluster
System Slurm
Interface for submitting and monitoring jobs on HPC systems running the Simple Linux Utility for Resource Management (SLURM) workload manager.
Parameters
- type slurm_args:
str
- param slurm_args:
Any (optional) additional SLURM arguments that will be passed to the SBATCH scripts. Should be in the form: ‘--key1=value1 --key2=value2’
Paths
- __doc__
- ntask_max = 100
- slurm_args = ''
- partition = None
- submit_to = None
- _partitions
- _completed_states = ['COMPLETED']
- _failed_states = ['TIMEOUT', 'FAILED', 'NODE_FAIL', 'OUT_OF_MEMORY', 'CANCELLED']
- _pending_states = ['PENDING', 'RUNNING']
- check()
Checks parameters and paths
- property nodes
Defines the number of nodes which is derived from system node size
- property node_size
Defines the node size of a given cluster partition. This is a fixed number defined by the system architecture.
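As a rough illustration of how a node count can be derived from a node size, the sketch below rounds up the number of cores a task needs to whole nodes. The function name `nodes_required` and the use of a per-task core count `nproc` as the input are assumptions; the actual property may compute this differently.

```python
import math


def nodes_required(nproc, node_size):
    """Smallest number of whole nodes that provides `nproc` cores.

    Hypothetical helper: `nproc` is the cores needed by one task and
    `node_size` is the cores per node for the chosen partition.
    """
    return math.ceil(nproc / node_size)


print(nodes_required(48, 40))  # 2
```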
- property submit_call_header
The submit call defines the SBATCH header which is used to submit a workflow task list to the system. It is usually dictated by the system’s required parameters, such as account names and partitions. Submit calls are modified and called by the submit function.
- Return type:
str
- Returns:
the system-dependent portion of a submit call
- run_call(executable='', single=False, array=None, tasktime=None)
The run call defines the SBATCH call which is used to run tasks during an executing workflow. Like the submit call, its arguments are dictated by the given system. Run calls are modified and called by the run function.
- Parameters:
executable – the actual executable to run within the SBATCH directive. Something like ‘./script.py’
array (str) – overwrite the array variable to run specific jobs. If not provided, then we will run jobs 0-{ntask}%{ntask_max}. Jobs should be submitted in the format of a SLURM array string, something like: 0,1,3,5 or 2-4,8-22
single (bool) – flag to get a run call that is meant to be run on the mainsolver (ntask==1), or run for all jobs (ntask times). Examples of single process runs include smoothing, and kernel combination
- Return type:
str
- Returns:
the system-dependent portion of a run call
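To make the shape of such a call concrete, here is a minimal sketch that assembles an `sbatch` command string from an array specification, a time string, and optional extra arguments. The function `build_run_call` and its parameters are hypothetical; the exact set of options SeisFlows places in its run call is not shown in this page. The flags used (`--parsable`, `--array`, `--time`, `--ntasks`) are standard `sbatch` options.

```python
def build_run_call(executable, array, tasktime, nproc=1, slurm_args=""):
    """Assemble an illustrative sbatch command line (hypothetical helper)."""
    parts = [
        "sbatch",
        "--parsable",          # print only the job id (and cluster name)
        f"--array={array}",    # SLURM array string, e.g. '0-3%2' or '0,1,3,5'
        f"--time={tasktime}",  # time string, e.g. '00:05:00'
        f"--ntasks={nproc}",   # cores requested per array task (assumption)
    ]
    if slurm_args:
        # Extra user arguments in the form '--key1=value1 --key2=value2'
        parts.append(slurm_args)
    parts.append(executable)
    return " ".join(parts)


print(build_run_call("./script.py", array="0-3%2", tasktime="00:05:00"))
```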
- static _stdout_to_job_id(stdout)
The stdout message returned after an SBATCH job is submitted, from which we get the job number, differs between systems. This function allows the parsing of that message to vary.
Note
Examples:
1) standard example: Submitted batch job 4738244
2) (1) with ‘--parsable’ flag: 4738244
3) federated cluster: Submitted batch job 4738244; Maui
4) (3) with ‘--parsable’ flag: 4738244; Maui
This function deals with cases (2) and (4). Other systems that have more complicated stdout messages will need to overwrite this function
- Parameters:
stdout (str) – standard SBATCH response after submitting a job with the ‘--parsable’ flag
- Return type:
str
- Returns:
a matching job ID. We convert str->int->str to ensure that the job id is an integer value (which it must be)
- Raises:
SystemExit – if the job id does not evaluate as an integer
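The parsing described above for cases (2) and (4) can be sketched as follows. This is a minimal stand-alone version, not the SeisFlows source: the function name lacks the leading underscore and the error message is an assumption.

```python
import sys


def stdout_to_job_id(stdout):
    """Extract the job id from '--parsable' sbatch output.

    Handles '4738244' as well as the federated form '4738244; Maui'.
    """
    # Keep only the part before any ';' that separates the cluster name
    job_id = stdout.strip().split(";")[0]
    try:
        # str -> int -> str ensures the id is an integer value
        return str(int(job_id))
    except ValueError:
        sys.exit(f"could not parse a job id from stdout: {stdout!r}")


print(stdout_to_job_id("4738244\n"))       # 4738244
print(stdout_to_job_id("4738244; Maui"))   # 4738244
```

Output without the ‘--parsable’ flag, such as "Submitted batch job 4738244", does not evaluate as an integer and causes this sketch to exit.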
- run(funcs, single=False, tasktime=None, array=None, _attempts=0, **kwargs)
Runs tasks multiple times in an embarrassingly parallel fashion on a SLURM cluster. Executes the list of functions (funcs) NTASK times with each task occupying NPROC cores.
Note
Completely overwrites the Cluster.run() command
- Parameters:
funcs (list of methods) – a list of functions that should be run in order. All kwargs passed to run() will be passed into the functions.
single (bool) – run a single-process, non-parallel task, such as smoothing the gradient, which only needs to be run once. This will change how the job array and the number of tasks are defined, such that the job is submitted as a single-core job to the system.
tasktime (float) – Custom tasktime in units of minutes for running the given functions funcs. If not given, defaults to the System variable tasktime. If tasks exceed the given tasktime, the program will exit
array (str) – overwrite the array variable to run specific jobs. If not provided, then we will run jobs 0-{ntask}%{ntask_max}. Jobs should be submitted in the format of a SLURM array string, something like: 0,1,3,5 or 2-4,8-22
_attempts (int) – a recursive counter for failed job runs that allows the run function to re-attempt failed jobs up to rerun times
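The interplay between `array` and `_attempts` can be illustrated with a small recursive sketch. Everything here is hypothetical: `run_with_retries` and the `submit` callable (which takes a SLURM array string and returns the task ids that failed) stand in for the real submission and monitoring logic, which is not shown in this page.

```python
def run_with_retries(submit, array, rerun=2, _attempts=0):
    """Re-submit only the failed task ids, at most `rerun` times.

    `submit` is a hypothetical callable: array string -> list of failed ids.
    Returns the number of re-attempts that were needed.
    """
    failed_ids = submit(array)
    if not failed_ids:
        return _attempts
    if _attempts >= rerun:
        raise SystemExit(
            f"tasks {failed_ids} still failing after {_attempts} re-attempts"
        )
    # Build a SLURM array string such as '1,3' from the failed task ids
    retry_array = ",".join(str(i) for i in sorted(failed_ids))
    return run_with_retries(submit, retry_array, rerun, _attempts + 1)


# Usage with a fake submitter: tasks 1 and 3 fail once, then succeed
calls = []

def fake_submit(array):
    calls.append(array)
    return [3, 1] if len(calls) == 1 else []

print(run_with_retries(fake_submit, "0-3%2"))  # 1
print(calls)                                   # ['0-3%2', '1,3']
```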
- task_ids(single=False)
Overwrites system.workstation.task_ids to get SLURM-specific array configurations. These are passed as strings for the --array={task_ids()} SLURM directive, rather than as lists, which is how system.workstation handles them.
Relevant format definition: https://slurm.schedmd.com/job_array.html
- Parameters:
single (bool) – If we only want to run a single process, this will default to TaskID == 0
- Return type:
str
- Returns:
string formatter of Task IDs to be used by the run function via the run_call
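A minimal sketch of such a string formatter is shown below, written as a free function so it can be run on its own. It assumes zero-based task ids, so that `ntask` tasks span `0` to `ntask - 1` (SLURM array ranges are inclusive); the exact upper bound used by SeisFlows should be checked against its source. The `%` suffix is SLURM's syntax for limiting how many array tasks run at once.

```python
def task_ids(ntask, ntask_max, single=False):
    """Return a SLURM array string for the --array directive (sketch)."""
    if single or ntask == 1:
        # Single-process runs only use TaskID 0
        return "0"
    # Inclusive, zero-based range with at most `ntask_max` tasks running at once
    return f"0-{ntask - 1}%{ntask_max}"


print(task_ids(ntask=4, ntask_max=100))               # 0-3%100
print(task_ids(ntask=4, ntask_max=100, single=True))  # 0
```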
- query_job_states(job_id, sort=False)
Overwrites system.cluster.Cluster.query_job_states
Queries the completion status of an array job by running the SLURM sacct command
Note
The actual command line call will look something like this:
$ sacct -nLX -o jobid,state -j 441630
441630_0 PENDING
441630_1 COMPLETED
Note
SACCT flag options are described as follows:
- -n: omits the header line from the output
- -L: queries all available clusters, not just the cluster that ran the sacct call. Used for federated clusters
- -X: suppresses the .batch and .extern jobnames that are normally returned but don’t represent the actual running job
- Parameters:
job_id (str) – main job id to query, returned from the subprocess.run that ran the jobs
sort (bool) – sort by job ids or job array ids. Defaults to False because currently running jobs may return job numbers that cannot be sorted e.g., 1_0, 1_1, 1_[2-5]. We only use sort when recovering from job failure because then we are assured that all jobs have run.
- Return type:
(list, list)
- Returns:
(job ids, corresponding job states). Returns (None, None) if sacct does not return a useful stdout (e.g., jobs have not yet initialized on system)
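The sketch below shows one way the two-column `sacct` output above could be turned into the documented `(job ids, job states)` pair and then compared against the class's state lists. The helper names `parse_sacct` and `summarize` are hypothetical, sorting by the array index after the underscore is an assumption, and the real method also runs the `sacct` command itself, which is omitted here.

```python
COMPLETED_STATES = ["COMPLETED"]
FAILED_STATES = ["TIMEOUT", "FAILED", "NODE_FAIL", "OUT_OF_MEMORY", "CANCELLED"]
PENDING_STATES = ["PENDING", "RUNNING"]


def parse_sacct(stdout, sort=False):
    """Split 'jobid state' lines into (job_ids, states); (None, None) if empty."""
    rows = [line.split() for line in stdout.strip().splitlines()]
    rows = [row for row in rows if len(row) >= 2]
    if not rows:
        return None, None
    if sort:
        # Only safe once all jobs have run, so ids look like '441630_7'
        rows.sort(key=lambda row: int(row[0].split("_")[-1]))
    return [row[0] for row in rows], [row[1] for row in rows]


def summarize(states):
    """Collapse per-task states into 'failed', 'completed', or 'pending'."""
    if any(state in FAILED_STATES for state in states):
        return "failed"
    if all(state in COMPLETED_STATES for state in states):
        return "completed"
    return "pending"


job_ids, states = parse_sacct("441630_0 PENDING\n441630_1 COMPLETED\n")
print(job_ids, states, summarize(states))
```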