seisflows.system.maui
Maui is a New Zealand eScience Infrastructure (NeSI) high performance computer. Maui operates on a SLURM workload manager and therefore overloads the SLURM System module. Maui-specific parameters and functions are defined here.
Information on Maui can be found here: https://support.nesi.org.nz/hc/en-gb/articles/360000163695-M%C4%81ui
Note
Python and conda capabilities are NOT accessible from Maui; these capabilities have been shifted onto a separate cluster, Maui ancil. This subclass therefore moves all Python-dependent capabilities (i.e., SeisFlows3, Pyatoa) onto the ancillary cluster.
See also: https://support.nesi.org.nz/hc/en-gb/articles/360000203776-M%C4%81ui-Ancillary-Nodes
Classes
System Maui
Module Contents
- class seisflows.system.maui.Maui(account=None, cpus_per_task=1, cluster='maui', partition='nesi_research', ancil_cluster='maui_ancil', ancil_partition='nesi_prepost', ancil_tasktime=1, **kwargs)
Bases: seisflows.system.slurm.Slurm
System Maui
New Zealand Maui-specific modifications to base SLURM system
Parameters
- type account:
str
- param account:
Maui account to submit jobs under; used for the '--account' sbatch argument
- type cpus_per_task:
int
- param cpus_per_task:
allow for multiple CPUs per task, i.e., multithreaded jobs
- type cluster:
str
- param cluster:
cluster to submit jobs to. Available clusters are 'maui' and 'mahuika'.
- type partition:
str
- param partition:
partition of the cluster to submit jobs to.
- type ancil_cluster:
str
- param ancil_cluster:
name of the ancillary cluster used for pre- and post-processing tasks.
- type ancil_partition:
str
- param ancil_partition:
name of the partition of the ancillary cluster
- type ancil_tasktime:
int
- param ancil_tasktime:
task time, in minutes, for pre- and post-processing jobs submitted to Maui ancil.
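The class docstring notes that SLURM expects time limits as time strings and that SeisFlows converts minute values into the "hours:minutes:seconds" form. The sketch below illustrates one way such a conversion could look for a value like ancil_tasktime. The function name minutes_to_slurm_time is hypothetical and is not taken from the SeisFlows source.

```python
def minutes_to_slurm_time(minutes: float) -> str:
    """Convert a duration in minutes to an 'hours:minutes:seconds' string.

    Hypothetical helper for illustration only; SeisFlows may implement
    this conversion differently.
    """
    if minutes < 0:
        raise ValueError("task time must be non-negative")

    # Work in whole seconds so that fractional minutes are supported
    total_seconds = int(round(minutes * 60))
    hours, remainder = divmod(total_seconds, 3600)
    mins, secs = divmod(remainder, 60)
    return f"{hours}:{mins:02d}:{secs:02d}"


if __name__ == "__main__":
    print(minutes_to_slurm_time(1))     # default ancil_tasktime of 1 minute
    print(minutes_to_slurm_time(90.5))  # fractional minutes
```

Working in whole seconds keeps fractional minute inputs, which the docstring says are acceptable, from producing malformed time strings.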
Paths
- __doc__ = Multiline-String
"""
Workstation System [System Base]
--------------------------------
Defines foundational structure for System module. When used standalone, runs
solver tasks either in serial (if `nproc`==1; i.e., without MPI) or in parallel
(if `nproc`>1; i.e., with MPI). All other tasks are run in serial.

Parameters
----------
:type ntask: int
:param ntask: number of individual tasks/events to run during workflow.
    Must be <= the number of source files in `path_specfem_data`
:type nproc: int
:param nproc: number of processors to use for each simulation. Choose 1 for
    serial simulations, and `nproc`>1 for parallel simulations.
:type tasktime: float
:param tasktime: maximum job time in units of minutes for each job spawned by
    the SeisFlows master job during a workflow. These include, e.g.,
    running the forward solver, adjoint solver, smoother, kernel combiner.
    All spawned tasks receive the same task time. Fractions of minutes
    acceptable. If set as `None`, no tasktime will be enforced.
:type mpiexec: str
:param mpiexec: MPI executable on system. Defaults to 'mpirun -n ${NPROC}'
:type array: str
:param array: for `ntask` > 1, determine which tasks to submit to run. By
    default (NoneType) this submits all task IDs [0:ntask), or for single
    runs, submits only the first task ID, 0. However, for debugging or
    manual control purposes, Users may input a string of task IDs that they
    would like to run. Follows formatting of SLURM array directive
    (https://slurm.schedmd.com/job_array.html), which is, for example:
    1,2,3-8:2,10 -> 1,2,3,5,7,10
    where '-' denotes a range (inclusive), and ':' denotes an optional step.
    If ':' step is not given for a range, then step defaults to 1.
:type rerun: int
:param rerun: [EXPERIMENTAL FEATURE] attempt to re-run failed tasks or array
    tasks submitted with `run`. Collects information about failed jobs (or
    array jobs) after a failure, and re-submits with `run`. `rerun` is an
    integer defining how many times the User wants System to try and rerun
    before failing the entire job. If 0 (default), a single task failure
    will cause main job failure.
:type log_level: str
:param log_level: logger level to pass to logging module.
    Available: 'debug', 'info', 'warning', 'critical'
:type verbose: bool
:param verbose: if True, formats the log messages to include the file
    name and line number of the log message in the source code, as well as
    the message and message type. Useful for debugging but also very verbose
    so not recommended for production runs.

Paths
-----
:type path_output_log: str
:param path_output_log: path to a text file used to store the outputs of
    the package wide logger, which are also written to stdout
:type path_par_file: str
:param path_par_file: path to parameter file which is used to instantiate
    the package
:type path_log_files: str
:param path_log_files: path to a directory where individual log files are
    saved whenever a number of parallel tasks are run on the system.
***
The Cluster class provides the core utilities for interaction with HPC systems,
which must be overloaded by subclasses for specific workload managers or
specific clusters.

The `Cluster` class acts as a base class for more specific cluster
implementations (like SLURM). However it can be used standalone. When running
jobs on the `Cluster` system, jobs will be submitted to the master system
using `subprocess.run`, mimicking how jobs would be run on a cluster but not
actually submitting to any job scheduler.

The Simple Linux Utility for Resource Management (SLURM) is a commonly used
workload manager on many high performance computers / clusters. The Slurm
system class provides generalized utilities for interacting with Slurm systems.

Useful commands for figuring out system-specific required parameters
    $ sinfo --Node --long  # Determine the cores-per-node for partitions

.. note::
    The main development system for SeisFlows used SLURM. Therefore the other
    system supers will not be up to date until access to those systems is
    granted. This rosetta stone, for converting from SLURM to other workload
    management tools, will be useful: https://slurm.schedmd.com/rosetta.pdf

.. note::
    SLURM systems expect walltime/tasktime in format: "minutes",
    "minutes:seconds", "hours:minutes:seconds". SeisFlows uses the latter
    and converts task and walltimes from input of minutes to a time string.

TODO
    Create 'slurm_singularity', a child class for singularity-based runs which
    loads and runs programs through singularity, OR add a parameter option
    which will change the run and/or submit calls

Maui is a New Zealand eScience Infrastructure (NeSI) high performance computer.
Maui operates on a SLURM workload manager and therefore overloads the SLURM
System module. Maui-specific parameters and functions are defined here.

Information on Maui can be found here:
https://support.nesi.org.nz/hc/en-gb/articles/360000163695-M%C4%81ui

.. note::
    Python and conda capabilities are NOT accessible from Maui; these
    capabilities have been shifted onto a separate cluster, Maui ancil.
    This subclass therefore moves all Python-dependent capabilities
    (i.e., SeisFlows3, Pyatoa) onto the ancillary cluster.

    See also: https://support.nesi.org.nz/hc/en-gb/articles/360000203776-M%C4%81ui-Ancillary-Nodes
"""
- account = None
- cluster = 'maui'
- partition = 'nesi_research'
- cpus_per_task = 1
- ancil_cluster = 'maui_ancil'
- ancil_partition = 'nesi_prepost'
- ancil_tasktime = 1
- _partitions
- _available_clusters = ['maui', 'mahuika']
- check()
Checks parameters and paths
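The documentation does not list which checks are performed. As an illustration only, the sketch below shows the kind of validation a check like this might apply to the parameters listed above. The function name check_maui_parameters and the specific rules are assumptions, not the actual SeisFlows implementation.

```python
# Mirrors the `_available_clusters` attribute listed for this class
AVAILABLE_CLUSTERS = ["maui", "mahuika"]


def check_maui_parameters(account, cluster, cpus_per_task=1, ancil_tasktime=1):
    """Return a list of human-readable problems (empty if none are found).

    Hypothetical stand-in for illustration; the real `check` may differ.
    """
    problems = []
    if account is None:
        # Assumed rule: a job cannot be billed without an account
        problems.append("`account` must be set for the '--account' argument")
    if str(cluster).lower() not in AVAILABLE_CLUSTERS:
        problems.append(f"`cluster` must be one of {AVAILABLE_CLUSTERS}")
    if not isinstance(cpus_per_task, int) or cpus_per_task < 1:
        problems.append("`cpus_per_task` must be a positive integer")
    if ancil_tasktime is None or ancil_tasktime <= 0:
        problems.append("`ancil_tasktime` must be a positive number of minutes")
    return problems


if __name__ == "__main__":
    # 'my_account' is a placeholder account name
    print(check_maui_parameters(account="my_account", cluster="maui"))
    print(check_maui_parameters(account=None, cluster="unknown"))
```

Collecting all problems before reporting, rather than failing on the first one, lets a user fix a parameter file in a single pass.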
- property submit_call_header
The submit call defines the SBATCH header which is used to submit a workflow task list to the system. It is usually dictated by the system’s required parameters, such as account names and partitions. Submit calls are modified and called by the submit function.
Note
The master job must be run on maui_ancil because Maui cannot run the command “sacct”, nor can it use the Conda environment that has been set up on Ancil.
Note
SLURMARGS are not placed into the sbatch command, to avoid export=None, which would prevent the Conda environment from propagating.
- Return type:
str
- Returns:
the system-dependent portion of a submit call
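To make the description concrete, the sketch below assembles a submit header that sends the master job to the ancillary cluster, as the notes above require. The function name build_submit_header and the job name, wall time, and log file values are hypothetical; the sbatch options used (--account, --clusters, --partition, --job-name, --time, --output) are standard SLURM flags, but the exact set SeisFlows passes is an assumption.

```python
def build_submit_header(account, ancil_cluster="maui_ancil",
                        ancil_partition="nesi_prepost",
                        job_name="seisflows", walltime="1:00:00",
                        output_log="sflog.txt"):
    """Assemble an sbatch header for the master job as a single string.

    Illustrative only: the options SeisFlows actually uses may differ.
    """
    parts = [
        "sbatch",
        f"--account={account}",
        # The master job targets the ancillary cluster, where `sacct`
        # and the Conda environment are available
        f"--clusters={ancil_cluster}",
        f"--partition={ancil_partition}",
        f"--job-name={job_name}",
        f"--time={walltime}",
        f"--output={output_log}",
    ]
    return " ".join(parts)


if __name__ == "__main__":
    # 'my_account' is a placeholder account name
    print(build_submit_header(account="my_account"))
```

Returning the header as a plain string matches the documented return type (str) and leaves it to the caller to append the script to submit.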
- property run_call_header
The run call defines the SBATCH header which is used to run tasks during an executing workflow. Like the submit call, its arguments are dictated by the given system. Run calls are modified and called by the run function.
- Return type:
str
- Returns:
the system-dependent portion of a run call
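The class docstring describes an `array` parameter that follows the SLURM array directive format (for example, 1,2,3-8:2,10 expands to 1,2,3,5,7,10). The sketch below expands such a string into task IDs and places it in a run header aimed at the main cluster. The names expand_array and build_run_header and the default task time are hypothetical, and the chosen sbatch options are an assumption about what a run header could contain.

```python
from typing import List


def expand_array(array: str) -> List[int]:
    """Expand a SLURM-style array string such as '1,2,3-8:2,10' into task IDs."""
    task_ids = []
    for part in array.split(","):
        part = part.strip()
        if "-" in part:
            # 'start-stop[:step]'; the range is inclusive, step defaults to 1
            bounds, _, step = part.partition(":")
            start, _, stop = bounds.partition("-")
            task_ids.extend(
                range(int(start), int(stop) + 1, int(step) if step else 1)
            )
        else:
            task_ids.append(int(part))
    return task_ids


def build_run_header(account, array, cluster="maui",
                     partition="nesi_research", cpus_per_task=1,
                     tasktime="0:10:00"):
    """Assemble an sbatch header for array tasks (illustrative only)."""
    parts = [
        "sbatch",
        f"--account={account}",
        f"--clusters={cluster}",
        f"--partition={partition}",
        f"--cpus-per-task={cpus_per_task}",
        f"--time={tasktime}",
        f"--array={array}",
    ]
    return " ".join(parts)


if __name__ == "__main__":
    print(expand_array("1,2,3-8:2,10"))
    # 'my_account' is a placeholder account name
    print(build_run_header(account="my_account", array="0-3"))
```

Expanding the array string locally is useful for logging which task IDs a run will cover; the unexpanded string can be passed to --array unchanged.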
- property ancil_run_call_header
A modified form of run_call_header which is used to run jobs on the Ancil pre/postprocessing cluster of Maui. This is used to run Pyaflowa jobs, which require the Conda environment active on Maui Ancil.