seisflows.system.maui

Maui is a New Zealand eScience Infrastructure (NeSI) high performance computer. Maui operates on a SLURM workload manager and therefore overloads the SLURM System module. Maui-specific parameters and functions are defined here.

Information on Maui can be found here: https://support.nesi.org.nz/hc/en-gb/articles/360000163695-M%C4%81ui

Note

Python and conda capabilities are NOT accessible from Maui; these capabilities have been shifted onto a separate cluster: Maui ancil. This subclass therefore moves all Python-dependent capabilities (i.e., SeisFlows3, Pyatoa) onto the ancillary cluster.

See also: https://support.nesi.org.nz/hc/en-gb/articles/360000203776-M%C4%81ui-Ancillary-Nodes

Classes

Maui

System Maui

Module Contents

class seisflows.system.maui.Maui(account=None, cpus_per_task=1, cluster='maui', partition='nesi_research', ancil_cluster='maui_ancil', ancil_partition='nesi_prepost', ancil_tasktime=1, **kwargs)

Bases: seisflows.system.slurm.Slurm

System Maui

New Zealand Maui-specific modifications to the base SLURM system

Parameters

type account:

str

param account:

Maui account to submit jobs under; used for the '--account' sbatch argument

type cpus_per_task:

int

param cpus_per_task:

allow for multiple CPUs per task, i.e., multithreaded jobs

type cluster:

str

param cluster:

cluster to submit jobs to. Available are Maui and Mahuika

type partition:

str

param partition:

partition of the cluster to submit jobs to.

type ancil_cluster:

str

param ancil_cluster:

name of the ancillary cluster used for pre- and post-processing tasks.

type ancil_partition:

str

param ancil_partition:

name of the partition of the ancillary cluster

type ancil_tasktime:

int

param ancil_tasktime:

Tasktime in minutes for pre- and post-processing jobs submitted to Maui ancil.

Paths

***

__doc__ = Multiline-String
"""
    Workstation System [System Base]
    --------------------------------
    Defines foundational structure for System module. When used standalone,
    runs solver tasks either in serial (if `nproc`==1; i.e., without MPI) or in
    parallel (if `nproc`>1; i.e., with MPI). All other tasks are run in serial.

    Parameters
    ----------
    :type ntask: int
    :param ntask: number of individual tasks/events to run during workflow.
        Must be <= the number of source files in `path_specfem_data`
    :type nproc: int
    :param nproc: number of processors to use for each simulation. Choose 1 for
        serial simulations, and `nproc`>1 for parallel simulations.
    :type tasktime: float
    :param tasktime: maximum job time in units of minutes for each job spawned by
        the SeisFlows master job during a workflow. These include, e.g.,
        running the forward solver, adjoint solver, smoother, kernel combiner.
        All spawned tasks receive the same task time. Fractions of minutes
        acceptable. If set as `None`, no tasktime will be enforced.
    :type mpiexec: str
    :param mpiexec: MPI executable on system. Defaults to 'mpirun -n ${NPROC}'
    :type array: str
    :param array: for `ntask` > 1, determine which tasks to submit to run. By
        default (NoneType) this submits all task IDs [0:ntask), or for single
        runs, submits only the first task ID, 0. However, for debugging or
        manual control purposes, Users may input a string of task IDs that they
        would like to run. Follows formatting of SLURM array directive
        (https://slurm.schedmd.com/job_array.html), which is, for example:
        1,2,3-8:2,10 -> 1,2,3,5,7,10
        where '-' denotes a range (inclusive), and ':' denotes an optional step.
        If ':' step is not given for a range, then step defaults to 1.
    :type rerun: int
    :param rerun: [EXPERIMENTAL FEATURE] attempt to re-run failed tasks or
        array tasks submitted with `run`. Collects information about failed
        jobs (or array jobs) after a failure, and re-submits with `run`.
        `rerun` is an integer defining how many times the User wants System to
        try and rerun before failing the entire job. If 0 (default), a single
        task failure will cause main job failure.
    :type log_level: str
    :param log_level: logger level to pass to logging module.
        Available: 'debug', 'info', 'warning', 'critical'
    :type verbose: bool
    :param verbose: if True, formats the log messages to include the file
        name and line number of the log message in the source code, as well as
        the message and message type. Useful for debugging but also very verbose
        so not recommended for production runs.

    Paths
    -----
    :type path_output_log: str
    :param path_output_log: path to a text file used to store the outputs of
        the package wide logger, which are also written to stdout
    :type path_par_file: str
    :param path_par_file: path to parameter file which is used to instantiate
        the package
    :type path_log_files: str
    :param path_log_files: path to a directory where individual log files are
        saved whenever a number of parallel tasks are run on the system.
    ***

The Cluster class provides the core utilities for interacting with HPC systems,
which must be overloaded by subclasses for specific workload managers or
specific clusters.

The `Cluster` class acts as a base class for more specific cluster
implementations (like SLURM). However it can be used standalone. When running
jobs on the `Cluster` system, jobs will be submitted to the master system
using `subprocess.run`, mimicking how jobs would be run on a cluster but not
actually submitting to any job scheduler.

The Simple Linux Utility for Resource Management (SLURM) is a commonly used
workload manager on many high performance computers / clusters. The Slurm
system class provides generalized utilities for interacting with Slurm systems.

Useful commands for figuring out system-specific required parameters
    $ sinfo --Node --long  # Determine the cores-per-node for partitions

.. note::
    The main development system for SeisFlows used SLURM. Therefore the other
    system supers will not be up to date until access to those systems is
    granted. This rosetta stone for converting from SLURM to other workload
    management tools will be useful: https://slurm.schedmd.com/rosetta.pdf

.. note::
   SLURM systems expect walltime/tasktime in format: "minutes",
   "minutes:seconds", "hours:minutes:seconds". SeisFlows uses the latter
   and converts task and walltimes from input of minutes to a time string.

TODO
    Create 'slurm_singularity', a child class for singularity-based runs which
    loads and runs programs through singularity, OR add parameter options
    which will change the run and/or submit calls

Maui is a New Zealand eScience Infrastructure (NeSI) high performance computer.
Maui operates on a SLURM workload manager and therefore overloads the SLURM
System module. Maui-specific parameters and functions are defined here.

Information on Maui can be found here:
https://support.nesi.org.nz/hc/en-gb/articles/360000163695-M%C4%81ui

.. note::
    Python and conda capabilities are NOT accessible from Maui; these
    capabilities have been shifted onto a separate cluster: Maui ancil.
    This subclass therefore moves all Python-dependent capabilities
    (i.e., SeisFlows3, Pyatoa) onto the ancillary cluster.

    See also: https://support.nesi.org.nz/hc/en-gb/articles/360000203776-M%C4%81ui-Ancillary-Nodes

"""
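
The `array` format described in the docstring above can be illustrated with a short, self-contained sketch. The function name `parse_task_ids` is hypothetical and is not taken from SeisFlows; it only follows the rules stated in the docstring ('-' is an inclusive range, ':' is an optional step that defaults to 1).

```python
def parse_task_ids(array):
    """Expand a SLURM-style array string into a sorted list of task IDs.

    Hypothetical helper for illustration; not the SeisFlows implementation.
    """
    task_ids = set()
    for part in array.split(","):
        part = part.strip()
        if "-" in part:
            # A range such as '3-8', optionally followed by ':step'
            span, _, step = part.partition(":")
            start, end = (int(value) for value in span.split("-"))
            step = int(step) if step else 1
            # SLURM ranges are inclusive, so extend the end by one
            task_ids.update(range(start, end + 1, step))
        else:
            # A single task ID
            task_ids.add(int(part))
    return sorted(task_ids)


print(parse_task_ids("1,2,3-8:2,10"))  # [1, 2, 3, 5, 7, 10]
```

The printed list matches the expansion given in the docstring example.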
account = None
cluster = 'maui'
partition = 'nesi_research'
cpus_per_task = 1
ancil_cluster = 'maui_ancil'
ancil_partition = 'nesi_prepost'
ancil_tasktime = 1
_partitions
_available_clusters = ['maui', 'mahuika']
check()

Checks parameters and paths
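
This page does not list what `check` verifies. As a rough sketch only, the checks below are assumptions based on the parameters and the `_available_clusters` attribute listed above; the function name `check_maui_parameters` and the account code are hypothetical.

```python
AVAILABLE_CLUSTERS = ["maui", "mahuika"]  # mirrors `_available_clusters` above


def check_maui_parameters(account, cluster="maui", cpus_per_task=1,
                          ancil_tasktime=1):
    """Hypothetical stand-in for `check`; the real checks may differ."""
    if account is None:
        raise ValueError("`account` must be set to submit jobs on Maui")
    if cluster.lower() not in AVAILABLE_CLUSTERS:
        raise ValueError(
            f"`cluster` must be one of {AVAILABLE_CLUSTERS}, got '{cluster}'"
        )
    if cpus_per_task < 1:
        raise ValueError("`cpus_per_task` must be >= 1")
    if ancil_tasktime <= 0:
        raise ValueError("`ancil_tasktime` must be a positive number of minutes")


# 'abc12345' is a made-up account code used only for illustration
check_maui_parameters(account="abc12345", cluster="maui")
```

Failing early on a missing account or an unknown cluster avoids submitting a job that the scheduler would reject.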

property submit_call_header

The submit call defines the SBATCH header which is used to submit a workflow task list to the system. It is usually dictated by the system’s required parameters, such as account names and partitions. Submit calls are modified and called by the submit function.

Note

The master job must be run on maui_ancil because Maui does not have the ability to run the command “sacct”, nor can it use the Conda environment that has been set by Ancil.

Note

We do not place SLURMARGS into the sbatch command, to avoid export=None, which would not propagate the Conda environment.

Return type:

str

Returns:

the system-dependent portion of a submit call
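
To make the idea of a submit call header concrete, here is a minimal sketch of how such a string might be assembled. It is not the SeisFlows implementation: the function name `build_submit_header`, the job name, the walltime, and the account code are all hypothetical. Only standard `sbatch` options are used, and no `--export` option is added, in line with the note above.

```python
def build_submit_header(account, ancil_cluster="maui_ancil",
                        ancil_partition="nesi_prepost",
                        walltime="01:00:00", job_name="seisflows"):
    """Assemble an sbatch header for the master job (hypothetical sketch).

    The master job targets the ancillary cluster, as the note above requires.
    """
    flags = [
        "sbatch",
        f"--account={account}",
        f"--clusters={ancil_cluster}",
        f"--partition={ancil_partition}",
        f"--job-name={job_name}",
        f"--time={walltime}",
    ]
    return " ".join(flags)


# 'abc12345' is a made-up account code used only for illustration
print(build_submit_header(account="abc12345"))
```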

property run_call_header

The run call defines the SBATCH header which is used to run tasks during an executing workflow. Like the submit call, its arguments are dictated by the given system. Run calls are modified and called by the run function.

Return type:

str

Returns:

the system-dependent portion of a run call

property ancil_run_call_header

A modified form of run_call which is used to run jobs on the Ancil pre/postprocessing cluster of Maui. This is used to run Pyaflowa jobs which require the Conda environment active on Maui Ancil.
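
As an illustration of how an ancillary run header could combine `ancil_cluster`, `ancil_partition`, and `ancil_tasktime`, consider the sketch below. Both function names and the account code are hypothetical and the real header may contain different options; the time conversion follows the "hours:minutes:seconds" format mentioned in the class docstring.

```python
def minutes_to_slurm_time(minutes):
    """Convert minutes (fractions allowed) to an 'HH:MM:SS' string."""
    total_seconds = int(round(minutes * 60))
    hours, remainder = divmod(total_seconds, 3600)
    mins, secs = divmod(remainder, 60)
    return f"{hours:02d}:{mins:02d}:{secs:02d}"


def build_ancil_run_header(account, ancil_tasktime=1,
                           ancil_cluster="maui_ancil",
                           ancil_partition="nesi_prepost"):
    """Assemble an sbatch header for an ancillary job (hypothetical sketch)."""
    flags = [
        "sbatch",
        f"--account={account}",
        f"--clusters={ancil_cluster}",
        f"--partition={ancil_partition}",
        f"--time={minutes_to_slurm_time(ancil_tasktime)}",
    ]
    return " ".join(flags)


print(minutes_to_slurm_time(90.5))  # 01:30:30
# 'abc12345' is a made-up account code used only for illustration
print(build_ancil_run_header(account="abc12345", ancil_tasktime=1))
```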