seisflows.system.singularity

A Cluster-adjacent base class that provides core utilities for interactions with HPC systems running Singularity. Must be overloaded by subclasses defined for specific workload managers / clusters.

The Singularity class was written for clusters that run SeisFlows through Docker containers using Singularity. A separate class is needed because Docker containers do not have access to the workload manager (e.g., SLURM/sbatch), so job submission calls cannot be run directly from the Python environment. Instead, each time a job must be submitted to the cluster, the User must submit it manually.

Note

Users looking to run SeisFlows directly in their cluster Conda environment should see the Cluster class and its workload manager-specific subclasses.

Classes

Singularity

Singularity System

Module Contents

class seisflows.system.singularity.Singularity(title=None, mpiexec='', ntask_max=None, tasktime=1, environs='', singularity_exec='singularity', path_container=None, **kwargs)

Bases: seisflows.system.workstation.Workstation

Singularity System

HPC interfacing through Docker/Singularity containers

Parameters

  • title (str) – The name used to submit jobs to the system. Defaults to the name of the current working directory.

  • mpiexec (str) – Function used to invoke executables on the system, for example ‘mpirun’, ‘mpiexec’, ‘srun’, ‘ibrun’.

  • ntask_max (int) – Limit on the number of concurrent tasks in a given array job.

  • tasktime (float) – Maximum job time in minutes for each job spawned by the SeisFlows master job during a workflow. These include, e.g., running the forward solver, adjoint solver, smoother, kernel combiner. All spawned tasks receive the same task time. Fractions of minutes are acceptable.

  • environs (str) – Optional environment variables, provided in the format VAR1=var1,VAR2=var2… These will be set using os.environ.
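The environs format can be illustrated with a minimal sketch. The helper names `parse_environs` and `set_environs` are hypothetical and are not part of SeisFlows; the sketch only assumes that the string is a comma-separated list of NAME=value pairs, as described above.

```python
import os


def parse_environs(environs):
    """Split a 'VAR1=var1,VAR2=var2' string into a dict (hypothetical helper)."""
    pairs = {}
    for item in environs.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate an empty string or a trailing comma
        key, _, value = item.partition("=")
        pairs[key.strip()] = value.strip()
    return pairs


def set_environs(environs):
    """Export each parsed pair into the current process environment."""
    pairs = parse_environs(environs)
    os.environ.update(pairs)
    return pairs


set_environs("VAR1=var1,VAR2=var2")
print(os.environ["VAR1"], os.environ["VAR2"])
```

Variables set this way are visible to the current Python process and to any child processes it spawns.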

Paths

  • path_container (str) – Path to the Docker image that contains the adjTomo software package.

***

__doc__ = Multiline-String
"""
    Workstation System [System Base]
    --------------------------------
    Defines foundational structure for System module. When used standalone,
    runs solver tasks either in serial (if `nproc`==1; i.e., without MPI) or in
    parallel (if `nproc`>1; i.e., with MPI). All other tasks are run in serial.

    Parameters
    ----------
    :type ntask: int
    :param ntask: number of individual tasks/events to run during workflow.
        Must be <= the number of source files in `path_specfem_data`
    :type nproc: int
    :param nproc: number of processors to use for each simulation. Choose 1 for
        serial simulations, and `nproc`>1 for parallel simulations.
    :type tasktime: float
    :param tasktime: maximum job time in units minutes for each job spawned by
        the SeisFlows master job during a workflow. These include, e.g.,
        running the forward solver, adjoint solver, smoother, kernel combiner.
        All spawned tasks receive the same task time. Fractions of minutes
        acceptable. If set as `None`, no tasktime will be enforced.
    :type mpiexec: str
    :param mpiexec: MPI executable on system. Defaults to 'mpirun -n ${NPROC}'
    :type array: str
    :param array: for `ntask` > 1, determine which tasks to submit to run. By
        default (NoneType) this submits all task IDs [0:ntask), or for single
        runs, submits only the first task ID, 0. However, for debugging or
        manual control purposes, Users may input a string of task IDs that they
        would like to run. Follows formatting of SLURM array directive
        (https://slurm.schedmd.com/job_array.html), which is, for example:
        1,2,3-8:2,10 -> 1,2,3,5,7,10
        where '-' denotes a range (inclusive), and ':' denotes an optional step.
        If ':' step is not given for a range, then step defaults to 1.
    :type rerun: int
    :param rerun: [EXPERIMENTAL FEATURE] attempt to re-run failed tasks or
        array tasks submitted with `run`. Collects information about failed
        jobs (or array jobs) after a failure, and re-submits with `run`.
        `rerun` is an integer defining how many times the User wants System to
        try and rerun before failing the entire job. If 0 (default), a single
        task failure will cause main job failure.
    :type log_level: str
    :param log_level: logger level to pass to logging module.
        Available: 'debug', 'info', 'warning', 'critical'
    :type verbose: bool
    :param verbose: if True, formats the log messages to include the file
        name and line number of the log message in the source code, as well as
        the message and message type. Useful for debugging but also very verbose
        so not recommended for production runs.

    Paths
    -----
    :type path_output_log: str
    :param path_output_log: path to a text file used to store the outputs of
        the package wide logger, which are also written to stdout
    :type path_par_file: str
    :param path_par_file: path to parameter file which is used to instantiate
        the package
    :type path_log_files: str
    :param path_log_files: path to a directory where individual log files are
        saved whenever a number of parallel tasks are run on the system.
    ***

A Cluster-adjacent base class that provides core utilities for interactions
with HPC systems running Singularity. Must be overloaded by subclasses defined
for specific workload managers / clusters.

The `Singularity` class was written for clusters running SeisFlows through
Docker containers using Singularity. The reason for writing a separate class
is because Docker containers do not have access to the workload manager (e.g.,
SLURM/sbatch) and therefore we cannot run job submission calls directly from
the Python environment. Instead, each time a job must be submitted to the
Cluster, the User must manually submit.

.. note::
    To users looking to run SeisFlows directly via their Cluster Conda
    environment, look at the `Cluster` class and its workload manager-specific
    sub-classes
"""
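The `array` format described in the docstring above follows the SLURM array directive. As a rough illustration only, the expansion of such a string into task IDs could be sketched as follows; `parse_array` is a hypothetical name, not the SeisFlows implementation, and it assumes well-formed input.

```python
def parse_array(array, ntask):
    """Expand a SLURM-style array string into a list of task IDs.

    Hypothetical helper: ',' separates entries, '-' denotes an inclusive
    range, and ':' gives an optional step that defaults to 1.
    """
    if array is None:
        # Default behavior described in the docstring: all IDs in [0, ntask)
        return list(range(ntask))

    task_ids = []
    for part in array.split(","):
        part = part.strip()
        if "-" in part:
            bounds, _, step = part.partition(":")
            start, _, stop = bounds.partition("-")
            step = int(step) if step else 1
            # The range is inclusive of `stop`, hence the +1
            task_ids.extend(range(int(start), int(stop) + 1, step))
        else:
            task_ids.append(int(part))
    return task_ids


print(parse_array("1,2,3-8:2,10", ntask=12))  # [1, 2, 3, 5, 7, 10]
```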
mpiexec = ''
ntask_max
tasktime = 1
environs = ''
singularity_exec = 'singularity'
setup()

Copies ‘submit’ and ‘run’ .py scripts from the repository into the working directory so that the User can run these scripts directly. This is a manual step in order to allow Users to run with a container without using native environment commands (e.g., sbatch) from inside a container.
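The copy step can be pictured with a short sketch. The file names `submit.py` and `run.py`, the function name `copy_scripts`, and the directory layout are assumptions made for illustration; the actual script locations inside the repository are not stated here.

```python
import os
import shutil
import tempfile


def copy_scripts(src_dir, workdir, names=("submit.py", "run.py")):
    """Copy the named scripts from src_dir into workdir (hypothetical helper)."""
    copied = []
    for name in names:
        dst = os.path.join(workdir, name)
        shutil.copy(os.path.join(src_dir, name), dst)
        copied.append(dst)
    return copied


# Self-contained demonstration using temporary stand-in directories
with tempfile.TemporaryDirectory() as repo, tempfile.TemporaryDirectory() as work:
    for name in ("submit.py", "run.py"):
        with open(os.path.join(repo, name), "w") as f:
            f.write("# placeholder script\n")
    copied = copy_scripts(repo, work)
    print(sorted(os.listdir(work)))
```

Once the scripts are in the working directory, the User can call them from the native environment, where the workload manager is available.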

property run_call_header

The run call defines the Singularity wrapper which executes run calls using the Docker image. It also binds the current working directory inside the container so that we can write back to the local filesystem.

Note

The generalized Cluster class returns an empty string, but child system classes will need to overwrite this call.

Return type:

str

Returns:

the system-dependent portion of a run call
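A minimal sketch of how such a header might be assembled is shown below. It relies only on the `singularity exec` subcommand and its `--bind src:dest` option; the function name `build_run_call_header` and the `/work` mount point are assumptions for illustration, not the values used by SeisFlows.

```python
import os
import shlex


def build_run_call_header(path_container, singularity_exec="singularity",
                          workdir=None, mount_point="/work"):
    """Compose the container portion of a run call (hypothetical helper).

    Binds `workdir` (default: the current working directory) to
    `mount_point` inside the container so that output written there
    lands on the local filesystem.
    """
    workdir = workdir or os.getcwd()
    parts = [
        singularity_exec, "exec",
        "--bind", f"{workdir}:{mount_point}",  # host path : container path
        path_container,
    ]
    # Quote each part so that paths containing spaces survive the shell
    return " ".join(shlex.quote(part) for part in parts)


header = build_run_call_header("image.sif", workdir="/scratch/run")
print(header)
```

The command to be executed inside the container would then be appended to this header to form the full run call.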

submit(workdir=None, parameter_file='parameters.yaml')

Submits the main workflow job as a serial job directly to the system that is running the master job.

Parameters:
  • workdir (str) – path to the current working directory

  • parameter_file (str) – parameter file name used to instantiate the SeisFlows package

run(funcs, single=False, **kwargs)

Runs tasks multiple times in parallel by submitting NTASK new jobs to the system. The list of functions and its kwargs are saved as pickle files, and then re-loaded by each submitted process with specific environment variables. Each spawned process will run the list of functions.

Parameters:
  • funcs (list of methods) – a list of functions that should be run in order. All kwargs passed to run() will be passed into the functions.

  • single (bool) – run a single-process, non-parallel task, such as smoothing the gradient, which only needs to be run once. This will change how the job array and the number of tasks are defined, such that the job is submitted as a single-core job to the system.

  • run_call (str) – the call used to submit the run script. If None, attempts default run call which should be suited for the given system. Can be overwritten by child classes to involve other arguments