seisflows.system.frontera

Frontera is one of the Texas Advanced Computing Center (TACC) HPCs. https://frontera-portal.tacc.utexas.edu/

Note

Frontera Caveat 1 On TACC systems you cannot submit 'sbatch' from compute nodes. Workaround: SSH from the compute node to a login node, activate the conda environment, and submit the sbatch script from there. This requires knowing the user name and the conda environment name, and ensuring SSH keys are available. Thanks to Ian Wang for the suggestion to SSH around the problem.

Essentially, from the compute node we are running: $ ssh user@hostname 'conda activate env; sbatch --arg=val run_function.sh'
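A minimal sketch of that round trip using Python's subprocess module; the hostname, script name, and environment name here are placeholders, not necessarily the module's actual values:

```python
import subprocess

# Placeholder values; in practice these come from the 'user' and 'conda_env'
# parameters (or the USER / CONDA_DEFAULT_ENV environment variables)
user = "username"
login_host = "frontera.tacc.utexas.edu"
conda_env = "seisflows"

# SSH from the compute node to a login node, activate the conda environment,
# and submit the sbatch script from the login node
remote_cmd = f"conda activate {conda_env}; sbatch --parsable run_function.sh"
result = subprocess.run(["ssh", f"{user}@{login_host}", remote_cmd],
                        capture_output=True, text=True, check=True)
print(result.stdout)  # stdout contains the submitted job's ID
```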

Note

Frontera Caveat 2 TACC does not allow the '--array' option, which SeisFlows normally uses to submit multiple jobs in a single SBATCH command. To work around this, the Frontera module submits jobs one by one, as sketched below.
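In sketch form, the workaround replaces a single '--array' submission with a loop of single-task submissions. The exported variable name and script name below are illustrative, not necessarily what the module uses:

```python
import subprocess

ntask = 4  # tasks that would otherwise be submitted as '--array=0-3'
job_ids = []
for task_id in range(ntask):
    # Each task becomes its own SBATCH job; the task index is passed
    # through an exported environment variable instead of SLURM_ARRAY_TASK_ID
    proc = subprocess.run(
        ["sbatch", "--parsable", f"--export=ALL,SEISFLOWS_TASKID={task_id}",
         "run_function.sh"],
        capture_output=True, text=True, check=True)
    job_ids.append(proc.stdout.strip())
```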

Module Contents

Classes

Frontera

System Frontera

class seisflows.system.frontera.Frontera(user=None, conda_env=None, partition='development', submit_to=None, allocation=None, mpiexec='ibrun', **kwargs)

Bases: seisflows.system.slurm.Slurm

System Frontera

Texas Advanced Computing Center HPC Frontera, SLURM based system

Parameters

  • user (str) – User's username on TACC systems. Can be determined by 'whoami' or will be gathered from the 'USER' environment variable. Used for internal SSHing from compute nodes to login nodes.

  • conda_env (str) – Name of the Conda environment in which you are running SeisFlows. Defaults to the environment variable 'CONDA_DEFAULT_ENV'. Used to activate the conda environment AFTER SSHing from the compute node to the login node, to ensure that the newly submitted job has access to the SeisFlows environment.

  • partition (str) – Frontera has various partitions, each with its own number of cores per compute node. Available are: small, normal, large, development, flex.

  • submit_to (str) – (Optional) partition to submit the main/master job, which is a serial Python task that controls the workflow. Likely this should be 'small' or 'development'. If not given, defaults to partition.

  • allocation (str) – Name of the allocation/project on the Frontera system. Required if you have more than one active allocation.
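For illustration, a direct instantiation might look like the following; in a real workflow these values are normally set through the SeisFlows parameter file, and the values shown are placeholders:

```python
from seisflows.system.frontera import Frontera

system = Frontera(
    user="username",          # or gathered from the USER environment variable
    conda_env="seisflows",    # or gathered from CONDA_DEFAULT_ENV
    partition="normal",       # partition used for workflow tasks
    submit_to="development",  # lighter partition for the serial master job
    allocation="ABC123",      # required with more than one active allocation
)
system.check()  # validate parameters and paths before submitting
```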

Paths

***

property submit_call_header

The submit call defines the SBATCH header which is used to submit a workflow task list to the system. It is usually dictated by the system’s required parameters, such as account names and partitions. Submit calls are modified and called by the submit function.

Return type

str

Returns

the system-dependent portion of a submit call
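As a hedged sketch, such a header boils down to a string of standard sbatch flags assembled from the parameters above; the flag choices here are illustrative, not the property's exact output:

```python
# Illustrative assembly of a submit-style header from module parameters
def build_submit_header(partition, allocation=None, job_name="seisflows"):
    flags = [
        "sbatch",
        f"--job-name={job_name}",
        f"--partition={partition}",  # e.g., 'development' via `submit_to`
        "--ntasks=1",                # master job is a serial Python task
    ]
    if allocation is not None:
        flags.append(f"--account={allocation}")  # SLURM allocation/project
    return " ".join(flags)

print(build_submit_header("development", allocation="ABC123"))
```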

property run_call_header

The run call defines the SBATCH header which is used to run tasks during an executing workflow. Like the submit call, its arguments are dictated by the given system. Run calls are modified and called by the run function.

Return type

str

Returns

the system-dependent portion of a run call
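A run header differs from the submit header mainly in its target partition and core count; a sketch under the same assumptions as above:

```python
# Illustrative run-style header: targets the compute partition and requests
# the cores each task needs, returning only the job ID via '--parsable'
def build_run_header(partition, nproc, allocation=None, job_name="run_task"):
    flags = [
        "sbatch", "--parsable",
        f"--job-name={job_name}",
        f"--partition={partition}",
        f"--ntasks={nproc}",  # each task occupies NPROC cores
    ]
    if allocation is not None:
        flags.append(f"--account={allocation}")
    return " ".join(flags)
```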

check()

Checks parameters and paths

static _stdout_to_job_id(stdout)

Parse a job ID from the stdout message returned after an SBATCH job is submitted. On Frontera, the job ID is preceded by a log message which looks like:

```
Welcome to the Frontera Supercomputer

No reservation for this job
--> Verifying valid submit host (login3)...OK
--> Verifying valid jobname...OK
--> Verifying valid ssh keys...OK
--> Verifying access to desired queue (development)...OK
--> Checking available allocation (EAR21042)...OK
--> Verifying that quota for filesystem … is at 3.87% allocated...OK
--> Verifying that quota for filesystem … is at 0.91% allocated...OK
4738284
```

Parameters

  • stdout (str) – standard SBATCH response after submitting a job with the '--parsable' flag

Return type

str

Returns

a matching job ID. We convert str->int->str to ensure that the job ID is an integer value (which it must be)
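A minimal sketch of this parsing, assuming the job ID is the final whitespace-separated token of the verbose TACC log:

```python
def stdout_to_job_id(stdout):
    # The job ID is the last token of Frontera's submit log; round-tripping
    # through int() guarantees the ID is a valid integer value
    job_id = stdout.strip().split()[-1]
    return str(int(job_id))

assert stdout_to_job_id("...allocated...OK\n4738284\n") == "4738284"
```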

run(funcs, single=False, **kwargs)

Runs tasks multiple times in an embarrassingly parallel fashion on Frontera. Executes the list of functions (funcs) NTASK times, with each task occupying NPROC cores. A sketch of this logic appears after the parameter list below.

Note

Completely overrides the Slurm.run() method

TODO
  • can we ssh once or do we have to do it for each process?

  • the ssh command prints the ssh prompt to the log file; how do we suppress that?

Parameters
  • funcs (list of methods) – a list of functions that should be run in order. All kwargs passed to run() will be passed into the functions.

  • single (bool) – run a single-process, non-parallel task, such as smoothing the gradient, which only needs to be run once. This changes how the job array and the number of tasks are defined, such that the job is submitted as a single-core job to the system.
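Putting the caveats together, a hedged sketch of the overall run logic; the function signature, task-ID variable, and script name are illustrative:

```python
import subprocess

def run_sketch(ntask, user, login_host, conda_env, run_header, single=False):
    """Submit each task as its own job by SSHing to a login node."""
    task_ids = [0] if single else list(range(ntask))
    for task_id in task_ids:
        # One ssh per submission (see the TODO about reusing a connection);
        # the conda environment is activated before sbatch is invoked
        remote_cmd = (f"conda activate {conda_env}; "
                      f"{run_header} --export=ALL,SEISFLOWS_TASKID={task_id} "
                      f"run_function.sh")
        subprocess.run(["ssh", f"{user}@{login_host}", remote_cmd], check=True)
```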