seisflows.system.frontera
Frontera is one of the Texas Advanced Computing Center (TACC) HPCs. https://frontera-portal.tacc.utexas.edu/
Note
Frontera Caveat 1: On TACC systems you cannot submit ‘sbatch’ from compute nodes. Workaround: SSH from the compute node to a login node, activate the conda environment, and submit the sbatch script. This requires knowing the username and conda environment name, and ensuring SSH keys are available. Thanks to Ian Wang for the suggestion to SSH around the problem.
Essentially we are running, from the compute node: $ ssh user@hostname ‘conda activate env; sbatch --arg=val run_function.sh’
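The SSH workaround above could be sketched in Python as follows. This is an illustrative stand-in, not SeisFlows' actual internals; all function and argument names here are assumptions:

```python
import subprocess

def build_remote_submit(user, hostname, conda_env, script):
    """Build the compute-node -> login-node submission command described
    above: hop to the login node, activate the conda environment, and
    submit the sbatch script (names are illustrative)."""
    remote_cmd = f"conda activate {conda_env}; sbatch --parsable {script}"
    return ["ssh", f"{user}@{hostname}", remote_cmd]

def submit_from_compute_node(user, hostname, conda_env, script):
    """Run the ssh command and return sbatch's stdout (the job ID)."""
    result = subprocess.run(
        build_remote_submit(user, hostname, conda_env, script),
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()
```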
Note
Frontera Caveat 2: TACC does not allow the ‘--array’ option, which SeisFlows uses to submit multiple jobs in a single SBATCH command. To work around this, the Frontera module submits jobs one by one.
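Expanding one would-be ‘--array’ submission into individual jobs can be sketched as below. Passing the task index via ‘--export’ and the SEISFLOWS_TASKID variable name are illustrative assumptions, not the module's verbatim behavior:

```python
def per_task_submissions(run_call, ntask):
    """Expand one would-be '--array=0-N' submission into NTASK separate
    sbatch calls, passing the task index through the environment instead
    of SLURM_ARRAY_TASK_ID (variable name is illustrative)."""
    return [f"{run_call} --export=ALL,SEISFLOWS_TASKID={i}"
            for i in range(ntask)]
```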
Module Contents
Classes
Frontera | System Frontera
- class seisflows.system.frontera.Frontera(user=None, conda_env=None, partition='development', submit_to=None, allocation=None, mpiexec='ibrun', **kwargs)
Bases:
seisflows.system.slurm.Slurm
System Frontera
Texas Advanced Computing Center HPC Frontera, SLURM based system
Parameters
- type user
str
- param user
User’s username on TACC systems. Can be determined by ‘whoami’ or will be gathered from the ‘USER’ environment variable. Used for internal ssh’ing from compute nodes to login nodes.
- type conda_env
str
- param conda_env
Name of the Conda environment in which you are running SeisFlows. Defaults to environment variable ‘CONDA_DEFAULT_ENV’. Used to activate the conda environment AFTER ssh’ing from compute to login node, to ensure that the newly submitted job has access to the SeisFlows environment.
- type partition
str
- param partition
Frontera has various partitions, each with its own number of cores per compute node. Available are: small, normal, large, development, flex
- type submit_to
str
- param submit_to
(Optional) partition to submit the main/master job which is a serial Python task that controls the workflow. Likely this should be ‘small’ or ‘development’. If not given, defaults to partition.
- type allocation
str
- param allocation
Name of allocation/project on the Frontera system. Required if you have more than one active allocation.
Paths
- property submit_call_header
The submit call defines the SBATCH header which is used to submit a workflow task list to the system. It is usually dictated by the system’s required parameters, such as account names and partitions. Submit calls are modified and called by the submit function.
- Return type
str
- Returns
the system-dependent portion of a submit call
- property run_call_header
The run call defines the SBATCH header which is used to run tasks during an executing workflow. Like the submit call, its arguments are dictated by the given system. Run calls are modified and called by the run function.
- Return type
str
- Returns
the system-dependent portion of a run call
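As a rough illustration, a run call header assembled from the parameters above might look like the following. The flag choices and job name are assumptions for the sketch, not the property's verbatim output:

```python
def example_run_call_header(partition, allocation, nproc, tasktime):
    """Assemble an illustrative SBATCH command-line header; the real
    header is produced by the class properties and may differ."""
    parts = [
        "sbatch",
        "--job-name=seisflows_task",   # illustrative job name
        f"--partition={partition}",
        f"--ntasks={nproc}",
        f"--time={tasktime}",
        "--parsable",                  # make sbatch print only the job ID
    ]
    if allocation:
        parts.append(f"--account={allocation}")
    return " ".join(parts)
```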
- check()
Checks parameters and paths
- static _stdout_to_job_id(stdout)
Parse the stdout message returned after an SBATCH job is submitted. On Frontera, the job number is preceded by a log preamble which looks like:
```
Welcome to the Frontera Supercomputer
No reservation for this job
--> Verifying valid submit host (login3)…OK
--> Verifying valid jobname…OK
--> Verifying valid ssh keys…OK
--> Verifying access to desired queue (development)…OK
--> Checking available allocation (EAR21042)…OK
--> Verifying that quota for filesystem … is at 3.87% allocated…OK
--> Verifying that quota for filesystem … is at 0.91% allocated…OK
4738284
```
- type stdout
str
- param stdout
standard SBATCH response after submitting a job with the ‘--parsable’ flag
- Return type
str
- Returns
a matching job ID. We convert str->int->str to ensure that the job ID is an integer value (which it must be)
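The str->int->str round trip described above can be sketched as follows. This is a simplified stand-in, not the method's actual implementation:

```python
def stdout_to_job_id(stdout):
    """Take the last whitespace-separated token of the sbatch response
    (the job number trailing Frontera's log preamble) and verify it is
    an integer job ID via a str -> int -> str round trip."""
    token = stdout.strip().split()[-1]
    return str(int(token))  # ValueError here means no job ID was found
```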
- run(funcs, single=False, **kwargs)
Runs tasks multiple times in embarrassingly parallel fashion on Frontera. Executes the list of functions (funcs) NTASK times, with each task occupying NPROC cores.
Note
Completely overwrites the Slurm.run() command
- TODO
can we ssh once, or do we have to do it for each process?
the ssh command prints the ssh prompt to the log file; how do we suppress that?
- Parameters
funcs (list of methods) – a list of functions that should be run in order. All kwargs passed to run() will be passed into the functions.
single (bool) – run a single-process, non-parallel task, such as smoothing the gradient, which only needs to be run once. This changes how the job array and the number of tasks are defined, such that the job is submitted as a single-core job to the system.
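The effect of the ‘single’ flag on the task list can be sketched as follows (illustrative only, not the method's actual logic):

```python
def task_ids(ntask, single=False):
    """A single-process task (e.g., gradient smoothing) runs once as
    task 0; otherwise all NTASK tasks are submitted."""
    return [0] if single else list(range(ntask))
```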