HPC Example on UAF Chinook

This docs page introduces Users to running research-grade problems on high performance computers (HPC). It is currently targeted at a specific university cluster but may be expanded to other systems as necessary. Instruction should hopefully be generalizable to other clusters, although Users may need to write their own custom interface.

Chinook is University of Alaska Fairbank’s (UAF) high performance computer. Chinook is an Intel machine running the SLURM workload manager and Rocky Linux 8. Chinook is operated by Research Computing Systems (RCS).

Note

These instructions were written to be followed along during a group meeting at UAF and therefore go into some minute details that may not be relevant for all.

Note

Resources last accessed Nov. 14, 2022

0. Access Chinook

For those following along in-person, we will access Chinook via SSH and then access the updated chinook by SSH’ing into Chinook04 which contains the latest OS update for Chinook. You should be met by this cool fish upon successful login to Chinook04:

      /-._
    _/,.._/          dP""b8 88  88 88 88b 88  dP"Yb   dP"Yb  88  dP
 ,-'   ,  `-:,.-')  dP   `" 88  88 88 88Yb88 dP   Yb dP   Yb 88odP
: o ):';     _  {   Yb      888888 88 88 Y88 Yb   dP Yb   dP 88"Yb
 `-.  `' _,.-`-.)   YboodP 88  88 88 88  Y8  YbodP   YbodP  88  Yb
    `\`,.-'

You may get MPI errors when running SPECFEM if you do not include module load statements in your bashrc file, as the compute nodes will not have the appropriate modules to run executables. To deal with this, you will want to add the following lines to the following file: ~/.bashrc

module purge
module load slurm
module load intel

1. Install Conda

First we need to install Conda, the Python package managaer. RCS has installation instructions related to installing Minoconda here:

https://uaf-rcs.gitbook.io/uaf-rcs-hpc-docs/third-party-software/miniconda

2. Install or load adjTomo

You have two options here for grabbing the adjTomo softwater suite. 1) Easiest would be to load a pre-installed Conda environment:

conda activate /import/c1/ERTHQUAK/bhchow/REPOS/miniconda3/envs/adjtomo

2) A more flexible solution would be to create your own Conda environment and install software yourself. For that you will have to follow the instructions on the main docs page.

If you go with Option 2, make sure you activate your conda environment before proceeding.

3. Set up a SPECFEM2D/3D/3D_GLOBE working directory

SeisFlows requires a pre-established SPECFEM working directory, with:

binary executables (configured and compiled),
an appropriate DATA/ directory containing source, stations and Par_file, and
an initial model defined using one of SPECFEM’s internally defined model formats.

Note

If you already have a valid directory where you run forward simulations, you can skip subsection 3a

3a. Configure and compile

You will need to clone SPECFEM2D/3D/3D_GLOBE (you choose the flavor), configure and compile the code. Below are instructions specifically for Chinook.

Other clusters will have different compiler options and requirements that are machine/OS specific so it is difficult to write a generalized set of instructions.

Here we choose SPECFEM3D and compile using the Intel compiler suite:

mkdir -p $CENTER/REPOS  # Center1 is our working filesystem
cd $CENTER/REPOS
git clone --branch devel --depth=1 https://github.com/SPECFEM/specfem3d.git
cd specfem3d
module load slurm  # load SLURM workload manager
module load intel  # latest Intel compiler suite
./configure F90=ifort FC=ifort MPIFC=mpiifort CC=icc MPICC=mpiicc --with-mpi
make all  # -j to compile in parallel, if parallel, run on interactive mode

3b. Generate appropriate DATA/ directory

Here you can choose to set your own mesh and model parameters to suit your research problem. For the sake of simplicity we will use the homogeneous halfspace model located in the EXAMPLES/ directory to generate our starting model.

We will also work in a separate SPECFEM working directory (outside the cloned repository) to keep things clean and manageable.

mkdir -p $CENTER/work/specfem3d_workdir  # clean working directory
cd $CENTER/work/specfem3d_workdir
ln -s $CENTER/REPOS/specfem3d/bin .  # making sure we have the executables
cp -r $CENTER/REPOS/specfem3d/EXAMPLES/homogeneous_halfspace/DATA .
cp -r $CENTER/REPOS/specfem3d/EXAMPLES/homogeneous_halfspace/meshfem3D_files ./DATA
mkdir OUTPUT_FILES

3c. Dealing with multiple sources

One key difference that needs to be addressed is that SeisFlows requires sources be tagged. For example, if you want to run 10 events in your inversion you will need to individually tag each event with the appropriate format.

In SPECFEM3D our source prefix will be ‘CMTSOLUTION’. If we have multiple CMTSOLUTIONS, then one easy way to differentiate them would be to name them e.g.: CMTSOLUTION_1, CMTSOLUTION_2, …, CMTSOLUTION_N. These tags could also refer to event ids or origin times, it’s up to the user.

Here is one example of the naming scheme used in a published study.

For this example, since we don’t have multiple sources to choose from, we will simply copy our example CMTSOLUTION and rename:

cd $CENTER/work/specfem3d_workdir/DATA
mv CMTSOLUTION CMTSOLUTION_01  # source 1 is the example default
cp CMTSOLUTION_01 CMTSOLUTION_02  # source 2 is the same as source 1
ln -s CMTSOLUTION_01/ CMTSOLUTION  # so that SPECFEM can still find source 1

3d. Create Initial model

Now we’ll run SPECFEM to generate our mesh and model. This is the same procedure you would follow if running a forward simulation in SPECFEM, except we will not run the solver.

We need a slurm-specific SBATCH script to run our executables. You can find example SBATCH scripts for Chinook here. I will use two files from this directory, run_xmeshfem3d.sh and run_xgenerate_databases.sh.

Note

SPECFEM2D and SPECFEM3D_GLOBE do not require the xgenerate_databases step

sbatch run_xmeshfem3d.sh  # generates mesh files
sbatch run_xgenerate_databases.sh  # generates model files

By the end we want to have a number of binary (.bin) files that contain our model. These should be located in the local path:

ls OUTPUT_FILES/DATABASES_MPI  # should contain vp, vs, and rho files

Finally, we need to set the model parameter in the SPECFEM Par_file to ‘gll’. This will tell future runs of SPECFEM to read the model we just created, rather than trying to define it from internal parameters:

seisflows sempar -P DATA/Par_file model gll

Have a look at the command line tool docs page for more information on the command line tools available for SeisFlows.

4. Setting up a SeisFlows working directory

We are now ready to run SeisFlows. We just have to set up a working directory and point the parameter file at the correct locations such that SeisFlows can find our SPECFEM working directory.

I will run SeisFlows in a separate directory to keep things clean.

mkdir -p $CENTER/work/seisflows_workdir
cd $CENTER/work/seisflows_workdir
seisflows setup  # creates a template parameters.yaml file

Have a look at the parameter file docs page for more information on how the file is structured.

4a. SeisFlows parameter file

You can look at the generated parameter file to see what the template version looks like (using a text editor or cat). We will simply overwrite some of the base starting parameters to suit our current use case. Use the seisflows par command to do this quickly on the command line.

SeisFlows already contains a pre-built Chinook interface (based on a general SLURM interface). You can use seisflows print modules to see all valid system (and other modules) choices.

seisflows print modules

If you do not see your own system (for non-Chinook users) supported, you will need to follow the instructions on writing your own system-subclass

Here we overwrite some default parameters to set up the base modules for our workflow:

seisflows par system chinook  # chinook system interface
seisflows par solver specfem3d  # specfem3d cartesian version
seisflows par preprocess null  # turn OFF preprocessing for now
seisflows par optimize null  # turn OFF optimization

By default we are running a forward workflow, which simply runs forward simulations en-masse. In following sections we will swap over to an inversion workflow.

4b. Configuring the parameter file

Each choice of base module (i.e., workflow system, solver, preprocess, optimize) comes with it’s own distinct set of parameters. SeisFlows therefore dynamically generates a parameter file based on User choices for the base modules and the appropriate source code doc strings.

We can configure our parameter file with:

seisflows configure

Have a look at your parameter file now to see all the module-specific parameters that have been instantiated.

4c. Checking the parameter file

As with SPECFEM, the parameter file in SeisFlows controls the entire package, and all the parameters that have been set using the seisflows configure command are applicable to your current workflow.

Warning

It is up to a prospetive user to carefully read and understand what each parameter does. I have tried to make the docstrings as comprehensive as possible, but things do slip through the cracks. If you find that a certain parameter is not well explained, ambiguous, etc. please open up a GitHub issue or PR with clarifying changes.

Each module in SeisFlows has a check function which it uses to determine parameter validity.

Users can use this check function to quickly determine missing, inappropriate, or invalid parameters in their parameter file.

seisflows check

You can use this method to fix parameters one by one until no errors are raised, after which you should be confident that you are able to run your workflow.

Following the parameter errors raised, you will have to change the following:

# Changing paths to tell SeisFlows where to find SPECFEM
seisflows par path_specfem_bin ${CENTER}/work/specfem3d_workdir/bin
seisflows par path_specfem_data ${CENTER}/work/specfem3d_workdir/DATA
seisflows par path_model_init ${CENTER}/work/specfem3d_workdir/OUTPUT_FILES/DATABASES_MPI

Based on docstrings, I know I will also want to set the following parameters in order to suit my current research problem:

# Changing parameters to suit our workflow
seisflows par ntask 2  # two events, corresponding to two CMTSOLUTIONS
seisflows par tasktime 5  # walltime for individual simulations
seisflows par walltime 20  # walltime for the entire workflow
seisflows par nproc 4  # to match the SPECFEM parameter of the same name
seisflows par export_traces True  # save seismograms to disk

5. Submit the main job

SeisFlows operates using a serial, single-core main job submitted to a compute node. This main job will act like you, the researcher:

Through the pre-defined Chinook/SLURM system interface, the main job already knows how to:

submit jobs (using sbatch),
monitor the queue (using sacct)
book keep SPECFEM and manage the filesystem
stop jobs if any errors occur

To submit the main job, we simply run:

seisflows submit

Now that we have submitted the workflow, the main job will run en-masse forward simulations. In other words, it runs two forward simulations corresponding to the two CMTSOLUTIONS we have in our DATA/ directory.

Note

On Chinook, in order to keep the main partition clean, all master jobs are submitted to the ‘debug’ node by default. This is hardcoded into the Chinook implementation. Future work may place the main job on the login node as well.

6. Inspecting SeisFlows

Have a look at the working directory docs page for an explanation of the directories and files being generated.

Monitor the job queue to see the master job and all spawned compute jobs that get submitted to the system using the squeue or sacct commands.

The main log is writing to sflog.txt
Each spawned job is logging to a unique file in logs/
Each source has it’s own working directory in scratch/solver/

6a. Recovering from job failures

SeisFlows has a state file (sfstate.txt) that tracks the progress of your inversion. Each main workflow function (e.g., forward simulations) constitute a ‘checkpoint’ in the workflow. If a function completes sucessfully, it is labeled ‘completed’. Jobs which fail are labelled ‘failed’.

If your job fails (e.g., due to walltime), you can simply run seisflows submit again, and SeisFlows will know to skip over the already completed tasks, saving computational cost.

Note

Currently, SeisFlows does not know how to track individually completed jobs. E.g., for a two event workflow, one event completes a successful forward simulation, but the other one fails for unknown reason. Currently SeisFlows will need to re-run ALL forward simulations. In the future I hope to include some more detailed checkpointing to avoid this.

6b. SeisFlows debug mode

SeisFlows has a debug mode, which is simply an IPython environment with all SeisFlows modules and parameters loaded. This allows the User to step through code while debugging or developing.

This is especially useful when you are looking at source code (trying to figure out a bug), and you want to know “what is this variable?”, or “what does this function return?”. You can figure that out with:

seisflows debug

7. Modifying for a synthetic inversion

Great! This is essentially the standard method of operating SeisFlows: manually setting up your SPECFEM directory, tooling the parameter file, and submitting your job.

But what if you now want to run a synthetic inversion to compare synthetic seismograms from two very similar models? How do you get from here to there?

It is a good idea to either clear out your current working directory, or start a new one, before proceeding with a separate workflow. To delete all non-essential files, you can run:

seisflows clean -f

7a. Swap modules in the parameter file

SeisFlows swap allows Users to swap out valid modules without disturbing the remainder of the parameter file. So since we want to swap out our ‘forward’ workflow for an ‘inversion’ workflow, we can do:

seisflows swap workflow inversion

If you look at your parameter file now, you will see a suite of new parameters that control an inversion workflow.

This is the same for swapping from SPECFEM3D -> SPECFEM3D_GLOBE or choosing preprocessing parameters.

The inversion workflow requires a corresponding preprocess and optimize module. We can set these to the preferred classes default and LBFGS. Again have a look at the output of seisflows print modules for all choices.

seisflows swap preprocess default
seisflows swap optimize LBFGS

7b. Generate your target model

The inversion workflow requires data. Since we have decided to do a synthetic inversion, SeisFlows requires a target model. If we were doing a real-data inversion, SeisFlows would require waveform data.

We’ll set up our target model as a slightly altered homogeneous halfspace model to keep things simple:

cd $CENTER/work/specfem3d_workdir
mv OUTPUT_FILES OUTPUT_FILES_INIT  # setting aside our initial model
cd DATA/meshfem3D_files
mv Mesh_Par_file Mesh_Par_file_init  # setting aside initial mesh
cp Mesh_Par_file_init Mesh_Par_file_true
ln -s Mesh_Par_file_true Mesh_Par_file  # ensuring mesh name is correct

Here you need to manually:

open up the Mesh_Par_file file,
scroll down to the ‘Domain materials’ section (around Line 86) and
edit the material parameters to your choosing.

I will increase velocities by 10%, that is Vp: 2800 -> 3020 m/s and Vs: 1500 -> 1650 m/s.

And now we need to run the SPECFEM binaries again to generate our target model

cd $CENTER/work/specfem3d_workdir
mkdir OUTPUT_FILES_TRUE
ln -s OUTPUT_FILES_TRUE OUTPUT_FILES  # making sure SPECFEM can find this dir.
seisflows sempar -P DATA/Par_file model default  # make sure SPECFEM reads the model from the mesh
sbatch run_xmeshfem3d.sh
sbatch run_xgenerate_databases.sh
seisflows sempar -P DATA/Par_file model gll  # reset for seisflows run

7c. Set inversion-specific parameters

Again we can use seisflows check to see what new parameters we need to set, which are introduced by the 3 new modules we have (workflow, preprocess, optimize).

cd $CENTER/work/seisflows_workdir
seisflows check

Following the ‘check’list we will need to change the folowing parameters

seisflows par data_case synthetic  # synthetic inversion (no data)
seisflows par path_model_true ${CENTER}/work/specfem3d_workdir/OUTPUT_FILES_TRUE/DATABASES_MPI

We’ll also set the following parameters:

seisflows par path_model_init ${CENTER}/work/specfem3d_workdir/OUTPUT_FILES_INIT/DATABASES_MPI  # to deal with the fact that we renamed this directory
seisflows par materials elastic  # update both vp and vs
seisflows par end 2  # stop after iteration 2 is finished

7d. SeisFlows submit

Again we run submit to submit our workflow.

seisflows submit

You can monitor sflog.txt to watch the progress of your job.