Tips and Tricks ================ Learn some neat tips and tricks for running SeisFlows that may not be evidently apparent when running examples or looking at the parameter file. Stopping Mid Workflow ---------------------- Stop a workflow prematurely to look at results or change parameters. All SeisFlows workflows (except TestFlow) have a parameter called `stop_after` which can be used to stop mid workflow. To check valid options for `stop_after`, run the following command **from within a valid working directory**. .. code:: bash seisflows print tasks To set your `stop_after` parameter you can use the `seisflows par` command. For example: .. code:: bash seisflows par stop_after run_adjoint_simulations To resume a stopped workflow, you only need to re-run `seisflows submit`. The checkpointing system will ensure that the workflow picks up from where it left off. .. code:: bash seisflows submit Checkpointing ------------- SeisFlows has a checkpointing system which ensures that tasks that have already been run will not be re-run in the case of job failures and workflow restarts. The checkpointing system uses a text file called `sfstate.txt` which simply has entries related to tasks in the task list. Tasks in the task list have three states: 'completed', 'failed' and 'pending'. - Completed: Task has already been run and will be skipped over if re-run - Failed: Task has failed and will be re-run - Pending: Task has not been executed and will be run SeisFlows manages the `sfstate.txt` file on its own, however Users can manually edit the state file if they want certain tasks to be re-run. Simply open the task file with a text editor and change states. .. note:: In the future we hope to improve the checkpointing system with a command line option to edit the file `seisflows state`, and with a more sophisticated system that can single out particular job failures to re-run. Tasktime vs. Walltime --------------------- Jobs run on Clusters have two time-related parameters `tasktime` and `walltime`. `Walltime` refers to the submission wall time given to the `main` job, whereas `tasktime` refers to the submission wall time given to each simulation job. `Tasktime` is relatively simple to figure out - it should be set to the longest expected time it takes **one** simulation to finish. If running inversion workflows, expect that adjoint simulations will take longer to run w.r.t forward simulations. Be sure to add a little buffer time for serial processing steps taken before or after simulations. `Walltime` should represent how long you think an **entire** workflow will take to run. At an extreme, this can be set to the longest allowable walltime on your system (e.g., 24 hours). Or you can try to calculate how long an entire workflow will take. For example, if you are running a 2 iteration inversion where each simulation (tasktime) takes 10 min, then you may expect 1 forward simulation, 1 adjoint simulation and 2-3 forward simulations for the line search. Given open queues (i.e., all array jobs can run at the same time), this will equal roughly 2 iterations * 5 simulations / iteration * 10 minutes / simulation = 100 minutes. In the above example, a User might want to add some buffer time for long queue times and non-simulation processing steps. An acceptable walltime might then be 150-200 minutes.