seisflows.system.workstation
============================

.. py:module:: seisflows.system.workstation

.. autoapi-nested-parse::

   The `Workstation` class is the foundational `System` module in SeisFlows.
   It provides utilities for submitting jobs in serial on a small-scale machine,
   e.g., a workstation or a laptop. All other `System` classes build on this class.


Classes
-------

.. autoapisummary::

   seisflows.system.workstation.Workstation


Module Contents
---------------

.. py:class:: Workstation(ntask=1, nproc=1, tasktime=1, mpiexec=None, array=None, rerun=0, log_level='DEBUG', verbose=False, workdir=os.getcwd(), path_output=None, path_system=None, path_par_file=None, path_output_log=None, path_log_files=None, **kwargs)

   Workstation System [System Base]
   --------------------------------
   Defines foundational structure for System module. When used standalone,
   runs solver tasks either in serial (if `nproc`==1; i.e., without MPI) or in
   parallel (if `nproc`>1; i.e., with MPI). All other tasks are run in serial.

   Parameters
   ----------
   :type ntask: int
   :param ntask: number of individual tasks/events to run during workflow.
       Must be <= the number of source files in `path_specfem_data`
   :type nproc: int
   :param nproc: number of processors to use for each simulation. Choose 1 for
       serial simulations, and `nproc`>1 for parallel simulations.
   :type tasktime: float
   :param tasktime: maximum job time in units of minutes for each job spawned by
       the SeisFlows master job during a workflow. These include, e.g.,
       running the forward solver, adjoint solver, smoother, kernel combiner.
       All spawned tasks receive the same task time. Fractions of minutes
       acceptable. If set as `None`, no tasktime will be enforced.
   :type mpiexec: str
   :param mpiexec: MPI executable on system. Defaults to 'mpirun -n ${NPROC}'
   :type array: str
   :param array: for `ntask` > 1, determine which tasks to submit to run. By
       default (NoneType) this submits all task IDs [0:ntask), or for single
       runs, submits only the first task ID, 0. However, for debugging or
       manual control purposes, Users may input a string of task IDs that they
       would like to run. Follows formatting of SLURM array directive
       (https://slurm.schedmd.com/job_array.html), which is, for example:
       1,2,3-8:2,10 -> 1,2,3,5,7,10
       where '-' denotes a range (inclusive), and ':' denotes an optional step.
       If ':' step is not given for a range, then step defaults to 1.
   :type rerun: int
   :param rerun: [EXPERIMENTAL FEATURE] attempt to re-run failed tasks or
       array tasks submitted with `run`. Collects information about failed
       jobs (or array jobs) after a failure, and re-submits them with `run`.
       `rerun` is an integer defining how many times the User wants System to
       try to rerun before failing the entire job. If 0 (default), a single
       task failure will cause main job failure.
   :type log_level: str
   :param log_level: logger level to pass to logging module.
       Available: 'DEBUG', 'INFO', 'WARNING', 'CRITICAL'
   :type verbose: bool
   :param verbose: if True, formats the log messages to include the file
       name and line number of the log message in the source code, as well as 
       the message and message type. Useful for debugging but also very verbose
       so not recommended for production runs.

   Paths
   -----
   :type path_output_log: str
   :param path_output_log: path to a text file used to store the outputs of
       the package wide logger, which are also written to stdout
   :type path_par_file: str
   :param path_par_file: path to parameter file which is used to instantiate
       the package
   :type path_log_files: str
   :param path_log_files: path to a directory where individual log files are
       saved whenever a number of parallel tasks are run on the system.
   ***
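
   The `rerun` behavior described above can be pictured with a minimal sketch.
   It assumes that a failed task raises an exception and that only the failed
   task IDs are re-submitted. The function name `run_with_rerun` and the
   `flaky` task are hypothetical and are not part of the SeisFlows API.

   .. code-block:: python

       def run_with_rerun(task, task_ids, rerun=0):
           """Run `task` for each ID; retry only failed IDs up to `rerun` times."""
           pending = list(task_ids)
           for attempt in range(rerun + 1):
               failed = []
               for task_id in pending:
                   try:
                       task(task_id)
                   except Exception:
                       # Assumption: any exception marks the task as failed
                       failed.append(task_id)
               if not failed:
                   return attempt  # number of reruns that were needed
               pending = failed
           raise RuntimeError(f"tasks {pending} failed after {rerun} rerun(s)")


       # Hypothetical task that fails once for task ID 1, then succeeds
       attempts = {}

       def flaky(task_id):
           attempts[task_id] = attempts.get(task_id, 0) + 1
           if task_id == 1 and attempts[task_id] == 1:
               raise ValueError("simulated failure")

       reruns_used = run_with_rerun(flaky, task_ids=[0, 1, 2], rerun=1)

   With `rerun=0` the loop runs once, so a single task failure raises
   immediately, which matches the default described for this parameter.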


   .. py:attribute:: ntask
      :value: 1


   .. py:attribute:: nproc
      :value: 1


   .. py:attribute:: tasktime
      :value: 1


   .. py:attribute:: rerun
      :value: 0


   .. py:attribute:: mpiexec
      :value: None


   .. py:attribute:: array
      :value: None


   .. py:attribute:: log_level
      :value: ''


   .. py:attribute:: verbose
      :value: False


   .. py:attribute:: path


   .. py:attribute:: _acceptable_log_levels
      :value: ['CRITICAL', 'WARNING', 'INFO', 'DEBUG']


   .. py:method:: check()

      Checks parameters and paths


   .. py:method:: setup()

      Create the SeisFlows directory structure in preparation for a
      SeisFlows workflow. Ensures that any config files left over from a
      previous workflow are not overwritten by the new workflow. Should be
      called by submit().

      .. note::
          This function is expected to create dirs: SCRATCH, SYSTEM, OUTPUT
          and the following log files: output, error

      .. note::
          Logger is configured here as all workflows, independent of system,
          will be calling setup()

      :rtype: tuple of str
      :return: (path to output log, path to error log)


   .. py:method:: finalize()

      Tear down tasks for the end of an Inversion-based iteration


   .. py:method:: submit(workdir=None, parameter_file='parameters.yaml')

      Submits the main workflow job as a serial job, directly on the system
      that is running the master job.

      :type workdir: str
      :param workdir: path to the current working directory
      :type parameter_file: str
      :param parameter_file: parameter file name used to instantiate the
          SeisFlows package


   .. py:method:: run(funcs, single=False, tasktime=None, **kwargs)

      Executes task multiple times in serial.

      .. note::
          kwargs will be passed to the underlying `method` that is called

      :type funcs: list of methods
      :param funcs: a list of functions that should be run in order. All
          kwargs passed to run() will be passed into the functions.
      :type single: bool
      :param single: run a single-process, non-parallel task, such as
          smoothing the gradient, which only needs to be run once.
          This changes how the job array and the number of tasks are
          defined, such that the job is submitted as a single-core job to
          the system.
      :type tasktime: float
      :param tasktime: Custom tasktime in units of minutes for running the given
          functions `funcs`. If not given, defaults to the System variable
          `tasktime`. If System `tasktime` is also None, defaults to no
          tasktime (infinite time). If tasks exceed the given `tasktime`,
          the program will exit.
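
      The serial execution pattern described above can be sketched as follows.
      This is an illustration only: the helper `run_serial` and the functions
      `forward` and `adjoint` are hypothetical, and passing the task ID as a
      `task_id` keyword argument is an assumption, not necessarily how
      SeisFlows communicates the task ID to each function.

      .. code-block:: python

          def run_serial(funcs, task_ids, **kwargs):
              """Call each function in order for every task ID, one at a time."""
              results = []
              for task_id in task_ids:
                  for func in funcs:
                      # Assumption: each function accepts a `task_id` keyword
                      results.append(func(task_id=task_id, **kwargs))
              return results


          # Hypothetical stand-ins for solver tasks
          def forward(task_id, scale=1):
              return f"forward task {task_id} (scale={scale})"

          def adjoint(task_id, scale=1):
              return f"adjoint task {task_id} (scale={scale})"

          out = run_serial([forward, adjoint], task_ids=[0, 1], scale=2)

      Every keyword argument given to `run_serial` reaches every function in
      `funcs`, mirroring the note that kwargs passed to `run` are forwarded.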


   .. py:method:: task_ids(single=False)

      Return a list of Task IDs (linked to each individual source) to supply
      to the 'run' function. By default this returns a range of available
      tasks [0:ntask). See class docstring of parameter `array` for how to
      manually set task_ids to use for run call.

      :type single: bool
      :param single: If we only want to run a single process, this will
          default to TaskID == 0
      :rtype: list
      :return: a list of task IDs to be used by the `run` function
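
      To illustrate the `array` syntax described in the class docstring, the
      following minimal sketch expands such a string into a list of task IDs.
      The function name `parse_array_string` is hypothetical and is not part of
      the SeisFlows API. Sorting and de-duplicating the result is a choice made
      for this sketch.

      .. code-block:: python

          def parse_array_string(array):
              """Expand a SLURM-style array string into a sorted list of IDs."""
              task_ids = set()
              for part in array.split(","):
                  part = part.strip()
                  if "-" in part:
                      # Range with an optional step, e.g., '3-8:2'
                      span, _, step = part.partition(":")
                      start, stop = span.split("-")
                      step = int(step) if step else 1
                      # Ranges are inclusive of the end value
                      task_ids.update(range(int(start), int(stop) + 1, step))
                  else:
                      task_ids.add(int(part))
              return sorted(task_ids)


          print(parse_array_string("1,2,3-8:2,10"))  # [1, 2, 3, 5, 7, 10]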


   .. py:method:: _get_log_file(task_id)

      To mimic clusters which assign job numbers to spawned processes, our
      on-system runs will also assign job numbers simply by incrementing the
      number on the log files on system.
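
      One way to picture this incrementing scheme is the sketch below. The
      file naming pattern (`<job number>_<task id>.log`) and the helper name
      `next_log_file` are assumptions made for illustration and may differ from
      what SeisFlows actually writes.

      .. code-block:: python

          import os
          import tempfile


          def next_log_file(log_dir, task_id):
              """Return a log path numbered one past the highest job number."""
              job_numbers = []
              for name in os.listdir(log_dir):
                  prefix = name.split("_")[0]
                  if name.endswith(".log") and prefix.isdigit():
                      job_numbers.append(int(prefix))
              job_number = max(job_numbers, default=0) + 1
              # Hypothetical naming: zero-padded job number, then task ID
              return os.path.join(log_dir, f"{job_number:04d}_{task_id:02d}.log")


          with tempfile.TemporaryDirectory() as log_dir:
              first = next_log_file(log_dir, task_id=0)
              open(first, "w").close()  # create the file so it is counted
              second = next_log_file(log_dir, task_id=0)
              names = [os.path.basename(first), os.path.basename(second)]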