seisflows.tools.specfem
=======================

.. py:module:: seisflows.tools.specfem

.. autoapi-nested-parse::

   Utilities to interact with, manipulate or call on the external solver,
   i.e., SPECFEM2D/3D/3D_GLOBE


Attributes
----------

.. autoapisummary::

   seisflows.tools.specfem.SOURCE_KEYS


Functions
---------

.. autoapisummary::

   seisflows.tools.specfem.get_src_rcv_lookup_table
   seisflows.tools.specfem.return_matching_waveform_files
   seisflows.tools.specfem.convert_stations_to_sources
   seisflows.tools.specfem.get_station_locations
   seisflows.tools.specfem.get_source_locations
   seisflows.tools.specfem.rename_as_adjoint_source
   seisflows.tools.specfem.check_source_names
   seisflows.tools.specfem.getpar
   seisflows.tools.specfem.setpar
   seisflows.tools.specfem._getidx_vel_model
   seisflows.tools.specfem.getpar_vel_model
   seisflows.tools.specfem.setpar_vel_model
   seisflows.tools.specfem.read_fortran_binary
   seisflows.tools.specfem.write_fortran_binary


Module Contents
---------------

.. py:data:: SOURCE_KEYS

.. py:function:: get_src_rcv_lookup_table(path_to_data, source_prefix='CMTSOLUTION', stations_file='STATIONS')

   Generate a lookup table that gives relative distance, azimuth etc. for
   each source and receiver used in the workflow. Source and station
   locations will be gathered from metadata files stored in the SPECFEM
   data directory.

   :type path_to_data: str
   :param path_to_data: full path to the SPECFEM DATA/ directory which should
       contain the source file (e.g., CMTSOLUTION, FORCESOLUTION), and the
       STATIONS file whose name is defined by `station_file`.
   :type source_prefix: str
   :param source_prefix: prefix of all the source files that will be wildcard
       searched for using the following wildcard: {source_prefix}_*. e.g., if
       all your files are CMTSOLUTIONS (CMTSOLUTION_001, CMTSOLUTION_002), then
       `source_prefix` should be 'CMTSOLUTION'
   :type stations_file: str
   :param stations_file: the name of the STATIONS file in the SPECFEM DATA
       directory `path_to_data`. By default this is STATIONS


.. py:function:: return_matching_waveform_files(obs_path, syn_path, obs_fmt='ASCII', syn_fmt='ASCII', components=None)

   Gather a list of filenames of matching waveform IDs which are sorted and
   match for given paths to 'observed' and 'synthetic' waveform data. This is
   useful because often the available 'observed' data will not match all of
   the 'synthetic' data that is generated during a simulation, so this function
   helps determine which files/stations can be accessed for misfit
   quantification.

   .. note::

       Waveform files are expected to be in the format
       NN.SSS.CCc.* (N=network, S=station, C=channel, c=component;
       following SPECFEM ASCII formatting). They will be matched on
       `NN.SSS.c` (dropping channel naming because SEED convention may have
       different channel naming). For example, synthetic name
       'AA.S001.BXZ.semd' will be converted to 'AA.S001.Z', and matching
       observation 'AA.S001.HHZ.SAC' will be converted to 'AA.S001.Z'.
       These two will be matched.

   :type obs_path: str
   :param obs_path: the path to observed data waveforms for a given source.
       In SeisFlows this will typically point to somewhere like:
        'scratch/solver/<SOURCE_NAME>/traces/obs/'
   :type syn_path: str
   :param syn_path: the path to synthetic data waveforms for a given source.
       In SeisFlows this will typically point to somewhere like:
        'scratch/solver/<SOURCE_NAME>/traces/syn/'
   :type obs_fmt: str
   :param obs_fmt: expected file format of the observed waveforms. Used for
       safety checks that only one expected file format will be read. Defaults
       to 'ASCII'
   :type syn_fmt: str
   :param syn_fmt: expected file format of the synthetic waveforms. Used for
       safety checks that only one expected file format will be read. Defaults
       to 'ASCII'
   :type components: list
   :param components: optional list of components to ignore preprocessing
       traces that do not have matching components. The adjoint sources for
       these components will be 0. E.g., ['Z', 'N']. If None, all available
       components will be considered.
   :rtype: list of tuples
   :return: [(observed filename, synthetic filename)]. tuples will contain
       filenames for matching stations + component for obs and syn


.. py:function:: convert_stations_to_sources(stations_file, source_file, source_type, output_dir='./')

   Used for ambient noise adjoint tomography inversions where each station
   is treated like a virtual source. This requires generating source files
   for each station in a station file.

   .. warning::

       In SPECFEM3D_GLOBE, the FORCESOLUTION first line requires a specific
       format. I haven't tested this thoroughly but it either cannot exceed
       a char limit, or cannot have characters like spaces or hyphens. To be
       safe, something like FORCE{N} will work, where N should be the source
       number. This function enforces this just incase.

   :type stations_file: str
   :param stations_file: full path to SPECFEM STATIONS file which should be
       formatted 'STATION NETWORK LATITUDE LONGITUDE ELEVATION BURIAL',
       elevant and burial will not be used
   :type source_file: str
   :param source_file: path to
   :type source_type: str
   :param source_type: tells SeisFlows what type of file we are using, which
       in turn defines the specific keys and delimiters to use when editing
       the source file.

       - SOURCE: SPECFEM2D source file
       - FORCESOLUTION_3D: SPECFEM3D Cartesian FORCESOLUTION file
       - FORCESOLUTION_3DGLOBE: SPECFEM3D_GLOBE FORCESOLUTION file


.. py:function:: get_station_locations(stations_file)

   Read the SPECFEM STATIONS file to get metadata information about stations.
   This functionality is required in a few preprocessing or utility functions

   :type stations_file: str
   :param stations_file: full path to STATIONS file that will be read. We
       assume the structure of the STATIONS file to be a text file with the
       first 4 columns defining 'station ID, network ID, latitude, longitude'
   :rtype: Dict
   :return: keys are `network_station` and values are dictionaries contianing
       'lat' and 'lon' for latitude and longitude values


.. py:function:: get_source_locations(path_to_sources, source_prefix)

   Read in SOURCE/CMTSOLUTION/FORCESOLUTION files to get lat/lon/depth or x/y/z
   values which can be used to determine relative locations between sources
   and receivers. Used by the preprocessing module.


   :type path_to_sources: str
   :param path_to_sources: full path to all source files which should start
       with `source_prefix`. Will run a wildcard glob search on this path
       to look for ALL available source files
   :type source_prefix: str
   :param source_prefix: source file name prefix used for wildcard searching.
       This function will look for glob(`path_to_sources`/`source_prefix`_*),
       and tag each of the sources with whatever tag comes within wildcard *
   :rtype: Dict
   :return: keys are `source tag` and values are dictionaries contianing
       'lat' and 'lon' for latitude and longitude values. If None is returned,
       then function could not determine what keys to use for reading source.


.. py:function:: rename_as_adjoint_source(fid, fmt)

   Rename SPECFEM synthetic waveform filenames consistent with how SPECFEM
   expects adjoint sources to be named. Usually this just means adding
   a '.adj' to the end of the filename.

   :type fid: str
   :param fid: file path of synthetic waveform to rename as adjoint source
   :type fmt: str
   :param fmt: expected format of the input synthetic waveform, because
       different file formats have different filename structure. Available
       are 'SU' (seismic unix) and 'ASCII'. Case-insensitive
   :rtype: str
   :return: renamed file that matches expected SPECFEM filename format
       for adjoint sources


.. py:function:: check_source_names(path_specfem_data, source_prefix, ntask=None)

   Determines names of sources by applying wildcard rule to user-supplied
   input files. Source names are only provided up to PAR.NTASK and are
   returned in alphabetical order.

   .. note::
       SeisFlows expects sources to be stored in the DATA/ directory with a
       prefix and a source name, e.g., {source_prefix}_{source_name} which
       would evaluate to something like CMTSOLUTION_001

   :type path_specfem_data: str
   :param path_specfem_data: path to a
   :type source_prefix: str
   :param source_prefix: type of SPECFEM input source, e.g., CMTSOLUTION
   :type ntask: int
   :parma ntask: if provided, curtails the list of sources up to `ntask`. If
       None, returns all files found matching the wildcard
   :rtype: list
   :return: alphabetically ordered list of source names up to PAR.NTASK


.. py:function:: getpar(key, file, delim='=', match_partial=False, comment='#', _fmt_dbl=True)

   Reads and returns parameters from a SPECFEM or SeisFlows parameter file
   Assumes the parameter file is formatted in the following way:

   # comment comment comment
   {key}      {delim} VAL

   :type key: str
   :param key: case-insensitive key to match in par_file. must be EXACT match
   :type file: str
   :param file: The SPECFEM Par_file to match against
   :type delim: str
   :param delim: delimiter between parameters and values within the file.
       default is '=', which matches for SPECFEM2D and SPECFEM3D_Cartesian
   :type match_partial: bool
   :param match_partial: allow partial key matches, e.g., allow key='tit' to
       return value for 'title'. Defaults to False as this can have
       unintended consequences
   :type comment: str
   :param comment: character used to delimit comments in the file. Defaults to
       '#' for the SPECFEM Par_file, but things like the FORCESOLUTION use '!'
   :type _fmt_dbl: bool
   :param _fmt_dbl: the SPECFEM files use FORTRAN double precision notation
       to define floats (e.g., 2.5d0 == 2.5, or 1.2e2 == 1.2*10^2). Usually it
       is preferable to convert this notation directly to a Python float, so
       this is set True by default. However, in cases where we are doing string
       replacement, like when using `setpar`, we do not want to format the
       double precision values, so this should be set False.
   :rtype: tuple (str, str, int)
   :return: a tuple of the key, value and line number (indexed from 0).
       The key will match exactly how it looks in the Par_file
       The value will be returned as a string, regardless of its expected type
       IF no matches found, returns (None, None, None)


.. py:function:: setpar(key, val, file, delim='=', **kwargs)

   Overwrites parameter value to a SPECFEM Par_file. Kwargs passed to `getpar`

   :type key: str
   :param key: case-insensitive key to match in par_file. must be EXACT match
   :type val: str
   :param val: value to OVERWRITE to the given key
   :type file: str
   :param file: The SPECFEM Par_file to match against
   :type delim: str
   :param delim: delimiter between parameters and values within the file.
       default is '=', which matches for SPECFEM2D and SPECFEM3D_Cartesian
   :type match_partial: bool
   :param match_partial: allow partial key matches, e.g., allow key='tit' to
       return value for 'title'. Defaults to False as this can have
       unintended consequences


.. py:function:: _getidx_vel_model(lines)

   Get the line indices of a velocity model, which can be used to retrieve 
   or replace the model values in a SPECFEM2D paramter file. Used by 
   `getpar_vel_model` and `setpar_vel_model`

   :type lines: list
   :param lines: list of strings read from the par_file
   :rtype idxs: list
   :param idxs: list of integer indices of the velocity model lines


.. py:function:: getpar_vel_model(file, strip=False)

       SPECFEM2D doesn't follow standard key = val formatting when defining its 
       internal velocity models so we need a special function to address this 
       specifically.
       
       Velocity models are ASSUMED to be formatted in the following way

       1 1 2700.d0 3000.d0 1732.051d0 0 0 9999 9999 0 0 0 0 0 0

       That is, 15 entries separated by spaces. We use that to find all relevant 
       lines of the model.
       
       :type file: str
       :param file: The SPECFEM Par_file to match against
       :type strip: bool
       :param strip: strip newline '
   ' from each of the model lines
       :rtype: list of str
       :return: list of all the layers of the velocity model as strings
       

.. py:function:: setpar_vel_model(file, model)

   Set velocity model values in a SPECFEM2D Par_file, see getpar_vel_model
   for more information.

   Deletes the old model from the Par_file, writes the new model in the same
   place, and then changes the value of 'nbmodels'

   :type file: str
   :param file: The SPECFEM Par_file to match against
   :type model: list of str
   :param model: input model
   :rtype: list of str
   :return: list of all the layers of the velocity model as strings, e.g.:
       model = ["1 1 2700.d0 3000.d0 1732.051d0 0 0 9999 9999 0 0 0 0 0 0",
                "2 1 2500.d0 2700.d0 0 0 0 9999 9999 0 0 0 0 0 0"]


.. py:function:: read_fortran_binary(filename)

   Reads Fortran-style unformatted binary data into numpy array.

   .. note::
       The FORTRAN runtime system embeds the record boundaries in the data by
       inserting an INTEGER*4 byte count at the beginning and end of each
       unformatted sequential record during an unformatted sequential WRITE.
       see: https://docs.oracle.com/cd/E19957-01/805-4939/6j4m0vnc4/index.html

   :type filename: str
   :param filename: full path to the Fortran unformatted binary file to read
   :rtype: np.array
   :return: numpy array with data with data read in as type Float32


.. py:function:: write_fortran_binary(arr, filename)

   Writes Fortran style binary files. Data are written as single precision
   floating point numbers.

   .. note::
       FORTRAN unformatted binaries are bounded by an INT*4 byte count. This
       function mimics that behavior by tacking on the boundary data.
       https://docs.oracle.com/cd/E19957-01/805-4939/6j4m0vnc4/index.html

   :type arr: np.array
   :param arr: data array to write as Fortran binary
   :type filename: str
   :param filename: full path to file that should be written in format
       unformatted Fortran binary