seisflows.preprocess.pyaflowa

The Pyaflowa preprocessing module for waveform gathering, preprocessing and misfit quantification. We use the name ‘Pyaflowa’ to avoid any potential name overlaps with the actual pyatoa package.

Module Contents

Classes

Pyaflowa

Pyaflowa Preprocess

class seisflows.preprocess.pyaflowa.Pyaflowa(min_period=1.0, max_period=10.0, filter_corners=4, client=None, rotate=False, pyflex_preset='default', fix_windows=False, adj_src_type='cc_traveltime', plot=True, pyatoa_log_level='DEBUG', unit_output='VEL', export_datasets=True, export_figures=True, export_log_files=True, workdir=os.getcwd(), path_preprocess=None, path_solver=None, path_specfem_data=None, path_data=None, path_output=None, syn_data_format='ascii', data_case='data', components='ZNE', start=None, ntask=1, nproc=1, source_prefix=None, **kwargs)

Preprocessing and misfit quantification using Python’s Adjoint Tomography Operations Assistant (Pyatoa)

Parameters
  • min_period (float) – Minimum filter corner in unit seconds. Bandpass filter if set with max_period, highpass filter if set without max_period, no filtering if neither min_period nor max_period is set

  • max_period (float) – Maximum filter corner in unit seconds. Bandpass filter if set with min_period, lowpass filter if set without min_period, no filtering if neither max_period nor min_period is set
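
A minimal usage sketch; all parameter values shown here are illustrative choices, not recommended defaults:

    from seisflows.preprocess.pyaflowa import Pyaflowa

    # Illustrative values: bandpass between 10 and 100 s
    preprocess = Pyaflowa(
        min_period=10.0,
        max_period=100.0,
        pyflex_preset="default",
        fix_windows=False,
        adj_src_type="cc_traveltime",
        syn_data_format="ascii",
        components="ZNE",
    )
    preprocess.check()  # validate parameters and paths
    preprocess.setup()  # build the internal directory structure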

check()

Checks parameter and path files. Run at the start of a SeisFlows workflow to ensure that parameters and paths are set appropriately.

setup()

Sets up data preprocessing machinery by establishing an internally defined directory structure that will be used to store the outputs of the preprocessing workflow

Note

config.save_to_ds must be set False, otherwise Pyatoa will try to write to a read-only ASDFDataSet causing preprocessing to fail.

static ftag(config)

Create a re-usable file tag from the Config object as multiple functions will use this tag for file naming and file discovery.

Parameters

config (pyatoa.core.config.Config) – Configuration object that must contain the ‘event_id’, iteration and step count
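
A hypothetical sketch of what such a tag might look like; the exact format, and the iter_tag and step_tag attribute names, are assumptions:

    def ftag_sketch(config):
        # Hypothetical: assumes Config exposes `event_id`, `iter_tag` and
        # `step_tag` (e.g., '001', 'i01', 's00'); the real format may differ
        return f"{config.event_id}_{config.iter_tag}_{config.step_tag}"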

quantify_misfit(source_name=None, save_residuals=None, save_adjsrcs=None, iteration=1, step_count=0, parallel=False, **kwargs)

Main processing function to be called by Workflow module. Generates total misfit and adjoint sources for a given event with name source_name.

Note

Meant to be called by workflow.evaluate_objective_function and run on the system with system.run() to get access to compute nodes.

Parameters
  • source_name (str) – name of the event to quantify misfit for. If not given, will attempt to gather event id from the given task id which is assigned by system.run()

  • save_residuals (str) – if not None, path to write misfit/residuals to

  • save_adjsrcs (str) – if not None, path to write adjoint sources to

  • iteration (int) – current iteration of the workflow, information should be provided by workflow module if we are running an inversion. Defaults to 1 if not given (1st iteration)

  • step_count (int) – current step count of the line search. Information should be provided by the optimize module if we are running an inversion. Defaults to 0 if not given (1st evaluation)
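
A hedged sketch of how the Workflow module might invoke this function; the event name and paths are illustrative:

    preprocess.quantify_misfit(
        source_name="001",                             # illustrative event id
        save_residuals="scratch/eval_func/residuals",  # illustrative path
        save_adjsrcs="scratch/solver/001/SEM",         # illustrative path
        iteration=1,
        step_count=0,
    )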

set_config(source_name=None, iteration=1, step_count=0)

Create an event-specific Config object which contains information about the current event, and position in the workflow evaluation. Also provides specific information on event paths and timing to be used by the Manager

Parameters
  • source_name (str) – name of the event to quantify misfit for. If not given, will attempt to gather event id from the given task id which is assigned by system.run(). Defaults to self._source_names[0]

  • iteration (int) – current iteration of the workflow, information should be provided by workflow module if we are running an inversion. Defaults to 1 if not given (1st iteration)

  • step_count (int) – current step count of the line search. Information should be provided by the optimize module if we are running an inversion. Defaults to 0 if not given (1st evaluation)

Return type

pyatoa.core.config.Config

Returns

Config object that is specifically crafted for a given event that can be directly fed to the Manager for misfit quantification
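
A short usage sketch, reusing the illustrative event name from above:

    # Build an event-specific Config for iteration 1, step count 0
    config = preprocess.set_config(source_name="001", iteration=1, step_count=0)
    # `config` can now be fed to the Manager, or to quantify_misfit_station()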

quantify_misfit_station(config, station_code, save_adjsrcs=False)

Main Pyatoa processing function to quantify misfit and generate adjoint sources.

Runs misfit quantification for a single event-station pair: gathers data, preprocesses, windows and measures the data, saves the adjoint source if requested, and returns the total misfit and the collected windows for the station.

Parameters
  • config (pyatoa.core.config.Config) – Config object that defines all the processing parameters required by the Pyatoa workflow

  • station_code (str) – chosen station to quantify misfit for. Should be in the format ‘NN.SSS.LL.CCC’

  • save_adjsrcs (str or bool) – path to the directory where adjoint sources should be saved. Filenames are generated automatically by Pyatoa to fit the naming schema required by SPECFEM. If False, no adjoint sources are saved; they can of course be saved manually later using Pyatoa + PyASDF
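
A hedged usage sketch, assuming the return values match the description above (total misfit, then collected windows); the station code and path are illustrative:

    misfit, windows = preprocess.quantify_misfit_station(
        config=config,                         # from set_config() above
        station_code="NZ.BFZ.10.HHZ",          # 'NN.SSS.LL.CCC' format
        save_adjsrcs="scratch/solver/001/SEM"  # or False to skip saving
    )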

sum_residuals(residuals)

Return the summed residuals divided by the number of events, following the equation in Tape et al. (2010).

Parameters

residuals (np.array) – residuals from each of the NTASK events

Return type

float

Returns

sum of squares of residuals
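
A minimal sketch of the summation; the exact normalization used in Tape et al. (2010) and in the actual implementation (e.g., an extra factor of 1/2) is an assumption not reproduced here:

    import numpy as np

    def sum_residuals_sketch(residuals):
        # Sum of squared residuals divided by the number of events;
        # normalization details are an assumption, see note above
        residuals = np.asarray(residuals)
        return np.sum(residuals ** 2) / len(residuals)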

finalize()

Run serial finalization tasks at the end of a given iteration. These tasks are specific to Pyatoa, used to aggregate figures and data.

Note

This finalize function performs the following tasks:

  • Generate .csv files using the Inspector

  • Aggregate event-specific PDFs into a single evaluation PDF

  • Save scratch/ data into output/ if requested

_check_fixed_windows(iteration, step_count)

Determine how to address re-using misfit windows during an inversion workflow. Logs messages to let the User know whether or not misfit windows will be re-used throughout the inversion.

True: Always fix windows, except for i01s00, because we don’t have any windows for the first function evaluation

False: Don’t fix windows; always choose a new set of windows

Iter: Pick windows only on the initial step count (0th) for each iteration. WARNING - does not work well with Thrifty Inversion because the 0th step count is usually skipped

Once: Pick new windows on the first function evaluation and then fix windows thereafter. Useful for when parameters have changed, e.g., filter bounds

Parameters
  • iteration (int) – The current iteration of the SeisFlows3 workflow, within SeisFlows3 this is defined by optimize.iter

  • step_count (int) – Current line search step count within the SeisFlows3 workflow. Within SeisFlows3 this is defined by optimize.line_search.step_count

Return type

tuple (bool, str)

Returns

(bool on whether to use windows from the previous step, and a message that can be sent to the logger)
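
A simplified sketch of this decision logic, not the actual implementation (in particular, True and Once coincide here, whereas the real code presumably distinguishes them, e.g., for inversions resumed at a later iteration):

    def check_fixed_windows_sketch(fix_windows, iteration, step_count):
        first_eval = (iteration == 1 and step_count == 0)
        if fix_windows is True:
            fix = not first_eval       # always fix, except i01s00
        elif fix_windows == "ITER":
            fix = step_count != 0      # re-pick on each 0th step count
        elif fix_windows == "ONCE":
            fix = not first_eval       # pick once, then fix thereafter
        else:
            fix = False                # always pick a new set of windows
        msg = f"misfit windows will be {'fixed' if fix else 're-picked'}"
        return fix, msg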

_config_adjtomo_loggers(fid)

Create a log file to track processing of a given source-receiver pair. Because each station is processed asynchronously, we don’t want them logging to the main file at the same time; otherwise we get a random mixing of log messages. Instead, each logs to a temporary file, and these are combined in serial at the end of the processing script.

Parameters

fid (str) – full path and filename for logger that will be configured

Return type

logging.Logger

Returns

a logger which does NOT log to stdout and only logs to the given file defined by fid
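
A minimal sketch of such a file-only logger using the standard library; the handler and format details are assumptions:

    import logging

    def config_station_logger_sketch(fid):
        logger = logging.getLogger(fid)  # unique logger name per file
        logger.setLevel(logging.DEBUG)
        logger.propagate = False         # do NOT bubble up to stdout handlers
        handler = logging.FileHandler(fid, mode="w")
        handler.setFormatter(logging.Formatter("%(levelname)s: %(message)s"))
        logger.addHandler(handler)
        return logger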

_collect_tmp_log_files(pyatoa_logger, event_id)

Each source-receiver pair has made its own log file. This function collects these files and writes their content back into the main log. This is a lot of IO but should be okay since the files are small.

Note

This was the most foolproof method for having multiple parallel processes write to the same file. I played around with StringIO buffers and file locks, but they became overly complicated and ultimately did not work how I wanted them to. This function trades filecount and IO overhead for simplicity.

Warning

The assumption here is that the number of source-receiver pairs is manageable (in the thousands). If we start reaching file count limits on the cluster, then this method for logging may have to be re-thought. See the following link for an example: https://stackless.readthedocs.io/en/3.7-slp/howto/logging-cookbook.html#using-concurrent-futures-processpoolexecutor

Parameters
  • pyatoa_logger (logging.Logger) – The main logger for a given event, should be defined by pyaflowa.quantify_misfit()

  • event_id (str) – given event id that we are concerned with. Used to search for matching log files in the temporary log file directory
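
A sketch of the collection step, with an assumed temporary-log directory and filename pattern (the real function receives the event logger itself rather than a path):

    import os
    from glob import glob

    def collect_tmp_logs_sketch(main_log_fid, event_id, tmp_dir="logs/tmp"):
        # Append each station's temporary log to the main event log in
        # serial, then delete the temporary file
        with open(main_log_fid, "a") as main:
            for fid in sorted(glob(os.path.join(tmp_dir, f"*{event_id}*.log"))):
                with open(fid) as tmp:
                    main.write(tmp.read())
                os.remove(fid)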

_make_event_figure_pdf(source_name, output_fid)

Combines source-receiver output PNGs into a single event-specific PDF. Mostly a convenience function to make it easier to ingest waveform figures during a workflow.

Parameters
  • source_name (str) – name of event to search for input files

  • output_fid (str) – full path and filename for output PDF which will be a combination of all the PNG files created for each station
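
A sketch of the PNG-to-PDF step using the third-party img2pdf package (an assumption; the actual implementation relies on Pyatoa's own utilities), with an assumed figure directory:

    from glob import glob
    import img2pdf  # third-party; an assumption, not what Pyaflowa uses

    def make_event_pdf_sketch(source_name, output_fid, png_dir="figures"):
        pngs = sorted(glob(f"{png_dir}/*{source_name}*.png"))
        with open(output_fid, "wb") as f:
            f.write(img2pdf.convert(pngs))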

_make_evaluation_composite_pdf()

Combines event-specific PDFs to make an evaluation-specific PDF. By evaluation we mean any given set of forward simulations, e.g., i01s00

This is meant to make it easier for the User to scroll through figures. Deletes the original event-specific PDFs to keep the file count down
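
A sketch of the merge-and-delete behavior using the third-party pypdf package (an assumption; Pyatoa ships its own merge utility), with illustrative paths:

    import os
    from glob import glob
    from pypdf import PdfWriter  # third-party; an assumption

    def make_evaluation_pdf_sketch(pdf_dir, output_fid="i01s00.pdf"):
        writer = PdfWriter()
        fids = sorted(glob(os.path.join(pdf_dir, "*.pdf")))
        for fid in fids:
            writer.append(fid)  # merge each event-specific PDF
        writer.write(output_fid)
        for fid in fids:
            os.remove(fid)      # delete originals to keep file count down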