seisflows.preprocess.pyaflowa

The Pyaflowa preprocessing module for waveform gathering, preprocessing and misfit quantification. We use the name ‘Pyaflowa’ to avoid any potential name overlaps with the actual pyatoa package.

Module Contents

Classes

Pyaflowa

Pyaflowa Preprocess

class seisflows.preprocess.pyaflowa.Pyaflowa(min_period=1.0, max_period=10.0, filter_corners=4, client=None, rotate=False, pyflex_preset='default', fix_windows=False, adj_src_type='cc', plot=True, pyatoa_log_level='DEBUG', unit_output='VEL', max_workers=None, export_datasets=True, export_figures=True, export_log_files=True, workdir=os.getcwd(), path_preprocess=None, path_solver=None, path_specfem_data=None, path_data=None, path_output=None, data_format='ascii', data_case='data', components='ZNE', start=None, ntask=1, nproc=1, source_prefix=None, **kwargs)

Preprocessing and misfit quantification using Python’s Adjoint Tomography Operations Assistance (Pyatoa)

type min_period

float

param min_period

Minimum filter corner in unit seconds. Bandpass filter if set with max_period, highpass filter if set without max_period, no filtering if neither min_period nor max_period is set

type max_period

float

param max_period

Maximum filter corner in unit seconds. Bandpass filter if set with min_period, lowpass filter if set without min_period, no filtering if neither max_period nor min_period is set
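A minimal instantiation sketch based on the signature above; in a real workflow SeisFlows constructs this module from the parameter file rather than directly:

    from seisflows.preprocess.pyaflowa import Pyaflowa

    # Bandpass filter between 1 and 10 s because both corners are set;
    # dropping max_period would give a highpass, dropping min_period a lowpass
    preprocess = Pyaflowa(min_period=1.0, max_period=10.0, data_format="ascii")
    preprocess.check()  # validate parameters and paths
    preprocess.setup()  # build the internal directory structure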

check()

Checks parameters and paths. Run at the start of a SeisFlows workflow to ensure that things are set appropriately.

setup()

Sets up data preprocessing machinery by establishing an internally defined directory structure that will be used to store the outputs of the preprocessing workflow

Note

config.save_to_ds must be set False, otherwise Pyatoa will try to write to a read-only ASDFDataSet causing preprocessing to fail.

static _ftag(config)

Create a re-usable file tag from the Config object as multiple functions will use this tag for file naming and file discovery.

Parameters

config (pyatoa.core.config.Config) – Configuration object that must contain the ‘event_id’, iteration and step count
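For illustration, a tag built from those three pieces of information might look like the following; the exact formatting and zero-padding are assumptions, not taken from the source:

    def ftag(config):
        # Hypothetical sketch: join event id, iteration and step count into
        # a single reusable tag, e.g. 'SOURCE_001_i01_s00'
        return f"{config.event_id}_i{config.iteration:0>2}_s{config.step_count:0>2}"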

quantify_misfit(source_name=None, save_residuals=None, save_adjsrcs=None, iteration=1, step_count=0, **kwargs)

Prepares solver for gradient evaluation by evaluating data-synthetic misfit and writing residuals and adjoint traces. Meant to be called by workflow.evaluate_objective_function.

Note

Meant to be run on the system using system.run() with access to the solver.

Warning

Parallel processing with concurrent futures currently leads to machine crashes, and I have not been able to figure out why or how to deal with it. For the moment, per-event misfit quantification is run serially.

Parameters
  • source_name (str) – name of the event to quantify misfit for. If not given, will attempt to gather event id from the given task id which is assigned by system.run()

  • save_residuals (str) – if not None, path to write misfit/residuals to

  • save_adjsrcs (str) – if not None, path to write adjoint sources to

  • iteration (int) – current iteration of the workflow, information should be provided by workflow module if we are running an inversion. Defaults to 1 if not given (1st iteration)

  • step_count (int) – current step count of the line search. Information should be provided by the optimize module if we are running an inversion. Defaults to 0 if not given (1st evaluation)
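A sketch of how the workflow module might issue this call; all paths below are placeholders:

    preprocess.quantify_misfit(
        source_name="SOURCE_001",
        save_residuals="scratch/preprocess/residuals.txt",
        save_adjsrcs="scratch/solver/SOURCE_001/adjoint",
        iteration=1,
        step_count=0,
    )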

_setup_quantify_misfit(source_name, iteration, step_count)

Create an event-specific Config object which contains information about the current event, and position in the workflow evaluation. Also provides specific information on event paths and timing to be used by the Manager

Parameters
  • source_name (str) – name of the event to quantify misfit for. If not given, will attempt to gather event id from the given task id which is assigned by system.run()

  • iteration (int) – current iteration of the workflow, information should be provided by workflow module if we are running an inversion. Defaults to 1 if not given (1st iteration)

  • step_count (int) – current step count of the line search. Information should be provided by the optimize module if we are running an inversion. Defaults to 0 if not given (1st evaluation)

Return type

pyatoa.core.config.Config

Returns

Config object that is specifically crafted for a given event that can be directly fed to the Manager for misfit quantification
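As a sketch, the returned object might resemble a Pyatoa Config built from the event and evaluation information; the values here are placeholders and the exact keyword set used internally is an assumption:

    from pyatoa import Config

    # Keyword names follow Pyatoa's Config; values are placeholders
    config = Config(
        event_id="SOURCE_001",
        iteration=1,
        step_count=0,
        min_period=1.0,
        max_period=10.0,
    )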

_run_quantify_misfit(config, save_adjsrcs, parallel=False)

Run misfit quantification for each station concurrently or in serial. If concurrent, play it safe and cap the number of parallel workers at half the available processors; otherwise the machine may run out of memory and crash.
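A minimal sketch of that serial/concurrent dispatch with the worker cap described above; quantify_station stands in for _quantify_misfit_station:

    import os
    from concurrent.futures import ProcessPoolExecutor

    def run_stations(quantify_station, station_codes, parallel=False):
        # Serial fallback, currently the default due to the crashes noted above
        if not parallel:
            return [quantify_station(code) for code in station_codes]
        # Cap workers at half the processors to avoid running out of memory
        max_workers = max(1, os.cpu_count() // 2)
        with ProcessPoolExecutor(max_workers=max_workers) as executor:
            return list(executor.map(quantify_station, station_codes))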

_quantify_misfit_station(config, station_code, save_adjsrcs=False)

Run misfit quantification for a single event-station pair. Gathers, preprocesses, windows and measures data, saves adjoint source if requested, and then returns the total misfit and the collected windows for the station.

Parameters
  • config (pyatoa.core.config.Config) – Config object that defines all the processing parameters required by the Pyatoa workflow

  • station_code (str) – chosen station to quantify misfit for. Should be in the format ‘NN.SSS.LL.CCC’

  • save_adjsrcs (str or bool) – path to the directory where adjoint sources should be saved. Filenames will be generated automatically by Pyatoa to fit the naming schema required by SPECFEM. If False, no adjoint sources will be saved; they can, of course, be saved manually later using Pyatoa + PyASDF
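A rough per-station sketch using Pyatoa's Manager; the exact calls Pyaflowa makes internally are an assumption:

    from pyatoa import Manager

    mgmt = Manager(config=config)
    mgmt.gather(code="NZ.BFZ..HHZ")  # station code in 'NN.SSS.LL.CCC' format
    mgmt.flow()  # standardize, preprocess, window and measure in one call
    misfit, nwin = mgmt.stats.misfit, mgmt.stats.nwin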

sum_residuals(residuals)

Return summed residuals divided by the number of events, following the equation in Tape et al. (2010)

Parameters

residuals (np.array) – list of residuals from each NTASK event

Return type

float

Returns

sum of squares of residuals
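A sketch of the summation under the normalization stated above; the exact Tape et al. (2010) prefactor used in the implementation is an assumption:

    import numpy as np

    def sum_residuals(residuals):
        # Sum of squared residuals, normalized by the number of events
        # (prefactor assumed from the description above)
        residuals = np.asarray(residuals)
        return np.sum(residuals ** 2) / len(residuals)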

finalize()

Run some serial finalization tasks specific to Pyatoa, which will help aggregate the collection of output information.

Note

This finalize function performs the following tasks:

  • Generate .csv files using the Inspector

  • Aggregate event-specific PDFs into a single evaluation PDF

  • Save scratch/ data into output/ if requested

_check_fixed_windows(iteration, step_count)

Determine how to address re-using misfit windows during an inversion workflow. Logs messages to let the User know whether or not misfit windows will be re-used throughout an inversion.

True: Always fix windows, except for i01s00 because we don't have any windows for the first function evaluation

False: Don't fix windows; always choose a new set of windows

Iter: Pick windows only on the initial step count (0th) of each iteration. WARNING: does not work well with Thrifty Inversion because the 0th step count is usually skipped

Once: Pick new windows on the first function evaluation and then fix windows thereafter. Useful for when parameters have changed, e.g., filter bounds

Parameters
  • iteration (int) – The current iteration of the SeisFlows3 workflow; within SeisFlows3 this is defined by optimize.iter

  • step_count (int) – Current line search step count within the SeisFlows3 workflow; within SeisFlows3 this is defined by optimize.line_search.step_count

Return type

tuple (bool, str)

Returns

(bool on whether to use windows from the previous step, and a message that can be sent to the logger)
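A simplified sketch of that decision table (it collapses some subtleties, e.g. how 'Once' tracks the workflow's starting evaluation):

    def check_fixed_windows(fix_windows, iteration, step_count):
        first_eval = (iteration == 1) and (step_count == 0)
        if fix_windows is True:
            # No windows exist yet at i01s00, so that evaluation picks anew
            fix, msg = not first_eval, "re-using misfit windows"
        elif fix_windows == "Iter":
            # New windows only on the 0th step count of each iteration
            fix, msg = step_count != 0, "fixing windows within this iteration"
        elif fix_windows == "Once":
            fix, msg = not first_eval, "windows picked once, then fixed"
        else:
            fix, msg = False, "picking new windows each evaluation"
        return fix, msg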

_config_pyatoa_logger(fid)

Create a log file to track processing of a given source-receiver pair. Because each station is processed asynchronously, we don’t want them to log to the main file at the same time, otherwise we get a random mixing of log messages. Instead we have them log to temporary files, which are combined at the end of the processing script in serial.

Parameters

fid (str) – full path and filename for logger that will be configured

Return type

logging.Logger

Returns

a logger which does NOT log to stdout and only logs to the given file defined by fid
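A minimal sketch of such a file-only logger using the standard library; the handler and format choices here are assumptions:

    import logging

    def config_logger(fid):
        logger = logging.getLogger(fid)  # unique logger per log file
        logger.setLevel(logging.DEBUG)
        logger.propagate = False  # do NOT bubble messages up to stdout
        handler = logging.FileHandler(fid, mode="w")
        handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
        logger.addHandler(handler)
        return logger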

_collect_tmp_log_files(pyatoa_logger, event_id)

Each source-receiver pair has made its own log file. This function collects these files and writes their content back into the main log. This is a lot of IO but should be okay since the files are small.

Note

This was the most foolproof method for having multiple parallel processes write to the same file. I played around with StringIO buffers and file locks, but they became overly complicated and ultimately did not work how I wanted them to. This function trades filecount and IO overhead for simplicity.

Warning

The assumption here is that the number of source-receiver pairs is manageable (in the thousands). If we start reaching file count limits on the cluster then this method for logging may have to be re-thought. See the following link for an example: https://stackless.readthedocs.io/en/3.7-slp/howto/logging-cookbook.html#using-concurrent-futures-processpoolexecutor

Parameters
  • pyatoa_logger (logging.Logger) – The main logger for a given event, should be defined by pyaflowa.quantify_misfit()

  • event_id (str) – given event id that we are concerned with. Used to search for matching log files in the temporary log file directory
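A sketch of the collection step; the temporary log directory and filename pattern are placeholders:

    import os
    from glob import glob

    def collect_tmp_log_files(main_log_fid, event_id, tmp_dir):
        # Append every matching temporary log to the main log file, then
        # remove it so the temporary directory does not accumulate files
        with open(main_log_fid, "a") as main_log:
            for fid in sorted(glob(os.path.join(tmp_dir, f"*{event_id}*"))):
                with open(fid) as tmp_log:
                    main_log.write(tmp_log.read())
                os.remove(fid)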

_make_event_figure_pdf(source_name, output_fid)

Combines source-receiver output PNGs into a single event-specific PDF. Mostly a convenience function to make it easier to ingest waveform figures during a workflow.

Parameters
  • source_name (str) – name of event to search for input files

  • output_fid (str) – full path and filename for output PDF which will be a combination of all the PNG files created for each station
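A sketch of the PNG-to-PDF combination using Pillow; the library actually used internally is an assumption:

    from PIL import Image

    def make_event_figure_pdf(png_fids, output_fid):
        # Convert each station PNG to RGB and append as pages of one PDF
        images = [Image.open(fid).convert("RGB") for fid in sorted(png_fids)]
        images[0].save(output_fid, save_all=True, append_images=images[1:])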

_make_evaluation_composite_pdf()

Combines event-specific PDFs to make an evaluation-specific PDF. By evaluation we mean any given set of forward simulations, e.g., i01s00

This is meant to make it easier for the User to scroll through figures. Deletes the original event-specific PDFs to keep file count down
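A sketch of the merge-and-clean-up step using pypdf; the library choice is an assumption:

    import os
    from pypdf import PdfWriter

    def make_evaluation_pdf(event_pdfs, output_fid):
        writer = PdfWriter()
        for fid in sorted(event_pdfs):
            writer.append(fid)  # add all pages of each event PDF
        with open(output_fid, "wb") as f:
            writer.write(f)
        for fid in event_pdfs:
            os.remove(fid)  # delete originals to keep file count down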