seisflows.preprocess.pyaflowa
The Pyaflowa preprocessing module for waveform gathering, preprocessing and misfit quantification. We use the name ‘Pyaflowa’ to avoid any potential name overlaps with the actual pyatoa package.
Module Contents
Classes
Pyaflowa: Preprocessing and misfit quantification using Pyatoa
- class seisflows.preprocess.pyaflowa.Pyaflowa(min_period=1.0, max_period=10.0, filter_corners=4, client=None, rotate=False, pyflex_preset='default', fix_windows=False, adj_src_type='cc_traveltime', plot=True, pyatoa_log_level='DEBUG', unit_output='VEL', export_datasets=True, export_figures=True, export_log_files=True, workdir=os.getcwd(), path_preprocess=None, path_solver=None, path_specfem_data=None, path_data=None, path_output=None, syn_data_format='ascii', data_case='data', components='ZNE', start=None, ntask=1, nproc=1, source_prefix=None, **kwargs)
Preprocessing and misfit quantification using Python’s Adjoint Tomography Operations Assistant (Pyatoa)
- type min_period
float
- param min_period
Minimum filter corner in unit seconds. Bandpass filter if set with max_period, highpass filter if set without max_period, no filtering if neither min_period nor max_period is set
- type max_period
float
- param max_period
Maximum filter corner in unit seconds. Bandpass filter if set with min_period, lowpass filter if set without min_period, no filtering if neither min_period nor max_period is set
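The mapping from these two corner parameters to a filter type can be sketched with a small hypothetical helper (`choose_filter` is illustrative only; it is not part of the SeisFlows or Pyatoa API):

```python
def choose_filter(min_period=None, max_period=None):
    """Illustrative sketch of the filter selection described above:
    both corners set -> bandpass, only min_period -> highpass,
    only max_period -> lowpass, neither set -> no filtering."""
    if min_period is not None and max_period is not None:
        return "bandpass"
    elif min_period is not None:
        return "highpass"
    elif max_period is not None:
        return "lowpass"
    return None  # no filtering applied
```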
- check()
Checks Parameter and Path files; run at the start of a SeisFlows workflow to ensure that parameters are set appropriately.
- setup()
Sets up data preprocessing machinery by establishing an internally defined directory structure that will be used to store the outputs of the preprocessing workflow
Note
config.save_to_ds must be set False, otherwise Pyatoa will try to write to a read-only ASDFDataSet causing preprocessing to fail.
- static ftag(config)
Create a re-usable file tag from the Config object as multiple functions will use this tag for file naming and file discovery.
- Parameters
config (pyatoa.core.config.Config) – Configuration object that must contain the ‘event_id’, iteration and step count
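A minimal sketch of what such a tag might look like, assuming the tag simply joins the event id with zero-padded iteration and step count (the exact format string is an assumption, and the real ftag reads these values from a Pyatoa Config object rather than taking them as arguments):

```python
def ftag(event_id, iteration, step_count):
    """Hypothetical re-creation of a reusable file tag built from an
    event id, iteration and step count, e.g. 'EVENT_A_i01_s00'."""
    return f"{event_id}_i{iteration:0>2}_s{step_count:0>2}"
```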
- quantify_misfit(source_name=None, save_residuals=None, save_adjsrcs=None, iteration=1, step_count=0, parallel=False, **kwargs)
Main processing function to be called by Workflow module. Generates total misfit and adjoint sources for a given event with name source_name.
Note
Meant to be called by workflow.evaluate_objective_function and run on system using system.run() to get access to compute nodes.
- Parameters
source_name (str) – name of the event to quantify misfit for. If not given, will attempt to gather event id from the given task id which is assigned by system.run()
save_residuals (str) – if not None, path to write misfit/residuals to
save_adjsrcs (str) – if not None, path to write adjoint sources to
iteration (int) – current iteration of the workflow, information should be provided by workflow module if we are running an inversion. Defaults to 1 if not given (1st iteration)
step_count (int) – current step count of the line search. Information should be provided by the optimize module if we are running an inversion. Defaults to 0 if not given (1st evaluation)
- set_config(source_name=None, iteration=1, step_count=0)
Create an event-specific Config object which contains information about the current event, and position in the workflow evaluation. Also provides specific information on event paths and timing to be used by the Manager
- Parameters
source_name (str) – name of the event to quantify misfit for. If not given, will attempt to gather event id from the given task id which is assigned by system.run(). Defaults to self._source_names[0]
iteration (int) – current iteration of the workflow, information should be provided by workflow module if we are running an inversion. Defaults to 1 if not given (1st iteration)
step_count (int) – current step count of the line search. Information should be provided by the optimize module if we are running an inversion. Defaults to 0 if not given (1st evaluation)
- Return type
pyatoa.core.config.Config
- Returns
Config object that is specifically crafted for a given event that can be directly fed to the Manager for misfit quantification
- quantify_misfit_station(config, station_code, save_adjsrcs=False)
Main Pyatoa processing function to quantify misfit and generate adjoint sources.
Runs misfit quantification for a single event-station pair: gathers data, preprocesses, windows and measures it, saves the adjoint source if requested, and returns the total misfit and the collected windows for the station.
- Parameters
config (pyatoa.core.config.Config) – Config object that defines all the processing parameters required by the Pyatoa workflow
station_code (str) – chosen station to quantify misfit for. Should be in the format ‘NN.SSS.LL.CCC’
save_adjsrcs (str) – path to directory where adjoint sources should be saved. Filenames will be generated automatically by Pyatoa to fit the naming schema required by SPECFEM. If False, no adjoint sources will be saved. They of course can be saved manually later using Pyatoa + PyASDF
- sum_residuals(residuals)
Return summed residuals divided by the number of events, following the equation in Tape et al. (2010)
- Parameters
residuals (np.array) – list of residuals from each NTASK event
- Return type
float
- Returns
sum of squares of residuals
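Reading the description literally (sum of squared residuals, divided by the number of events), the summation can be sketched as follows. This is an interpretation of the docstring, not the verified SeisFlows implementation, so treat the exact normalization as an assumption:

```python
import numpy as np

def sum_residuals(residuals):
    """Sketch of residual summation: square each event residual, sum,
    and divide by the number of events (NTASK). Assumes `residuals`
    holds one scalar misfit value per event."""
    residuals = np.asarray(residuals, dtype=float)
    return np.sum(residuals ** 2) / residuals.size
```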
- finalize()
Run serial finalization tasks at the end of a given iteration. These tasks are specific to Pyatoa, used to aggregate figures and data.
Note
This finalize function performs the following tasks:
* Generate .csv files using the Inspector
* Aggregate event-specific PDFs into a single evaluation PDF
* Save scratch/ data into output/ if requested
- _check_fixed_windows(iteration, step_count)
Determine how to address re-using misfit windows during an inversion workflow. Logs messages to let the user know whether or not misfit windows will be re-used throughout an inversion.
- True: Always fix windows, except for i01s00 because no windows exist yet for the first function evaluation
- False: Don't fix windows; always choose a new set of windows
- Iter: Pick windows only on the initial step count (0th) of each iteration. WARNING: does not work well with Thrifty Inversion because the 0th step count is usually skipped
- Once: Pick new windows on the first function evaluation and then fix windows thereafter. Useful when parameters have changed, e.g., filter bounds
- Parameters
iteration (int) – The current iteration of the SeisFlows3 workflow, within SeisFlows3 this is defined by optimize.iter
step_count (int) – Current line search step count within the SeisFlows3 workflow. Within SeisFlows3 this is defined by optimize.line_search.step_count
- Return type
tuple (bool, str)
- Returns
(bool on whether to use windows from the previous step, and a message that can be sent to the logger)
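The decision table above can be sketched as a small standalone function. This is a hypothetical re-creation of the logic as described, not the actual SeisFlows implementation; the string option names ("ITER", "ONCE") and message wording are assumptions:

```python
def check_fixed_windows(fix_windows, iteration, step_count):
    """Sketch of the fixed-window decision: returns (fix, message) where
    `fix` says whether to re-use windows from the previous evaluation."""
    first_eval = (iteration == 1 and step_count == 0)
    if fix_windows is True:
        fix = not first_eval  # no windows exist yet at i01s00
        msg = "re-using misfit windows" if fix else "first evaluation, picking new windows"
    elif fix_windows == "ITER":
        fix = step_count != 0  # new windows only at each iteration's 0th step
        msg = "re-using windows within iteration" if fix else "new iteration, picking new windows"
    elif fix_windows == "ONCE":
        fix = not first_eval  # pick once, then keep forever
        msg = "windows fixed after first evaluation" if fix else "picking windows once"
    else:  # False: always pick fresh windows
        fix = False
        msg = "always picking new windows"
    return fix, msg
```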
- _config_adjtomo_loggers(fid)
Create a log file to track processing of a given source-receiver pair. Because each station is processed asynchronously, we don’t want them to log to the main file at the same time, otherwise we get a random mixing of log messages. Instead we have them log to temporary files, which are combined at the end of the processing script in serial.
- Parameters
fid (str) – full path and filename for logger that will be configured
- Return type
logging.Logger
- Returns
a logger which does NOT log to stdout and only logs to the given file defined by fid
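A file-only logger of this kind can be sketched with the standard library's logging module. The handler setup and formatter below are illustrative assumptions, not the actual Pyaflowa configuration:

```python
import logging

def config_file_logger(fid):
    """Sketch of a logger that writes ONLY to the file `fid`, never to
    stdout, so parallel per-station processes don't interleave output."""
    logger = logging.getLogger(fid)  # one uniquely named logger per file
    logger.setLevel(logging.DEBUG)
    logger.propagate = False  # keep messages from bubbling up to the root logger
    handler = logging.FileHandler(fid, mode="w")
    handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger
```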
- _collect_tmp_log_files(pyatoa_logger, event_id)
Each source-receiver pair has made its own log file. This function collects these files and writes their content back into the main log. This is a lot of IO but should be okay since the files are small.
Note
This was the most foolproof method for having multiple parallel processes write to the same file. I played around with StringIO buffers and file locks, but they became overly complicated and ultimately did not work how I wanted them to. This function trades filecount and IO overhead for simplicity.
Warning
The assumption here is that the number of source-receiver pairs is manageable (in the thousands). If we start reaching file count limits on the cluster then this method for logging may have to be re-thought. See link for example: https://stackless.readthedocs.io/en/3.7-slp/howto/logging-cookbook.html#using-concurrent-futures-processpoolexecutor
- Parameters
pyatoa_logger (logging.Logger) – The main logger for a given event, should be defined by pyaflowa.quantify_misfit()
event_id (str) – given event id that we are concerned with. Used to search for matching log files in the temporary log file directory
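The collect-and-merge step can be sketched as below. The temporary log naming pattern (`*<event_id>*.log`) and the main-log append target are assumptions; the real function appends into an existing logger's file:

```python
import glob
import os

def collect_tmp_log_files(main_log_path, tmp_log_dir, event_id):
    """Sketch of serial log collection: append each matching temporary
    per-station log into the main log, then delete the temporary file."""
    pattern = os.path.join(tmp_log_dir, f"*{event_id}*.log")
    for fid in sorted(glob.glob(pattern)):  # sorted for deterministic order
        with open(fid) as f:
            contents = f.read()
        with open(main_log_path, "a") as f:
            f.write(contents)
        os.remove(fid)  # trade file count for simplicity, as noted above
```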
- _make_event_figure_pdf(source_name, output_fid)
Combines source-receiver output PNGs into a single event-specific PDF. Mostly a convenience function to make it easier to ingest waveform figures during a workflow.
- Parameters
source_name (str) – name of event to search for input files
output_fid (str) – full path and filename for output PDF which will be a combination of all the PNG files created for each station
- _make_evaluation_composite_pdf()
Combines event-specific PDFs to make an evaluation-specific PDF. By evaluation we mean any given set of forward simulations, e.g., i01s00.
This is meant to make it easier for the user to scroll through figures. Deletes the original event-specific PDFs to keep file count down.