PICOcode.ProcessingScripts package
Submodules
PICOcode.ProcessingScripts.concat_files module
- concat_files(output_file, files, logname='process_runs', file_names=None)
Concatenate recon files of the same data type into a single recon file.
The recon files being concatenated must contain all the same type of data, including variable names and format list lengths.
- Parameters:
- output_file : str
The name of the file in which the concatenated data is saved.
- files : list of (str or io.StringIO)
The files to concatenate into output_file.
- logname : str, default='process_runs'
The name of the log for logging. Defaults to the same log as process_runs().
- Returns:
- None
- Raises:
- IOError
If there is a mismatch between the variable names or recon file type between any two files being concatenated.
See also
PICOcode.ProcessingScripts.merge_files.merge_files
For merging several recon files of separate types into one file.
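The header-matching behavior described above can be sketched stand-alone. This is a hypothetical simplification, not the real concat_files(): it only compares a single header line, whereas the real function also checks recon file types and format list lengths.

```python
import io

def concat_sketch(files):
    """Concatenate file-like objects that must share an identical header line."""
    header = None
    body = []
    for f in files:
        lines = f.read().splitlines()
        if header is None:
            header = lines[0]
        elif lines[0] != header:
            # Mirrors the IOError raised on a variable-name mismatch
            raise IOError("variable name mismatch between files")
        body.extend(lines[1:])
    return "\n".join([header] + body)

a = io.StringIO("x y z\n1 2 3")
b = io.StringIO("x y z\n4 5 6")
print(concat_sketch([a, b]))  # → x y z / 1 2 3 / 4 5 6
```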
PICOcode.ProcessingScripts.git_updater module
- class GitUpdater(path='', branch='')
Bases:
object
- archive(path)
Wrapper around Repo.archive
- check_commits()
Check if remote or local branch is more up to date.
- Returns:
- int : (-1, 0, 1)
0 if they are equal, -1 if local is newer, 1 if remote is newer.
- fetch()
- get_commit_hash()
- pull()
Perform a git pull.
- switch_branch(branch)
Check that the requested branch exists, then switch to it.
- Parameters:
- branch : str
Name of the branch to switch the repository to.
- Returns:
- None
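The check_commits() return convention above can be illustrated with a small stand-alone sketch. The ahead/behind commit counts are assumed inputs; the real method takes no arguments and inspects the repository directly.

```python
def compare_commits(local_ahead, remote_ahead):
    """Return 0 if branches are equal, -1 if local is newer, 1 if remote is newer.

    local_ahead / remote_ahead count the commits each side has that the
    other lacks (hypothetical inputs for illustration only).
    """
    if local_ahead > 0 and remote_ahead == 0:
        return -1
    if remote_ahead > 0 and local_ahead == 0:
        return 1
    return 0

print(compare_commits(2, 0))  # → -1 (local is newer)
```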
PICOcode.ProcessingScripts.merge_files module
- all_match(alist, indices)
Returns the first two elements of alist which match.
Used to determine which lines to merge in merge_files().
- Parameters:
- alist : list of str or int
A list containing data to match.
- indices : list of int
Indices of alist.
- Returns:
- None
If all entries in alist are empty strings.
- str or int
The element of alist which was found to match.
- -9
The error code for no matching elements.
Notes
I can’t remember what this actually does, I’ll figure this out and update this later. -Colin
- merge_files(output_file, input_files, data_series='40l-19', logname='', num_event_dirs=-1)
Merge several recon files of differing data types into one recon file.
- Parameters:
- output_file : str
Full path to the location to save the output recon file.
- input_files : list of str
Full paths to the recon files to merge.
- num_event_dirs : int, default=-1
Number of directories in the run folder. Must be specified for zipped run files.
- logname : str, default=""
Name of the log. If an empty string (default), skip logging.
- Returns:
- None
See also
PICOcode.ProcessingScripts.concat_files.concat_files
For concatenating several recon files of the same data types into one file.
Notes
Different recon types (e.g. types 1 and 2, for one line per event and one line per bubble, respectively) may be merged if compatible. Others may not; e.g. recon type 8 (one line per camera hit, produced by AutoBub) is incompatible with types 1 and 2.
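A merge of compatible per-event recon types can be pictured as a join on shared run/event keys. This is an illustration only: the column names ("run", "ev") and the join logic are assumptions, not the real merge_files() implementation.

```python
def merge_on_event(rows_a, rows_b):
    """Join two lists of per-event records on their (run, ev) keys."""
    index = {(r["run"], r["ev"]): r for r in rows_b}
    merged = []
    for row in rows_a:
        extra = index.get((row["run"], row["ev"]), {})
        combined = dict(row)
        # Copy over the second file's columns, keeping the keys only once
        combined.update({k: v for k, v in extra.items() if k not in ("run", "ev")})
        merged.append(combined)
    return merged

a = [{"run": "20200721_0", "ev": 0, "nbub": 1}]
b = [{"run": "20200721_0", "ev": 0, "acoustic_t0": 0.12}]
print(merge_on_event(a, b))
```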
- num_missing_DAQ_file_lines(input_file)
- num_too_many_DAQ_file_lines(input_file)
PICOcode.ProcessingScripts.process_functions module
- dataset_acoustic_l2(data_dir, output_dir, log_dir, data_series, **kwargs)
- dataset_concatenate_files(data_dir, output_dir, log_dir, data_series, **kwargs)
- dataset_concatenate_files_from_zip(data_dir, output_dir, log_dir, data_series, **kwargs)
- dataset_convert_to_root(data_dir, output_dir, log_dir, data_series, **kwargs)
- dataset_dytran_l2(data_dir, output_dir, log_dir, data_series, **kwargs)
- dataset_full_merge(data_dir, output_dir, log_dir, data_series, **kwargs)
Find the latest merged file and merge it with anything that has not yet been merged.
- dataset_handscanning(data_dir, output_dir, log_dir, data_series, **kwargs)
- dataset_pickle(data_dir, output_dir, log_dir, data_series, **kwargs)
- dataset_processor(processor, data_dir, output_dir, log_dir, data_series, **kwargs)
- dataset_prune_merge(data_dir, output_dir, log_dir, data_series, **kwargs)
- dataset_run_summary(data_dir, output_dir, log_dir, data_series, **kwargs)
- dataset_scan_missing_files(data_dir, output_dir, log_dir, data_series, **kwargs)
- dataset_submit_grid_jobs(data_dir, output_dir, log_dir, data_series, **kwargs)
- dataset_wait_for_grid_jobs(data_dir, output_dir, log_dir, data_series, **kwargs)
- dataset_write_constants(data_dir, output_dir, log_dir, data_series, **kwargs)
- dataset_zip_and_clean(data_dir, output_dir, log_dir, data_series, **kwargs)
Zip output directories for which processing has completed. This reduces the total number of files.
- Returns:
- None
PICOcode.ProcessingScripts.process_runs module
PICOcode.ProcessingScripts.process_single_run module
- check_processing(run, outfile_list, data_dir, output_dir, log_dir)
Check the processing output for completion following process_single_run().
Check the standard output from each of the processing modules. Results are written to the run directory, in a file called “processing_successful” if all checks pass, or “processing_fail” if one or more checks fail.
- Parameters:
- run : str
The run which has been processed.
- outfile_list : list of str
The list of files which should exist in output_dir.
- data_dir, output_dir, log_dir : str
Data, output, and logging directories.
- Returns:
- str
Contains a description of what went wrong.
- listener(queue, outfile, run)
Listens for messages on a queue, and writes a recon file.
Each function in AnalysisModules terminates after passing the processed data for one event to the queue. This data is collected here, and written to a new recon file once all events in a run are complete.
- Parameters:
- queue : multiprocessing.Manager.Queue
An instance of Queue from the multiprocessing.Manager class.
- outfile : str
The name of the recon file which will be written.
- run : str
The run being processed. Only used for logging purposes.
- Returns:
- None
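The queue-and-listener pattern described above can be demonstrated stand-alone. For portability this sketch uses a thread and queue.Queue in place of the real multiprocessing pool and Manager queue, and the None sentinel for shutdown is an assumption.

```python
import queue
import threading

def listener_sketch(q, results):
    """Drain processed-event items from q until a None sentinel arrives."""
    while True:
        item = q.get()
        if item is None:  # sentinel: every event has been processed
            break
        results.append(item)

q = queue.Queue()
results = []
t = threading.Thread(target=listener_sketch, args=(q, results))
t.start()
for ev in range(3):   # stand-in for workers posting one event each
    q.put({"event": ev})
q.put(None)           # signal completion
t.join()
print(sorted(r["event"] for r in results))  # → [0, 1, 2]
```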
- process_single_event(run, event, data_dir, output_dir, log_dir, data_series, processor_list, event_is_valid)
Process a single event.
This function is called using a multiprocessing pool from process_single_run().
- Parameters:
- run : str
The run name. Mainly used at this point for logging, and preparing the data to write to the recon file.
- event : Event
An instance of Event. Load options for this instance depend on the elements of processor_list.
- data_dir, output_dir, log_dir : str
Data, output, and logging directories. See process_single_run() for a description.
- data_series : {“49l-19-data”, “30l-16-data”, “2l-16-data”}
The data series.
- processor_list : list of EventProcessor
A list of processing tasks to apply to this event.
- event_is_valid : array_like of bool
Flag for whether to actually try to load an event. Some events are corrupt and hang when loading the Event() class, so they are skipped.
- Returns:
- proc : ProcessedEvent
A populated instance of ProcessedEvent.
- process_single_run(run, data_dir, output_dir, log_dir, lock_file, data_series, *args, archive_dir='', **kwargs)
Process a single run for PICO-40L.
Which processing functions are called depends on *args.
- Parameters:
- run : str
Which run to process.
- data_dir : str
Where the unprocessed data is stored. Actual data is in data_dir/run.
- output_dir : str
Where the processed data is saved. Output is written to output_dir/run.
- log_dir : str
Where to write the log files. Logs are saved to log_dir/run.
- lock_file : str
A semaphore which indicates whether processing is still running. This file is deleted at the end of this function.
- data_series : str
Name of data series. E.g. 40l-19
- *args
The arguments passed from process_runs(). These arguments are formatted as though they are command line arguments, and are parsed by parse_args.
- archive_dir : str, default=””
If the run data is archived, extract the archive from here to data_dir.
Notes
Generally, each event in a run is processed in parallel. Some processing can only be applied to an entire run at a time (e.g. C++ programs called via subprocess, such as AutoBub3hs and XYZFixL2). Parallel processing is achieved using the multiprocessing module.
- run_handscan(processor, data_dir, output_dir, log_dir, data_series, run, event)
OBSOLETE
- run_processor(processor, data_dir, output_dir, log_dir, data_series, run, event)
Master function to call each processing function.
OBSOLETE
PICOcode.ProcessingScripts.process_single_run_rewrite module
- class ProcessConfig(pArgs=None)
Bases:
SimpleNamespace
Store the contents of the config
- check_valid_config()
- class ProcessSingleRun(config)
Bases:
object
Process a single run by initiating processing of each individual event.
- Parameters:
- config : ProcessConfig
Instance of the
ProcessConfig
class, with all required attributes set (run, ev, data_dir, etc.).
PICOcode.ProcessingScripts.process_tasks module
The functions used in process_single_run() are defined in this file.
- class EventProcessor(name, module_prefix, func, outfile, event_data_needed, outfile_prefix, kwargs={})
Bases:
object
Wrapper class for processing a single event.
Wrapper class for storing the information with which functions in the AnalysisModules package are called, for single event processing.
- Parameters:
- name : str
Name of the processing task. Only used in logging.
- module_prefix : str
The beginning of the file name of the module in AnalysisModules. The data series is appended in process_single_run() to get the full module name, from which func is imported.
- func : str
The name of the function imported from the module defined by module_prefix. Imported using importlib.
- outfile : str
Unused. Remnant from COUPPcode.
- event_data_needed : list of str
The data required for completing the processing task. This is passed to Event.
- outfile_prefix : str
The name of the recon file which is produced after the run is processed.
- kwargs : dict
A dict containing any other data required for processing.
- Attributes:
- name, module_prefix, func, outfile, event_data_needed, outfile_prefix, kwargs : see Parameters
- queue : {None, multiprocessing.Manager.Queue}
The Queue instance used to send data to listener().
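The dynamic-import mechanism described for module_prefix and func can be sketched with importlib. The exact name-joining scheme used by the real code is an assumption here; the demonstration uses a standard-library module so that it runs anywhere.

```python
import importlib

def load_processor_func(module_prefix, data_series, func_name):
    """Build the full module name, import it, and look up the function by name."""
    module = importlib.import_module(f"{module_prefix}{data_series}")
    return getattr(module, func_name)

# Demonstration with a standard-library module ("js" + "on" -> "json"):
dumps = load_processor_func("js", "on", "dumps")
print(dumps({"ok": True}))  # → {"ok": true}
```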
- class RunProcessor(name, module_prefix, func, outfile_prefix)
Bases:
object
Wrapper class for processing an entire run.
Wrapper class for storing the information with which functions in the AnalysisModules package are called, for processing which is applied to an entire run at a time. Examples are AutoBub3hs and XYZFixL2, C++ programs that process an entire run at a time and are called using subprocess.
- Parameters:
- name : str
Name of the processing task. Only used in logging.
- module_prefix : str
The beginning of the file name of the module in AnalysisModules. The data series is appended in process_single_run() to get the full module name, from which func is imported.
- func : str
The name of the function imported from the module defined by module_prefix. Imported using importlib.
- outfile_prefix : str
The name of the recon file which is produced after the run is processed.
- Attributes:
- name, module_prefix, func, outfile_prefix : see Parameters
PICOcode.ProcessingScripts.process_utilities module
- build_or_rebuild_code(source_dir, build_dir)
Compile C++ code from source_dir in build_dir. The build_dir is created if it does not exist, but it will not be cleaned beforehand.
- Returns:
- None
- Raises:
- subprocess.CalledProcessError
If either of the build steps (cmake, make) fails.
- check_dir(directory)
Corresponds to get_dir_arg from COUPP.pm
- check_tar_and_extract(archive_dir, data_dir, tarfile_name, logname=None)
Checks if the contents of the given tar file exist in the data directory.
If the data does not exist, then extract it to data_dir.
- Parameters:
- archive_dir : str
Location of the run archives.
- data_dir : str
Where the extracted data should (or will) exist.
- tarfile_name : str
The base name of the archive to extract.
- logname : None or str
The name of the log to use. If logname = None, skip logging.
- daq_version(datadir, run)
- get_data_series(datadir)
Corresponds to DataSeries from COUPP.pm
- setup_logger(name, log_file, level=10, formatter=None, mode='w')
Make a call to logging.getLogger, and set up standard formatting.
The logger is set up to write to a log file. One call to this function is made by each of process_runs() and process_single_run(). Loggers may be inherited from these during processing by calling logging.getLogger with the same name as used to initialize this logger, with .newname appended. See Examples below.
- Parameters:
- name : str
Name of the log, passed to logging.getLogger.
- log_file : str
Full path to the file in which to log.
- level : int or logging level, default=10
The logging level to use. Messages with logging level less than this value are ignored.
- formatter : None or str, default=None
How the message is formatted. If formatter = None (default), use a predefined format.
- mode : {‘w’, ‘a’}, default=’w’
Overwrite (‘w’) any existing log which may exist at log_file, or append to it (‘a’) if one exists.
See also
logging
For information on formatting, log levels, etc.
Examples
>>> run_level_log = setup_logger("20200721_0", "./test.log")  # example call from process_single_run
>>> run_level_log.info("Starting processing on 20200721_0")  # writes this message to "test.log"
>>> event_level_log = logging.getLogger("20200721_0.0")  # inherited log
- submit_slurm_job(run, scratch_dir, output_dir, log_dir, lock_file, data_series, *args, archive_dir='', picocode_tarball='', config_path='')
Submit a slurm job to process a run.
Creates a temporary file (using Python's tempfile) and writes a Slurm job script to it, which runs process_single_run() as its own job. Called by process_runs().
- Parameters:
- run : str
The run ID.
- scratch_dir : str
The location where the run data is stored.
- output_dir : str
Where to save the processed data.
- log_dir : str
Where to log info about processing.
- lock_file : str
Semaphore for processing.
- data_series : str
Name of data series. E.g. 40l-19
- *args
Processing tasks.
- Returns:
- str or None
The Slurm job ID associated with the running job. If the job ID cannot be determined, returns None. This indicates an issue with job submission, in which case this processing task will be resubmitted.
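The job-ID handling described in Returns can be sketched as parsing sbatch's stdout, which normally contains a line like "Submitted batch job 12345". This is a hypothetical helper, not the real function's code; returning None on a parse failure matches the documented resubmission behaviour.

```python
import re

def parse_slurm_job_id(sbatch_stdout):
    """Return the Slurm job ID from sbatch output, or None if it can't be found."""
    match = re.search(r"Submitted batch job (\d+)", sbatch_stdout)
    return match.group(1) if match else None

print(parse_slurm_job_id("Submitted batch job 12345"))  # → 12345
```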
- write_banner(text)
Formatted log output