PICOcode.ProcessingScripts package

Submodules

PICOcode.ProcessingScripts.concat_files module

concat_files(output_file, files, logname='process_runs', file_names=None)

Concatenate recon files of the same data type into a single recon file.

The recon files being concatenated must contain all the same type of data, including variable names and format list lengths.

Parameters:
output_file : str

The name of the file in which the concatenated data is saved.

files : list of (str or io.StringIO)

The files to concatenate into output_file.

logname : str, default='process_runs'

The name of the log for logging. Defaults to the same log as process_runs().

Returns:
None
Raises:
IOError

If there is a mismatch between the variable names or recon file type between any two files being concatenated.

See also

PICOcode.ProcessingScripts.merge_files.merge_files

For merging several recon files of separate types into one file.
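The header check described above (matching variable names and format lists between files) can be sketched as follows. This is an illustrative reconstruction, not the actual implementation: the function name `concat_recon_files` and the assumption that a recon file begins with a fixed number of header lines are both hypothetical.

```python
import io

def concat_recon_files(output_file, files, header_lines=6):
    """Concatenate recon-style text files whose headers must match.

    Hypothetical sketch: `header_lines` is a guess at the size of the
    recon header (variable names, format lists); the real format may
    differ.  Raises IOError on a header mismatch, as concat_files does.
    """
    header = None
    body = []
    for f in files:
        fh = f if isinstance(f, io.StringIO) else open(f)
        lines = fh.read().splitlines(keepends=True)
        this_header = lines[:header_lines]
        if header is None:
            header = this_header
        elif this_header != header:
            raise IOError("variable name or recon type mismatch between files")
        body.extend(lines[header_lines:])
        if fh is not f:
            fh.close()
    with open(output_file, "w") as out:
        out.writelines(header + body)
```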

PICOcode.ProcessingScripts.git_updater module

class GitUpdater(path='', branch='')

Bases: object

archive(path)

Wrapper around Repo.archive

check_commits()

Check if remote or local branch is more up to date.

Returns:
int: (-1, 0, 1)

0 if they are equal, -1 (1) if local (remote) is newer

fetch()
get_commit_hash()
pull()

Perform a git pull.

switch_branch(branch)

Check that the requested branch exists, then switch to it.

Parameters:
branch : str

Name of the branch to switch the repository to.

Returns:
None
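The -1/0/1 convention of check_commits() can be illustrated with a small helper. This is a hypothetical sketch, not the actual implementation: it assumes the ahead/behind commit counts have already been obtained (e.g. from `git rev-list --left-right --count`), and the name `compare_commits` is invented here.

```python
def compare_commits(ahead, behind):
    """Map ahead/behind commit counts to check_commits()-style codes.

    ahead: commits on the local branch that are not on the remote;
    behind: commits on the remote that are not on the local branch.
    Returns 0 if the branches are equal, -1 if local is newer,
    1 if remote is newer.  Illustrative only.
    """
    if ahead == 0 and behind == 0:
        return 0
    return -1 if ahead >= behind else 1
```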

PICOcode.ProcessingScripts.merge_files module

all_match(alist, indices)

Returns the value shared by the first two matching elements of alist.

Used to determine which lines to merge in merge_files().

Parameters:
alist : list of str or int

A list containing data to match.

indices : list of int

Indices of alist.

Returns:
None

If all entries in alist are empty strings.

str or int

The element of alist which was found to match.

-9

The error code for no matching elements.

Notes

I can’t remember what this actually does, I’ll figure this out and update this later. -Colin
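Since the exact behaviour is admittedly uncertain (see the note above), the following is only a guess at a reconstruction, built to match the documented return values (a matching value, None for all-empty input, -9 for no match); the name `all_match_sketch` is hypothetical.

```python
def all_match_sketch(alist, indices):
    """Hypothetical reconstruction of all_match().

    Scan the entries of alist selected by indices; return the value of
    the first two entries that match, None if all entries are empty
    strings, or the error code -9 if no two entries match.
    """
    values = [alist[i] for i in indices]
    if all(v == "" for v in values):
        return None
    for i, v in enumerate(values):
        if v == "":
            continue
        if v in values[i + 1:]:
            return v
    return -9  # error code: no matching elements
```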

merge_files(output_file, input_files, data_series='40l-19', logname='', num_event_dirs=-1)

Merge several recon files of differing data types into one recon file.

Parameters:
output_file : str

Full path to the location to save the output recon file.

input_files : list of str

Full paths to the recon files to merge.

num_event_dirs : int

Number of directories in the run folder. Must be specified for zipped run files.

logname : str, default=''

Name of the log. If an empty string (default), skip logging.

Returns:
None

See also

PICOcode.ProcessingScripts.concat_files.concat_files

For concatenating several recon files of the same data types into one file.

Notes

Different recon types (e.g. types 1 and 2, for one line per event and one line per bubble, respectively) may be merged if compatible. Others may not: recon type 8 (one line per camera hit, produced by AutoBub) is incompatible with types 1 and 2.
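The core idea of merging differing data types for the same events can be sketched with per-event dicts keyed by (run, event). This is an illustrative simplification only: real recon files carry headers and format lists, and the key names `run`/`ev` are assumptions.

```python
def merge_event_rows(inputs):
    """Merge lists of per-event dicts from several sources into one
    row per (run, ev) key, combining their columns.  Hypothetical
    illustration of the merge idea, not the real file format.
    """
    merged = {}
    for rows in inputs:
        for row in rows:
            key = (row["run"], row["ev"])
            merged.setdefault(key, {}).update(row)
    return [merged[k] for k in sorted(merged)]
```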

num_missing_DAQ_file_lines(input_file)
num_too_many_DAQ_file_lines(input_file)

PICOcode.ProcessingScripts.process_functions module

dataset_acoustic_l2(data_dir, output_dir, log_dir, data_series, **kwargs)
dataset_concatenate_files(data_dir, output_dir, log_dir, data_series, **kwargs)
dataset_concatenate_files_from_zip(data_dir, output_dir, log_dir, data_series, **kwargs)
dataset_convert_to_root(data_dir, output_dir, log_dir, data_series, **kwargs)
dataset_dytran_l2(data_dir, output_dir, log_dir, data_series, **kwargs)
dataset_full_merge(data_dir, output_dir, log_dir, data_series, **kwargs)

Find the latest merged file and merge it with anything that has not yet been merged.

dataset_handscanning(data_dir, output_dir, log_dir, data_series, **kwargs)
dataset_pickle(data_dir, output_dir, log_dir, data_series, **kwargs)
dataset_processor(processor, data_dir, output_dir, log_dir, data_series, **kwargs)
dataset_prune_merge(data_dir, output_dir, log_dir, data_series, **kwargs)
dataset_run_summary(data_dir, output_dir, log_dir, data_series, **kwargs)
dataset_scan_missing_files(data_dir, output_dir, log_dir, data_series, **kwargs)
dataset_submit_grid_jobs(data_dir, output_dir, log_dir, data_series, **kwargs)
dataset_wait_for_grid_jobs(data_dir, output_dir, log_dir, data_series, **kwargs)
dataset_write_constants(data_dir, output_dir, log_dir, data_series, **kwargs)
dataset_zip_and_clean(data_dir, output_dir, log_dir, data_series, **kwargs)

Zip output directories for which processing has completed. This reduces the total number of files.

Returns:
None
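All of the dataset_* functions above share the same signature, so they can be dispatched uniformly; a minimal sketch of such a dispatcher (the name `run_dataset_steps` is hypothetical):

```python
def run_dataset_steps(steps, data_dir, output_dir, log_dir, data_series, **kwargs):
    """Call each dataset-level processing step with the shared
    (data_dir, output_dir, log_dir, data_series, **kwargs) signature
    used throughout this module.  Illustrative only.
    """
    for step in steps:
        step(data_dir, output_dir, log_dir, data_series, **kwargs)
```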

PICOcode.ProcessingScripts.process_runs module

class DatasetProcessor(name, func, time)

Bases: object

class ProcessRuns(args=[])

Bases: object

PICOcode.ProcessingScripts.process_single_run module

class ProcessedEvent(event_obj)

Bases: object

check_processing(run, outfile_list, data_dir, output_dir, log_dir)

Check the processing output for completion following process_single_run().

Check the standard output from each of the processing modules. Results are written to the run directory, in a file called “processing_successful” if all checks pass, or “processing_fail” if one or more checks fail.

Parameters:
run : str

The run which has been processed.

outfile_list : list of str

The list of files which should exist in output_dir.

data_dir, output_dir, log_dir : str

Data, output, and logging directories.

Returns:
str

Contains a description of what went wrong.

listener(queue, outfile, run)

Listens for messages on a queue, and writes a recon file.

Each function in AnalysisModules terminates after passing the processed data for one event to the queue. This data is collected here, and written to a new recon file once all events in a run are complete.

Parameters:
queue : multiprocessing.Manager.Queue

An instance of Queue from the multiprocessing.Manager class.

outfile : str

The name of the recon file which will be written.

run : str

The run being processed. Only used for logging purposes.

Returns:
None
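The listener pattern described above (drain a queue of processed events, then write them out) can be sketched as follows. This is a hypothetical simplification: the real code receives a multiprocessing.Manager().Queue() and writes a structured recon file, while this sketch accepts any queue-like object and writes one line per message, stopping at a None sentinel.

```python
def listener_sketch(queue, outfile, run, sentinel=None):
    """Drain processed-event messages from `queue` and write them to a
    recon-style text file, stopping at the sentinel.  Illustrative
    sketch; `run` is only for logging in the real code.
    """
    with open(outfile, "w") as out:
        while True:
            msg = queue.get()
            if msg is sentinel:  # all events for this run are done
                break
            out.write(str(msg) + "\n")
```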
process_single_event(run, event, data_dir, output_dir, log_dir, data_series, processor_list, event_is_valid)

Process a single event.

This function is called using a multiprocessing pool from process_single_run().

Parameters:
run : str

The run name. Mainly used at this point for logging, and preparing the data to write to the recon file.

event : Event

An instance of Event. Load options for this instance depends on the elements of processor_list.

data_dir, output_dir, log_dir : str

Data, output, and logging directories. See process_single_run() for a description.

data_series : {"40l-19-data", "30l-16-data", "2l-16-data"}

The data series.

processor_list : list of EventProcessor

A list of processing tasks to apply to this event.

event_is_valid : array_like of bool

Flag for whether to actually try to load an event. Some events are corrupt and hang when loading the Event() class, so skip them.

Returns:
proc : ProcessedEvent

A populated instance of ProcessedEvent.

process_single_run(run, data_dir, output_dir, log_dir, lock_file, data_series, *args, archive_dir='', **kwargs)

Process a single run for PICO-40L.

Which processing functions are called depends on *args.

Parameters:
run : str

Which run to process.

data_dir : str

Where the unprocessed data is stored. Actual data is in data_dir/run.

output_dir : str

Where the processed data is saved. Output is written to output_dir/run.

log_dir : str

Where to write the log files. Logs are saved to log_dir/run.

lock_file : str

A semaphore which indicates whether processing is still running. This file is deleted at the end of this function.

data_series : str

Name of the data series, e.g. '40l-19'.

*args

The arguments passed from process_runs(). These arguments are formatted as though they are command line arguments, and are parsed by parse_args.

archive_dir : str, default=''

If the run data is archived, extract the archive from here to data_dir.

Notes

Generally, each event in a run is processed in parallel. Some processing can only be applied to an entire run (e.g. if the processing is in C++ and is called using subprocessing, such as AutoBub3hs and XYZFixL2). Parallel processing is achieved using the multiprocessing module.
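The per-event parallelism described above can be sketched with a pool mapping a worker over the events of a run. The names here are illustrative stand-ins, and a thread pool is used so the sketch stays portable; the real code uses process-based pools from the multiprocessing module.

```python
from multiprocessing.pool import ThreadPool

def process_event_stub(event):
    # Stand-in for process_single_event(); returns something checkable.
    return event * event

def process_run_sketch(events, workers=2):
    """Map a per-event worker over all events of a run in parallel,
    mirroring the pool pattern used by process_single_run().
    Illustrative sketch only.
    """
    with ThreadPool(workers) as pool:
        return pool.map(process_event_stub, events)
```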

run_handscan(processor, data_dir, output_dir, log_dir, data_series, run, event)

OBSOLETE

run_processor(processor, data_dir, output_dir, log_dir, data_series, run, event)

Master function to call each processing function.

OBSOLETE

PICOcode.ProcessingScripts.process_single_run_rewrite module

class ProcessConfig(pArgs=None)

Bases: SimpleNamespace

Store the contents of the config

check_valid_config()
class ProcessSingleEvent(run, event)

Bases: object

class ProcessSingleRun(config)

Bases: object

Process a single run by initiating processing of each individual event.

Parameters:
config : ProcessConfig

Instance of the ProcessConfig class, with all required attributes set (run, ev, data_dir, etc.).

PICOcode.ProcessingScripts.process_tasks module

The functions used in process_single_run() are defined in this file.

class EventProcessor(name, module_prefix, func, outfile, event_data_needed, outfile_prefix, kwargs={})

Bases: object

Wrapper class for processing a single event.

Wrapper class for storing the information with which functions in the AnalysisModules package are called, for single event processing.

Parameters:
name : str

Name of the processing task. Only used in logging.

module_prefix : str

The beginning of the file name of the module in AnalysisModules. The data series is appended in process_single_run() to get the full module name, from which func is imported.

func : str

The name of the function imported from the module defined by module_prefix. Imported using importlib.

outfile : str

Unused. Remnant from COUPPcode.

event_data_needed : list of str

The data required for completing the processing task. This is passed to Event.

outfile_prefix : str

The name of the recon file which is produced after the run is processed.

kwargs : dict

A dict containing any other data required for processing.

Attributes:
name, module_prefix, func, outfile, event_data_needed, outfile_prefix, kwargs : see Parameters
queue : {None, multiprocessing.Manager.Queue}

The Queue instance used to send data to listener().
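The dynamic import described for module_prefix and func can be sketched with importlib. This is an illustrative helper (the name `load_processor_func` is invented), and the test below uses a stdlib module as a stand-in for AnalysisModules.

```python
import importlib

def load_processor_func(module_prefix, data_series, func_name):
    """Import `func_name` from the module named
    module_prefix + data_series, as the EventProcessor documentation
    describes.  Illustrative sketch.
    """
    module = importlib.import_module(module_prefix + data_series)
    return getattr(module, func_name)
```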

class RunProcessor(name, module_prefix, func, outfile_prefix)

Bases: object

Wrapper class for processing an entire run.

Wrapper class for storing the information with which functions in the AnalysisModules package are called, for processing which is applied to an entire run at a time. Examples are AutoBub3hs and XYZFixL2, C++ programs that process an entire run at a time and are called using subprocess.

Parameters:
name : str

Name of the processing task. Only used in logging.

module_prefix : str

The beginning of the file name of the module in AnalysisModules. The data series is appended in process_single_run() to get the full module name, from which func is imported.

func : str

The name of the function imported from the module defined by module_prefix. Imported using importlib.

outfile_prefix : str

The name of the recon file which is produced after the run is processed.

Attributes:
name, module_prefix, func, outfile_prefix : see Parameters

PICOcode.ProcessingScripts.process_utilities module

build_or_rebuild_code(source_dir, build_dir)

Compile C++ code from source_dir in build_dir. The build_dir is created if it does not exist, but it will not be cleaned beforehand.

Returns:
None
Raises:
subprocess.CalledProcessError

If either of the build steps (cmake, make) fails.
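The cmake/make build described above can be sketched as below. This is not the real implementation: the command lists are made overridable purely so the logic can be exercised without cmake installed, and the name `build_or_rebuild_sketch` is hypothetical.

```python
import os
import subprocess

def build_or_rebuild_sketch(source_dir, build_dir,
                            configure_cmd=None, build_cmd=None):
    """Out-of-source build sketch: create build_dir if needed (never
    cleaned beforehand), then run the configure and build steps.
    check=True makes subprocess raise CalledProcessError on failure.
    """
    os.makedirs(build_dir, exist_ok=True)
    configure_cmd = configure_cmd or ["cmake", source_dir]
    build_cmd = build_cmd or ["make"]
    subprocess.run(configure_cmd, cwd=build_dir, check=True)
    subprocess.run(build_cmd, cwd=build_dir, check=True)
```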

check_dir(directory)

Corresponds to get_dir_arg from COUPP.pm

check_tar_and_extract(archive_dir, data_dir, tarfile_name, logname=None)

Checks if the contents of the given tar file exist in the data directory.

If the data does not exist, then extract it to data_dir.

Parameters:
archive_dir : str

Location of the run archives.

data_dir : str

Where the extracted data should (or will) exist.

tarfile_name : str

The base name of the archive to extract.

logname : None or str

The name of the log to use. If logname = None, skip logging.
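The check-then-extract behaviour can be sketched with the stdlib tarfile module. This is a guess at the logic (the actual existence check may differ), with logging omitted for brevity.

```python
import os
import tarfile

def check_tar_and_extract_sketch(archive_dir, data_dir, tarfile_name):
    """Extract archive_dir/tarfile_name into data_dir unless a
    directory with the archive's base name already exists there.
    Hypothetical sketch of the documented behaviour.
    """
    base = tarfile_name
    for ext in (".tar.gz", ".tgz", ".tar"):
        if base.endswith(ext):
            base = base[: -len(ext)]
            break
    if os.path.isdir(os.path.join(data_dir, base)):
        return  # data already extracted
    with tarfile.open(os.path.join(archive_dir, tarfile_name)) as tar:
        tar.extractall(data_dir)
```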

daq_version(datadir, run)
get_data_series(datadir)

Corresponds to DataSeries from COUPP.pm

setup_logger(name, log_file, level=10, formatter=None, mode='w')

Make a call to logging.getLogger, and set up standard formatting.

The logger is set up to write to a log file. One call to this function is made by process_runs() and process_single_run() each. Loggers may be inherited from these during processing by calling logging.getLogger, with the same name as used to initialize this logger, with .newname appended. See Examples below.

Parameters:
name : str

Name of the log, passed to logging.getLogger.

log_file : str

Full path to the file in which to log.

level : int or logging level

The logging level to use. Messages with logging level less than this value are ignored.

formatter : None or str, default=None

How the message is formatted. If formatter = None (default), use a predefined format.

mode : {'w', 'a'}

Overwrite (‘w’) any existing log which may exist at log_file, or append to it (‘a’) if one exists.

See also

logging

For information on formatting, log levels, etc.

Examples

>>> run_level_log = setup_logger("20200721_0", "./test.log") # example call from process_single_run
>>> run_level_log.info("Starting processing on 20200721_0") # writes this message to "test.log"
>>> event_level_log = logging.getLogger("20200721_0.0") # inherited log
submit_slurm_job(run, scratch_dir, output_dir, log_dir, lock_file, data_series, *args, archive_dir='', picocode_tarball='', config_path='')

Submit a slurm job to process a run.

Creates a temporary file (using Python's tempfile module) and writes a Slurm job script to it, which runs process_single_run() as its own job. Called by process_runs().

Parameters:
run : str

The run ID.

scratch_dir : str

The location where the run data is stored.

output_dir : str

Where to save the processed data.

log_dir : str

Where to log info about processing.

lock_file : str

Semaphore for processing.

data_series : str

Name of the data series, e.g. '40l-19'.

*args

Processing tasks.

Returns:
str or None

The Slurm job ID associated with the running job. If the job ID cannot be determined, returns None. This indicates an issue with job submission, in which case this processing task will be resubmitted.
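Determining the job ID typically means parsing sbatch's standard "Submitted batch job <id>" output line; a small sketch of that step (the function name is hypothetical, and the real implementation may obtain the ID differently):

```python
import re

def parse_sbatch_output(stdout):
    """Extract the job ID from sbatch's 'Submitted batch job <id>'
    line; return None when it cannot be found, signalling that the
    submission should be retried.  Illustrative sketch.
    """
    match = re.search(r"Submitted batch job (\d+)", stdout)
    return match.group(1) if match else None
```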

write_banner(text)

Formatted log output
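A plausible banner formatter, purely for illustration (the actual width, fill character, and layout used by write_banner are not documented here):

```python
def write_banner_sketch(text, width=60, char="="):
    """Return `text` framed between two rules for log output.
    Hypothetical sketch; the real formatting may differ.
    """
    rule = char * width
    return "\n".join([rule, text.center(width), rule])
```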

PICOcode.ProcessingScripts.skip_runs module

PICOcode.ProcessingScripts.skip_runs_40l_19 module

PICOcode.ProcessingScripts.skip_runs_40l_22 module

Module contents