PICOcode.ProcessingScripts package
Submodules
PICOcode.ProcessingScripts.concat_files module
- concat_files(output_file, files, logname='process_runs', file_names=None)
Concatenate recon files of the same data type into a single recon file.
The recon files being concatenated must contain all the same type of data, including variable names and format list lengths.
- Parameters:
- output_file : str
The name of the file in which the concatenated data is saved.
- files : list of (str or io.StringIO)
The files to concatenate into output_file.
- logname : str, default='process_runs'
The name of the log for logging. Defaults to the same log as process_runs().
- Returns:
- None
- Raises:
- IOError
If there is a mismatch between the variable names or recon file type between any two files being concatenated.
See also
PICOcode.ProcessingScripts.merge_files.merge_files
For merging several recon files of separate types into one file.
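The header-matching behavior described above can be sketched stand-alone. This is a hypothetical simplification, not the real concat_files(): it only compares a single header line, whereas the real function also checks recon file types and format list lengths.

```python
import io

def concat_sketch(files):
    """Concatenate file-like objects that must share an identical header line."""
    header = None
    body = []
    for f in files:
        lines = f.read().splitlines()
        if header is None:
            header = lines[0]
        elif lines[0] != header:
            # Mirrors the IOError raised on a variable-name mismatch
            raise IOError("variable name mismatch between files")
        body.extend(lines[1:])
    return "\n".join([header] + body)

a = io.StringIO("x y z\n1 2 3")
b = io.StringIO("x y z\n4 5 6")
print(concat_sketch([a, b]))  # → x y z / 1 2 3 / 4 5 6
```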
PICOcode.ProcessingScripts.git_updater module
- class GitUpdater(path='', branch='')
Bases:
object
- archive(path)
Wrapper around Repo.archive
- check_commits()
Check if remote or local branch is more up to date.
- Returns:
- int : (-1, 0, 1)
0 if they are equal, -1 if local is newer, 1 if remote is newer.
- fetch()
- get_commit_hash()
- pull()
Perform a git pull.
- switch_branch(branch)
Check that the requested branch exists, then switch to it.
- Parameters:
- branch : str
Name of the branch to switch the repository to.
- Returns:
- None
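The check_commits() return convention above can be illustrated with a small stand-alone sketch. The ahead/behind commit counts are assumed inputs; the real method takes no arguments and inspects the repository directly.

```python
def compare_commits(local_ahead, remote_ahead):
    """Return 0 if branches are equal, -1 if local is newer, 1 if remote is newer.

    local_ahead / remote_ahead count the commits each side has that the
    other lacks (hypothetical inputs for illustration only).
    """
    if local_ahead > 0 and remote_ahead == 0:
        return -1
    if remote_ahead > 0 and local_ahead == 0:
        return 1
    return 0

print(compare_commits(2, 0))  # → -1 (local is newer)
```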
PICOcode.ProcessingScripts.merge_files module
- all_match(alist, indices)
Returns the first two elements of alist which match.
Used to determine which lines to merge in merge_files().
- Parameters:
- alist : list of str or int
A list containing data to match.
- indices : list of int
Indices of alist.
- Returns:
- None
If all entries in alist are empty strings.
- str or int
The element of alist which was found to match.
- -9
The error code for no matching elements.
Notes
I can’t remember what this actually does, I’ll figure this out and update this later. -Colin
- merge_files(output_file, input_files, data_series='40l-19', logname='', num_event_dirs=-1)
Merge several recon files of differing data types into one recon file.
- Parameters:
- output_file : str
Full path to the location to save the output recon file.
- input_files : list of str
Full paths to the recon files to merge.
- num_event_dirs : int, default=-1
Number of directories in the run folder. Must be specified for zipped run files.
- logname : str, default=""
Name of the log. If an empty string (default), skip logging.
- Returns:
- None
See also
PICOcode.ProcessingScripts.concat_files.concat_files
For concatenating several recon files of the same data types into one file.
Notes
Different recon types (e.g. types 1 and 2, for one line per event and one line per bubble, respectively) may be merged if compatible. Others may not; e.g. recon type 8 (one line per camera hit, produced by AutoBub) is incompatible with types 1 and 2.
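A merge of compatible per-event recon types can be pictured as a join on shared run/event keys. This is an illustration only: the column names ("run", "ev") and the join logic are assumptions, not the real merge_files() implementation.

```python
def merge_on_event(rows_a, rows_b):
    """Join two lists of per-event records on their (run, ev) keys."""
    index = {(r["run"], r["ev"]): r for r in rows_b}
    merged = []
    for row in rows_a:
        extra = index.get((row["run"], row["ev"]), {})
        combined = dict(row)
        # Copy over the second file's columns, keeping the keys only once
        combined.update({k: v for k, v in extra.items() if k not in ("run", "ev")})
        merged.append(combined)
    return merged

a = [{"run": "20200721_0", "ev": 0, "nbub": 1}]
b = [{"run": "20200721_0", "ev": 0, "acoustic_t0": 0.12}]
print(merge_on_event(a, b))
```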
- num_missing_DAQ_file_lines(input_file)
- num_too_many_DAQ_file_lines(input_file)
PICOcode.ProcessingScripts.process_functions module
- dataset_acoustic_l2(data_dir, output_dir, log_dir, data_series, **kwargs)
- dataset_concatenate_files(data_dir, output_dir, log_dir, data_series, **kwargs)
- dataset_concatenate_files_from_zip(data_dir, output_dir, log_dir, data_series, **kwargs)
- dataset_convert_to_root(data_dir, output_dir, log_dir, data_series, **kwargs)
- dataset_dytran_l2(data_dir, output_dir, log_dir, data_series, **kwargs)
- dataset_full_merge(data_dir, output_dir, log_dir, data_series, **kwargs)
Find the latest merged file and merge it with anything that has not yet been merged.
- dataset_handscanning(data_dir, output_dir, log_dir, data_series, **kwargs)
- dataset_pickle(data_dir, output_dir, log_dir, data_series, **kwargs)
- dataset_processor(processor, data_dir, output_dir, log_dir, data_series, **kwargs)
- dataset_prune_merge(data_dir, output_dir, log_dir, data_series, **kwargs)
- dataset_run_summary(data_dir, output_dir, log_dir, data_series, **kwargs)
- dataset_scan_missing_files(data_dir, output_dir, log_dir, data_series, **kwargs)
- dataset_submit_grid_jobs(data_dir, output_dir, log_dir, data_series, **kwargs)
- dataset_wait_for_grid_jobs(data_dir, output_dir, log_dir, data_series, **kwargs)
- dataset_write_constants(data_dir, output_dir, log_dir, data_series, **kwargs)
- dataset_zip_and_clean(data_dir, output_dir, log_dir, data_series, **kwargs)
Zip output directories for which processing has completed. This reduces the total number of files.
- Returns:
- None
PICOcode.ProcessingScripts.process_runs module
PICOcode.ProcessingScripts.process_single_run module
- check_processing(run, outfile_list, data_dir, output_dir, log_dir)
Check the processing output for completion following process_single_run().
Check the standard output from each of the processing modules. Results are written to the run directory, in a file called “processing_successful” if all checks pass, or “processing_fail” if one or more checks fail.
- Parameters:
- run : str
The run which has been processed.
- outfile_list : list of str
The list of files which should exist in output_dir.
- data_dir, output_dir, log_dir : str
Data, output, and logging directories.
- Returns:
- str
Contains a description of what went wrong.
- listener(queue, outfile, run)
Listens for messages on a queue, and writes a recon file.
Each function in AnalysisModules terminates after passing the processed data for one event to the queue. This data is collected here, and written to a new recon file once all events in a run are complete.
- Parameters:
- queue : multiprocessing.Manager.Queue
An instance of Queue from the multiprocessing.Manager class.
- outfile : str
The name of the recon file which will be written.
- run : str
The run being processed. Only used for logging purposes.
- Returns:
- None
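The queue-and-listener pattern described above can be demonstrated stand-alone. For portability this sketch uses a thread and queue.Queue in place of the real multiprocessing pool and Manager queue, and the None sentinel for shutdown is an assumption.

```python
import queue
import threading

def listener_sketch(q, results):
    """Drain processed-event items from q until a None sentinel arrives."""
    while True:
        item = q.get()
        if item is None:  # sentinel: every event has been processed
            break
        results.append(item)

q = queue.Queue()
results = []
t = threading.Thread(target=listener_sketch, args=(q, results))
t.start()
for ev in range(3):   # stand-in for workers posting one event each
    q.put({"event": ev})
q.put(None)           # signal completion
t.join()
print(sorted(r["event"] for r in results))  # → [0, 1, 2]
```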
- process_single_event(run, event, data_dir, output_dir, log_dir, data_series, processor_list, event_is_valid)
Process a single event.
This function is called using a multiprocessing pool from process_single_run().
- Parameters:
- run : str
The run name. Mainly used at this point for logging, and preparing the data to write to the recon file.
- event : Event
An instance of Event. Load options for this instance depend on the elements of processor_list.
- data_dir, output_dir, log_dir : str
Data, output, and logging directories. See process_single_run() for a description.
- data_series : {“49l-19-data”, “30l-16-data”, “2l-16-data”}
The data series.
- processor_list : list of EventProcessor
A list of processing tasks to apply to this event.
- event_is_valid : array_like of bool
Flag for whether to actually try to load an event. Some events are corrupt and hang when loading the Event() class, so they are skipped.
- Returns:
- proc : ProcessedEvent
A populated instance of ProcessedEvent.
- process_single_run(run, data_dir, output_dir, log_dir, lock_file, data_series, *args, archive_dir='', **kwargs)
Process a single run for PICO-40L.
Which processing functions are called depends on *args.
- Parameters:
- run : str
Which run to process.
- data_dir : str
Where the unprocessed data is stored. Actual data is in data_dir/run.
- output_dir : str
Where the processed data is saved. Output is written to output_dir/run.
- log_dir : str
Where to write the log files. Logs are saved to log_dir/run.
- lock_file : str
A semaphore which indicates whether processing is still running. This file is deleted at the end of this function.
- data_series : str
Name of data series. E.g. 40l-19
- *args
The arguments passed from process_runs(). These arguments are formatted as though they are command line arguments, and are parsed by parse_args.
- archive_dir : str, default=””
If the run data is archived, extract the archive from here to data_dir.
Notes
Generally, each event in a run is processed in parallel. Some processing can only be applied to an entire run at a time (e.g. C++ programs called via subprocess, such as AutoBub3hs and XYZFixL2). Parallel processing is achieved using the multiprocessing module.
- run_handscan(processor, data_dir, output_dir, log_dir, data_series, run, event)
OBSOLETE
- run_processor(processor, data_dir, output_dir, log_dir, data_series, run, event)
Master function to call each processing function.
OBSOLETE
PICOcode.ProcessingScripts.process_single_run_rewrite module
- class ProcessConfig(pArgs=None)
Bases:
SimpleNamespace
Store the contents of the config
- check_valid_config()
- class ProcessSingleRun(config)
Bases:
object
Process a single run by initiating processing of each individual event.
- Parameters:
- config : ProcessConfig
Instance of the
ProcessConfig
class, with all required attributes set (run, ev, data_dir, etc.).
PICOcode.ProcessingScripts.process_tasks module
The functions used in process_single_run() are defined in this file.
- class EventProcessor(name, module_prefix, func, outfile, event_data_needed, outfile_prefix, kwargs={})
Bases:
object
Wrapper class for processing a single event.
Wrapper class for storing the information with which functions in the AnalysisModules package are called, for single event processing.
- Parameters:
- name : str
Name of the processing task. Only used in logging.
- module_prefix : str
The beginning of the file name of the module in AnalysisModules. The data series is appended in process_single_run() to get the full module name, from which func is imported.
- func : str
The name of the function imported from the module defined by module_prefix. Imported using importlib.
- outfile : str
Unused. Remnant from COUPPcode.
- event_data_needed : list of str
The data required for completing the processing task. This is passed to Event.
- outfile_prefix : str
The name of the recon file which is produced after the run is processed.
- kwargs : dict
A dict containing any other data required for processing.
- Attributes:
- name, module_prefix, func, outfile, event_data_needed, outfile_prefix, kwargs : see Parameters
- queue : {None, multiprocessing.Manager.Queue}
The Queue instance used to send data to listener().
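The dynamic-import mechanism described for module_prefix and func can be sketched with importlib. The exact name-joining scheme used by the real code is an assumption here; the demonstration uses a standard-library module so that it runs anywhere.

```python
import importlib

def load_processor_func(module_prefix, data_series, func_name):
    """Build the full module name, import it, and look up the function by name."""
    module = importlib.import_module(f"{module_prefix}{data_series}")
    return getattr(module, func_name)

# Demonstration with a standard-library module ("js" + "on" -> "json"):
dumps = load_processor_func("js", "on", "dumps")
print(dumps({"ok": True}))  # → {"ok": true}
```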
- class RunProcessor(name, module_prefix, func, outfile_prefix)
Bases:
object
Wrapper class for processing an entire run.
Wrapper class for storing the information with which functions in the AnalysisModules package are called, for processing which is applied to an entire run at a time. Examples are AutoBub3hs and XYZFixL2, C++ programs that process an entire run at a time and are called using subprocess.
- Parameters:
- name : str
Name of the processing task. Only used in logging.
- module_prefix : str
The beginning of the file name of the module in AnalysisModules. The data series is appended in process_single_run() to get the full module name, from which func is imported.
- func : str
The name of the function imported from the module defined by module_prefix. Imported using importlib.
- outfile_prefix : str
The name of the recon file which is produced after the run is processed.
- Attributes:
- name, module_prefix, func, outfile_prefix : see Parameters
PICOcode.ProcessingScripts.process_utilities module
- build_or_rebuild_code(source_dir, build_dir)
Compile C++ code from source_dir in build_dir. The build_dir is created if it does not exist, but it will not be cleaned beforehand.
- Returns:
- None
- Raises:
- subprocess.CalledProcessError
If either of the build steps (cmake, make) fails.
- check_dir(directory)
Corresponds to get_dir_arg from COUPP.pm
- check_tar_and_extract(archive_dir, data_dir, tarfile_name, logname=None)
Checks if the contents of the given tar file exist in the data directory.
If the data does not exist, then extract it to data_dir.
- Parameters:
- archive_dir : str
Location of the run archives.
- data_dir : str
Where the extracted data should (or will) exist.
- tarfile_name : str
The base name of the archive to extract.
- logname : None or str
The name of the log to use. If logname = None, skip logging.
- daq_version(datadir, run)
- get_data_series(datadir)
Corresponds to DataSeries from COUPP.pm
- setup_logger(name, log_file, level=10, formatter=None, mode='w')
Make a call to logging.getLogger, and set up standard formatting.
The logger is set up to write to a log file. One call to this function is made by each of process_runs() and process_single_run(). Loggers may be inherited from these during processing by calling logging.getLogger with the same name as used to initialize this logger, with .newname appended. See Examples below.
- Parameters:
- name : str
Name of the log, passed to logging.getLogger.
- log_file : str
Full path to the file in which to log.
- level : int or logging level, default=10
The logging level to use. Messages with logging level less than this value are ignored.
- formatter : None or str, default=None
How the message is formatted. If formatter = None (default), use a predefined format.
- mode : {‘w’, ‘a’}, default=’w’
Overwrite (‘w’) any existing log which may exist at log_file, or append to it (‘a’) if one exists.
See also
logging
For information on formatting, log levels, etc.
Examples
>>> run_level_log = setup_logger("20200721_0", "./test.log")  # example call from process_single_run
>>> run_level_log.info("Starting processing on 20200721_0")  # writes this message to "test.log"
>>> event_level_log = logging.getLogger("20200721_0.0")  # inherited log
- submit_slurm_job(run, scratch_dir, output_dir, log_dir, lock_file, data_series, *args, archive_dir='', picocode_tarball='', config_path='')
Submit a slurm job to process a run.
Creates a temporary file (using Python's tempfile) and writes a Slurm job script to it, which runs process_single_run() as its own job. Called by process_runs().
- Parameters:
- run : str
The run ID.
- scratch_dir : str
The location where the run data is stored.
- output_dir : str
Where to save the processed data.
- log_dir : str
Where to log info about processing.
- lock_file : str
Semaphore for processing.
- data_series : str
Name of data series. E.g. 40l-19
- *args
Processing tasks.
- Returns:
- str or None
The Slurm job ID associated with the running job. If the job ID cannot be determined, returns None. This indicates an issue with job submission, in which case this processing task will be resubmitted.
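The job-ID handling described in Returns can be sketched as parsing sbatch's stdout, which normally contains a line like "Submitted batch job 12345". This is a hypothetical helper, not the real function's code; returning None on a parse failure matches the documented resubmission behaviour.

```python
import re

def parse_slurm_job_id(sbatch_stdout):
    """Return the Slurm job ID from sbatch output, or None if it can't be found."""
    match = re.search(r"Submitted batch job (\d+)", sbatch_stdout)
    return match.group(1) if match else None

print(parse_slurm_job_id("Submitted batch job 12345"))  # → 12345
```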
- write_banner(text)
Formatted log output