PICOcode.DataHandling package

Submodules

PICOcode.DataHandling.BinaryData module

class BinaryData(filename=None)

Bases: object

Load the data from a binary recon file.

Currently does not actually read in the data. The two use cases for binary recon files are the fastDAQ file generated by events and the binary lookup table for 3D reconstruction. FastDAQ file reading is done manually in the Event class, so this class is only used as the parent class of BinaryLUT, which accesses data. This class should be altered to read binary recon data, and used in Event to read fastDAQ data.

Parameters:
filenamestr

The path to the binary recon file.

Attributes:
filenamestr

The path to the binary recon file.

number_linesint

The number of rows in the recon file.

header_bytesint

The number of bytes in the header.

endiannessstr

The endianness of the recon file.

data_size = {'char': 1, 'double': 4, 'float128': 16, 'int16': 2, 'int32': 4, 'int64': 8, 'int8': 1, 'single': 2, 'uint16': 2, 'uint32': 4, 'uint64': 8, 'uint8': 1}
end_of_file()

Tells whether the EOF has been reached.

The binary recon file format supports concatenating multiple files together, with the header of one block immediately following the final data line of the previous block. In this case, there is no way to tell prior to finding the EOF whether all blocks have been read from the file.

Parameters:
None
Returns:
bool

True if read() returns an empty binary string, False otherwise (and resets the reading pointer).
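The check described above can be sketched as follows. This is a minimal illustration rather than the package's implementation: the free function end_of_file_sketch and the one-byte probe are assumptions, and an io.BytesIO stands in for an open recon file.

```python
import io


def end_of_file_sketch(stream):
    """Return True if no bytes remain; otherwise restore the read position."""
    position = stream.tell()
    chunk = stream.read(1)
    if chunk == b"":
        return True
    # Undo the one-byte probe so the next block header can be read normally.
    stream.seek(position)
    return False


stream = io.BytesIO(b"block1block2")
stream.read(6)                              # consume the first "block"
more_blocks = not end_of_file_sketch(stream)
stream.read(6)                              # consume the second "block"
finished = end_of_file_sketch(stream)
```

Here more_blocks is True after the first block and finished is True once both have been consumed, which is the pattern needed when several recon files are concatenated.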

format_map = {'d': 'int32', 'e': 'double', 'f': 'double', 's': 'char'}
read_data()

Read the binary recon file.

Loads data into memory for a standard PICO recon file in binary format. See docdb 555 for the format of this file.

read_header()

Parse the header of a binary recon file and store the information.

This function is called if filename is specified in the constructor; otherwise set_file_name() or set_binary_file() must be called prior to this function.

As multiple recon files may be concatenated, this may be called several times per file.

set_binary_file(binary_file)

Set the binary_file attribute, which stores the open instance of a binary file.

This instance must have the read() function. There are two ways to create such an instance: calling Python’s open() on a file, or creating an io.BytesIO() instance, which treats a binary string as an open file instance. The functionality of the former is also available by passing a file path to set_file_name().

Parameters:
binary_fileio.BytesIO, file object

An instance of io.BytesIO or file object, with the read() method available.
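As a rough illustration of the two kinds of object that qualify, the sketch below creates one of each using only the standard library. The file name example.bin and the payload are made up for the example, and no PICOcode calls are shown.

```python
import io
import os
import tempfile

payload = b"\x01\x02\x03\x04"

# Option 1: a real file on disk, opened in binary mode.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.bin")
    with open(path, "wb") as handle:
        handle.write(payload)
    with open(path, "rb") as binary_file:
        from_disk = binary_file.read()

# Option 2: a bytes string already in memory, wrapped so it behaves like a file.
binary_file = io.BytesIO(payload)
from_memory = binary_file.read()
```

Both objects expose read(), so either could be handed to a method that only relies on that interface.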

set_file_name(filename)
write_binary_reconfile(var_list, fmt_list, recon_type, data, mode='w', logname=None)

PICOcode.DataHandling.BinaryLUT module

class BinaryLUT(filename)

Bases: BinaryData

Load the data from a lookup table (LUT) in binary recon file format.

The reason this is separate from the BinaryData class is that most recon files are small enough to quickly read into memory (<10 MB). Binary LUTs are >300 MB, and would take a long time to load into memory. Instead of opening the file and loading the data into memory, the file is kept open and traversed using seek() and reading a certain number of bytes.

Parameters:
filenamestr

The path to the binary recon file.

Attributes:
filenamestr

The path to the binary recon file.

number_linesint

The number of rows in the recon file.

header_bytesint

The number of bytes in the header.

endiannessstr

The endianness of the recon file.

LUT_index(cam, rotatedV, rotatedH, R3Point, InOut)

Determine where in the LUT the relevant data is stored.

The LUT is essentially an array with shape (4, image_width, image_height, 3, 2), whose axes represent the camera, image width, image height, direction (X, Y, or Z), and starting/ending point of a ray from the ray tracing program. This data is stored in a contiguous block.

Parameters:
cam{0, 1, 2, 3}

Which camera was used.

rotatedV, rotatedHint

The vertical and horizontal position of the pixel in question. The ‘rotated’ name indicates that the coordinates should have the top of the jar as the vertical direction (which is rotated from how the pictures are taken, with the top of the jar in the left of the image).

R3Point{0, 1, 2}

Which coordinate of the ray (X, Y, or Z) is being accessed. 0 = X, 1 = Y, 2 = Z.

InOut{0, 1}

The rays are defined within the jar as their position on the jar surface where they enter or exit the jar. “In” (0) means the initial ray position, “Out” (1) means the final ray position.

Returns:
int

The position in the LUT file to read the data from.
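The index arithmetic for a contiguous array of that shape can be sketched as below. This is a hedged illustration only: it assumes row-major (C) ordering, assumes rotatedH indexes the width axis and rotatedV the height axis, and returns an element index rather than a byte offset. The real LUT_index() may differ on all three points, and lut_flat_index is a hypothetical name.

```python
def lut_flat_index(cam, rotated_v, rotated_h, r3_point, in_out,
                   image_width, image_height):
    """Flat element index into a C-ordered (4, W, H, 3, 2) array."""
    shape = (4, image_width, image_height, 3, 2)
    # Assumption: horizontal position runs along the width axis.
    indices = (cam, rotated_h, rotated_v, r3_point, in_out)
    flat = 0
    for size, index in zip(shape, indices):
        if not 0 <= index < size:
            raise IndexError(f"index {index} out of range for axis of size {size}")
        # Row-major accumulation: the last axis varies fastest.
        flat = flat * size + index
    return flat


# One full camera "slab" holds W * H * 3 * 2 elements.
slab = lut_flat_index(1, 0, 0, 0, 0, image_width=10, image_height=20)
```

With a 10 x 20 image, slab evaluates to 10 * 20 * 3 * 2 = 1200, and the "In" and "Out" values of the same coordinate sit in adjacent elements.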

read_data(LUTposition)

Read data from the LUT.

Parameters:
LUTpositionint

The position in the LUT file to read the data from. Produced by LUT_index().

Returns:
float

The position information for the ray start or end point. See LUT_index() for a description of how to interpret this data.
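Reading a single value by seeking, rather than loading the whole file, might look like the sketch below. The 8-byte little-endian value format, the 4-byte fake header, and the helper name read_value are all assumptions made for illustration; the real file layout is described in the docdb reference above.

```python
import io
import struct

VALUE_FORMAT = "<d"                          # assumption: little-endian 8-byte floats
VALUE_SIZE = struct.calcsize(VALUE_FORMAT)


def read_value(binary_file, element_index, header_bytes=0):
    """Seek to one element and decode it without reading the rest of the file."""
    binary_file.seek(header_bytes + element_index * VALUE_SIZE)
    raw = binary_file.read(VALUE_SIZE)
    if len(raw) != VALUE_SIZE:
        raise EOFError("requested element lies beyond the end of the file")
    return struct.unpack(VALUE_FORMAT, raw)[0]


# A stand-in "LUT": a 4-byte header followed by four packed values.
values = [0.5, 1.25, -3.0, 42.0]
fake_lut = io.BytesIO(b"HEAD" + struct.pack("<4d", *values))
third = read_value(fake_lut, 2, header_bytes=4)   # -3.0
```

Only VALUE_SIZE bytes are read per lookup, which is the reason for keeping a large LUT open instead of loading it into memory.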

PICOcode.DataHandling.GetEvent module

class Event(rundirectory, eventnum, *args, data_series=None, no_err=True, verbose=True)

Bases: object

Load raw data from a PICO event.

Data is loaded from files in rundirectory and/or rundirectory/eventnum. Which data is loaded depends on the load options passed in *args. Each property has at minimum the loaded attribute, which indicates whether it was successfully loaded.

Parameters:
rundirectorystr

May be either the run ID if the run data is in a known location (pointed to by the PICOcode.conf file), or the full path to the run data.

eventnumint or str

The event number for which data is being requested.

*argstuple of str

Positional arguments (or a single list of strings) specifying the data being requested. At least one is required. May include any of the following:

  • event

  • fastDAQ

  • slowDAQ

  • temperature

  • PLC

  • DAQsettings

  • rundata

  • camdata

  • images

data_series{“40l-19-data”, “30l-16-data”, “2l-16-data”}

The name of the data series. Used when searching for data when rundirectory is not a full path.

no_errbool, default=True

Whether to suppress errors. Mainly useful during processing, when an error may disrupt all other processing.

verbosebool, default=True

Whether to print extra information when loading data.

Examples

>>> event = Event("20200721_0", 22, 'fastDAQ', 'slowDAQ')
# The following accomplishes the same thing.
>>> event = Event("20200721_0", 22, ['fastDAQ', 'slowDAQ'])
>>> event.slowDAQ.loaded
True
>>> event.fastDAQ.loaded
True
>>> event.PLC.loaded
False
>>> import matplotlib.pyplot as plt
>>> plt.plot( event.slowDAQ.elapsed_time, event.slowDAQ.PT4 )
>>> plt.show()
[Figure: files/Example_Plotting_PT4.png, the plot produced by the example above]
Attributes:
runIDstr

The name of the run.

eventnumint

The event number within the run.

rundirectorystr

The directory in which the run data was found. Several locations are checked.

data_seriesstr

Name of the data series. Should be of the form ‘30l-16-data’ or ‘40l-19-data’.

eventProperty

Info about the event, e.g. livetime, pressure setpoint, etc.

fastDAQProperty

Data collected by the fastDAQ system, contained in fastDAQ_?.bin files. Includes piezo and Dytran signals.

slowDAQProperty

Data collected by the pressure cart PLC, contained in the slowDAQ.txt file. Includes signals from pressure and position transducers.

temperatureProperty

Data collected by the temperature PLC, contained in the temperature.txt file. Includes data from the RTDs and chillers.

PLCProperty

Data from the pressure cart PLC, which has been logged by the main DAQ via modbus. The logging here is slower than that recorded by the slowDAQ attribute.

DAQsettingsProperty

Reads the DAQ30l_Setup.xml file, which contains data about the settings for the run, e.g. run type, compression time between events, etc.

rundataProperty

Reads the runID.txt file in the data directory, which contains data about the run.

camdataProperty

Currently unimplemented.

imagesProperty

Load images into memory.

open_run_archive(extension='tar')

Decompress an archived run.

The archive directory is found from the PICOcode.conf file (defaulting to the project directory on Graham), and the data is placed in the “scratch” directory (by default, this is ~/scratch/<data_series>/).

Parameters:
extensionstr, default=”tar”

The extension of the archived file. Only currently handles .tar archives. The tarfile module might handle everything anyway…

Returns:
None
exception LoadoptError(message, loadoption)

Bases: Exception

Custom exception class for failure to load a loadopt (not including a missing file, which raises a FileNotFoundError).

class Property(fields)

Bases: object

Wrapper class for the attributes loaded by Event.

Parameters:
fieldslist of str

Not sure why this is here. Attributes are added on the fly.

PICOcode.DataHandling.LoadReconFile module

LoadReconFile(reconfilepath='master', verbose=False, use_pickle=True)

Calls ReconFile.

Included for those who are used to typing LoadReconFile.

Parameters:
reconfilepathstring, default=”master”

Path to the file containing the reconstructed data. Default points to the location of the current merged_all.txt file.

verboseboolean, default=False

Print timing information after loading. Will likely expand this in the future to print more information.

use_pickleboolean, default=True

If available, load the data from a serialized file. This greatly reduces load time if available.

Returns:
ReconFile

An instance of ReconFile. Data is stored in numpy arrays.

Notes

The string “devel” may be passed as the reconfilepath parameter, in which case the path points to “/project/6007972/pico/recon/current/40l-19/output/merged_all_all.txt”.

class ReconFile(reconfilepath=None, verbose=False, use_pickle=True)

Bases: object

Load reconstructed data from file located at reconfilepath.

The object returned has fields corresponding to the headers in the reconfile. Data is converted and stored in numpy arrays.

Parameters:
reconfilepathstring, optional

Path to the file containing the reconstructed data. The file should end in .txt or .pickle; a file with any other extension is read as an ASCII file. If not present, ParseData() must be used to interpret data from a file.

verboseboolean, default=False

Print timing information after loading. Will likely expand this in the future to print more information.

use_pickle: boolean, default=True

If available, load the data from a serialized file. This greatly reduces load time if available.

Notes

If the reconfile does not end in .pickle but a pickled file is located in the same directory as the .txt file, then the text file is read and hashed, and compared to a hash stored in the .pickle file. If they match, the pickle file is used.
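The hash comparison described above could be sketched like this. It is not the ReconFile implementation: the SHA-256 hash, the dictionary layout stored in the pickle, and the helper names are all assumptions chosen for the example.

```python
import hashlib
import os
import pickle
import tempfile


def file_hash(path):
    """Hash the raw bytes of a file (SHA-256 is an assumption here)."""
    with open(path, "rb") as handle:
        return hashlib.sha256(handle.read()).hexdigest()


def save_cache(text_path, pickle_path, data):
    """Store the data together with the hash of the text file it came from."""
    with open(pickle_path, "wb") as handle:
        pickle.dump({"source_hash": file_hash(text_path), "data": data}, handle)


def load_cache_if_current(text_path, pickle_path):
    """Return cached data only if the text file is unchanged, else None."""
    if not os.path.exists(pickle_path):
        return None
    with open(pickle_path, "rb") as handle:
        cached = pickle.load(handle)
    if cached["source_hash"] != file_hash(text_path):
        return None
    return cached["data"]


with tempfile.TemporaryDirectory() as tmp:
    text_path = os.path.join(tmp, "recon.txt")
    pickle_path = os.path.join(tmp, "recon.pickle")
    with open(text_path, "w") as handle:
        handle.write("original contents\n")
    save_cache(text_path, pickle_path, {"ev": [1, 2, 3]})
    fresh = load_cache_if_current(text_path, pickle_path)
    with open(text_path, "a") as handle:
        handle.write("edited\n")
    stale = load_cache_if_current(text_path, pickle_path)
```

In this sketch fresh holds the cached data while stale is None, so an edited text file always forces a re-read. Note that the text file still has to be read in full to compute the hash; the saving comes from skipping the parsing step.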

Data is stored in numpy arrays.

Attributes:
The attributes are set by the contents of the recon file.
ParseData(rawdata)

Interpret the raw data from an ASCII recon file.

Separated from ReadAsciiFile so that zip files can be read from directly, as Python < 3.8 does not support reading from zip files using path-like objects, but the contents of a file can be read.

Parameters:
rawdataio.StringIO, file object

Raw data as an instance of io.StringIO. io.StringIO acts as though it were an open file object, supporting read() operations. Can also be passed as a file object, i.e. what is returned using open(file).

Returns:
None
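To show how the contents of a zipped text file can be presented as an open file object, here is a small standard-library sketch. The archive, the member name run/recon.txt, and its two lines are invented for the example; the final object is the kind of thing that could be passed as rawdata.

```python
import io
import zipfile

# Build a small in-memory archive so the example is self-contained.
archive_bytes = io.BytesIO()
with zipfile.ZipFile(archive_bytes, "w") as archive:
    archive.writestr("run/recon.txt", "first line\nsecond line\n")

# Read the member's bytes, decode them, and wrap the text in io.StringIO.
with zipfile.ZipFile(archive_bytes) as archive:
    text = archive.read("run/recon.txt").decode("utf-8")

rawdata = io.StringIO(text)
first_line = rawdata.readline().rstrip("\n")
remainder = rawdata.read()
```

The io.StringIO object supports readline() and read() just like a file returned by open(), without the archive ever being extracted to disk.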
ReadAsciiFile(reconfilepath)

Read reconstructed data from a text file.

For how the data is stored in the reconfile, see docDB 297.

Reads the file using numpy structured arrays.

Parameters:
reconfilepathstring

Path to the file containing the reconstructed data. The file should end in .txt, but ReconFile defaults to this function if the extension is not recognized.

Returns:
None
ReadRootFile(reconfilepath)

OBSOLETE

Read reconstructed data from a ROOT TFile, and fill fields of numpy arrays containing the data.

First attempt at a TFile reader, and runs incredibly slowly.

There are currently two major problems:

  • TBranches containing arrays of char* either weren’t saved correctly originally, or are not being read correctly. E.g. there are two TBranches that should each contain two strings, but upon loading them only the first ever appears to be returned. This may be due to them being null-terminated strings, which numpy complains about. Currently, text fields are ignored (aside from the “run” field).

  • Multidimensional arrays were flattened before they were saved, so the original dimensions are unknown. They could be compared to the header of the .txt reconfile, which contains the dimensions of each field, but that somewhat defeats the purpose of having a TFile reader. Currently these arrays are read as 1D arrays.

Input:
reconfilepath: string

Path to the file containing the reconstructed data. The file should end in .root.

Output:

Fills this object with fields containing data from the reconfile. Each field is constructed from a TBranch.

ReadSerializedData()

Read reconstructed data from a previously saved serialized file.

Greatly reduces load time compared to reading the raw ASCII files.

Returns:
bool

True if the file was successfully read, False otherwise.

SaveSerializedData(savepath='')

Serialize this object, and save it to a python pickle file.

If no path is given, saves to the same directory as the reconfile was loaded from. Additionally saves a hash of the text file used to create this recon file, so that it can be compared in the future.

Parameters:
savepathstring, default=””

Full path to the location to save the serialized data. If the string is empty, the reconpath is used with the extension changed to .pickle.

Returns:
None
StripMetadata()

Remove metadata from recon file.

Intended to be used before copying the contents of this recon file to another, so that metadata isn’t copied.

PICOcode.DataHandling.Loaders module

class Loader(path, ev, scratch_path='')

Bases: object

Parent class of loaders.

Parameters:
pathstr

The full path to the event directory.

evint

The event from which data is loaded.

get_camdata_data()
get_daqsettings_data()
get_event_data()
get_fastdaq_cal_data()
get_fastdaq_data()
get_plc_data()
get_rundata_data()
get_slowdaq_data()
get_temperature_data()
class RawLoader(path, ev, scratch_path='')

Bases: Loader

Reads the requested data from an unarchived run.

Returns the requested data as a list of strings or a bytes string.

get_daqsettings_data()
get_event_data()
get_fastdaq_cal_data()
get_fastdaq_data()
get_image_data()
get_plc_data()
get_rundata_data(runID)
get_slowdaq_data()
get_temperature_data()
class TarLoader(path, ev, scratch_path)

Bases: RawLoader

class ZipLoader(path, ev, scratch_path)

Bases: Loader

Loads data from a zip file and returns it.

get_daqsettings_data()
get_event_data()
get_fastdaq_cal_data()
get_fastdaq_data()
get_image_data()
get_plc_data()
get_recon_file(recon_file)

Retrieve arbitrary recon file from the zip file.

get_rundata_data(runID)
get_slowdaq_data()

Get slowDAQ data from a zip file.

All slowDAQ files have the form “slowDAQ_0.txt” in the event folder. This is also true in the zip file, so we can map a regex search to the list of files in the zip file to find the slowDAQ_0.txt file. This removes the need to know which specific run we’ve opened (as the directory structure in the zip file would require that we know the run to open “<run>/<ev>/slowDAQ_0.txt”).
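The regex search over the archive's file list might be sketched as follows. The helper find_slowdaq, the pattern, and the member names are illustrative assumptions; in particular the sketch filters on the event folder, which the real method may handle differently.

```python
import io
import re
import zipfile


def find_slowdaq(names, ev):
    """Return archive members of the form <anything>/<ev>/slowDAQ_0.txt."""
    matcher = re.compile(rf"^.*/{ev}/slowDAQ_0\.txt$")
    return [name for name in names if matcher.match(name)]


# Build a stand-in archive with two events of one run.
archive_bytes = io.BytesIO()
with zipfile.ZipFile(archive_bytes, "w") as archive:
    archive.writestr("20200721_0/22/slowDAQ_0.txt", "slow data for event 22")
    archive.writestr("20200721_0/22/fastDAQ_0.bin", b"\x00\x01")
    archive.writestr("20200721_0/23/slowDAQ_0.txt", "slow data for event 23")

with zipfile.ZipFile(archive_bytes) as archive:
    matches = find_slowdaq(archive.namelist(), 22)
    contents = archive.read(matches[0]).decode("utf-8")
```

Because the pattern starts with a wildcard, the run folder name never has to be known in advance; only the event number and file name are fixed.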

get_temperature_data()
matcher = re.compile('^.*fastDAQ_\\d.bin$')
get_loader(path)

PICOcode.DataHandling.Utilities module

cast_array(array, fmtString)

Convert a numpy array data type based on a C-style format string.

Parameters:
arrayarray_like

An array to be converted to a different format. All elements must be of the same type.

fmtString{“%s”, “%d”, “%i”, “%f”, “%e”}

A C-style format string.

Returns:
array

Array converted to the dtype indicated by fmtString.

Raises:
ValueError

If the fmtString cannot be interpreted.

cast_value(value, fmtString)

Convert a single value to a different data type based on a C-style format string.

Parameters:
valuestr, int, or float

The value to cast to a different type.

fmtString{“%s”, “%d”, “%f”, “%e”}

A C-style format string.

Returns:
str, int, or float

The value cast to the data type indicated by ‘fmtString’.
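A cast driven by a C-style format string can be sketched with a small lookup table. This is not the package's code: the table CASTERS, the name cast_value_sketch, and the choice to raise ValueError for an unknown format are assumptions for illustration.

```python
# Map each supported format string to the Python type it implies.
CASTERS = {"%s": str, "%d": int, "%f": float, "%e": float}


def cast_value_sketch(value, fmt_string):
    """Cast value to the Python type implied by a C-style format string."""
    try:
        caster = CASTERS[fmt_string]
    except KeyError:
        raise ValueError(f"unrecognised format string: {fmt_string!r}") from None
    return caster(value)


event_number = cast_value_sketch("22", "%d")      # 22
pressure = cast_value_sketch("1.5e3", "%e")       # 1500.0
label = cast_value_sketch(3, "%s")                # "3"
```

A table like this keeps the supported formats in one place, so adding “%i” alongside “%d” would be a one-line change.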

cast_value_mappable(args)

Apply cast_value() to all values in a 2 x n dimension list. Usable with python’s built-in map.

get_default_config()

Return the default config, filled with default values, and parsed as a SimpleNamespace object.

Returns:
types.SimpleNamespace
get_fmt_str(value, precision=5, use_str_precision=False)

Produce a C-style format string for the type of the first input parameter.

Parameters:
value{int, str, float, np.float16, np.float32, np.float64, np.int8, np.int16, np.int32, np.int64}

The value for which a C-style format string is needed.

precisionint, default=5

The precision (for floats) or length (for strings if use_str_precision=True) of the C-style format string.

use_str_precisionbool, default=False

Whether to use the precision as the length for string output. If False, returns “%s” for strings.

get_type(fmtString)

Return the numpy dtype to use for an array from a C-like format string.

This is mainly useful for enforcing the length of the runID str in a recon file.

Parameters:
fmtString{“%s”, “%d”, “%i”, “%f”, “%e”}

A C-style format string.

Returns:
type

The dtype of the array required to store the data type of fmtString.

get_type_root(fmtString)

Return the Python type from a C-like format string.

make_default_config(config_data={}, config_path=None)

Create .PICOcode.conf file. This file tells the Event class where to look for archived run files, and where to decompress them.

On Windows, this is located at C:\Users\<user>\Documents\.PICOcode.conf

On Linux, this is located at /home/<user>/.PICOcode.conf

On macOS, this is located at /Users/<user>/.PICOcode.conf

Returns:
Default section of the config parser.
parse(d)

Converts a dictionary to a SimpleNamespace recursively.

stolen from https://stackoverflow.com/questions/66208077/how-to-convert-a-nested-python-dictionary-into-a-simple-namespace
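A recursive dictionary-to-namespace conversion along these lines could look like the sketch below. The function name to_namespace and the example keys are made up, and the handling of lists is an assumption that the real parse() may not share.

```python
from types import SimpleNamespace


def to_namespace(d):
    """Recursively convert nested dicts (and dicts inside lists) to namespaces."""
    if isinstance(d, dict):
        return SimpleNamespace(**{key: to_namespace(value) for key, value in d.items()})
    if isinstance(d, list):
        return [to_namespace(item) for item in d]
    return d


# Hypothetical config content, used only to show attribute-style access.
config = to_namespace({"paths": {"scratch": "~/scratch"}, "verbose": True})
scratch = config.paths.scratch
```

After conversion, nested values are reached with dots (config.paths.scratch) instead of chained key lookups. Dictionary keys must be strings for this to work.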

prep_regex(fmtStrings)

Prepare a regular expression from the formatstring line in a recon file.

This regular expression is used to parse each line of an ASCII recon file.

Parameters:
fmtStringsstr

A string containing space-separated C-style format strings from a recon file, which describes the types of each variable contained in the file.

Returns:
regexstr

A regular expression which can be used with python’s built-in regex module to parse the lines of a recon file.

See also

re
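One way to turn a line of format strings into a line-matching regular expression is sketched below. The per-format patterns and the name build_line_regex are assumptions, not the patterns prep_regex() actually produces; for instance, this sketch does not accept values such as nan or inf.

```python
import re

FLOAT = r"([-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?)"
PATTERNS = {
    "%s": r"(\S+)",          # any run of non-whitespace characters
    "%d": r"([-+]?\d+)",
    "%i": r"([-+]?\d+)",
    "%f": FLOAT,
    "%e": FLOAT,
}


def build_line_regex(fmt_strings):
    """Join one capture group per format string, separated by whitespace."""
    parts = [PATTERNS[token] for token in fmt_strings.split()]
    return r"^\s*" + r"\s+".join(parts) + r"\s*$"


regex = build_line_regex("%s %d %f %e")
match = re.match(regex, "20200721_0 22 1.5 -3.2e-4")
fields = match.groups()
```

Here fields is ('20200721_0', '22', '1.5', '-3.2e-4'): every column comes back as a string, ready to be cast according to its format string.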
read_config(config_path=None)

Read the config file if it exists, or create it with default values.

On Windows, this is located at C:\Users\<user>\Documents\.PICOcode.conf

On Linux, this is located at /home/<user>/.PICOcode.conf

On macOS, this is located at /Users/<user>/.PICOcode.conf

The config file contains the location of run archives (if present on the system), and the scratch directory (where run data is located).

Returns:
configdict

The parsed config file

update_config(config)

If a PICOcode configuration file exists but is missing entries due to an update of PICOcode, this should update the config file without overwriting the existing entries.

Parameters:
config: SimpleNamespace

Existing configuration file

Returns:
None
verify_config(config)

Check that the configuration file has all the entries of the default config.

Parameters:
config: types.SimpleNamespace

An instance of an existing PICOcode config

Returns:
bool

Whether the existing config has all entries
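A completeness check of this kind can be sketched as a recursive comparison of attribute names. The function has_all_entries and the example entries are hypothetical; the real verify_config() compares against the package's own default config.

```python
from types import SimpleNamespace


def has_all_entries(config, default):
    """True if every entry in default (recursively) also exists in config."""
    for key, default_value in vars(default).items():
        if not hasattr(config, key):
            return False
        if isinstance(default_value, SimpleNamespace):
            value = getattr(config, key)
            if not isinstance(value, SimpleNamespace):
                return False
            if not has_all_entries(value, default_value):
                return False
    return True


# Hypothetical default and user configs.
default = SimpleNamespace(paths=SimpleNamespace(archive="a", scratch="s"), verbose=True)
complete = SimpleNamespace(paths=SimpleNamespace(archive="x", scratch="y"), verbose=False)
outdated = SimpleNamespace(paths=SimpleNamespace(archive="x"), verbose=False)

complete_ok = has_all_entries(complete, default)
outdated_ok = has_all_entries(outdated, default)
```

Only the presence of entries is compared, not their values, so a user's customised settings still count as complete.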

PICOcode.DataHandling.XMLhelper module

XML2dict(parsed_xml)

Convert the output of XMLparser to a dictionary.

Parameters:
parsed_xmlxml.etree.ElementTree

The parsed XML data as retrieved by XMLparser().

Returns:
dict

The XML data as a dictionary.

XMLparser(xmlPath, asDict=False)

Read the XML file from a PICO event.

The data is returned as either the base of an XML file as an ElementTree object or a dictionary.

Parameters:
xmlPathstring

Path to the XML file.

asDictbool, default=False

If True, returns a dictionary, the keys of which are the roots and tags from the XML document.

Returns:
list or dict

The contents of the XML file. The type depends on the asDict parameter.

Notes

The format for DAQ30l_Setup.xml does not conform to XML document standards; there are multiple root tags, whereas an XML document may have only one. To get around this, an extra tag is added at the beginning and end of the file called “DAQxml”, and all other tags are children of this root. Additionally, the comments must be removed before parsing as they are non-standard and confuse Python’s built-in XML parser.
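The root-wrapping workaround can be sketched with the standard library as follows. The tag names in the fragment are invented, the helper parse_multi_root is hypothetical, and comment removal is not shown because the form of the non-standard comments is not described here.

```python
import xml.etree.ElementTree as ET


def parse_multi_root(xml_text, wrapper="DAQxml"):
    """Wrap a multi-root fragment in a single root element, then parse it."""
    wrapped = f"<{wrapper}>{xml_text}</{wrapper}>"
    return ET.fromstring(wrapped)


# Two top-level tags: not a valid XML document on its own.
fragment = "<RunType>10</RunType><CompressionTime>30</CompressionTime>"
root = parse_multi_root(fragment)
settings = {child.tag: child.text for child in root}
```

Every original top-level tag becomes a child of the added root, so the parser accepts the text. A fragment that starts with an XML declaration would need that line removed first, since a declaration is only valid at the very start of a document.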

PICOcode.DataHandling.config module

class Config(*args, **kwargs)

Bases: dict

Singleton for the config file.

read_from(path)
write_to(path)
class Singleton

Bases: type
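For readers unfamiliar with the pattern, a metaclass-based singleton built on dict can be sketched as below. This is a generic illustration under assumed names (SingletonSketch, ConfigSketch) and an invented key; it is not taken from the package source.

```python
class SingletonSketch(type):
    """Metaclass that hands back the same instance on every call."""

    _instances = {}

    def __call__(cls, *args, **kwargs):
        # Construct the instance once; later calls ignore their arguments.
        if cls not in cls._instances:
            cls._instances[cls] = super().__call__(*args, **kwargs)
        return cls._instances[cls]


class ConfigSketch(dict, metaclass=SingletonSketch):
    """A dict of which only one instance ever exists."""


first = ConfigSketch(scratch="~/scratch")
second = ConfigSketch()
same_object = second is first
```

Because second is the same object as first, any module that constructs the class sees the values that were loaded the first time.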

PICOcode.DataHandling.write_reconfile module

class ReconFileWriter(path, access_mode='w', recon_mode='ascii')

Bases: object

Attributes:
path
mode
headerstr

First line of the recon file containing a description of the file.

recontype(int, str)

An integer between 1 and 8. See docdb 297 for recon types.

variablesdict of {str: str}

Mapping from variable names to formats. The variable names present in the recon file are obtained from the keys.

datalist of [dict of {str: np.ndarray}]

List containing dicts with the mapping from variable names to data. Data may be scalar (int, float, str) or an array-like object with the shape() property.

add_data(data)
set_header(header, variables, recontype)
write_ascii()
check_format(header, var_list, fmt_list, data, log)

Ensure the variable list, format list, and data have the same shapes.

Parameters:
headerstr

The one-line header to be written to the first line of the recon file. Not currently checked.

var_list, fmt_listlist of str

The variable and format lists from write_reconfile.

dataarray

The array containing the data for the recon file.

loglogging.Logger()

The log used by write_reconfile.

Notes

Currently only checks whether the data shape and fmt_list length are the same. The var_list length will be implemented in the future, but the presence of arrays makes this a little more challenging (but not very challenging, I’m just lazy…)

write_reconfile(path, header, var_list, fmt_list, recon_type, data, mode='a', logname=None)

Write a reconstructed file.

All processed PICO data is stored in text-based “recon files”. This function writes one of those files.

Parameters:
pathstr

The full path where the recon file will be saved.

headerstr

Any single-line string to write to the first line of the recon file.

var_listlist or str

A list of strings (or space-separated string, which is split into a list) of the name of each variable in the recon file.

fmt_listlist or str

A list of strings (or space-separated string, which is split into a list) of C-style format strings which indicate each variable’s type. If a variable is an array, one format string should be present for each entry in the array.

recon_typeint or str

An integer specifying the type of recon file.

dataarray of object

The data to write to the recon file. The shape of the array should be (N, len(fmt_list)), where N will be the number of rows in the written file.

mode{“a”, “w”}

The mode with which to open the written file. The default is “a”, which writes a new file or appends to an existing one; “w” will overwrite an existing file.

lognamestr, default=None

The name of the logger to use. Will not log if logname = None.

Returns:
None

Notes

  • See COUPP docDB 297 for a description of the text reconstructed file format.

  • The header may contain any characters (except newlines) and may be any length.

  • Variable names in var_list must not have spaces.

  • The variables written to a file may be a single (0D) entry, or an array with up to three dimensions. An array’s entry in var_list must include its dimensions directly after the variable name (with no separating space), comma-separated and in parentheses. E.g. a length-7 1D array might look like myVar(7); a 3x2 array would look like myVar(3,2).

  • The entries for an array in fmt_list must have the same number of entries as the array’s size. E.g. for myVar(3), which is a float, the format list would have [“%f”, “%f”, “%f”]; myVar(2,2) would have four entries.

  • The recon_type is a number from 0 to 8, which specifies what each line in the recon file represents.

  • Each row in data should have the same shape as fmt_list.
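The relationship between array dimensions in var_list and the length of fmt_list can be checked with a short sketch like the one below. The regular expression and helper names are assumptions for illustration, not part of write_reconfile.

```python
import re
from math import prod

# A name, optionally followed by one to three comma-separated dimensions.
VAR_PATTERN = re.compile(r"^(\w+)(?:\((\d+(?:,\d+){0,2})\))?$")


def variable_size(var_name):
    """Number of format strings a var_list entry such as 'myVar(3,2)' needs."""
    match = VAR_PATTERN.match(var_name)
    if match is None:
        raise ValueError(f"malformed variable name: {var_name!r}")
    dims = match.group(2)
    if dims is None:
        return 1                      # scalar (0D) entry
    return prod(int(dim) for dim in dims.split(","))


def expected_format_count(var_list):
    """Total length fmt_list should have for the given var_list."""
    return sum(variable_size(name) for name in var_list)


var_list = ["run", "ev", "myVar(3)", "otherVar(2,2)"]
count = expected_format_count(var_list)
```

For this var_list the count is 1 + 1 + 3 + 4 = 9, so fmt_list would need nine entries and each row of data nine values.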

Module contents