Skip to content
/ cdflib Public
forked from MAVENSDC/cdflib

A python module for reading NASA's Common Data Format (cdf) files

Notifications You must be signed in to change notification settings

sfieux/cdflib

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

69 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

This is a python script to read/write CDF files without needing to install the CDF NASA library.

You will need Python version 3, as well as the Numpy library to use this module.

To install, open up your terminal/command prompt, and type:

pip install cdflib

There are two different CDF classes: a cdf reader, and a cdf writer.

Currently, you cannot simultaneously read and write to the same file. Future implementations, however, will unify these two classes.

CDF Reader Class

To begin accessing the data within a CDF file, first create a new CDF class. This can be done with the following commands:

import cdflib

cdf_file = cdflib.CDF('/path/to/cdf_file.cdf')

Then, you can call various functions on the variable. For example:

x = cdf_file.varget("NameOfVariable", startrec = 0, endrec = 150)

This command will return all data inside of the variable "Variable1", from records 0 to 150. Below is a list of the 8 different functions you can call.

cdf_info()

Returns a dictionary that shows the basic CDF information. This information includes

varinq(variable)

Returns a dictionary that shows the basic variable information. This information includes

attinq( attribute = None)

Returns a python dictionary of attribute information. If no attribute is provided, a list of all attributes is printed.

attget( attribute = None, entry = None )

Returns the value of the attribute at the entry number provided. A variable name can be used instead of its corresponding entry number. A dictionary is returned with the following defined keys

['Item_Size'] the number of bytes for each entry value
['Num_Items'] total number of values extracted
['Data_Type'] the CDF data type
['Data'] retrieved attribute data as a scalar value, a numpy array or a string

varattsget(variable = None, expand = False)

Gets all variable attributes. Unlike attget, which returns a single attribute entry value, this function returns all of the variable attribute entries, in a dictionary (in the form of 'attribute': value pair) for a variable. If there is no entry found, None is returned. If no variable name is provided, a list of variables are printed. If expand is entered with non-False, then each entry's data type is also returned in a list form as [entry, 'CDF_xxxx']. For attributes without any entries, they will also return with None value.

globalattsget(expand = False)

Gets all global attributes. This function returns all of the global attribute entries, in a dictionary (in the form of 'attribute': {entry: value} pair) from a CDF. If there is no entry found, None is returned. If expand is entered with non-False, then each entry's data type is also returned in a list form as [entry, 'CDF_xxxx']. For attributes without any entries, they will also return with None value.

varget( variable = None, [epoch=None], [[starttime=None, endtime=None] | [startrec=0, endrec = None]], [,expand=True])

Returns the variable data. Variable can be entered either a name or a variable number. By default, it returns a 'numpy.ndarray' or 'list' class object, depending on the data type, with the variable data and its specification.

If "expand" is set as True, a dictionary is returned with the following defined keys for the output

['Rec_Ndim'] | the dimension number of each variable record
['Rec_Shape'] | the shape of the variable record dimensions
['Num_Records'] | the total number of records
['Records_Returned'] | the number of records retrieved
['Data_Type'] | the CDF data type
['Data'] | retrieved variable data
['Real_Records'] | Record numbers for real data for sparse record variable in list

By default, the full variable data is returned. To acquire only a portion of the data for a record-varying variable, either the time or record (0-based) range can be specified. 'epoch' can be used to specify which time variable this variable depends on and is to be searched for the time range. For the ISTP-compliant CDFs, the time variable will come from the attribute 'DEPEND_0' from this variable. The function will automatically search for it thus no need to specify 'epoch'. If either the start or end time is not specified, the possible minimum or maximum value for the specific epoch data type is assumed. If either the start or end record is not specified, the range starts at 0 or/and ends at the last of the written data.

The start (and end) time should be presented in a list as: [year month day hour minute second millisec] for CDF_EPOCH [year month day hour minute second millisec microsec nanosec picosec] for CDF_EPOCH16 [year month day hour minute second millisec microsec nanosec] for CDF_TIME_TT2000 If not enough time components are presented, only the last item can have the floating portion for the sub-time components.

Note: CDF's CDF_EPOCH16 data type uses 2 8-byte doubles for each data value. In Python, each value is presented as a complex or numpy.complex128.

epochrange( epoch, [starttime=None, endtime=None])

Get epoch range. Returns a list of the record numbers, representing the corresponding starting and ending records within the time range from the epoch data. A None is returned if there is no data either written or found in the time range.

getVersion ()

Shows the code version.

Sample use -

import cdflib swea_cdf_file = cdflib.CDF('/path/to/swea_file.cdf') swea_cdf_file.cdf_info() x = swea_cdf_file.varget('NameOfVariable') swea_cdf_file.close()

CDF Writer Class

CDF (path, cdf_spec=None, delete=False)

Creates an empty CDF file. path is the path name of the CDF (with or without .cdf extension). cdf_spec is the optional specification of the CDF file, in the form of a dictionary. The dictionary can have the following values:

['Majority'] 'row_major' or 'column_major', or its corresponding value. Default is 'column_major'.
['Encoding'] Data encoding scheme. See the CDF documentation about the valid values. Can be in string or its numeric corresponding value. Default is 'host'.
['Checksum'] Whether to set the data validation upon file creation. The default is False.
['rDim_sizes'] The dimensional sizes, applicable only to rVariables.
['Compressed'] Whether to compress the CDF at the file level. A value of 0-9 or True/False, the default is 0/False.

write_globalattrs (globalAttrs)

Writes the global attributes. globalAttrs is a dictionary that has global attribute name(s) and their value(s) pair(s). The value(s) is a dictionary of entry number and value pair(s). For example:

globalAttrs={}
globalAttrs['Global1']={0: 'Global Value 1'}
globalAttrs['Global2']={0: 'Global Value 2'}

For a non-string value, use a list with the value and its CDF data type. For example:

globalAttrs['Global3']={0: [12, 'cdf_int4']}
globalAttrs['Global4']={0: [12.34, 'cdf_double']}

If the data type is not provided, a corresponding CDF data type is assumed:

globalAttrs['Global3']={0: 12}     as 'cdf_int4'
globalAttrs['Global4']={0: 12.34}  as 'cdf_double'

CDF allows multi-values for non-string data for an attribute:

globalAttrs['Global5']={0: [[12.34,21.43], 'cdf_double']}

For multi-entries from a global variable, they should be presented in this form:

GA6={}
GA6[0]='abcd'
GA6[1]=[12, 'cdf_int2']
GA6[2]=[12.5, 'cdf_float']
GA6[3]=[[0,1,2], 'cdf_int8']
globalAttrs['Global6']=GA6
....
f.write_globalattrs(globalAttrs)

write_variableattrs (variableAttrs)

Writes a variable's attributes, provided the variable already exists. variableAttrs is a dictionary that has variable attribute name and its entry value pair(s). The entry value is also a dictionary of variable id and value pair(s). Variable id can be the variable name or its id number in the file. Use write_var function if the variable does not exist. For examples:

variableAttrs={}
entries_1={}
entries_1['var_name_1'] = 'abcd'
entries_1['var_name_2'] = [12, 'cdf_int4']
....
variableAttrs['attr_name_1']=entries_1
entries_2={}
entries_2['var_name_1'] = 'xyz'
entries_2['var_name_2'] = [[12, 34], 'cdf_int4']
....
variableAttrs['attr_name_2']=entries_2
....
....
f.write_variableattrs(variableAttrs)

write_var (var_spec, var_attrs=None, var_data=None)

Writes a variable, along with variable attributes and data. var_spec is a dictionary that contains the specifications of the variable. The required/optional keys for creating a variable:

Required keys:
['Variable'] The name of the variable
['Data_Type'] the CDF data type
['Num_Elements'] The number of elements. Always 1 the for numeric type. The char length for string type.
['Rec_Vary'] The dimensional sizes, applicable only to rVariables.
For zVariables:
['Dims_Sizes'] The dimensional sizes for zVariables only. Use [] for 0-dimension. Each and every dimension is varying for zVariables.
For rVariables:
['Dims_Vary'] The dimensional variances for rVariables only.
Optional keys:
['Var_Type'] Whether the variable is a zVariable or rVariable. Valid values: "zVariable" and "rVariable". The default is "zVariable".
['Sparse'] Whether the variable has sparse records. Valid values are "no_sparse", "pad_sparse", and "prev_sparse". The default is 'no_sparse'.
['Compress'] Set the gzip compression level (0 to 9), 0 for no compression. The default is to compress with level 6 (done only if the compressed data is less than the uncompressed data).
['Block_Factor'] The blocking factor, the number of records in a chunk when the variable is compressed.
['Pad'] The padded value (in bytes, numpy.ndarray or string)

var_attrs is a dictionary, with {attribute:value} pairs. The attribute is the name of a variable attribute. The value can have its data type specified for the numeric data. If not, based on Python's type, a corresponding CDF type is assumed: CDF_INT4 for int, CDF_DOUBLE for float, CDF_EPOCH16 for complex and and CDF_INT8 for long. For example:

var_attrs= { 'attr1':  'value1',
             'attr2':  12.45,
             'attr3':  [3,4,5],
              .....}
              -or-
var_attrs= { 'attr1':  'value1',
             'attr2':  [12.45, 'CDF_DOUBLE'],
             'attr3':  [[3,4,5], 'CDF_INT4'],
             .....
            }

var_data is the data for the variable. If the variable is a regular variable without sparse records, it must be in a single structure of bytes, or numpy.ndarray for numeric variable, or str or list of strs for string variable. If the variable has sparse records, var_data should be presented in a list/tuple with two elements, the first being a list/tuple that contains the physical record number(s), the second being the variable data in bytes, numpy.ndarray, or a list of strings. Variable data can have just physical records' data (with the same number of records as the first element) or have data from both physical records and virtual records (which with filled data). The var_data has the form:

[[rec_#1,rec_#2,rec_#3,...],
 [data_#1,data_#2,data_#3,...]]

See the sample for its setup.

getVersion()

Shows the code version and modified date.

Note: The attribute entry value for the CDF epoch data type, CDF_EPOCH, CDF_EPOCH16 or CDF_TIME_TT2000, can be presented in either a numeric form, or an encoded string form. For numeric, the CDF_EPOCH data is 8-byte float, CDF_EPOCH16 16-byte complex and CDF_TIME_TT2000 8-byte long. The default encoded string for the epoch `data should have this form:

CDF_EPOCH: 'dd-mon-year hh:mm:ss.mmm'
CDF_EPOCH16: 'dd-mon-year hh:mm:ss.mmm.uuu.nnn.ppp'
CDF_TIME_TT2000: 'year-mm-ddThh:mm:ss.mmmuuunnn'

where mon is a 3-character month.

Sample use -

Use a master CDF file as the template for creating a CDF. Both global and variable meta-data comes from the master CDF. Each variable's specification also is copied from the master CDF. Just fill the variable data to write a new CDF file:

import cdfwrite, cdfread, numpy as np
cdf_master = cdfread.CDF('/path/to/master_file.cdf')
if (cdf_master.file != None):
# Get the cdf's specification
info=cdf_master.cdf_info()
cdf_file=cdfwrite.CDF('/path/to/swea_file.cdf',cdf_spec=info,delete=True)
if (cdf_file.file == None):
    print('Problem writing file.... Stop')
    cdf_master.close()
    exit()
# Get the global attributes
globalaAttrs=cdf_master.globalattsget(expand=True)
# Write the global attributes
cdf_file.write_globalattrs(globalaAttrs)
zvars=info['zVariables']
print('no of zvars=',len(zvars))
# Loop thru all the zVariables
for x in range (0, len(zvars)):
    # Get the variable's specification
    varinfo=cdf_master.varinq(zvars[x])
    #print('Z =============>',x,': ', varinfo['Variable'])
    # Get the variable's attributes
    varattrs=cdf_master.varattsget(zvars[x], expand=True)
    if (varinfo['Sparse'].lower() == 'no_sparse'):
        # A variable with no sparse records... get the variable data
        vardata=.......
        # Create the zVariable, write out the attributes and data
        cdf_file.write_var(varinfo, var_attrs=varattrs, var_data=vardata)
    else:
        # A variable with sparse records...
        # data is in this form [physical_record_numbers, data_values]
        # physical_record_numbers (0-based) contains the real record
        # numbers. For example, a variable has only 3 physical records
        # at [0, 5, 10]:
        varrecs=[0,5,10]
        # data_values could contain only the physical records' data or
        # both the physical and virtual records' data.
        # For example, a float variable of 1-D with 3 elements with only
        # 3 physical records at [0,5,10]:
        # vardata = [[  5.55000000e+01, -1.00000002e+30,  6.65999985e+01],
        #            [  6.66659973e+02,  7.77770020e+02,  8.88880005e+02],
        #            [  2.00500000e+02,  2.10600006e+02,  2.20699997e+02]]
        # Or, with virtual record data embedded in the data:
        # vardata = [[  5.55000000e+01, -1.00000002e+30,  6.65999985e+01],
        #            [ -1.00000002e+30, -1.00000002e+30, -1.00000002e+30],
        #            [ -1.00000002e+30, -1.00000002e+30, -1.00000002e+30],
        #            [ -1.00000002e+30, -1.00000002e+30, -1.00000002e+30],
        #            [ -1.00000002e+30, -1.00000002e+30, -1.00000002e+30],
        #            [  6.66659973e+02,  7.77770020e+02,  8.88880005e+02],
        #            [ -1.00000002e+30, -1.00000002e+30, -1.00000002e+30],
        #            [ -1.00000002e+30, -1.00000002e+30, -1.00000002e+30],
        #            [ -1.00000002e+30, -1.00000002e+30, -1.00000002e+30],
        #            [ -1.00000002e+30, -1.00000002e+30, -1.00000002e+30],
        #            [  2.00500000e+02,  2.10600006e+02,  2.20699997e+02]]
        # Records 1, 2, 3, 4, 6, 7, 8, 9 are all virtual records with pad
        # data (variable defined with 'pad_sparse').
        vardata=np.asarray([.,.,.,..])
        # Create the zVariable, and optionally write out the attributes
        # and data
        cdf_file.write_var(varinfo, var_attrs=varattrs,
                           var_data=[varrecs,vardata])
   rvars=info['rVariables']
   print('no of rvars=',len(rvars))
   # Loop thru all the rVariables
   for x in range (0, len(rvars)):
       varinfo=cdf_master.varinq(rvars[x])
       print('R =============>',x,': ', varinfo['Variable'])
       varattrs=cdf_master.varattsget(rvars[x], expand=True)
       if (varinfo['Sparse'].lower() == 'no_sparse'):
           vardata=.......
           # Create the rVariable, write out the attributes and data
           cdf_file.write_var(varinfo, var_attrs=varattrs, var_data=vardata)
       else:
           varrecs=[.,.,.,..]
           vardata=np.asarray([.,.,.,..])
           cdf_file.write_var(varinfo, var_attrs=varattrs,
                              var_data=[vardata,vardata])
cdf_master.close()
cdf_file.close()

CDF Epochs

Importing cdflib also imports the module CDFepoch, which handles CDF-based epochs. The following functions can be used to convert back and forth between different ways to display the date. You can call these functions like so:

import cdflib

cdf_file = cdflib.cdfepoch.compute_epoch([2017,1,1,1,1,1,111])

There are three (3) epoch data types in CDF: CDF_EPOCH, CDF_EPOCH16 and CDF_TIME_TT2000.

  • CDF_EPOCH is milliseconds since Year 0.
  • CDF_EPOCH16 is picoseconds since Year 0.
  • CDF_TIME_TT2000 (TT2000 as short) is nanoseconds since J2000 with leap seconds.

CDF_EPOCH is a single double(as float in Python), CDF_EPOCH16 is 2-doubles (as complex in Python), and TT2000 is 8-byte integer (as int in Python). In Numpy, they are np.float64, np.complex128 and np.int64, respectively. All these epoch values can come from from CDF.varget function.

Five main functions are provided

encode (epochs, iso_8601=False)

Encodes the epoch(s) into UTC string(s).

For CDF_EPOCH:
The input should be either a float or list of floats (in numpy, a np.float64 or a np.ndarray of np.float64) Each epoch is encoded, by default to a ISO 8601 form: 2004-05-13T15:08:11.022 Or, if iso_8601 is set to False, 13-May-2004 15:08:11.022
For CDF_EPOCH16:
The input should be either a complex or list of complex(in numpy, a np.complex128 or a np.ndarray of np.complex128) Each epoch is encoded, by default to a ISO 8601 form: 2004-05-13T15:08:11.022033044055 Or, if iso_8601 is set to False, 13-May-2004 15:08:11.022.033.044.055
For TT2000:
The input should be either a int or list of ints (in numpy, a np.int64 or a np.ndarray of np.int64) Each epoch is encoded, by default to a ISO 8601 form: 2008-02-02T06:08:10.10.012014016 Or, if iso_8601 is set to False, 02-Feb-2008 06:08:10.012.014.016

unixtime (epochs, to_np=False)

Encodes the epoch(s) into seconds after 1970-01-01. Precision is only kept to the nearest microsecond.

If to_np is True, then the values will be returned in a numpy array.

breakdown (epochs, to_np=False)

Breaks down the epoch(s) into UTC components.

For CDF_EPOCH:
they are 7 date/time components: year, month, day, hour, minute, second, and millisecond
For CDF_EPOCH16:
they are 10 date/time components: year, month, day, hour, minute, second, and millisecond, microsecond, nanosecond, and picosecond.
For TT2000:
they are 9 date/time components: year, month, day, hour, minute, second, millisecond, microsecond, nanosecond.

Specify to_np to True, if the result should be in numpy array.

compute[_epoch/_epoch16/_tt200] (datetimes, to_np=False)

Computes the provided date/time components into CDF epoch value(s).

For CDF_EPOCH:
For computing into CDF_EPOCH value, each date/time elements should have exactly seven (7) components, as year, month, day, hour, minute, second and millisecond, in a list. For example: [[2017,1,1,1,1,1,111],[2017,2,2,2,2,2,222]] Or, call function compute_epoch directly, instead, with at least three (3) first (up to seven) components. The last component, if not the 7th, can be a float that can have a fraction of the unit.
For CDF_EPOCH16:
They should have exactly ten (10) components, as year, month, day, hour, minute, second, millisecond, microsecond, nanosecond and picosecond, in a list. For example: [[2017,1,1,1,1,1,123,456,789,999],[2017,2,2,2,2,2,987,654,321,999]] Or, call function compute_epoch directly, instead, with at least three (3) first (up to ten) components. The last component, if not the 10th, can be a float that can have a fraction of the unit.
For TT2000:
Each TT2000 typed date/time should have exactly nine (9) components, as year, month, day, hour, minute, second, millisecond, microsecond, and nanosecond, in a list. For example: [[2017,1,1,1,1,1,123,456,789],[2017,2,2,2,2,2,987,654,321]] Or, call function compute_tt2000 directly, instead, with at least three (3) first (up to nine) components. The last component, if not the 9th, can be a float that can have a fraction of the unit.

Specify to_np to True, if the result should be in numpy class.

parse (datetimes, to_np=False)

Parses the provided date/time string(s) into CDF epoch value(s).

For CDF_EPOCH:
The string has to be in the form of 'dd-mmm-yyyy hh:mm:ss.xxx' or 'yyyy-mm-ddThh:mm:ss.xxx' (in iso_8601). The string is the output from encode function.
For CDF_EPOCH16:
The string has to be in the form of 'dd-mmm-yyyy hh:mm:ss.mmm.uuu.nnn.ppp' or 'yyyy-mm-ddThh:mm:ss.mmmuuunnnppp' (in iso_8601). The string is the output from encode function.
For TT2000:
The string has to be in the form of 'dd-mmm-yyyy hh:mm:ss.mmm.uuu.nnn' or 'yyyy-mm-ddThh:mm:ss.mmmuuunnn' (in iso_8601). The string is the output from encode function.

Specify to_np to True, if the result should be in numpy class.

findepochrange (epochs, starttime=None, endtime=None)

Finds the record range within the start and end time from values of a CDF epoch data type. It returns a list of record numbers. If the start time is not provided, then it is assumed to be the minimum possible value. If the end time is not provided, then the maximum possible value is assumed. The epoch is assumed to be in the chronological order. The start and end times should have the proper number of date/time components, corresponding to the epoch's data type.

The start/end times should be in either be in epoch units, or in the list format described in "compute_epoch/epoch16/tt2000" section.

getVersion ()

Shows the code version.

getLeapSecondLastUpdated ()

Shows the latest date a leap second was added to the leap second table.

@author: Bryan Harter, Michael Liu

About

A python module for reading NASA's Common Data Format (cdf) files

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 100.0%