Some thoughts on why things are slow at the moment:
At the moment our entire pipeline assumes that all time series are unevenly-spaced; as a result, internal computations are always performed on every time series separately. If we had some check for the evenly-spaced case, we could use different (faster) `numpy` array routines.
cf. `np.max(X, axis=0)` and `[np.max(x_i) for x_i in X]`
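For illustration, a minimal sketch of what such a check and fast path could look like, assuming each series arrives as a pair of 1d time/value arrays (the helpers `is_evenly_sampled` and `max_per_series` are hypothetical, not existing pipeline functions):

```python
import numpy as np

def is_evenly_sampled(times_list, atol=1e-8):
    """Hypothetical check: True when every series has the same length and a
    constant sampling interval (within `atol`)."""
    if len({len(t) for t in times_list}) != 1:
        return False
    for t in times_list:
        dt = np.diff(t)
        if dt.size and not np.allclose(dt, dt[0], atol=atol):
            return False
    return True

def max_per_series(values_list, times_list):
    """Per-series maximum: vectorized fast path when evenly sampled,
    per-series fallback otherwise."""
    if is_evenly_sampled(times_list):
        # Fast path: stack into one (n_series, n_obs) array so numpy reduces
        # over all series in a single call.
        return np.vstack(values_list).max(axis=1)
    # Slow path (what the pipeline effectively does now): one call per series.
    return np.array([np.max(v) for v in values_list])

# Example usage on 1000 evenly-sampled series of 100 points each:
times = [np.arange(100.0) for _ in range(1000)]
values = [np.random.rand(100) for _ in range(1000)]
maxes = max_per_series(values, times)  # shape (1000,)
```

The fallback branch is effectively what happens for every input today; the stacked branch is only valid when all series share a length and cadence, which is exactly what the check guards.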
Our communication overhead through `dask` isn't horrible as far as I can tell, but it's a (relatively) bigger factor for 1) many time series, 2) shorter time series, or 3) simpler features.
How many features could be sped up in this way? My intuition is that a vectorized approach exists for most of the general features, some of the cadence features, and none of the Lomb-Scargle features.
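As a rough sketch of why the general features are the easy case: once evenly-sampled series are stacked into a 2d array, most of them reduce to axis-wise `numpy` reductions, whereas Lomb-Scargle has to work through each series' (time, value) pairs individually. The feature names below are illustrative, not the library's exact feature set:

```python
import numpy as np

def general_features_vectorized(X):
    """Compute a few simple value-based features for all series at once.
    X has shape (n_series, n_obs); each row is one evenly-sampled series.
    Feature names are illustrative, not the library's exact set."""
    return {
        "mean": X.mean(axis=1),
        "std": X.std(axis=1),
        "amplitude": 0.5 * (X.max(axis=1) - X.min(axis=1)),
        "median_abs_dev": np.median(
            np.abs(X - np.median(X, axis=1, keepdims=True)), axis=1
        ),
    }

# One numpy call per feature instead of one call per (feature, series) pair:
X = np.random.rand(1000, 100)
feats = general_features_vectorized(X)  # each value has shape (1000,)
```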
This would also involve `featurize_time_series`, and is related to #227 in that we would want to handle 3d arrays in a special way.