Some thoughts on why things are slow at the moment:
At the moment our entire pipeline assumes that all time series are unevenly-spaced; as a result, internal computations are always performed on every time series separately. If we had some check for the evenly-spaced case, we could use different (faster) `numpy` array routines.
cf. `np.max(X, axis=0)` and `[np.max(x_i) for x_i in X]`
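For illustration, a minimal sketch of what such a check and fast path could look like, assuming each series arrives as a pair of 1d time/value arrays (the helpers `is_evenly_sampled` and `max_per_series` are hypothetical, not existing pipeline functions):

```python
import numpy as np

def is_evenly_sampled(times_list, atol=1e-8):
    """Hypothetical check: True when every series has the same length and a
    constant sampling interval (within `atol`)."""
    if len({len(t) for t in times_list}) != 1:
        return False
    for t in times_list:
        dt = np.diff(t)
        if dt.size and not np.allclose(dt, dt[0], atol=atol):
            return False
    return True

def max_per_series(values_list, times_list):
    """Per-series maximum: vectorized fast path when evenly sampled,
    per-series fallback otherwise."""
    if is_evenly_sampled(times_list):
        # Fast path: stack into one (n_series, n_obs) array so numpy reduces
        # over all series in a single call.
        return np.vstack(values_list).max(axis=1)
    # Slow path (what the pipeline effectively does now): one call per series.
    return np.array([np.max(v) for v in values_list])

# Example usage on 1000 evenly-sampled series of 100 points each:
times = [np.arange(100.0) for _ in range(1000)]
values = [np.random.rand(100) for _ in range(1000)]
maxes = max_per_series(values, times)  # shape (1000,)
```

The fallback branch is effectively what happens for every input today; the stacked branch is only valid when all series share a length and cadence, which is exactly what the check guards.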
Our communication overhead through `dask` isn't horrible as far as I can tell, but it's a (relatively) bigger factor for 1) many time series, 2) shorter time series, or 3) simpler features.
How many features could be sped up in this way? My intuition is that a vectorized approach exists for most of the general features, some of the cadence features, and none of the Lomb-Scargle features.
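As a rough sketch of why the general features are the easy case: once evenly-sampled series are stacked into a 2d array, most of them reduce to axis-wise `numpy` reductions, whereas Lomb-Scargle has to work through each series' (time, value) pairs individually. The feature names below are illustrative, not the library's exact feature set:

```python
import numpy as np

def general_features_vectorized(X):
    """Compute a few simple value-based features for all series at once.
    X has shape (n_series, n_obs); each row is one evenly-sampled series.
    Feature names are illustrative, not the library's exact set."""
    return {
        "mean": X.mean(axis=1),
        "std": X.std(axis=1),
        "amplitude": 0.5 * (X.max(axis=1) - X.min(axis=1)),
        "median_abs_dev": np.median(
            np.abs(X - np.median(X, axis=1, keepdims=True)), axis=1
        ),
    }

# One numpy call per feature instead of one call per (feature, series) pair:
X = np.random.rand(1000, 100)
feats = general_features_vectorized(X)  # each value has shape (1000,)
```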
This would also involve `featurize_time_series`, and is related to #227 in that we would want to handle 3d arrays in a special way.