python - Efficiently compute historical aggregates
I have a DataFrame containing historical records, for example sales:
import pandas as pd

first_salesman = pd.Series([1, 2, 3], index=[pd.to_datetime('2015-01-01'), pd.to_datetime('2015-01-02'), pd.to_datetime('2015-01-03')])
second_salesman = pd.Series([2, 0, 5], index=[pd.to_datetime('2015-01-01'), pd.to_datetime('2015-01-02'), pd.to_datetime('2015-01-03')])
third_salesman = pd.Series([6, 7, 1], index=[pd.to_datetime('2015-01-01'), pd.to_datetime('2015-01-02'), pd.to_datetime('2015-01-03')])

df = pd.concat([first_salesman, second_salesman, third_salesman], axis=1).reset_index()
df = pd.melt(df, id_vars=['index'], value_vars=[0, 1, 2])
df.columns = ['date', 'salesman_id', 'sales']
df = df.set_index('date')
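For reference, after this reshaping df looks like the following (the salesman_id values 0, 1, 2 come from the concatenated column labels):

            salesman_id  sales
date
2015-01-01            0      1
2015-01-02            0      2
2015-01-03            0      3
2015-01-01            1      2
2015-01-02            1      0
2015-01-03            1      5
2015-01-01            2      6
2015-01-02            2      7
2015-01-03            2      1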
I want to compute historical statistics for each of the salesmen: in other terms, for each salesman and each day, I want to know how the mean, std, max, min, etc. looked up until that day.
So, focusing on salesman 1 and the average, the result would be [1, 1.5, 2].
I could build a for-loop, but I have a big number of salesmen and dates, so that doesn't seem like an efficient thing to do. I'm looking for a way of computing this in an efficient manner.
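For context, a minimal sketch of the kind of loop this would avoid (assuming the df built above; it re-aggregates the full history on every day, which scales poorly):

# Naive approach: for each salesman and each day, recompute the statistic
# over all sales up to and including that day.
historical_means = {}
for salesman_id, group in df.groupby('salesman_id'):
    means = []
    for day in range(1, len(group) + 1):
        means.append(group['sales'].iloc[:day].mean())  # mean of the history so far
    historical_means[salesman_id] = means
# historical_means[0] == [1.0, 1.5, 2.0]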
After some tinkering, I found a solution. In pandas >= 0.18, you can use df.expanding(). See the documentation.
A clean solution for this scenario is essentially a one-liner: group by salesman, then aggregate over an expanding window:
import numpy as np

df.groupby('salesman_id')['sales'].expanding(min_periods=1).agg([np.mean, np.std, np.min, np.max])
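As a quick check (result is just a hypothetical variable name), the expanding mean for the first salesman matches the expected [1, 1.5, 2]:

result = df.groupby('salesman_id')['sales'].expanding(min_periods=1).agg([np.mean, np.std, np.min, np.max])
print(result.loc[0, 'mean'].tolist())  # [1.0, 1.5, 2.0]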
Note that .expanding() supports .agg(), so you can use whatever aggregates you'd like.
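For instance, string names work as well, and .apply() covers fully custom reductions; the peak-to-peak example below is just an illustration:

# Built-in aggregates by name:
df.groupby('salesman_id')['sales'].expanding().agg(['median', 'sum'])

# A fully custom reduction over each expanding window:
df.groupby('salesman_id')['sales'].expanding().apply(lambda window: window.max() - window.min(), raw=True)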