python - Summing Groups of Columns within a Pandas Dataframe -

- February 15, 2015

i have pandas dataframe 600 columns (df1), , want sum values of each column in groups of 6. in other words, want create new dataframe (df2) has 100 columns, each column being sum of 6 columns input dataframe. example, each row first column in df2 sum of first 6 columns in df1 (keeping rows separate). dataframe using has string values each column name (here represented single letters)

for df1:

         b    c    d    e    f    g    h       j ...    0     9    6    3    4    7    7    6    0    5    2 ...        1     8    0    6    6    0    5    6    5    8    7 ...            2     9    0    7    2    9    5    3    2    1    7 ...             3     5    2    9    6    7    0    3    8    5    0 ...             4     7    1    0    7    4    0    2    0    5    8 ...      5     0    9    2    0    4    9    5    7    6    2 ...

i want first column of df2 be:

       g ...  0   36   1   25 2   32 3   29 4   19 5   24

where each row sum of first 6 columns of row. next column sum of next 6 columns , on, column name being name of first column in each set of 6. (first column name first column's, second column name seventh column's, etc.)

i've tried using column indices sum correct columns, having issues finding way store sums in new columns relevant names.

is there pythonic way create these columns, , pull column names df df2?

you can groupby columns (axis=1) groups created df.columns //6 , sum:

print (df)    0  1  2  3  4  5  6  7  8  9  10  11  12  13 0  9  6  3  4  7  7  6  0  5  2   2   3   7   2 1  8  0  6  6  0  5  6  5  8  7   9   5   5   1 2  9  0  7  2  9  5  3  2  1  7   5   9   6   6 3  5  2  9  6  7  0  3  8  5  0   8   8   9   9 4  7  1  0  7  4  0  2  0  5  8   2   4   4   1 5  0  9  2  0  4  9  5  7  6  2   7   1   5   3  #if values of columns not int #df.columns = df.columns.astype(int)  print (df.columns // 6) int64index([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 2, 2], dtype='int64')  print (df.groupby(df.columns // 6, axis=1).sum())     0   1   2 0  36  18   9 1  25  40   6 2  32  27  12 3  29  32  18 4  19  21   5 5  24  28   8

edit:

you can create index range , shape (get length of columns) , use in groupby:

idx = pd.index(range(df.shape[1])) // 6 print (idx) int64index([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 2, 2], dtype='int64')  df1 = df.groupby(idx, axis=1).sum() #if need rename columns categories df1.columns = df.columns[::6] print (df1)       g   m 0  36  18   9 1  25  40   6 2  32  27  12 3  29  32  18 4  19  21   5 5  24  28   8

Search This Blog

Jal

python - Summing Groups of Columns within a Pandas Dataframe -

Comments

Post a Comment

Popular posts from this blog

javascript - Slick Slider width recalculation -

jsf - PrimeFaces Datatable - What is f:facet actually doing? -

angular2 services - Angular 2 RC 4 Http post not firing -