python - Multiple Count and Median Values from a Dataframe


I am trying to perform several operations in one program at the same time. I have a data-frame of dated rows whose start and end dates I do not know, and I want to find:

  1. the total number of days the data-set covers
  2. the total number of hours
  3. the median of count
  4. a separate output with the median per day/date
  5. if possible, the median-of-median, in the simplest way possible

Input: a few rows from a large file (gigabytes in size):

    2004-01-05,16:00:00,17:00:00,mon,10766,656
    2004-01-05,17:00:00,18:00:00,mon,12223,670
    2004-01-05,18:00:00,19:00:00,mon,12646,710
    2004-01-05,19:00:00,20:00:00,mon,19269,778
    2004-01-05,20:00:00,21:00:00,mon,20504,792
    2004-01-05,21:00:00,22:00:00,mon,16553,783
    2004-01-05,22:00:00,23:00:00,mon,18944,790
    2004-01-05,23:00:00,00:00:00,mon,17534,750
    2004-01-06,00:00:00,01:00:00,tue,17262,747
    2004-01-06,01:00:00,02:00:00,tue,19072,777
    2004-01-06,02:00:00,03:00:00,tue,18275,785
    2004-01-06,03:00:00,04:00:00,tue,13589,757
    2004-01-06,04:00:00,05:00:00,tue,16053,735

The start and end dates are not known.

Edit: expected output 1 has one row of results:

    days,hours,median,median-of-median
    2,17262,13,17398

The median-of-median is the median of the median column in output 2.

Expected output 2 has the median for every date; these are used to find the median-of-median:

    date,median
    2004-01-05,17534
    2004-01-06,17262

Code:

    import pandas as pd
    from datetime import datetime

    df = pd.read_csv('one_hour.csv')
    df.columns = ['date', 'starttime', 'endtime', 'day', 'count', 'unique']

    date_count = df.count(['date'])
    all_median = df.median(['count'])
    all_hours = df.count(['starttime'])
    med_med = df.groupby(['date','count']).median()

    print date_count
    print all_median
    print all_hours

    stats = ['date_count', 'all_median', 'all_hours', 'median-of-median']
    stats.to_csv('stats_all.csv', index=False)

    med_med.to_csv('med_day.csv', index=False, header=False)

Obviously the code does not give the result it is supposed to.

The error is shown below.

Error:

    Traceback (most recent call last):
      File "day_median.py", line 8, in <module>
        all_median = df.median(['count'])
      File "/usr/local/lib/python2.7/dist-packages/pandas/core/generic.py", line 5310, in stat_func
        numeric_only=numeric_only)
      File "/usr/local/lib/python2.7/dist-packages/pandas/core/frame.py", line 4760, in _reduce
        axis = self._get_axis_number(axis)
      File "/usr/local/lib/python2.7/dist-packages/pandas/core/generic.py", line 308, in _get_axis_number
        axis = self._axis_aliases.get(axis, axis)
    TypeError: unhashable type: 'list'

IIUC, you may need to change:

    date_count = df.count(['date'])
    all_median = df.median(['count'])
    all_hours = df.count(['starttime'])

to:

    date_count = df['date'].count()
    all_median = df['count'].median()
    all_hours = df['starttime'].count()

    print (date_count)
    print (all_median)
    print (all_hours)
    13
    17262.0
    13

Use this if you need the statistics for the columns date, count and starttime.
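The traceback in the question points at the cause: the first positional parameter of DataFrame.median is axis, so df.median(['count']) hands the list over as an axis rather than as a column selection. A minimal sketch of the column-first pattern, using the sample rows from the question in place of one_hour.csv and assuming the file has no header row (hence header=None together with names=, which is not in the original code):

```python
import io
import pandas as pd

# Sample rows from the question, standing in for one_hour.csv.
sample = """\
2004-01-05,16:00:00,17:00:00,mon,10766,656
2004-01-05,17:00:00,18:00:00,mon,12223,670
2004-01-05,18:00:00,19:00:00,mon,12646,710
2004-01-05,19:00:00,20:00:00,mon,19269,778
2004-01-05,20:00:00,21:00:00,mon,20504,792
2004-01-05,21:00:00,22:00:00,mon,16553,783
2004-01-05,22:00:00,23:00:00,mon,18944,790
2004-01-05,23:00:00,00:00:00,mon,17534,750
2004-01-06,00:00:00,01:00:00,tue,17262,747
2004-01-06,01:00:00,02:00:00,tue,19072,777
2004-01-06,02:00:00,03:00:00,tue,18275,785
2004-01-06,03:00:00,04:00:00,tue,13589,757
2004-01-06,04:00:00,05:00:00,tue,16053,735
"""

cols = ['date', 'starttime', 'endtime', 'day', 'count', 'unique']

# Assumption: no header row in the file, so the names are supplied here
# and the first data row is kept as data.
df = pd.read_csv(io.StringIO(sample), header=None, names=cols)

# Select the column first, then reduce: this returns a single number.
all_median = df['count'].median()

# Selecting several columns returns one median per column (a Series).
col_medians = df[['count', 'unique']].median()

print(all_median)
print(col_medians)
```

Selecting one column gives a scalar, while selecting a list of columns gives a Series indexed by column name.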

Edit, following the comment:

If you need to count the unique values of a column, use nunique:

    date_count = df['date'].nunique()
    print (date_count)
    2

DataFrame of stats:

    cols = ['date_count', 'all_median', 'all_hours']
    stats = pd.DataFrame([[date_count, all_median, all_hours]], columns = cols)
    print (stats)
       date_count  all_median  all_hours
    0           2     17262.0         13
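The question also asks for a median per date and a median-of-median. One possible sketch groups by date only (not by date and count, as the original code did) and then takes the median of the resulting column. It assumes each row is one hourly interval, so the row count stands in for the number of hours, and it again uses the sample rows with header=None and names=. The column names in the output frames follow the expected output in the question.

```python
import io
import pandas as pd

# Sample rows from the question, standing in for one_hour.csv.
sample = """\
2004-01-05,16:00:00,17:00:00,mon,10766,656
2004-01-05,17:00:00,18:00:00,mon,12223,670
2004-01-05,18:00:00,19:00:00,mon,12646,710
2004-01-05,19:00:00,20:00:00,mon,19269,778
2004-01-05,20:00:00,21:00:00,mon,20504,792
2004-01-05,21:00:00,22:00:00,mon,16553,783
2004-01-05,22:00:00,23:00:00,mon,18944,790
2004-01-05,23:00:00,00:00:00,mon,17534,750
2004-01-06,00:00:00,01:00:00,tue,17262,747
2004-01-06,01:00:00,02:00:00,tue,19072,777
2004-01-06,02:00:00,03:00:00,tue,18275,785
2004-01-06,03:00:00,04:00:00,tue,13589,757
2004-01-06,04:00:00,05:00:00,tue,16053,735
"""

cols = ['date', 'starttime', 'endtime', 'day', 'count', 'unique']
df = pd.read_csv(io.StringIO(sample), header=None, names=cols)

# One median of 'count' per date, as a two-column frame: date, median.
per_day = df.groupby('date')['count'].median().reset_index(name='median')

# Median of the per-day medians.
med_of_med = per_day['median'].median()

# Single-row summary; 'hours' assumes one row per hourly interval.
stats = pd.DataFrame([{
    'days': df['date'].nunique(),
    'hours': len(df),
    'median': df['count'].median(),
    'median-of-median': med_of_med,
}])

# to_csv() without a path returns the CSV text; pass a filename such as
# 'med_day.csv' or 'stats_all.csv' to write to disk instead.
print(per_day.to_csv(index=False))
print(stats.to_csv(index=False))
```

On these 13 sample rows the per-day medians are 17043.5 and 17262.0, which differ from the expected output in the question, presumably because that was computed on the full file.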
