r - Subset by group with data.table

- July 15, 2011

Assume I have a data.table containing baseball players:

library(plyr)
library(data.table)

bdt <- as.data.table(baseball)

For each player (given by id), I want to find the row corresponding to the year in which they played the most games. This is straightforward in plyr:

ddply(baseball, "id", subset, g == max(g))

What's the equivalent code in data.table?

I tried:

setkey(bdt, "id")
bdt[g == max(g)]                # only one row
bdt[g == max(g), by = id]       # Error: 'by' or 'keyby' is supplied but not j
bdt[, .SD[g == max(g)]]         # only one row

This works:

bdt[, .SD[g == max(g)], by = id]

But it's only 30% faster than plyr, which suggests it's not idiomatic.
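To see why the by = id version returns one row per player while the earlier attempts returned a single row, here is a minimal sketch on a small hypothetical table (toy, with made-up ids, years and game counts standing in for baseball). It assumes only that data.table is installed.

```r
library(data.table)

# Hypothetical toy data standing in for plyr's baseball table:
# three players, a few seasons each, g = games played.
toy <- data.table(
  id   = c("a", "a", "a", "b", "b", "c"),
  year = c(2001L, 2002L, 2003L, 2001L, 2002L, 2005L),
  g    = c(10L, 50L, 30L, 40L, 40L, 5L)
)

# Without by, max(g) is computed over the whole table, so only the
# global maximum (player "a", 2002) survives.
global <- toy[g == max(g)]

# With by = id, .SD is the subset of rows for the current player
# (all non-grouping columns), so the comparison happens per player.
# Ties are kept: player "b" contributes two rows.
per_player <- toy[, .SD[g == max(g)], by = id]
print(per_player)
```

Note that the grouping column is prepended to the result, and with by (rather than keyby) the groups come out in order of first appearance.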

Here's a fast data.table way:

bdt[bdt[, .I[g == max(g)], by = id]$V1]

This avoids constructing .SD, which is the bottleneck in your expressions.
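The two-step idiom can be unpacked on a small hypothetical table (toy and idx are illustrative names, not from the original answer): the inner grouped query returns row numbers via .I, and the outer query is an ordinary row subset. A minimal sketch, assuming data.table is installed:

```r
library(data.table)

# Hypothetical toy data: three players, g = games played per season.
toy <- data.table(
  id   = c("a", "a", "a", "b", "b", "c"),
  year = c(2001L, 2002L, 2003L, 2001L, 2002L, 2005L),
  g    = c(10L, 50L, 30L, 40L, 40L, 5L)
)

# Inner query: within each group, .I holds that group's row numbers in toy,
# so .I[g == max(g)] picks the row numbers of each player's best season(s).
# The unnamed j result ends up in a column called V1.
idx <- toy[, .I[g == max(g)], by = id]
print(idx)

# Outer query: a single plain subset by those row numbers; no .SD is built.
best <- toy[idx$V1]
print(best)
```

Because the outer step indexes the original table, the selected rows keep their original column order and row order, unlike the .SD version, which moves id to the front and orders rows by group.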

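Edit: actually, the main reason the OP's version is slow is not just that it has .SD in it, but the fact that it uses .SD in a particular way: it calls [.data.table, which at the moment has a huge overhead, and running that in a loop (which is what happens when one uses by) accumulates a large penalty.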
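One way to check the per-group overhead claim on your own machine is to time both forms on synthetic data with many small groups. This is a sketch under assumptions: the data (dt, 200,000 rows, 10,000 ids) is made up, and the timings will vary with hardware and data.table version, so no particular ratio should be expected.

```r
library(data.table)

# Hypothetical synthetic data: many small groups, which is where
# per-group overhead is most visible. Sizes are arbitrary choices.
set.seed(1)
n  <- 2e5
dt <- data.table(id = sample(1e4, n, replace = TRUE),
                 g  = sample(100L, n, replace = TRUE))

# .SD version: evaluates .SD[...] (a [.data.table call) once per group.
t_sd <- system.time(res_sd <- dt[, .SD[g == max(g)], by = id])

# .I version: one grouped pass returning row numbers, then one subset.
t_i <- system.time(res_i <- dt[dt[, .I[g == max(g)], by = id]$V1])

print(c(SD = t_sd[["elapsed"]], I = t_i[["elapsed"]]))

# Both forms should select the same rows; only the row order differs,
# so sort both before comparing.
a <- res_sd[order(id, g)]
b <- res_i[order(id, g)]
stopifnot(identical(a$id, b$id), identical(a$g, b$g))
```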
