r - Subset by group with data.table


Assume I have a data table containing some baseball players:

library(plyr)
library(data.table)

# the baseball data set ships with plyr
bdt <- as.data.table(baseball)

For each player (given by id), I want to find the row corresponding to the year in which they played the most games. This is straightforward in plyr:

ddply(baseball, "id", subset, g == max(g)) 

What's the equivalent code for data.table?

I tried:

setkey(bdt, "id")
bdt[g == max(g)]             # only one row
bdt[g == max(g), by = id]    # Error: 'by' or 'keyby' is supplied but not j
bdt[, .SD[g == max(g)]]      # only one row
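The first attempt returns so few rows because, without by, max(g) is evaluated over the whole table rather than per player, so only rows matching the single global maximum survive. A minimal sketch on a small hypothetical table (toy, with made-up id/year/g values standing in for the baseball data) makes this visible:

```r
library(data.table)

# Hypothetical toy data, not the real baseball set.
toy <- data.table(
  id   = c("a", "a", "b", "b", "b", "c"),
  year = c(2001L, 2002L, 2001L, 2002L, 2003L, 2005L),
  g    = c(10L, 25L, 30L, 12L, 30L, 7L)
)

# Without `by`, max(g) is the maximum over ALL rows (30 here),
# so only the rows matching that global maximum are kept.
global_max <- toy[g == max(g)]
print(global_max)
```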

This works:

bdt[, .SD[g == max(g)], by = id]
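A minimal sketch of this grouped .SD version on a hypothetical toy table (made-up values, not the baseball data). Like the plyr subset call, it keeps every row tied for a player's maximum; player "b" below has two such rows:

```r
library(data.table)

# Hypothetical toy data; player "b" has a tie for the most games.
toy <- data.table(
  id   = c("a", "a", "b", "b", "b", "c"),
  year = c(2001L, 2002L, 2001L, 2002L, 2003L, 2005L),
  g    = c(10L, 25L, 30L, 12L, 30L, 7L)
)

# For each id, .SD is that player's rows (minus the id column);
# keep every row whose g equals the player's own maximum.
best_sd <- toy[, .SD[g == max(g)], by = id]
print(best_sd)
```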

But it's only 30% faster than plyr, suggesting it's probably not idiomatic.

Here's the fast data.table way:

bdt[bdt[, .I[g == max(g)], by = id]$V1]
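The one-liner can be unpacked into its two steps. A sketch on a hypothetical toy table (made-up values); V1 is data.table's default name for an unnamed j result:

```r
library(data.table)

# Hypothetical toy data, not the real baseball set.
toy <- data.table(
  id   = c("a", "a", "b", "b", "b", "c"),
  year = c(2001L, 2002L, 2001L, 2002L, 2003L, 2005L),
  g    = c(10L, 25L, 30L, 12L, 30L, 7L)
)

# Step 1: within each id group, .I holds that group's row numbers in toy;
# keep only the row numbers where g hits the group maximum.
idx <- toy[, .I[g == max(g)], by = id]

# Step 2: subset the full table once, by row number.
best_i <- toy[idx$V1]
print(best_i)
```

Only an integer vector is subset inside each group; the table itself is subset a single time at the end.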

This avoids constructing .SD, which is the bottleneck in your expressions.

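Edit: Actually, the main reason the OP's version is slow is not just that it has .SD in it, but the particular way it uses it: it calls [.data.table, which at the moment has a huge overhead, so running that call in a loop (once per group when one uses by) accumulates a very large penalty.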
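A rough way to check the overhead argument on your own machine. The synthetic table (dt, with many small groups) is an assumption chosen so that per-group overhead dominates; actual timings depend heavily on the data.table version, since newer releases may optimize some grouped operations, so treat any numbers as indicative only:

```r
library(data.table)

set.seed(1)
# Hypothetical synthetic data: many small groups, where per-group cost dominates.
n_groups <- 5000L
dt <- data.table(
  id = sample(n_groups, 10 * n_groups, replace = TRUE),
  g  = sample(100L, 10 * n_groups, replace = TRUE)
)

# .SD[...] calls `[.data.table` once per group.
t_sd <- system.time(res_sd <- dt[, .SD[g == max(g)], by = id])

# .I[...] subsets only an integer vector per group, then subsets dt once.
t_i <- system.time(res_i <- dt[dt[, .I[g == max(g)], by = id]$V1])

# Elapsed seconds for each approach.
timings <- c(sd = t_sd[["elapsed"]], i = t_i[["elapsed"]])
print(timings)
```

Both calls return the same rows in the same order (groups in order of first appearance), so any gap in elapsed time comes from how each one does the per-group work.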

