r - Ways to add multiple columns to a data frame using plyr/dplyr/purrr


I need to mutate a data frame by adding several columns at once using a custom function, preferably with parallelization. Below are the ways I already know of to do this.

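Setup

    library(plyr)    # load plyr before dplyr so dplyr's verbs are not masked
    library(dplyr)
    library(purrr)
    library(doMC)
    registerDoMC(2)  # register two cores for the .parallel option used below

    df <- data.frame(x = rnorm(10), y = rnorm(10), z = rnorm(10))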

Suppose I want two new columns, foocol = x + y and barcol = (x + y) * 100, but that these are actually complex calculations done in a custom function.

Method 1: Add columns separately using rowwise and mutate

    foo <- function(x, y) return(x + y)
    bar <- function(x, y) return((x + y) * 100)

    df_out1 <- df %>%
      rowwise() %>%
      mutate(foocol = foo(x, y), barcol = bar(x, y))

This is not a good solution, since it requires two function calls for each row and two "expensive" calculations of x + y. It's also not parallelized.
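One way to avoid the duplicated work, sketched here in base R and assuming the custom calculation is vectorized, is to have a single function compute the shared intermediate once and return both results as a named list. The helper name foobar_once is hypothetical and not part of the original question.

```r
set.seed(1)
df <- data.frame(x = rnorm(10), y = rnorm(10), z = rnorm(10))

# Hypothetical helper: computes the "expensive" x + y once and
# returns both derived columns as a named list.
foobar_once <- function(x, y) {
  s <- x + y
  list(foocol = s, barcol = s * 100)
}

res <- foobar_once(df$x, df$y)
df_out <- df
df_out[names(res)] <- res  # attaches foocol and barcol in one assignment
```

This only helps when the function can work on whole columns; a calculation that truly must run row by row needs one of the row-wise approaches.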

Method 2: Trick ddply into a rowwise operation

    df2 <- df
    df2$id <- seq_len(nrow(df2))  # unique id so ddply splits into single rows

    df_out2 <- ddply(df2, .(id), function(r) {
      foocol <- r$x + r$y
      barcol <- foocol * 100
      return(cbind(r, foocol, barcol))
    }, .parallel = TRUE)

Here I trick ddply into calling a function on each row by splitting on a unique id column that I just created. It's clunky, though, and requires maintaining a useless column.

Method 3: splat

    foobar <- function(x, y, ...) {
      foocol <- x + y
      barcol <- foocol * 100
      return(data.frame(x, y, ..., foocol, barcol))
    }

    df_out3 <- splat(foobar)(df)

I like this solution since I can reference the columns of df in the custom function (which can be anonymous if desired) without array comprehension. However, this method isn't parallelized.

Method 4: by_row

    df_out4 <- df %>%
      by_row(function(r) {
        foocol <- r$x + r$y
        barcol <- foocol * 100
        return(data.frame(foocol = foocol, barcol = barcol))
      }, .collate = "cols")

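The by_row function from purrr eliminates the need for the unique id column, but this operation isn't parallelized. (Newer purrr releases have moved by_row to the separate purrrlyr package.)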

Method 5: pmap_df

    df_out5 <- pmap_df(df, foobar)
    # or equivalently...
    df_out5 <- df %>% pmap_df(foobar)

This is the best option I've found. The pmap family of functions also accepts anonymous functions to apply to the arguments. I believe pmap_df converts df to a list and back, though, so maybe there is a performance hit.
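To make the "list and back" point concrete: a data frame is already a list of equal-length columns, which is what lets a pmap-style function walk it row by row. The following is a rough base R analogue using Map, not purrr's actual implementation. It reuses the foobar function from Method 3.

```r
set.seed(1)
df <- data.frame(x = rnorm(10), y = rnorm(10), z = rnorm(10))

foobar <- function(x, y, ...) {
  foocol <- x + y
  barcol <- foocol * 100
  data.frame(x, y, ..., foocol, barcol)
}

is.list(df)  # a data frame is a list of columns

# Map calls foobar once per row, passing each column's value by name;
# columns not named in the signature (here z) arrive through `...`.
rows <- do.call(Map, c(list(f = foobar), as.list(df)))

# Bind the one-row data frames back together.
df_out <- do.call(rbind, rows)
```

Building and binding one small data frame per row is the likely source of any overhead, so it is worth timing on realistic data before committing to this pattern.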

It's also a bit annoying that I need to reference all the columns I plan on using for the calculation in the function definition, function(x, y, ...), instead of just function(r) for the row object.
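For comparison, here is a minimal sketch that keeps the function(r) row-object style and also runs in parallel, using mclapply from the base parallel package. It is not one of the methods above. The name row_fun and the choice of two cores are assumptions for illustration.

```r
library(parallel)

set.seed(1)
df <- data.frame(x = rnorm(10), y = rnorm(10), z = rnorm(10))

# Hypothetical per-row function: receives a one-row data frame `r`.
row_fun <- function(r) {
  foocol <- r$x + r$y
  data.frame(foocol = foocol, barcol = foocol * 100)
}

# mclapply forks worker processes; forking is unavailable on Windows,
# so fall back to a single core there.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

rows <- split(df, seq_len(nrow(df)))  # list of one-row data frames, in row order
new_cols <- mclapply(rows, row_fun, mc.cores = n_cores)

# Bind the per-row results and attach them to the original columns.
df_out <- cbind(df, do.call(rbind, new_cols))
rownames(df_out) <- NULL
```

Splitting into one-row data frames has noticeable overhead, so this is only likely to pay off when the per-row calculation is genuinely expensive.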


Am I missing any good or better options? Are there any concerns with the methods I described?

How about using data.table?

    library(data.table)

    foo <- function(x, y) return(x + y)
    bar <- function(x, y) return((x + y) * 100)

    dt <- as.data.table(df)

    # := adds each column by reference, without copying dt
    dt[, foocol := foo(x, y)]
    dt[, barcol := bar(x, y)]

The data.table library is quite fast and has at least some potential for parallelization.
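As a variation on the answer above, data.table can also assign several columns in a single := call when the right-hand side is a list, which means the shared x + y only needs to be computed once. A minimal sketch, assuming the calculation is vectorized; the helper name foobar_list is hypothetical:

```r
library(data.table)

set.seed(1)
df <- data.frame(x = rnorm(10), y = rnorm(10), z = rnorm(10))
dt <- as.data.table(df)

# Hypothetical helper: computes x + y once and returns both results as a list.
foobar_list <- function(x, y) {
  s <- x + y
  list(s, s * 100)
}

# A character vector on the left of := pairs up with the list on the right,
# so both columns are added by reference in one call.
dt[, c("foocol", "barcol") := foobar_list(x, y)]
```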

