physics - How to apply Marshall Palmer function at ID level in R? -


i'm analyzing dual-polarization radar data , want add result of marshall palmer relation id-level variable in my data.

there's no cran function can find, r user has script wherein applies relation estimate of expected value in data:

# troy w. (thanks!) # few small changes hack-r  ## better in r me clean up/refactor code bit.   library(dplyr) library(data.table)  test <- fread('../input/test.csv')   mpalmer <- function(ref, minutes_past) {    # order reflectivity values , minutes_past   sort_min_index = order(minutes_past)   minutes_past <- minutes_past[sort_min_index]   ref <- ref[sort_min_index]    # calculate length of time each reflectivity value valid   valid_time <- rep(0, length(minutes_past))   valid_time[1] <- minutes_past[1]   if (length(valid_time) > 1) {     (i in seq(2, length(minutes_past))) {       valid_time[i] <- minutes_past[i] - minutes_past[i-1]     }     valid_time[length(valid_time)] = valid_time[length(valid_time)] + 60 - sum(valid_time)   } else {     # if 1 observation, make valid entire hour     valid_time <- 60   }    valid_time = valid_time / 60    # calculate hourly rain rates using marshall-palmer weighted valid times   sum <- 0   (i in seq(length(ref))) {     if (!is.na(ref[i])) {       mmperhr <- ((10^(ref[i]/10))/200) ^ 0.625       sum <- sum + mmperhr * valid_time[i]     }   }    return(sum)  } results <- test %>% group_by(id) %>% summarize(expected=sum) write.csv(results, file='sample_solution.csv', row.names=false) 

in addition being incredibly slow big data, problem code above doesn't create column of results within original data, allow productionalized in etl pipeline created relation @ id level 1 predictive variable in dataset.

i tried rewriting function this:

mpalmer <- function(ref, minutes_past) {   # credit troy   # edits jason miller, hack-r.com    # order reflectivity values , minutes_past   sort_min_index = order(minutes_past)   minutes_past <- minutes_past[sort_min_index]   ref <- ref[sort_min_index]    # calculate length of time each reflectivity value valid   valid_time <- rep(0, length(minutes_past))   valid_time[1] <- minutes_past[1]   if (length(valid_time) > 1) {     (i in seq(2, length(minutes_past))) {       valid_time[i] <- minutes_past[i] - minutes_past[i-1]     }     valid_time[length(valid_time)] = valid_time[length(valid_time)] + 60 - sum(valid_time)   } else {     # if 1 observation, make valid entire hour     valid_time <- 60   }    valid_time = valid_time / 60    # calculate hourly rain rates using marshall-palmer weighted valid times   sum <- 0   (i in seq(length(ref))) {     if (!is.na(ref[i])) {       mmperhr <- ((10^(ref[i]/10))/200) ^ 0.625       sum <- sum + mmperhr * valid_time[i]     }   }    return(sum)  } 

and applying this:

train.samp$mp <- aggregate(train.samp$ref, by=list(train.samp$id), fun = mpalmer, minutes_past = train.samp$minutes_past)

which think works, after running long time, returned error this:

error in `$<-.data.frame`(`*tmp*`, "mp", value = list(group.1 = c(10l,  :    replacement has 9765 rows, data has 10000 

i've tried on different samples of data , error message in format, though specific numbers may change. there's no missing data in dataset.

any idea how fix function (and/or make faster)?

update: i've got working loop so slow...

this i'm going now, i'm still open other solutions.

basically, went original function broke apart overly large dataset manageable chunks , ran loops on each chunk:

  train.samp      <- train.samp[order(train.samp$id),]   train.samp1     <- train.samp1[order(train.samp1$id),]    train.samp.id    <- unique(train.samp$id)   train.samp.id.1  <- train.samp.id[1:25000]   train.samp.id.2  <- train.samp.id[25001:50000]   train.samp.id.3  <- train.samp.id[50001:75000]   train.samp.id.4  <- train.samp.id[75001:100000]   train.samp.id.6  <- train.samp.id[100001:125000]   train.samp.id.5  <- train.samp.id[125001:150000]   train.samp.id.7  <- train.samp.id[150001:175000]   train.samp.id.8  <- train.samp.id[175001:200000]   train.samp.id.9  <- train.samp.id[200001:length(train.samp.id)]    train.samp.1 <- train.samp[train.samp$id %in% train.samp.id.1,]   train.samp.2 <- train.samp[train.samp$id %in% train.samp.id.2,]   train.samp.3 <- train.samp[train.samp$id %in% train.samp.id.3,]   train.samp.4 <- train.samp[train.samp$id %in% train.samp.id.4,]   train.samp.5 <- train.samp[train.samp$id %in% train.samp.id.5,]   train.samp.6 <- train.samp[train.samp$id %in% train.samp.id.6,]   train.samp.7 <- train.samp[train.samp$id %in% train.samp.id.7,]   train.samp.8 <- train.samp[train.samp$id %in% train.samp.id.8,]   train.samp.9 <- train.samp[train.samp$id %in% train.samp.id.9,]    system.time(   for(i in unique(train.samp.1$id)){     train.samp.1$mp[train.samp.1$id == i] <- mpalmer(train.samp.1$ref[train.samp.1$id == i], minutes_past = train.samp.1$minutes_past[train.samp.1$id == i])   }    )   for(i in unique(train.samp$id[train.samp$id %in% train.samp.id.2])){     train.samp$mp[train.samp$id == i] <- mpalmer(train.samp$ref[train.samp$id == i], minutes_past = train.samp$minutes_past[train.samp$id == i])   }       for(i in unique(train.samp$id[train.samp$id %in% train.samp.id.3])){     train.samp$mp[train.samp$id == i] <- mpalmer(train.samp$ref[train.samp$id == i], minutes_past = train.samp$minutes_past[train.samp$id == i])   }       for(i in unique(train.samp$id[train.samp$id %in% train.samp.id.4])){     train.samp$mp[train.samp$id == i] <- mpalmer(train.samp$ref[train.samp$id == i], minutes_past = train.samp$minutes_past[train.samp$id == i])   }       for(i in unique(train.samp$id[train.samp$id %in% train.samp.id.5])){     train.samp$mp[train.samp$id == i] <- mpalmer(train.samp$ref[train.samp$id == i], minutes_past = train.samp$minutes_past[train.samp$id == i])   }       for(i in unique(train.samp$id[train.samp$id %in% train.samp.id.6])){     train.samp$mp[train.samp$id == i] <- mpalmer(train.samp$ref[train.samp$id == i], minutes_past = train.samp$minutes_past[train.samp$id == i])   }       for(i in unique(train.samp$id[train.samp$id %in% train.samp.id.7])){     train.samp$mp[train.samp$id == i] <- mpalmer(train.samp$ref[train.samp$id == i], minutes_past = train.samp$minutes_past[train.samp$id == i])   }       for(i in unique(train.samp$id[train.samp$id %in% train.samp.id.8])){     train.samp$mp[train.samp$id == i] <- mpalmer(train.samp$ref[train.samp$id == i], minutes_past = train.samp$minutes_past[train.samp$id == i])   }       for(i in unique(train.samp$id[train.samp$id %in% train.samp.id.9])){     train.samp$mp[train.samp$id == i] <- mpalmer(train.samp$ref[train.samp$id == i], minutes_past = train.samp$minutes_past[train.samp$id == i])   }      system.time(     for(i in unique(train.samp1$id)){     train.samp1$mp[train.samp1$id == i] <- mpalmer(train.samp1$ref[train.samp1$id == i], minutes_past = train.samp1$minutes_past[train.samp1$id == i])   }    

the function not shown here, take advantage of @gregor's suggestion in comment.


Comments

Popular posts from this blog

javascript - Slick Slider width recalculation -

jsf - PrimeFaces Datatable - What is f:facet actually doing? -

angular2 services - Angular 2 RC 4 Http post not firing -