Shorter method to replace entries in R
This question already has an answer here:
I have started learning R recently. Here's the source file I'm working with (https://github.com/cosname/art-r-translation/blob/master/data/grades.txt). Is there any way I can change the letter grades from, say, A to 4.0, A- to 3.7, etc. without using a loop?
I'm asking because if there are 1M entries, a `for` loop might not be an efficient way to modify the data. I'd appreciate any help.
Since one of the posters told me to post my code, I thought of running a loop to see whether I'm able to do it. Here's the code:
```r
mygrades <- read.table("grades.txt", header = TRUE)
for (i in 1:nrow(mygrades)) {
  #print(i)
  # For now, just see whether "A" gets replaced by 4.0.
  if (mygrades[i, 1] == "A") {
    mygrades[i, 1] = 4.0
  } else if (mygrades[i, 2] == "A") {
    mygrades[i, 2] = 4.0
  } else if (mygrades[i, 3] == "A") {
    mygrades[i, 3] = 4.0
  } else {
    # do nothing...continues
  }
}
write.table(mygrades, "newgrades.txt")
```
However, the output is a little weird. All the "A"s become NA, and the others are left as is. Can someone please help me with the code?
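The NAs most likely come from factor columns: before R 4.0.0, `read.table()` defaulted to `stringsAsFactors = TRUE`, and assigning a value that is not an existing level of a factor stores `NA` (with an "invalid factor level" warning) instead of adding the value. A minimal sketch on a tiny hand-made data frame (the name `grades` and its two values are made up for illustration) shows both the factor case and the character case:

```r
# Simulate the old read.table() default explicitly: text columns as factors.
grades <- data.frame(final_exam = c("C", "A"), stringsAsFactors = TRUE)

# 4.0 is not one of the factor's levels ("A", "C"), so R warns
# "invalid factor level, NA generated" and stores NA in that cell.
grades[2, 1] <- 4.0
print(grades$final_exam)
print(levels(grades$final_exam))

# With a character column the assignment succeeds, but the number is
# coerced to the string "4" because the column stays character.
grades_chr <- data.frame(final_exam = c("C", "A"), stringsAsFactors = FALSE)
grades_chr[2, 1] <- 4.0
print(grades_chr$final_exam)
```

The character case also explains why the output the asker posts later shows `"4"` rather than `4.0`.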
@alistaire, I did try Hadley's look-up table, and it works. I also looked at the dplyr code, and it works well too. However, for the sake of understanding, I'm still trying to use loops. Please note it has been 2 days since I opened an R book. Here's the modified code.
```r
# There was one mistake in my code: I didn't use stringsAsFactors = FALSE.
# Now the code doesn't work for all the "A"s. It spits out 4.0 for some As,
# and doesn't for others. Why would that be?
mygrades <- read.table("grades.txt", header = TRUE, stringsAsFactors = FALSE)
for (i in 1:nrow(mygrades)) {
  #print(i)
  if (mygrades[i, 1] == "A") {
    mygrades[i, 1] = 4.0
  } else if (mygrades[i, 2] == "A") {
    mygrades[i, 2] = 4.0
  } else if (mygrades[i, 3] == "A") {
    mygrades[i, 3] = 4.0
  } else {
    # do nothing...continues
  }
}
write.table(mygrades, "newgrades.txt")
```
The output is:

```
"final_exam" "quiz_avg" "homework_avg"
"1" "C" "4" "A"
"2" "C-" "B-" "4"
"3" "D+" "B+" "4"
"4" "B+" "B+" "4"
"5" "F" "B+" "4"
"6" "B" "A-" "4"
"7" "D+" "B+" "A-"
"8" "D" "A-" "4"
"9" "F" "B+" "4"
"10" "4" "C-" "B+"
"11" "A+" "4" "A"
"12" "A-" "4" "A"
"13" "B" "4" "A"
"14" "D-" "A-" "4"
"15" "A+" "4" "A"
"16" "B" "A-" "4"
"17" "F" "D" "A-"
"18" "B" "4" "A"
"19" "B" "B+" "4"
"20" "A+" "A-" "4"
"21" "4" "A" "A"
"22" "B" "B+" "4"
"23" "D" "B+" "4"
"24" "A-" "A-" "4"
"25" "F" "4" "A"
"26" "B+" "B+" "4"
"27" "A-" "B+" "4"
"28" "A+" "4" "A"
"29" "4" "A-" "A"
"30" "A+" "A-" "4"
"31" "4" "B+" "A-"
"32" "B+" "B+" "4"
"33" "C" "4" "A"
```
As you can see in the first row, the first "A" got recoded to 4, but the second one didn't. Any idea why this is happening?

Thanks in advance.
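In the loop above, the `if ... else if` chain stops at the first column that matches, so in row 1 the "A" in column 2 is replaced and column 3 is never checked. A sketch of one fix, assuming character columns and using only the first three rows of `grades.txt` typed in by hand, checks every cell independently with a nested loop:

```r
# First three rows of grades.txt, typed in so the example is self-contained.
grades <- data.frame(
  final_exam   = c("C", "C-", "D+"),
  quiz_avg     = c("A", "B-", "B+"),
  homework_avg = c("A", "A", "A"),
  stringsAsFactors = FALSE
)

# The inner loop visits every column of every row, so a match in one
# column no longer prevents the other columns from being checked
# (which is what the if / else if chain did).
for (i in seq_len(nrow(grades))) {
  for (j in seq_len(ncol(grades))) {
    if (grades[i, j] == "A") {
      grades[i, j] <- "4.0"  # stored as a string: the column is character
    }
  }
}
print(grades)
```

For a single grade the loop isn't needed at all: on a data frame of character columns, `grades[grades == "A"] <- "4.0"` replaces every matching cell in one vectorised step.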
A typical way to do this in base R is to make a named vector to use as a lookup table, e.g.
```r
# data with fewer levels for simplicity
df <- data.frame(x = rep(1:3, 2), y = rep(1:2, 3))
lookup <- c(`1` = "a", `2` = "b", `3` = "c")
```
and subset it with each column:
```r
data.frame(lapply(df, function(x){lookup[x]}))
##   x y
## 1 a a
## 2 b b
## 3 c a
## 4 a b
## 5 b a
## 6 c b
```
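Applied to the grades, the lookup table is keyed by the grade strings. In the toy example `lookup[x]` works positionally because `x` is an integer; for grades you index by name, and factor columns need `as.character()` first, since subsetting with a factor uses its integer codes rather than its labels. Only A = 4.0 and A- = 3.7 come from the question; the rest of `grade_points` is an assumed 4.0-scale mapping (adjust to your own scale), and the data are the first three rows of `grades.txt` typed in by hand:

```r
# Grade-to-points table. A = 4.0 and A- = 3.7 are from the question;
# the other values are an ASSUMED common 4.0-scale mapping.
grade_points <- c(
  "A+" = 4.0, "A" = 4.0, "A-" = 3.7,
  "B+" = 3.3, "B" = 3.0, "B-" = 2.7,
  "C+" = 2.3, "C" = 2.0, "C-" = 1.7,
  "D+" = 1.3, "D" = 1.0, "D-" = 0.7,
  "F"  = 0.0
)

# First three rows of grades.txt, typed in so the example is self-contained.
grades <- data.frame(
  final_exam   = c("C", "C-", "D+"),
  quiz_avg     = c("A", "B-", "B+"),
  homework_avg = c("A", "A", "A"),
  stringsAsFactors = FALSE
)

# Look each grade up by name; as.character() guards against factor
# columns, whose integer codes would otherwise be used as positions.
to_points <- function(col) unname(grade_points[as.character(col)])

# One vectorised lookup per column, no explicit loop over rows.
gpa <- data.frame(lapply(grades, to_points))
print(gpa)
```

Any grade missing from `grade_points` comes back as `NA`, which makes typos in the data easy to spot.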
Alternately, dplyr recently added `recode`, a function that's useful for such a job:
```r
library(dplyr)

df <- read.table('https://raw.githubusercontent.com/cosname/art-r-translation/master/data/grades.txt',
                 header = TRUE)

# funs() and as_data_frame() have since been deprecated in newer
# dplyr/tibble releases in favour of across() and as_tibble().
df %>% mutate_all(funs(recode(., A = '4.0', `A-` = '3.7'))) %>%    # etc.
    as_data_frame()    # prettier printing

## # A tibble: 33 x 3
##    final_exam quiz_avg homework_avg
##        <fctr>   <fctr>       <fctr>
## 1           C      4.0          4.0
## 2          C-       B-          4.0
## 3          D+       B+          4.0
## 4          B+       B+          4.0
## 5           F       B+          4.0
## 6           B      3.7          4.0
## 7          D+       B+          3.7
## 8           D      3.7          4.0
## 9           F       B+          4.0
## 10        4.0       C-           B+
## # ... with 23 more rows
```
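If you'd rather not add dplyr as a dependency, the same partial recode (listed grades replaced, everything else left as is) can be sketched in base R. The helper `recode_some()`, the map `recode_map`, and the sample rows (rows 1, 2 and 6 of `grades.txt`, typed in by hand) are hypothetical, chosen for illustration:

```r
# Only the grades named here are replaced; extend the map as needed.
recode_map <- c("A" = "4.0", "A-" = "3.7")

# Hypothetical helper: replace values found in names(map), keep the rest.
recode_some <- function(col, map) {
  col <- as.character(col)       # factors -> character first
  hit <- col %in% names(map)     # which cells have a replacement
  col[hit] <- map[col[hit]]      # look the replacements up by name
  col
}

# A few rows of grades.txt, stored as factors like the old read.table() default.
grades <- data.frame(
  final_exam   = c("C", "C-", "B"),
  quiz_avg     = c("A", "B-", "A-"),
  homework_avg = c("A", "A", "A"),
  stringsAsFactors = TRUE
)

# grades[] <- ... keeps the data frame shape while replacing every column.
recoded <- grades
recoded[] <- lapply(grades, recode_some, map = recode_map)
print(recoded)
```

Because `%in%` and name-based subsetting are vectorised, each column is handled in a single pass, which scales to the 1M-row case far better than a cell-by-cell loop.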