R - Error in read.table: duplicate row.names


When I tried to read the following table into a data frame (data100) with:

data100 <- read.table(header=TRUE, text='
                                 verb_object session_id
1:   ba31c1cc63e5043483fae25f085e25e5 insert   41595370
2: bece6374d91d47e6285efdeba6d65bb9 database   41595371
3:   26d695c8ca82caffdf985201f3aa44d7 update   41595282
4:   26d695c8ca82caffdf985201f3aa44d7 update   41595282
5: 2bc5a4199a0dda16fa17a9ca1aa17c02 database   41595373
6:   6d944d54c54ed75d487288fe1505bb59 insert   41595368
')

I got the following error:

Error in read.table(header = TRUE, text = "\n                               verb_object session_id\n   ba31c1cc63e5043483fae25f085e25e5 insert   41595370\n                      bece6374d91d47e6285efdeba6d65bb9 database   41595371\n                         26d695c8ca82caffdf985201f3aa44d7 update   41595282\n                         26d695c8ca82caffdf985201f3aa44d7 update   41595282\n                     2bc5a4199a0dda16fa17a9ca1aa17c02 database   41595373\n                         6d944d54c54ed75d487288fe1505bb59 insert   41595368\n") :
  duplicate 'row.names' not allowed

How can I read it?

After using

lines <- readLines(textConnection("       verb_object session_id
> data100<-read.table(text=gsub('(?<=\\:)\\s+|\\s+(?=\\s[0-9])', " '", lines, perl=TRUE), sep='', fill=TRUE)

the result was as follows:

> data100
           V1                               V2       V3       V4 V5                                         V6       V7
1 verb_object                       session_id                NA                                                     NA
2         1:  ba31c1cc63e5043483fae25f085e25e5   insert 41595370 2: bece6374d91d47e6285efdeba6d65bb9 database  41595371
3         3:  26d695c8ca82caffdf985201f3aa44d7   update 41595282 4:   26d695c8ca82caffdf985201f3aa44d7 update  41595282
4         5:  2bc5a4199a0dda16fa17a9ca1aa17c02 database 41595373 6:   6d944d54c54ed75d487288fe1505bb59 insert  41595368
>

We can read the text with readLines, place quotes around the text column using gsub, and then read the result with read.table.

lines <- readLines(textConnection("verb_object session_id
1:   ba31c1cc63e5043483fae25f085e25e5 insert   41595370
2: bece6374d91d47e6285efdeba6d65bb9 database   41595371
3:   26d695c8ca82caffdf985201f3aa44d7 update   41595282
4:   26d695c8ca82caffdf985201f3aa44d7 update   41595282
5: 2bc5a4199a0dda16fa17a9ca1aa17c02 database   41595373
6:   6d944d54c54ed75d487288fe1505bb59 insert   41595368"))

read.table(text=gsub('(?<=\\:)\\s+|\\s+(?=\\s[0-9])', " '", lines, perl=TRUE), sep='')
#                                  verb_object session_id
#1:   ba31c1cc63e5043483fae25f085e25e5 insert    41595370
#2: bece6374d91d47e6285efdeba6d65bb9 database    41595371
#3:   26d695c8ca82caffdf985201f3aa44d7 update    41595282
#4:   26d695c8ca82caffdf985201f3aa44d7 update    41595282
#5: 2bc5a4199a0dda16fa17a9ca1aa17c02 database    41595373
#6:   6d944d54c54ed75d487288fe1505bb59 insert    41595368
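To see what the gsub step hands to read.table, here is a minimal sketch on a single row from the data above. The variable names (line, quoted, fields) are only for illustration, and scan is used here just to count the whitespace-separated fields while respecting quotes.

```r
# One sample row, copied from the data above
line <- "1:   ba31c1cc63e5043483fae25f085e25e5 insert   41595370"

# Same pattern and replacement as in the read.table call above:
# whitespace after ':' and the whitespace run before the number
# are each replaced by a space plus a single quote
quoted <- gsub('(?<=\\:)\\s+|\\s+(?=\\s[0-9])', " '", line, perl=TRUE)
print(quoted)

# Count the fields the way a whitespace-separated reader sees them
# (scan treats the single-quoted text as one field)
fields <- scan(text=quoted, what="", quiet=TRUE)
print(length(fields))
```

The quoted text now counts as one field, so each data row has exactly one more field than the header. In that case read.table takes the first field (1:, 2:, ...) as the row names, and those are unique.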

Update

The OP's new dataset can be read with readLines as before,

lines <- readLines(textConnection("items newitem
1: ba31c1cc63e5043483fae25f085e25e5 insert ov1
2: bece6374d91d47e6285efdeba6d65bb9 database ov2
3: 26d695c8ca82caffdf985201f3aa44d7 update ov3
4: 2bc5a4199a0dda16fa17a9ca1aa17c02 database ov4
5: 6d944d54c54ed75d487288fe1505bb59 insert ov5"))

We should note that the pattern matched in the earlier dataset (\\s+(?=\\s[0-9])) won't work here, because the first character in 'session_id' is a number, while in 'newitem' it is a letter. So, we match one or more characters that are not a : from the beginning of the string (^[^:]+), followed by :, followed by one or more spaces (\\s+). We then capture a group of characters using parentheses, i.e. one or more characters that are not a space, followed by one or more spaces, followed by one or more characters that are not a space (([^ ]+\\s+[^ ]+)). After that we match one or more spaces (\\s+), followed by the remaining characters up to the end of the string as a second capture group ((.*)$). The replacement places quotes around the first capture group ('\\1'), followed by a space, followed by the second capture group (\\2).

read.table(text=gsub("^[^:]+:\\s+([^ ]+\\s+[^ ]+)\\s+(.*)$",
         "'\\1' \\2", lines), header=TRUE)
#                                     items newitem
#1   ba31c1cc63e5043483fae25f085e25e5 insert     ov1
#2 bece6374d91d47e6285efdeba6d65bb9 database     ov2
#3   26d695c8ca82caffdf985201f3aa44d7 update     ov3
#4 2bc5a4199a0dda16fa17a9ca1aa17c02 database     ov4
#5   6d944d54c54ed75d487288fe1505bb59 insert     ov5
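As a quick check of the capture groups, the following sketch applies the same pattern to the header line and one data row. The names header, line, pattern, quoted, and df are chosen here for illustration only.

```r
# Header and one sample row, copied from the dataset above
header <- "items newitem"
line   <- "1: ba31c1cc63e5043483fae25f085e25e5 insert ov1"

# Same pattern and replacement as above: drop the "1:" prefix,
# quote the first capture group, keep the second as is
pattern <- "^[^:]+:\\s+([^ ]+\\s+[^ ]+)\\s+(.*)$"
quoted  <- gsub(pattern, "'\\1' \\2", c(header, line))
print(quoted)

# The header has no ':' so it is left unchanged; both lines now
# have two fields, so read.table needs header=TRUE explicitly
df <- read.table(text=quoted, header=TRUE)
print(df)
```

Because the row prefix is removed here rather than kept as a row name, read.table numbers the rows itself, so duplicated values in the items column should no longer trigger the row.names error.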
