parsing - Parsec3 Text parser for quoted string, where everything is allowed in between quotes


I have asked this question before (here), but it turns out the solution provided did not handle all test cases. Also, I need a Text parser rather than a String one, and I need it for Parsec 3.

OK, the parser should allow every type of character in between the quotes, even quotes. The end of the quoted text is marked by a ' character followed by |, a space, or the end of input.

So,

'aa''''|

should return the string

aa'''

This is what I have:

import Text.Parsec
import Text.Parsec.Text

quotedLabel :: Parser Text
quotedLabel = do -- reads the first quote.
    spaces
    string "'"
    lab <- liftM pack $ endBy1 anyChar endOfQuote
    return lab

endOfQuote = do
    string "'"
    try (eof) <|> try (oneOf "| ")

Now, the problem here is of course that eof has a different type than oneOf "| ", so compilation fails.

How do I fix this? Is there a better way to achieve what I am trying to do?

Whitespace

First, a comment on handling white space...

Generally, the practice is to write your parsers so that they consume the whitespace following a token or syntactic unit. It's common to define a combinator like:

lexeme p = p <* spaces 

to convert a parser p into one that discards the whitespace following whatever p parses. E.g., if you have

number = many1 digit 

simply use lexeme number whenever you want to eat the whitespace following the number.
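As a minimal sketch of the lexeme idea above, assuming a String-based Parser from Text.Parsec.String: the type signatures and the twoNumbers parser are hypothetical additions for illustration and are not part of the original answer.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Run p, then discard any whitespace that follows it.
lexeme :: Parser a -> Parser a
lexeme p = p <* spaces

-- One or more digits, returned as a string.
number :: Parser String
number = many1 digit

-- Hypothetical example: two numbers separated by arbitrary whitespace.
twoNumbers :: Parser (String, String)
twoNumbers = (,) <$> lexeme number <*> lexeme number
```

With these definitions, parse (twoNumbers <* eof) "" "12   34  " should yield Right ("12","34"): each lexeme call swallows the spaces after its number, so the next parser starts on a non-space character.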

For more on this approach to handling whitespace, and other advice on parsing languages, see this Megaparsec tutorial.

Label expressions

Based on your previous question, it appears you want to parse expressions of the form:

label1 | label2 | ... | labeln 

where each label may be a simple label or a quoted label.

The idiomatic way to parse this pattern is to use sepBy like this:

labels :: Parser [String]
labels = sepBy1 (try quotedLabel <|> simpleLabel) (char '|')

We define both simpleLabel and quotedLabel in terms of what characters may occur in them. For simpleLabel a valid character is anything that is neither | nor a space:

simpleLabel :: Parser String
simpleLabel = many (noneOf "| ")
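To see the sepBy1 pattern in isolation, here is a small sketch that uses only simpleLabel. The name simpleLabels is a hypothetical helper introduced for this illustration; the answer's own labels parser also handles quoted labels.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- A run of characters that are neither '|' nor a space (may be empty).
simpleLabel :: Parser String
simpleLabel = many (noneOf "| ")

-- Hypothetical helper: one or more simple labels separated by '|'.
simpleLabels :: Parser [String]
simpleLabels = sepBy1 simpleLabel (char '|')
```

Under these definitions, parse (simpleLabels <* eof) "" "a|bc|d" should give Right ["a","bc","d"]. Because simpleLabel uses many rather than many1, an input such as a||b is accepted with an empty label in the middle; switching to many1 would reject it if empty labels are not wanted.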

A quotedLabel is a single quote followed by a run of valid quotedLabel characters followed by an ending single quote:

sq = '\''

quotedLabel :: Parser String
quotedLabel = do
  char sq
  chs <- many validChar
  char sq
  return chs

A validChar is either a non-single quote or a single quote not followed by eof or a vertical bar:

validChar = noneOf [sq] <|> try validQuote

validQuote = do
  char sq
  notFollowedBy eof
  notFollowedBy (char '|')
  return sq

The first notFollowedBy will fail if the single quote appears just before the end of input. The second notFollowedBy will fail if the next character is a vertical bar. Therefore the sequence of the two will succeed only if there is a non-vertical-bar character following the single quote. In that case the single quote should be interpreted as part of the string and not as the terminating single quote.

Unfortunately this doesn't quite work, because the current implementation of notFollowedBy always succeeds when given a parser that does not consume any input, such as eof. (See this issue for more details.)

To work around this problem we can use this alternate implementation:

-- join comes from Control.Monad
notFollowedBy' :: (Stream s m t, Show a) => ParsecT s u m a -> ParsecT s u m ()
notFollowedBy' p = try $ join $
      do {a <- try p; return (unexpected (show a));}
  <|> return (return ())

Here is the complete solution with some tests. By adding a few lexeme calls you can make this parser eat any white space wherever you decide it is not significant.

import Text.Parsec hiding (labels)
import Text.Parsec.String
import Control.Monad

-- Like notFollowedBy, but also fails when p succeeds without consuming input.
notFollowedBy' :: (Stream s m t, Show a) => ParsecT s u m a -> ParsecT s u m ()
notFollowedBy' p = try $ join $
      do {a <- try p; return (unexpected (show a));}
  <|> return (return ())

sq :: Char
sq = '\''

-- Any non-quote character, or a quote that does not end the label.
validChar :: Parser Char
validChar = noneOf "'" <|> try validQuote

-- A quote that is followed by neither end of input nor a vertical bar.
validQuote :: Parser Char
validQuote = do
  char sq
  notFollowedBy' eof
  notFollowedBy (char '|')
  return sq

quotedLabel :: Parser String
quotedLabel = do
  char sq
  str <- many validChar
  char sq
  return str

plainLabel :: Parser String
plainLabel = many (noneOf "| ")

labels :: Parser [String]
labels = sepBy1 (try quotedLabel <|> try plainLabel) (char '|')

test :: String -> [String] -> IO ()
test input expected =
  case parse (labels <* eof) "" input of
    Left e -> putStrLn $ "error: " ++ show e
    Right v -> if v == expected
                 then putStrLn $ "OK - got: " ++ show v
                 else putStrLn $ "NOT OK - got: " ++ show v ++ "  expected: " ++ show expected

test1 = test "a|b|c"      ["a","b","c"]
test2 = test "a|'b b'|c"  ["a", "b b", "c"]
test3 = test "'abc''|def" ["abc'", "def" ]
test4 = test "'abc'"      ["abc"]
test5 = test "x|'abc'"    ["x","abc"]

main :: IO ()
main = sequence_ [test1, test2, test3, test4, test5]
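The question asks for a Text parser, while the code above works on String. One possible adaptation, sketched here under the assumption that Parser from Text.Parsec.Text is what is wanted, is to keep the character-level parsers unchanged and pack the result. The names quotedLabelT and innerQuote are hypothetical and introduced only for this sketch.

```haskell
import Control.Monad (join)
import Data.Text (Text)
import qualified Data.Text as T
import Text.Parsec
import Text.Parsec.Text (Parser)

-- Same workaround as above: fails even when p succeeds without consuming input.
notFollowedBy' :: (Stream s m t, Show a) => ParsecT s u m a -> ParsecT s u m ()
notFollowedBy' p = try $ join $
      do { a <- try p; return (unexpected (show a)) }
  <|> return (return ())

-- Hypothetical Text-returning variant of quotedLabel.
quotedLabelT :: Parser Text
quotedLabelT = do
  _ <- char '\''
  chs <- many (noneOf "'" <|> try innerQuote)
  _ <- char '\''
  return (T.pack chs)
  where
    -- A quote that is part of the label: not followed by eof or '|'.
    innerQuote :: Parser Char
    innerQuote = do
      _ <- char '\''
      notFollowedBy' eof
      notFollowedBy (char '|')
      return '\''
```

On the input from the question, 'aa''''|, this sketch should return the Text value aa''' and leave the | unconsumed. The remaining parsers (plainLabel, labels) could presumably be converted the same way, by importing Text.Parsec.Text instead of Text.Parsec.String and packing their results.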
