Parsec3 Text parser for quoted string, where everything is allowed in between quotes
I have asked this question before (here), but it turns out that the solution provided did not handle all test cases. Also, I need a Text parser rather than a String one, and I need Parsec3.
OK, the parser should allow every type of char in between the quotes, even quotes. The end of the quoted text is marked by a ' character, followed by |, a space or the end of input.
So,

    'aa''''|

should return the string

    aa'''

This is what I have:

    import Control.Monad (liftM)
    import Data.Text (Text, pack)
    import Text.Parsec
    import Text.Parsec.Text

    quotedLabel :: Parser Text
    quotedLabel = do -- reads the first quote.
        spaces
        string "'"
        lab <- liftM pack $ endBy1 anyChar endOfQuote
        return lab

    endOfQuote = do
        string "'"
        try(eof) <|> try( oneOf "| ")

Now, the problem here is of course that eof has a different type than oneOf "| ", so compilation fails.
How do I fix this? Is there a better way to achieve what I am trying to do?
Whitespace
First, a comment on handling white space...
Generally, the practice is to write your parsers so that they consume the whitespace following a token or syntactic unit. It's common to define a combinator like:

    lexeme p = p <* spaces

to convert a parser p into one that discards the whitespace following whatever p parses. E.g., if you have

    number = many1 digit

simply use lexeme number whenever you want to eat up the whitespace following a number.
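As a minimal sketch of this convention, assuming String input via Text.Parsec.String: the names twoNumbers and parseTwo are hypothetical helpers introduced only for illustration, not part of Parsec.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Run a parser, then discard any whitespace that follows it.
lexeme :: Parser a -> Parser a
lexeme p = p <* spaces

-- One or more digits, as in the example above.
number :: Parser String
number = many1 digit

-- Two numbers; each one eats the whitespace that follows it.
-- (Hypothetical example parser.)
twoNumbers :: Parser (String, String)
twoNumbers = (,) <$> lexeme number <*> lexeme number

-- Hypothetical helper for trying the parser out on a whole input.
parseTwo :: String -> Either ParseError (String, String)
parseTwo = parse (twoNumbers <* eof) ""
```

For example, parseTwo "12  34 " should yield Right ("12","34"): neither the gap between the numbers nor the trailing space needs separate handling.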
For more on this approach to handling whitespace, and other advice on parsing languages, see this Megaparsec tutorial.
Label expressions
Based on your previous question, it appears you want to parse expressions of the form:

    label1 | label2 | ... | labelN

where each label may be a simple label or a quoted label.
The idiomatic way to parse this pattern is to use sepBy like this:

    labels :: Parser [String]
    labels = sepBy1 (try quotedLabel <|> simpleLabel) (char '|')

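To see the sepBy1 pattern in isolation, here is a small sketch with a stand-in item parser. The names item, items and splitOnBars are hypothetical and only serve the illustration; the real label parsers are different.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Stand-in for a real label parser: one or more characters other than '|'.
item :: Parser String
item = many1 (noneOf "|")

-- One or more items separated by vertical bars.
items :: Parser [String]
items = sepBy1 item (char '|')

-- Hypothetical helper: parse a whole input as bar-separated items.
splitOnBars :: String -> Either ParseError [String]
splitOnBars = parse (items <* eof) ""
```

With these definitions, splitOnBars "a|b b|c" should give Right ["a","b b","c"]. sepBy1 returns a list of results, one per separated item.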
We define both simpleLabel and quotedLabel in terms of the characters which may occur in them. For a simpleLabel, a valid character is anything that is neither | nor a space:

    simpleLabel :: Parser String
    simpleLabel = many (noneOf "| ")

A quotedLabel is a single quote followed by a run of valid quotedLabel-characters followed by an ending single quote:

    sq = '\''

    quotedLabel :: Parser String
    quotedLabel = do
        char sq
        chs <- many validChar
        char sq
        return chs

A validChar is either a non-single quote or a single quote not followed by eof or a vertical bar:

    validChar = noneOf [sq] <|> try validQuote

    validQuote = do
        char sq
        notFollowedBy eof
        notFollowedBy (char '|')
        return sq

The first notFollowedBy will fail if the single quote appears just before the end of input. The second notFollowedBy will fail if the next character is a vertical bar. Therefore the sequence of the two will succeed only if there is a non-vertical bar character following the single quote. In these cases the single quote should be interpreted as part of the string and not as the terminating single quote.
Unfortunately, this doesn't quite work, because the current implementation of notFollowedBy always succeeds when given a parser that does not consume any input, such as eof. (See this issue for more details.)
To work around this problem, we can use this alternate implementation:

    notFollowedBy' :: (Stream s m t, Show a) => ParsecT s u m a -> ParsecT s u m ()
    notFollowedBy' p = try $ join $
            do { a <- try p; return (unexpected (show a)) }
        <|> return (return ())

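A quick way to check that the workaround behaves as intended with eof is sketched below. The names notAtEnd and accepts are hypothetical helpers added only for this check.

```haskell
import Control.Monad (join)
import Text.Parsec
import Text.Parsec.String (Parser)

-- The alternate implementation from above.
notFollowedBy' :: (Stream s m t, Show a) => ParsecT s u m a -> ParsecT s u m ()
notFollowedBy' p = try $ join $
        do { a <- try p; return (unexpected (show a)) }
    <|> return (return ())

-- Succeeds only when some input is still left. (Hypothetical name.)
notAtEnd :: Parser ()
notAtEnd = notFollowedBy' eof

-- Hypothetical helper: True when the parser accepts the input.
accepts :: Parser a -> String -> Bool
accepts p = either (const False) (const True) . parse p ""
```

Here accepts notAtEnd "" should be False and accepts notAtEnd "x" should be True. The trick is that the inner block returns the failing parser as a value instead of running it inside the alternative, so join runs it only after the choice has been made and the failure can no longer be masked by the second branch.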
Here is the complete solution with some tests. By adding a few lexeme calls you can make this parser eat up any white space that you decide is not significant.

    import Text.Parsec hiding (labels)
    import Text.Parsec.String
    import Control.Monad

    notFollowedBy' :: (Stream s m t, Show a) => ParsecT s u m a -> ParsecT s u m ()
    notFollowedBy' p = try $ join $
            do { a <- try p; return (unexpected (show a)) }
        <|> return (return ())

    sq = '\''

    validChar = noneOf "'" <|> try validQuote

    validQuote = do
        char sq
        notFollowedBy' eof
        notFollowedBy (char '|')
        return sq

    quotedLabel :: Parser String
    quotedLabel = do
        char sq
        str <- many validChar
        char sq
        return str

    plainLabel :: Parser String
    plainLabel = many (noneOf "| ")

    labels :: Parser [String]
    labels = sepBy1 (try quotedLabel <|> try plainLabel) (char '|')

    test input expected =
        case parse (labels <* eof) "" input of
            Left e  -> putStrLn $ "error: " ++ show e
            Right v -> if v == expected
                         then putStrLn $ "OK - got: " ++ show v
                         else putStrLn $ "NOT OK - got: " ++ show v ++ " expected: " ++ show expected

    test1 = test "a|b|c" ["a","b","c"]
    test2 = test "a|'b b'|c" ["a", "b b", "c"]
    test3 = test "'abc''|def" ["abc'", "def" ]
    test4 = test "'abc'" ["abc"]
    test5 = test "x|'abc'" ["x","abc"]

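Since the question asks for a Text parser, the same approach can probably be carried over by switching to Text.Parsec.Text and packing the results. The following is a sketch under that assumption, not a tested drop-in replacement. parseLabels is a hypothetical helper, and like the String version it treats only | and end of input as ending a quoted label, so a space after the closing quote is not handled.

```haskell
import Control.Monad (join)
import Data.Text (Text)
import qualified Data.Text as T
import Text.Parsec hiding (labels)
import Text.Parsec.Text (Parser)

notFollowedBy' :: (Stream s m t, Show a) => ParsecT s u m a -> ParsecT s u m ()
notFollowedBy' p = try $ join $
        do { a <- try p; return (unexpected (show a)) }
    <|> return (return ())

sq :: Char
sq = '\''

-- Any non-quote character, or a quote that does not end the label.
validChar :: Parser Char
validChar = noneOf [sq] <|> try validQuote

-- A quote counts as content unless eof or '|' follows it.
validQuote :: Parser Char
validQuote = do
    _ <- char sq
    notFollowedBy' eof
    notFollowedBy (char '|')
    return sq

quotedLabel :: Parser Text
quotedLabel = T.pack <$> (char sq *> many validChar <* char sq)

plainLabel :: Parser Text
plainLabel = T.pack <$> many (noneOf "| ")

labels :: Parser [Text]
labels = sepBy1 (try quotedLabel <|> plainLabel) (char '|')

-- Hypothetical helper: parse a whole Text input as a list of labels.
parseLabels :: Text -> Either ParseError [Text]
parseLabels = parse (labels <* eof) ""
```

For instance, parseLabels (T.pack "'aa''''|x") should return the two labels aa''' and x. The last of the four quotes is followed by |, so it closes the label, while the three before it are kept as content.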