java - Regex Sentence Split
I'm trying to split a string into "sentences", but I'm having an issue with trailing words. For example:
"this isn't cool. doesn't work. this"
should split into
[this isn't cool., doesn't work., this]
So far I've been using "[^\\.!?]*[\\.\\s!?]+"
I can't figure out how to adjust this for the trailing word, since it has no terminating character and so there is nothing to look for. Is there something I can add, or do I need to adjust the pattern completely?
Instead of splitting the string, you can find the sentences. To match the trailing sentence, you can use the anchor $,
which matches the end of the string:
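    // requires java.util.ArrayList, java.util.List, java.util.regex.Matcher, java.util.regex.Pattern
    List<String> sentences = new ArrayList<String>();
    // one or more non-terminator characters, followed by a terminator or the end of the string
    Matcher m = Pattern.compile("[^?!.]+(?:[.?!]|$)")
                       .matcher("this isn't cool. doesn't work. this");
    while (m.find()) {
        // note: every match after the first keeps its leading space
        sentences.add(m.group());
    }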
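For reference, here is a minimal self-contained sketch of the same idea. The class name SentenceSplitter and the method splitSentences are hypothetical names chosen for illustration. It uses the same pattern as above, and it also trims each match and skips blank ones. Those two steps are an assumption about the desired output, since the pattern alone leaves a leading space on every sentence after the first.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SentenceSplitter {
    // Same pattern as in the answer: a run of non-terminators,
    // then either one terminator or the end of the input.
    private static final Pattern SENTENCE = Pattern.compile("[^?!.]+(?:[.?!]|$)");

    // Hypothetical helper: collects every match, trimmed of surrounding whitespace.
    static List<String> splitSentences(String text) {
        List<String> sentences = new ArrayList<>();
        Matcher m = SENTENCE.matcher(text);
        while (m.find()) {
            String sentence = m.group().trim();
            if (!sentence.isEmpty()) { // skip whitespace-only matches, e.g. a trailing space
                sentences.add(sentence);
            }
        }
        return sentences;
    }

    public static void main(String[] args) {
        // prints [this isn't cool., doesn't work., this]
        System.out.println(splitSentences("this isn't cool. doesn't work. this"));
    }
}
```

Note that this simple pattern treats every '.', '!' or '?' as a sentence boundary, so abbreviations and decimal numbers would be split as well, and repeated terminators such as "!!" are reduced to the first one.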