python - Get literal prefix of regex pattern
Here is the problem:
There is a list of thousands of regular expressions. I need to find a regular expression that matches a given string. Hopefully, these regexes are mutually exclusive; if several regexes match at the same time, I'm OK with returning any of them.
I assume that each of the regular expressions starts with a literal prefix, e.g.:
"some_literal_string(?:\?some_regular_part)?" → "some_literal_string"
I'd like to try the following data structure to make the search fast:
regexes = [index=prefix_length: {key=prefix: {*patterns}}]
Now, to find a match, I need to iterate the index i from
min(len(string), len(longest_prefix))
down to 0, and at each step extract a subset of the regexes:
subset = regexes[i][string[0:i]]
Then I need to check each element of the subset for a match: if a matching pattern is found, return it; otherwise, continue with the next index.
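To make the lookup concrete, here is a minimal sketch of that index in Python. The names `build_index` and `find_match` are hypothetical, the index is a dict of dicts rather than a list, and the sketch assumes the literal prefix of each pattern is already known and that "matches" means `re.match` (swap in `fullmatch` if the whole string must match).

```python
import re
from collections import defaultdict


def build_index(prefixed_patterns):
    """Group compiled patterns by prefix length, then by prefix text.

    `prefixed_patterns` is an iterable of (literal_prefix, pattern) pairs;
    how the prefix is obtained is outside the scope of this sketch.
    """
    index = defaultdict(lambda: defaultdict(list))
    for prefix, pattern in prefixed_patterns:
        index[len(prefix)][prefix].append(re.compile(pattern))
    return index


def find_match(index, string):
    """Return the first compiled pattern that matches `string`, or None."""
    longest = max(index, default=0)
    # Try the longest candidate prefix first, then shorter ones down to "".
    for i in range(min(len(string), longest), -1, -1):
        bucket = index.get(i)
        if not bucket:
            continue  # no pattern has a literal prefix of this length
        for regex in bucket.get(string[:i], ()):
            if regex.match(string):
                return regex
    return None
```

Patterns without any literal prefix end up under the empty prefix at length 0, so they are only tried after every longer prefix has failed.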
The question is: how do I get the literal prefix of a regular expression in the common case?
I came up with the following regular expression:
(?:[^.^$*+?{\\[|(]|(?:\\(?:[^\dAbBdDsSwWZ]|0|[0-7]{3})))*(?![*?|]|{\d+(?:,\d*)?})
After the search, each backslash+symbol pair in the matched string needs to be replaced by the symbol itself:
\$ → $
Octal escapes need to be replaced as well:
\0100 → @
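As a rough illustration, the extraction and the unescaping step could be combined as below. The helper name `literal_prefix` is hypothetical, and the pattern assumes the excluded escape letters are Python's special sequences (`\A \b \B \d \D \s \S \w \W \Z`). Only three-digit octal escapes such as `\100` and the bare `\0` are decoded; escapes like `\n` or `\x41` are not handled by this pattern and would be unescaped incorrectly, so treat this as a sketch rather than a complete solution.

```python
import re

# Literal characters and escaped symbols, stopping before anything that is
# quantified or alternated (the lookahead forces a backtrack by one item).
LITERAL_PREFIX = re.compile(
    r'(?:[^.^$*+?{\\[|(]|(?:\\(?:[^\dAbBdDsSwWZ]|0|[0-7]{3})))*'
    r'(?![*?|]|{\d+(?:,\d*)?})'
)

# A backslash followed by three octal digits or by any single character.
_ESCAPE = re.compile(r'\\([0-7]{3}|.)', re.DOTALL)


def _unescape(match):
    token = match.group(1)
    if len(token) == 3:       # three-digit octal escape such as \100
        return chr(int(token, 8))
    if token == '0':          # \0 is the NUL character
        return '\0'
    return token              # \$ becomes $, \. becomes ., and so on


def literal_prefix(pattern):
    """Return the unescaped literal prefix of `pattern` (may be empty)."""
    raw = LITERAL_PREFIX.match(pattern).group(0)
    return _ESCAPE.sub(_unescape, raw)
```

For example, `literal_prefix(r'abc*')` gives `'ab'`, because the trailing `c` is quantified and therefore not part of the guaranteed prefix.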