python - Get literal prefix of regex pattern -


here problem:

there list of thousands of regular expressions. need regular expression matches given string. hopefully, these regexes mutually exclusive, if several regexes matching @ same time, i'm ok returning any of them.

i assume of regular expressions starts literal prefix e.g.:

"some_literal_string(?:\?some_regular_part)?" → "some_literal_string"

i'd try following data structure make search fast:

regexes = [index=prefix_length:{key=prefix:{*prefixes}] 

now, find prefix, need iterate index min(len(string), len(longest_prefix)) down 0 , extract subset of regexes:

subset = regexes[i][string[0:i]] 

now need check each element match , if pattern found, return it, otherwise, continue next index.

the question is: how literal prefix of regular expression in common case?

i came following regular expression:

(?:[^.^$*+?{\\[|(]|(?:\\(?:[^\dabbddsswwz]|0|[0-7]{3})))*(?![*?|]|{\d+(?:,\d*)?}) 
  • it's needed replace backslash+symbol symbol in matched string after search:

    \$$

  • it's needed replace octal escapes:

    \0100@

https://regex101.com/r/ff8ab9/2


Comments

Popular posts from this blog

javascript - Slick Slider width recalculation -

jsf - PrimeFaces Datatable - What is f:facet actually doing? -

angular2 services - Angular 2 RC 4 Http post not firing -