Python 3: re.sub works randomly
I've got multiple blocks of text surrounded by these HTML tags:
<code type="block" lexer="python"> text </code>
Minimal working example
I need to replace them with other text (in the minimal example here, a simple string: "replacement"). I provide 2 sample blocks: one is correctly replaced, the other one is not, and I can't understand why, since they don't seem different. A test is included:
print(old_blockcode, "\n\n", new_blockcode, "\n_______", "\n\n")
It prints both the old and the new block as expected, which makes me think the issue is in re.sub, but it beats me why.
#!/usr/bin/python3
import re

filecontent = """<code type="block" lexer="python">import re old_code, new_code in zip(codes_list, highlighted_list): pattern = re.sub(old_code, new_code, filecontent) pattern.append(pa)</code> <code type="block" lexer="python">import re inputfile = "test" outputfile = "testout"</code> """

blockcodes_list = []
blockhighlighted_list = []

# Collect the body of every <code type="block" lexer="python"> block
blockcodes = re.finditer(r'<code type="block" lexer="python">(.*?)</code>', filecontent, flags=re.DOTALL)
for match in blockcodes:
    block = match.group(1)
    blockcodes_list.append(block)
    blockhighlighted = "replacement"
    blockhighlighted_list.append(blockhighlighted)

# Replace each collected block with its new text
newfilecontent = filecontent
for old_blockcode, new_blockcode in zip(blockcodes_list, blockhighlighted_list):
    newfilecontent = re.sub(old_blockcode, new_blockcode, newfilecontent)
    print(old_blockcode, "\n\n", new_blockcode, "\n_______", "\n\n")

print(newfilecontent)
Expected output:
<code type="block" lexer="python">replacement</code> <code type="block" lexer="python">replacement</code>
Real output:
<code type="block" lexer="python">import re old_code, new_code in zip(codes_list, highlighted_list): pattern = re.sub(old_code, new_code, filecontent) pattern.append(pa)</code> <code type="block" lexer="python">replacement</code>
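A quick way to see what differs between the two blocks: re.sub treats its first argument as a regular expression, not as literal text. The first block contains parentheses (for example zip(codes_list, highlighted_list)), which the regex engine reads as a capturing group rather than as literal "(" and ")", so the pattern never matches the original text. The second block contains no regex metacharacters, so it matches as-is. The minimal check below reuses the first block's text from the example and compares the raw pattern with one passed through re.escape; plain str.replace would be another way to do a literal replacement.

```python
import re

# Text of the first block from the example above
block = ("import re old_code, new_code in zip(codes_list, highlighted_list): "
         "pattern = re.sub(old_code, new_code, filecontent) pattern.append(pa)")
text = '<code type="block" lexer="python">' + block + '</code>'

# Used as a raw pattern, "(...)" is a group, so the literal parentheses never match
raw_result = re.sub(block, "replacement", text)

# re.escape() backslash-escapes the metacharacters, so the block matches literally
escaped_result = re.sub(re.escape(block), "replacement", text)

print(raw_result == text)  # True: nothing was replaced
print(escaped_result)      # <code type="block" lexer="python">replacement</code>
```

Note that the replacement argument is processed too (backslash sequences such as \1 are expanded), so a replacement string containing backslashes would need the same care, for example by passing a function like lambda m: new_blockcode instead of the string.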
It did what you asked of it. You wanted text beginning with a <code…> tag, followed by a string of anything, followed by </code>. And that's what it did:
<code type="block" lexer="python">import re old_code, new_code in zip(codes_list, highlighted_list): pattern = re.sub(old_code, new_code, filecontent) pattern.append(pa)</code> <code type="block" lexer="python">replacement</code>
Which is a special case of the maxim "you can't parse XHTML with regular expressions". Regular expressions cannot match nested groups. There may be answers following this one saying you can use non-greedy qualifiers, but that's mistaken.
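To illustrate the nesting point with hypothetical inputs (not taken from the question): with one <code> element nested inside another, a non-greedy .*? stops at the first closing tag it meets, so the capture is cut off inside the inner element. A greedy .* gets the outer content in that case, but then swallows everything between two sibling elements. Neither quantifier handles both situations.

```python
import re

# Hypothetical input: one <code> element nested inside another
nested = "<code>outer <code>inner</code> tail</code>"
lazy = re.findall(r"<code>(.*?)</code>", nested)
greedy = re.findall(r"<code>(.*)</code>", nested)

# Hypothetical input: two sibling <code> elements
siblings = "<code>a</code> <code>b</code>"
greedy_siblings = re.findall(r"<code>(.*)</code>", siblings)

print(lazy)             # ['outer <code>inner']
print(greedy)           # ['outer <code>inner</code> tail']
print(greedy_siblings)  # ['a</code> <code>b']
```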
Use an XML parser.
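A minimal sketch of that approach with the standard library's xml.etree.ElementTree, assuming the content is well-formed XML once wrapped in a single root element. The <root> wrapper and the helper name replace_code_blocks are hypothetical, added here because the sample has two top-level <code> elements; HTML that is not well-formed XML would need an HTML parser instead. Attribute order in the output is preserved on Python 3.8 and later.

```python
import xml.etree.ElementTree as ET

filecontent = """<code type="block" lexer="python">import re old_code, new_code in zip(codes_list, highlighted_list): pattern = re.sub(old_code, new_code, filecontent) pattern.append(pa)</code> <code type="block" lexer="python">import re inputfile = "test" outputfile = "testout"</code> """


def replace_code_blocks(content, new_text):
    """Replace the contents of every <code type="block" lexer="python"> element.

    Hypothetical helper: assumes `content` is well-formed XML once wrapped.
    """
    # Wrap in a dummy <root> so the fragment has a single top-level element
    root = ET.fromstring("<root>" + content + "</root>")
    # Collect the targets first so the tree is not modified while iterating it
    targets = [
        elem for elem in root.iter("code")
        if elem.get("type") == "block" and elem.get("lexer") == "python"
    ]
    for elem in targets:
        for child in list(elem):  # drop any nested elements inside the block
            elem.remove(child)
        elem.text = new_text
    serialized = ET.tostring(root, encoding="unicode")
    # Strip the dummy wrapper again
    return serialized[len("<root>"):-len("</root>")]


newfilecontent = replace_code_blocks(filecontent, "replacement")
print(newfilecontent)
```

Because the parser works on elements rather than on the raw text, characters such as parentheses inside a block never need escaping, and nested elements are handled by the tree structure.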