python - python3: re.sub works randomly -


I've got multiple blocks of text surrounded by these HTML tags:

<code type="block" lexer="python"> text </code> 

Minimal working example

I need to replace them with other text (in the minimal example here, a simple string: "replacement"). I provide 2 sample blocks: one is correctly replaced, the other is not. I can't understand why, as they don't seem different. The test I included,

print(old_blockcode, "\n\n", new_blockcode, "\n_______", "\n\n") 

makes me think the issue is in re.sub, but it beats me why.

#!/usr/bin/python3
import re

filecontent = """<code type="block" lexer="python">import re
for old_code, new_code in zip(codes_list, highlighted_list):
    pattern = re.sub(old_code, new_code, filecontent)
    pattern.append(pa)</code>

<code type="block" lexer="python">import re
inputfile = "test"
outputfile = "testout"</code>
"""

blockcodes_list = []
blockhighlighted_list = []

blockcodes = re.finditer(r'<code type="block" lexer="python">(.*?)</code>', filecontent, flags=re.DOTALL)
for match in blockcodes:
    block = match.group(1)
    blockcodes_list.append(block)
    blockhighlighted = "replacement"
    blockhighlighted_list.append(blockhighlighted)

newfilecontent = filecontent
for old_blockcode, new_blockcode in zip(blockcodes_list, blockhighlighted_list):
    newfilecontent = re.sub(old_blockcode, new_blockcode, newfilecontent)
    print(old_blockcode, "\n\n", new_blockcode, "\n_______", "\n\n")

print(newfilecontent)

Expected output

<code type="block" lexer="python">replacement</code>

<code type="block" lexer="python">replacement</code>

Real output

<code type="block" lexer="python">import re
for old_code, new_code in zip(codes_list, highlighted_list):
    pattern = re.sub(old_code, new_code, filecontent)
    pattern.append(pa)</code>

<code type="block" lexer="python">replacement</code>

It did what you asked of it. You wanted text beginning with a <code…> tag, followed by a string of anythings, followed by </code>. And that's what it did:

<code type="block" lexer="python">import re
for old_code, new_code in zip(codes_list, highlighted_list):
    pattern = re.sub(old_code, new_code, filecontent)
    pattern.append(pa)</code>

<code type="block" lexer="python">replacement</code>

Which is a special case of the maxim "You can't parse XHTML with regular expressions". Regular expressions cannot match nesting groups. There may come answers following this one saying that you can, with non-greedy qualifiers, but that's mistaken.
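Separately from the nesting argument, one detail of the question's code is worth checking: the captured block is passed to re.sub as the pattern, so characters such as ( ) and . inside it are read as regex syntax rather than as literal text. The sketch below illustrates that effect. The sample string and variable names are invented for the illustration and are not from the question, and the blocks are assumed not to be nested.

```python
import re

# Hypothetical sample: the block body contains the regex metacharacters "(" and ")".
text = '<code type="block" lexer="python">zip(a, b)</code>'
block = "zip(a, b)"

# Used as a pattern, "(a, b)" becomes a capture group, so the pattern
# searches for the text "zipa, b" and finds nothing to replace.
unescaped = re.sub(block, "replacement", text)

# re.escape() makes every character of the block match literally.
escaped = re.sub(re.escape(block), "replacement", text)

# str.replace() avoids regular expressions altogether.
replaced = text.replace(block, "replacement")

print(unescaped == text)
print(escaped)
print(replaced)
```

A block made only of letters, quotes, spaces and "=" happens to behave the same whether or not it is escaped, which could explain why one sample looks fine while the other is left untouched.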

Use an XML parser.
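As a rough sketch of what a parser-based version could look like: the code below uses Python's standard-library html.parser rather than a strict XML parser, because the sample input has two top-level elements and so is not a well-formed XML document. The class name BlockReplacer and the helper replace_blocks are hypothetical names chosen for this example. It is a minimal illustration, not a complete solution: comments and doctype declarations are dropped, and character references outside the blocks come out decoded.

```python
from html.parser import HTMLParser


class BlockReplacer(HTMLParser):
    """Copy the input through, swapping the body of each block for new text."""

    def __init__(self, replacement):
        super().__init__()
        self.replacement = replacement
        self.out = []
        self.depth = 0  # > 0 while inside a <code type="block"> element

    def handle_starttag(self, tag, attrs):
        if self.depth:
            if tag == "code":
                self.depth += 1  # nested <code>: part of the body being dropped
            return
        self.out.append(self.get_starttag_text())
        if tag == "code" and dict(attrs).get("type") == "block":
            self.depth = 1
            self.out.append(self.replacement)

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are copied only outside a block.
        if not self.depth:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.depth:
            if tag == "code":
                self.depth -= 1
            if self.depth:
                return  # still inside the block being replaced
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.depth:
            self.out.append(data)


def replace_blocks(html, replacement="replacement"):
    parser = BlockReplacer(replacement)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


sample = (
    '<code type="block" lexer="python">for a, b in zip(x, y):\n    print(a)</code>\n'
    '<code type="block" lexer="python">inputfile = "test"</code>\n'
)
result = replace_blocks(sample)
print(result)
```

Because the parser tracks how deep it is inside a block, a <code> element nested in a block body is dropped together with the rest of the body instead of ending the match early.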

