java - Jsoup: remove ONLY HTML tags
What is the proper way to remove HTML tags (while preserving custom/unknown tags) with Jsoup, not with a regex?
Expected input:
<html> <customtag> <div> dsgfdgdgf </div> </customtag> <123456789/> <123> <html123/> </html>
Expected output:
<customtag> dsgfdgdgf </customtag> <123456789/> <123> <html123/>
I tried using Cleaner with Whitelist.none(), but it removes the custom tags as well.
I also tried:
String str = Jsoup.parse(html).text();
but that removes the custom tags as well.
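This answer doesn't work for me, because the number of possible custom tags is unbounded.
You might want to try this:
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

String[] tags = new String[]{"html", "div"};
Document thing = Jsoup.parse("<html><customtag><div>dsgfdgdgf</div></customtag><123456789/><123><html123/></html>");
for (String tag : tags) {
    for (Element elem : thing.getElementsByTag(tag)) {
        // move the element's children up into its parent, at the element's own position
        elem.parent().insertChildren(elem.siblingIndex(), elem.childNodes());
        // then drop the now-empty element itself
        elem.remove();
    }
}
System.out.println(thing.getElementsByTag("body").html());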
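Please note that <123456789/> and <123> don't conform to the XML standard (a tag name can't start with a digit), so Jsoup treats them as text and escapes them. The downside is that you may have to explicitly list every tag you don't want (i.e. the HTML tags), and it may be slow. I haven't looked at how fast it's going to run.
Best regards, mist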
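One possible way around having to list the HTML tags by hand is to ask jsoup whether a tag is one of its built-in ones. The sketch below assumes a jsoup version where Tag.isKnownTag() is available and returns false for tags the parser created on the fly (such as customtag or html123). It unwraps every known tag inside <body> and leaves unknown ones alone. The class and method names are hypothetical, and disabling pretty printing is again a choice made here, not something from the answer.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class RemoveKnownTags {

    // Unwraps every element inside <body> whose tag jsoup recognises as a built-in HTML tag,
    // and leaves elements with unknown (custom) tag names untouched.
    static String removeKnownTags(String html) {
        Document doc = Jsoup.parse(html);
        doc.outputSettings().prettyPrint(false);
        Element body = doc.body();
        // getAllElements() returns a snapshot that includes body itself, in document order.
        for (Element el : body.getAllElements()) {
            if (el == body) {
                continue; // keep the body wrapper so its inner HTML can be returned
            }
            if (el.tag().isKnownTag()) {
                el.unwrap(); // drop the tag, keep its children where it was
            }
        }
        return body.html();
    }

    public static void main(String[] args) {
        System.out.println(removeKnownTags(
                "<html><customtag><div>dsgfdgdgf</div></customtag><123456789/><123><html123/></html>"));
    }
}
```

This only covers tags jsoup itself knows about, so an HTML tag missing from its built-in list would be kept as if it were a custom tag.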