java - Jsoup remove ONLY html tags


What is the proper way to remove HTML tags (while preserving custom/unknown tags) with Jsoup (not regex)?

Expected input:

<html>   <customtag>     <div> dsgfdgdgf </div>   </customtag>   <123456789/>   <123>   <html123/> </html> 

Expected output:

  <customtag>      dsgfdgdgf   </customtag>   <123456789/>   <123>   <html123/> 

I tried to use Cleaner with Whitelist.none(), but it removes the custom tags as well.

I also tried:

String str = Jsoup.parse(html).text();

but this removes the custom tags as well.

This answer doesn't work for me, because the number of possible custom tags is unbounded.

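You might want to try this:

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Tags to strip; their children are kept and moved up into the parent.
String[] tags = new String[]{"html", "div"};
Document thing = Jsoup.parse("<html><customtag><div>dsgfdgdgf</div></customtag><123456789/><123><html123/></html>");
for (String tag : tags) {
    for (Element elem : thing.getElementsByTag(tag)) {
        // Put the element's children in its place, then remove the now-empty element.
        elem.parent().insertChildren(elem.siblingIndex(), elem.childNodes());
        elem.remove();
    }
}
System.out.println(thing.getElementsByTag("body").html());

Please note that <123456789/> and <123> don't conform to the XML standard, so they get escaped. A downside may be that you have to explicitly write down the tags you don't want (i.e. the HTML tags), and it may be sloooooow. I haven't looked at how fast this is going to run.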

mfg mist
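
A possible way around listing every unwanted tag by hand, given here only as a sketch: jsoup marks the tags it ships with as "known" (`Tag.isKnownTag()`), so you could unwrap every known element and leave the rest alone. This assumes jsoup's idea of a known tag matches what you consider an HTML tag, which is worth checking against your own input. The class and method names (`StripKnownTags`, `stripKnownTags`) are made up for illustration, and `Node.unwrap()` is used in place of the manual `insertChildren` / `remove` pair above.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical helper class, not part of jsoup.
class StripKnownTags {

    // Removes every element whose tag jsoup knows, keeping its children;
    // elements with unknown (custom) tags are left in place.
    static String stripKnownTags(String html) {
        Document doc = Jsoup.parse(html);
        Element body = doc.body();
        // select("*") returns a snapshot list (including body itself),
        // so modifying the tree while looping over it is safe.
        for (Element el : body.select("*")) {
            if (el == body) {
                continue; // keep the body as the container we read from
            }
            if (el.tag().isKnownTag()) {
                el.unwrap(); // move children up to the parent, drop the element
            }
        }
        return body.html();
    }

    public static void main(String[] args) {
        String html = "<html><customtag><div>dsgfdgdgf</div></customtag><html123/></html>";
        System.out.println(stripKnownTags(html));
    }
}
```

As with the loop above, names such as <123> are not valid tag names, so the parser treats them as text and they come out escaped.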

