php - How can I get the Plain text AND the HTML of a DOM element created from XML?


We have thousands of closed caption XML files that we have to import into a database as plain text, but we also need to preserve their HTML markup for conversion to another CC format. I have been able to extract the plain text quite easily, but I can't seem to find the correct way of extracting the raw HTML as well.

Is there a way to accomplish "->htmlContent" in the same way that ->textContent works below?

    // Fetch the caption file with a 60 second timeout.
    $ctx = stream_context_create(array('http' => array('timeout' => 60)));
    $xml = @file_get_contents('http://blah-blah-blah/16th.xml', false, $ctx);

    $dom = new DOMDocument;
    $dom->loadXML($xml);
    $ptags = $dom->getElementsByTagName("p");
    foreach ($ptags as $p) {
        // Plain text only: child element markup such as <br> is dropped.
        $text = $p->textContent;
    }

A typical <p> being processed:

    <p begin="00:00:14.83" end="00:00:18.83" tts:textAlign="left">
        <metadata ccrow="12" cccol="8"/>
        (male narrator)<br></br>
    16th , 17th centuries<br></br>
    formative 200 years
    </p>

Successful ->textContent result:

(male narrator) 16th , 17th centuries formative 200 years 

Desired HTML result:

(male narrator)<br></br> 16th , 17th centuries<br></br> formative 200 years 

In other words, you want to save only specific nodes: the br elements and the text nodes. You can do that with DOM + XPath:

    $document = new DOMDocument();
    $document->preserveWhiteSpace = false;
    $document->loadXML($html);
    $xpath = new DOMXPath($document);

    foreach ($xpath->evaluate('//p') as $p) {
        $content = '';
        // Collect only the br elements and text nodes below the current p.
        foreach ($xpath->evaluate('.//br|.//text()', $p) as $node) {
            $content .= $document->saveHTML($node);
        }
        var_dump($content);
    }

Output:

string(86) "     (male narrator)<br> 16th , 17th centuries<br> formative 200 years " 

The XPath expression

Any descendant br: .//br
Any descendant text node: .//text()
Combined expression: .//br|.//text()
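
To see what each part of the expression selects, here is a minimal sketch. The sample XML and the nodeNames() helper are made up for illustration and are not part of the original question. It lists the node names matched by each expression, relative to a p element.

```php
<?php
// Hypothetical sample, loosely modelled on the <p> element in the question.
$xml = '<p><metadata ccrow="12"/>first<br/>second</p>';

$document = new DOMDocument();
$document->loadXML($xml);
$xpath = new DOMXPath($document);
$p = $document->documentElement;

// Illustrative helper: collect the node names matched by an expression.
function nodeNames(DOMXPath $xpath, string $expression, DOMNode $context): array
{
    $names = [];
    foreach ($xpath->evaluate($expression, $context) as $node) {
        $names[] = $node->nodeName;
    }
    return $names;
}

$brOnly   = nodeNames($xpath, './/br', $p);           // ['br']
$textOnly = nodeNames($xpath, './/text()', $p);       // ['#text', '#text']
$combined = nodeNames($xpath, './/br|.//text()', $p); // ['#text', 'br', '#text']

print_r($combined);
```

The metadata element matches neither part of the expression, so it never reaches the output, and the combined result comes back in document order, which is what keeps the br elements between the right pieces of text.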

Namespaces

If your XML uses namespaces, you have to register them and use the registered prefix in the expressions.

    $document = new DOMDocument();
    $document->preserveWhiteSpace = false;
    $document->loadXML($html);
    $xpath = new DOMXPath($document);
    // The prefix is local to the XPath expressions; only the namespace URI has to match the document.
    $xpath->registerNamespace('tt', 'http://www.w3.org/2006/04/ttaf1');

    foreach ($xpath->evaluate('//tt:p') as $p) {
        $content = '';
        foreach ($xpath->evaluate('.//tt:br|.//text()', $p) as $node) {
            $content .= $document->saveHTML($node);
        }
        var_dump($content);
    }
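
As a rough illustration of why the registration matters, the sketch below uses a made-up document that declares the namespace as its default. It compares an unprefixed expression with a prefixed one. The sample XML is an assumption and is not taken from the original caption files.

```php
<?php
// Hypothetical document with a default namespace on the root element.
$xml = '<tt xmlns="http://www.w3.org/2006/04/ttaf1">'
     . '<body><p>one<br/>two</p></body>'
     . '</tt>';

$document = new DOMDocument();
$document->loadXML($xml);
$xpath = new DOMXPath($document);
$xpath->registerNamespace('tt', 'http://www.w3.org/2006/04/ttaf1');

// An unprefixed name in XPath 1.0 only matches elements in no namespace.
$unprefixed = $xpath->evaluate('count(//p)');

// The registered prefix resolves to the namespace URI used by the document.
$prefixed = $xpath->evaluate('count(//tt:p)');

var_dump($unprefixed, $prefixed);
```

With a document like this, the unprefixed expression finds no p elements and the prefixed one finds the single p, even though the XML itself never uses a tt: prefix.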
