Channel: Trying to Parse Only the Images from an RSS Feed - Stack Overflow

Answer by IMSoP for Trying to Parse Only the Images from an RSS Feed


The <img> tags inside that RSS feed are not actually elements of the XML document, whatever the syntax highlighting on this site suggests. They are just text inside the <description> element, and that text happens to contain the characters < and >.
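A quick way to confirm this is to ask the parser for the <img> elements directly. The sketch below uses a made-up one-item feed (the URL is a placeholder, not from the question). XPath finds no <img> element under <description>, but the markup is still there in the element's string value:

```php
<?php
// Hypothetical one-item feed: the HTML sits inside a CDATA section.
$rss = <<<XML
<rss><channel><item>
<description><![CDATA[<p>Hello <img src="http://example.com/a.jpg"></p>]]></description>
</item></channel></rss>
XML;

$xml = simplexml_load_string($rss);

// No <img> element exists in the XML tree, so this finds nothing.
$imgElements = $xml->xpath('//item/description/img');
echo count($imgElements), "\n";

// The "tags" are only characters in the text of <description>.
$text = (string) $xml->channel->item->description;
echo $text, "\n";
```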

The string <![CDATA[ tells the XML parser that everything from there until it encounters ]]> is to be treated as a raw string, regardless of what it contains. This is useful for embedding HTML inside XML, since the HTML tags wouldn't necessarily be valid XML. It is equivalent to escaping the whole HTML (e.g. with htmlspecialchars) so that the <img> tags would look like &lt;img&gt;. (I went into more technical detail in another answer.)
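To illustrate that equivalence, here is a small sketch that wraps the same HTML snippet (a placeholder, not taken from the question's feed) once in CDATA and once escaped with htmlspecialchars. After parsing, both <description> elements have exactly the same string value:

```php
<?php
$html = '<img src="http://example.com/a.jpg">';

// 1) HTML wrapped in a CDATA section.
$withCdata = simplexml_load_string(
    '<description><![CDATA[' . $html . ']]></description>'
);

// 2) HTML escaped, so "<" becomes "&lt;", '"' becomes "&quot;", etc.
$escaped = simplexml_load_string(
    '<description>' . htmlspecialchars($html) . '</description>'
);

// The parser hands back the original text in both cases.
var_dump((string) $withCdata === (string) $escaped);
var_dump((string) $withCdata === $html);
```

One practical difference: a CDATA section cannot itself contain the sequence ]]>, whereas escaping works for any text.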

So to extract the images from the RSS requires two steps: first, get the text of each <description>, and second, find all the <img> tags in that text.

    // Collect libxml errors instead of printing warnings for imperfect HTML
    libxml_use_internal_errors(true);

    $xml = simplexml_load_file('http://mywebsite.com/rss?t=2040&dl=1&i=1&r=ceddfb43483437b1ed08ab8a72cbc3d5');
    if ( $xml === false ) {
        die('Could not load the feed');
    }
    $descriptions = $xml->xpath('//item/description');

    foreach ( $descriptions as $description_node ) {
        $description_html = (string)$description_node;
        if ( trim($description_html) === '' ) {
            // loadHTML() does not accept an empty string
            continue;
        }

        // The description may not be valid XML, so use a more forgiving HTML parser mode
        $description_dom = new DOMDocument();
        $description_dom->loadHTML( $description_html );

        // Switch back to SimpleXML for readability
        $description_sxml = simplexml_import_dom( $description_dom );

        // Find all images, and extract their 'src' param
        $imgs = $description_sxml->xpath('//img');
        foreach ( $imgs as $image ) {
            echo (string)$image['src'], "\n";
        }
    }
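If you want something you can test without fetching a live feed, the same two steps can be wrapped in a function that takes the feed as a string and returns the src values as an array. This is only a sketch: extract_image_srcs() is a hypothetical name, the sample feed is invented, and it uses DOMXPath directly instead of switching back to SimpleXML. Also note that loadHTML() assumes ISO-8859-1 unless the HTML declares a charset, so descriptions containing non-ASCII text may need an encoding hint.

```php
<?php
// Hypothetical helper: collect every <img src> from each item's <description>.
function extract_image_srcs(string $rssXml): array
{
    $srcs = [];
    $feed = simplexml_load_string($rssXml);
    if ($feed === false) {
        return $srcs; // not well-formed XML
    }

    // Step 1: the HTML is just text inside each <description>.
    foreach ($feed->xpath('//item/description') as $descriptionNode) {
        $html = trim((string) $descriptionNode);
        if ($html === '') {
            continue; // loadHTML() rejects empty input
        }

        // Step 2: parse that text as HTML and look for <img> elements.
        $dom = new DOMDocument();
        $previous = libxml_use_internal_errors(true); // silence tag-soup warnings
        $dom->loadHTML($html);
        libxml_clear_errors();
        libxml_use_internal_errors($previous);

        foreach ((new DOMXPath($dom))->query('//img[@src]') as $img) {
            $srcs[] = $img->getAttribute('src');
        }
    }
    return $srcs;
}

// Invented two-item feed for illustration.
$sample = <<<XML
<rss><channel>
<item><description><![CDATA[<p>One <img src="a.jpg"></p>]]></description></item>
<item><description><![CDATA[<img src="b.png"><br><img src="c.gif" alt="">]]></description></item>
</channel></rss>
XML;

print_r(extract_image_srcs($sample));
```

Returning an array rather than echoing inside the loop keeps the parsing separate from whatever you do with the URLs afterwards.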
