<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>l'indicible blog &#187; Php</title>
	<atom:link href="http://www.lindicible.com/blog/category/dev/php-dev/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.lindicible.com/blog</link>
	<description>the details of the inexpressible</description>
	<lastBuildDate>Fri, 02 Sep 2011 19:30:56 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>DOMDocument and UTF-8, a PHP charset problem</title>
		<link>http://www.lindicible.com/blog/en/2009/10/18/domdocument-et-utf-8-un-probleme-de-charset-en-php/</link>
		<comments>http://www.lindicible.com/blog/en/2009/10/18/domdocument-et-utf-8-un-probleme-de-charset-en-php/#comments</comments>
		<pubDate>Sat, 17 Oct 2009 23:44:11 +0000</pubDate>
		<dc:creator>Matt</dc:creator>
				<category><![CDATA[Development]]></category>
		<category><![CDATA[Php]]></category>
		<category><![CDATA[SEO]]></category>
		<category><![CDATA[Tutorials]]></category>

		<guid isPermaLink="false">http://www.lindicible.com/blog/?p=481</guid>
		<description><![CDATA[Today we&#8217;ll see how to manipulate DOM elements with PHP. As an example, we&#8217;ll take a very useful function that is oddly hard to find: how to add a specific attribute to some HTML elements. This is handy, for instance, for adding a rel="nofollow" attribute to some links to let the search [...]]]></description>
			<content:encoded><![CDATA[<p>Today we&#8217;ll see how to manipulate <a href="http://en.wikipedia.org/wiki/Document_Object_Model" rel="external">DOM</a> elements with PHP.<br />
As an example, we&#8217;ll take a very useful function that is oddly hard to find: how to add a specific attribute to some HTML elements.<br />
This is handy, for instance, for adding a rel="nofollow" attribute to some links, telling search engines they don&#8217;t need to follow them while leaving those links available to your users.<br />
From an SEO point of view, this can be quite useful, as it keeps your PageRank from leaking to every link on your pages.</p>
<p>Along the way we&#8217;ll run into some vicious problems. Let&#8217;s start with the finished function, to save time for the quickest among you:</p>
<pre>
function addAttribute($context, $tag, $attribute, $value)
{
	$initialEncoding = mb_detect_encoding($context);
	if( $initialEncoding != 'UTF-8' ){
		$context = utf8_encode($context);
	}

	$doc = new DOMDocument("4.01", "utf-8");

	$contentPrefix = '&lt;!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd"&gt;&lt;html&gt;&lt;head&gt;&lt;title&gt;required meta for utf-8 handling!&lt;/title&gt;&lt;meta http-equiv="Content-Type" content="text/html; charset=utf-8"&gt;&lt;/head&gt;&lt;body&gt;';
	$contentSuffix = '&lt;/body&gt;&lt;/html&gt;';

	$doc->loadHTML($contentPrefix . $context . $contentSuffix);

	$elements = $doc->getElementsByTagName($tag);

	if(!is_array($value)){
		$value = array($value);
	}

	foreach($elements as $element)
	{

		foreach($value as $currentValue)
		{
			$alreadySet = false;

			if($element->hasAttribute($attribute))
			{

				$attributeCurrentValue = $element->getAttribute($attribute);

				$attributeCurrentValues = explode(' ', $attributeCurrentValue);

				foreach( $attributeCurrentValues as $attributeCurrentValue )
				{
					if($attributeCurrentValue == $currentValue){
						$alreadySet = true;
					}
				}
				if(!$alreadySet){
					$element->setAttribute($attribute, implode(' ', $attributeCurrentValues) . ' ' . $currentValue);
				}
			} else {
				$element->setAttribute($attribute, $currentValue);
			}
		}
	}

	$output = mb_substr($doc->saveHTML(), 236, -16);

	if( $initialEncoding != 'UTF-8' ){
		// mb_convert_encoding() returns the converted string, so assign it back
		$output = mb_convert_encoding($output, $initialEncoding, 'UTF-8');
	}

	return $output;
}</pre>
<h3>Explanations</h3>
<pre>
$initialEncoding = mb_detect_encoding($context);
	if( $initialEncoding != 'UTF-8' ){
		$context = utf8_encode($context);
	}</pre>
<p>We start by detecting the encoding currently used by the given content.<br />
We store it so that we can return the result in the same encoding, and we convert the content to UTF-8. Note that utf8_encode() assumes its input is ISO-8859-1.<br />
Why UTF-8? This format has the (huge) advantage of handling all characters, including accented or special ones from various languages.</p>
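<p>As a rough sketch of the detection step, here is one way to make the guess more explicit. The helper name detectSourceEncoding(), the candidate list and the ISO-8859-1 fallback are all assumptions made for this illustration, not part of the original function.</p>
<pre>
// Hypothetical helper (not part of the original function): guess the source
// encoding from an explicit candidate list, in strict mode.
function detectSourceEncoding($text)
{
	// Order matters: nearly any byte sequence is valid ISO-8859-1, so UTF-8 goes first.
	$encoding = mb_detect_encoding($text, array('UTF-8', 'ISO-8859-1'), true);

	// Assumption: fall back to ISO-8859-1 when detection fails.
	return ($encoding === false) ? 'ISO-8859-1' : $encoding;
}

$latin1 = "caf\xE9"; // "café" stored as ISO-8859-1 bytes
$encoding = detectSourceEncoding($latin1);
$utf8 = ($encoding === 'UTF-8') ? $latin1 : mb_convert_encoding($latin1, 'UTF-8', $encoding);

echo $encoding, ' ', bin2hex($utf8); // ISO-8859-1 636166c3a9</pre>
<p>Passing the detected encoding to mb_convert_encoding() avoids relying on the ISO-8859-1 assumption built into utf8_encode().</p>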
<pre>
	$doc = new DOMDocument("4.01", "utf-8");</pre>
<p>Then we create a new DOM object, passing two parameters to its constructor: the version and the charset of the document. Both end up in the XML declaration, so the version is normally &#8220;1.0&#8221;; the &#8220;4.01&#8221; used here mirrors the HTML 4.01 doctype and has no effect on how loadHTML() parses the content.</p>
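<p>To see where the two constructor arguments actually go, here is a minimal sketch that saves a tiny document as XML. The element name and text are made up for the example.</p>
<pre>
// The constructor arguments are the XML version and the document encoding.
$doc = new DOMDocument('1.0', 'UTF-8');

// Hypothetical content, only there to have something to serialize.
$doc->appendChild($doc->createElement('note', 'hello'));

// The XML declaration at the top of $xml carries both constructor values.
$xml = $doc->saveXML();
echo $xml;</pre>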
<pre>
	$contentPrefix = '&lt;!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd"&gt;&lt;html&gt;&lt;head&gt;&lt;title&gt;required meta for utf-8 handling!&lt;/title&gt;&lt;meta http-equiv="Content-Type" content="text/html; charset=utf-8"&gt;&lt;/head&gt;&lt;body&gt;';
	$contentSuffix = '&lt;/body&gt;&lt;/html&gt;';

	$doc->loadHTML($contentPrefix . $context . $contentSuffix);</pre>
<p>Next, we add a header to the given content. Even though we have declared the <em>expected</em> values for the document, loadHTML() does not use them: unless the content itself declares its charset in a meta header, the parser falls back to its own default. I can assure you that when you don&#8217;t know this, it&#8217;s a hair-pulling scenario! It is undoubtedly the trickiest part of the whole manipulation.<br />
We can then load the content into the DOM object.</p>
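<p>The effect of the meta header can be checked with a small experiment. This is only a sketch: the sample paragraph is made up, and the exact result without the header may depend on your libxml version.</p>
<pre>
// Collect parser warnings instead of printing them.
libxml_use_internal_errors(true);

$fragment = "&lt;p&gt;caf\xC3\xA9&lt;/p&gt;"; // hypothetical sample, UTF-8 bytes
$meta = '&lt;meta http-equiv="Content-Type" content="text/html; charset=utf-8"&gt;';

// Same fragment, loaded once without and once with the charset declaration.
$plain = new DOMDocument();
$plain->loadHTML('&lt;html&gt;&lt;head&gt;&lt;/head&gt;&lt;body&gt;' . $fragment . '&lt;/body&gt;&lt;/html&gt;');

$withMeta = new DOMDocument();
$withMeta->loadHTML('&lt;html&gt;&lt;head&gt;' . $meta . '&lt;/head&gt;&lt;body&gt;' . $fragment . '&lt;/body&gt;&lt;/html&gt;');

$textPlain = $plain->getElementsByTagName('p')->item(0)->textContent;
$textMeta = $withMeta->getElementsByTagName('p')->item(0)->textContent;

// With the header, the text should come back unchanged.
var_dump($textMeta === "caf\xC3\xA9");
// Without it, the bytes are typically read as a single-byte charset, so this is usually false.
var_dump($textPlain === "caf\xC3\xA9");</pre>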
<pre>
	if(!is_array($value)){
		$value = array($value);
	}</pre>
<p>This function can add several values to a given attribute, so if the argument passed to the function is a string, we convert it to an array.</p>
<p>I won&#8217;t spend much time on the loop itself, which is fairly self-explanatory.<br />
Just note that we preserve the previous values by storing them in an array, and that we check whether the value is already present before adding it.</p>
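<p>The merging logic can be isolated in a few lines. The helper name mergeAttributeTokens() is hypothetical, and this is only a sketch of the idea; it does not touch the DOM at all.</p>
<pre>
// Hypothetical helper: append each new value to a space-separated attribute
// value unless it is already present.
function mergeAttributeTokens($current, array $newValues)
{
	// Split on any run of whitespace and ignore empty pieces.
	$tokens = preg_split('/\s+/', trim($current), -1, PREG_SPLIT_NO_EMPTY);

	foreach ($newValues as $value) {
		if (!in_array($value, $tokens, true)) {
			$tokens[] = $value;
		}
	}

	return implode(' ', $tokens);
}

echo mergeAttributeTokens('external', array('nofollow', 'external')); // external nofollow</pre>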
<pre>
	$output = mb_substr($doc->saveHTML(), 236, -16);</pre>
<p>We then save the result and strip the prefix and suffix we added, using a function that handles multibyte characters. A classic substr wouldn&#8217;t work well with some characters, since UTF-8 uses several bytes to store some of them. The offsets 236 and -16 should match the length of the prefix and suffix as saveHTML() writes them back, so they need adjusting if you change either string.</p>
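<p>If you would rather not depend on fixed offsets, one possible alternative is to serialize only the children of the body element. This is a different technique from the one used in the function above, sketched here with made-up sample markup; it relies on saveHTML() accepting a node argument, which requires PHP 5.3.6 or later.</p>
<pre>
// Collect parser warnings instead of printing them.
libxml_use_internal_errors(true);

// Hypothetical sample document with the same kind of charset header.
$html = '&lt;html&gt;&lt;head&gt;&lt;meta http-equiv="Content-Type" content="text/html; charset=utf-8"&gt;&lt;/head&gt;'
	. '&lt;body&gt;&lt;p&gt;hello&lt;/p&gt;&lt;a href="http://example.com/"&gt;link&lt;/a&gt;&lt;/body&gt;&lt;/html&gt;';

$doc = new DOMDocument();
$doc->loadHTML($html);

// Serialize each child of the body, so the wrapper never reaches the output.
$body = $doc->getElementsByTagName('body')->item(0);
$output = '';
foreach ($body->childNodes as $child) {
	$output .= $doc->saveHTML($child);
}

echo $output;</pre>
<p>Since nothing is cut by position, the prefix and suffix can change freely without breaking the extraction.</p>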
<pre>
	if( $initialEncoding != 'UTF-8' ){
		// mb_convert_encoding() returns the converted string, so assign it back
		$output = mb_convert_encoding($output, $initialEncoding, 'UTF-8');
	}

	return $output;</pre>
<p>Finally, we convert the result back to its original charset and return it.</p>
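<p>One detail is easy to miss here: mb_convert_encoding() returns the converted string and leaves its argument untouched. A minimal sketch, with a made-up sample string:</p>
<pre>
$utf8 = "caf\xC3\xA9"; // hypothetical sample, UTF-8 bytes

// Calling the function without using its return value changes nothing.
mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');
echo bin2hex($utf8), "\n"; // 636166c3a9

// The converted string only exists in the returned value.
$latin1 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');
echo bin2hex($latin1), "\n"; // 636166e9</pre>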
<p>And&#8230; Voila! </p>

]]></content:encoded>
			<wfw:commentRss>http://www.lindicible.com/blog/en/2009/10/18/domdocument-et-utf-8-un-probleme-de-charset-en-php/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

