(PHP 4, PHP 5)
urlencode — URL-encodes a string
This function is convenient when a string is to be used in the query part of a URL and you need an easy way to pass variables to the next page.
The string to be encoded.
Returns a string in which all non-alphanumeric characters except -_. are replaced with a percent sign (%) followed by two hexadecimal digits, and spaces are encoded as plus (+) signs. The string is encoded the same way that data posted from a WWW form is encoded, that is, the same way as in the application/x-www-form-urlencoded media type. This differs from the » RFC 1738 encoding (see rawurlencode()) in that, for historical reasons, spaces are encoded as plus (+) signs.
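To make the difference between the two encodings concrete, here is a minimal sketch. The sample string is made up for illustration and is not part of the original page.

```php
<?php
// Sample input chosen for illustration only.
$input = 'foo bar@baz-qux_1.txt';

// urlencode(): form-style encoding, the space becomes "+"
$form = urlencode($input);    // foo+bar%40baz-qux_1.txt

// rawurlencode(): the space becomes "%20"
$raw = rawurlencode($input);  // foo%20bar%40baz-qux_1.txt

echo $form, "\n", $raw, "\n";
```

In both cases "-", "_" and "." pass through unchanged, while "@" becomes %40.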
Example #1 urlencode() example
<?php
echo '<a href="mycgi?foo=', urlencode($userinput), '">';
?>
Example #2 urlencode() and htmlentities() example
<?php
$query_string = 'foo=' . urlencode($foo) . '&bar=' . urlencode($bar);
echo '<a href="mycgi?' . htmlentities($query_string) . '">';
?>
Note:
Be careful about variables that may match HTML entities. Things like &amp, &copy and &pound are parsed by the browser, and the actual entity is used instead of the desired variable name. This is an obvious hassle that the W3C has been telling people about for years. The reference is here: » http://www.w3.org/TR/html4/appendix/notes.html#h-B.2.2.
PHP supports changing the argument separator to the W3C-suggested semicolon, or to a separator of your choice, through the arg_separator directive in php.ini. Unfortunately, most user agents do not send form data in this standards-conforming semicolon-separated format. A more portable way around this is to use &amp; instead of & as the separator. You do not need to change PHP's arg_separator directive for this. Leave it as &, and simply encode your URLs using htmlentities() or htmlspecialchars().
To easily encode an array into a URL:
<?php
$y = array("I" => "like", "she" => "likes");
$pairs = array();
foreach ($y as $k => $v) {
    // encode each key and value separately so "&" and "=" keep their meaning
    $pairs[] = urlencode($k) . "=" . urlencode($v);
}
echo "index.php?" . htmlentities(implode("&", $pairs), ENT_QUOTES);
?>
prints:
"index.php?I=like&amp;she=likes"
=)
The urlencode() and rawurlencode() functions are mostly based on RFC 1738.
Since 2005, however, the current standard for URIs has been RFC 3986.
Here is a function that encodes a URL while leaving the RFC 3986 reserved characters intact.
<?php
function myUrlEncode($string) {
$entities = array('%21', '%2A', '%27', '%28', '%29', '%3B', '%3A', '%40', '%26', '%3D', '%2B', '%24', '%2C', '%2F', '%3F', '%25', '%23', '%5B', '%5D');
$replacements = array('!', '*', "'", "(", ")", ";", ":", "@", "&", "=", "+", "$", ",", "/", "?", "%", "#", "[", "]");
return str_replace($entities, $replacements, urlencode($string));
}
?>
I wrote this simple function that creates a GET query (for URLS) from an array:
<?php
function encode_array($args)
{
if(!is_array($args)) return false;
$c = 0;
$out = '';
foreach($args as $name => $value)
{
if($c++ != 0) $out .= '&';
$out .= urlencode("$name").'=';
if(is_array($value))
{
$out .= urlencode(serialize($value));
}else{
$out .= urlencode("$value");
}
}
return $out . "\n";
}
?>
If there are arrays within the $args array, they will be serialized before being urlencoded.
Some examples:
<?php
echo encode_array(array('foo' => 'bar')); // foo=bar
echo encode_array(array('foo&bar' => 'some=weird/value')); // foo%26bar=some%3Dweird%2Fvalue
echo encode_array(array('foo' => 1, 'bar' => 'two')); // foo=1&bar=two
echo encode_array(array('args' => array('key' => 'value'))); // args=a%3A1%3A%7Bs%3A3%3A%22key%22%3Bs%3A5%3A%22value%22%3B%7D
?>
I needed a function in PHP to do the same job as the complete escape function in JavaScript. I spent some time looking for one without success, so I finally decided to write my own. To save you the time:
<?php
function fullescape($in)
{
$out = '';
for ($i=0;$i<strlen($in);$i++)
{
$hex = dechex(ord($in[$i]));
if ($hex=='')
$out = $out.urlencode($in[$i]);
else
$out = $out .'%'.((strlen($hex)==1) ? ('0'.strtoupper($hex)):(strtoupper($hex)));
}
$out = str_replace('+','%20',$out);
$out = str_replace('_','%5F',$out);
$out = str_replace('.','%2E',$out);
$out = str_replace('-','%2D',$out);
return $out;
}
?>
It can be fully decoded using the unescape function in JavaScript.
If you want to encode a URL that contains Unicode characters, you can use this code:
<?php
function encode_full_url(&$url)
{
$url = urlencode($url);
$url = str_replace("%2F", "/", $url);
$url = str_replace("%3A", ":", $url);
return $url;
}
?>
a little stupid, but gets the work done.
I needed encoding and decoding for UTF-8 URLs, so I came up with these very simple functions. Hope this helps!
<?php
function url_encode($string){
return urlencode(utf8_encode($string));
}
function url_decode($string){
return utf8_decode(urldecode($string));
}
?>
Don't use urlencode() or urldecode() if the text includes an email address, as it destroys the "+" character, a perfectly valid email address character.
Unless you're certain that you won't be encoding email addresses AND you need the readability provided by the non-standard "+" usage, always use rawurlencode() or rawurldecode() instead.
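A small sketch of where the "+" actually gets lost may help here. The address is hypothetical. As far as the built-in functions go, a literal "+" is escaped as %2B by both encoders; it turns into a space only when urldecode() is applied to a value that was never encoded.

```php
<?php
// Hypothetical address used only for illustration.
$email = 'john+news@example.com';

// Both encoders escape a literal "+" as %2B, so an encoded value survives.
$encoded   = urlencode($email);    // john%2Bnews%40example.com
$roundTrip = urldecode($encoded);  // john+news@example.com

// The "+" is lost when urldecode() sees a value that was never encoded:
$lossy = urldecode($email);        // "john news@example.com"

// rawurldecode() leaves "+" alone:
$safe = rawurldecode($email);      // john+news@example.com
```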
What about this one? A bit more complex, but very practical...
<?php
static function urlencode($url)
{
// safely cast back already encoded "&" within the query
$url = str_replace( "&amp;","&",$url );
$phpsep = (strlen(ini_get('arg_separator.output')) > 0)
?ini_get('arg_separator.output')
:"&";
// cut optionally anchor
$ancpos = strrpos($url,"#");
$lasteq = strrpos($url,"=");
$lastsep = strrpos($url,"&");
$lastqsep = strrpos($url,"?");
$firstsep = strpos($url, "?");
// recognize wrong positioned anchor example.php#anchor?asdasd
if ($ancpos !== false
|| ($ancpos > 0
&& ($lasteq > 0 && $lasteq < $ancpos )
&& ($lastsep > 0 && $lastsep < $ancpos )
&& ($lastqsep > 0 && $lastqsep < $ancpos )
)
)
{
$anc = "#" . urlencode( substr( $url,$ancpos+1 ) );
$url = substr( $url,0,$ancpos );
}
else
{
$anc = "";
}
// separate uri and query string
if ($firstsep === false)
{
$qry = ""; // no query
$urlenc = $url.$anc; // anchor
}
else
{
$qry = substr( $url, $firstsep + 1 ) ;
$vals = explode( "&", $qry );
$valsenc = array();
foreach( $vals as $v )
{
$buf = explode( "=", $v );
$buf[0]=urlencode($buf[0]);
$buf[1]=urlencode($buf[1]);
$valsenc[] = implode("=",$buf);
}
$urlenc = substr( $url, 0 , $firstsep ); // encoded origin uri
$urlenc.= "?" . implode($phpsep, $valsenc ) // encoded query string
. $anc; // anchor
}
$urlenc = htmlentities( $urlenc, ENT_QUOTES );
return $urlenc;
}
?>
I read a UTF-8 encoded string from a MySQL database and wanted to use it as a parameter for
a JavaScript function which opens a popup window (the HTML page is also encoded as UTF-8).
I had serious trouble with strings containing special characters like ' and especially German umlauts.
When using urlencode(), the umlauts weren't displayed correctly in the popup in IE.
I tried many variations with urlencode(), rawurlencode(), addslashes(), ...,
but none of them worked in all browsers (Safari, FF, IE).
If someone is experiencing the same problem:
Here is the code that finally worked well for me in all browsers...
<?php
echo '<a href="javascript: popup_textanswer(\''.
htmlentities(addslashes($act_question['questiontext']),
ENT_QUOTES,'UTF-8').
'\');">question</a>';
?>
the javascript function:
<?php // actually javascript
function popup_textanswer(questiontext) {
var win = window.open('popup.php?id=1&questiontext=' +
questiontext, 'popup', 'width=600,height=400');
win.focus();
}
?>
When using XMLHttpRequest or another AJAX technique to submit data to a PHP script using GET (or POST with content-type header set to 'x-www-form-urlencoded') you must urlencode your data before you upload it. (In fact, if you don't urlencode POST data MS Internet Explorer may pop a "syntax error" dialog when you call XMLHttpRequest.send().) But, you can't call PHP's urlencode() function in Javascript! In fact, NO native Javascript function will urlencode data correctly for form submission. So here is a function to do the job fairly efficiently:
<?php /******
<script type="text/javascript" language="javascript1.6">
// PHP-compatible urlencode() for Javascript
function urlencode(s) {
s = encodeURIComponent(s);
return s.replace(/~/g,'%7E').replace(/%20/g,'+');
}
// sample usage: suppose form has text input fields for
// country, postcode, and city with id='country' and so-on.
// We'll use GET to send values of country and postcode
// to "city_lookup.php" asynchronously, then update city
// field in form with the reply (from database lookup)
function lookup_city() {
var elm_country = document.getElementById('country');
var elm_zip = document.getElementById('postcode');
var elm_city = document.getElementById('city');
var qry = '?country=' + urlencode(elm_country.value) +
'&postcode=' + urlencode(elm_zip.value);
var xhr;
try {
xhr = new XMLHttpRequest(); // recent browsers
} catch (e) {
alert('No XMLHttpRequest!');
return;
}
xhr.open('GET',('city_lookup.php'+qry),true);
xhr.onreadystatechange = function(){
if ((xhr.readyState != 4) || (xhr.status != 200)) return;
elm_city.value = xhr.responseText;
}
xhr.send(null);
}
</script>
******/ ?>
Regarding issues with %2Fs for slashes in encoded URLs, you simply need to enable the AllowEncodedSlashes directive in Apache:
http://httpd.apache.org/docs/2.2/mod/core.html#allowencodedslashes
Hope that helps
I'm running PHP version 5.0.5 and urlencode() doesn't seem to encode the "#" character, although the function's description says it encodes "all non-alphanumeric" characters. This was a particular problem for me when trying to open local files with a "#" in the filename as Firefox will interpret this as an anchor target (for better or worse). It seems a manual str_replace is required unless this was fixed in a future PHP version.
Example:
$str = str_replace("#", "%23", $str);
>> Hi muthuishere, I saw your excellent contribution but couldn't make it work, so I corrected some bits and pieces and ended up with the following:
<?php
function SmartUrlEncode($url){
if (strpos($url, '=') === false):
return $url;
else:
$startpos = strpos($url, "?");
$tmpurl=substr($url, 0 , $startpos+1) ;
$qryStr=substr($url, $startpos+1 ) ;
$qryvalues=explode("&", $qryStr);
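foreach($qryvalues as $key => $value):
$buffer=explode("=", $value, 2);
if (isset($buffer[1])) $buffer[1]=urlencode($buffer[1]);
$qryvalues[$key]=implode("=", $buffer); // write the encoded pair back
endforeach;
$finalqrystr=implode("&", $qryvalues);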
$finalURL=$tmpurl . $finalqrystr;
return $finalURL;
endif;
}
?>
As you see, it is very much yours, modified primarily to use '&' instead of '&amp;', and of course with an if test to see whether anything in the input needs to be parsed... Thanks for a great function!
Most of us may need a function for the case where we have
an entire URL but need to encode only the query values, not the URL itself and not the parameter names.
The function below takes a URL as input and applies URL encoding only to the parameter values.
/******************************************
For eg www.google.com/search?q=alka rani&start=100
urlencode
==> www.google.com/search?q=alka+rani&start=100
if you change it to rawurlencode
==> www.google.com/search?q=alka%20rani&start=100
**************************************/
function SmartUrlEncode($url){
    //Extract the Querystr pos after ? mark
    $startpos = strpos($url, "?");
    //Nothing to encode when there is no query string
    if ($startpos === false) return $url;
    //Extract the URl alone
    $tmpurl=substr($url, 0 , $startpos+1) ;
    //echo $tmpurl . "<br>";
    //Extract the Querystr alone
    $qryStr=substr($url, $startpos+1 ) ;
    //echo $qryStr . "<br>";
    //Split the querystring into & pairs
    $qryvalues=explode("&", $qryStr);
    foreach($qryvalues as &$value)
    {
        //Split the single data into two i.e data | value
        $buffer=explode("=", $value, 2);
        // Urlencode only the value now , Change it to rawurlencode if necessary
        if (isset($buffer[1])) $buffer[1]=urlencode($buffer[1]);
        //Join values back in the array
        $value = implode("=", $buffer);
    }
    unset($value); // break the reference left over from the loop
    //Output Querystr ,Join all the pairs with &
    $finalqrystr=implode("&", $qryvalues);
    // echo $finalqrystr . "<br>";
    $finalURL=$tmpurl . $finalqrystr;
    //echo $finalURL . "<br>";
    return $finalURL;
}
Kerdster's function works like a charm. It has only a minor beauty flaw in my humble opinion: it encodes every character, even the plain ASCII ones. This just doesn't look nice in the browser's address bar. ;-)
Inspired by Mkaganer's utf8_urldecode example in urldecode comments here's the enhanced code:
<?php
function utf16_urlencode ( $str ) {
# convert characters > 255 into HTML entities
$convmap = array( 0xFF, 0x2FFFF, 0, 0xFFFF );
$str = mb_encode_numericentity( $str, $convmap, "UTF-8");
# escape HTML entities, so they are not urlencoded
$str = preg_replace( '/&#([0-9a-fA-F]{2,5});/i', 'mark\\1mark', $str );
$str = urlencode($str);
# now convert escaped entities into unicode url syntax
$str = preg_replace( '/mark([0-9a-fA-F]{2,5})mark/i', '%u\\1', $str );
return $str;
}
?>
Probably the above code could be optimized further, comments are highly appreciated!
Thanks, Simon
> php dot net at samokhvalov dot com
> 12-Dec-2006 09:49
Thanks for the idea!
I have written a simpler function based on yours to simulate the JS function escape(). It uses mbstring functions instead of iconv.
<?php
function utf16urlencode($str)
{
$str = mb_convert_encoding($str, 'UTF-16', 'UTF-8');
$out = '';
for ($i = 0; $i < mb_strlen($str, 'UTF-16'); $i++)
{
$out .= '%u'.bin2hex(mb_substr($str, $i, 1, 'UTF-16'));
}
return $out;
}
?>
Some people have difficulties with all the urlencode-style solutions. So I decided to sidestep the problem by applying base64_encode several times, like this (note that this only obfuscates the value; it does not add security):
//- First page:
<?
$url='mypage.php';
?>
<a href="index.php?page=<? echo encode($url,5); ?>">My page</a>
//- Second page:
<?
$mypage=$_GET['page'];
$mypage=decode($mypage,5);
echo file_get_contents($mypage);
?>
* file_get_contents() will not execute your PHP scripts; it only returns the file contents (see the documentation of that function).
-----------
function encode($ss,$ntime){
    for($i=0;$i<$ntime;$i++){
        $ss=base64_encode($ss);
    }
    return $ss;
}
function decode($ss,$ntime){
    for($i=0;$i<$ntime;$i++){
        $ss=base64_decode($ss);
    }
    return $ss;
}
As a reply to: mmj48.com
Your method of replacing just the slash would be BAD practice... UNLESS it was used STRICTLY on the PATH part of the URL.
You must account not only for the URL query, but also for the scheme, user, password, and fragment characters (:, /, &, #, etc.).
However, these may change depending on the environment (mainly referring to the &, the query variable separator).
Escaping each of these would also be bad practice, and impractical. Rather, build a class / tool which will generate your URLs and apply the escapes. You could also use the PHP routine parse_url() for some interesting results.
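One way to read the parse_url() suggestion is sketched below. The helper name encode_path_segments() and the sample URL are hypothetical, and the sketch deliberately ignores port, user and password; it only shows the idea of encoding the path segment by segment while leaving the structural characters alone.

```php
<?php
// Hypothetical helper: encode only the path segments of a URL.
// Assumption: port, user and password are not present and are not rebuilt.
function encode_path_segments($url)
{
    $parts = parse_url($url);

    // Encode each segment on its own so the "/" separators stay literal.
    $rawPath  = isset($parts['path']) ? $parts['path'] : '';
    $segments = array_map('rawurlencode', explode('/', $rawPath));
    $path     = implode('/', $segments);

    $out = '';
    if (isset($parts['scheme'])) {
        $out .= $parts['scheme'] . '://';
    }
    if (isset($parts['host'])) {
        $out .= $parts['host'];
    }
    $out .= $path;
    // Query and fragment are passed through untouched in this sketch.
    if (isset($parts['query'])) {
        $out .= '?' . $parts['query'];
    }
    if (isset($parts['fragment'])) {
        $out .= '#' . $parts['fragment'];
    }
    return $out;
}

$result = encode_path_segments('http://example.com/a&b/50%off.txt?x=1');
echo $result, "\n"; // http://example.com/a%26b/50%25off.txt?x=1
```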
Reply to 'peter at mailinator dot com'
If you are having problems using urldecode in PHP following the escape() function in Javascript, try to do a decodeURI() before the escape(). This fixed it for me at least.
Thomas
What I use instead:
<?php
function escape($url)
{
return str_replace("%2F", "/", urlencode($url));
}
?>
As "Benjamin dot Bruno at web dot de" wrote earlier, you can have problems passing strings with special characters to Flash. Benjamin wrote that:
<?php
function flash_encode ($input)
{
return rawurlencode(utf8_encode($input));
}
?>
... could solve the problem. Unfortunately Flash still has problems reading some quotation marks, but this one:
<?php
function flash_encode($string)
{
$string = rawurlencode(utf8_encode($string));
$string = str_replace("%C2%96", "-", $string);
$string = str_replace("%C2%91", "%27", $string);
$string = str_replace("%C2%92", "%27", $string);
$string = str_replace("%C2%82", "%27", $string);
$string = str_replace("%C2%93", "%22", $string);
$string = str_replace("%C2%94", "%22", $string);
$string = str_replace("%C2%84", "%22", $string);
$string = str_replace("%C2%8B", "%C2%AB", $string);
$string = str_replace("%C2%9B", "%C2%BB", $string);
return $string;
}
?>
... should solve this problem.
I had difficulties with all above solutions. So I applied a dirty simple solution by using:
base64_encode($param)
and
base64_decode($param)
The string's length is a bit longer but no more problem with encoding.
quote: "Apache's mod_rewrite and mod_proxy are unable to handle urlencoded URLs properly - http://issues.apache.org/bugzilla/show_bug.cgi?id=34602"
The simplest solution is to use urlencode twice!
echo urlencode(urlencode($var));
Apache's mod_rewrite will handle it like a normal string using urlencode once.
kL's example is quite buggy, since it calls itself recursively and the encode function works in both directions.
Why do you replace every %27 with ' in the same string in which you replace every ' with %27?
Let's say I have a string: Hello %27World%27. It's a nice day.
I get: Hello 'World'. It%27s a nice day.
In other words, that solution is pretty useless.
Solution:
Just replace ' with %27 when encoding.
Just replace %27 with ' when decoding. Or just use urldecode().
Another thing to keep in mind is that urlencode() is not Unicode-aware; it encodes bytes.
For example, urlencoding enquête in a UTF-8 project will produce enqu%C3%AAte.
However, urlencode(utf8_decode('enquête')) produces enqu%EAte, as expected.
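The byte-level behaviour can be seen without any conversion function. In this sketch the word is written out as explicit bytes in both encodings; the variable names are only for illustration.

```php
<?php
// The same word in two encodings, written as explicit bytes.
$utf8   = "enqu\xC3\xAAte"; // "enquête" in UTF-8 (ê is two bytes)
$latin1 = "enqu\xEAte";     // "enquête" in ISO-8859-1 (ê is one byte)

// urlencode() simply percent-encodes whatever bytes it is given.
$fromUtf8   = urlencode($utf8);   // enqu%C3%AAte
$fromLatin1 = urlencode($latin1); // enqu%EAte

echo $fromUtf8, "\n", $fromLatin1, "\n";
```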
Addition to the previous note:
to make it work on *nix systems (where big-endian byte order in UTF-16 is being used, in contrast to WIN32) add following lines right after the second iconv():
if (strtoupper(substr(PHP_OS, 0, 3)) !== 'WIN') {
$b = $a;
$a[1] = $b[0];
$a[0] = $b[1];
}
In the AJAX era you might need to use UCS-2 (UTF-16) URL encoding (characters represented in the form '%uXXXX', e.g. '%u043e' for the Russian 'o'). But PHP is weak at working with multibyte-encoded strings, so you cannot simply use urlencode() on a string in UCS-2. Here is a simple function serving this purpose.
Note that this function takes a UTF-8 encoded string as input and then, for internal purposes, uses some 1-byte encoding (cp1251 in my case). If you already have the string in a 1-byte encoding, you may remove the first iconv() and modify the second one, which slightly simplifies the function.
function utf16urlencode($str)
{
$str = iconv("utf-8", "cp1251", $str);
$res = "";
for ($i = 0; $i < strlen($str); $i++) {
$res .= "%u";
$a = iconv("cp1251", "ucs-2", $str[$i]);
for ($j = 0; $j < strlen($a); $j++) {
$n = dechex(ord($a[$j]));
if (strlen($n) == 1) {
$n = "0$n";
}
$res .= $n;
}
}
return $res;
}
If you need to prepare strings with special characters (like German Umlaut characters) in order to import them into flash files via GET, try using utf8_encode and rawurlencode sequentially, like this:
<?php
function flash_encode ($input) {
return rawurlencode(utf8_encode($input));
}
?>
Thus, you can avoid having to use encodeURI in JavaScript, which is only available since JavaScript 1.5.
Apache's mod_rewrite and mod_proxy are unable to handle urlencoded URLs properly - http://issues.apache.org/bugzilla/show_bug.cgi?id=34602
If you need to use any of these modules and handle paths that contain %2F or %3A (and a few other encoded special URL characters), you'll have to use a different encoding scheme.
My solution is to replace "%" with "'".
<?php
function urlencode($u)
{
return str_replace(array("'",'%'),array('%27',"'"),urlencode($u));
}
function urldecode($u)
{
return urldecode(strtr($u,"'",'%'));
}
?>
I think this was mentioned earlier, but it was confusing. I had some problems with urlencode eating my '/', so I did a simple str_replace like the following:
$url = urlencode($img);
$img2 = "$url";
$img2 = str_replace('%2F', '/', $img2);
$img2 = str_replace('+' , '%20' , $img2);
You don't need to replace the '+' but I just feel comfortable with my %20, although it may present a problem if whatever you're using the str_replace for has a '+' in it where it shouldn't be.
But that fixed my problem.. all the other encodes like htmlentities and rawurlencode just ate my /'s
Be careful when using this function together with the JavaScript escape function.
In JavaScript, when you try to encode UTF-8 data with the escape function, you will get a strangely encoded string which you won't be able to decode with PHP's url(de)encode functions.
I found a website which has some very good tool regarding this problem: http://www.webtoolkit.info/
It has components which deal with url (en)decode.
<?// urlencode + urldecode 4 Linux/Unix-Servers:=============
//==================================================
//=====This small script matches all encoded String for ========
//=====Linux/Unix-Servers For IIS it got to be The Other Way ==
//===== around...and remember in a propper Url =============
//===== there shoudn't be the 'dirty Letter': %C3==============
//==================================================
function int2hex($intega){
$Ziffer = "0123456789ABCDEF";
return $Ziffer[($intega%256)/16].$Ziffer[$intega%16];
}
function url_decode($text){
if(!strpos($text,"%C3"))
for($i=129;$i<255;$i++){
$in = "%".int2hex($i);
$out = "%C3%".int2hex($i-64);
$text = str_replace($in,$out,$text);
}
return urldecode($text);
}
function url_encode($text){
$text = urlencode($text);
if(!strpos($text,"%C3"))
for($i=129;$i<255;$i++){
$in = "%".int2hex($i);
$out = "%C3%".int2hex($i-64);
$text = str_replace($in,$out,$text);
}
return $text;
}//==================================================
?>
This very simple function builds a valid parameter part of a URL. To me it looks like several of the other versions here are encoding wrongly, as they do not convert the & separating the variables into &amp;.
$vars=array('name' => 'tore','action' => 'sell&buy');
echo MakeRequestUrl($vars);
/* Makes an valid html request url by parsing the params array
* @param $params The parameters to be converted into URL with key as name.
*/
function MakeRequestUrl($params)
{
$querystring=null;
foreach ($params as $name => $value)
{
$querystring=$name.'='.urlencode($value).'&'.$querystring;
}
// Cut the last '&'
$querystring=substr($querystring,0,strlen($querystring)-1);
return htmlentities($querystring);
}
Will output: action=sell%26buy&amp;name=tore
I rewrote inus at flowingcreativity dot net function to generate an encoded url string from the POST, or GET array. It handles properly POST/GET array vars.
function _HTTPRequestToString($arr_request, $var_name, $separator='&') {
$ret = "";
if (is_array($arr_request)) {
foreach ($arr_request as $key => $value) {
if (is_array($value)) {
if ($var_name) {
$ret .= $this->_HTTPRequestToString($value, "{$var_name}[{$key}]", $separator);
} else {
$ret .= $this->_HTTPRequestToString($value, "{$key}", $separator);
}
} else {
if ($var_name) {
$ret .= "{$var_name}[{$key}]=".urlencode($value)."&";
} else {
$ret .= "{$key}=".urlencode($value)."&";
}
}
}
}
if (!$var_name) {
$ret = substr($ret,0,-1);
}
return $ret;
}
Just remember that according to W3C standards, you must rawurlencode() the link that's provided at the end of a mailto.
i.e.
<a href="mailto:jdoe@some.where.com?Subject=Simple testing(s)&bcc=jane@some.where.com">Mail Me</a>
Needs to be escaped (which rawurlencode() does for us).
The colon is OK after "mailto", as is the "@" after the e-mail name.
However, the rest of the URL needs to be encoded, replacing the following:
'?' => %3F
'=' => %3D
' ' => %20
'(' => %28
')' => %29
'&' => %26
'@' => %40 (note this one is in 'jane@some.where.com')
I tried to post the note with the correct text (that is, with the characters replaced in the note), but it said that there was a line that was too long, and so wouldn't let me add the note.
As a secondary note, I noticed that the auto-conversion routines at this site itself stopped the link at the space after 'Simple testing(s)' in the first entry shown above.
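A minimal sketch of one way to apply this, using the addresses and subject from the link above. It rests on an assumption that goes slightly beyond the note: only the header values are passed through rawurlencode(), while the "?", "=" and "&" that structure the link are left in place.

```php
<?php
// Values taken from the example link above.
$to      = 'jdoe@some.where.com';
$subject = 'Simple testing(s)';
$bcc     = 'jane@some.where.com';

// Assumption: encode the values only; the delimiters stay literal.
$href = 'mailto:' . $to
      . '?Subject=' . rawurlencode($subject)
      . '&bcc=' . rawurlencode($bcc);
// $href is now:
// mailto:jdoe@some.where.com?Subject=Simple%20testing%28s%29&bcc=jane%40some.where.com

// htmlspecialchars() turns the remaining "&" into "&amp;" for the HTML attribute.
echo '<a href="', htmlspecialchars($href), '">Mail Me</a>';
```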
Constructing hyperlinks safely HOW-TO:
<?php
$path_component = 'machine/generated/part';
$url_parameter1 = 'this is a string';
$url_parameter2 = 'special/weird "$characters"';
$url = 'http://example.com/lab/cgi/test/'. rawurlencode($path_component) . '?param1=' . urlencode($url_parameter1) . '&param2=' . urlencode($url_parameter2);
$link_label = "Click here & you'll be <happy>";
echo '<a href="', htmlspecialchars($url), '">', htmlspecialchars($link_label), '</a>';
?>
This example covers all the encodings you need to apply in order to create URLs safely without problems with any special characters. It is stunning how many people make mistakes with this.
In short:
- Use urlencode for all GET parameters (things that come after each "=").
- Use rawurlencode for parts that come before "?".
- Use htmlspecialchars for HTML tag parameters and HTML text content.
Do not let the browser auto-encode an invalid URL. Not all browsers perform the same encoding. Keep it cross-browser: do it server side.
Different from the above example, you do not have to encode URLs in hrefs with this. The browser does it automatically, so you just have to encode it with htmlentities() ;)
I just came across the need for a function that exports an array into a query string. Being able to use urlencode($theArray) would be nice, but here's what I came up with:
<?php
function urlencode_array(
$var, // the array value
$varName, // variable name to be used in the query string
$separator = '&' // what separating character to use in the query string
) {
$toImplode = array();
foreach ($var as $key => $value) {
if (is_array($value)) {
$toImplode[] = urlencode_array($value, "{$varName}[{$key}]", $separator);
} else {
$toImplode[] = "{$varName}[{$key}]=".urlencode($value);
}
}
return implode($separator, $toImplode);
}
?>
This function supports n-dimensional arrays (it encodes recursively).
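For comparison, PHP 5 and later also ship http_build_query(), which handles nested arrays as well. The sketch below uses made-up sample data; note that, unlike the function above, it also percent-encodes the square brackets in the keys.

```php
<?php
// Made-up sample data for illustration.
$data = array('a' => 1, 'b' => array('x' => 'y z'));

// The separator is passed explicitly so the result does not depend on
// the arg_separator.output setting.
$query = http_build_query(array('data' => $data), '', '&');

echo $query, "\n"; // data%5Ba%5D=1&data%5Bb%5D%5Bx%5D=y+z
```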
I was testing my input sanitation with some strange character entities. Ones like î and Ç were passed correctly and were in their raw form when I passed them through without any filtering.
However, some weird things happen when dealing with characters like these (shown here as HTML entities): ‼ ▐ ┐ and Θ.
If you try to pass one in Internet Explorer, IE will *disable* the submit button. Firefox, however, does something weirder: it will convert it to its HTML entity. It will display properly, but only when you don't convert entities.
The point? Be careful with decorative characters.
PS: If you try copy/pasting one of these characters to a TXT file, it will translate to a ?.
The information on this page is misleading in that you might think the ampersand (&) only needs to be escaped as &amp; when there is ambiguity with an existing character entity. This is false; the W3C page linked to from here clarifies that ampersands must ALWAYS be escaped.
The following:
<a href='/script.php?variable1=value1&variable2=value2'>Link</a>
is INVALID HTML. It needs to be written as:
<a href='/script.php?variable1=value1&amp;variable2=value2'>Link</a>
in order for the link to go to:
/script.php?variable1=value1&variable2=value2
I applaud the W3C's recommendation to use semicolons (';') instead of the ampersands, but it doesn't really change the fact that you still need to HTML-escape the value of all your HTML tag attributes. The following:
<span title='Rose & Mary'>Some text</span>
is also INVALID HTML. It needs to be escaped as:
<span title='Rose &amp; Mary'>Some text</span>
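In PHP, the escaping described here can be left to htmlspecialchars(). A minimal sketch, reusing the URL and title from the examples in this note (ENT_QUOTES is passed so that quote characters inside a value are escaped too):

```php
<?php
$url   = '/script.php?variable1=value1&variable2=value2';
$title = 'Rose & Mary';

// Escape every value that ends up inside an HTML attribute.
$link = '<a href="' . htmlspecialchars($url, ENT_QUOTES) . '">Link</a>';
$span = '<span title="' . htmlspecialchars($title, ENT_QUOTES) . '">Some text</span>';

echo $link, "\n", $span, "\n";
```

The browser decodes the entity again, so the link still points to the URL with a plain "&".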
---[ Editor's Note ]---
You can also use rawurlencode() here, and skip the functions provided in this note.
---[ /Editor's Note ]---
For handling slashes in redirections, (see comment from cameron at enprises dot com), try this :
function myurlencode ( $TheVal )
{
return urlencode (str_replace("/","%2f",$TheVal));
}
function myurldecode ( $TheVal )
{
return str_replace("%2f","/",urldecode ($TheVal));
}
This is effectively a double urlencode for slashes and single urlencode for everything else. So, it is more "standardised" than his suggestion of using a + sign, and more readable (and search engine indexable) than a full double encode/decode.
Be careful when encoding strings that came from simplexml in PHP 5. If you try to urlencode a simplexml object, the script tanks.
I got around the problem by using a cast.
$newValue = urlencode( (string) $oldValue );
If you want to pass a url with parameters as a value IN a url AND through a javascript function, such as...
<a href="javascript:openWin('page.php?url=index.php?id=4&pg=2');">
...pass the url value through the PHP urlencode() function twice, like this...
<?php
$url = "index.php?id=4&pg=2";
$url = urlencode(urlencode($url));
echo "<a href=\"javascript:openWin('page.php?url=$url');\">";
?>
On the page being opened by the javascript function (page.php), you only need to urldecode() once, because when javascript 'touches' the url that passes through it, it decodes the url once itself. So, just decode it once more in your PHP script to fully undo the double-encoding...
<?php
$url = urldecode($_GET['url']);
?>
If you don't do this, you'll find that the result url value in the target script is missing all the var=values following the ? question mark...
index.php?id=4
Just a simple comment, really, but if you need to encode apostrophes, you should be using rawurlencode as opposed to just urlencode.
Naturally, I figured that out the hard way.