(PHP 4, PHP 5)
ord — Returns the ASCII value of a character
Returns the ASCII value of the first character of string.
This function is the counterpart of chr().
A character.
Returns the ASCII value as an integer.
Example #1 ord() example
<?php
$str = "\n";
if (ord($str) == 10) {
echo "The first character of \$str is a line feed.\n";
}
?>
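As a further small sketch (not part of the original example): ord() looks only at the first byte of its argument, and chr() maps the resulting integer back to a one-byte string. The variable names are illustrative.

<?php
// ord() inspects only the first byte of the string.
$first = ord("ABC");   // 65, the value of "A"
// chr() turns the integer back into a one-byte string.
$back = chr($first);   // "A"
echo $first, " ", $back, "\n";
?>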
For anyone having trouble trying to detect the encoding of a string because PHP provides no easy way to see the characters (and byte values) of a string, here's a function that returns the characters and byte values for the ASCII and UTF-8 encodings:
<?php
function hex_chars($data) {
$mb_chars = '';
$mb_hex = '';
for ($i=0; $i<mb_strlen($data, 'UTF-8'); $i++) {
$c = mb_substr($data, $i, 1, 'UTF-8');
$mb_chars .= '{'. ($c). '}';
$o = unpack('N', mb_convert_encoding($c, 'UCS-4BE', 'UTF-8'));
$mb_hex .= '{'. hex_format($o[1]). '}';
}
$chars = '';
$hex = '';
for ($i=0; $i<strlen($data); $i++) {
$c = substr($data, $i, 1);
$chars .= '{'. ($c). '}';
$hex .= '{'. hex_format(ord($c)). '}';
}
return array(
'data' => $data,
'chars' => $chars,
'hex' => $hex,
'mb_chars' => $mb_chars,
'mb_hex' => $mb_hex,
);
}
function hex_format($o) {
$h = strtoupper(dechex($o));
$len = strlen($h);
if ($len % 2 == 1)
$h = "0$h";
return $h;
}
?>
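When mbstring is not available, the raw byte values alone can still be listed with bin2hex(). A minimal sketch; the helper name dump_bytes() is made up for this illustration and is not part of the note above:

<?php
// Hypothetical helper: returns the bytes of $data as space-separated hex pairs.
function dump_bytes($data) {
    if ($data === '') {
        return '';
    }
    return strtoupper(implode(' ', str_split(bin2hex($data), 2)));
}
echo dump_bytes("\xC3\xA9"), "\n"; // C3 A9 (the two UTF-8 bytes of "é")
?>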
Make sure that the parameter you are passing to the ord function is a string.
<?php
$num = 12345;
// Incorrect usage of square bracket notation
if(ord($num[0]) == 0) {
echo "Not a valid ASCII character";
}
// Using the substr method will account for any data type
if(ord(substr($num,0,1)) == 0) {
echo "Not a valid ASCII character";
}
?>
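An explicit cast is another way to guarantee a string argument. A small sketch, assuming the value is a scalar:

<?php
$num = 12345;
// Cast to string first, so ord() sees the digit "1" rather than a non-string value.
$code = ord((string) $num);
echo $code, "\n"; // 49, the ASCII value of "1"
?>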
For people who are trying to create a uniord() function:
Why reinvent the wheel when there is an excellent implementation of UTF-8/Unicode conversion here:
http://iki.fi/hsivonen/php-utf8/
Here's my take on an earlier-posted UTF-8 version of ord, suitable for iterating through a string by Unicode value. The function can optionally take an index into a string, and optionally return the number of bytes consumed by a character so that you know how much to increment the index to get to the next character.
<?php
function ordUTF8($c, $index = 0, &$bytes = null)
{
$len = strlen($c);
$bytes = 0;
if ($index >= $len)
return false;
$h = ord($c[$index]);
if ($h <= 0x7F) {
$bytes = 1;
return $h;
}
else if ($h < 0xC2)
return false;
else if ($h <= 0xDF && $index < $len - 1) {
$bytes = 2;
return ($h & 0x1F) << 6 | (ord($c[$index + 1]) & 0x3F);
}
else if ($h <= 0xEF && $index < $len - 2) {
$bytes = 3;
return ($h & 0x0F) << 12 | (ord($c[$index + 1]) & 0x3F) << 6
| (ord($c[$index + 2]) & 0x3F);
}
else if ($h <= 0xF4 && $index < $len - 3) {
$bytes = 4;
return ($h & 0x0F) << 18 | (ord($c[$index + 1]) & 0x3F) << 12
| (ord($c[$index + 2]) & 0x3F) << 6
| (ord($c[$index + 3]) & 0x3F);
}
else
return false;
}
?>
The following uniord function is simpler and more efficient than any of the ones suggested, without depending on mbstring or iconv. It also validates more strictly (code points above U+10FFFF are invalid; sequences starting with 0xC0 and 0xC1 are invalid overlong encodings of characters below U+0080), though not completely, so it still assumes well-formed input.
<?php
function uniord($c) {
$h = ord($c[0]);
if ($h <= 0x7F) {
return $h;
} else if ($h < 0xC2) {
return false;
} else if ($h <= 0xDF) {
return ($h & 0x1F) << 6 | (ord($c[1]) & 0x3F);
} else if ($h <= 0xEF) {
return ($h & 0x0F) << 12 | (ord($c[1]) & 0x3F) << 6
| (ord($c[2]) & 0x3F);
} else if ($h <= 0xF4) {
return ($h & 0x0F) << 18 | (ord($c[1]) & 0x3F) << 12
| (ord($c[2]) & 0x3F) << 6
| (ord($c[3]) & 0x3F);
} else {
return false;
}
}
?>
If you want a fully validating function, you'll need to check that the continuation octets are between 0x80 and 0xBF, and that there are no overlong encodings (code points below the minimum value for an n-octet sequence).
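Those extra checks could look roughly like the following. This is a sketch only, assuming the argument starts with the character to decode; the name uniord_strict() is made up here, and the rejection of UTF-16 surrogates (U+D800 to U+DFFF) is an addition beyond the checks listed above.

<?php
// Hypothetical strict variant: returns the code point, or false on malformed input.
function uniord_strict($s) {
    $len = strlen($s);
    if ($len === 0) {
        return false;
    }
    $h = ord($s[0]);
    if ($h <= 0x7F) {
        return $h;                                   // plain ASCII
    }
    // Sequence length, payload bits of the lead byte, smallest legal code point.
    if ($h >= 0xC2 && $h <= 0xDF) {
        $n = 2; $cp = $h & 0x1F; $min = 0x80;
    } elseif ($h >= 0xE0 && $h <= 0xEF) {
        $n = 3; $cp = $h & 0x0F; $min = 0x800;
    } elseif ($h >= 0xF0 && $h <= 0xF4) {
        $n = 4; $cp = $h & 0x07; $min = 0x10000;
    } else {
        return false;                                // stray continuation byte or invalid lead byte
    }
    if ($len < $n) {
        return false;                                // truncated sequence
    }
    for ($i = 1; $i < $n; $i++) {
        $b = ord($s[$i]);
        if ($b < 0x80 || $b > 0xBF) {
            return false;                            // not a continuation octet
        }
        $cp = ($cp << 6) | ($b & 0x3F);
    }
    if ($cp < $min || $cp > 0x10FFFF || ($cp >= 0xD800 && $cp <= 0xDFFF)) {
        return false;                                // overlong, out of range, or surrogate
    }
    return $cp;
}
var_dump(uniord_strict("\xE2\x82\xAC")); // int(8364), U+20AC
var_dump(uniord_strict("\xE0\x80\x80")); // bool(false), overlong encoding
?>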
An update for ng4rrjanbiah's uniord() in #46267. Again, no mbstring requirement - this is just a code cleanup. Longer, but prettier - I think, anyway ;)
<?php
function uniord($ch) {
$n = ord($ch[0]);
if ($n < 128) {
return $n; // no conversion required
}
if ($n < 192 || $n > 253) {
return false; // bad first byte || out of range
}
$arr = array(1 => 192, // byte position => range from
2 => 224,
3 => 240,
4 => 248,
5 => 252,
);
foreach ($arr as $key => $val) {
if ($n >= $val) { // add byte to the 'char' array
$char[] = ord($ch[$key]) - 128;
$range = $val;
} else {
break; // save some e-trees
}
}
$retval = ($n - $range) * pow(64, sizeof($char));
foreach ($char as $key => $val) {
$pow = sizeof($char) - ($key + 1); // invert key
$retval += $val * pow(64, $pow); // dark magic
}
return $retval;
}
?>
I found I wanted to sanitize a string for certain ASCII/ANSI characters, but to leave Unicode alone. Since ord() breaks when processing Unicode, I drew up these two functions to help with a sanitizer which looks at ordinal values. (Finding "pack" and "unpack" was much better than my own powers-of-256 code.)
<?php
/*
By Darien Hager, Jan 2007... Use however you wish, but please
give credit in source comments.
Change "UTF-8" to whichever encoding you are expecting to use.
*/
function ords_to_unistr($ords, $encoding = 'UTF-8'){
// Turns an array of ordinal values into a string of unicode characters
$str = '';
for($i = 0; $i < sizeof($ords); $i++){
// Pack this number into a 4-byte string
// (Or multiple one-byte strings, depending on context.)
$v = $ords[$i];
$str .= pack("N",$v);
}
$str = mb_convert_encoding($str,$encoding,"UCS-4BE");
return($str);
}
function unistr_to_ords($str, $encoding = 'UTF-8'){
// Turns a string of unicode characters into an array of ordinal values,
// Even if some of those characters are multibyte.
$str = mb_convert_encoding($str,"UCS-4BE",$encoding);
$ords = array();
// Visit each unicode character
for($i = 0; $i < mb_strlen($str,"UCS-4BE"); $i++){
// Now we have 4 bytes. Find their total
// numeric value.
$s2 = mb_substr($str,$i,1,"UCS-4BE");
$val = unpack("N",$s2);
$ords[] = $val[1];
}
return($ords);
}
?>
I was thinking about a way to hash a string with md5 loosely, so that, unlike with plain md5(), "HELLO" and "Hello" give the same hash. In my case it is about CD titles submitted by users. So I wrote some functions that transform my string into exactly what I want.
This one is the "call" function returning the "loose hash".
It keeps only the letters and digits of a string, converts them to uppercase and then hashes the result with md5.
function loosehash($string){
return md5(strtoupper(onlyChars($string)));
}
This one moves through the string like a char array and checks the ASCII values; you can edit the values and conditions to fit your needs.
function onlyChars($string){
$strlength = strlen($string);
$retString = "";
for($i = 0; $i < $strlength; $i++){
if((ord($string[$i]) >= 48 && ord($string[$i]) <= 57) ||
(ord($string[$i]) >= 65 && ord($string[$i]) <= 90) ||
(ord($string[$i]) >= 97 && ord($string[$i]) <= 122)){
$retString .= $string[$i];
}
}
return $retString;
}
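The same idea can be sketched more compactly with a regular expression. The function name loose_hash() is made up for this illustration; it keeps the same three ASCII ranges (digits, uppercase and lowercase letters) as the loop above.

<?php
// Hypothetical helper: keep only ASCII letters and digits, uppercase them, then hash.
function loose_hash($string) {
    $filtered = preg_replace('/[^0-9A-Za-z]/', '', $string);
    return md5(strtoupper($filtered));
}
// Differences in case, spacing and punctuation no longer change the hash.
var_dump(loose_hash("Hello, World!") === loose_hash("HELLO WORLD")); // bool(true)
?>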
A small improvement to the function by v0rbiz at yahoo dot com:
<?php
function uniord($u) {
$c = unpack("N", mb_convert_encoding($u, 'UCS-4BE', 'UTF-8'));
return $c[1];
}
?>
However, you still need mbstring.
You can use the following code to generate a random string with a length between $x and $y...
$x = 1; //minimum length
$y = 10; //maximum length
$len = rand($x,$y); //get a random string length
$string = ''; //start with an empty string
for ($i = 0; $i < $len; $i++) { //loop $len no. of times
$whichChar = rand(1,3); //choose if it's a digit, a capital or a small letter
if ($whichChar == 1) { //it's a digit
$string .= chr(rand(48,57)); //randomly generate a digit (0-9)
}
elseif ($whichChar == 2) { //it's a capital letter
$string .= chr(rand(65,90)); //randomly generate an uppercase letter (A-Z)
}
else { //it's a small letter
$string .= chr(rand(97,122)); //randomly generate a lowercase letter (a-z)
}
}
echo $string; //echo out the generated string
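Wrapped in a function, the same technique might look like the sketch below. The name random_alnum() and the character pool are assumptions of this illustration. Unlike the code above, it draws uniformly from all 62 characters instead of first choosing a character class.

<?php
// Hypothetical helper: random alphanumeric string with a length between $min and $max.
// mt_rand() is not cryptographically secure; do not use this for passwords or tokens.
function random_alnum($min, $max) {
    $pool = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz';
    $len = mt_rand($min, $max);
    $out = '';
    for ($i = 0; $i < $len; $i++) {
        $out .= $pool[mt_rand(0, strlen($pool) - 1)];
    }
    return $out;
}
echo random_alnum(1, 10), "\n";
?>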
Here is a new character table; the code below prints it.
<?php
$color = "#f1f1f1";
echo "<center>";
echo "<h1>From 32 To 255 Characters Table</h1>";
echo "</center>";
echo "<table border=\"0\" style=\"font-family:verdana;font-size:11px;\"".
" align=\"center\" width=\"800\"><tr style=\"font-weight:bold;\" ".
"bgcolor=\"#99cccc\">".
"<td width=\"15\">DEC</td><td width=\"15\">OCT</td>".
"<td width=\"15\">HEX</td><td width=\"15\">CHR</td>".
"<td width=\"15\">DEC</td><td width=\"15\">OCT</td>".
"<td width=\"15\">HEX</td><td width=\"15\">CHR</td>".
"<td width=\"15\">DEC</td><td width=\"15\">OCT</td>".
"<td width=\"15\">HEX</td><td width=\"15\">CHR</td>".
"<td width=\"15\">DEC</td><td width=\"15\">OCT</td>".
"<td width=\"15\">HEX</td><td width=\"15\">CHR</td>".
"<td width=\"15\">DEC</td><td width=\"15\">OCT</td>".
"<td width=\"15\">HEX</td><td width=\"15\">CHR</td>".
"<td width=\"15\">DEC</td><td width=\"15\">OCT</td>".
"<td width=\"15\">HEX</td><td width=\"15\">CHR</td>".
"</tr><tr>";
$ii = 0;
for ($i=32;$i<=255;$i++){
$char = chr($i);
$dec = ord($char);
if ($i == "32") {
$char = "Space";
}
echo "<td style=\"background-color:$color;width:15px;\">".
$dec."</td>\n<td style=\"background-color:$color;".
"width:15px;text-align:left;\">".decoct($dec)."</td>\n".
"<td style=\"background-color:$color;width:15px;".
"text-align:left;\">".dechex($dec)."</td>\n ".
"<td style=\"background-color:$color;width:15px;".
"text-align:left;\"><b>".$char."</b></td>\n ";
if (($ii % 6) == 5) {
echo "</tr>\n<tr>\n";
}
if (($ii % 2) == 1) {
$color = "#f1f1f1";
}else {
$color = "#ffffcc";
}
$ii++;
}
echo "</tr></table>";
?>
I was happy to find Matthew's function for replacing those nasty Word copy-and-paste characters, but I had problems using it, since it does not perform well when cleaning large amounts of text.
Therefore I implemented this function as a replacement. It basically does the same job, but uses only one call to preg_replace, for whom it may concern :)
Note: the \xnn values are the hex values of the corresponding character codes. The following implementation matched my needs; feel free to correct me.
function clean_string_input($input)
{
$search = array(
'/[\x60\x82\x91\x92\xb4\xb8]/i', // single quotes
'/[\x84\x93\x94]/i', // double quotes
'/[\x85]/i', // ellipsis ...
'/[\x00-\x0d\x0b\x0c\x0e-\x1f\x7f-\x9f]/i' // control characters and the remaining 0x7f-0x9f bytes
);
$replace = array(
'\'',
'"',
'...',
''
);
return preg_replace($search,$replace,$input);
}
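For the plain one-to-one substitutions, strtr() is an alternative that needs no regular expressions. A sketch under the assumption that the input is a single-byte Windows-1252 string; on UTF-8 input these byte values occur inside multibyte sequences and would be corrupted. The function name is made up for this illustration.

<?php
// Hypothetical helper: maps Windows-1252 "smart" punctuation bytes to ASCII.
// Assumes $input is a single-byte (Windows-1252) string, not UTF-8.
function word_punctuation_to_ascii($input) {
    return strtr($input, array(
        "\x91" => "'",   // left single quote
        "\x92" => "'",   // right single quote
        "\x93" => '"',   // left double quote
        "\x94" => '"',   // right double quote
        "\x85" => '...', // ellipsis
    ));
}
echo word_punctuation_to_ascii("\x93Hi\x94\x85"), "\n"; // "Hi"...
?>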
I wrote the following function to clean illegal characters from input strings.
(Background: I have a php-based news website. People were writing articles in MS Word, then copy-and-pasting the text into the website. Word uses non-standard characters for opening and closing quotes and double-quotes, and for "..." - and this was resulting in articles on the website that failed XHTML validation)
<?php
function clean_string_input($input)
{
$interim = strip_tags($input);
if(get_magic_quotes_gpc())
{
$interim=stripslashes($interim);
}
// now check for pure ASCII input
// special characters that might appear here:
// 96: opening single quote (not strictly illegal, but substitute anyway)
// 145: opening single quote
// 146: closing single quote
// 147: opening double quote
// 148: closing double quote
// 133: ellipsis (...)
// 163: pound sign (this is safe, so no substitution required)
// these can be substituted for safe equivalents
$result = '';
for ($i=0; $i<strlen($interim); $i++)
{
$char = $interim[$i];
$asciivalue = ord($char);
if ($asciivalue == 96)
{
$result .= "'";
}
else if (($asciivalue > 31 && $asciivalue < 127) ||
($asciivalue == 163) || // pound sign
($asciivalue == 10) || // lf
($asciivalue == 13)) // cr
{
// it's already safe ASCII
$result .= $char;
}
else if ($asciivalue == 145) // opening single quote
{
$result .= "'";
}
else if ($asciivalue == 146) // closing single quote
{
$result .= "'";
}
else if ($asciivalue == 147) // opening double quote
{
$result .= '"';
}
else if ($asciivalue == 148) // closing double quote
{
$result .= '"';
}
else if ($asciivalue == 133) // ellipsis
{
$result .= '...';
}
}
return $result;
}
?>
For getting the UTF-16 value of Chinese ideographs:
Please always use the uniord() by ng4rrjanbiah at rediffmail dot com (#46267), and do not use the method from #42778.
The native mbstring extension has problems with Chinese character sets. For Simplified Chinese it can (as far as I know) only deal with characters in the GB2312 charset, which means a lot of personal names will fail.
For example, if you pass U+97E1 to the uniord() provided by #42778, the value you get is 'E1', which is incorrect: the '97' part is dropped by mbstring during conversion.
Another suggestion: never use mb_convert_encoding for Simplified Chinese at all. Use iconv instead. I am the one in greater trouble with it, because my own name gets lost every time mbstring tries to convert something.
A function to convert a Unicode string to an ASCII string for the Vietnamese language, according to the VIQR convention (http://www.vietstd.org/). If you have the same kind of convention, just create the arrays. The contents of my arrays were too large to post here.
//ascii-sign-characters
$mapsigns2 =Array("sign");
//ascii-sign-characters
$mapsigns1 =Array("sign");
//ascii-letters-characters
$mapascii =Array("letters");
//unicode characters
$mapunicode = Array("unicode");
function uniword2ascii($str)
{
global $mapsigns1, $mapsigns2,$mapunicode, $mapascii;
$length = strlen($str);
$ReturnStr = "";
for ($i=0; $i<$length; $i++)
{
$uniord = 0;
$b = ord($str[$i]); // lead byte of the current sequence
if ($b <= 127)
$uniord = $b;
elseif ($b >= 192 && $b <= 223)
{
$uniord = ($b-192)*64 + (ord($str[$i+1])-128);
$i = $i+1;
}
elseif ($b >= 224 && $b <= 239)
{
$uniord = ($b-224)*4096 + (ord($str[$i+1])-128)*64 + (ord($str[$i+2])-128);
$i = $i+2;
}
elseif ($b >= 240 && $b <= 247)
{
$uniord = ($b-240)*262144 + (ord($str[$i+1])-128)*4096 + (ord($str[$i+2])-128)*64 + (ord($str[$i+3])-128);
$i = $i+3;
}
elseif ($b >= 248 && $b <= 251)
{
$uniord = ($b-248)*16777216 + (ord($str[$i+1])-128)*262144 + (ord($str[$i+2])-128)*4096 + (ord($str[$i+3])-128)*64 + (ord($str[$i+4])-128);
$i = $i+4;
}
elseif ($b >= 252 && $b <= 253)
{
$uniord = ($b-252)*1073741824 + (ord($str[$i+1])-128)*16777216 + (ord($str[$i+2])-128)*262144 + (ord($str[$i+3])-128)*4096 + (ord($str[$i+4])-128)*64 + (ord($str[$i+5])-128);
$i = $i+5;
}
elseif ($b >= 254) //error
$uniord = 0;
//This part is for converting the string to a VIQR-string;
if ($uniord > 127 )
{
$key = array_search($uniord,$mapunicode);
if ($key !== false) // array_search() returns false when nothing matches; key 0 is a valid hit
{
$ReturnStr .= chr($mapascii[$key]) . chr($mapsigns1[$key]) . chr($mapsigns2[$key]);
}
else
{
$ReturnStr .= chr($uniord);
}
}
else
{
$ReturnStr .= chr($uniord);
}
//endOFfor
}
return $ReturnStr;
}
Best regards/Nguyen Van Nhu
The email-encoding function, a little extended:
# $strEmail: the e-mail address to encode.
# $strDisplay: what will be displayed in the browser. If empty, the encoded e-mail address is used as its value.
# $blnCreateLink: set to true to create a link. Set to false (with an empty $strDisplay) to just display the e-mail address.
function asciiEncodeEmail($strEmail,$strDisplay,$blnCreateLink) {
$strMailto = "mailto:";
$strEncodedEmail = "";
for ($i=0; $i < strlen($strEmail); $i++) {
// ord() only reads the first byte of the remaining substring
$strEncodedEmail .= "&#".ord(substr($strEmail,$i)).";";
}
if(strlen(trim($strDisplay)) == 0) {
$strDisplay = $strEncodedEmail;
}
if($blnCreateLink) {
return "<a href=\"".$strMailto.$strEncodedEmail."\">".$strDisplay."</a>";
}
else {
return $strDisplay;
}
}
#examples:
echo asciiEncodeEmail("yourname@yourdomain.com","Your Name",true);
echo "<br />";
echo asciiEncodeEmail("yourname@yourdomain.com","",true);
echo "<br />";
echo asciiEncodeEmail("yourname@yourdomain.com","",false);
Contrary to what jacobfri says below, ord does not use any particular character encoding. It simply converts bytes from a string type to the corresponding integer. These bytes could be characters in ASCII, or some other 7/8-bit code (e.g. ISO-8859-1), or bytes making up characters in some multibyte code (e.g. UTF-8). It all depends on what character encoding your application is using.
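A short sketch of that point: the same character gives different ord() results depending on how the application has encoded it. The bytes are written as explicit escapes so the result does not depend on the encoding of the source file.

<?php
// The character "é" in two encodings, written as explicit bytes.
$latin1 = "\xE9";       // ISO-8859-1: one byte
$utf8   = "\xC3\xA9";   // UTF-8: two bytes
echo ord($latin1), "\n";  // 233
echo ord($utf8), "\n";    // 195, the first byte only
echo strlen($utf8), "\n"; // 2
?>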
Function using ord() to strip out garbage characters and punctuation from a string. This is handy when trying to be smart about what an input is "trying" to be..
<?php
function cleanstr($string){
$ret = "";
$len = strlen($string);
for($a=0; $a<$len; $a++){
$p = ord($string[$a]);
# chr(32) is space, it is preserved..
# note: 65-122 also covers the six characters between "Z" and "a" ([ \ ] ^ _ `)
if (($p > 64 && $p < 123) || $p == 32) {
$ret .= $string[$a];
}
}
return $ret;
}
?>
[Fixed a bug in my previous note; ord() was missing in the first condition]
A uniord() function like the one by "v0rbiz at yahoo dot com" (note #42778), but without using the mbstring extension. Note: if the passed character is not valid, it may raise an "Uninitialized string offset" notice (you may set error reporting to 0).
<?php
/**
* @Algorithm: http://www1.tip.nl/~t876506/utf8tbl.html
* @Logic: UTF-8 to Unicode conversion
**/
function uniord($c)
{
$ud = 0;
$h = ord($c[0]); // lead byte
if ($h <= 127)
$ud = $h;
if ($h>=192 && $h<=223)
$ud = ($h-192)*64 + (ord($c[1])-128);
if ($h>=224 && $h<=239)
$ud = ($h-224)*4096 + (ord($c[1])-128)*64 + (ord($c[2])-128);
if ($h>=240 && $h<=247)
$ud = ($h-240)*262144 + (ord($c[1])-128)*4096 + (ord($c[2])-128)*64 + (ord($c[3])-128);
if ($h>=248 && $h<=251)
$ud = ($h-248)*16777216 + (ord($c[1])-128)*262144 + (ord($c[2])-128)*4096 + (ord($c[3])-128)*64 + (ord($c[4])-128);
if ($h>=252 && $h<=253)
$ud = ($h-252)*1073741824 + (ord($c[1])-128)*16777216 + (ord($c[2])-128)*262144 + (ord($c[3])-128)*4096 + (ord($c[4])-128)*64 + (ord($c[5])-128);
if ($h>=254) //error
$ud = false;
return $ud;
}
//debug
echo uniord('A'); //65
echo uniord("\xe0\xae\xb4"); //2996
?>
HTH,
R. Rajesh Jeba Anbiah
In reply to jacobfri, the range 128-255 equals to whatever character set you are using.
Try toying with header('Content-type: text/html; charset=iso-8859-1'); and replacing that iso-8859-1 with values like cp850, and suddenly your bytes might start looking surprisingly similar to the ones at www.asciitable.com.
Strings don't store glyphs. They only store bytes. It is up to your UA/terminal to decide which alphabet those bytes are rendered as.
That's where the character set/encoding steps in.
Just to get things straight about which character table ord() and chr() use.
The range 128-255 is _not_ equivalent to the widely used extended ASCII table, like the one described at www.asciitable.com. The actual equivalent is the 128-255 range of Unicode.
That's a good thing, because then ord() and chr() are compatible with JavaScript and any other language that uses Unicode.
But it's good to know, and the description of ord() is somewhat misleading when it refers only to www.asciitable.com.
I did not find a Unicode/multibyte-capable 'ord' function, so...
function uniord($u) {
$k = mb_convert_encoding($u, 'UCS-2LE', 'UTF-8');
$k1 = ord(substr($k, 0, 1));
$k2 = ord(substr($k, 1, 1));
return $k2 * 256 + $k1;
}
If you're looking to provide bare bones protection to email addresses posted to the web try this:
<?php
$string = 'arjini@mac.com';
$finished = '';
for($i=0;$i<strlen($string);++$i){
$n = rand(0,1);
if($n)
$finished.='&#x'.sprintf("%X",ord($string[$i])).';';
else
$finished.='&#'.ord($string[$i]).';';
}
echo $finished;
?>
This randomly encodes a mix of hex and ordinary (decimal) HTML entities for every character in the address. Note that a decoding mechanism for this could probably be written just as easily, so eventually the bots will be able to cut through this like butter, but for now, it seems like most harvesters are only looking for non-hex HTML entities.
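To illustrate how easily such output is decoded: PHP's own html_entity_decode() resolves both entity forms. The sample string below is a made-up short input, not real output of the snippet above.

<?php
// A mix of decimal and hexadecimal entities, in the style produced above.
$encoded = '&#97;&#x40;&#98;';
// html_entity_decode() resolves both numeric forms.
$decoded = html_entity_decode($encoded, ENT_QUOTES, 'UTF-8');
echo $decoded, "\n"; // a@b
?>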
In the notes for bin2hex, there is a function for hex encoding an email address in a "mailto" tag to avoid spam bots. The hex encoding works in the anchor tag itself, but not for the link text. To display the email address as the link text, you can use the function below to ascii encode the address and keep the spam bots at bay.
function ascii_encode($string) {
$encoded = '';
for ($i=0; $i < strlen($string); $i++) {
$encoded .= '&#'.ord(substr($string,$i)).';';
}
return $encoded;
}
The above comment about bindec is wrong, I think. bindec accepts a string containing a binary number, not a "binary string" - right?
We need to clean up this terminology.
If you just want to extract a dword/long int from a binary string, this is more accurate (Intel byte order, i.e. little-endian):
$Number = ord($Buffer[0]) | (ord($Buffer[1])<<8) | (ord($Buffer[2])<<16) | (ord($Buffer[3])<<24);
If you find that you use this function a lot, you really ought to consider doing whatever you're doing in C instead. :-)
At least it takes more effort to type "ord($var)-ord('a')" than "var - 'a'".
But hey, you can't get everything you wish for, and at least PHP scripts very seldom manage to segfault. :-D
[Ed: bindec() does this for you... it just doesn't handle the sign bit. Your solution will result in a float when the sign bit is set!
http://www.php.net/bindec
--jeroen ]
This one took me a while to work out; in the end a friend told me. If you are working out the value of a signed 32-bit integer stored in $data, then this is for you ;-)
$num=ord($data[0])+(ord($data[1])<<8)+(ord($data[2])<<16)+(ord($data[3])<<24);
if ($num>=2147483648){ // top bit set: the value is negative in two's complement
$num-=4294967296;
}
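The same conversion can also be sketched with unpack(), whose "V" format reads an unsigned 32-bit little-endian value. The function name read_int32_le() is made up for this illustration, and the sketch assumes $data holds at least four bytes.

<?php
// Hypothetical helper: reads a signed 32-bit little-endian integer from the first 4 bytes.
function read_int32_le($data) {
    $parts = unpack('V', substr($data, 0, 4));
    $n = $parts[1];
    // On 64-bit PHP the value arrives unsigned, so fold the upper half into negatives.
    if ($n >= 2147483648) {
        $n -= 4294967296;
    }
    return $n;
}
echo read_int32_le("\x2A\x00\x00\x00"), "\n"; // 42
echo read_int32_le("\xFF\xFF\xFF\xFF"), "\n"; // -1
?>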