(PHP 4 >= 4.4.3, PHP 5 >= 5.1.3)
mb_check_encoding — Check if the string is valid for the specified encoding
Checks whether the specified byte stream is valid for the specified encoding. This is useful for preventing so-called "invalid encoding attacks".
The byte stream to check. If it is omitted, this function checks all the input from the beginning of the request.
The expected encoding.
Returns TRUE on success or FALSE on failure.
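As an illustration of the description above, here is a minimal sketch that validates a value before it is used. The helper name all_valid_encoding() and the sample data are hypothetical; only mb_check_encoding() itself comes from the mbstring extension. Both arguments are passed explicitly instead of relying on the form that omits the byte stream.

```php
<?php
// Hypothetical helper: recursively checks that every string in $value
// (a string or a nested array of strings) is valid in $encoding.
function all_valid_encoding($value, $encoding = 'UTF-8')
{
    if (is_array($value)) {
        foreach ($value as $key => $item) {
            // Array keys can carry malformed bytes too, so check them as well.
            if (is_string($key) && !mb_check_encoding($key, $encoding)) {
                return false;
            }
            if (!all_valid_encoding($item, $encoding)) {
                return false;
            }
        }
        return true;
    }
    // Scalars are cast to string before checking.
    return mb_check_encoding((string) $value, $encoding);
}

var_dump(mb_check_encoding("caf\xC3\xA9", 'UTF-8')); // bool(true): "é" encoded as C3 A9
var_dump(mb_check_encoding("caf\xE9", 'UTF-8'));     // bool(false): lone lead byte E9

// Sample data standing in for request input such as $_GET or $_POST.
$input = array('name' => "caf\xC3\xA9", 'tags' => array('ok', "bad\xE9"));
var_dump(all_valid_encoding($input)); // bool(false)
```

Rejecting the whole input on the first invalid string is one possible policy; an application might instead prefer to report which field failed.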
To check whether a string is correctly encoded in UTF-8, I suggest the following function, which implements RFC 3629 more strictly than mb_check_encoding():
<?php
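// Returns true if $str is well-formed UTF-8 as defined by RFC 3629.
function check_utf8($str) {
    $len = strlen($str);
    for ($i = 0; $i < $len; $i++) {
        $c = ord($str[$i]);
        if ($c < 0x80) continue; // single-byte (ASCII) character
        // Allowed range for the first continuation byte; narrowed below
        // to reject overlong forms, surrogates and values above U+10FFFF.
        $min = 0x80;
        $max = 0xBF;
        if ($c >= 0xC2 && $c <= 0xDF) {
            $bytes = 2;
        } elseif ($c >= 0xE0 && $c <= 0xEF) {
            $bytes = 3;
            if ($c == 0xE0) $min = 0xA0; // overlong 3-byte forms
            if ($c == 0xED) $max = 0x9F; // UTF-16 surrogates U+D800..U+DFFF
        } elseif ($c >= 0xF0 && $c <= 0xF4) {
            $bytes = 4;
            if ($c == 0xF0) $min = 0x90; // overlong 4-byte forms
            if ($c == 0xF4) $max = 0x8F; // above U+10FFFF
        } else {
            return false; // stray continuation byte, 0xC0, 0xC1 or 0xF5..0xFF
        }
        if (($i + $bytes) > $len) return false; // truncated sequence
        while ($bytes > 1) {
            $i++;
            $b = ord($str[$i]);
            if ($b < $min || $b > $max) return false;
            $min = 0x80; // later continuation bytes may use the full range
            $max = 0xBF;
            $bytes--;
        }
    }
    return true;
} // end of check_utf8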
?>
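The byte ranges that RFC 3629 allows can also be written as a single byte-oriented pattern, which may be easier to compare against the RFC's grammar than a hand-written loop. The following is a sketch under that assumption, not a replacement for the function above; the name is_wellformed_utf8() is hypothetical.

```php
<?php
// Sketch: RFC 3629 well-formedness rules written as a byte-oriented regex.
// No u modifier is used, so PCRE matches raw bytes, not decoded characters.
function is_wellformed_utf8($str)
{
    $pattern = '/\A(?:
          [\x00-\x7F]                        # 1 byte:  U+0000..U+007F
        | [\xC2-\xDF][\x80-\xBF]             # 2 bytes: U+0080..U+07FF
        | \xE0[\xA0-\xBF][\x80-\xBF]         # 3 bytes, overlong forms excluded
        | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}  # 3 bytes, general case
        | \xED[\x80-\x9F][\x80-\xBF]         # 3 bytes, surrogates excluded
        | \xF0[\x90-\xBF][\x80-\xBF]{2}      # 4 bytes, overlong forms excluded
        | [\xF1-\xF3][\x80-\xBF]{3}          # 4 bytes, general case
        | \xF4[\x80-\x8F][\x80-\xBF]{2}      # 4 bytes, up to U+10FFFF
    )*+\z/x';
    // preg_match() returns 1 on a match, 0 on no match and false on error.
    return preg_match($pattern, $str) === 1;
}

var_dump(is_wellformed_utf8("\xF0\x9F\x98\x80")); // bool(true): U+1F600
var_dump(is_wellformed_utf8("\xC0\x80"));         // bool(false): overlong encoding of NUL
var_dump(is_wellformed_utf8("\xED\xA0\x80"));     // bool(false): surrogate U+D800
```

Because a PCRE error also yields false here, very long inputs that exceed the configured PCRE limits would be reported as invalid rather than raising an error.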
mb_check_encoding() does not check for bad byte sequences; it only checks whether the byte stream is valid. To verify that an encoded string is valid (i.e. that it does not contain any bad byte sequences), convert it to another encoding and back, then compare the result with the original:
<?php
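/* check a string's encoded value by converting it and converting it back */
function checkEncoding ( $string, $string_encoding )
{
    /* UTF-8 and UTF-32 are round-tripped through each other; any other
       encoding is converted to itself */
    if ( $string_encoding == 'UTF-8' ) $via = 'UTF-32';
    elseif ( $string_encoding == 'UTF-32' ) $via = 'UTF-8';
    else $via = $string_encoding;
    return $string === mb_convert_encoding ( mb_convert_encoding ( $string, $via, $string_encoding ), $string_encoding, $via );
}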
/* test 1 variables */
$string = "\x00\x81";
$encoding = "Shift_JIS";

/* test 1 mb_check_encoding (test for bad byte stream) */
if ( true === mb_check_encoding ( $string, $encoding ) )
{
    echo 'valid (' . $encoding . ') encoded byte stream!<br />';
}
else
{
    echo 'invalid (' . $encoding . ') encoded byte stream!<br />';
}

/* test 1 checkEncoding (test for bad byte sequence(s)) */
if ( true === checkEncoding ( $string, $encoding ) )
{
    echo 'valid (' . $encoding . ') encoded byte sequence!<br />';
}
else
{
    echo 'invalid (' . $encoding . ') encoded byte sequence!<br />';
}

/* test 2 variables */
$string = "\x00\xE3";
$encoding = "UTF-8";

/* test 2 mb_check_encoding (test for bad byte stream) */
if ( true === mb_check_encoding ( $string, $encoding ) )
{
    echo 'valid (' . $encoding . ') encoded byte stream!<br />';
}
else
{
    echo 'invalid (' . $encoding . ') encoded byte stream!<br />';
}

/* test 2 checkEncoding (test for bad byte sequence(s)) */
if ( true === checkEncoding ( $string, $encoding ) )
{
    echo 'valid (' . $encoding . ') encoded byte sequence!<br />';
}
else
{
    echo 'invalid (' . $encoding . ') encoded byte sequence!<br />';
}
?>
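The two tests above repeat the same steps, so the comparison can also be sketched as a data-driven loop. This is only an illustration: round_trip_ok() is a hypothetical helper that always converts through UTF-32, which differs slightly from checkEncoding(), and the sample list reuses the byte strings from the note plus one control case. The output is not reproduced here because it can differ between PHP versions.

```php
<?php
// Hypothetical helper: true if $string survives a conversion from
// $encoding to UTF-32 and back unchanged.
function round_trip_ok($string, $encoding)
{
    $there = mb_convert_encoding($string, 'UTF-32', $encoding);
    $back  = mb_convert_encoding($there, $encoding, 'UTF-32');
    return $string === $back;
}

$samples = array(
    array("\x00\x81", 'Shift_JIS'), // test 1 from the note above
    array("\x00\xE3", 'UTF-8'),     // test 2 from the note above
    array('plain ASCII', 'UTF-8'),  // control case
);

foreach ($samples as $sample) {
    list($string, $encoding) = $sample;
    // Print the bytes in hex, the encoding and both verdicts side by side.
    printf(
        "%-12s %-10s mb_check_encoding=%s round_trip=%s\n",
        bin2hex($string),
        $encoding,
        var_export(mb_check_encoding($string, $encoding), true),
        var_export(round_trip_ok($string, $encoding), true)
    );
}
```

A round trip through Unicode may report a valid string as changed in encodings where several byte sequences map to the same character, so a mismatch is a hint rather than proof of a bad sequence.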