PHP Doku:: Decode HTML numeric string reference to character - function.mb-decode-numericentity.html


A service by Reinhard Neidl - Webprogrammierung.

Multibyte String Functions

mb_decode_numericentity

(PHP 4 >= 4.0.6, PHP 5)

mb_decode_numericentity - Decode HTML numeric string reference to character

Description

string mb_decode_numericentity ( string $str , array $convmap [, string $encoding ] )

Converts the numeric string references in str that fall within the code block specified by convmap into the corresponding characters.

Parameters

str

The string being decoded.

convmap

convmap is an array that specifies the code area to convert.
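
As a minimal sketch of how the code area restricts decoding, the snippet below uses a convmap that covers only code points from 0x80 upward. The range, the mask value 0x1FFFFF and the sample string are assumptions chosen for illustration, not values from the manual.

```php
<?php
// Assumed convmap: start 0x80, end 0x10FFFF, offset 0, mask 0x1FFFFF.
$convmap = array(0x80, 0x10FFFF, 0, 0x1FFFFF);

// &#65; is U+0041 (outside the range), &#233; is U+00E9 (inside the range).
$input  = '&#65; &#233;';
$output = mb_decode_numericentity($input, $convmap, 'UTF-8');

// Only the reference inside the code area is turned into a character;
// the other one is left in the string as written.
echo $output, "\n";
```

Leaving the ASCII range out of the convmap in this way keeps references such as &#60; from turning into markup characters.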

encoding

The encoding parameter specifies the character encoding. If it is omitted, the internal character encoding is used.

Return Values

The converted string.

Examples

Example #1 convmap example

$convmap = array (
   int start_code1, int end_code1, int offset1, int mask1,
   int start_code2, int end_code2, int offset2, int mask2,
   ........
   int start_codeN, int end_codeN, int offsetN, int maskN );
// Specify Unicode value for start_codeN and end_codeN
// Add offsetN to value and take bit-wise 'AND' with maskN, 
// then convert value to numeric string reference.
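
The layout above is schematic. A concrete convmap with a single block might look like the following sketch, which encodes a UTF-8 string with mb_encode_numericentity() and decodes it again with the same map. The range 0x80 to 0x10FFFF, the mask 0x1FFFFF and the sample text are assumptions for illustration.

```php
<?php
// One block of four integers: start code, end code, offset, mask.
$convmap = array(0x80, 0x10FFFF, 0, 0x1FFFFF);

// "café €5" written with explicit UTF-8 bytes so the file encoding does not matter.
$text = "caf\xC3\xA9 \xE2\x82\xAC5";

// Characters inside the block become decimal references; ASCII is untouched.
$encoded = mb_encode_numericentity($text, $convmap, 'UTF-8');
echo $encoded, "\n"; // caf&#233; &#8364;5

// Decoding with the same convmap restores the original string.
$decoded = mb_decode_numericentity($encoded, $convmap, 'UTF-8');
echo $decoded, "\n";
```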

See Also


6 User Contributions:
Navi
1.04.2009 11:00
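Manual entity => UTF-8 conversion:
<?php
// Parse decimal entities
$raw = preg_replace_callback("/&#(\\d+);/u", "_pcreEntityToUtf", $raw);

function _pcreEntityToUtf($matches)
{
    $char = intval(is_array($matches) ? $matches[1] : $matches);

    if ($char < 0x80)
    {
        // Leave control characters as entities to prevent their insertion
        if ($char >= 0x20) return htmlspecialchars(chr($char));
        else return "&#$char;";
    }
    else if ($char < 0x800)
    {
        // Two-byte sequence (U+0080 to U+07FF)
        return chr(0xc0 | (0x1f & ($char >> 6))) . chr(0x80 | (0x3f & $char));
    }
    else if ($char < 0x10000)
    {
        // Three-byte sequence (U+0800 to U+FFFF)
        return chr(0xe0 | (0x0f & ($char >> 12))) . chr(0x80 | (0x3f & ($char >> 6))) . chr(0x80 | (0x3f & $char));
    }
    else
    {
        // Four-byte sequence (U+10000 and above)
        return chr(0xf0 | (0x07 & ($char >> 18))) . chr(0x80 | (0x3f & ($char >> 12))) . chr(0x80 | (0x3f & ($char >> 6))) . chr(0x80 | (0x3f & $char));
    }
}
?>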
donovan at conduit it
19.04.2006 18:05
Note that at this time mb_decode_numericentity() seems to work only with decimal entities, not hexadecimal ones. Knowing this would have saved me a good hour of debugging.

If you need to convert hex entities, first convert them all to decimal entities with a combination of preg_replace() and hexdec().
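
One way this two-step approach could look is sketched below. The helper name decode_all_numeric_entities() is hypothetical, the full-range convmap is an assumption, and preg_replace_callback() with a closure (PHP 5.3 or later) stands in for the plain preg_replace() mentioned above.

```php
<?php
// Hypothetical helper, not part of mbstring.
function decode_all_numeric_entities($text)
{
    // Step 1: rewrite hexadecimal references (&#x20AC;) as decimal ones (&#8364;).
    $text = preg_replace_callback(
        '/&#x([0-9a-f]+);/i',
        function ($m) {
            return '&#' . hexdec($m[1]) . ';';
        },
        $text
    );

    // Step 2: decode the decimal references (assumed convmap: all of Unicode).
    $convmap = array(0x0, 0x10FFFF, 0, 0x1FFFFF);
    return mb_decode_numericentity($text, $convmap, 'UTF-8');
}

// Both notations end up as the same character.
echo decode_all_numeric_entities('&#x20AC; and &#8364;'), "\n";
```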
dirk at camindo de
30.01.2005 18:51
utf8_decode() causes problems with every character that lies outside the ISO-8859-1 charset. You can work around this by calling mb_encode_numericentity() first:

  // convert $text from UTF-8 to ISO-8859-1
  $convmap = array(0x100, 0x10FFFF, 0, 0x1FFFFF);
  $text = mb_encode_numericentity($text, $convmap, "UTF-8");
  $text = utf8_decode($text);

The mb_encode_numericentity() call turns all characters above 0xFF into numeric entities; utf8_decode() then converts the rest (0x80 - 0xFF) to ISO-8859-1.
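
To make the effect visible, here is a small sketch of the same idea on a sample string. It swaps in mb_convert_encoding() for utf8_decode(), and the convmap starting at 0x100 and the sample text are assumptions made for this illustration.

```php
<?php
// "été €" as UTF-8 bytes: é fits into ISO-8859-1, € does not.
$text = "\xC3\xA9t\xC3\xA9 \xE2\x82\xAC";

// Assumed convmap: everything above 0xFF becomes a numeric entity.
$convmap = array(0x100, 0x10FFFF, 0, 0x1FFFFF);
$entities = mb_encode_numericentity($text, $convmap, 'UTF-8');

// The remaining characters (up to 0xFF) now convert without loss.
$latin1 = mb_convert_encoding($entities, 'ISO-8859-1', 'UTF-8');

// Show the bytes: each é is the single byte e9, the euro sign survives as &#8364;.
echo bin2hex($latin1), "\n";
```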
Andrew Simpson
11.12.2004 2:29
Many web browsers tend to upload high-order characters as UTF-8 encoded entities.

Here is some simple code to convert UTF-8 HTML entities within a block of text into proper characters:

<?php
// Decode decimal HTML entities added by the web browser
$body = preg_replace_callback('/&#\d{2,5};/u', 'utf8_entity_decode', $body);

// Decode hex HTML entities added by the web browser
$body = preg_replace_callback('/&#x([a-fA-F0-9]{2,8});/u', 'utf8_hex_entity_decode', $body);

// Callback for decimal entities: $matches[0] is the whole entity
function utf8_entity_decode($matches) {
    $convmap = array(0x0, 0x10000, 0, 0xfffff);
    return mb_decode_numericentity($matches[0], $convmap, 'UTF-8');
}

// Callback for hex entities: rewrite as a decimal entity, then decode it
function utf8_hex_entity_decode($matches) {
    return utf8_entity_decode(array('&#' . hexdec($matches[1]) . ';'));
}
?>
php at cNhOiSpPpAlMe dot org
31.03.2004 10:55
Here are functions to convert hankaku to zenkaku characters (and vice-versa) in Japanese text.

<?php

// Supported characters:
//    (space)
//     !#$%&()*+,./0123456789:;<=>?@
//    ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`
//    abcdefghijklmnopqrstuvwxyz{|}
// (Katakana isn't supported.)

function f_han2zen ($string, $encoding = null) {
  if (is_null($encoding)) $encoding = mb_internal_encoding();
  $convmap = array(
    0x20, 0x20, 0x3000-0x20, 0xffff,   // Space
    0x21, 0x7e, 0xff01-0x21, 0xffff);
  $temp = mb_encode_numericentity($string, $convmap, $encoding);
  $convmap = array(0, 0xffff, 0, 0xffff);
  return mb_decode_numericentity($temp, $convmap, $encoding);
}

function f_zen2han ($string, $encoding = null) {
  if (is_null($encoding)) $encoding = mb_internal_encoding();
  $convmap = array(
    0x3000, 0x3000, -(0x3000-0x20), 0xffff,   // Space
    0xff01, 0xff5e, -(0xff01-0x21), 0xffff);
  $temp = mb_encode_numericentity($string, $convmap, $encoding);
  $convmap = array(0, 0xffff, 0, 0xffff);
  return mb_decode_numericentity($temp, $convmap, $encoding);
}

// Sample usage:
f_han2zen("test", "shift_jis");
f_han2zen("test", "utf-8");

?>
dev at glossword info
19.11.2003 16:43
Just two great functions for daily use:

/* Converts numeric HTML entities into characters */
function my_numeric2character($t)
{
    $convmap = array(0x0, 0x2FFFF, 0, 0xFFFF);
    return mb_decode_numericentity($t, $convmap, 'UTF-8');
}
/* Converts characters into numeric HTML entities */
function my_character2numeric($t)
{
    $convmap = array(0x0, 0x2FFFF, 0, 0xFFFF);
    return mb_encode_numericentity($t, $convmap, 'UTF-8');
}
print my_numeric2character('&#8217; &#7936; &#226;');
print my_character2numeric(' ’ â ');



The PHP manual text and comments are covered by the Creative Commons Attribution 3.0 License © the PHP Documentation Group