mb_decode_numericentity

(PHP 4 >= 4.0.6, PHP 5)

mb_decode_numericentity — Decode HTML numeric string reference to character

Description

string mb_decode_numericentity ( string $str , array $convmap [, string $encoding ] )

Converts the numeric string references in string str that fall within a specified block to characters.

Parameters

str: The string being decoded.
convmap: An array that specifies the code area to convert.
encoding: The encoding parameter specifies the character encoding. If it is omitted, the internal character encoding is used.

Return Values

The converted string.

Examples

Example #1 convmap example

$convmap = array (
   int start_code1, int end_code1, int offset1, int mask1,
   int start_code2, int end_code2, int offset2, int mask2,
   ........
   int start_codeN, int end_codeN, int offsetN, int maskN );
// Specify Unicode values for start_codeN and end_codeN.
// When encoding, offsetN is added to the value and the result is bitwise ANDed with maskN,
// then converted to a numeric string reference.
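
The convmap layout above is schematic, so a concrete call may help. The following is a minimal sketch, assuming UTF-8 input; the sample strings and the chosen ranges are illustrative and not part of the original example. The first convmap covers every Unicode code point, while the second covers only U+0080 and above, so entities for ASCII characters should be left untouched.

```php
<?php
// One convmap entry: start code, end code, offset, mask.
// This entry covers all Unicode code points with no offset.
$all = array(0x0, 0x10FFFF, 0, 0x1FFFFF);

$encoded = 'caf&#233; &#8364;5';
$decoded = mb_decode_numericentity($encoded, $all, 'UTF-8');

// A narrower block: only entities for U+0080 and above are decoded,
// so &#60; and &#62; (the ASCII angle brackets) stay as entities.
$nonAscii = array(0x80, 0x10FFFF, 0, 0x1FFFFF);
$partial  = mb_decode_numericentity('&#60;b&#62; caf&#233;', $nonAscii, 'UTF-8');
```

Restricting the block this way is one option for keeping markup-significant characters escaped while still restoring accented letters.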

See Also

mb_encode_numericentity() - Encode character to HTML numeric string reference

6 user contributions:

Navi
1.04.2009 11:00


Manual entity => utf8 conversion:


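<?php
        // parse entities: replace each decimal entity with its UTF-8 bytes
        $raw = preg_replace_callback
        (
            "/&#(\\d+);/u",
            "_pcreEntityToUtf",
            $raw
        );

    function _pcreEntityToUtf($matches)
    {
        $char = intval(is_array($matches) ? $matches[1] : $matches);

        if ($char < 0x80)
        {
            // to prevent insertion of control characters
            if ($char >= 0x20) return htmlspecialchars(chr($char));
            else return "&#$char;";
        }
        else if ($char < 0x800)
        {
            // two-byte sequence: U+0080 to U+07FF
            return chr(0xc0 | (0x1f & ($char >> 6))) . chr(0x80 | (0x3f & $char));
        }
        else if ($char < 0x10000)
        {
            // three-byte sequence: U+0800 to U+FFFF
            return chr(0xe0 | (0x0f & ($char >> 12))) . chr(0x80 | (0x3f & ($char >> 6))) . chr(0x80 | (0x3f & $char));
        }
        else
        {
            // four-byte sequence: U+10000 and above
            return chr(0xf0 | (0x07 & ($char >> 18))) . chr(0x80 | (0x3f & ($char >> 12))) . chr(0x80 | (0x3f & ($char >> 6))) . chr(0x80 | (0x3f & $char));
        }
    }
?>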

donovan at conduit it
19.04.2006 18:05


Note that, at this time, mb_decode_numericentity() seems to work only with decimal entities, not hexadecimal ones. Knowing this would have saved me a good hour of debugging.

If you need to convert hex entities, first convert them all to decimal entities with a combination of the preg_replace() and hexdec() functions.
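
The hex-to-decimal step described above could look like the following sketch. The helper name hex_entities_to_decimal() and the sample string are hypothetical, and preg_replace_callback() is used here in place of preg_replace(), since the replacement has to call hexdec() on each match.

```php
<?php
// Hypothetical helper: rewrite hex entities (&#x20AC;) as decimal ones (&#8364;).
function hex_entities_to_decimal($text)
{
    return preg_replace_callback(
        '/&#[xX]([0-9a-fA-F]+);/',
        function ($m) {
            // hexdec() turns the captured hex digits into a number.
            return '&#' . hexdec($m[1]) . ';';
        },
        $text
    );
}

// Convmap covering all Unicode code points (an assumption for this sketch).
$convmap = array(0x0, 0x10FFFF, 0, 0x1FFFFF);

$input   = 'Price: &#x20AC;5 or &#163;4';
$decimal = hex_entities_to_decimal($input);
$output  = mb_decode_numericentity($decimal, $convmap, 'UTF-8');
```

After the first step every entity in the string is decimal, so a single mb_decode_numericentity() call handles both kinds.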

dirk at camindo de
30.01.2005 18:51


If you use utf8_decode(), you will have a problem with all characters outside the ISO-8859-1 charset. You can solve this by calling mb_encode_numericentity() first:

  // convert $text from UTF-8 to ISO-8859-1
  // the mask must be wide enough to keep code points above 0xFFFF intact
  $convmap = array(0x100, 0x10FFFF, 0, 0x1FFFFF);
  $text = mb_encode_numericentity($text, $convmap, "UTF-8");
  $text = utf8_decode($text);

The mb_encode_numericentity() call turns all characters above 0xFF into numeric entities; utf8_decode() then converts the rest (0x80 to 0xFF).
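
A self-contained sketch of the same idea follows. It assumes mb_convert_encoding() as a stand-in for utf8_decode(), which is deprecated as of PHP 8.2; the sample string and the convmap bounds are illustrative assumptions rather than part of the note.

```php
<?php
// "Grüße €" as UTF-8 bytes: ü and ß exist in ISO-8859-1, the euro sign does not.
$text = "Gr\xC3\xBC\xC3\x9Fe \xE2\x82\xAC";

// Step 1: turn everything above U+00FF into numeric entities.
$convmap  = array(0x100, 0x10FFFF, 0, 0x1FFFFF);
$entities = mb_encode_numericentity($text, $convmap, 'UTF-8');

// Step 2: convert the remaining characters (up to U+00FF) to ISO-8859-1.
$latin1 = mb_convert_encoding($entities, 'ISO-8859-1', 'UTF-8');
```

Because the euro sign has already become the ASCII text &#8364; by the time of the charset conversion, nothing is lost or replaced by a question mark.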

Andrew Simpson
11.12.2004 2:29


Many web browsers tend to upload high-order characters as HTML numeric entities.

Here is some simple code to convert HTML numeric entities within a block of text into proper UTF-8 characters:

<?php
  // decode decimal HTML entities added by web browser
  // (the /e modifier was removed in PHP 7, so a callback is used instead)
  $body = preg_replace_callback('/&#\d{2,5};/u', function ($m) {
      return utf8_entity_decode($m[0]);
  }, $body);

  // decode hex HTML entities added by web browser
  $body = preg_replace_callback('/&#x([a-fA-F0-9]{2,8});/u', function ($m) {
      return utf8_entity_decode('&#' . hexdec($m[1]) . ';');
  }, $body);

// callback function for the regex
function utf8_entity_decode($entity){
  $convmap = array(0x0, 0x10000, 0, 0xfffff);
  return mb_decode_numericentity($entity, $convmap, 'UTF-8');
}
?>

php at cNhOiSpPpAlMe dot org
31.03.2004 10:55


Here are functions to convert hankaku to zenkaku characters (and vice-versa) in Japanese text.



<?php

// Supported characters:
//    (space)
//     !#$%&()*+,./0123456789:;<=>?@
//    ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`
//    abcdefghijklmnopqrstuvwxyz{|}
// (Katakana isn't supported.)

function f_han2zen ($string,$encoding = null) {
  if (is_null($encoding)) $encoding = mb_internal_encoding();
  $convmap = array(
     0x20,0x20,0x3000-0x20,0xffff,   // Space
     0x21,0x7e,0xff01-0x21,0xffff);
  $temp = mb_encode_numericentity($string,$convmap,$encoding);
  $convmap = array(0,0xffff,0,0xffff);
  return mb_decode_numericentity($temp,$convmap,$encoding);
}

function f_zen2han ($string,$encoding = null) {
  if (is_null($encoding)) $encoding = mb_internal_encoding();
  $convmap = array(
     0x3000,0x3000,-(0x3000-0x20),0xffff,   // Space
     0xff01,0xff5e,-(0xff01-0x21),0xffff);
  $temp = mb_encode_numericentity($string,$convmap,$encoding);
  $convmap = array(0,0xffff,0,0xffff);
  return mb_decode_numericentity($temp,$convmap,$encoding);
}

// Sample usage:
f_han2zen("test","shift_jis");
f_han2zen("test","utf-8");

?>
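
The technique in this note, encoding with an offset and then decoding with no offset, can be condensed into one helper. The sketch below is self-contained and assumes UTF-8 input; the name shift_code_points() and the sample string are hypothetical and do not come from the note.

```php
<?php
// Hypothetical helper: shift code points by encoding them with an offset
// convmap, then decoding the resulting entities with a zero-offset convmap.
function shift_code_points($string, array $convmap, $encoding = 'UTF-8')
{
    $entities = mb_encode_numericentity($string, $convmap, $encoding);
    return mb_decode_numericentity($entities, array(0, 0xffff, 0, 0xffff), $encoding);
}

// Half-width ASCII to full-width forms (same ranges as f_han2zen).
$han2zen = array(
    0x20, 0x20, 0x3000 - 0x20, 0xffff,   // space
    0x21, 0x7e, 0xff01 - 0x21, 0xffff);

// Full-width forms back to half-width ASCII (same ranges as f_zen2han).
$zen2han = array(
    0x3000, 0x3000, -(0x3000 - 0x20), 0xffff,   // space
    0xff01, 0xff5e, -(0xff01 - 0x21), 0xffff);

$wide   = shift_code_points('A1 b', $han2zen);
$narrow = shift_code_points($wide, $zen2han);
```

The offset does the actual character mapping during the encode step; the decode step only turns the shifted numbers back into characters.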

dev at glossword info
19.11.2003 16:43