(PHP 4 >= 4.0.6, PHP 5)
mb_convert_encoding — Convert character encoding
Converts the character encoding of string str to to_encoding, optionally from from_encoding.
str: The string being encoded.
to_encoding: The type of encoding that str is being converted to.
from_encoding: The current encoding of str, specified by character encoding name(s) before conversion. It is either an array or a comma-separated list; if several encodings are given, the actual one is detected among them. If from_encoding is not specified, the internal encoding will be used.
"auto" may be used, which expands to "ASCII,JIS,UTF-8,EUC-JP,SJIS".
Return value: The encoded string.
Example #1 mb_convert_encoding() example
/* Convert internal character encoding to SJIS */
$str = mb_convert_encoding($str, "SJIS");
/* Convert EUC-JP to UTF-7 */
$str = mb_convert_encoding($str, "UTF-7", "EUC-JP");
/* Auto detect encoding from JIS, eucjp-win, sjis-win, then convert str to UCS-2LE */
$str = mb_convert_encoding($str, "UCS-2LE", "JIS, eucjp-win, sjis-win");
/* "auto" is expanded to "ASCII,JIS,UTF-8,EUC-JP,SJIS" */
$str = mb_convert_encoding($str, "EUC-JP", "auto");
If you want to convert Japanese to ISO-2022-JP, it is highly recommended to use ISO-2022-JP-MS as the target encoding instead. It includes the extended character set and avoids "?" in the text. For example, the often-used "1 in a circle" ① will then be converted correctly.
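A minimal sketch of that recommendation, assuming mbstring with ISO-2022-JP-MS support (PHP 5.2.1 or later). The helper name to_iso2022jp_ms() is hypothetical; decoding the result with the same encoding should give the original character back.

```php
<?php
// Hypothetical helper: encode UTF-8 Japanese text as ISO-2022-JP-MS,
// which also covers vendor extensions such as the circled digit ①.
function to_iso2022jp_ms($utf8)
{
    return mb_convert_encoding($utf8, 'ISO-2022-JP-MS', 'UTF-8');
}

$circled = "\xE2\x91\xA0"; // "①" (U+2460) as UTF-8 bytes
$encoded = to_iso2022jp_ms($circled);

// Round trip: decode with the same encoding and compare with the input
$decoded = mb_convert_encoding($encoded, 'UTF-8', 'ISO-2022-JP-MS');
echo $decoded === $circled ? "round trip ok\n" : "lossy\n";
```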
I've been trying to find the charset of a Norwegian txt file (with a lot of ø, æ, å) written on a Mac, and I found it this way:
$text = "A strange string to pass, maybe with some ø, æ, å characters.";
foreach (mb_list_encodings() as $chr) {
    echo mb_convert_encoding($text, 'UTF-8', $chr) . " : " . $chr . "<br>";
}
The line that looks right tells you the encoding the file was written in.
Hope it can help someone.
Note that `mb_convert_encoding($val, 'HTML-ENTITIES')` does not escape '\'', '"', '<', '>', or '&'.
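If you also need those five characters escaped, one approach (a sketch, not taken from the note) is to run htmlspecialchars() first and then turn the remaining non-ASCII characters into numeric entities. This version uses mb_encode_numericentity() instead of the 'HTML-ENTITIES' pseudo-encoding, which is deprecated as of PHP 8.2. The helper name html_ascii() is hypothetical and the input is assumed to be UTF-8.

```php
<?php
// Hypothetical helper: escape HTML-special characters, then encode every
// non-ASCII code point as a decimal entity such as &#233;.
function html_ascii($utf8)
{
    // ENT_QUOTES escapes & < > " and ' (the apostrophe becomes &#039;)
    $escaped = htmlspecialchars($utf8, ENT_QUOTES, 'UTF-8');
    // Map: code points 0x80..0x10FFFF, offset 0, mask 0x1FFFFF
    $map = array(0x80, 0x10FFFF, 0, 0x1FFFFF);
    return mb_encode_numericentity($escaped, $map, 'UTF-8');
}

echo html_ascii("<caf\xC3\xA9 & 'tea'>"), "\n"; // prints &lt;caf&#233; &amp; &#039;tea&#039;&gt;
```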
It appears that when dealing with an unknown "from encoding", the function both raises an E_WARNING and proceeds to convert the string from ISO-8859-1 to the "to encoding".
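One way to avoid relying on that fallback (a sketch, not part of mbstring itself) is to check the name first. The hypothetical is_known_encoding() compares case-insensitively against mb_list_encodings() and each entry's mb_encoding_aliases() (available since PHP 5.3). Note that PHP 8.0 and later throw a ValueError for an unknown encoding instead of warning.

```php
<?php
// Hypothetical guard: true if mbstring knows $name as an encoding or an alias.
function is_known_encoding($name)
{
    $needle = strtolower($name);
    foreach (mb_list_encodings() as $encoding) {
        if (strtolower($encoding) === $needle) {
            return true;
        }
        // Also accept aliases such as lowercase or alternative spellings
        foreach (mb_encoding_aliases($encoding) as $alias) {
            if (strtolower($alias) === $needle) {
                return true;
            }
        }
    }
    return false;
}

$from = 'no-such-encoding';
if (!is_known_encoding($from)) {
    echo "refusing to convert from unknown encoding $from\n";
}
```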
I used this function instead of mb_convert_encoding because mbstring wasn't enabled on my commercial server. It only supports UTF7-IMAP, UTF-8 and ISO-8859-1:
function my_convert_encoding($string, $to, $from)
{
    // Convert the input string to ISO-8859-1
    if ($from == "UTF-8") {
        $iso_string = utf8_decode($string);
    } elseif ($from == "UTF7-IMAP") {
        $iso_string = imap_utf7_decode($string);
    } else {
        $iso_string = $string; // assume it is already ISO-8859-1
    }
    // Convert the ISO-8859-1 string to the target encoding
    if ($to == "UTF-8") {
        return utf8_encode($iso_string);
    } elseif ($to == "UTF7-IMAP") {
        return imap_utf7_encode($iso_string);
    }
    return $iso_string;
}
Instead of ini_set(), you can try this:
aaron, to discard unsupported characters instead of printing a ?, you might as well simply set the configuration directive:
mbstring.substitute_character = "none"
in your php.ini. Be sure to include the quotes around none. Or at run-time with
ini_set('mbstring.substitute_character', "none");
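The same option can also be changed only around a single call with mb_substitute_character(), which avoids altering it for the rest of the script. A minimal sketch; the helper name is hypothetical, and try/finally needs PHP 5.5 or later.

```php
<?php
// Hypothetical helper: convert, silently dropping characters that the
// target encoding cannot represent, then restore the previous setting.
function convert_dropping_unmappable($str, $to, $from)
{
    // Current setting: a code point, or 'none' / 'long' / 'entity'
    $previous = mb_substitute_character();
    mb_substitute_character('none');        // discard instead of substituting
    try {
        return mb_convert_encoding($str, $to, $from);
    } finally {
        mb_substitute_character($previous); // put the old setting back
    }
}

// "a€b" in UTF-8; the euro sign does not exist in ISO-8859-1
echo convert_dropping_unmappable("a\xE2\x82\xACb", 'ISO-8859-1', 'UTF-8'), "\n"; // prints ab
```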
My solution below was slightly incorrect, so here is the correct version (I posted at the end of a long day, never a good idea!)
Again, this is a quick and dirty solution to stop mb_convert_encoding from filling your string with question marks whenever it encounters an illegal character for the target encoding.
function convert_to($source, $target_encoding)
{
    // detect the character encoding of the incoming file
    $encoding = mb_detect_encoding($source, "auto");
    // escape all of the question marks so we can remove artifacts from
    // the unicode conversion process
    $target = str_replace("?", "[question_mark]", $source);
    // convert the string to the target encoding
    $target = mb_convert_encoding($target, $target_encoding, $encoding);
    // remove any question marks that have been introduced because of illegal characters
    $target = str_replace("?", "", $target);
    // replace the token string "[question_mark]" with the symbol "?"
    $target = str_replace("[question_mark]", "?", $target);
    return $target;
}
Hope this helps someone! (Admins should feel free to delete my previous, incorrect, post for clarity)
If mb_convert_encoding doesn't work for you, and iconv gives you a headache, you might be interested in this free class I found. It can convert almost any charset to almost any other charset. I think it's wonderful and I wish I had found it earlier. It would have saved me tons of headache.
I use it as a fail-safe, in case mb_convert_encoding is not installed. Download it from http://mikolajj.republika.pl/
This is not my own library, so technically it's not spamming, right? ;)
Hope this helps.
For the PHP noobs (like me) working with Flash and PHP:
Here's a simple snippet of code that worked great for me, getting php to show special Danish characters, from a Flash email form:
// Name Escape
$escName = mb_convert_encoding($_POST["Name"], "ISO-8859-1", "UTF-8");
// message escape
$escMessage = mb_convert_encoding($_POST["Message"], "ISO-8859-1", "UTF-8");
// Headers.. and so on...
rodrigo at bb2 dot co dot jp wrote that iconv works better than mb_convert_encoding, but I find that when converting from UTF-8 to Shift_JIS
$conv_str = mb_convert_encoding($str,$toCS,$fromCS);
works while
$conv_str = iconv($fromCS,$toCS.'//IGNORE',$str);
removes tildes from $str.
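Clean a string for use as a filename by simply replacing all unwanted characters with an underscore (converting to ASCII first reduces it to 7-bit, turning non-ASCII letters into "?"). It removes slightly more characters than necessary. Hope it's useful.
$fileName = 'Test:!"$%&/()=ÖÄÜöäü<<';
// Characters to replace; strlen() keeps the underscore string the same byte length
$unwanted = ' ,;:?*#!§$%&/(){}<>=`´|\\\'"';
echo strtr(mb_convert_encoding($fileName, 'ASCII'),
    $unwanted,
    str_repeat('_', strlen($unwanted)));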
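A stricter alternative (a sketch, not the poster's code) is to keep an explicit whitelist of safe filename characters instead of listing the unwanted ones. The hypothetical safe_filename() assumes UTF-8 input and applies preg_replace() to the ASCII result.

```php
<?php
// Hypothetical helper: convert to ASCII (non-ASCII letters become the
// substitute character, '?' by default), then replace every character
// outside [A-Za-z0-9._-] with an underscore.
function safe_filename($name)
{
    $ascii = mb_convert_encoding($name, 'ASCII', 'UTF-8');
    return preg_replace('/[^A-Za-z0-9._-]/', '_', $ascii);
}

echo safe_filename('Test:!"$%&/()=ÖÄÜöäü<<'), "\n";
```

A whitelist stays safe even if new problematic characters turn up, at the cost of also replacing harmless ones such as spaces.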
For those who can't use mb_convert_encoding() to convert from one charset to another because of an older PHP version, try iconv().
I had this problem converting to a Japanese charset:
And I could fix it by using this:
$txt = iconv('UTF-8', 'SJIS', $txt);
Maybe it's helpful for someone else! ;)
To petruzanauticoyahoo?com!ar
If you don't specify a source encoding, the function assumes the internal (default) encoding. In UTF-8, ñ is a multi-byte character (bytes C3 B1), and in your configured default encoding (often ISO-8859-1) those two bytes would actually mean "Ã±". mb_convert_encoding() is upgrading each of those characters to its multi-byte equivalent in UTF-8.
Try this instead:
print mb_convert_encoding( "ñ", "UTF-8", "UTF-8" );
Of course, for the most part this call does no work, although it can be used to replace (or, with mbstring.substitute_character set to "none", strip) byte sequences that are not valid UTF-8.
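That side effect can be used deliberately to clean up untrusted input. A minimal sketch, assuming the input is meant to be UTF-8; mb_check_encoding() reports whether a string is valid in a given encoding, and the helper name clean_utf8() is hypothetical.

```php
<?php
// Hypothetical helper: return valid UTF-8, replacing malformed byte
// sequences with the current substitute character ('?' by default).
function clean_utf8($str)
{
    if (mb_check_encoding($str, 'UTF-8')) {
        return $str; // already valid, nothing to do
    }
    return mb_convert_encoding($str, 'UTF-8', 'UTF-8');
}

$bad = "abc\xFFdef"; // the byte 0xFF never appears in valid UTF-8
var_dump(mb_check_encoding(clean_utf8($bad), 'UTF-8')); // bool(true)
```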
Hey guys. For everybody who's looking for a function that converts an ISO-8859-1 string to UTF-8 or a UTF-8 string to ISO-8859-1, here's your solution:
function encodeToUtf8($string) {
    return mb_convert_encoding($string, "UTF-8", mb_detect_encoding($string, "UTF-8, ISO-8859-1, ISO-8859-15", true));
}
function encodeToIso($string) {
    return mb_convert_encoding($string, "ISO-8859-1", mb_detect_encoding($string, "UTF-8, ISO-8859-1, ISO-8859-15", true));
}
For me these functions are working fine. Give it a try.
When converting Japanese strings to ISO-2022-JP or JIS on PHP >= 5.2.1, you can use "ISO-2022-JP-MS" instead.
Kishu-Izon (platform-dependent) characters are converted correctly with that encoding, just as with eucJP-win or SJIS-win.
As an alternative to Johannes's suggestion for converting strings from other character sets to a 7bit representation while not just deleting latin diacritics, you might try this:
$text = iconv($from_enc, 'US-ASCII//TRANSLIT', $text);
The only disadvantage is that it does not convert "ä" to "ae", but it handles punctuation and other special characters better.
I'd like to share some code to convert Latin diacritics to their
traditional 7bit representation, for example:
- à,ç,é,î,... to a,c,e,i,...
- ß to ss
- ä,Ä,... to ae,Ae,...
- ë,... to e,...
(Converting with mb_convert_encoding() to "7bit" would simply delete any offending characters.)
I may have missed your country's typographic
conventions; correct me if so.
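/**
 * @args string $text line of encoded text
 *       string $from_enc (encoding type of $text, e.g. UTF-8, ISO-8859-1)
 * @returns 7bit representation
 */
function to7bit($text, $from_enc) {
    $text = mb_convert_encoding($text, 'HTML-ENTITIES', $from_enc);
    // ß -> ss, æ/œ -> ae/oe, ä/ö/ü -> ae/oe/ue, any other entity -> its base letter
    $text = preg_replace(
        array('/&szlig;/', '/&(..)lig;/', '/&([aouAOU])uml;/', '/&(.)[^;]*;/'),
        array('ss', '$1', '${1}e', '$1'),
        $text);
    return $text;
}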
Enjoy :-)
For those wanting to convert from $set to MacRoman, use iconv():
$string = iconv('UTF-8', 'macintosh', $string);
('macintosh' is the IANA name for the MacRoman character set.)
Many people below talk about using
to convert non-ASCII text into HTML-readable entities. Because my web server is out of my control, I was unable to set the database character set, and whenever PHP made a copy of my $s variable that it had pulled out of the database, it would automatically convert it to nasty latin1 instead of leaving it in its beautiful UTF-8 glory.
So [insert Korean characters here] turned into ?????.
I found myself needing call-time pass-by-reference (which, of course, was deprecated in PHP 5.3 and removed in PHP 5.4),
so instead of
which worked perfectly until I upgraded, I had to use
call_user_func_array('mb_convert_encoding', array(&$s,'HTML-ENTITIES','UTF-8'));
Hope it helps someone else out
Why did you use the PHP HTML encoding functions? mbstring has its own encoding which is (as far as I have tested it) much more useful:
$text = mb_convert_encoding($text, 'HTML-ENTITIES', "UTF-8");
To add to the Flash conversion comment below, here's how I convert back from what I've stored in a database after converting from Flash HTML text field output, in order to load it back into a Flash HTML text field:
function htmltoflash($htmlstr)
return str_replace("<br />","\n",
Here's a tip for anyone using Flash and PHP for storing HTML output submitted from a Flash text field in a database or whatever.
Flash submits its HTML special characters in UTF-8, so you can use the following function to convert those into HTML entity characters:
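function utf8html($utf8str)
{
    // Convert to ISO-8859-1, then tell htmlentities() that this is the input
    // charset (its default charset is UTF-8 since PHP 5.4)
    return htmlentities(mb_convert_encoding($utf8str, "ISO-8859-1", "UTF-8"), ENT_COMPAT, "ISO-8859-1");
}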
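Since htmlentities() accepts a charset argument, the intermediate conversion can also be skipped. A sketch, assuming UTF-8 input (the helper name is hypothetical); unlike a round trip through ISO-8859-1, it also keeps characters such as € that ISO-8859-1 cannot represent.

```php
<?php
// Hypothetical helper: turn UTF-8 text into HTML entities without first
// converting it to ISO-8859-1.
function utf8_to_entities($utf8str)
{
    return htmlentities($utf8str, ENT_COMPAT, 'UTF-8');
}

echo utf8_to_entities("caf\xC3\xA9 \xE2\x82\xAC5"), "\n"; // prints caf&eacute; &euro;5
```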
Be careful when converting from ISO-8859-1 to UTF-8.
Even if you explicitly specify the character encoding of a page as ISO-8859-1 (via headers and strict XML declarations), Windows 2000 will ignore that and interpret it as whatever character set it has natively installed.
For example, I wrote char #128 into a page with character encoding ISO-8859-1, and it displayed in Internet Explorer (and Mozilla) as a euro symbol.
It should have displayed a box, denoting that char #128 is undefined in ISO-8859-1. The problem was that it was displaying in "Windows: Western Europe" (my native character set).
This led to confusion when I tried to convert this euro to UTF-8 via mb_convert_encoding().
IE displays UTF-8 correctly, and because PHP correctly converted #128 into a box in UTF-8, IE showed a box.
So all I saw was mb_convert_encoding() converting a euro symbol into a box. It took me a long time to figure out what was going on.
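The byte 0x80 maps to the invisible C1 control character U+0080 in ISO-8859-1, but to the euro sign in Windows-1252, the code page a Western European Windows system actually uses. A small sketch comparing the two source encodings, assuming mbstring's Windows-1252 support; if your input really comes from such a system, naming Windows-1252 as the source keeps the euro.

```php
<?php
$byte = "\x80";
// ISO-8859-1 maps 0x80 to the control character U+0080 (UTF-8: C2 80)
$asLatin1 = mb_convert_encoding($byte, 'UTF-8', 'ISO-8859-1');
// Windows-1252 maps 0x80 to the euro sign U+20AC (UTF-8: E2 82 AC)
$asCp1252 = mb_convert_encoding($byte, 'UTF-8', 'Windows-1252');

echo bin2hex($asLatin1), "\n"; // prints c280
echo bin2hex($asCp1252), "\n"; // prints e282ac
```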
Another sample of recoding without the mbstring (multibyte) extension enabled.
(Russian KOI8-R to Windows-1251; if the input is already in Windows-1251, recode() returns the string unchanged.)
// 0 - win
// 1 - koi
function detect_encoding($str) {
    $win = 0;
    $koi = 0;
    // Count lowercase Cyrillic letters: 224-255 in win-1251, 192-223 in KOI8-R
    for ($i = 0; $i < strlen($str); $i++) {
        $code = ord($str[$i]);
        if ($code >= 224 && $code <= 255) $win++;
        if ($code >= 192 && $code <= 223) $koi++;
    }
    if ($win < $koi) {
        return 1;
    } else {
        return 0;
    }
}
// recodes koi to win
function koi_to_win($string) {
$kw = array(128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 254, 224, 225, 246, 228, 229, 244, 227, 245, 232, 233, 234, 235, 236, 237, 238, 239, 255, 240, 241, 242, 243, 230, 226, 252, 251, 231, 248, 253, 249, 247, 250, 222, 192, 193, 214, 196, 197, 212, 195, 213, 200, 201, 202, 203, 204, 205, 206, 207, 223, 208, 209, 210, 211, 198, 194, 220, 219, 199, 216, 221, 217, 215, 218);
$wk = array(128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 225, 226, 247, 231, 228, 229, 246, 250, 233, 234, 235, 236, 237, 238, 239, 240, 242, 243, 244, 245, 230, 232, 227, 254, 251, 253, 255, 249, 248, 252, 224, 241, 193, 194, 215, 199, 196, 197, 214, 218, 201, 202, 203, 204, 205, 206, 207, 208, 210, 211, 212, 213, 198, 200, 195, 222, 219, 221, 223, 217, 216, 220, 192, 209);
    $end = strlen($string);
    // Replace every byte above 128 using the KOI8-R -> win-1251 table
    for ($pos = 0; $pos < $end; $pos++) {
        $c = ord($string[$pos]);
        if ($c > 128) {
            $string[$pos] = chr($kw[$c - 128]);
        }
    }
    return $string;
}
function recode($str) {
    $enc = detect_encoding($str);
    if ($enc == 1) {
        $str = koi_to_win($str);
    }
    return $str;
}
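Where mbstring is available, the lookup table in koi_to_win() can be replaced by a single call, since mbstring supports both KOI8-R and Windows-1251. A sketch; the helper name is hypothetical, and a detection heuristic like the one above is still needed when the input encoding is unknown.

```php
<?php
// Hypothetical helper: KOI8-R bytes -> Windows-1251 bytes using mbstring.
function koi8r_to_cp1251($string)
{
    return mb_convert_encoding($string, 'Windows-1251', 'KOI8-R');
}

$koi = "\xD0\xD2\xC9\xD7\xC5\xD4";          // "привет" in KOI8-R
echo bin2hex(koi8r_to_cp1251($koi)), "\n"; // prints eff0e8e2e5f2
```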