PHP Doku:: Split multibyte string using regular expression - function.mb-split.html


A service by Reinhard Neidl - Webprogrammierung.

Multibyte String Functions

mb_split

(PHP 4 >= 4.2.0, PHP 5)

mb_split - Split multibyte string using regular expression

Description

array mb_split ( string $pattern , string $string [, int $limit = -1 ] )

Splits a multibyte string using the regular expression pattern and returns the result as an array.

Parameters

pattern

The regular expression pattern.

string

The string being split.

limit

If the optional parameter limit is specified, the string will be split into at most limit elements.
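
As a rough illustration of the limit parameter, the sketch below splits the same comma-separated string with and without a limit. The sample string and the variable names are invented for this example, and it assumes the source file is saved as UTF-8. The expectation that the last element carries the unsplit remainder is an assumption based on how the function is commonly observed to behave; the text above does not state it.

```php
<?php
// Assumption: this file is saved as UTF-8, so the regex encoding is set to match.
mb_regex_encoding('UTF-8');

// Hypothetical sample data for illustration only.
$csv = 'äpfel,birnen,kirschen,pflaumen';

// No limit (default -1): split at every comma.
$all = mb_split(',', $csv);

// Limit of 2: at most two elements are returned.
$firstTwo = mb_split(',', $csv, 2);

print_r($all);
print_r($firstTwo);
```

With a limit of 2, the call should return two elements, with everything after the first comma left together in the second one.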

Return Values

The result as an array.

Notes

Note:

The internal encoding or the character encoding specified by mb_regex_encoding() will be used as the character encoding for this function.
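
To make the note concrete, here is a minimal sketch that sets the regex encoding explicitly before splitting on a character class containing a multibyte delimiter. The sample string and variable names are hypothetical, and the sketch assumes the source file itself is saved as UTF-8.

```php
<?php
// Assumption: this file is saved as UTF-8, so the pattern and the subject
// are both UTF-8 byte sequences.
mb_regex_encoding('UTF-8');

// Hypothetical list mixing an ideographic comma and an ASCII comma.
$list = 'りんご、みかん,ぶどう';

// The character class contains a multibyte character, so the pattern only
// behaves as intended when the regex encoding matches the data.
$items = mb_split('[、,]', $list);

print_r($items);
```

Setting the encoding explicitly, rather than relying on the configured default, keeps the pattern and the subject string interpreted the same way.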


5 User Contributions:
boukeversteegh at gmail dot com
10.09.2010 18:43
In addition to Sezer Yalcin's tip.

This function splits a multibyte string into an array of characters. Comparable to str_split().

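<?php
# PHP 7.4+ has a built-in mb_str_split(), so only define this version when it is missing.
if ( !function_exists('mb_str_split') ) {
    function mb_str_split( $string ) {
        # Split at every position that is not after the start: ^
        # and not before the end: $
        return preg_split('/(?<!^)(?!$)/u', $string );
    }
}

$string   = '火车票';
$charlist = mb_str_split( $string );

print_r( $charlist );
?>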

# Prints:
Array
(
    [0] => 火
    [1] => 车
    [2] => 票
)
qdb at kukmara dot ru
25.03.2010 11:46
Another way to str_split a multibyte string:
<?php
$s = 'әӘөүҗңһ';

//$temp_s=iconv('UTF-8','UTF-16',$s);
$temp_s = mb_convert_encoding($s, 'UTF-16', 'UTF-8');
// A BMP character takes 2 bytes in UTF-16, so each 4-byte chunk holds two
// such characters (use a chunk size of 2 for single characters).
$temp_a = str_split($temp_s, 4);
$temp_a_len = count($temp_a);
for ($i = 0; $i < $temp_a_len; $i++) {
    //$temp_a[$i]=iconv('UTF-16','UTF-8',$temp_a[$i]);
    $temp_a[$i] = mb_convert_encoding($temp_a[$i], 'UTF-8', 'UTF-16');
}

echo('<pre>');
print_r($temp_a);
echo('</pre>');

//also possible to directly use UTF-16:
define('SLS', mb_convert_encoding('/', 'UTF-16'));
$temp_s = mb_convert_encoding($s, 'UTF-16', 'UTF-8');
$temp_a = str_split($temp_s, 4);
$temp_s = implode(SLS, $temp_a);
$temp_s = mb_convert_encoding($temp_s, 'UTF-8', 'UTF-16');
echo($temp_s);
?>
gert dot matern at web dot de
3.08.2009 12:34
We are talking about multibyte (e.g. UTF-8) strings here, so preg_split will fail for the following string:

'Weiße Rosen sind nicht grün!'

And because I didn't find a regex to simulate str_split, I optimized the first solution from adjwilli a bit:

<?php
$string = 'Weiße Rosen sind nicht grün!';
$stop   = mb_strlen( $string);
$result = array();

for( $idx = 0; $idx < $stop; $idx++)
{
  $result[] = mb_substr( $string, $idx, 1);
}
?>

Here is an example with adjwilli's function:

<?php
mb_internal_encoding( 'UTF-8');
mb_regex_encoding( 'UTF-8');

function mbStringToArray( $string)
{
  $stop   = mb_strlen( $string);
  $result = array();

  for( $idx = 0; $idx < $stop; $idx++)
  {
    $result[] = mb_substr( $string, $idx, 1);
  }

  return $result;
}

// The second argument (true) belongs to print_r(), so that it returns the
// dump as a string instead of printing it directly.
echo '<pre>', PHP_EOL,
print_r( mbStringToArray( 'Weiße Rosen sind nicht grün!'), true), PHP_EOL,
'</pre>';
?>

Let me know [by personal email] if someone finds a regex to simulate str_split with mb_split.
Sezer Yalcin
19.02.2009 2:13
To split by multibyte letters, use preg_split with the /u modifier instead of calling mb functions thousands of times.
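
A minimal sketch of this tip, assuming valid UTF-8 input: the empty pattern combined with PREG_SPLIT_NO_EMPTY is one common way to apply the /u modifier here and is not necessarily the exact pattern the author had in mind. The sample string and variable names are chosen for illustration.

```php
<?php
// Assumption: $text is valid UTF-8; with /u, preg_split() fails on invalid input.
$text = 'Weiße';

// An empty pattern matches between every pair of characters; the /u modifier
// makes PCRE treat the subject as UTF-8, so multibyte characters stay whole.
// PREG_SPLIT_NO_EMPTY drops the empty pieces at the start and end.
$chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);

print_r($chars);
```

This makes a single pass over the string instead of calling mb_substr() once per character.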
adjwilli at yahoo dot com
26.12.2007 18:37
I figure most people will want a simple way to break up a multibyte string into its individual characters. Here's a function I'm using to do that. Change UTF-8 to your chosen encoding.

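<?php
function mbStringToArray ($string) {
    // Initialise the result so an empty string yields an empty array.
    $array = array();
    $strlen = mb_strlen($string, "UTF-8");
    while ($strlen) {
        // Take the first character, then drop it from the remaining string.
        $array[] = mb_substr($string, 0, 1, "UTF-8");
        $string = mb_substr($string, 1, $strlen, "UTF-8");
        $strlen = mb_strlen($string, "UTF-8");
    }
    return $array;
}
?>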
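
On PHP 7.4 and later, which is outside the version range listed at the top of this page, a built-in mb_str_split() function does the same job as the helpers above. The following sketch uses it when available and otherwise falls back to the preg_split approach with the /u modifier. The sample string and variable names are illustrative only.

```php
<?php
// Hypothetical sample input, assumed to be valid UTF-8.
$string = '火车票';

if (function_exists('mb_str_split')) {
    // Built-in since PHP 7.4: split into chunks of 1 character each.
    $chars = mb_str_split($string, 1, 'UTF-8');
} else {
    // Fallback for older versions: split between characters with /u.
    $chars = preg_split('//u', $string, -1, PREG_SPLIT_NO_EMPTY);
}

print_r($chars);
```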



The PHP manual text and comments are covered by the Creative Commons Attribution 3.0 License © the PHP Documentation Group