PHP Doku:: Performs a regular expression match

Version	Description
5.2.2	Named subpatterns now accept the syntaxes (?<name>) and (?'name') as well as (?P<name>). Previous versions accepted only (?P<name>).
4.3.3	Added the `offset` parameter
4.3.0	Added the `PREG_OFFSET_CAPTURE` flag
4.3.0	Added the `flags` parameter

53 user contributions:

sainnr at gmail dot com
30.12.2010 15:12


This sample regexp may be useful if you are working with DB field types. 



(?P<type>\w+)($|\((?P<length>(\d+|(.*)))\))



For example, if you have a type such as "varchar(255)" or "text", the following fragment



<?php

   $type = 'varchar(255)';  // type of field

   preg_match('/(?P<type>\w+)($|\((?P<length>(\d+|(.*)))\))/', $type, $field);

   print_r($field);

?>



will output something like this:

Array ( [0] => varchar(255) [type] => varchar [1] => varchar [2] => (255) [length] => 255 [3] => 255 [4] => 255 )
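
A minimal sketch of how the pattern above might be wrapped for reuse. The function name parse_field_type, the added ^ anchor and the null default for a missing length are assumptions made for illustration; they are not part of the original note.

```php
<?php
// Hypothetical helper: splits a column type such as "varchar(255)" into its parts.
function parse_field_type(string $type): ?array
{
    // Same pattern as in the note, with a leading ^ added (assumption).
    $pattern = '/^(?P<type>\w+)($|\((?P<length>(\d+|(.*)))\))/';

    if (preg_match($pattern, $type, $field) !== 1) {
        return null; // not a recognisable type string
    }

    // The "length" key is missing (or empty) when the type has no parentheses.
    $length = (isset($field['length']) && $field['length'] !== '')
        ? $field['length']
        : null;

    return ['type' => $field['type'], 'length' => $length];
}

$varchar = parse_field_type('varchar(255)');
$text    = parse_field_type('text');
```

Checking for the named key with isset() avoids a notice for types such as "text" that carry no length.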

ian_channing at hotmail dot com
27.12.2010 10:55


When trying to check a file path that could be windows or unix it took me quite a few tries to get the escape characters right.



The Unix directory separator must be escaped once and the windows directory separator must be escaped twice.



This will match path/to/file and path\to\file.exe



preg_match('/^[a-z0-9_.\/\\\]*$/i', $file_string);
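
The escaping can be easier to follow when the backslash is written with four characters, which is an equivalent spelling in a single-quoted PHP string. The following is only a sketch; the sample paths are made up for illustration.

```php
<?php
// Four backslashes in a single-quoted PHP string become two in the pattern,
// which PCRE reads as one literal backslash inside the character class.
$pattern = '/^[a-z0-9_.\/\\\\]*$/i';

$unix    = preg_match($pattern, 'path/to/file');        // forward slashes
$windows = preg_match($pattern, 'path\\to\\file.exe');  // backslashes
$invalid = preg_match($pattern, 'path/to/file?.exe');   // "?" is not allowed
```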

clbmuvn at gmail dot com
25.09.2010 17:42


<?php

$str ='/^00+[0-1]+[0-1]+[0-1]

+[0-1]+[0-1]+[0-1]+00+[0-1]+

[0-1]+[0-1]+[0-1]+[0-1]+[0-1]+00+

[0-1]+[0-1]+[0-1]+[0-1]+[0-1]+[0-1]+00/i';

        if ((preg_match($str,"00000000000000000000000000")))

        {echo "OK";}

?>

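This does not work. (Note that the line breaks inside the quoted pattern become part of the pattern itself.)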

SoN9ne at gmail dot com
8.06.2010 19:10


I have been working on an email system that automatically generates a text email from a given HTML email by using strip_tags().

The only issue I ran into, for my needs, was that the anchors would not keep their links.

I searched for a little while and could not find anything to extract the links from the tags, so I wrote my own little snippet.

I am posting it here in the hope that others may find it useful, and for later reference.



A note to keep in mind:

I was primarily concerned with valid HTML, so if attributes do not use ' or " to contain the values, this will need to be tweaked.

If you can edit this to work better, please let me know.

<?php

/**

 * Replaces anchor tags with text

 * - Will search string and replace all anchor tags with text (case insensitive)

 * 

 * How it works:

 * - Searches string for an anchor tag, checks to make sure it matches the criteria

 *         Anchor search criteria:

 *             - 1 - <a (must have the start of the anchor tag )

 *             - 2 - Can have any number of spaces or other attributes before and after the href attribute

 *             - 3 - Must close the anchor tag

 * 

 * - Once the check has passed it will then replace the anchor tag with the string replacement

 * - The string replacement can be customized

 * 

 * Known issue:

 * - This will not work for anchors that do not use a ' or " to contain the attributes. 

 *         (i.e.- <a href=http: //php.net>PHP.net</a> will not be replaced)

 */

function replaceAnchorsWithText($data) {

    /**

     * Had to modify $regex so it could post to the site... so I broke it into 6 parts.

     */

    $regex  = '/(<a\s*'; // Start of anchor tag

    $regex .= '(.*?)\s*'; // Any attributes or spaces that may or may not exist

    $regex .= 'href=[\'"]+?\s*(?P<link>\S+)\s*[\'"]+?'; // Grab the link

    $regex .= '\s*(.*?)\s*>\s*'; // Any attributes or spaces that may or may not exist before closing tag 

    $regex .= '(?P<name>\S+)'; // Grab the name

    $regex .= '\s*<\/a>)/i'; // Any number of spaces between the closing anchor tag (case insensitive)

    

    if (is_array($data)) {

        // This is what will replace the link (modify to your liking)

        $data = "{$data['name']}({$data['link']})";

    }

    return preg_replace_callback($regex, 'replaceAnchorsWithText', $data);

}



$input  = 'Test 1: <a href="http: //php.net1">PHP.NET1</a>.<br />';

$input .= 'Test 2: <A name="test" HREF=\'HTTP: //PHP.NET2\' target="_blank">PHP.NET2</A>.<BR />';

$input .= 'Test 3: <a hRef=http: //php.net3>php.net3</a><br />';

$input .= 'This last line had nothing to do with any of this';



echo replaceAnchorsWithText($input).'<hr/>';

?>

Will output:

Test 1: PHP.NET1(http: //php.net1).

Test 2: PHP.NET2(HTTP: //PHP.NET2).

Test 3: php.net3 (is still an anchor)

This last line had nothing to do with any of this



Posting to this site is painful...

Had to break up the regex and had to break the test links since it was being flagged as spam...

teracci2002
9.04.2010 18:00


When you use preg_match() for security purposes or for processing large amounts of data,

you should take pcre.backtrack_limit and pcre.recursion_limit into account.

http://www.php.net/manual/en/pcre.configuration.php



Hitting these limits can produce wrong match results.

You can check whether you hit them by calling preg_last_error().

http://www.php.net/manual/en/function.preg-last-error.php
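
One way to make that check hard to forget is a small wrapper around preg_match(). This is only a sketch: the function name checked_preg_match and the choice to throw a RuntimeException are assumptions made for illustration, not an established API.

```php
<?php
// Hypothetical wrapper: returns true/false for match/no match and
// throws when PCRE reports an error (for example a backtrack limit hit).
function checked_preg_match(string $pattern, string $subject): bool
{
    $result = preg_match($pattern, $subject);

    // Depending on the PHP version an error may surface as false or as 0,
    // so the error code is checked as well.
    if ($result === false || preg_last_error() !== PREG_NO_ERROR) {
        throw new RuntimeException('PCRE error code ' . preg_last_error());
    }

    return $result === 1;
}

$ok = checked_preg_match('/^\d+$/', '12345');
$no = checked_preg_match('/^\d+$/', '12a45');
```

With this in place, a failed match and a match that could not be carried out are no longer confused with each other.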

eric at devotia dot com
22.03.2010 20:31


I see quite a few email address validation patterns below that seem to me to be overly strict.  



using the username@domain.tld model:



username:  the rules are dictated by the receiving email server, this means that in theory anything goes here.  I could write my own email server and dictate my own rules.



domain: Domain names are regulated and there are rules.  Still I would opt for a loose interpretation and not bother checking length.



tld: Vanity TLD's are just around the corner, once again try not to be too restrictive.



So using a loose pattern, like so:

preg_match('/^[^@]+@[a-zA-Z0-9._-]+\.[a-zA-Z]+$/', $email)



username:  at least 1 character and it isn't an @



domain: at least 1 character and contains only valid characters.



tld: at least 1 character, alpha only (I am actually not entirely sure what the new custom TLDs will allow, so the scope may need to be broadened here).
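
To illustrate what the loose pattern accepts and rejects, here is a short sketch. The sample addresses are made up for illustration.

```php
<?php
$pattern = '/^[^@]+@[a-zA-Z0-9._-]+\.[a-zA-Z]+$/';

$samples = [
    'bob@q.com',              // single-character domain name
    '"foo bar"@example.com',  // quotes and a space in the username
    'no-at-sign.example.com', // no @ at all
    'bob@localhost',          // no dot followed by a TLD
];

$results = [];
foreach ($samples as $email) {
    // Store true when the address passes the loose check.
    $results[$email] = (preg_match($pattern, $email) === 1);
}
```

The first two samples pass and the last two are rejected, which matches the three rules described above.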



A strict pattern is not going to guarantee the result of a valid email address (it won't make sure it exists), but many of the patterns below can result in a syntactically valid email address not being accepted (not good).



Spaces and quotes while rare are still occasionally encountered in the username section of an address.  I actually see both with some regularity.



Domain name length: while there is the 63-character (255 overall) per-section limitation, I think we can pretty much agree that someone isn't likely to give you a false email address by violating that one.



There is at least 1 single character domain name that I am aware of, qwest owns q.com so in theory bob@q.com could be an actual email address.

Kae Cyphet
18.03.2010 3:29


For those coming over from ereg, preg_match can be quite intimidating. To get started, here is a migration tip.



<?php

if (ereg('[^0-9A-Za-z]', $test_string)) { // true if the string contains a character other than 0-9, A-Z or a-z
}

if (preg_match('/[^0-9A-Za-z]/', $test_string)) { // the preg_match version; the "/" delimiters are now required
}

?>

plasma
22.02.2010 1:53


To extract scheme, host, path, etc., simply use parse_url():

<?php

  $url  = 'http://name:pass@';
  $url .= 'example.com:10000';
  $url .= '/path/to/file.php?a=1&b=2#anchor';

  $url_data = parse_url ( $url );

  print_r ( $url_data );

?>

This prints something like:

Array
(
    [scheme] => http
    [host] => example.com
    [port] => 10000
    [user] => name
    [pass] => pass
    [path] => /path/to/file.php
    [query] => a=1&b=2
    [fragment] => anchor
)

In my tests parse_url is up to 15x faster than preg_match(_all)!

Becheru Petru-Ioan
20.02.2010 19:11


The http://pw-newspaper.googlecode.com/ project provides code for matching strings of printable ASCII (space to tilde) with newlines (a newline consists of CR and LF).



<?php

preg_match('/^[ -~\xA\xD]{0,65535}$/i', "Hello\n World!");

?>



There are more line terminators than CR or LF: see http://en.wikipedia.org/wiki/Newline#Unicode

Dr@ke
18.02.2010 16:58


Hello,

There is a bug in some newer PCRE versions (e.g. 7.9 2009-04-1).

In patterns:

\w+ !== [a-zA-Z0-9]+



But it works if I replace \w+ with [a-z0-9]+ or [a-zA-Z0-9]+
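
One documented difference that may explain results like this: \w also matches the underscore, so it is never strictly equivalent to [a-zA-Z0-9]. A small sketch, with a made-up subject string:

```php
<?php
$subject = 'foo_bar';

// \w covers letters, digits and the underscore.
$word  = preg_match('/^\w+$/', $subject);

// The explicit class has no underscore, so the same subject fails.
$alnum = preg_match('/^[a-zA-Z0-9]+$/', $subject);
```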

saberdream at live dot fr
11.02.2010 0:53


I wrote a function that checks that a link points to an image and that its length stays below a given limit.





<?php


function verifiesimage($lien, $limite) {


    if( preg_match('#^http:\/\/(.*)\.(gif|png|jpg)$#i', $lien) && strlen($lien) < $limite )


    {


        $msg = TRUE; // link ok


    }


    else


    {


        $msg = FALSE; // the link isn't image


    }


    return $msg; // return TRUE or FALSE


}


?>





Example :





<?php


if(verifiesimage($votrelien, 50) == TRUE)


{


    // we display the content...


}


?>

Anonymous
6.02.2010 17:00


The regular expression for breaking-down a URI reference into its components:



      ^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?

       12            3  4          5       6  7        8 9



Source: ietf.org/rfc/rfc2396.txt
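
A sketch of how this expression might be used with preg_match(). The "~" delimiter, the sample URI and the key names in $parts are choices made for this illustration only.

```php
<?php
// "~" is used as the delimiter because the pattern itself contains "/" and "#".
$pattern = '~^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?~';

$uri = 'http://www.ics.uci.edu/pub/ietf/uri/#Related';
preg_match($pattern, $uri, $m);

// Groups 2, 4, 5, 7 and 9 hold the components; trailing groups that did not
// take part in the match are absent from $m, hence the ?? fallbacks.
$parts = [
    'scheme'    => $m[2] ?? '',
    'authority' => $m[4] ?? '',
    'path'      => $m[5] ?? '',
    'query'     => $m[7] ?? '',
    'fragment'  => $m[9] ?? '',
];
```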

cebelab at gmail dot com
24.01.2010 7:43


I noticed that in order to deal with UTF-8 text without having to recompile PHP with the PCRE UTF-8 flag enabled, you can just add the following sequence at the start of your pattern: (*UTF8)



For instance, '#(*UTF8)[[:alnum:]]#' will return TRUE for 'é' where '#[[:alnum:]]#' will return FALSE.



I found this very useful tip after hours of searching the web, directly on the PCRE website: http://www.pcre.org/pcre.txt

There is much more information about UTF-8 support in the library there.



Hope that helps!



--

cedric

Anonymous
1.12.2009 12:08


<?php

/**

 * @param string $vat_number VAT number to test e.g. GB123 4567 89

 * @return integer -1 if country not included OR 1 if the VAT Num matches for the country OR 0 if no match

*/

function checkVatNumber(  $vat_number ) {

    switch(strtoupper(substr($vat_number,0, 2))) {

        case 'AT':

            $regex = '/^(AT){0,1}U[0-9]{8}$/i';

            break;

        case 'BE':

            $regex = '/^(BE){0,1}[0]{0,1}[0-9]{9}$/i';

            break;

        case 'BG':

            $regex = '/^(BG){0,1}[0-9]{9,10}$/i';

            break;

        case 'CY':

            $regex = '/^(CY){0,1}[0-9]{8}[A-Z]$/i';

            break;

        case 'CZ':

            $regex = '/^(CZ){0,1}[0-9]{8,10}$/i';

            break;

        case 'DK':

            $regex = '/^(DK){0,1}([0-9]{2}[\ ]{0,1}){3}[0-9]{2}$/i';

            break;

        case 'EE':

        case 'DE':

        case 'PT':

        case 'EL':

            $regex = '/^(EE|EL|DE|PT){0,1}[0-9]{9}$/i';

            break;

        case 'FR':

            $regex = '/^(FR){0,1}[0-9A-Z]{2}[\ ]{0,1}[0-9]{9}$/i';

            break;

        case 'FI':

        case 'HU':

        case 'LU':

        case 'MT':

        case 'SI':

            $regex = '/^(FI|HU|LU|MT|SI){0,1}[0-9]{8}$/i';

            break;

        case 'IE':

            $regex = '/^(IE){0,1}[0-9][0-9A-Z\+\*][0-9]{5}[A-Z]$/i';

            break;

        case 'IT':

        case 'LV':

            $regex = '/^(IT|LV){0,1}[0-9]{11}$/i';

            break;

        case 'LT':

            $regex = '/^(LT){0,1}([0-9]{9}|[0-9]{12})$/i';

            break;

        case 'NL':

            $regex = '/^(NL){0,1}[0-9]{9}B[0-9]{2}$/i';

            break;

        case 'PL':

        case 'SK':

            $regex = '/^(PL|SK){0,1}[0-9]{10}$/i';

            break;

        case 'RO':

            $regex = '/^(RO){0,1}[0-9]{2,10}$/i';

            break;

        case 'SE':

            $regex = '/^(SE){0,1}[0-9]{12}$/i';

            break;

        case 'ES':

            $regex = '/^(ES){0,1}(([0-9A-Z][0-9]{7}[A-Z])|([A-Z][0-9]{7}[0-9A-Z]))$/i';

            break;

        case 'GB':

            $regex = '/^(GB){0,1}(([1-9][0-9]{2}[\ ]{0,1}[0-9]{4}[\ ]{0,1}[0-9]{2})|([1-9][0-9]{2}[\ ]{0,1}[0-9]{4}[\ ]{0,1}[0-9]{2}[\ ]{0,1}[0-9]{3})|((GD|HA)[0-9]{3}))$/i';

            break;

        default:

            return -1;

            break;

    }

   

    return preg_match($regex, $vat_number);

}

?>

Stefan
17.11.2009 23:47


I spent a while replacing all my ereg() calls to preg_match(), since ereg() is now deprecated and will not be supported as of v 6.0.





Just a warning regarding the conversion, the two functions behave very similarly, but not exactly alike. Obviously, you will need to delimit your pattern with '/' or '|' characters.





The difference that stumped me was that preg_match overwrites the $matches array regardless of whether a match was found. If no match was found, $matches is simply empty.





ereg(), however, would leave $matches alone if a match was not found. In my code, I had repeated calls to ereg, and was populating $matches with each match. I was only interested in the last match. However, with preg_match, if the very last call to the function did not result in a match, the $matches array would be overwritten with a blank value.





Here is an example code snippet to illustrate:





<?php


$test = array('yes','no','yes','no','yes','no');





foreach ($test as $key=>$value) {


  ereg("yes",$value,$matches1);


  preg_match("|yes|",$value,$matches2);


}


  print "ereg result: $matches1[0]<br>";


  print "preg_match result: $matches2[0]<br>";


?>





The output is:


ereg result: yes


preg_match result: 





($matches2[0] in this case is empty)





I believe the preg_match behavior is cleaner. I just thought I would report this to hopefully save others some time.
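
If the old "keep the last successful match" behaviour is still wanted, one possible approach is to copy the result only when preg_match() reports a match. This is a sketch; the variable name $lastMatch is introduced here for illustration.

```php
<?php
$test = ['yes', 'no', 'yes', 'no'];

$lastMatch = null;
foreach ($test as $value) {
    // $matches is reset on every call, so copy it only on success.
    if (preg_match('/yes/', $value, $matches) === 1) {
        $lastMatch = $matches[0];
    }
}
```

After the loop $matches is empty, because the final call did not match, while $lastMatch still holds the last hit.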

ruakuu at NOSPAM dot com
4.11.2009 6:32


I was working on a site that needed Japanese and alphabetic letters and had to

validate input using preg_match. I tried using \p{script}, but it didn't work:



<?php

$pattern ='/^([-a-zA-Z0-9_\p{Katakana}\p{Hiragana}\p{Han}]*)$/u'; // Didn't work

?>



So I tried with ranges and it worked:



<?php

$pattern ='/^[-a-zA-Z0-9_\x{30A0}-\x{30FF}'

         .'\x{3040}-\x{309F}\x{4E00}-\x{9FBF}\s]*$/u';

$match_string = '印刷最安 ニキビ跡除去 ゲームボーイ';



if (preg_match($pattern, $match_string)) {

    echo "Found - pattern $pattern";

} else {

    echo "Not found - pattern $pattern";

}

?>



U+4E00–U+9FBF Kanji

U+3040–U+309F Hiragana

U+30A0–U+30FF Katakana



Hope it's useful; it took me several hours to figure it out.

splattermania at freenet dot de
21.10.2009 17:50


Addition to my last note:



I just posted the regex, but the delimiters were missing. The correct way to check against the regex is as follows:



<?php

    if(preg_match("/^$regex$/", $url))

    {

        return true;

    }

?>

Anonymous
12.10.2009 11:24


If your regular expression does not match with long input text when you think it should, you might have hit the PCRE backtrack default limit of 100000. See http://php.net/pcre.backtrack-limit.

splattermania at freenet dot de
1.10.2009 14:01


After wasting lots of time looking for a REAL regex for URLs, I ended up building my own, and it seems to work for all kinds of URLs:





<?php


    $regex = "((https?|ftp)\:\/\/)?"; // SCHEME


    $regex .= "([a-z0-9+!*(),;?&=\$_.-]+(\:[a-z0-9+!*(),;?&=\$_.-]+)?@)?"; // User and Pass


    $regex .= "([a-z0-9-.]*)\.([a-z]{2,3})"; // Host or IP


    $regex .= "(\:[0-9]{2,5})?"; // Port


    $regex .= "(\/([a-z0-9+\$_-]\.?)+)*\/?"; // Path


    $regex .= "(\?[a-z+&\$_.-][a-z0-9;:@&%=+\/\$_.-]*)?"; // GET Query


    $regex .= "(#[a-z_.-][a-z0-9+\$_.-]*)?"; // Anchor


?>





Then, the correct way to check against the regex is as follows:





<?php


       if(preg_match("/^$regex$/", $url))


       {


               return true;


       }


?>

luc _ santeramo at t yahoo dot com
3.09.2009 16:46


If you want to validate an email address in one line, use the filter_var() function!

http://fr.php.net/manual/en/function.filter-var.php



It is easy to use, as shown in the documentation example:

var_dump(filter_var('bob@example.com', FILTER_VALIDATE_EMAIL));

marcosc at tekar dot net
27.08.2009 18:31


When using accented characters and "ñ" (áéíóúñ), preg_match does not work. It is a charset problem; use utf8_encode()/utf8_decode() to fix it.
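
An alternative that may avoid converting the strings at all is the u modifier. The sketch below assumes that both the source file and the subject are UTF-8 encoded; the pattern and sample words are made up for illustration.

```php
<?php
// Assumes this file and the subjects are UTF-8 encoded.
// The "u" modifier makes PCRE treat pattern and subject as UTF-8,
// so each accented letter in the class counts as one character.
$pattern = '/^[a-záéíóúñ]+$/iu';

$accented = preg_match($pattern, 'señor');
$ascii    = preg_match($pattern, 'senor');
$digits   = preg_match($pattern, 'señor1'); // digits are not in the class
```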

ian_channing at hotmail dot com
20.08.2009 15:13


This is a function that uses regular expressions to match against the various VAT formats required across the EU.





<?php


/**


 * @param string $country Country name


 * @param string $vat_number VAT number to test e.g. GB123 4567 89


 * @return integer -1 if country not included OR 1 if the VAT Num matches for the country OR 0 if no match


*/


function checkVatNumber( $country, $vat_number ) {


    switch($country) {


        case 'Austria':


            $regex = '/^(AT){0,1}U[0-9]{8}$/i';


            break;


        case 'Belgium':


            $regex = '/^(BE){0,1}[0]{0,1}[0-9]{9}$/i';


            break;


        case 'Bulgaria':


            $regex = '/^(BG){0,1}[0-9]{9,10}$/i';


            break;


        case 'Cyprus':


            $regex = '/^(CY){0,1}[0-9]{8}[A-Z]$/i';


            break;


        case 'Czech Republic':


            $regex = '/^(CZ){0,1}[0-9]{8,10}$/i';


            break;


        case 'Denmark':


            $regex = '/^(DK){0,1}([0-9]{2}[\ ]{0,1}){3}[0-9]{2}$/i';


            break;


        case 'Estonia':


        case 'Germany':


        case 'Greece':


        case 'Portugal':


            $regex = '/^(EE|EL|DE|PT){0,1}[0-9]{9}$/i';


            break;


        case 'France':


            $regex = '/^(FR){0,1}[0-9A-Z]{2}[\ ]{0,1}[0-9]{9}$/i';


            break;


        case 'Finland':


        case 'Hungary':


        case 'Luxembourg':


        case 'Malta':


        case 'Slovenia':


            $regex = '/^(FI|HU|LU|MT|SI){0,1}[0-9]{8}$/i';


            break;


        case 'Ireland':


            $regex = '/^(IE){0,1}[0-9][0-9A-Z\+\*][0-9]{5}[A-Z]$/i';


            break;


        case 'Italy':


        case 'Latvia':


            $regex = '/^(IT|LV){0,1}[0-9]{11}$/i';


            break;


        case 'Lithuania':


            $regex = '/^(LT){0,1}([0-9]{9}|[0-9]{12})$/i';


            break;


        case 'Netherlands':


            $regex = '/^(NL){0,1}[0-9]{9}B[0-9]{2}$/i';


            break;


        case 'Poland':


        case 'Slovakia':


            $regex = '/^(PL|SK){0,1}[0-9]{10}$/i';


            break;


        case 'Romania':


            $regex = '/^(RO){0,1}[0-9]{2,10}$/i';


            break;


        case 'Sweden':


            $regex = '/^(SE){0,1}[0-9]{12}$/i';


            break;


        case 'Spain':


            $regex = '/^(ES){0,1}(([0-9A-Z][0-9]{7}[A-Z])|([A-Z][0-9]{7}[0-9A-Z]))$/i';


            break;


        case 'United Kingdom':


            $regex = '/^(GB){0,1}(([1-9][0-9]{2}[\ ]{0,1}[0-9]{4}[\ ]{0,1}[0-9]{2})|([1-9][0-9]{2}[\ ]{0,1}[0-9]{4}[\ ]{0,1}[0-9]{2}[\ ]{0,1}[0-9]{3})|((GD|HA)[0-9]{3}))$/i';


            break;


        default:


            return -1;


            break;


    }


    


    return preg_match($regex, $vat_number);


}


?>

Rob
19.08.2009 21:03


The following function works well for validating ip addresses





<?php


function valid_ip($ip) {


    return preg_match("/^([1-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])" .


            "(\.([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3}$/", $ip);


}


?>

KOmaSHOOTER at gmx dot de
9.08.2009 15:12


Reading files from a directory while skipping "." and ".." (this also skips every other name that starts with a dot):

<?php

$handle = opendir('content/pages/');

$pages = array();

while (false !== ($file = readdir($handle))) {

      $case=preg_match("/^[.]/",$file,$out, PREG_OFFSET_CAPTURE);

      //echo($case);

      if(!$case){

       echo("$file<br />");

       array_push($pages,$file);

       }

}

echo(count($pages));

?>

suit dot 2009 at rebell dot at
6.08.2009 12:49


Reno's expression is still wrong.



The domain part may contain many more characters because of IDN, and IP addresses are allowed in the domain part. The local part also allows more characters, but specific combinations are forbidden. Check RFC 2822 and RFC 2821.



"foo bar"@example.com is valid

foo\@bar@example.com is valid

$?^_`.!#{|}%&'*+-/=~@example.com is also valid e-mail-address (ok, quite uncommon example)



but



foo.@example.com is invalid, because a dot at the beginning or end of the local part is not allowed, but Reno's expression would match it.



An e-mail address should not be longer than 256 characters (or 64 in the local part, 255 in the domain part), since it is limited by SMTP; check RFC 5321.



Of course you should check RFC 2606 (Section 3) too: domain.tld or host.tld are not good example domains. Use example.com / org / net (for domains) or .invalid / .example (for top-level domains).

david at blue-labs dot org
18.05.2009 15:06


Reno, your email validation regex is still invalid.  Email addresses can contain the "+" in the localpart.



i.e. david+something@domain.com

matt
8.05.2009 22:07


To support large Unicode ranges (e.g. [\x{E000}-\x{FFFD}] or \x{10FFFF}) you must add the 'u' modifier at the end of your expression.

daniel dot chcouri at gmail dot com
3.05.2009 15:09


Removing HTML tags with a regular expression (with exceptions)





<?php


function removeHtmlTagsWithExceptions($html, $exceptions = null){


    if(is_array($exceptions) && !empty($exceptions))


    {


        foreach($exceptions as $exception)


        {


            $openTagPattern  = '/<(' . $exception . ')(\s.*?)?>/msi';


            $closeTagPattern = '/<\/(' . $exception . ')>/msi';





            $html = preg_replace(


                array($openTagPattern, $closeTagPattern),


                array('||l|\1\2|r||', '||l|/\1|r||'),


                $html


            );


        }


    }





    $html = preg_replace('/<.*?>/msi', '', $html);





    if(is_array($exceptions))


    {


        $html = str_replace('||l|', '<', $html);


        $html = str_replace('|r||', '>', $html);


    }





    return $html;


}





// example:


print removeHtmlTagsWithExceptions(<<<EOF


<h1>Whatsup?!</h1>


Enjoy <span style="text-color:blue;">that</span> script<br />


<br />


EOF


, array('br'));


?>

corey [works at] effim [delete] .com
25.04.2009 5:52


I see a lot of people trying to put together phone regex's and struggling (hey, no worries...they're complicated). Here's one that we use that's pretty nifty. It's not perfect, but it should work for most non-idealists.





*** Note: Only matches U.S. phone numbers. ***





<?php





// all on one line...


$regex = '/^(?:1(?:[. -])?)?(?:\((?=\d{3}\)))?([2-9]\d{2})(?:(?<=\(\d{3})\))? ?(?:(?<=\d{3})[.-])?([2-9]\d{2})[. -]?(\d{4})(?: (?i:ext)\.? ?(\d{1,5}))?$/';





// or broken up


$regex = '/^(?:1(?:[. -])?)?(?:\((?=\d{3}\)))?([2-9]\d{2})'


        .'(?:(?<=\(\d{3})\))? ?(?:(?<=\d{3})[.-])?([2-9]\d{2})'


        .'[. -]?(\d{4})(?: (?i:ext)\.? ?(\d{1,5}))?$/';





?>





If you're wondering why there are so many non-capturing subpatterns (which look like this: "(?:"), it's so that we can do this:





<?php





$formatted = preg_replace($regex, '($1) $2-$3 ext. $4', $phoneNumber);





// or, provided you use the $matches argument in preg_match





$formatted = "($matches[1]) $matches[2]-$matches[3]";


if (!empty($matches[4])) $formatted .= " $matches[4]";





?>





*** Results: ***


520-555-5542 :: MATCH


520.555.5542 :: MATCH


5205555542 :: MATCH


520 555 5542 :: MATCH


520) 555-5542 :: FAIL


(520 555-5542 :: FAIL


(520)555-5542 :: MATCH


(520) 555-5542 :: MATCH


(520) 555 5542 :: MATCH


520-555.5542 :: MATCH


520 555-0555 :: MATCH


(520)5555542 :: MATCH


520.555-4523 :: MATCH


19991114444 :: FAIL


19995554444 :: MATCH


514 555 1231 :: MATCH


1 555 555 5555 :: MATCH


1.555.555.5555 :: MATCH


1-555-555-5555 :: MATCH


520-555-5542 ext.123 :: MATCH


520.555.5542 EXT 123 :: MATCH


5205555542 Ext. 7712 :: MATCH


520 555 5542 ext 5 :: MATCH


520) 555-5542 :: FAIL


(520 555-5542 :: FAIL


(520)555-5542 ext .4 :: FAIL


(512) 555-1234 ext. 123 :: MATCH


1(555)555-5555 :: MATCH
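
Putting the pieces together, a sketch of a formatting helper built on the pattern above. The function name format_us_phone and the exact output format are assumptions made for illustration.

```php
<?php
// The pattern from the note, split across lines by concatenation.
$regex = '/^(?:1(?:[. -])?)?(?:\((?=\d{3}\)))?([2-9]\d{2})'
       . '(?:(?<=\(\d{3})\))? ?(?:(?<=\d{3})[.-])?([2-9]\d{2})'
       . '[. -]?(\d{4})(?: (?i:ext)\.? ?(\d{1,5}))?$/';

// Hypothetical helper: returns a normalised number, or null if nothing matched.
function format_us_phone(string $regex, string $number): ?string
{
    if (preg_match($regex, $number, $m) !== 1) {
        return null;
    }

    $formatted = "({$m[1]}) {$m[2]}-{$m[3]}";

    // Group 4 only exists when an extension was given.
    if (!empty($m[4])) {
        $formatted .= " ext. {$m[4]}";
    }

    return $formatted;
}

$plain = format_us_phone($regex, '520-555-5542');
$ext   = format_us_phone($regex, '520 555 5542 ext 5');
$bad   = format_us_phone($regex, '(520 555-5542');
```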

daevid at daevid dot com
7.03.2009 0:18


I just learned about named groups from a Python friend today and was curious if PHP supported them, guess what -- it does!!!



http://www.regular-expressions.info/named.html



<?php

   preg_match("/(?P<foo>abc)(.*)(?P<bar>xyz)/",

                       'abcdefghijklmnopqrstuvwxyz',

                       $matches);

   print_r($matches);

?>



will produce: 



Array

(

    [0] => abcdefghijklmnopqrstuvwxyz

    [foo] => abc

    [1] => abc

    [2] => defghijklmnopqrstuvw

    [bar] => xyz

    [3] => xyz

)



Note that you actually get the named group as well as the numerical key

value too, so if you do use them, and you're counting array elements, be

aware that your array might be bigger than you initially expect it to be.
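
If only the named entries are wanted, one possible approach is to filter the result by key type. This is a sketch; the variable name $named is introduced here for illustration.

```php
<?php
preg_match(
    '/(?P<foo>abc)(.*)(?P<bar>xyz)/',
    'abcdefghijklmnopqrstuvwxyz',
    $matches
);

// Keep only the entries whose key is a string, i.e. the named groups.
$named = array_filter($matches, 'is_string', ARRAY_FILTER_USE_KEY);
```

Here $matches has six entries, while $named contains just the "foo" and "bar" keys.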

wjaspers4 [at] gmail [dot] com
28.02.2009 0:16


I recently encountered a problem trying to capture multiple instances of named subpatterns from filenames.

Therefore, I came up with this function.



The function allows you to pass through flags (in this version it applies to all expressions tested), and generates an array of search results.



Enjoy!



<?php



/**

 * Allows multiple expressions to be tested on one string.

 * This will return a boolean, however you may want to alter this.

 *

 * @author William Jaspers, IV <wjaspers4@gmail.com>

 * @created 2009-02-27 17:00:00 +6:00:00 GMT

 * @access public

 *

 * @param array $patterns An array of expressions to be tested.

 * @param String $subject The data to test.

 * @param array $findings Optional argument to store our results.

 * @param mixed $flags Pass-thru argument to allow normal flags to apply to all tested expressions.

 * @param array $errors A storage bin for errors

 *

 * @return bool TRUE if no errors occurred.

 */

function preg_match_multiple( 

  array $patterns=array(), 

  $subject=null,

  &$findings=array(),

  $flags=PREG_PATTERN_ORDER,

  &$errors=array()

) {

  foreach( $patterns as $name => $pattern )

  {

    if( 1 <= preg_match_all( $pattern, $subject, $found, $flags ) )

    {

      $findings[$name] = $found;

    } else 

    {

      if( PREG_NO_ERROR !== ( $code = preg_last_error() ))

      {

        $errors[$name] = $code;

      } else $findings[$name] = array();

    }

  }

  return (0===sizeof($errors));

}

?>

skds1433 at hotmail dot com
19.02.2009 15:41


here is a small tool for someone learning to use regular expressions. it's very basic, and allows you to try different patterns and combinations. I made it to help me, because I like to try different things, to get a good understanding of how things work.



<?php

$search = isset($_POST['search'])?$_POST['search']:"//";

$match = isset($_POST['match'])?$_POST['match']:"<>";



echo '<form method="post">';

echo 's: <input style="width:400px;" name="search" type="text" value="'.htmlspecialchars($search).'" /><br />';

echo 'm:<input style="width:400px;" name="match" type="text" value="'.htmlspecialchars($match).'" /><input type="submit" value="go" /></form><br />';

if (preg_match($search, $match)){echo "matches";}else{echo "no match";}

?>

Svoop
10.02.2009 14:42


I have written a short introduction and a colorful cheat sheet for Perl Compatible Regular Expressions (PCRE):



http://www.bitcetera.com/en/techblog/2008/04/01/regex-in-a-nutshell/

akniep at rayo dot info
30.01.2009 12:05


Bugs of preg_match (PHP-version 5.2.5)



In most cases, the following example will show one of two PHP-bugs discovered with preg_match depending on your PHP-version and configuration.



<?php



$text = "test=";

// creates a rather long text

for ($i = 0; $i++ < 100000;)

    $text .= "%AB";



// a typical URL_query validity-checker (the pattern's function does not matter for this example)

$pattern    = '/^(?:[;\/?:@&=+$,]|(?:[^\W_]|[-_.!~*\()\[\] ])|(?:%[\da-fA-F]{2}))*$/';

    

var_dump( preg_match( $pattern, $text ) );



?>



Possible bug (1):

=============

On one of our Linux-Servers the above example crashes PHP-execution with a C(?) Segmentation Fault(!). This seems to be a known bug (see http://bugs.php.net/bug.php?id=40909), but I don't know if it has been fixed, yet.

If you are looking for a work-around, the following code-snippet is what I found helpful. It wraps the possibly crashing preg_match call by decreasing the PCRE recursion limit in order to result in a Reg-Exp error instead of a PHP-crash.



<?php

[...]



// decrease the PCRE recursion limit for the (possibly dangerous) preg_match call

$former_recursion_limit = ini_set( "pcre.recursion_limit", 10000 );



// the wrapped preg_match call

$result = preg_match( $pattern, $text );



// reset the PCRE recursion limit to its original value

ini_set( "pcre.recursion_limit", $former_recursion_limit );



// if the reg-exp fails due to the decreased recursion limit we may not make any statement, but PHP-execution continues

if ( PREG_RECURSION_LIMIT_ERROR === preg_last_error() )

{

    // react on the failed regular expression here

    $result = [...];

    

    // do logging or email-sending here

    [...]

} //if



?>



Possible bug (2):

=============

On one of our Windows-Servers the above example does not crash PHP, but (directly) hits the recursion-limit. Here, the problem is that preg_match does not return boolean(false) as expected by the description / manual of above.

In short, preg_match seems to return an int(0) instead of the expected boolean(false) if the regular expression could not be executed due to the PCRE recursion-limit. So, if preg_match results in int(0) you seem to have to check preg_last_error() if maybe an error occurred.

Reno
6.01.2009 1:52


I modified your email validation pattern to solve these issues:



- the string MUST contain a TLD

- TLD can be 2 letters long as well as 3 or more (ie: .ca, .us, .uk, .fr, etc.)

- domain name (tld not included) must contain at least 2 characters

- domain name can contain "-"if it's not the first nor the last character.



<?php



$pattern = '/^([a-z0-9])(([-a-z0-9._])*([a-z0-9]))*\@([a-z0-9])' .

'(([a-z0-9-])*([a-z0-9]))+' . '(\.([a-z0-9])([-a-z0-9_-])?([a-z0-9])+)+$/i';



echo preg_match ($pattern, "email-address-to-validate@host.tld");



?>

shamun dot toha at gmail dot com
25.12.2008 23:58


The patterns above were tested, but they fail for this type of

email address. This pattern is more accurate.

<?php

/**
 * A more accurate pattern for email validation.
 */

$pattern = '/^([a-z0-9])(([-a-z0-9._])*([a-z0-9]))*'
         . '\@([a-z0-9])*(\.([a-z0-9])([-a-z0-9_-])([a-z0-9])+)*$/i';

// Valid email
echo preg_match($pattern, '09_az..AZ@host.dOMain.cOM');

// Invalid emails
echo preg_match($pattern, '09_azAZ@ho...st...........domain.com');

echo preg_match($pattern, '09_azAZ@host.do@main.com');

?>

----------------------------

Output:

----------------------------

1 = valid

0 = invalid

0 = invalid

Alex Zinchenko
11.12.2008 3:15


If you need to check whether a string is a serialized representation of a variable, you can use this:



<?php



$string = "a:0:{}";

if(preg_match("/(a|O|s|b)\x3a[0-9]*?((\x3a((\x7b?(.+)\x7d)|(\x22(.+)\x22\x3b)))|(\x3b))/", $string)) 

{

echo "Serialized.";

}

else 

{

echo "Not serialized.";

}



?>



But don't forget that a string in serialized representation can be VERY big,

so matching can be slow, even with the fast preg_* functions.

rbotzer at yahoo dot com
1.12.2008 20:36


@Ben:



Your pattern will match 1.1.255.299  (it matches the .29 at the end out of subpattern .299)



This pattern eliminates such false positives:

/^((1?\d{1,2}|2[0-4]\d|25[0-5])\.){3}(1?\d{1,2}|2[0-4]\d|25[0-5]){1}$/



Ronen
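To see why the anchors matter, the following sketch runs Ben's unanchored pattern and the anchored pattern above against the same input. The variable names are only illustrative.

```php
<?php

$octet      = '(1?\d{1,2}|2[0-4]\d|25[0-5])';
$unanchored = '/(' . $octet . '\.){3}' . $octet . '/';
$anchored   = '/^(' . $octet . '\.){3}' . $octet . '$/';

$ip = '1.1.255.299';

// Without anchors the pattern is satisfied by the substring "1.1.255.29".
preg_match($unanchored, $ip, $m);
echo $m[0], "\n";                               // 1.1.255.29

// With ^ and $ the whole string has to be a valid address.
echo preg_match($anchored, $ip), "\n";          // 0
echo preg_match($anchored, '1.1.255.29'), "\n"; // 1
```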

dbreen at gmail dot com
21.11.2008 18:35


When I used the above example's syntax for named capturing groups, it worked fine on my development server (PHP 5.2.6), but gave me a regex error on the live server (PHP 5.0.4).



Adding a 'P' in front of the group name resolved the issue (this is in accordance with the PCRE implementation; according to the changelog, versions before PHP 5.2.2 only accept the (?P<name>) syntax).



To use the above example, here's the original:

<?php

preg_match('/(?<name>\w+): (?<digit>\d+)/', $str, $matches);

?>



And here's the fix:

<?php

preg_match('/(?P<name>\w+): (?P<digit>\d+)/', $str, $matches);

?>
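A short sketch of what the portable form puts into $matches. The sample string 'foobar: 2008' is an assumption chosen for illustration.

```php
<?php

$str = 'foobar: 2008';

// (?P<name>...) is the form that also works on PHP versions before 5.2.2.
preg_match('/(?P<name>\w+): (?P<digit>\d+)/', $str, $matches);

// Each named group is available under its name and under its numeric index.
echo $matches['name'], "\n";   // foobar
echo $matches['digit'], "\n";  // 2008
echo $matches[1], "\n";        // foobar
```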

Ben
25.10.2008 8:47


Marc, your pattern will match 259.259.259.259.



I think you're actually after something like this:



/((1?\d{1,2}|2[0-4]\d|25[0-5])\.){3}(1?\d{1,2}|2[0-4]\d|25[0-5])/

phil dot taylor at gmail dot com
23.10.2008 2:01


If you need to check for .com.br, .com.au, .uk and all the other crazy domain endings, I found that the following expression works well for validating an email address. It's quite generous in what it allows.





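<?php

$email_address = "phil.taylor@a_domain.tv";

// Exactly one "@", followed somewhere later by at least one dot.
if (preg_match("/^[^@]*@[^@]*\.[^@]*$/", $email_address)) {
    // echo instead of return, so the snippet prints something when run on its own
    echo "E-mail address";
}

?>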

Jonathan Camenisch
16.10.2008 16:21


@ Marc



A little more work to do--your expression matched ...256... through ...259..., and will not match 1- or 2-digit numbers that do not start with 1. It could also be a little more concise, as in:



/^(1?\d{1,2}|2([0-4]\d|5[0-5]))(\.(1?\d{1,2}|2([0-4]\d|5[0-5]))){3}$/



Also, I put together a primitive regex tester at http://j-r.camenisch.net/regex/ -- to help someone find more flaws to correct. ;-)

Marc
6.10.2008 10:16


@ Steve Todorov:

Your regex will not only match 999.999... but also 9999.9999... etc.



I'd rather take this regex:



/^(1\d{0,2}|2(\d|[0-5]\d)?)\.(1\d{0,2}|2(\d|[0-5]\d)?)\.(1\d{0,2}|2(\d|[0-5]\d)?)\.(1\d{0,2}|2(\d|[0-5]\d)?)$/



This should match any IPv4 address. At least it did in a small test here ;)

Steve Todorov
3.10.2008 3:23


While reading the preg_match documentation I couldn't find how to match an IP address.

Let's say you are writing a script that works with an IP or a host name, and you want to show the host name, not the IP.



Well, this is the way to go:



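<?php

/* This is an IP or host name received via GET/POST from somewhere */
$ip = $_POST['ipOrHost'];

// The dots are escaped and the pattern is anchored, so only four
// dot-separated groups of digits are treated as an IP address.
if(preg_match('/^(\d+)\.(\d+)\.(\d+)\.(\d+)$/',$ip))
  $host = gethostbyaddr($ip);
else
  $host = gethostbyname($ip);

echo $host;

?>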



This is a really simple script made for beginners!

If you like, you can restrict the range of the numbers.

The code above accepts any numbers: an IP address can be at most 255.255.255.255, but the example also accepts 999.999.999.999.



Wish you luck!



Best wishes,

Steve
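One way to add that restriction without a hand-written regex is PHP's filter extension (available since PHP 5.2). This is a minimal sketch, not part of the script above, and the helper name looks_like_ipv4() is hypothetical.

```php
<?php

/**
 * Hypothetical helper: true only for a dotted IPv4 address whose
 * four parts are all in the range 0-255.
 */
function looks_like_ipv4($value)
{
    return filter_var($value, FILTER_VALIDATE_IP, FILTER_FLAG_IPV4) !== false;
}

var_dump(looks_like_ipv4('192.168.0.1'));      // bool(true)
var_dump(looks_like_ipv4('999.999.999.999'));  // bool(false)
var_dump(looks_like_ipv4('www.example.com'));  // bool(false)
```

The result could then be used to choose between gethostbyaddr() and gethostbyname(), as in the script above.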

Ashus
12.09.2008 17:18


If you need to match specific wildcards in IP address, you can use this regexp:



<?php



$ip = '10.1.66.22';

$cmp = '10.1.??.*';



$cnt = preg_match('/^'
     .str_replace(
     array('\*','\?'),
     array('(.*?)','[0-9]'),
     // '/' is passed as the delimiter so that a slash in the mask is escaped too
     preg_quote($cmp, '/')).'$/',
     $ip);



echo $cnt;



?>



where '?' is exactly one digit and '*' is any number of any characters. The $cmp mask can be supplied by the user; $cnt equals (int) 1 on a match and 0 otherwise.
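The same idea can be wrapped in a function so that several masks are easy to try. This is only a sketch: the name ip_matches_mask() is hypothetical, and '/' is passed to preg_quote() as an assumption that user-supplied masks might contain the delimiter.

```php
<?php

/**
 * Hypothetical wrapper around the snippet above.
 * '?' stands for exactly one digit, '*' for any run of characters.
 */
function ip_matches_mask($ip, $mask)
{
    $pattern = '/^' . str_replace(
        array('\*', '\?'),
        array('.*?', '[0-9]'),
        preg_quote($mask, '/')
    ) . '$/';

    return preg_match($pattern, $ip) === 1;
}

var_dump(ip_matches_mask('10.1.66.22', '10.1.??.*'));  // bool(true)
var_dump(ip_matches_mask('10.1.6.22',  '10.1.??.*'));  // bool(false)
var_dump(ip_matches_mask('10x1.66.22', '10.1.??.*'));  // bool(false)
```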

wjaspers4[at]gmail[dot]com
28.08.2008 16:55


I found this rather useful for testing multiple strings when developing a regex pattern.

<?php 

/**

 * Runs preg_match on an array of strings and returns a result set.

 * @author wjaspers4[at]gmail[dot]com

 * @param String $expr The expression to match against

 * @param Array $batch The array of strings to test.

 * @return Array

 */

function preg_match_batch( $expr, $batch=array() )
{
    // create a placeholder for our results
    $returnMe = array();

    // for every string in our batch ...
    foreach( $batch as $str )
    {
        // test it, and dump our findings into $found
        preg_match($expr, $str, $found);

        // append our findings to the placeholder
        $returnMe[$str] = $found;
    }

    return $returnMe;
}

?>
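A possible way to call it is sketched below. The function is repeated in compact form so that the snippet runs on its own; the pattern and the sample strings are assumptions chosen for illustration.

```php
<?php

// Same function as above, repeated so that this snippet is self-contained.
function preg_match_batch($expr, $batch = array())
{
    $returnMe = array();
    foreach ($batch as $str) {
        preg_match($expr, $str, $found);
        $returnMe[$str] = $found;
    }
    return $returnMe;
}

// Sample pattern and inputs, for illustration only.
$results = preg_match_batch('/^(\d{4})-(\d{2})$/', array('2008-08', '08-2008', 'abc'));

foreach ($results as $input => $found) {
    // An empty array means that preg_match() found nothing for this input.
    echo $input, ' => ', ($found ? 'match, year ' . $found[1] : 'no match'), "\n";
}
```

Because the tested string is used as the array key, duplicate strings in the batch collapse into a single entry.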

Dino Korah AT webroot DOT com
9.07.2008 1:11


preg_match and preg_replace_callback don't match up in the structure of the array that they fill for a match.

preg_match, as the example shows, supports named patterns, whereas preg_replace_callback doesn't seem to support them at all. It seems to ignore any named pattern matched.
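Whether the callback receives the named keys can be checked on a given installation with a sketch like the following. It does not assume either behaviour: it uses the named key when it is present and falls back to the numeric index otherwise. The group name "word" and the sample string are illustrative, and the closure syntax assumes PHP 5.3 or later.

```php
<?php

$callback = function ($m) {
    // Use the named key when the installed PHP version provides it,
    // otherwise fall back to the numeric index of the same group.
    $word = isset($m['word']) ? $m['word'] : $m[1];
    return strtoupper($word);
};

echo preg_replace_callback('/(?P<word>[a-z]+)/', $callback, 'foo 123 bar'), "\n";
// FOO 123 BAR
```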

Tim
8.07.2008 17:01


I made a mistake in my previous post. Mail addresses may of course only be "exotic" in their local parts, not in the domain part. Therefore, an exotic mail address would be "exotic#%$mail@domain.com".

Tim
7.07.2008 23:51


For those not so familiar with regexes, here is my algorithmic email validation routine. It can be adapted to individual needs more easily than a regex. My function does NOT recognize exotic email addresses as allowed by the RFC. (For example, info@exotic%&$#mail.com is a legal email address but not allowed by my function.)

-Tim



<?php

function email_is_valid($email) {
   // exactly one '@', and not as the first character
   if (substr_count($email, '@') != 1)
      return false;
   // string offsets use [] because the {} syntax is no longer supported as of PHP 8.0
   if ($email[0] == '@')
      return false;
   // at least one dot, but never two dots in a row
   if (substr_count($email, '.') < 1)
      return false;
   if (strpos($email, '..') !== false)
      return false;
   // only letters, digits, '@', '.', '_' and '-' are allowed
   $length = strlen($email);
   for ($i = 0; $i < $length; $i++) {
      $c = $email[$i];
      if ($c >= 'A' && $c <= 'Z')
         continue;
      if ($c >= 'a' && $c <= 'z')
         continue;
      if ($c >= '0' && $c <= '9')
         continue;
      if ($c == '@' || $c == '.' || $c == '_' || $c == '-')
         continue;
      return false;
   }
   // the TLD must be a two-letter code or one of the listed generic TLDs
   $TLD = array (
         'COM',   'NET',
         'ORG',   'MIL',
         'EDU',   'GOV',
         'BIZ',   'NAME',
         'MOBI',  'INFO',
         'AERO',  'JOBS',
         'MUSEUM'
      );
   $tld = strtoupper(substr($email, strrpos($email, '.') + 1));
   if (strlen($tld) != 2 && !in_array($tld, $TLD))
      return false;
   return true;
}

?>

mailinglist dot php at hydras-world dot com
3.07.2008 23:30


The regexp in the comment below treats the e-mail address 'me@de.com' as invalid, although it is valid.



I modified it to the following, and it seems to work for me in my limited tests of it:



'/^([a-z0-9])(([-a-z0-9._])*([a-z0-9]))*\@' .
'([a-z0-9])([-a-z0-9_])+([a-z0-9])*' .
'(\.([a-z0-9])([-a-z0-9_-])([a-z0-9])+)*$/i'



YMMV.

brferreira at grad dot ufsc dot br
26.06.2008 4:48


Paperweight, this pattern worked fine for me (even for intranet addresses, like "john@localhost", and also for subdomain emails, like "john@foo.bar.com"):

'/([a-z0-9])([-a-z0-9._])+([a-z0-9])\@' .
'([a-z0-9])([-a-z0-9_])+([a-z0-9])' .
'(\.([a-z0-9])([-a-z0-9_-])([a-z0-9])+)*/i'



Still, this won't replace an "activation link", which is the best way to check whether an e-mail address is valid.

jonathan dot lydall at gmail dot removethispart dot com
26.05.2008 21:50


Because making a truly correct email validation function is harder than one may think, consider using this one which comes with PHP through the filter_var function (http://www.php.net/manual/en/function.filter-var.php):



<?php

$email = "someone@domain .local";



if(!filter_var($email, FILTER_VALIDATE_EMAIL)) {

    echo "E-mail is not valid";

} else {

    echo "E-mail is valid";

}

?>

Georg
4.04.2008 11:36


In addition to reiner-keller's comment about Umlaute using setlocale (LC_ALL, 'de_DE');



To enable 'de_DE' on my Debian 4 machine I first had to:

- uncomment 'de_DE' in file /etc/locale.gen and afterwards

- run locale-gen from the shell

A service of Reinhard Neidl - Webprogrammierung.
