PHP docs: Converts a quoted-printable string to an 8-bit string

26 user contributions:

feedr
6.03.2009 23:39


Another (improved) version of quoted_printable_encode(). Please note the order of the array elements in str_replace().

I've just rewritten the previous function for better readability.



<?php

if (!function_exists("quoted_printable_encode")) {

  /**

  * Process a string to fit the requirements of RFC2045 section 6.7. Note that

  * this works, but replaces more characters than the minimum set. For readability

  * the spaces and CRLF pairs aren't encoded though.

  */

  function quoted_printable_encode($string) {

        $string = str_replace(array('%20', '%0D%0A', '%'), array(' ', "\r\n", '='), rawurlencode($string));

        $string = preg_replace('/[^\r\n]{73}[^=\r\n]{2}/', "$0=\r\n", $string);



        return $string;

  }

}

?>
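
To see why the order of the array elements matters, here is a small sketch (the sample string is only an illustration). str_replace() works through the search array from left to right, so '%' has to come last; otherwise every '%20' has already become '=20' by the time its own rule runs.

```php
<?php
// rawurlencode() turns the space and the CRLF pair into percent escapes.
$raw = rawurlencode("a b\r\n");   // "a%20b%0D%0A"

// Order used in the note: specific escapes first, the generic '%' last.
$good = str_replace(array('%20', '%0D%0A', '%'), array(' ', "\r\n", '='), $raw);

// Generic '%' first: '%20' and '%0D%0A' no longer exist when their turn comes.
$bad = str_replace(array('%', '%20', '%0D%0A'), array('=', ' ', "\r\n"), $raw);
```

With the first order the space and the CRLF pair come back as literal characters; with the second they stay encoded as =20 and =0D=0A.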

andre at luyer dot nl
2.11.2008 21:36


A small update for Andrew's code below. This one leaves the original CRLF pairs intact (which allows the preg_replace to work as intended):



<?php

if (!function_exists("quoted_printable_encode")) {

  /**

  * Process a string to fit the requirements of RFC2045 section 6.7. Note that

  * this works, but replaces more characters than the minimum set. For readability

  * the spaces and CRLF pairs aren't encoded though.

  */

  function quoted_printable_encode($string) {

    return preg_replace('/[^\r\n]{73}[^=\r\n]{2}/', "$0=\r\n",

      str_replace("%", "=", str_replace("%0D%0A", "\r\n", 

        str_replace("%20"," ",rawurlencode($string)))));

  }

}

?>



Regards, André

Karora
20.10.2008 5:27


Taking a bunch of the earlier comments together, you can synthesize a nice short and reasonably efficient quoted_printable_encode function like this:





Note that I put this in my standard library file, so I wrap it in a !function_exists in order that if there is a pre-existing PHP one it will just work and this will evaluate to a noop.





<?php


if ( !function_exists("quoted_printable_encode") ) {


  /**


  * Process a string to fit the requirements of RFC2045 section 6.7.  Note that


  * this works, but replaces more characters than the minimum set. For readability


  * the spaces aren't encoded as =20 though.


  */


  function quoted_printable_encode($string) {


    return preg_replace('/[^\r\n]{73}[^=\r\n]{2}/', "$0=\r\n", str_replace("%","=",str_replace("%20"," ",rawurlencode($string))));


  }


}


?>





Regards,


Andrew McMillan.

zg
27.09.2008 15:48


<?php



function quoted_printable_encode( $str, $chunkLen = 72 )
{
    $offset = 0;

    $str = strtr(rawurlencode($str), array('%' => '='));
    $len = strlen($str);
    $enc = '';

    while ( $offset < $len )
    {
        // substr() replaces the $str{...} offsets: curly-brace offsets were removed
        // in PHP 8, and substr() does not warn when the position is past the end.
        if ( substr($str, $offset + $chunkLen - 1, 1) === '=' )
        {
            // an escape starts on the last position of the chunk: cut before it
            $line = substr($str, $offset, $chunkLen - 1);
            $offset += $chunkLen - 1;
        }
        elseif ( substr($str, $offset + $chunkLen - 2, 1) === '=' )
        {
            $line = substr($str, $offset, $chunkLen - 2);
            $offset += $chunkLen - 2;
        }
        else
        {
            $line = substr($str, $offset, $chunkLen);
            $offset += $chunkLen;
        }

        // add a soft line break whenever more data follows
        if ( $offset < $len )
            $enc .= $line ."=\n";
        else
            $enc .= $line;
    }

    return $enc;
}



?>

Christian Albrecht
25.09.2008 0:10


In addition to david at lionhead dot nl's function:



<?php

function quoted_printable_encode($txt) {

    /* Make sure there are no %20 or similar */

    $txt = rawurldecode($txt);

    $tmp="";

    $line="";

    for ($i=0;$i<strlen($txt);$i++) {

        if (($txt[$i]>='a' && $txt[$i]<='z') || ($txt[$i]>='A' && $txt[$i]<='Z') || ($txt[$i]>='0' && $txt[$i]<='9')) {

            $line.=$txt[$i];

            if (strlen($line)>=75) {

                $tmp.="$line=\n";

                $line="";

            }

        }

        else {

            /* Important to differentiate this case from the above */

            if (strlen($line)>=72) {

            $tmp.="$line=\n";

            $line="";

            }

            $line.="=".sprintf("%02X",ord($txt[$i]));

        }

    }

    $tmp.="$line\n";

    return $tmp;

}

?>

lewiswright [ at t] gmail [d dot t] com
24.09.2008 20:16


In response to Bobo, using:



chunk_split($string, 76, "=\n");



can result in broken encoding (in some rare cases). This is for a couple of reasons. Firstly, chunk_split could insert a newline in the middle of an escape sequence that falls at the end of a line, turning =20 into =2=\n0, which breaks it. Secondly, the spec says that a line should be no longer than 76 characters in total, which includes the '=' used to indicate a soft line break.



Additionally, there's a couple of minor readability issues with it. Firstly, chunk_split leaves a trailing newline - which is unnecessary. Lastly, it does not consider newlines already in the text (which breaks readability) so it often wraps when not needed.



Here's a snippet which will fix all the issues mentioned:



preg_replace('/[^\r\n]{73}[^=\r\n]{2}/', "$0=\r\n", $data);
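
The two chunk_split problems described above can also be handled with an explicit loop. The following is only a sketch, not a definitive implementation: the helper name qp_soft_wrap_sketch is made up for this example, and it assumes the input is already escaped, that every "=" in it starts an "=XX" escape, and that hard line breaks are CRLF.

```php
<?php
// Hypothetical helper (not part of PHP): soft-wrap an already-escaped QP string.
function qp_soft_wrap_sketch(string $encoded, int $max = 76): string
{
    $out = '';
    foreach (explode("\r\n", $encoded) as $n => $line) {
        if ($n > 0) {
            $out .= "\r\n";          // keep the hard line breaks that were already there
        }
        while (strlen($line) > $max) {
            $cut = $max - 1;         // leave room for the trailing "="
            // Step back if the cut would land inside an "=XX" escape.
            if ($line[$cut - 1] === '=') {
                $cut -= 1;
            } elseif ($line[$cut - 2] === '=') {
                $cut -= 2;
            }
            $out .= substr($line, 0, $cut) . "=\r\n";
            $line = substr($line, $cut);
        }
        $out .= $line;
    }
    return $out;
}

// 74 literal characters followed by two escapes (80 bytes in total).
$sample  = str_repeat('A', 74) . '=C3=A9';
$naive   = chunk_split($sample, 76, "=\n");   // cuts between "=C" and "3"
$wrapped = qp_soft_wrap_sketch($sample);      // cuts in front of "=C3"
```

With this sample, chunk_split ends its first line with "=C=", which is the broken escape described above, while the loop breaks in front of "=C3" and keeps each line, including the trailing "=", within 76 characters.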

david at lionhead dot nl
18.08.2008 14:54


My version of quoted_printable encode, as the convert.quoted-printable-encode filter breaks on outlook express. This one seems to work on express/outlook/thunderbird/gmail.



function quoted_printable_encode($txt) {

    $tmp="";

    $line="";

    for ($i=0;$i<strlen($txt);$i++) {

        if (($txt[$i]>='a' && $txt[$i]<='z') || ($txt[$i]>='A' && $txt[$i]<='Z') || ($txt[$i]>='0' && $txt[$i]<='9'))

            $line.=$txt[$i];

        else

            $line.="=".sprintf("%02X",ord($txt[$i]));

        if (strlen($line)>=75) {

            $tmp.="$line=\n";

            $line="";

        }

    }

    $tmp.="$line\n";

    return $tmp;

}

Bobo
12.07.2008 18:20


Following up to Hoffa's function, return the result like:



chunk_split($string, 76, "=\n");



to conform to the RFC, which requires that lines are no longer than 76 chars.

h atnospam hoffa dotelidot se
1.04.2008 3:32


This is how I encode:



function quoted_printable_encode($string) {

  $string = rawurlencode($string);

  $string = str_replace("%","=",$string);

  return $string;

}



best regards / Hoffa

vita dot plachy at seznam dot cz
23.03.2008 17:22


<?php


$text = <<<EOF


This function enables you to convert text to a quoted-printable string as well as to create encoded-words used in email headers (see http://www.faqs.org/rfcs/rfc2047.html).





No line of returned text will be longer than specified. Encoded-words will not contain a newline character. Special characters are removed.


EOF;





define('QP_LINE_LENGTH', 75);


define('QP_LINE_SEPARATOR', "\r\n");





function quoted_printable_encode($string, $encodedWord = false)


{


    if(!preg_match('//u', $string)) {


        throw new Exception('Input string is not valid UTF-8');


    }


    


    static $wordStart = '=?UTF-8?Q?';


    static $wordEnd = '?=';


    static $endl = QP_LINE_SEPARATOR;


    


    $lineLength = $encodedWord


        ? QP_LINE_LENGTH - strlen($wordStart) - strlen($wordEnd)


        : QP_LINE_LENGTH;


    


    $string = $encodedWord


        ? preg_replace('~[\r\n]+~', ' ', $string)    // we need encoded word to be single line


        : preg_replace('~\r\n?~', "\n", $string);    // normalize line endings


    $string = preg_replace('~[\x00-\x08\x0B-\x1F]+~', '', $string);    // remove control characters


    


    $output = $encodedWord ? $wordStart : '';


    $charsLeft = $lineLength;


    


    $chr = isset($string[0]) ? $string[0] : null;    // curly-brace offsets were removed in PHP 8


    $ord = ord($chr);


    


    for ($i = 0; isset($chr); $i++) {


        $nextChr = isset($string[$i + 1]) ? $string[$i + 1] : null;


        $nextOrd = ord($nextChr);


        


        if (


            $ord > 127 or    // high byte value


            $ord === 95 or    // underscore "_"


            $ord === 63 && $encodedWord or    // "?" in encoded word


            $ord === 61 or    // equal sign "="


            // space or tab in encoded word or at line end


            $ord === 32 || $ord === 9 and $encodedWord || !isset($nextOrd) || $nextOrd === 10


        ) {


            $chr = sprintf('=%02X', $ord);    


        }


        


        if ($ord === 10) {    // line feed


            $output .= $endl;


            $charsLeft = $lineLength;


        } elseif (


            strlen($chr) < $charsLeft or


            strlen($chr) === $charsLeft and $nextOrd === 10 || $encodedWord


        ) {    // add character


            $output .= $chr;


            $charsLeft-=strlen($chr);


        } elseif (isset($nextOrd)) {    // another line needed


            $output .= $encodedWord


                ? $wordEnd . $endl . "\t" . $wordStart . $chr


                : '=' . $endl . $chr;


            $charsLeft = $lineLength - strlen($chr);


        }


        


        $chr = $nextChr;


        $ord = $nextOrd;


    }


    


    return $output . ($encodedWord ? $wordEnd : '');


}





echo quoted_printable_encode($text/*, true*/);

ludwig at gramberg-webdesign dot de
24.09.2007 21:13


My approach to quoted-printable encoding, using PHP's stream filters:



<?php

/**

 * @param string $str

 * @return string

 * */

function quoted_printable_encode($str) {
    $fp = fopen('php://temp', 'w+');
    // attach the filter to the read chain only, so the data is encoded exactly once
    stream_filter_append($fp, 'convert.quoted-printable-encode', STREAM_FILTER_READ);
    fwrite($fp, $str);
    rewind($fp);
    $result = stream_get_contents($fp);
    fclose($fp);
    return $result;
}

?>

roelof
24.07.2007 16:06


I modified the below version of legolas558 at users dot sausafe dot net and added a wrapping option.



<?php

/**
 *    Encode a string as so-called 'quoted printable'. This type of encoding is
 *    used to send the content of 8-bit e-mail messages as 7 bits.
 *
 *    @access public
 *    @param string    $str    The string to encode
 *    @param bool      $wrap   Add line breaks after 74 characters?
 *    @return string
 */



function quoted_printable_encode($str, $wrap=true)

{

    $return = '';

    $iL = strlen($str);

    for($i=0; $i<$iL; $i++)

    {

        $char = $str[$i];

        if(ctype_print($char) && !ctype_punct($char)) $return .= $char;

        else $return .= sprintf('=%02X', ord($char));

    }

    return ($wrap === true)

        ? wordwrap($return, 74, " =\n")

        : $return;

}



?>

legolas558 at users dot sausafe dot net
21.02.2007 20:09


As soletan at toxa dot de reported, that function is very bad and does not produce valid quoted-printable strings. While using it I saw spam filters marking the emails as QP_EXCESS, and sometimes the email client did not recognize the header at all; I really lost time :(. This is the new version (we use it in the Drake CMS core) that works seamlessly:



<?php



//L: note $encoding that is uppercase

//L: also your PHP installation must have ctype_alpha, otherwise write it yourself

function quoted_printable_encode($string, $encoding='UTF-8') {

// use this function with headers, not with the email body as it misses word wrapping

       $len = strlen($string);

       $result = '';

       $enc = false;

       for($i=0;$i<$len;++$i) {

        $c = $string[$i];

        if (ctype_alpha($c))

            $result.=$c;

        else if ($c==' ') {

            $result.='_';

            $enc = true;

        } else {

            $result.=sprintf("=%02X", ord($c));

            $enc = true;

        }

       }

       //L: so spam agents won't mark your email with QP_EXCESS

       if (!$enc) return $string;

       return '=?'.$encoding.'?q?'.$result.'?=';

}



?>

I hope it helps ;)

soletan at toxa dot de
21.02.2007 0:54


Be warned! The method below for encoding text does not work as requested by RFC1521!



Consider a line consisting of 75 'A' and a single é (or similar non-ASCII character) ... the method below would encode and return a line of 78 octets, breaking with RFC 1521, 5.1 Rule #5: "The Quoted-Printable encoding REQUIRES that encoded lines be no more than 76 characters long."



Good QP-encoding takes a bit more than this.
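
The arithmetic behind this warning can be checked with a short sketch. The helper qp_escape_only_sketch is hypothetical: it stands in for any encoder that only escapes and never wraps, and é is assumed to be a single ISO-8859-1 byte, which is what the 78-octet figure implies.

```php
<?php
// Hypothetical escape-only encoder: escapes bytes, performs no line wrapping.
function qp_escape_only_sketch(string $s): string
{
    return preg_replace_callback(
        '/[^\x20-\x3C\x3E-\x7E]/',
        function ($m) { return sprintf('=%02X', ord($m[0])); },
        $s
    );
}

$line    = str_repeat('A', 75) . "\xE9";   // assumption: é as one ISO-8859-1 byte
$encoded = qp_escape_only_sketch($line);
$length  = strlen($encoded);               // 75 literal bytes + 3 bytes for "=E9"
```

The 76-byte input line grows to 78 bytes once the last byte is escaped, so an encoder has to account for the length of the escape before deciding where to break.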

legolas558
12.02.2007 19:32


Please note that in the below encode function there is a bug!



<?php

if (($c==0x3d) || ($c>=0x80) || ($c<0x20))

?>



$c should be checked against less or equal to encode spaces!



so the correct code is



<?php

if (($c==0x3d) || ($c>=0x80) || ($c<=0x20))

?>



Fix the code or post this note, please

MagicalTux at ookoo dot org
6.02.2007 12:44


I checked Thomas Pequet / Memotoo.com's quoted_printable_encode() function, and it does not really follow the standards...



Here's my own, fixed version :



<?php

function quoted_printable_encode($string, $linelen = 0, $linebreak="=\r\n", $breaklen = 0, $encodecrlf = false) {

        // Quoted printable encoding is rather simple.

        // Each character in the string $string should be encoded if:

        //  Character code is <0x20 (space)

        //  Character is = (as it has a special meaning: 0x3d)

        //  Character is over ASCII range (>=0x80)

        $len = strlen($string);

        $result = '';

        for($i=0;$i<$len;$i++) {

                if ($linelen >= 76) { // break lines over 76 characters, and put special QP linebreak

                        $linelen = $breaklen;

                        $result.= $linebreak;

                }

                $c = ord($string[$i]);

                if (($c==0x3d) || ($c>=0x80) || ($c<0x20)) { // in this case, we encode...

                        if ((($c==0x0A) || ($c==0x0D)) && (!$encodecrlf)) { // but not for linebreaks

                                $result.=chr($c);

                                $linelen = 0;

                                continue;

                        }

                        $result.=sprintf('=%02X', $c); // zero-pad on the left: 0x0A must become =0A, not =A0

                        $linelen += 3;

                        continue;

                }

                $result.=chr($c); // normal characters aren't encoded

                $linelen++;

        }

        return $result;

}



$test = 'This is a test !! héhéhé ....'."\r\n";

$test.= 'You can write really long pieces of text without having to worry about mail transfer. Quoted printable supports a special way to handle such lines, by inserting an escaped linebreak. This linebreak, once converted back to 8bit, will disappear.';



echo quoted_printable_encode(utf8_encode($test))."\r\n";

?>



By default, the function encodes text to be used as a quoted-printable-encoded body, however by tweaking the parameters, you can get subject-header encoding and stuff like that.

Thomas Pequet / Memotoo.com
19.10.2006 15:15


If you want a function to do the reverse of "quoted_printable_decode()", follow the link you will find the "quoted_printable_encode()" function:

http://www.memotoo.com/softs/public/PHP/quoted printable_encode.inc.php



Compatible "ENCODING=QUOTED-PRINTABLE"

Example: 

quoted_printable_encode(utf8_encode("c'est quand l'été ?"))

-> "c'est quand l'=C3=A9t=C3=A9 ?"
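
Since PHP 5.3.0 a built-in quoted_printable_encode() exists, so the example above can be reproduced without any user-land function. A minimal sketch, with the accented characters written as explicit UTF-8 bytes so that the encoding of the source file does not matter:

```php
<?php
// "été" spelled out as UTF-8 bytes (C3 A9 for each é).
$plain   = "c'est quand l'\xC3\xA9t\xC3\xA9 ?";

$encoded = quoted_printable_encode($plain);    // built-in since PHP 5.3.0
$decoded = quoted_printable_decode($encoded);  // round trip back to the input
```

Here $encoded holds c'est quand l'=C3=A9t=C3=A9 ? and $decoded is identical to $plain.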

umu
6.02.2006 23:38





and write [^=]{0,2} in the last regular expression

to avoid the soft linebreak "=" being the 77th char in line,

see my imap_8bit() emulation at    

http://www.php.net/manual/en/function.imap-8bit.php#61216

steffen dot weber at computerbase dot de
22.07.2005 23:08


As the two digit hexadecimal representation SHOULD be in uppercase you want to use "=%02X" (uppercase X) instead of "=%02x" as the first argument to sprintf().
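
A short sketch of the difference; the helper name qp_escape_byte_sketch is made up for this example:

```php
<?php
// Hypothetical helper: escape a single byte with uppercase hex digits.
function qp_escape_byte_sketch(string $byte): string
{
    return sprintf('=%02X', ord($byte));   // %02X: two digits, zero-padded, uppercase
}

$upper = qp_escape_byte_sketch("\xE9");    // uppercase form
$lower = sprintf('=%02x', 0xE9);           // lowercase form, for comparison
```

$upper is =E9 while $lower is =e9; only the first matches the uppercase form the RFC asks senders to use.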

bendi at interia dot pl
29.03.2005 11:08


This function has appeared twice already, and both versions have an error in the rule that replaces chunk_split: the last line of the encoded text is lost if it is shorter than 73 characters.



My solution (got rid of replace_callback)

<?php
function quoted_printable_encode( $sString ) {
   /* instead of replace_callback I used the e modifier for the regex rule, which works like the PHP eval function.
      Note: the e modifier was deprecated in PHP 5.5 and removed in PHP 7.0, so this line only runs on older versions. */
   $sString = preg_replace( '/[^\x21-\x3C\x3E-\x7E\x09\x20]/e', 'sprintf( "=%02x", ord ( "$0" ) ) ;',  $sString );
   /* added one or more chars to this rule, which lets the last line be matched and included in the results */
   preg_match_all( '/.{1,73}([^=]{0,3})?/', $sString, $aMatch );
   /* the CR constant of the original is not defined in this snippet; "\r\n" is assumed here */
   return implode( "=\r\n", $aMatch[0] );
}

?>

dmitry at koterov dot ru
19.02.2005 14:26


The previous comment has a bug: encoding of short text does not work because of incorrect usage of preg_match_all(). Has anybody read it at all? :-)



A version that seems correct, with additional imap_8bit() function emulation:



if (!function_exists('imap_8bit')) {

 function imap_8bit($text) {

   return quoted_printable_encode($text);

 }

}



function quoted_printable_encode_character ( $matches ) {

   $character = $matches[0];

   return sprintf ( '=%02x', ord ( $character ) );

}



// based on http://www.freesoft.org/CIE/RFC/1521/6.htm

function quoted_printable_encode ( $string ) {

   // rule #2, #3 (leaves space and tab characters in tact)

   $string = preg_replace_callback (

     '/[^\x21-\x3C\x3E-\x7E\x09\x20]/',

     'quoted_printable_encode_character',

     $string

   );

   $newline = "=\r\n"; // '=' + CRLF (rule #4)

   // make sure the splitting of lines does not interfere with escaped characters

   // (chunk_split fails here)

   $string = preg_replace ( '/(.{73}[^=]{0,3})/', '$1'.$newline, $string);

   return $string;

}

drm at melp dot nl
9.02.2005 12:32


An easier, improved way of encoding for quoted-printable transfer:



------

function quoted_printable_encode_character ( $matches ) {

   $character = end ( $matches );

   return sprintf ( '=%02x', ord ( $character ) );

}



// based on http://www.freesoft.org/CIE/RFC/1521/6.htm

function quoted_printable_encode ( $string ) {

   // rule #2, #3 (leaves space and tab characters in tact)

   $string = preg_replace_callback ( 

      '/[^\x21-\x3C\x3E-\x7E\x09\x20]/', 

      'quoted_printable_encode_character',

      $string 

   ); 

   

   $newline = "=\r\n"; // '=' + CRLF (rule #4)



   // make sure the splitting of lines does not interfere with escaped characters 

   // (chunk_split fails here)

   preg_match_all ( '/.{73}([^=]{0,3})/', $string, $match ); // Rule #1

   return implode ( $newline, $match[0] );

}

-----

tamas dot tompa at kirowski dot com
20.02.2004 16:46


i have found a bug in pob at medienrecht dot NOSPAM dot org's qp_enc function: in quoted-printable messages, a dot at the start of a line needs to be encoded too.

here is the fixed code:



function qp_enc( $input = "", $line_max = 76, $space_conv = false ) {

    $hex = array('0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F');
    $lines = preg_split("/(?:\r\n|\r|\n)/", $input);
    $eol = "\r\n";
    $escape = "=";
    $output = "";

    foreach ( $lines as $line ) { // each() was removed in PHP 8
        //$line = rtrim($line); // remove trailing white space -> no =20\r\n necessary
        $linlen = strlen($line);
        $newline = "";
        for($i = 0; $i < $linlen; $i++) {
            $c = substr( $line, $i, 1 );
            $dec = ord( $c );
            if ( ( $i == 0 ) && ( $dec == 46 ) ) { // convert first point in the line into =2E
                $c = "=2E";
            }
            if ( $dec == 32 ) {
                if ( $i == ( $linlen - 1 ) ) { // convert space at eol only
                    $c = "=20";
                } else if ( $space_conv ) {
                    $c = "=20";
                }
            } elseif ( ($dec == 61) || ($dec < 32 ) || ($dec > 126) ) { // always encode "\t", which is *not* required
                $h2 = floor($dec/16);
                $h1 = floor($dec%16);
                $c = $escape.$hex["$h2"].$hex["$h1"];
            }
            if ( (strlen($newline) + strlen($c)) >= $line_max ) { // CRLF is not counted
                $output .= $newline.$escape.$eol; // soft line break; " =\r\n" is okay
                $newline = "";
                // check if newline first character will be point or not
                if ( $dec == 46 ) {
                    $c = "=2E";
                }
            }
            $newline .= $c;
        } // end of for
        $output .= $newline.$eol;
    } // end of foreach
    return trim($output);
}

pob at medienrecht dot NOSPAM dot org
18.07.2001 22:06


If you do not have access to imap_* and do not want to use

"$message = chunk_split( base64_encode($message) );"

because you want to be able to read the "source" of your mails, you might want to try this:

(any suggestions very welcome!)








function qp_enc($input = "quoted-printable encoding test string", $line_max = 76) {





    $hex = array('0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F');


    $lines = preg_split("/(?:\r\n|\r|\n)/", $input);


    $eol = "\r\n";


    $escape = "=";


    $output = "";





    foreach ($lines as $line) { // each() was removed in PHP 8


        //$line = rtrim($line); // remove trailing white space -> no =20\r\n necessary


        $linlen = strlen($line);


        $newline = "";


        for($i = 0; $i < $linlen; $i++) {


            $c = substr($line, $i, 1);


            $dec = ord($c);


            if ( ($dec == 32) && ($i == ($linlen - 1)) ) { // convert space at eol only


                $c = "=20"; 


            } elseif ( ($dec == 61) || ($dec < 32 ) || ($dec > 126) ) { // always encode "\t", which is *not* required


                $h2 = floor($dec/16); $h1 = floor($dec%16); 


                $c = $escape.$hex["$h2"].$hex["$h1"]; 


            }


            if ( (strlen($newline) + strlen($c)) >= $line_max ) { // CRLF is not counted


                $output .= $newline.$escape.$eol; // soft line break; " =\r\n" is okay


                $newline = "";


            }


            $newline .= $c;


        } // end of for


        $output .= $newline.$eol;


    }


    return trim($output);





}





$eight_bit = "\xA7 \xC4 \xD6 \xDC \xE4 \xF6 \xFC \xDF         =          xxx             yyy       zzz          \r\n"


            ." \xA7      \r \xC4 \n \xD6 \x09    "; 


print $eight_bit."\r\n---------------\r\n";


$encoded = qp_enc($eight_bit); 


print $encoded;

madmax at express dot ru
4.08.2000 2:41


Some browsers (Netscape, for example) send 8-bit quoted-printable text like this:

=C5=DD=A3=D2=C1= =DA

"= =" means the word continues.

The PHP function does not detect this situation and produces a string like:

 abcde=f

gustavf at spamstop dot com
7.07.2000 19:25


This function does not recognize lowercase letters as part of the Quoted Printable encoding. RFC 1521 specifies:





Uppercase letters must be used when sending hexadecimal data, though a robust implementation may choose to recognize lowercase letters on receipt.
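
If input with lowercase escapes has to be passed to a decoder that only accepts uppercase hex digits, one option is to normalize the escapes first. This is only a sketch; the helper name qp_uppercase_escapes_sketch is made up for this example, and whether a given decoder needs it is something to verify against the PHP version in use.

```php
<?php
// Hypothetical helper: rewrite every "=xx" escape with uppercase hex digits.
function qp_uppercase_escapes_sketch(string $qp): string
{
    return preg_replace_callback(
        '/=[0-9A-Fa-f]{2}/',
        function ($m) { return strtoupper($m[0]); },
        $qp
    );
}

$normalized = qp_uppercase_escapes_sketch('caf=c3=a9');
```

Only the two hex digits after each "=" are touched, so literal text such as "caf" keeps its case.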

A service of Reinhard Neidl - Webprogrammierung.
