PHP documentation: strip_tags - Strips HTML and PHP tags from a string


A service by Reinhard Neidl - Webprogrammierung.


strip_tags

(PHP 4, PHP 5)

strip_tags - Strips HTML and PHP tags from a string

Description

string strip_tags ( string $str [, string $allowable_tags ] )

This function tries to return a string that is a copy of str with all HTML and PHP tags removed. It uses the same tag-stripping engine as fgetss().

Parameters

str

The input string.

allowable_tags

You can use the optional second parameter to specify tags that should not be stripped.

Note:

HTML comments and PHP tags are also stripped. This behavior is hard-coded and cannot be changed with allowable_tags.

Return values

Returns the stripped string.

Changelog

Version Description
5.0.0 strip_tags() is now binary safe.
4.3.0 HTML comments are now always stripped.

Examples

Example #1 strip_tags() example

<?php
$text = '<p>Test paragraph.</p><!-- Comment --> <a href="#fragment">Other text</a>';
echo strip_tags($text);
echo "\n";

// Allow <p> and <a>
echo strip_tags($text, '<p><a>');
?>

The above example will output:

Test paragraph. Other text
<p>Test paragraph.</p> <a href="#fragment">Other text</a>

Notes

Warning

Because strip_tags() does not actually validate the HTML, partial or broken tags can result in the removal of more text/data than expected.

Warning

This function does not modify any attributes on the tags that you allow using allowable_tags, including the style and onmouseover attributes that a malicious user may abuse when posting text that will be shown to other users.
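
The attribute warning can be illustrated with a minimal sketch. The input string is a made-up example, not taken from the manual: allowing <a> keeps every attribute on that tag, event handlers included, and the text between stripped <script> tags stays in the result.

```php
<?php
// Hypothetical user input: one allowed tag carrying an event-handler attribute.
$input = '<a href="#" onmouseover="alert(1)">hover me</a><script>evil()</script>';

// <a> is allowed, so the tag survives with all of its attributes.
// The <script> tags themselves are removed, but the text between them is kept.
$result = strip_tags($input, '<a>');

echo $result, "\n";
```

If allowed tags end up in a page, their attributes need separate filtering; strip_tags() alone does not provide it.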

48 user-contributed notes:
frank at silverwolf media ddoott comm
19.11.2010 11:30
Note that strip_tags may stumble when it encounters two consecutive quotes. Regardless of whether that's a bug or a feature (different PHP versions seem to behave differently) here's a workaround:

<?php
$wtf = '
    <p>First line</p>
    <a href=\"foo">bar</a>
    <p>Second line</p>
    <a href=\"foo\"">bar</a>
    <p>Third line</p>
  ';
echo 'Raw: ' . $wtf . "\n";
echo 'strip_tags(): ' . strip_tags($wtf);
echo 'Regexp: ' . preg_replace('/<[^>]*>/', '', $wtf);
?>

Raw output:

  <p>First line</p>
  <a href=\"foo">bar</a>
  <p>Second line</p>
  <a href=\"foo\"">bar</a>
  <p>Third line</p>

strip_tags() output:

  First line
  bar
  Third line

preg_replace() output:

  First line
  bar
  Second line
  bar
  Third line
gagomat at gmail dot com
22.09.2010 17:06
Here's the improved strip_only function (originally submitted by LWC / Steve).
This one can distinguish between <bla> and <blas>.

<?php
function strip_only_tags($str, $tags, $stripContent = false) {
    $content = '';
    if (!is_array($tags)) {
        // Fixed: look for '>' in $tags (not in $str) to detect a '<a><b>' style list
        $tags = (strpos($tags, '>') !== false ? explode('>', str_replace('<', '', $tags)) : array($tags));
        if (end($tags) == '') array_pop($tags);
    }
    foreach ($tags as $tag) {
        if ($stripContent)
            $content = '(.+</'.$tag.'(>|\s[^>]*>)|)';
        $str = preg_replace('#</?'.$tag.'(>|\s[^>]*>)'.$content.'#is', '', $str);
    }
    return $str;
}
?>
tom at cowin dot us
28.08.2010 4:04
With most web based user input of more than a line of text, it seems I get 90% 'paste from Word'. I've developed this fn over time to try to strip all of this cruft out. A few things I do here are application specific, but if it helps you - great, if you can improve on it or have a better way - please - post it...

<?php
function strip_word_html($text, $allowed_tags = '<b><i><sup><sub><em><strong><u><br>')
{
    mb_regex_encoding('UTF-8');
    // replace MS special characters first
    $search = array('/&lsquo;/u', '/&rsquo;/u', '/&ldquo;/u', '/&rdquo;/u', '/&mdash;/u');
    $replace = array('\'', '\'', '"', '"', '-');
    $text = preg_replace($search, $replace, $text);
    // make sure _all_ html entities are converted to the plain ascii equivalents - it appears
    // in some MS headers, some html entities are encoded and some aren't
    $text = html_entity_decode($text, ENT_QUOTES, 'UTF-8');
    // try to strip out any C style comments first, since these, embedded in html comments, seem to
    // prevent strip_tags from removing html comments (MS Word introduced combination)
    if (mb_stripos($text, '/*') !== FALSE) {
        // fixed: mb_ereg patterns take no delimiters; the 'm' option lets '.' match newlines
        $text = mb_eregi_replace('/\*.*?\*/', '', $text, 'm');
    }
    // introduce a space into any arithmetic expressions that could be caught by strip_tags,
    // so that they won't be stripped: '<1' becomes '< 1' (note: somewhat application specific)
    $text = preg_replace(array('/<([0-9]+)/'), array('< $1'), $text);
    $text = strip_tags($text, $allowed_tags);
    // eliminate extraneous whitespace from start and end of line, or anywhere there are two or more spaces, convert it to one
    $text = preg_replace(array('/^\s\s+/', '/\s\s+$/', '/\s\s+/u'), array('', '', ' '), $text);
    // strip out inline css and simplify style tags
    $search = array('#<(strong|b)[^>]*>(.*?)</(strong|b)>#isu', '#<(em|i)[^>]*>(.*?)</(em|i)>#isu', '#<u[^>]*>(.*?)</u>#isu');
    $replace = array('<b>$2</b>', '<i>$2</i>', '<u>$1</u>');
    $text = preg_replace($search, $replace, $text);
    // on some of the newer MS Word exports, where you get conditionals of the form 'if gte mso 9', etc., it appears
    // that whatever is in one of the html comments prevents strip_tags from eradicating the html comment that contains
    // some MS Style Definitions - this last bit gets rid of any leftover comments
    $num_matches = preg_match_all("/\<!--/u", $text, $matches);
    if ($num_matches) {
        $text = preg_replace('/\<!--(.)*--\>/isu', '', $text);
    }
    return $text;
}
?>
cyex at hotmail dot com
8.07.2010 12:40
I thought someone else might find this useful... a simple way to strip BBCode:

<?php

$bbcode_str = "Here is some [b]bold text[/b] and some [color=#FF0000]red text[/color]!";

$plain_text = strip_tags(str_replace(array('[',']'), array('<','>'), $bbcode_str));

// $plain_text is now: Here is some bold text and some red text!

?>
php at wizap dot com
17.04.2010 1:58
This could be overkill but this strips all HTML tags and gives you the option to preserve the ones you define. It also takes into account tags like <script> removing all the javascript, too! You can also strip out all the content between any tag that has an opening and closing tag, like <table>, <object>, etc.

Have fun. Let me know what you think. http://zt.a.atr.im

<?php
function remove_HTML($s, $keep = '', $expand = 'script|style|noframes|select|option') {
    // prep the string
    $s = ' ' . $s;

    // initialize keep tag logic
    $k = array(); // fixed: defined up front so the final loop also works when $keep is empty
    if (strlen($keep) > 0) {
        $k = explode('|', $keep);
        for ($i = 0; $i < count($k); $i++) {
            $s = str_replace('<' . $k[$i], '[{(' . $k[$i], $s);
            $s = str_replace('</' . $k[$i], '[{(/' . $k[$i], $s);
        }
    }

    // begin removal
    // remove comment blocks
    while (stripos($s, '<!--') > 0) {
        $pos[1] = stripos($s, '<!--');
        $pos[2] = stripos($s, '-->', $pos[1]);
        $len[1] = $pos[2] - $pos[1] + 3;
        $x = substr($s, $pos[1], $len[1]);
        $s = str_replace($x, '', $s);
    }

    // remove tags with content between them
    if (strlen($expand) > 0) {
        $e = explode('|', $expand);
        for ($i = 0; $i < count($e); $i++) {
            while (stripos($s, '<' . $e[$i]) > 0) {
                $len[1] = strlen('<' . $e[$i]);
                $pos[1] = stripos($s, '<' . $e[$i]);
                $pos[2] = stripos($s, $e[$i] . '>', $pos[1] + $len[1]);
                $len[2] = $pos[2] - $pos[1] + $len[1];
                $x = substr($s, $pos[1], $len[2]);
                $s = str_replace($x, '', $s);
            }
        }
    }

    // remove remaining tags
    while (stripos($s, '<') > 0) {
        $pos[1] = stripos($s, '<');
        $pos[2] = stripos($s, '>', $pos[1]);
        $len[1] = $pos[2] - $pos[1] + 1;
        $x = substr($s, $pos[1], $len[1]);
        $s = str_replace($x, '', $s);
    }

    // finalize keep tag
    for ($i = 0; $i < count($k); $i++) {
        $s = str_replace('[{(' . $k[$i], '<' . $k[$i], $s);
        $s = str_replace('[{(/' . $k[$i], '</' . $k[$i], $s);
    }

    return trim($s);
}
?>
LWC
1.03.2010 16:07
Here is support for stripping content for the reverse strip_tags function:

<?php
function strip_only($str, $tags, $stripContent = false) {
    $content = '';
    if (!is_array($tags)) {
        // Fixed: look for '>' in $tags (not in $str) to detect a '<a><b>' style list
        $tags = (strpos($tags, '>') !== false ? explode('>', str_replace('<', '', $tags)) : array($tags));
        if (end($tags) == '') array_pop($tags);
    }
    foreach ($tags as $tag) {
        if ($stripContent)
            $content = '(.+</'.$tag.'[^>]*>|)';
        $str = preg_replace('#</?'.$tag.'[^>]*>'.$content.'#is', '', $str);
    }
    return $str;
}

$str = '<font color="red">red</font> text';
$tags = 'font';
$a = strip_only($str, $tags); // red text
$b = strip_only($str, $tags, true); // text
?>

Note that this function assumes no two tag names start the same way (e.g. <bla> and <blas>) and therefore strips blas along with bla.
Steve
16.09.2009 19:03
Here is a function like strip_tags, only it removes only the tags (with attributes) specified:

<?php
function strip_only($str, $tags) {
    if (!is_array($tags)) {
        // Fixed: look for '>' in $tags (not in $str) to detect a '<a><b>' style list
        $tags = (strpos($tags, '>') !== false ? explode('>', str_replace('<', '', $tags)) : array($tags));
        if (end($tags) == '') array_pop($tags);
    }
    foreach ($tags as $tag) $str = preg_replace('#</?'.$tag.'[^>]*>#is', '', $str);
    return $str;
}

$str = '<p style="text-align:center">Paragraph</p><strong>Bold</strong><br/><span style="color:red">Red</span><h1>Header</h1>';

echo strip_only($str, array('p', 'h1'));
echo strip_only($str, '<p><h1>');
?>

Both return:
Paragraph<strong>Bold</strong><br/><span style="color:red">Red</span>Header

Hope this helps somebody else
dan at micamedia dot com
9.09.2009 5:11
Rewrote the strip_selected_tags function below to work for XHTML self-closing tags.

<?php
function strip_selected_tags($str, $tags = "", $stripContent = false)
{
    preg_match_all("/<([^>]+)>/i", $tags, $allTags, PREG_PATTERN_ORDER);
    foreach ($allTags[1] as $tag) {
        $replace = "%(<$tag.*?>)(.*?)(<\/$tag.*?>)%is";
        $replace2 = "%(<$tag.*?>)%is";
        // (a leftover debugging "echo $replace;" was removed here)
        if ($stripContent) {
            $str = preg_replace($replace, '', $str);
            $str = preg_replace($replace2, '', $str);
        }
        $str = preg_replace($replace, '${2}', $str);
        $str = preg_replace($replace2, '${2}', $str);
    }
    return $str;
}
?>
nauthiz693 at gmail dot com
12.06.2009 22:31
Wanted a function to do what nick's was supposed to do: "strip tags and attributes, but with allowable attributes." but I couldn't get his to work properly, I think it had something to do with greedy / non greedy searching.  Anyway, I modified his a bit:

<?php
function strip_tags_attributes($string, $allowtags = NULL, $allowattributes = NULL) {
    $string = strip_tags($string, $allowtags);
    if (!is_null($allowattributes)) {
        if (!is_array($allowattributes))
            $allowattributes = explode(",", $allowattributes);
        if (is_array($allowattributes))
            $allowattributes = implode(")(?<!", $allowattributes);
        if (strlen($allowattributes) > 0)
            $allowattributes = "(?<!" . $allowattributes . ")";
        // A closure replaces create_function(), which was removed in PHP 8.0
        $pattern = '/ [^ =]*' . $allowattributes . '=("[^"]*"|\'[^\']*\')/i';
        $string = preg_replace_callback("/<[^>]*>/i", function ($matches) use ($pattern) {
            return preg_replace($pattern, "", $matches[0]);
        }, $string);
    }
    return $string;
}
?>

[EDIT BY danbrown AT php DOT net: Original function by (nick AT optixsolutions DOT co DOT uk) on 31-MAR-09 with the following note:]

Function to strip tags and attributes, but with allowable attributes.

Usage:

Allowable attributes can be a comma-separated string or an array.

Example:

<?php strip_tags_attributes($string,'<strong><em><a>','href,rel'); ?>
roly426 at gmail dot com
9.06.2009 12:41
To check for broken tags an easier approach would be to use xml_parse(). You can also get a descriptive error with xml_get_error_code().
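
A minimal sketch of that idea, assuming the XML extension is loaded. The helper name xml_fragment_error() and the <root> wrapper are made up for illustration. Keep in mind that HTML is not XML, so valid HTML such as a bare <br> or an &nbsp; entity would also be reported as an error.

```php
<?php
// Returns '' when the fragment parses as XML, otherwise a description of the first error.
// xml_fragment_error() is a hypothetical helper name, not part of PHP.
function xml_fragment_error($fragment)
{
    $parser = xml_parser_create();
    // xml_parse() expects a single root element, so wrap the fragment (wrapper name is arbitrary).
    $ok = xml_parse($parser, '<root>' . $fragment . '</root>', true);
    return $ok ? '' : (string) xml_error_string(xml_get_error_code($parser));
}

var_dump(xml_fragment_error('<b>This is a string</b>')); // well-formed
var_dump(xml_fragment_error('<b>This is a string</u>')); // mismatched closing tag
```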
magus at otserv dot com dot br
7.06.2009 9:41
Here's a function that checks whether a string contains broken HTML tags, returning true if it does. Useful for preventing broken tags from affecting the page.

<?php
# Example usage: broken_tags("<b>This is a string</u>") returns TRUE, broken_tags("<b>This is a string</b>") returns FALSE.
function broken_tags($str)
{
    preg_match_all("/(<\w+)(?:.){0,}?>/", $str, $v1);
    preg_match_all("/<\/\w+>/", $str, $v2);
    $open = array_map('strtolower', $v1[1]);
    $closed = array_map('strtolower', $v2[0]);
    foreach ($open as $tag)
    {
        $end_tag = preg_replace("/<(.*)/", "</$1>", $tag);
        if (!in_array($end_tag, $closed)) return true;
        unset($closed[array_search($end_tag, $closed)]);
    }
    return false;
}
?>
brettz9 AAT yah
5.04.2009 17:10
Works on shortened <?...?> syntax and thus also will remove XML processing instructions.
hongong at webafrica dot org dot za
26.03.2009 21:52
An easy way to clean a string of all CDATA encapsulation.

<?php
function strip_cdata($string)
{
    preg_match_all('/<!\[cdata\[(.*?)\]\]>/is', $string, $matches);
    return str_replace($matches[0], $matches[1], $string);
}
?>

Example: echo strip_cdata('<![CDATA[Text]]>');
Returns: Text
nathan@8
8.03.2009 19:55
Improperly formatted JavaScript being added to tags, and the limit of 15 instances of recursion before memory allocation runs out, are some of the concerns involved in coding this. Here is the code I created to leave tags intact but strip scripting from inside the tags only...

<?php
function strip_javascript($filter) {
    // realign javascript href to onclick
    $filter = preg_replace("/href=(['\"]).*?javascript:(.*)?\\1/i", "onclick=' $2 '", $filter);

    // remove javascript from tags
    while (preg_match("/<(.*)?javascript.*?\(.*?((?>[^()]+)|(?R)).*?\)?\)(.*)?>/i", $filter))
        $filter = preg_replace("/<(.*)?javascript.*?\(.*?((?>[^()]+)|(?R)).*?\)?\)(.*)?>/i", "<$1$3$4$5>", $filter);

    // dump expressions from contributed content
    if (0) $filter = preg_replace("/:expression\(.*?((?>[^(.*?)]+)|(?R)).*?\)\)/i", "", $filter);

    while (preg_match("/<(.*)?:expr.*?\(.*?((?>[^()]+)|(?R)).*?\)?\)(.*)?>/i", $filter))
        $filter = preg_replace("/<(.*)?:expr.*?\(.*?((?>[^()]+)|(?R)).*?\)?\)(.*)?>/i", "<$1$3$4$5>", $filter);

    // remove all on* events
    while (preg_match("/<(.*)?\s?on.+?=?\s?.+?(['\"]).*?\\2\s?(.*)?>/i", $filter))
        $filter = preg_replace("/<(.*)?\s?on.+?=?\s?.+?(['\"]).*?\\2\s?(.*)?>/i", "<$1$3>", $filter);

    return $filter;
}
?>

As you can see, this does not clean up correctly... it does, however, remove dangerous stuff...

<a href=javascript: { {({}{}()())}alert('xss') ) ) }>

<div onload..;,;..'alert(\"xss_attack\");'>

<a href='javascript:{ alert(\"xss_attack\"); otherxss();}'
 onclick= 'alert(\"xss_attack\");' onhover='alert
(\"xss_attack\");' onmouseout=alert(\"xss_attack\")
class='thisclass'> link</a>

style='width:expression(alert(\"xss_attack\"));'

This can be completed before using strip_tags().
kai at froghh dot de
6.03.2009 17:45
A function that decides whether < is the start of a tag or a less-than / less-than-or-equal sign:

<?php
function lt_replace($str) {
    return preg_replace("/<([^[:alpha:]])/", '&lt;\\1', $str);
}
?>

It is meant to be used before strip_tags().
mehul dot g12 at gmail dot com
3.03.2009 12:22
strip_tags() doesn't work well if the string contains a '<' symbol immediately followed by another character, but it does work if there is a space after the '<' symbol, e.g.:

<?php
strip_tags('<p>1<4</p>');    // won't work correctly
strip_tags('<p>1 < 4</p>');  // will work correctly
?>

To solve this problem, I used some simple logic. This code replaces '<' with its HTML entity if it is not part of an HTML tag.

My version of strip_tags is as follows:

<?php
function my_strip_tags($str) {
    $strs = explode('<', $str);
    $res = $strs[0];
    for ($i = 1; $i < count($strs); $i++)
    {
        if (!strpos($strs[$i], '>'))
            $res = $res.'&lt;'.$strs[$i];
        else
            $res = $res.'<'.$strs[$i];
    }
    return strip_tags($res);
}
?>
CEO at CarPool2Camp dot org
17.02.2009 20:10
For some reason, this note got removed, perhaps because a moderator thought it was a bug report. I hope awareness of this "Interesting Behavior" can save someone from an unpleasant surprise. Note the different outputs from different versions of the same tag:

<?php // striptags.php
$data = '<br>Each<br/>New<br />Line';
$new  = strip_tags($data, '<br>');
var_dump($new);  // OUTPUTS string(21) "<br>EachNew<br />Line"

<?php // striptags.php
$data = '<br>Each<br/>New<br />Line';
$new  = strip_tags($data, '<br/>');
var_dump($new); // OUTPUTS string(16) "Each<br/>NewLine"

<?php // striptags.php
$data = '<br>Each<br/>New<br />Line';
$new  = strip_tags($data, '<br />');
var_dump($new); // OUTPUTS string(11) "EachNewLine"
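
For what it is worth, the PHP manual notes that as of PHP 5.3.4 self-closing XHTML tags in allowable_tags are ignored, so only the plain form should be listed. A minimal sketch, assuming PHP 5.3.4 or later:

```php
<?php
$data = '<br>Each<br/>New<br />Line';

// Listing only the plain <br> form is meant to keep all three spellings of the tag.
$new = strip_tags($data, '<br>');

var_dump($new);
```

On older versions the outputs quoted in the note above may still apply.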
Leendert W
26.01.2009 23:51
Maybe this is also a useful function for someone.

<?php
function removeUnsafeAttributesAndGivenTags($input, $validTags = '')
{
    $regex = '#\s*<(/?\w+)\s+(?:on\w+\s*=\s*(["\'\s])?.+?\(\1?.+?\1?\);?\1?|style=["\'].+?["\'])\s*>#is';
    return preg_replace($regex, '<${1}>', strip_tags($input, $validTags));
}
?>
phzzyzhou at gmail dot com
17.01.2009 11:01
strip_tags will strip '<' and everything after it, like this:

<?php
$str = <<<EOF
123 < 456
<a>link</a>
bbb
EOF;

echo strip_tags($str);
?>

will output:
123

---------------------------------
This function will repair this:

<?php
function will_strip_tags($str) {
    do {
        $count = 0;
        $str = preg_replace('/(<)([^>]*?<)/', '&lt;$2', $str, -1, $count);
    } while ($count > 0);
    $str = strip_tags($str);
    $str = str_replace('>', '&gt;', $str);
    return $str;
}

echo will_strip_tags($str);
?>

will output:
123 &lt; 456
link
bbb
tleblan at pricegrabber dot com
14.01.2009 4:03
I think it is worth mentioning that if some tags are allowed using the second parameter, this function does not allow to strip attributes within the allowed tags and hence should not be used against XSS vulnerabilities.

One can still execute javascript by 2 means:
- by inserting attributes that typically accept javascript
  >> onClick="alert('XSS');"
- by using styles
  >> style="width:expression(alert('XSS'));" (works on IE7 and probably other versions)
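
When no markup needs to survive at all, escaping on output sidesteps the attribute problem entirely. A minimal sketch, assuming the hypothetical variable $comment holds untrusted input and the page is served as UTF-8:

```php
<?php
// Hypothetical untrusted input containing an event-handler attribute.
$comment = '<b onClick="alert(\'XSS\');">hello</b>';

// Escape instead of strip: <, >, & and both kinds of quotes become entities,
// so the browser displays the markup as text rather than interpreting it.
$safe = htmlspecialchars($comment, ENT_QUOTES, 'UTF-8');

echo $safe, "\n";
```

This trades formatting for safety: users see their tags literally instead of having them rendered.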
mariusz.tarnaski at wp dot pl
12.11.2008 17:05
Hi. I made a function that removes the HTML tags along with their contents:

Function:
<?php
function strip_tags_content($text, $tags = '', $invert = FALSE) {

    preg_match_all('/<(.+?)[\s]*\/?[\s]*>/si', trim($tags), $tags);
    $tags = array_unique($tags[1]);

    if (is_array($tags) AND count($tags) > 0) {
        if ($invert == FALSE) {
            return preg_replace('@<(?!(?:'. implode('|', $tags) .')\b)(\w+)\b.*?>.*?</\1>@si', '', $text);
        }
        else {
            return preg_replace('@<('. implode('|', $tags) .')\b.*?>.*?</\1>@si', '', $text);
        }
    }
    elseif ($invert == FALSE) {
        return preg_replace('@<(\w+)\b.*?>.*?</\1>@si', '', $text);
    }
    return $text;
}
?>

Sample text:
$text = '<b>sample</b> text with <div>tags</div>';

Result for strip_tags($text):
sample text with tags

Result for strip_tags_content($text):
 text with

Result for strip_tags_content($text, '<b>'):
<b>sample</b> text with

Result for strip_tags_content($text, '<b>', TRUE);
 text with <div>tags</div>

I hope someone finds this useful :) A detailed explanation for Polish PHP programmers is at http://www.tarnaski.eu/blog/rozszerzone-strip_tags/
lucky760 at VideoSift dot com
20.10.2008 19:21
It's come to my attention that PHP's strip_tags has been doing something funky to some video embed codes that our members submit. I'm not sure the exact situation, but whenever there is a <param> tag that is very long, strip_tags() will completely remove the tag even though it's specified as an allowable tag.

Here's an example of the existing problem:
<?php
// a single very long <param> tag
$html =<<<EOF
<param name="flashVars" value="skin=http%3A//cdn-i.dmdentertainm
...[snip]...
vie%20of%20All-Time"/>
EOF;

echo strip_tags($html, '<param>');
// this outputs an empty string
?>

This is the function I built to fix and extend the functionality of strip_tags(). The args are:
- $i_html - the HTML string to be parsed
- $i_allowedtags - an array of allowed tag names
- $i_trimtext - whether or not to strip all text outside of the allowed tags

<?php

function real_strip_tags($i_html, $i_allowedtags = array(), $i_trimtext = FALSE) {
    if (!is_array($i_allowedtags))
        $i_allowedtags = !empty($i_allowedtags) ? array($i_allowedtags) : array();
    $tags = implode('|', $i_allowedtags);

    if (empty($tags))
        $tags = '[a-z]+';

    preg_match_all('@</?\s*(' . $tags . ')(\s+[a-z_]+=(\'[^\']+\'|"[^"]+"))*\s*/?>@i', $i_html, $matches);

    $full_tags = $matches[0];
    $tag_names = $matches[1];

    foreach ($full_tags as $i => $full_tag) {
        if (!in_array($tag_names[$i], $i_allowedtags)) {
            if ($i_trimtext)
                unset($full_tags[$i]);
            else
                $i_html = str_replace($full_tag, '', $i_html);
        }
    }

    return $i_trimtext ? implode('', $full_tags) : $i_html;
}
?>

And here's an example with a block of full video embed code with <object><embed><param> and some extraneous HTML:

<?php
$html =<<<EOF
<em><div><object type="application/x-shock
...[snip]...
me.html">Wal-Mart Makes The Worst Movie of All-Time</a> -- powered by whatever</div></em>
EOF;

$good_html = real_strip_tags($html, array('object', 'embed', 'param'), TRUE);

?>

Now $good_html contains only the specified tags and none of the "powered by" type text. I hope someone finds this as useful as I needed it to be. :)
southsentry at yahoo dot com
25.09.2008 18:15
I was looking for a simple way to ban html from review posts, and the like. I have seen a few classes to do it. This line, while it doesn't strip the post, effectively blocks people from posting html in review and other forms.

<?php
if (strlen(strip_tags($review)) < strlen($review)) {
    return false;
}
?>

If you want to further get by the tricksters that use & for html links, include this:

<?php
if (strlen(strip_tags($review)) < strlen($review)) {
    return false;
} elseif (strpos($review, "&") !== false) {
    return 5;
}
?>

I hope this helps someone out!
Liam Morland
24.08.2008 2:58
Here is a suggestion for getting rid of attributes: After you run your HTML through strip_tags(), use the DOM interface to parse the HTML. Recursively walk through the DOM tree and remove any unwanted attributes. Serialize the DOM back to the HTML string.

Don't make the default permit mistake: Make a list of the attributes you want to ALLOW and remove any others, rather than removing a specific list, which may be missing something important.
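
The approach described above could be sketched roughly as follows, assuming the DOM extension is available. The function name keep_allowed_attributes() and the <div> wrapper are made up for illustration, and attribute values are not inspected, so something like href="javascript:..." would still need its own check.

```php
<?php
// Removes every attribute that is not on the allow list.
// keep_allowed_attributes() is a hypothetical helper; $allowed holds lower-case attribute names.
function keep_allowed_attributes($html, array $allowed)
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // collect parser warnings instead of emitting them
    // Wrap the fragment so that it can be found again after parsing.
    $doc->loadHTML('<html><body><div>' . $html . '</div></body></html>');
    libxml_clear_errors();

    $wrapper = $doc->getElementsByTagName('div')->item(0);
    foreach ($wrapper->getElementsByTagName('*') as $element) {
        // Collect the names first; removing while iterating the live attribute map can skip entries.
        $names = array();
        foreach ($element->attributes as $attribute) {
            $names[] = $attribute->nodeName;
        }
        foreach ($names as $name) {
            if (!in_array(strtolower($name), $allowed, true)) {
                $element->removeAttribute($name);
            }
        }
    }

    // Serialize the children of the wrapper back to an HTML string.
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

$clean = keep_allowed_attributes(
    strip_tags('<a href="/x" onclick="evil()">link</a><script>x()</script>', '<a>'),
    array('href')
);
echo $clean, "\n";
```

Working from an allow list means a newly invented event attribute is dropped by default rather than slipping through.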
Kalle Sommer Nielsen
31.03.2008 0:05
This adds a lot of missing JavaScript events to the strip_tags_attributes() function from the entries below.

Props to MSDN for lots of them ;)

<?php
function strip_tags_attributes($sSource, $aAllowedTags = array(), $aDisabledAttributes = array(
    'onabort', 'onactivate', 'onafterprint', 'onafterupdate', 'onbeforeactivate', 'onbeforecopy', 'onbeforecut',
    'onbeforedeactivate', 'onbeforeeditfocus', 'onbeforepaste', 'onbeforeprint', 'onbeforeunload', 'onbeforeupdate',
    'onblur', 'onbounce', 'oncellchange', 'onchange', 'onclick', 'oncontextmenu', 'oncontrolselect', 'oncopy', 'oncut',
    'ondataavailable', 'ondatasetchanged', 'ondatasetcomplete', 'ondblclick', 'ondeactivate', 'ondrag', 'ondragdrop',
    'ondragend', 'ondragenter', 'ondragleave', 'ondragover', 'ondragstart', 'ondrop', 'onerror', 'onerrorupdate',
    'onfilterupdate', 'onfinish', 'onfocus', 'onfocusin', 'onfocusout', 'onhelp', 'onkeydown', 'onkeypress', 'onkeyup',
    'onlayoutcomplete', 'onload', 'onlosecapture', 'onmousedown', 'onmouseenter', 'onmouseleave', 'onmousemove',
    'onmouseout', 'onmouseover', 'onmouseup', 'onmousewheel', 'onmove', 'onmoveend', 'onmovestart', 'onpaste',
    'onpropertychange', 'onreadystatechange', 'onreset', 'onresize', 'onresizeend', 'onresizestart', 'onrowexit',
    'onrowsdelete', 'onrowsinserted', 'onscroll', 'onselect', 'onselectionchange', 'onselectstart', 'onstart', 'onstop',
    'onsubmit', 'onunload'))
{
    if (empty($aDisabledAttributes)) return strip_tags($sSource, implode('', $aAllowedTags));

    // preg_replace_callback() replaces the /e modifier, which was removed in PHP 7
    $aPatterns = array(
        '/javascript:[^"\']*/i',
        '/(' . implode('|', $aDisabledAttributes) . ')[ \t\n]*=[ \t\n]*["\'][^"\']*["\']/i',
        '/\s+/',
    );
    return preg_replace_callback('/<(.*?)>/i', function ($aMatch) use ($aPatterns) {
        return '<' . preg_replace($aPatterns, array('', '', ' '), $aMatch[1]) . '>';
    }, strip_tags($sSource, implode('', $aAllowedTags)));
}
?>
bstrick at gmail dot com
15.01.2008 18:52
This will strip all PHP and HTML out of a file.  Leaves only plain txt.

<?php
// Open the search file
$file = fopen($filename, 'r');
               
// Get rid of all PHP code.       
$search = array('/<\?((?!\?>).)*\?>/s');
       
$text = fread($file, filesize($filename));

$new = strip_tags(preg_replace($search, '', $text));

echo $new;

fclose($file);
?>

- Strick
y5
15.01.2008 17:59
An improved version of tREXX and Tony Freeman's code, this keeps the code clean while removing unwanted attributes, including the javascript: protocol. Unlike the built-in strip_tags() function, this takes an array for allowed tags, rather than a string. For example: array('<a>', '<object>');

I don't understand why the built-in function uses a string.. oh well =)

<?php
function strip_tags_attributes($sSource, $aAllowedTags = array(), $aDisabledAttributes = array('onclick', 'ondblclick', 'onkeydown', 'onkeypress', 'onkeyup', 'onload', 'onmousedown', 'onmousemove', 'onmouseout', 'onmouseover', 'onmouseup', 'onunload'))
{
    // fixed: the original tested the undefined $aDisabledEvents, so it always returned here
    if (empty($aDisabledAttributes)) return strip_tags($sSource, implode('', $aAllowedTags));

    // preg_replace_callback() replaces the /e modifier, which was removed in PHP 7
    $aPatterns = array(
        '/javascript:[^"\']*/i',
        '/(' . implode('|', $aDisabledAttributes) . ')=["\'][^"\']*["\']/i',
        '/\s+/',
    );
    return preg_replace_callback('/<(.*?)>/i', function ($aMatch) use ($aPatterns) {
        return '<' . preg_replace($aPatterns, array('', '', ' '), $aMatch[1]) . '>';
    }, strip_tags($sSource, implode('', $aAllowedTags)));
}
?>
Matthieu Larcher
27.06.2007 17:44
I noticed some problems with the strip_selected_tags() function below: sometimes big chunks of content were suppressed...
Here is a modified version that should run better.

<?php
function strip_selected_tags($text, $tags = array())
{
    $args = func_get_args();
    $text = array_shift($args);
    $tags = func_num_args() > 2 ? array_diff($args, array($text)) : (array)$tags;
    foreach ($tags as $tag) {
        while (preg_match('/<'.$tag.'(|\W[^>]*)>(.*)<\/'. $tag .'>/iusU', $text, $found)) {
            $text = str_replace($found[0], $found[2], $text);
        }
    }

    return preg_replace('/(<('.join('|', $tags).')(|\W.*)\/>)/iusU', '', $text);
}

?>
birwin at suddensales dot com
23.06.2007 9:18
This is an upgrade to the illegal characters script by rodt [on 16-JAN-07]. This script will handle the input even if one or all of the fields include arrays. Of course another loop could be added to handle compound arrays within arrays, but if you are savvy enough to be using compound arrays, you don't need me to rewrite the program.

<?php
function screenForm($ary_check_for_html)
{
    // check array - reject if any content contains HTML.
    foreach ($ary_check_for_html as $field_value)
    {
        if (is_array($field_value))
        {
            foreach ($field_value as $field_array) // if the field value is an array, step through it
            {
                $stripped = strip_tags($field_array);
                if ($field_array != $stripped)
                {
                    // something in the field value was HTML
                    return false;
                }
            }
        } else {
            $stripped = strip_tags($field_value);
            if ($field_value != $stripped)
            {
                // something in the field value was HTML
                return false;
            }
        }
    }
    return true;
}
?>
bermi ferrer
27.11.2006 10:40
Here is a faster and tested version of strip_selected_tags.

Previous example had a small bug that has been fixed now.

<?php

function strip_selected_tags($text, $tags = array())
{
    $args = func_get_args();
    $text = array_shift($args);
    $tags = func_num_args() > 2 ? array_diff($args, array($text)) : (array)$tags;
    foreach ($tags as $tag) {
        if (preg_match_all('/<'.$tag.'[^>]*>([^<]*)<\/'.$tag.'>/iu', $text, $found)) {
            $text = str_replace($found[0], $found[1], $text);
        }
    }

    return preg_replace('/(<('.join('|', $tags).')(\\n|\\r|.)*\/>)/iu', '', $text);
}

?>
David
5.11.2006 20:29
<?php

/**
 * strip_selected_tags ( string str [, string strip_tags[, strip_content flag]] )
 * ---------------------------------------------------------------------
 * Like strip_tags() but inverse; the strip_tags tags will be stripped, not kept.
 * strip_tags: string with tags to strip, ex: "<a><p><quote>" etc.
 * strip_content flag: TRUE will also strip everything between open and closed tag
 */
// The "public" modifier was dropped so that this also works outside a class;
// add it back when using the function as a method.
function strip_selected_tags($str, $tags = "", $stripContent = false)
{
    preg_match_all("/<([^>]+)>/i", $tags, $allTags, PREG_PATTERN_ORDER);
    foreach ($allTags[1] as $tag) {
        if ($stripContent) {
            $str = preg_replace("/<".$tag."[^>]*>.*<\/".$tag.">/iU", "", $str);
        }
        $str = preg_replace("/<\/?".$tag."[^>]*>/iU", "", $str);
    }
    return $str;
}

?>
jausions at php dot net
19.09.2006 8:57
To sanitize any user input, you should also consider PEAR's HTML_Safe package.

http://pear.php.net/package/HTML_Safe
admin at automapit dot com
9.08.2006 19:01
<?php
function html2txt($document) {
    $search = array(
        '@<script[^>]*?>.*?</script>@si',   // Strip out javascript
        '@<[\/\!]*?[^<>]*?>@si',            // Strip out HTML tags
        '@<style[^>]*?>.*?</style>@siU',    // Strip style tags properly
        '@<![\s\S]*?--[ \t\n\r]*>@'         // Strip multi-line comments including CDATA
    );
    $text = preg_replace($search, '', $document);
    return $text;
}
?>

This function turns HTML into text... strips tags, comments spanning multiple lines including CDATA, and anything else that gets in its way.

It's a frankenstein function I made from bits picked up on my travels through the web, thanks to the many who have unwittingly contributed!
JeremysFilms.com
7.04.2006 22:57
A simple little function for blocking tags by replacing the '<' and '>' characters with their HTML entities.  Good for simple posting systems that you don't want to have a chance of stripping non-HTML tags, or just want everything to show literally without any security issues:

<?php

function block_tags($string) {
    // fixed: entities need the trailing semicolon (&lt; and &gt;)
    $replaced_string = str_ireplace('<', '&lt;', $string);
    $replaced_string = str_ireplace('>', '&gt;', $replaced_string);
    return $replaced_string;
}

echo block_tags('<b>HEY</b>'); // Returns &lt;b&gt;HEY&lt;/b&gt;

?>
cesar at nixar dot org
7.03.2006 20:44
Here is a recursive function for strip_tags(), like the one shown on the stripslashes() manual page.

<?php
function strip_tags_deep($value)
{
    return is_array($value) ?
        array_map('strip_tags_deep', $value) :
        strip_tags($value);
}

// Example
$array = array('<b>Foo</b>', '<i>Bar</i>', array('<b>Foo</b>', '<i>Bar</i>'));
$array = strip_tags_deep($array);

// Output
print_r($array);
?>
salavert at~ akelos
13.02.2006 11:21
<?php
      
    /**
    * Works like PHP function strip_tags, but it only removes selected tags.
    * Example:
    *     strip_selected_tags('<b>Person:</b> <strong>Salavert</strong>', 'strong') => <b>Person:</b> Salavert
    */
    function strip_selected_tags($text, $tags = array())
    {
        $args = func_get_args();
        $text = array_shift($args);
        $tags = func_num_args() > 2 ? array_diff($args, array($text)) : (array)$tags;
        foreach ($tags as $tag) {
            if (preg_match_all('/<'.$tag.'[^>]*>(.*)<\/'.$tag.'>/iU', $text, $found)) {
                $text = str_replace($found[0], $found[1], $text);
            }
        }

        return $text;
    }

?>

Hope you find it useful,

Jose Salavert
webmaster at tmproductionz dot com
2.02.2006 4:28
<?php

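function remove_tag ( $tag , $data ) {

    // eregi() was removed in PHP 7; stripos() does the same case-insensitive search
    while ( ( $it = stripos ( $data , "<" . $tag ) ) !== false ) {

        // look for the closing tag after the opening one
        $end = stripos ( $data , "</" . $tag . ">" , $it ) ;

        if ( $end === false ) {
            // no closing tag: drop everything from the opening tag onwards
            return substr ( $data , 0 , $it ) ;
        }

        $it2   = $end + strlen ( $tag ) + 3 ;

        $temp  = substr ( $data , 0 , $it ) ;

        $temp2 = substr ( $data , $it2 ) ;

        $data = $temp . $temp2 ;

    }

    return $data ;

}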

?>

This code removes every occurrence of the specified tag, and only that tag, from the given haystack, together with everything between the opening and the closing tag.

10.08.2005 21:08
<?php
/**
 * Removes the specified tags from the text, including everything between
 * the opening and the closing tag. Each tag requires a closing tag; if the
 * latter is not found, everything after the opening tag is removed.
 * Typical usage:
 * removeTags($html, array('script', 'body', 'html')) - tag names all lower case
 */
public static function removeTags($text, $tags_array)
{
    $length = strlen($text);
    $pos = 0;
    $tags_array = array_flip($tags_array);
    while ($pos < $length && ($pos = strpos($text, '<', $pos)) !== false) {
        $dlm_pos = strpos($text, ' ', $pos);
        $dlm2_pos = strpos($text, '>', $pos);
        if ($dlm2_pos === false) {
            break; // no closing bracket left, so there is no complete tag anymore
        }
        if ($dlm_pos === false || $dlm_pos > $dlm2_pos) {
            $dlm_pos = $dlm2_pos;
        }
        $which_tag = strtolower(substr($text, $pos + 1, $dlm_pos - ($pos + 1)));
        $tag_length = strlen($which_tag);
        if (!isset($tags_array[$which_tag])) {
            // if no tag matches found
            ++$pos;
            continue;
        }
        // find the end
        $sec_tag = '</' . $which_tag . '>';
        $sec_pos = stripos($text, $sec_tag, $pos + $tag_length);
        // remove everything after if end of the tag not found
        if ($sec_pos === false) {
            $sec_pos = $length - strlen($sec_tag);
        }
        $rmv_length = $sec_pos - $pos + strlen($sec_tag);
        $text = substr_replace($text, '', $pos, $rmv_length);
        // update length; $pos already points at the text that followed the removed tag
        $length = $length - $rmv_length;
    }
    return $text;
}
?>
anonymous
27.05.2005 21:45
Someone can still use attributes such as inline CSS in the allowed tags.
For example, if you strip all tags except <b>, a user can still post <b style="color: red; font-size: 45pt">Hello</b>, which might be undesired.

BBCode might be an alternative.
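
To illustrate the point: strip_tags() leaves the attributes of allowed tags untouched. The sketch below shows one possible way to drop them afterwards with a regular expression. The helper name strip_tags_and_attributes() is made up for this example, and the pattern assumes that attribute values never contain a '>' character, so treat it as an illustration rather than a complete sanitizer.

```php
<?php
// Hypothetical helper: strip tags, then reduce every remaining opening tag
// to its bare name, e.g. <b style="..."> becomes <b>.
function strip_tags_and_attributes(string $html, string $allowed): string
{
    $stripped = strip_tags($html, $allowed);
    // $1 is the tag name, $2 an optional trailing slash (as in <br />)
    return preg_replace('/<([a-z][a-z0-9]*)\b[^>]*?(\/?)>/i', '<$1$2>', $stripped);
}

$input = '<b style="color: red; font-size: 45pt">Hello</b> <i>world</i>';

echo strip_tags($input, '<b>'), "\n";
// <b style="color: red; font-size: 45pt">Hello</b> world

echo strip_tags_and_attributes($input, '<b>'), "\n";
// <b>Hello</b> world
```

Closing tags such as </b> are not touched by the pattern because the character after '<' has to be a letter.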
eric at direnetworks dot com
21.12.2004 3:36
The strip_tags() function in both PHP 4.3.8 and 5.0.2 (probably many more, but these are the only two versions I tested with) has a maximum tag length of 1024. If you try to process a tag over this limit, strip_tags() will not return that line (as if it were an illegal tag). I noticed this problem while trying to parse a PayPal encrypted link button (<input type="hidden" name="encrypted" value="encryptedtext">, with <input> as an allowed tag), which is 2702 characters long. I can't really think of any workaround other than parsing each tag to figure out its length and only sending it to strip_tags() if it's under 1024 characters, but at that point I might as well be stripping the tags myself.
@dada
29.09.2004 14:41
If you only want the text within the tags, you can use this function:

<?php
function showtextintags($text)
{
    // replace whole <script>...</script> blocks with the placeholder "dada"
    $text = preg_replace("/(\<script)(.*?)(script>)/si", "dada", $text);
    $text = strip_tags($text);
    $text = str_replace("<!--", "&lt;!--", $text);
    $text = preg_replace("/(\<)(.*?)(--\>)/mi", "".nl2br("\\2")."", $text);

    return $text;
}
?>

It will show all the text without tags and (!!!) without JavaScript.
Anonymous User
22.08.2004 18:24
Be aware that tags constitute visual whitespace, so stripping may leave the resulting text looking misjoined.

For example,

"<strong>This is a bit of text</strong><p />Followed by this bit"

are visually separate paragraphs, but if simply stripped of tags will result in

"This is a bit of textFollowed by this bit"

which may not be what you want, e.g. if you are creating an excerpt for an RSS description field.

The workaround is to force whitespace prior to stripping, using something like this:

<?php
    $text = getTheText();
    $text = preg_replace('/</', ' <', $text);
    $text = preg_replace('/>/', '> ', $text);
    $desc = html_entity_decode(strip_tags($text));
    $desc = preg_replace('/[\n\r\t]/', ' ', $desc);
    // collapse every run of spaces into a single space
    $desc = preg_replace('/ {2,}/', ' ', $desc);
?>
Isaac Schlueter php at isaacschlueter dot com
17.08.2004 4:32
steven --at-- acko --dot-- net pointed out that you can't make strip_tags() allow comments. With this function, you can. Just pass <!--> as one of the allowed tags. Easy as pie: just pull them out, strip, and then put them back.

<?php
function strip_tags_c($string, $allowed_tags = '')
{
    $allow_comments = ( strpos($allowed_tags, '<!-->') !== false );
    if( $allow_comments )
    {
        $string = str_replace(array('<!--', '-->'), array('&lt;!--', '--&gt;'), $string);
        $allowed_tags = str_replace('<!-->', '', $allowed_tags);
    }
    $string = strip_tags( $string, $allowed_tags );
    if( $allow_comments ) $string = str_replace(array('&lt;!--', '--&gt;'), array('<!--', '-->'), $string);
    return $string;
}
?>
Isaac Schlueter php at isaacschlueter dot com
16.08.2004 8:16
I am creating a rendering plugin for a CMS (http://b2evolution.net) that wraps certain bits of text in acronym tags. The problem is that if you have something like this:

<a href="http://www.php.net" title="PHP is cool!">PHP</a>

then the plugin will mangle it into:

<a href="http://www.<acronym title="PHP: Hypertext Preprocessor">php</acronym>.net" title="<acronym title="PHP: Hypertext Preprocessor">PHP</acronym> is cool!">PHP</a>

This function will strip out tags that occur within other tags. Not super-useful in tons of situations, but it was an interesting puzzle. I had started out using preg_replace, but it got ridiculously complicated when there were line breaks and multiple instances in the same tag.

The CMS does its XHTML validation before the content gets to the plugin, so we can be pretty sure that the content is well-formed, except for the tags inside of other tags.

<?php
if( !function_exists( 'antiTagInTag' ) )
{
    // $content is the string to be anti-tagintagged, and $format sets the format of the internals.
    function antiTagInTag( $content = '', $format = 'htmlhead' )
    {
        if( !function_exists( 'format_to_output' ) )
        {   // Use the external function if it exists, or fall back on just strip_tags.
            function format_to_output($content, $format)
            {
                return strip_tags($content);
            }
        }
        $contentwalker = 0;
        $length = strlen( $content );
        // initialised so that str_replace() below also works when no tag is found
        $tags = array();
        $newtags = array();
        $tagend = -1;
        for( $tagstart = strpos( $content, '<', $tagend + 1 ) ; $tagstart !== false && $tagstart < strlen( $content ); $tagstart = strpos( $content, '<', $tagend ) )
        {
            // got the start of a tag.  Now find the proper end!
            $walker = $tagstart + 1;
            $open = 1;
            while( $open != 0 && $walker < strlen( $content ) )
            {
                $nextopen = strpos( $content, '<', $walker );
                $nextclose = strpos( $content, '>', $walker );
                if( $nextclose === false )
                {   // ERROR! Open waka without close waka!
                    // echo '<code>Error in antiTagInTag - malformed tag!</code> ';
                    return $content;
                }
                if( $nextopen === false || $nextopen > $nextclose )
                {   // No more opens, but there was a close; or, a close happens before the next open.
                    // walker goes to the close+1, and open decrements
                    $open --;
                    $walker = $nextclose + 1;
                }
                elseif( $nextopen < $nextclose )
                {   // an open before the next close
                    $open ++;
                    $walker = $nextopen + 1;
                }
            }
            $tagend = $walker;
            if( $tagend > strlen( $content ) )
                $tagend = strlen( $content );
            else
            {
                $tagend --;
                $tagstart ++;
            }
            $tag = substr( $content, $tagstart, $tagend - $tagstart );
            $tags[] = '<' . $tag . '>';
            $newtag = format_to_output( $tag, $format );
            $newtags[] = '<' . $newtag . '>';
        }

        $content = str_replace($tags, $newtags, $content);
        return $content;
    }
}
?>
leathargy at hotmail dot com
26.10.2003 19:15
It seems we're all overlooking a few things:
1) If we process "</ta</tableble>" by removing "</table", we're no better off, because the remaining pieces join up to form "</table>" again. Try using a char-by-char comparison and replacing stuff with *s, because then this example would become "</ta******ble>", which is not problematic. Also, with a char-by-char approach, you can skip whitespace and kill stuff like "< table>"... just make sure <&bkspTable> doesn't work...
2) No browser treats { as < (as far as I know).
3) Because of statement 2, we can do:

<?php
$remove = array("<?", "<", "?>", ">");
$change = array("{[pre]}", "{[", "{[/pre]}", "]}");
$repairSeek = array("{[pre]}", "{[/pre]}", "{[b]}", "{[/b]}", "{[br]}");
// and so forth...

$repairChange = array("<pre>", "</pre>", "<b>", "</b>", "<br>");
// and so forth...

$maltags = array("{[", "]}");
$nontags = array("{", "}");
$unclean = ...; // get variable from somewhere...
$unclean = str_replace($remove, $change, $unclean);
$unclean = str_replace($repairSeek, $repairChange, $unclean);
$clean = str_replace($maltags, $nontags, $unclean);

//// end example....
?>

4) We can further improve the above by using explode() (for our ease):

<?php
function purifyText($unclean, $fixit)
{
    $remove = explode("\n", $fixit['remove']);
    // ... and so forth for each of the above arrays...
    // or you could just pass the arrays..., or a giant string
    // put above here...
    return $clean;
} // done
?>
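
Point 1 is easy to reproduce. The sketch below first shows how a single removal pass reassembles the tag and then repeats the removal until the string stops changing, which is a different remedy from the star replacement described above. The helper name remove_until_stable() is made up for this example.

```php
<?php
// Hypothetical helper: keep removing $needle until nothing changes anymore,
// so that fragments cannot join up to form the needle again.
function remove_until_stable(string $text, string $needle): string
{
    do {
        $before = $text;
        $text = str_ireplace($needle, '', $text);
    } while ($text !== $before);

    return $text;
}

$input = '</ta</tableble>';

// A single pass leaves "</table>": the two fragments join up again
$once = str_ireplace('</table', '', $input);

// Repeating the pass removes the reassembled tag as well and leaves ">"
$stable = remove_until_stable($input, '</table');
```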
dougal at gunters dot org
10.09.2003 22:03
strip_tags() appears to become nauseated at the sight of a <!DOCTYPE> declaration (at least in PHP 4.3.1). You might want to do something like:

$html = str_replace('<!DOCTYPE','<DOCTYPE',$html);

before processing with strip_tags().
guy at datalink dot SPAMMENOT dot net dot au
15.03.2002 7:19
strip_tags() will NOT remove HTML entities such as &nbsp;
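
If the entities should become plain characters as well, html_entity_decode() can be applied after stripping. A minimal sketch, assuming UTF-8 text and PHP 7 or later (for the "\u{...}" escape); the variable names are only illustrative:

```php
<?php
$html = '<p>Caf&eacute;&nbsp;&amp; bar</p>';

// The tags are gone, the entities are still there
$text = strip_tags($html);

// Decode the entities into UTF-8 characters
$decoded = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');

// &nbsp; decodes to a non-breaking space (U+00A0); turn it into an ordinary space
$plain = str_replace("\u{00A0}", ' ', $decoded);
```

Decode only after stripping: decoding first would turn an escaped &lt;b&gt; into a real tag that strip_tags() then removes.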
chrisj at thecyberpunk dot com
18.12.2001 21:57
strip_tags() doesn't recognize that CSS within style tags is not document text. To fix this, do something similar to the following:

$htmlstring = preg_replace("'<style[^>]*>.*</style>'siU",'',$htmlstring);
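
A short sketch of the difference, using a made-up input string. The pattern here is a slight variation of the one above ('#' as delimiter, lazy quantifier instead of the U modifier):

```php
<?php
$html = "<style type=\"text/css\">\np { color: red; }\n</style><p>Hello</p>";

// Without the extra step the CSS rules end up in the text
$naive = strip_tags($html);

// Remove the whole style block first, then strip the remaining tags
$clean = strip_tags(preg_replace('#<style\b[^>]*>.*?</style>#si', '', $html));
```

Here $naive still contains "p { color: red; }", while $clean is just "Hello".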



The PHP manual text and comments are covered by the Creative Commons Attribution 3.0 License © the PHP Documentation Group