One of the reasons curl can be slow is that it appends the header "Expect: 100-continue" to POST requests. Try this fix:
<?php
$headers = array(
"Expect:",
// more headers here
);
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
?>
Fixed bugs in the function posted earlier (better JavaScript redirect following, and it now supports HTTPS):
<?php
/*==================================
Get URL content and response headers.
Given a URL, follows all redirections and returns the content
and response headers of the final URL.
@return array [0] content
              [1] array of response headers
==================================*/
function get_url( $url, $javascript_loop = 0, $timeout = 5 )
{
$url = str_replace( "&amp;", "&", urldecode(trim($url)) );
$cookie = tempnam ("/tmp", "CURLCOOKIE");
$ch = curl_init();
curl_setopt( $ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; rv:1.7.3) Gecko/20041001 Firefox/0.10.1" );
curl_setopt( $ch, CURLOPT_URL, $url );
curl_setopt( $ch, CURLOPT_COOKIEJAR, $cookie );
curl_setopt( $ch, CURLOPT_FOLLOWLOCATION, true );
curl_setopt( $ch, CURLOPT_ENCODING, "" );
curl_setopt( $ch, CURLOPT_RETURNTRANSFER, true );
curl_setopt( $ch, CURLOPT_AUTOREFERER, true );
curl_setopt( $ch, CURLOPT_SSL_VERIFYPEER, false ); # skips certificate verification so https urls work (insecure)
curl_setopt( $ch, CURLOPT_CONNECTTIMEOUT, $timeout );
curl_setopt( $ch, CURLOPT_TIMEOUT, $timeout );
curl_setopt( $ch, CURLOPT_MAXREDIRS, 10 );
$content = curl_exec( $ch );
$response = curl_getinfo( $ch );
curl_close ( $ch );
if ($response['http_code'] == 301 || $response['http_code'] == 302)
{
ini_set("user_agent", "Mozilla/5.0 (Windows; U; Windows NT 5.1; rv:1.7.3) Gecko/20041001 Firefox/0.10.1");
if ( $headers = get_headers($response['url']) )
{
foreach( $headers as $value )
{
if ( substr( strtolower($value), 0, 9 ) == "location:" )
return get_url( trim( substr( $value, 9, strlen($value) ) ) );
}
}
}
if ( ( preg_match("/>[[:space:]]+window\.location\.replace\('(.*)'\)/i", $content, $value) || preg_match("/>[[:space:]]+window\.location\=\"(.*)\"/i", $content, $value) ) &&
$javascript_loop < 5
)
{
return get_url( $value[1], $javascript_loop+1 );
}
else
{
return array( $content, $response );
}
}
?>
Following ALL redirects using cURL brings up a lot of special cases. Here's a function that takes everything into account (even JavaScript redirects):
<?php
function get_final_url( $url, $timeout = 5 )
{
$url = str_replace( "&amp;", "&", urldecode(trim($url)) );
$cookie = tempnam ("/tmp", "CURLCOOKIE");
$ch = curl_init();
curl_setopt( $ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; rv:1.7.3) Gecko/20041001 Firefox/0.10.1" );
curl_setopt( $ch, CURLOPT_URL, $url );
curl_setopt( $ch, CURLOPT_COOKIEJAR, $cookie );
curl_setopt( $ch, CURLOPT_FOLLOWLOCATION, true );
curl_setopt( $ch, CURLOPT_ENCODING, "" );
curl_setopt( $ch, CURLOPT_RETURNTRANSFER, true );
curl_setopt( $ch, CURLOPT_AUTOREFERER, true );
curl_setopt( $ch, CURLOPT_CONNECTTIMEOUT, $timeout );
curl_setopt( $ch, CURLOPT_TIMEOUT, $timeout );
curl_setopt( $ch, CURLOPT_MAXREDIRS, 10 );
$content = curl_exec( $ch );
$response = curl_getinfo( $ch );
curl_close ( $ch );
if ($response['http_code'] == 301 || $response['http_code'] == 302)
{
ini_set("user_agent", "Mozilla/5.0 (Windows; U; Windows NT 5.1; rv:1.7.3) Gecko/20041001 Firefox/0.10.1");
$headers = get_headers($response['url']);
$location = "";
foreach( $headers as $value )
{
if ( substr( strtolower($value), 0, 9 ) == "location:" )
return get_final_url( trim( substr( $value, 9, strlen($value) ) ) );
}
}
if ( preg_match("/window\.location\.replace\('(.*)'\)/i", $content, $value) ||
preg_match("/window\.location\=\"(.*)\"/i", $content, $value)
)
{
return get_final_url ( $value[1] );
}
else
{
return $response['url'];
}
}
?>
Although it has been noted that cURL outperforms both file_get_contents and fopen when it comes to getting a file over an HTTP link, a disadvantage of cURL with CURLOPT_RETURNTRANSFER is that it buffers the entire response in memory rather than reading part of the page at a time.
For example, the following code is likely to generate a memory limit error:
<?php
$ch = curl_init("http://www.example.com/reallybigfile.tar.gz");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_BINARYTRANSFER, true);
$output = curl_exec($ch);
$fh = fopen("out.tar.gz", 'w');
fwrite($fh, $output);
fclose($fh);
?>
While this, on the other hand, wouldn't
<?php
$hostfile = fopen("http://www.example.com/reallybigfile.tar.gz", 'r');
$fh = fopen("out.tar.gz", 'w');
while (!feof($hostfile)) {
$output = fread($hostfile, 8192);
fwrite($fh, $output);
}
fclose($hostfile);
fclose($fh);
?>
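That said, curl can also stream a download straight to disk without buffering it in memory, by handing it an open file handle via CURLOPT_FILE. A minimal sketch (the URL and filenames are placeholders):

```php
<?php
// Stream the response body into a local file instead of RAM.
$fh = fopen("out.tar.gz", 'w');
$ch = curl_init("http://www.example.com/reallybigfile.tar.gz");
curl_setopt($ch, CURLOPT_FILE, $fh);            // write the body to this handle
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects to the real file
curl_exec($ch);                                 // data is written out as it arrives
curl_close($ch);
fclose($fh);
?>
```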
In order to prevent curl from sending messages to the server error log, you need to instruct curl to use a temporary file for its output on stderr.
<?php
$gacookie="/askapache/tmp/curl-1.txt";
@touch($gacookie);
@chmod($gacookie,0666);
if($fp = tmpfile()){
$FF = array('Accept: */*'); // extra request headers ($FF was left undefined in the original post)
$ch = curl_init("http://www.askapache.com/p.php");
curl_setopt ($ch, CURLOPT_STDERR, $fp);
curl_setopt ($ch, CURLOPT_VERBOSE, 2);
curl_setopt ($ch, CURLOPT_ENCODING, 0);
curl_setopt ($ch, CURLOPT_USERAGENT, 'Mozilla/5.0');
curl_setopt ($ch, CURLOPT_HTTPHEADER, $FF);
curl_setopt ($ch, CURLOPT_COOKIEFILE, $gacookie);
curl_setopt ($ch, CURLOPT_COOKIEJAR, $gacookie);
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt ($ch, CURLOPT_FAILONERROR, 1);
curl_setopt ($ch, CURLOPT_HEADER, 1);
curl_setopt ($ch, CURLINFO_HEADER_OUT, 1);
curl_setopt ($ch, CURLOPT_CONNECTTIMEOUT, 30);
$r=curl_exec($ch);$ch_info=curl_getinfo($ch);
if (curl_errno($ch)) return false;
else curl_close($ch);
header("Content-type: text/plain");
echo $r;
sleep(2);
$postdata='keywords='.$_GET['k'];
$ch1 = curl_init("http://www.askapache.com/p.php");
curl_setopt ($ch1, CURLOPT_STDERR, $fp);
curl_setopt ($ch1, CURLOPT_VERBOSE, 2);
curl_setopt ($ch1, CURLOPT_ENCODING, 0);
curl_setopt ($ch1, CURLOPT_USERAGENT, 'Mozilla/5.0');
curl_setopt ($ch1, CURLOPT_COOKIEJAR, $gacookie);
curl_setopt ($ch1, CURLOPT_COOKIEFILE, $gacookie);
curl_setopt ($ch1, CURLOPT_POSTFIELDS, $postdata);
curl_setopt ($ch1, CURLOPT_POST, 1);
curl_setopt ($ch1, CURLOPT_RETURNTRANSFER, 1);
curl_setopt ($ch1, CURLOPT_FAILONERROR, 1);
curl_setopt ($ch1, CURLOPT_HEADER, 1);
curl_setopt ($ch1, CURLINFO_HEADER_OUT, 1);
curl_setopt ($ch1, CURLOPT_CONNECTTIMEOUT, 30);
$r=curl_exec ($ch1);$ch1_info=curl_getinfo($ch1);
if (curl_errno($ch1)) return false;
else curl_close($ch1);
header("Content-type: text/plain");print_r($ch1_info);echo $r;sleep(2);
}fclose($fp);
@unlink($gacookie);
?>
See also: http://www.askapache.com/security/curl-google-post-feed.html
Do not forget to use the complete path for the cookie file:
<?php
$ch = curl_init() ;
$myfile = "d:\\mydir\\second_dir\\cookiefile.txt" ;
curl_setopt($ch, CURLOPT_COOKIEJAR, $myfile) ;
?>
Best regards,
Fernando Gabrieli
As a Windows XP user with PHP 5.2.5 running as an Apache module on my personal computer, all I had to do in order to get this cURL stuff working was to uncomment the following line in my php.ini file:
extension=php_curl.dll
Thought I'd just mention it as it may save time for others, since it seems somehow less complicated than the process mentioned (without sufficient explanation, mind you) in the above article (compile PHP??? options??? all I ever did to install PHP was to unzip the archive I downloaded and add a couple of lines to my Apache conf file).
I just tried downloading a web page and save its source into a text file using the example from the manual above. When the php_curl.dll extension is not active, it doesn't work (fatal error with init_curl, as you'd expect). When the php_curl.dll extension is active, it works. No need for any other intricate operation as far as I can say.
Otherwise, it's great that I now have my tool for doing any http request i want.
If you have trouble on server 2003, IIS 6 ( perhaps other versions ) with getting the php_curl loading please see the following.
- run (as an administrator) php.exe -i > C:\phpinfo.txt and go open C:\phpinfo.txt, look in the file to see if CURL was loading, if it's there then keep reading.
- running <?PHP phpinfo(); ?> inside a test.php script on my IIS server would not show CURL loading
- A permissions problem on libeay32.dll and ssleay32.dll meant that the CLI version of PHP could access these two DLLs while IIS could not. I gave 'everyone' read and execute on these two DLLs to try to fix the issue, and it worked. You may wish to be more restrictive, perhaps IUSR rather than EVERYONE; but check your permissions nonetheless.
Hopefully this saves someone some time!
- Ryan Nanney
Don't forget to curl_close($ch); even if curl_errno($ch) != 0.
Because if you don't, on Windows this will produce a Windows error report (program terminated unexpectedly).
For anyone trying to use cURL to submit to an ASP/ASPX page that uses an image as the submit button.
Make sure that you have 'button_name.x' and 'button_name.y' in the post fields. PHP names these fields 'button_name_x' and 'button_name_y', while ASP uses a dot.
Also, as noted above, be sure to include the '__VIEWSTATE' input field in your post request.
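For example, a sketch of building such a POST body (the button and page names are placeholders; http_build_query keeps the dots intact, since it is PHP's *parsing* of incoming data that turns dots into underscores, not the sending side):

```php
<?php
// ASP.NET expects imagebutton coordinates with a dot, not an underscore.
$viewstate = 'PLACEHOLDER'; // normally scraped from the form's hidden __VIEWSTATE input
$postfields = array(
    'btnSubmit.x' => 1,
    'btnSubmit.y' => 1,
    '__VIEWSTATE' => $viewstate,
);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($postfields));
?>
```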
Using curl to take snapshots of the current page for emailing the HTML is a clever little idea. (ie: Email this page to a friend)
<?php
//to be explained below!
session_write_close();
$pageurl = "http://www.site.com/content.php?PHPSESSID=123XYZ";
$ch = curl_init();
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt ($ch, CURLOPT_URL, $pageurl );
$html = curl_exec ( $ch );
curl_close($ch);
//then you need to fix pathing to absolute
$search = "/(src|href|background)=\"[^:,^>,^\"]*\"/i";
preg_match_all ( $search, $html, $a_matches );
//you can figure out the rest ! but thought the reg expression is useful as well
?>
But here is the catch, you may want to make sure curl connects to the server under the same session as the browser. So naturally you pass the session cookie through the curl system either by the cookie jar system, or through the query string in the path.
This is where you will get stuck: PHP will need write access to the same session file simultaneously, causing serious hanging issues!
This is why you should close off your session before you make curl take a page snapshot!
If PHP configure fails with cURL errors, try omitting the --with-curl=path and just use --with-curl.
Of course it will also be optimal to make sure that the cURL library directory is listed in /etc/ld.so.conf and then run 'ldconfig'.
By default this is /usr/local/lib.
For anyone who is having trouble getting some of the advanced functionality to work in whatever version of PHP you have, I wrote a little wrapper for the command line version of curl. The function below is for Windows (hence, the curl.exe), but it works the same way under Linux or whatever.
The curl man page enumerates a ton of options you can use...
<?php
function curlPageGrabber($destinationURL, $refererURL, $postData, $autoForward = false, $autoGrabCookies = true, $curlDebug = false)
{
$curlString = "c:\curl\curl.exe -i -v -m 30 -L -b c:\curl\cookiejar.txt -c c:\curl\cookiejar.txt ";
$curlString .= " -A \"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322;)\" ";
$curlString .= " -H \"Accept-Language: en-us\" ";
$curlString .= " -H \"Accept-Encoding: gzip, deflate\" ";
$curlString .= " -H \"Host: affiliate-program.amazon.com\" ";
$curlString .= " -H \"Accept: */*\" ";
$curlString .= " --compressed ";
//$curlString .= " --trace-ascii ";
if ($refererURL != '')
{
$curlString .= "-e \"".$refererURL."\" ";
}
if ($postData != '')
{
$curlString.= " -d \"$postData\" ";
}
$curlString .= $destinationURL;
if ($curlDebug == true)
{
echo "<p>curlString: $curlString</p>";
}
$temp = exec($curlString, $retval);
$htmlData = '';
foreach ($retval as $arrayLine)
{
$htmlData .= $arrayLine."\r\n";
}
if ($curlDebug == true)
{
echo "<textarea cols=80 rows=20>$htmlData</textarea>";
}
return $htmlData;
} // end function
?>
Note that on Win32 this documentation can get a little confusing.
In order to get this to work you need to:
1) Be sure that the folder containing libeay32.dll and ssleay32.dll - typically C:\PHP - is present in the PATH variable.
2) Uncomment - remove the semi-colon - the line that says "extension=php_curl.dll" from php.ini
3) Restart the webserver (you should already know this one, but...)
It took me some time to realize this, since this page doesn't mention the need to uncomment that line in php.ini.
This may be obvious to everybody *except* me, but if you want to use curl to connect via FTP rather than HTTP, then you just need to use "ftp://" in the URL specification (I was looking for a use_ftp flag or something).
Use the CURLOPT_USERPWD to login to the ftp site.
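For example, downloading a file over FTP with credentials (a sketch; the host, path, username and password are placeholders):

```php
<?php
// Fetch a file over FTP; CURLOPT_USERPWD supplies the login.
$ch = curl_init("ftp://ftp.example.com/pub/file.txt");
curl_setopt($ch, CURLOPT_USERPWD, "username:password");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$data = curl_exec($ch);
curl_close($ch);
?>
```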
<?php
/*
Sean Huber CURL library
This library is a basic implementation of CURL capabilities.
It works in most modern versions of IE and FF.
==================================== USAGE ====================================
It exports the CURL object globally, so set a callback with setCallback($func).
(Use setCallback(array('class_name', 'func_name')) to set a callback as a func
that lies within a different class)
Then use one of the CURL request methods:
get($url);
post($url, $vars); vars is a urlencoded string in query string format.
Your callback function will then be called with 1 argument, the response text.
If a callback is not defined, your request will return the response text.
*/
class CURL {
var $callback = false;
function setCallback($func_name) {
$this->callback = $func_name;
}
function doRequest($method, $url, $vars) {
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HEADER, 1);
curl_setopt($ch, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT']);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_COOKIEJAR, 'cookie.txt');
curl_setopt($ch, CURLOPT_COOKIEFILE, 'cookie.txt');
if ($method == 'POST') {
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $vars);
}
$data = curl_exec($ch);
$error = curl_error($ch); // read the error message before closing the handle
curl_close($ch);
if ($data) {
if ($this->callback)
{
$callback = $this->callback;
$this->callback = false;
return call_user_func($callback, $data);
} else {
return $data;
}
} else {
return $error;
}
}
function get($url) {
return $this->doRequest('GET', $url, null);
}
function post($url, $vars) {
return $this->doRequest('POST', $url, $vars);
}
}
?>
Beware of any extra spaces in the URL. A trailing space in the URL caused my script to fail with the message "empty reply from server".
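So it can be worth trimming the URL before handing it to curl, e.g.:

```php
<?php
// Strip stray leading/trailing whitespace before setting the URL.
$url = trim($url);
curl_setopt($ch, CURLOPT_URL, $url);
?>
```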
I had the following experience when harvesting URLs with a GET variable from a page using cURL. HTML pages will output ampersands as &amp;amp; when the page is read by the curl function.
If you code a script to find all hyperlinks, it will use &amp;amp; instead of &, especially using a regular expression search.
It is hard to detect, because when you output the URL to the browser it renders the entity. To fix, add a line to replace the &amp;amp; with &.
<?php
function processURL($url){
$url=str_replace('&amp;','&',$url);
$ch=curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);
$xml = curl_exec ($ch);
curl_close ($ch);
echo $xml;
}
?>
A note of warning for PHP 5 users: if you try to fetch the CURLINFO_CONTENT_TYPE using curl_getinfo when there is a connect error, you will core dump PHP. I have informed the Curl team about this, so it will hopefully be fixed soon. Just make sure you check for an error before you look for this data.
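In other words, check curl_errno() first, along these lines:

```php
<?php
// Only ask for CURLINFO_CONTENT_TYPE when the transfer actually succeeded.
$result = curl_exec($ch);
if (curl_errno($ch)) {
    // connect (or other) error: don't request CURLINFO_CONTENT_TYPE here
    echo 'cURL error: ' . curl_error($ch);
} else {
    $type = curl_getinfo($ch, CURLINFO_CONTENT_TYPE);
}
curl_close($ch);
?>
```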
In recent versions of PHP, CURLOPT_MUTE has (probably) been removed. Any attempt to use curl_setopt() to set CURLOPT_MUTE will give you a warning like this:
PHP Notice: Use of undefined constant CURLOPT_MUTE - assumed 'CURLOPT_MUTE' in ....
If you wish to silence the curl output, use the following instead:
<?php
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
?>
And then,
<?php
$curl_output=curl_exec($ch);
?>
The output of the curl operation will be stored as a string in $curl_output while the operation remains totally silent.
If you have upgraded to using thread safe PHP (with apache 2 MPM=worker) note that
CURLOPT_COOKIEJAR / CURLOPT_COOKIEFILE both need an absolute path set for the cookie file location and no longer take a relative path.
(As before, also remember to have set correct permissions to allow a writeable cookie file/dir by apache)
[php 4.3.7/apache v2.0.49]
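One portable way to build that absolute path is to anchor it to the script's own directory (a sketch; the filename is a placeholder):

```php
<?php
// Absolute path for the cookie jar, so thread safe builds can find it.
$cookiefile = dirname(__FILE__) . '/cookies.txt';
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookiefile);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookiefile);
?>
```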
It took me quite some time to figure out how to get cURL (with SSL), OpenSSL and PHP to play nicely together.
After reinstalling MS-VC7 and compiling OpenSSL, I finally realised this wasn't necessary.
If you're like me and like *nix systems more than Windows, then you'll most probably have similar problems.
I came across this on a simple Google search with the right keywords:
http://www.tonyspencer.com/journal/00000037.htm
I read through that and found my mistake.
It's just a small list of notes, but I found them to be the best and simplest I've seen on the subject.
Don't forget to add a simple line like this to your scripts to get them working on Win32:
<?php
if (strtoupper(substr(PHP_OS, 0, 3)) === 'WIN') curl_setopt($curl, CURLOPT_CAINFO, "c:\\windows\\ca-bundle.crt"); // on Windows, point curl at the CA bundle
?>
Last note: ca-bundle.crt file is located in the Curl download. I stored mine in the windows directory and apache/php can access it fine.
All the best and I hope this helps.
Simon Lightfoot
vHost Direct Limited
You can request (and have deflated for you) compressed http (from mod_gzip or ob_gzhandler, for instance) by using the following:
<?php
curl_setopt($ch,CURLOPT_ENCODING , "gzip");
?>
This seems to work fine with the latest php (4.3.4) curl (7.11.0) and zlib (1.1.4) under linux. This is much more elegant than forcing the Accept-Encode header and using gzdeflate on an edited result.
For Win2000: To get the 4.3.1 curl dll to work with https you now need to download the latest win32 curl library from http://curl.haxx.se and snag the ca-bundle.crt file from the lib directory. Place this somewhere handy on your webserver.
Then in your PHP script, add the following setopt line to the rest of your curl_setopt commands:
<?php curl_setopt($ch, CURLOPT_CAINFO, 'C:\pathto\ca-bundle.crt'); ?>
This worked for me and allowed me to discontinue using the CURLOPT_SSL_VERIFYPEER set to zero hack.