|
From: <php...@li...> - 2010-02-05 06:24:18
|
Hi,
I'm doing a fairly simple streaming download, looping for a number of
different files, and I'm getting sporadic, unrepeatable severe errors in
PhpCGIServlet.
The code will successfully download some files, or parts of some files,
but will sometimes cause a crash. It might download 850MB without a
hiccup, or fail partway through a file at 20MB. If I rerun the code, it
might download the file it previously failed on, or I might need to run
it two or three times to get it to work, then it might download all
other files with no problems.
The code has been tested using the most up-to-date JavaBridge,
php-script and php-servlet jar files and the error still occurs. The PHP
has been tested in Apache webserver and there is no evidence of the
error there. Tomcat is running on a VMWare instance with PHP-CGI, and
Apache is running on the host machine with the Apache PHP server module.
Any advice would be welcome.
Using:
PHP 5.2.10
Tomcat 6.0.20
Apache 2.2.10
Cheers,
Mark...
Extract of error log:
INFO: PHP Warning: unlink(sync.lck) [<a href='function.unlink'>function.unlink</a>]: Permission denied in C:\player.php on line 15
Feb 5, 2010 12:53:19 PM org.apache.catalina.core.ApplicationContext log
INFO: SessionListener: contextDestroyed()
Feb 5, 2010 12:53:19 PM org.apache.catalina.core.ApplicationContext log
INFO: ContextListener: contextDestroyed()
Feb 5, 2010 12:53:19 PM org.apache.catalina.core.ApplicationContext log
SEVERE: Servlet PhpCGIServlet threw unload() exception
javax.servlet.ServletException: Servlet.destroy() for servlet PhpCGIServlet threw exception
at org.apache.catalina.core.StandardWrapper.unload(StandardWrapper.java:1413)
at org.apache.catalina.core.StandardWrapper.stop(StandardWrapper.java:1739)
at org.apache.catalina.core.StandardContext.stop(StandardContext.java:4563)
at org.apache.catalina.core.ContainerBase.removeChild(ContainerBase.java:924)
at org.apache.catalina.startup.HostConfig.undeployApps(HostConfig.java:1248)
at org.apache.catalina.startup.HostConfig.stop(HostConfig.java:1219)
at org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:316)
at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:119)
at org.apache.catalina.core.ContainerBase.stop(ContainerBase.java:1086)
at org.apache.catalina.core.ContainerBase.stop(ContainerBase.java:1098)
at org.apache.catalina.core.StandardEngine.stop(StandardEngine.java:448)
at org.apache.catalina.core.StandardService.stop(StandardService.java:584)
at org.apache.catalina.core.StandardServer.stop(StandardServer.java:744)
at org.apache.catalina.startup.Catalina.stop(Catalina.java:633)
at org.apache.catalina.startup.Catalina.start(Catalina.java:608)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:288)
at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:413)
Caused by: java.lang.NullPointerException
at php.java.bridge.Util$Process.getOutputStream(Util.java:1071)
at php.java.servlet.fastcgi.ChannelFactory.destroy(ChannelFactory.java:149)
at php.java.servlet.fastcgi.FastCGIServlet.destroy(FastCGIServlet.java:296)
at php.java.servlet.PhpCGIServlet.destroy(PhpCGIServlet.java:195)
at org.apache.catalina.core.StandardWrapper.unload(StandardWrapper.java:1394)
... 20 more
The code extract is here. The error is occurring in the '// Chunk
download' loop:
$fs = fsockopen($host, 80, $errno, $errstr, 30);
if (!$fs) {
    $this->writeDebugInfo("FAILED: ", $errstr . '(' . $errno . ')');
} else {
    $out = "GET $file HTTP/1.0\r\n";
    $out .= "Host: $host\r\n";
    $out .= "Connection: Close\r\n\r\n";
    fwrite($fs, $out);
    $fm = fopen($temp_file_name, "w");
    stream_set_timeout($fs, 30);
    while (!feof($fs) && ($debug = fgets($fs)) != "\r\n"); // ignore headers
    while (!feof($fs)) {
        $contents = fgets($fs, 4096); // Chunk download
        fwrite($fm, $contents);
    }
    fclose($fm);
    $info = stream_get_meta_data($fs);
    fclose($fs);
    if ($info['timed_out']) {
        // Delete temp file if fails
        unlink($temp_file_name);
        $this->writeDebugInfo("FAILED: Connection timed out: ", $temp_file_name);
        $this->writeDebugInfo("stream info: ", $info);
    } else {
        // Move temp file if succeeds
        $media_file_name = str_replace('temp/', 'media/', $temp_file_name);
        rename($temp_file_name, $media_file_name);
        $this->writeDebugInfo("SUCCESS: ", $media_file_name);
        $this->writeDebugInfo("stream info: ", $info);
    }
}
|
|
From: <php...@li...> - 2010-02-05 07:01:38
|
Hi Mark,
On Fri, 2010-02-05 at 13:17 +0700,
php...@li... wrote:
> while(!feof($fs)) {
> $contents = fgets($fs, 4096); // Chunk download
HTTP/1.1 chunked encoding is not supported by the CGI spec. CGI expects
the CONTENT_LENGTH header. If it is missing, php-cgi will wait forever
until it is destroyed.
The FastCGI SAPI does support it, and all php-cgi binaries > 5.2.0 are
compiled with --enable-fastcgi by default.
I will add a new method to the PHPFastCGIServlet in version 5.5.6, which
will allow clients to establish a two-way connection with the php script
using either "Connection: Upgrade" (WebSockets) or Transfer-Encoding:
Chunked (semi-standard HTTP tunnel).
Please see my previous mail for details.
Regards,
Jost Boekemeier
|
|
From: <php...@li...> - 2010-02-05 07:42:08
|
I did see your recent email, but the 'Websockets' references threw me.
So it should be added in the next release, and I'll just need to add the
'Transfer-Encoding: Chunked' header.
Cool - I'll watch the mailing lists for news on the next release.
Cheers, and many thanks for the quick reply.
Mark...
|
|
From: <php...@li...> - 2010-02-05 11:21:11
|
> So it should be added in the next release, and I'll just need to add the
> 'Transfer-Encoding: Chunked' header.
Maybe not. I just saw that your code doesn't use php://input, but opens
an HTTP/1.0 URL connection to some resource.
What resource is it? A local or a remote resource?
If it is a resource served by the originating HTTP server or servlet
engine, we've found the bug.
If you want to include some local resource, use
requestDispatcher.include() or the Apache API instead. A "loop-back" URL
connection does not work, because the Apache process pool and the
servlet thread pool are limited. If the pool runs out of resources,
further requests might create a deadlock and block all further requests.
Imagine a thread pool of 2, with two PHP scripts creating 2 additional
requests to the same resource. These scripts will block the HTTP server
or servlet engine forever.
Regards,
Jost Boekemeier
|
|
From: <php...@li...> - 2010-02-06 11:03:57
|
> Maybe not. I just saw that your code doesn't use php://input, but
> opens a HTTP/1.0 URL connection to some resource.
>
> What resource is it? A local or a remote resource?
It is actually a remote resource. The code is installed on any PC, and
downloads data from a specified and fixed Amazon Web Service S3 bucket.
So I don't think this is covered by the scenario you described. But many
thanks for looking at this as I understand it's not really in the scope
of JavaBridge.
Cheers,
Mark...
|
|
From: <php...@li...> - 2010-02-06 12:04:56
|
Does the remote server respond with HTTP/1.1 by any chance?
If so, the feof() test will not work: the last packet is zero, but the
connection isn't closed.
So the 20KB you receive may be the remainder of the last connection.
Regards,
Jost Bökemeier
|
|
From: <php...@li...> - 2010-02-06 16:43:20
|
> Does the remote server respond with HTTP/1.1 by any chance?
It does.
> If so, the feof() test will not work, the last packet is zero, but the
> connection isn't closed.
It's failing before EOF.
> So the 20KB you receive may be the remainder of the last connection.
This might still be something to do with it, but on the last test with
CURL and an 8k buffer:
175kb file downloaded okay
66MB file hung after 44kb downloaded
It often works fine. So I'm considering using fsockopen and using
'set_time_limit' on each iteration and catching the exception when it
hangs to retry the file a number of times. This will be a first for me.
I appreciate your advice with this one. And the question I've been
avoiding asking is now looming in my mind - any idea when 5.5.6 might be
released? ;-)
Thanks again,
Mark...
|
|
From: <php...@li...> - 2010-02-06 18:52:07
|
On Sat, 2010-02-06 at 23:36 +0700,
php...@li... wrote:
> > Does the remote server respond with HTTP/1.1 by any chance?
>
> It does.
If the data you receive is encoded in HTTP/1.1 chunks, your fgets() is
incorrect, too. The HTTP/1.1 chunk data format is:
length\r\n
data\r\n
The last packet is:
000\r\n\r\n
I don't think that there's a bug in PHP or the PHP/Java Bridge: Java.inc
uses chunked connections too. Please take a look at the class
java_ChunkedSocketChannel from Protocol.inc for an example of how to
receive HTTP/1.1 chunked data.
I think the bug is in the line:
$contents = fgets($fs, 4096); // Chunk download
The packet may be longer than 4096 bytes. You must extract the data
length from each packet header.
Regards,
Jost Bökemeier
|
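For readers following along, the chunk format described in this message can be sketched in PHP roughly as follows. This is only a minimal sketch and not the code from Protocol.inc: the function names read_exact() and copy_chunked_body() are made up for illustration, and it assumes the response headers have already been consumed and that chunk extensions and trailers can be ignored.

```php
<?php
/**
 * Read exactly $len bytes from $stream. A loop is needed because
 * fread() on a network stream may return fewer bytes than requested.
 */
function read_exact($stream, $len)
{
    $buf = '';
    while (strlen($buf) < $len) {
        $part = fread($stream, $len - strlen($buf));
        if ($part === false || $part === '') {
            throw new RuntimeException('stream ended inside a chunk');
        }
        $buf .= $part;
    }
    return $buf;
}

/**
 * Decode a "Transfer-Encoding: chunked" body from $in and write the
 * payload to $out. Returns the number of payload bytes written.
 */
function copy_chunked_body($in, $out)
{
    $total = 0;
    while (true) {
        $line = fgets($in);
        if ($line === false) {
            throw new RuntimeException('stream ended before the last chunk');
        }
        // The size line is hexadecimal, optionally followed by ";extension".
        $fields = explode(';', $line, 2);
        $size = (int) hexdec(trim($fields[0]));
        if ($size === 0) {
            break; // last chunk; any trailers are ignored in this sketch
        }
        fwrite($out, read_exact($in, $size));
        read_exact($in, 2); // consume the CRLF that terminates the chunk data
        $total += $size;
    }
    return $total;
}
```

Because each chunk is read by its declared length, a "\r\n" that happens to occur inside the data is copied through untouched, which a line-oriented fgets() loop cannot guarantee.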
|
From: <php...@li...> - 2010-02-07 08:26:04
|
> If the data you receive is encoded in HTTP/1.1 chunks
I don't believe it is, but I don't know a lot about these things, so I
might be wrong.
There's nothing special happening at the server side. It's just a GET
request and response. I'm just trying to download the response in a way
that doesn't cause me memory problems with files that can be 100MBs in size.
At present, the code below is producing an adequate result, i.e. it has
not failed in testing yet. The downloads only fail 'sometimes', so I don't
understand how some of the explanations offered here fit what's
happening. The retry improves the chance of the download completing.
If a file does not complete after the retries, the client handles this,
and the downloads are retried again later.
Thanks for everyone's advice though. Some of it is over my head
as a PHP newbie, but I'm understanding and learning other parts of what
is being said.
Cheers,
Mark...
$download_attempt = 1;
do {
    $fs = fsockopen($host, 80, $errno, $errstr, 30);
    if (!$fs) {
        $this->writeDebugInfo("FAILED ", $errstr . '(' . $errno . ')');
    } else {
        $out = "GET $file HTTP/1.1\r\n";
        $out .= "Host: $host\r\n";
        $out .= "Connection: Close\r\n\r\n";
        fwrite($fs, $out);
        $fm = fopen($temp_file_name, "w");
        stream_set_timeout($fs, 30);
        while (!feof($fs) && ($debug = fgets($fs)) != "\r\n"); // ignore headers
        while (!feof($fs)) {
            $contents = fgets($fs, 4096); // Chunk download
            fwrite($fm, $contents);
            $info = stream_get_meta_data($fs);
            if ($info['timed_out']) {
                break;
            }
        }
        fclose($fm);
        fclose($fs);
        if ($info['timed_out']) {
            // Delete temp file if fails
            unlink($temp_file_name);
            $this->writeDebugInfo("FAILED on attempt " . $download_attempt . " - Connection timed out: ", $temp_file_name);
            $download_attempt++;
            if ($download_attempt < 5) {
                $this->writeDebugInfo("RETRYING: ", $temp_file_name);
            }
        } else {
            // Move temp file if succeeds
            $media_file_name = str_replace('temp/', 'media/', $temp_file_name);
            rename($temp_file_name, $media_file_name);
            $this->writeDebugInfo("SUCCESS: ", $media_file_name);
        }
    }
} while ($download_attempt < 5 && $info['timed_out']);
|
|
From: <php...@li...> - 2010-02-07 16:48:36
|
On Sun, 2010-02-07 at 15:19 +0700,
php...@li... wrote:
> I'm just trying to download the response in a way
How else do you want to download the files efficiently, if not via
Transfer-Encoding: Chunked?
> The downloads only fail 'sometimes', so I don't understand how some of
> the explanations
Given your code, the possibility that PHP reads "over the edge" depends
on the file size. If the file is large it is more likely that this will
happen.
In any case, this isn't due to bugs in PHP or the PHP/Java Bridge, but
due to your misunderstanding of how HTTP chunked encoding works and what
fgets() and feof() do.
Regards,
Jost Bökemeier
|
|
From: <php...@li...> - 2010-02-07 17:41:38
|
> In any case, this isn't due to bugs in PHP or the PHP/Java Bridge, but
> due to your misunderstanding of how HTTP chunked encoding works and what
> fgets() and feof() do.
I comfortably admit I am new to downloading very large files in PHP, and
I'm trying to learn.
I found an approach that worked...mostly. But it caused PHP to crash
under JavaBridge and Tomcat. An experienced Java programmer saw the logs
and said it was due to a class error. I contacted this mailing list and
explained what was happening and was told that it was down to a feature
not yet supported in JavaBridge. Now I'm being told it's not down to
that.
I don't have any understanding of how HTTP chunked encoding works. I'm
not using this header in my code and your references to it were the
first that I had seen. I'll read up on this and see how I can use it
given what you say. And looking at the implementation of chunked
encoding will give me more information on how fgets() and feof() work
generally, and in the context of chunked encoding.
Again, many thanks for helping out. It is appreciated.
Mark...
|
|
From: <php...@li...> - 2010-02-08 16:25:29
|
On Mon, 2010-02-08 at 00:34 +0700,
php...@li... wrote:
> I found an approach that worked...mostly.
Yes, "mostly", indeed. The code you have posted doesn't work on any
operating system or HTTP server.
> But it caused PHP to crash
It certainly doesn't cause a crash. It causes PHP to hang for the
reasons outlined in the previous post.
> under JavaBridge and Tomcat. An experienced Java programmer saw the logs
> and said it was due to a class error.
Please re-read the error log. The hanging PHP instance has been killed
during the Tomcat shutdown sequence...
Anyway, I don't think we need to discuss this issue any further. Peter
has already posted code which works (except that I would read the
terminating \r\n, too, but that's a matter of style). Just nuke your
code and use his code instead.
Please re-read the HTTP/1.1 spec for details of why your code is
completely wrong and causes PHP to hang sometimes.
Regards,
Jost Bökemeier
|
|
From: <php...@li...> - 2010-02-08 17:35:22
|
> Yes, "mostly", indeed. The code you have posted doesn't work on any
> operating system or HTTP server.
I am very aware that you are much more experienced in these matters than
me, but your statement does not explain why the code successfully
downloaded about 2.6GB of data consisting of 25-30 different files in
testing yesterday. Experiences like this imply that it does work. Though
I accept that it might not be the most efficient way of doing it and I'm
happy to accept the advice of people who have more knowledge than me and
can help me to see better solutions.
I did see this when I was reading back to look again at Peter's code:
> If the data you receive is encoded in HTTP/1.1 chunks, your fgets() is
> incorrect, too. The HTTP/1.1 chunk data format is:
Received data isn't chunk encoded. Maybe I've not been clear on this
point by using the words 'chunk download' in a comment. It was my figure
of speech, rather than a declaration of intent to use chunk encoding.
Perhaps 'buffer download' would've been more appropriate.
> Peter
> has already posted code which works
To my inexperienced eye, this looks like it will receive chunk encoded
data, but the server I'm using is not passing data using this method.
Though I do appreciate Peter's response - I don't think I acknowledged
it at the time (Cheers Peter!).
To me, it seems that you believe I am receiving chunk encoded data but
I'm not. I can see that my code would be completely wrong if I was
trying to do that and your comments about the code would be completely
valid. Your frustration that I've not understood this point is clear
from the last two emails.
However, advice from StackOverflow confirms that my code should work
fine, that fread() should be used to make it more binary-safe, that CURL
would be even better, and chunk encoding is not needed. So I have other
approaches to try.
Thanks again for taking the time out to reply. I don't think this needs
to be discussed further either.
And many thanks for providing libraries without which I would not have
been able to proceed as far with this project as I have.
Regards,
Mark...
|
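As an illustration of the fread() suggestion mentioned in this message, a fixed-size copy loop might look something like the sketch below. It is not code from the thread: copy_until_eof() is a hypothetical helper, and it assumes the body is not chunk encoded and that the server closes the connection at the end of the transfer (as requested with "Connection: Close"), so that feof() eventually becomes true.

```php
<?php
/**
 * Copy everything from $in to $out using fixed-size fread() calls.
 * Returns the number of bytes copied, or false if a read failed or
 * the stream reported a timeout.
 */
function copy_until_eof($in, $out, $bufSize = 8192)
{
    $total = 0;
    while (!feof($in)) {
        $data = fread($in, $bufSize);
        if ($data === false) {
            return false;
        }
        // stream_set_timeout() on a socket is reported via 'timed_out'.
        $meta = stream_get_meta_data($in);
        if (!empty($meta['timed_out'])) {
            return false;
        }
        $total += fwrite($out, $data);
    }
    return $total;
}
```

Unlike the fgets() loop, this never splits the data on line endings, so the buffer size only affects how much is held in memory at once.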
|
From: <php...@li...> - 2010-02-09 17:07:03
|
Mark,
you don't seem to understand the issue.
1. A GET request must be answered by the HTTP 1.1 server by either
sending the Content-Length header or by sending the Transfer-Encoding:
chunked header.
2. To send large responses efficiently, Transfer-Encoding: chunked is
used.
3. The network buffer used by the PHP streams implementation reads data
eagerly. If you fread($socket, 1024) and the network buffer already
contains 24 bytes, PHP will try to read 1000 bytes nevertheless.
4. If you open a HTTP 1.0 connection to a HTTP server which cannot
respond with a Content-Length header, the behaviour is implementation
specific. It may send an error, send a raw byte stream or it may start
playing nethack.
5. PHP running within Apache behaves exactly the same as PHP running
within a JEE server or servlet engine. The only difference is that Java
prints a stack trace when the hanging PHP instance is killed, while
Apache silently kills it with SIGCHILD.
You don't have many choices to implement your "stream download". Either
you use HTTP/1.1 chunked connections, or you will have to deal with an
implementation-specific behaviour of both, the HTTP server (#4) *and*
PHP (#3).
Since the PHP streams implementation cannot handle raw byte streams
which don't have an explicit length (either Content-Length or the length
from each "chunked" header), you cannot use PHP fread (your "fgets()" is
bogus because you do not and cannot distinguish between the \r\n from
the data and the \r\n used to split the packets, so your download is
garbage anyway) to receive the data, unless the remote server shuts down
the connection at the end of the transfer, so that PHP's fread (or
fgets) doesn't try to fill its network buffer over the end of the data.
If you want to discuss this issue further, please use the PHP mailing
list instead.
Regards,
Jost Bökemeier
|
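One way to find out which of the cases in this message applies to a given response is to look at the headers instead of skipping them. The sketch below is only an illustration under stated assumptions: read_headers() and body_framing() are hypothetical helper names, the status line is ignored, and repeated headers simply overwrite each other.

```php
<?php
/**
 * Read the status line and response headers from $stream, up to and
 * including the blank line, and return a lower-cased name => value map.
 */
function read_headers($stream)
{
    $headers = array();
    fgets($stream); // status line, ignored in this sketch
    while (($line = fgets($stream)) !== false && rtrim($line, "\r\n") !== '') {
        $parts = explode(':', $line, 2);
        if (count($parts) === 2) {
            $headers[strtolower(trim($parts[0]))] = trim($parts[1]);
        }
    }
    return $headers;
}

/**
 * Report how the body is delimited: 'chunked', 'length' or 'close'.
 */
function body_framing(array $headers)
{
    if (isset($headers['transfer-encoding'])
        && stripos($headers['transfer-encoding'], 'chunked') !== false) {
        return 'chunked'; // each chunk carries its own length
    }
    if (isset($headers['content-length'])) {
        return 'length';  // read exactly Content-Length bytes
    }
    return 'close';       // body ends when the server closes the connection
}
```

Logging the result of body_framing() for a failing download would show whether the data really is chunk encoded, which is the point the two sides of this thread disagree on.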
|
From: <php...@li...> - 2010-02-09 17:39:31
|
Hi Jost,
Thanks for the very reasonable response. I won't respond to it in detail
here as you've requested, but the way you've explained it has made it
much clearer for me.
I still don't understand why my code is working given what you have
said, though it is much revised from the first email I sent. Maybe it
will break and I will experience what you are telling me for myself,
which won't be a bad thing.
But I've taken enough of your time already, so I'll shut up. Thanks for
helping out.
Mark...
|
|
From: <php...@li...> - 2010-02-05 09:23:31
|
FWIW, and for anyone else it might help, I'm trying a CURL approach instead:
$fp = fopen($temp_file_name, "w");
// Configuration of curl
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $host . $file);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_BUFFERSIZE, 8192);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FILE, $fp);
curl_exec($ch);
$curl_info = curl_getinfo($ch); // Tests for success later
curl_close($ch);
fclose($fp);
if (is_resource($fp)) {
    fclose($fp); // CURL bug
}
Look out for the need to double-close the file resource - the bugfix went
in Sept 2009 but isn't in PHP 5.2.10, which isn't *that* old.
If there is any known conflict with PHP-CGI or JavaBridge, could you let
me know? It might save many hours of pain and misery with testing...
Cheers,
Mark...
|
|
From: <php...@li...> - 2010-02-06 09:44:58
|
In PHP/Java Bridge version 5.5.5 you can use the PHP function
"virtual()" to include a local resource. I have added the following to
our FAQ:
How do I include a local (*.asp, *.jsp, ...) resource?
Use the virtual() function. If PHP is running within a JEE server or
servlet engine, the virtual() function is an alias for
java_context()->getHttpServletRequest()->getRequestDispatcher()->include()
Warning: Do not open a "loop back" URL connection (e.g. via
fopen("http://localhost.../foo.asp")) to include the local resource.
This might exceed the HTTP server's pool size and create a deadlock!
Regards,
Jost Bökemeier
|
|
From: <php...@li...> - 2010-02-07 00:00:01
|
Hello!
The apache server sends garbage if you send http 1.0 but expect http 1.1.
The following code will work:
$sock = fsockopen(...);
fwrite($sock, "GET /foo.php HTTP/1.1\r\nHost: localhost\r\n\r\n");
while(fgets($sock)!="\r\n");
while($len=hexdec(fgets($sock))) {
    for($str=fread($sock, $len); $len>strlen($str); $str.=fread($sock, $len-strlen($str)));
    echo $str;
}
Peter
|
|
From: <php...@li...> - 2010-02-07 07:29:57
|
Hi Peter,
> The apache server sends garbage if you send http 1.0 but expect http
> 1.1.
Are we talking about this Apache feature which can stream bytes to the
client? This feature is quite useful on C level, you can simply read()
the bytes as Apache generates them. But it is difficult to handle in PHP
or Bea WebLogic for example, because their network implementations use a
buffer which reads data eagerly. If the PHP network buffer contains 4
bytes and fread() requests 20 bytes, PHP tries to read 16 bytes from the
network, instead of simply returning the 4 bytes available so far.
I don't know why PHP and other JEE servers implement their network
buffers this way. It might be more efficient for normal HTTP
connections, but it is bad for data packets which don't have an explicit
length; in PHP you must call fread($sock, 1), if you don't want to run
into a deadlock. Or you must unblock the stream and call
stream_select() for each packet. Very low tech...
Regards,
Jost Bökemeier
|
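The "unblock the stream and call stream_select()" alternative mentioned in this message could look roughly like the sketch below. It is an assumption-laden illustration rather than anything posted in the thread: read_available() is a hypothetical helper, and timeouts and end-of-stream are both reported as an empty string for brevity.

```php
<?php
/**
 * Wait up to $timeoutSec seconds for data on $sock and return whatever
 * is available (at most $maxLen bytes). Returns '' on timeout or EOF.
 */
function read_available($sock, $maxLen = 8192, $timeoutSec = 30)
{
    // In non-blocking mode fread() returns what is there instead of waiting.
    stream_set_blocking($sock, false);
    $read = array($sock);
    $write = null;
    $except = null;
    $ready = stream_select($read, $write, $except, $timeoutSec);
    if ($ready === false || $ready === 0) {
        return ''; // select failed or nothing arrived within the timeout
    }
    $data = fread($sock, $maxLen);
    return $data === false ? '' : $data;
}
```

A caller would invoke this in a loop and decide for itself when the transfer is complete, for example from a Content-Length value or a chunk header.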