It's Back ... HTTP Internal Server Error
Author | Message |
---|---|
Lionel Joined: 25 Mar 00 Posts: 680 Credit: 563,640,304 RAC: 597 |
Following the last little hiccup (the upload issue), 1 of my boxes started accumulating wus (as did the others) until it could again upload completed wus. This it did. I suspect that Communication was Deferred for some considerable length of time and when that time elapsed it uploaded more than the server can report against. I now have 2,450+ wus that cannot report. This happened following the previous outage and the apparent issue was fixed however the fix seems to have disappeared as I noticed that I could report wu quantities greater than 64. Whilst this machine is still pumping out wus (and can't report) can someone have a look into the patch and why it is not working ... cheers |
arkayn Joined: 14 May 99 Posts: 4438 Credit: 55,006,323 RAC: 0 |
I don't think they actually fixed that error yet; they just upped the number of tasks that can be reported at once from 64 to 256. Of course, you have to be running a 6.12.x or higher version to take advantage of the cc_config option. |
Lionel Joined: 25 Mar 00 Posts: 680 Credit: 563,640,304 RAC: 597 |
I thought that that was the fix ... if not, sorry about that ... However, that doesn't really solve the problem ... the interesting thing here is that when I went on hols o/s for 3 weeks I could come back, turn on the router and upload 1,000s and then report them all in one go ... I suppose those were the good ol' days ... still leaves me with the HTTP issue though |
Lionel Joined: 25 Mar 00 Posts: 680 Credit: 563,640,304 RAC: 597 |
Now have 3,540 wus that are not reporting due to HTTP Internal Server Error ... When can someone do something about this so these wus can report ?? cheers |
kittyman Joined: 9 Jul 00 Posts: 51478 Credit: 1,018,363,574 RAC: 1,004 |
Only you can at the moment...... Update Boinc to a version that will support max tasks reported and set it at 256 or less....... Sorry, I do not remember what the minimum version for that is right now. "Time is simply the mechanism that keeps everything from happening all at once." |
Jord Joined: 9 Jun 99 Posts: 15184 Credit: 4,362,181 RAC: 3 |
Quote: "Sorry, I do not remember what the minimum version for that is right now." <max_tasks_reported>N</max_tasks_reported> was added from 6.11.10 onwards. |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14679 Credit: 200,643,578 RAC: 874 |
The easiest one to use is v6.12.34 - that's available via the all versions link on the main BOINC download page, and allows you to return to your v6.10.58 comfort zone afterwards if you want. It's a pretty good and stable version. |
Lionel Joined: 25 Mar 00 Posts: 680 Credit: 563,640,304 RAC: 597 |
guys... remember I did that last time ... I even stayed on 6.12.34 to see how it worked but the cache would not fill past 20% of what it should have been so I went back to 6.10.58 ... there is an issue with the server that needs to be fixed and at the current rate of dysfunctional activity I'll probably be doing this once a month ... it's not a fix ... L. |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14679 Credit: 200,643,578 RAC: 874 |
Quote: "guys... remember I did that last time ... I even stayed on 6.12.34 to see how it worked but the cache would not fill past 20% of what it should have been so I went back to 6.10.58 ..." Agreed. As with most of these things, there needs to be a two-pronged attack. You need to find a way of reporting the current tasks before they reach deadline - otherwise they will be reissued, the Berkeley bandwidth will be stressed unnecessarily, electricity will have been wasted and (both last and least) you won't get the credit you think you've earned. We need further information, please, so that we can file a proper report on the ongoing problem and have another go at getting it fixed at source. So (with your current BOINC, before any workaround): Which of your hosts is this - ID, please? How many tasks have you completed - trying to report? How many tasks do you still have cached - ready to start? How big is the file 'sched_request_setiathome.berkeley.edu.xml' (in your BOINC data directory)? |
Terror Australis Joined: 14 Feb 04 Posts: 1817 Credit: 262,693,308 RAC: 44 |
That's what I had to do with my GTX 580 box. It had a similar number of WU's to report after last weekend's glitch. When it had reported in I went back to 6.10.58. Another tip: while you are reporting the backlog, set SAH to "No New Tasks"; then you don't get the "last contact was less than 5 minutes ago" error. When the tasks are reported you switch back to the earlier BOINC version and allow new tasks. <I wondered why that box of yours hadn't reported in for a while :-)> T.A. |
Lionel Joined: 25 Mar 00 Posts: 680 Credit: 563,640,304 RAC: 597 |
now you know why ... :) |
Bernd Noessler Joined: 15 Nov 09 Posts: 99 Credit: 52,635,434 RAC: 0 |
I don't think the problems are related to the number of tasks you have to report. I have 2 internet connections, one with an upload of 0.5 MBit/s and the other with 10 MBit/s. One of my machines, connected to the fast upload, couldn't report 19 tasks for hours. I changed this machine to use the slow connection. Reporting worked on the 1st try. I switched this machine several times between the connections. Always the same. The slow connection works, the fast doesn't. |
Lionel Joined: 25 Mar 00 Posts: 680 Credit: 563,640,304 RAC: 597 |
Richard, local time 6:46pm Melbourne, Australia. Host ID: 6624102 ... it's the one that has 3 x GTX580s. Tasks trying to report: 4,531. Tasks cached: good question ... I have 19,182 in progress, 1,780 pending; take out the 4,531 above and that leaves 12,871 ... so, allowing for error, 12,500 or thereabouts ... How big is the file 'sched_request_setiathome.berkeley.edu.xml' ... and the answer is ... 23,627KB. Hope this helps, and is there anything else that you would like to know before I do the ol' switcheroo ... Lionel |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14679 Credit: 200,643,578 RAC: 874 |
Thanks. I can't think of anything else that would help them track it down, except perhaps one last try at reporting (expecting a server error), with as exact a timestamp as possible - ideally in UTC. The servers keep logs of each event, but as you can imagine there are many tens of thousands of entries to look at, so narrowing the search to a specific time will help. We'll be looking at host 6624102, I take it? When you switch clients, remember that the reporting limit is 256 now. Let us know how you get on. I'll drop a line to the powers-that-be. |
Jord Joined: 9 Jun 99 Posts: 15184 Credit: 4,362,181 RAC: 3 |
Quote: "Thanks. I can't think of anything else that would help them track it down" Perhaps his internet upload speed: how long does it take him to actually upload the 23MB report file? It may be that the line or the client has timed out by the time his BOINC has uploaded all of that. |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14679 Credit: 200,643,578 RAC: 874 |
Quote: "Thanks. I can't think of anything else that would help them track it down" An extract from the message log, showing the times when the request was submitted and the error reply returned, could be helpful there. It won't be accessible through BOINC Manager if Lionel has already upgraded (though it doesn't look as if he has yet), but older messages can be recovered from stdoutdae.txt even after a BOINC version change. Edit - if it is a timing problem related to the size of the request file, one last try at a time when the cricket graph shows comms are quiet might be worthwhile. |
KneeDeep Joined: 27 Sep 99 Posts: 131 Credit: 4,887,778 RAC: 0 |
Quote: "Thanks. I can't think of anything else that would help them track it down" I'm curious ... If it is timing out while uploading the request file, would increasing the http_transfer_timeout in cc_config help? Or does that just apply to the upload/download of wu's? <http_transfer_timeout>seconds</http_transfer_timeout>: abort HTTP transfers if idle for this many seconds; default 300. New in 6.12.27. |
arkayn Joined: 14 May 99 Posts: 4438 Credit: 55,006,323 RAC: 0 |
Quote: "Thanks. I can't think of anything else that would help them track it down" Only problem is he is running 6.10.58, so that option is not available either. |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14679 Credit: 200,643,578 RAC: 874 |
Quote: "Edit - if it is a timing problem related to the size of the request file, one last try at a time when the cricket graph shows comms are quiet might be worthwhile." Well, he's going to have to install v6.12.34 anyway, the way things stand, so he could explore this at the same time. BUT: I don't think this is likely to help. If the problem is a timeout within the client (and hence modifiable by cc_config.xml), it's more likely to result in: "21-Jun-2012 23:21:15 [SETI@home] [sched_op] Starting scheduler request". If there is a timing problem with the uploading of the sched_request.xml file, it's more likely that a server setting would have to be changed - and that's in the hands of Matt/Eric/Jeff, not us. The message log doesn't give us much insight into the transfer of sched_request files, though we might get a little bit more info from an <http_debug> log. I fear we might have to go right down to <http_xfer_debug>, but that gets very verbose - it writes a line to the log at every MTU interval (typically ~1.4KB), so Lionel would have ~17 thousand lines of log. Probably not all that helpful... |
Jord Joined: 9 Jun 99 Posts: 15184 Credit: 4,362,181 RAC: 3 |
Quote: "The message log doesn't give us much insight into the transfer of sched_request files, though we might get a little bit more info from an <http_debug> log. I fear we might have to go right down to <http_xfer_debug>"

No need to reinvent the wheel. I ran with <http_xfer_debug> and <http_debug> last night due to a bug I am tracing in the present client(s) (nothing to do with the present thread). <http_xfer_debug> itself gives the number of bytes transported, without exactly stating what they come from. E.g.

03-Jul-2012 23:47:24 [---] [http_xfer] [ID#76] HTTP: wrote 7300 bytes
03-Jul-2012 23:47:24 [---] [http_xfer] [ID#78] HTTP: wrote 5840 bytes
03-Jul-2012 23:47:24 [---] [http_xfer] [ID#78] HTTP: wrote 7300 bytes
03-Jul-2012 23:47:24 [---] [http_xfer] [ID#78] HTTP: wrote 2920 bytes
03-Jul-2012 23:47:24 [---] [http_xfer] [ID#78] HTTP: wrote 8760 bytes

Reporting one task to Seti gave me this info:

03-Jul-2012 23:48:45 [SETI@home] [sched_op] Starting scheduler request
03-Jul-2012 23:48:45 [SETI@home] Sending scheduler request: To report completed tasks.
03-Jul-2012 23:48:45 [SETI@home] Reporting 1 completed tasks
03-Jul-2012 23:48:45 [SETI@home] Not requesting tasks: some download is stalled
03-Jul-2012 23:48:45 [SETI@home] [sched_op] CPU work request: 0.00 seconds; 0.00 devices
03-Jul-2012 23:48:45 [SETI@home] [sched_op] ATI work request: 0.00 seconds; 0.00 devices
03-Jul-2012 23:48:45 [SETI@home] [http] HTTP_OP::init_post(): http://setiboinc.ssl.berkeley.edu/sah_cgi/cgi
03-Jul-2012 23:48:45 [SETI@home] [http] HTTP_OP::libcurl_exec(): ca-bundle set
03-Jul-2012 23:48:45 [SETI@home] [http] [ID#1] Info: Connection #7 seems to be dead!
03-Jul-2012 23:48:45 [SETI@home] [http] [ID#1] Info: Closing connection #7
03-Jul-2012 23:48:45 [SETI@home] [http] [ID#1] Info: Connection #8 seems to be dead!
03-Jul-2012 23:48:45 [SETI@home] [http] [ID#1] Info: Closing connection #8
03-Jul-2012 23:48:46 [SETI@home] [http] [ID#1] Info: About to connect() to setiboinc.ssl.berkeley.edu port 80 (#0)
03-Jul-2012 23:48:46 [SETI@home] [http] [ID#1] Info: Trying 208.68.240.20...
03-Jul-2012 23:48:46 [SETI@home] [http] [ID#1] Info: Connected to setiboinc.ssl.berkeley.edu (208.68.240.20) port 80 (#0)
03-Jul-2012 23:48:46 [SETI@home] [http] [ID#1] Info: Connected to setiboinc.ssl.berkeley.edu (208.68.240.20) port 80 (#0)
03-Jul-2012 23:48:46 [SETI@home] [http] [ID#1] Sent header to server: POST /sah_cgi/cgi HTTP/1.1
03-Jul-2012 23:48:46 [SETI@home] [http] [ID#1] Sent header to server: User-Agent: BOINC client (windows_x86_64 7.0.31)
03-Jul-2012 23:48:46 [SETI@home] [http] [ID#1] Sent header to server: Host: setiboinc.ssl.berkeley.edu
03-Jul-2012 23:48:46 [SETI@home] [http] [ID#1] Sent header to server: Accept: */*
03-Jul-2012 23:48:46 [SETI@home] [http] [ID#1] Sent header to server: Accept-Encoding: deflate, gzip
03-Jul-2012 23:48:46 [SETI@home] [http] [ID#1] Sent header to server: Content-Type: application/x-www-form-urlencoded
03-Jul-2012 23:48:46 [SETI@home] [http] [ID#1] Sent header to server: Content-Length: 16955
03-Jul-2012 23:48:46 [SETI@home] [http] [ID#1] Sent header to server: Expect: 100-continue
03-Jul-2012 23:48:46 [SETI@home] [http] [ID#1] Sent header to server:
03-Jul-2012 23:48:47 [SETI@home] [http] [ID#1] Info: Done waiting for 100-continue
03-Jul-2012 23:48:49 [SETI@home] [http] [ID#1] Received header from server: HTTP/1.1 100 Continue
03-Jul-2012 23:48:49 [SETI@home] [http] [ID#1] Received header from server: HTTP/1.1 200 OK
03-Jul-2012 23:48:49 [SETI@home] [http] [ID#1] Received header from server: Date: Tue, 03 Jul 2012 21:48:39 GMT
03-Jul-2012 23:48:49 [SETI@home] [http] [ID#1] Received header from server: Server: Apache/2.2.17 (Fedora)
03-Jul-2012 23:48:49 [SETI@home] [http] [ID#1] Received header from server: Connection: close
03-Jul-2012 23:48:49 [SETI@home] [http] [ID#1] Received header from server: Transfer-Encoding: chunked
03-Jul-2012 23:48:49 [SETI@home] [http] [ID#1] Received header from server: Content-Type: text/xml
03-Jul-2012 23:48:49 [SETI@home] [http] [ID#1] Received header from server:
03-Jul-2012 23:48:49 [---] [http_xfer] [ID#1] HTTP: wrote 1271 bytes
03-Jul-2012 23:48:50 [SETI@home] [checkpoint] result 12fe12ad.5129.23789.14.10.22_2 checkpointed
03-Jul-2012 23:48:50 [SETI@home] [checkpoint] result 13fe12aa.24759.7020.12.10.243_1 checkpointed
03-Jul-2012 23:48:52 [---] [http_xfer] [ID#1] HTTP: wrote 4040 bytes
03-Jul-2012 23:48:52 [SETI@home] [http] [ID#1] Info: Closing connection #0
03-Jul-2012 23:48:53 [SETI@home] Scheduler request completed
03-Jul-2012 23:48:53 [SETI@home] [sched_op] Server version 701
03-Jul-2012 23:48:53 [SETI@home] Project requested delay of 303 seconds
03-Jul-2012 23:48:53 [SETI@home] [sched_op] handle_scheduler_reply(): got ack for task 21fe11ab.11639.629.8.10.101_2
03-Jul-2012 23:48:53 [SETI@home] [sched_op] Deferring communication for 5 min 3 sec
03-Jul-2012 23:48:53 [SETI@home] [sched_op] Reason: requested by project

So it would seem 1271 + 4040 bytes, times 4531 (the tasks Lionel had at last count), is 24064141 bytes (or 22.9MB, which is about correct). |
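The back-of-envelope figures in the thread can be re-checked with a few lines of Python. This is only a sketch: every constant is a number quoted in the posts above (nothing is measured here), the function names are made up for illustration, and it assumes, as the last post does, that the byte counts logged for a one-task report scale linearly with the number of tasks.

```python
# Figures quoted in the thread; none of them are measured by this script.
PER_TASK_CHUNKS_BYTES = (1271, 4040)  # the two "HTTP: wrote N bytes" lines for one reported task
TASKS_TO_REPORT = 4531                # Lionel's backlog at last count
REQUEST_FILE_KB = 23_627              # size Lionel gave for the sched_request file
KB_PER_DEBUG_LINE = 1.4               # "typically ~1.4KB" per http_xfer_debug log line


def estimated_total_bytes(per_task_chunks, tasks):
    """Scale the bytes seen for one task linearly to the whole backlog."""
    return sum(per_task_chunks) * tasks


def bytes_to_mb(n_bytes):
    """Convert bytes to megabytes, using 1 MB = 1024 * 1024 bytes."""
    return n_bytes / (1024 * 1024)


def estimated_debug_lines(file_kb, kb_per_line):
    """Rough number of http_xfer_debug lines for a transfer of file_kb kilobytes."""
    return round(file_kb / kb_per_line)


total = estimated_total_bytes(PER_TASK_CHUNKS_BYTES, TASKS_TO_REPORT)
print(total)                          # 24064141
print(round(bytes_to_mb(total), 1))   # 22.9
print(estimated_debug_lines(REQUEST_FILE_KB, KB_PER_DEBUG_LINE))  # 16876
```

The first two results match the 24064141 bytes and 22.9MB quoted in the last post, and the third is in line with the "~17 thousand lines" estimate given earlier for an <http_xfer_debug> log.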