It's Back ... HTTP Internal Server Error

Lionel

Joined: 25 Mar 00
Posts: 680
Credit: 563,640,304
RAC: 597
Australia
Message 1254900 - Posted: 3 Jul 2012, 5:18:57 UTC

Following the last little hiccup (the upload issue), 1 of my boxes started accumulating wus (as did the others) until it could again upload completed wus. This it did. I suspect that Communication was Deferred for some considerable length of time, and when that time elapsed it uploaded more than the server can report against. I now have 2,450+ wus that cannot report. This happened following the previous outage and the apparent issue was fixed; however, the fix seems to have disappeared, as I noticed that I could report wu quantities greater than 64. Whilst this machine is still pumping out wus (and can't report), can someone have a look into the patch and why it is not working ...

cheers
ID: 1254900 · Report as offensive
arkayn
Volunteer tester
Joined: 14 May 99
Posts: 4438
Credit: 55,006,323
RAC: 0
United States
Message 1254904 - Posted: 3 Jul 2012, 5:44:54 UTC - in response to Message 1254900.  

Following the last little hiccup (the upload issue), 1 of my boxes started accumulating wus (as did the others) until it could again upload completed wus. This it did. I suspect that Communication was Deferred for some considerable length of time and when that time elapsed it uploaded more than the server can report against. I now have 2,450+ wus that cannot report. This happened following the previous outage and the apparent issue was fixed however the fix seems to have disappeared as I noticed that I could report wu quantities greater than 64. Whilst this machine is still pumping out wus (and can't report) can someone have a look into the patch and why it is not working ...

cheers


I don't think they actually fixed that error yet, they just upped the amount that could report from 64 to 256.

Of course you have to run a 6.12.x or higher version to take advantage of the cc_config option.

ID: 1254904 · Report as offensive
Lionel

Joined: 25 Mar 00
Posts: 680
Credit: 563,640,304
RAC: 597
Australia
Message 1254905 - Posted: 3 Jul 2012, 5:49:21 UTC - in response to Message 1254904.  


I thought that that was the fix ... if not, sorry about that ...

However, that doesn't really solve the problem ... the interesting thing here is that when I went on hols o/s for 3 weeks I could come back, turn on the router and upload 1,000s and then report them all in one go ... I suppose those were the good ol' days ...

still leaves me with the HTTP issue though
ID: 1254905 · Report as offensive
Lionel

Joined: 25 Mar 00
Posts: 680
Credit: 563,640,304
RAC: 597
Australia
Message 1255321 - Posted: 4 Jul 2012, 8:01:39 UTC - in response to Message 1254905.  


Now have 3,540 wus that are not reporting due to HTTP Internal Server Error ...

When can someone do something about this so these wus can report ??

cheers
ID: 1255321 · Report as offensive
kittyman
Volunteer tester
Joined: 9 Jul 00
Posts: 51470
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1255331 - Posted: 4 Jul 2012, 8:19:04 UTC - in response to Message 1255321.  


Now have 3,540 wus that are not reporting due to HTTP Internal Server Error ...

When can someone do something about this so these wus can report ??

cheers

Only you can at the moment......
Update Boinc to a version that will support max tasks reported and set it at 256 or less.......
Sorry, I do not remember what the minimum version for that is right now.


"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1255331 · Report as offensive
Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 1255339 - Posted: 4 Jul 2012, 8:53:05 UTC - in response to Message 1255331.  

Sorry, I do not remember what the minimum version for that is right now.

<max_tasks_reported>N</max_tasks_reported> was added from 6.11.10 onwards.
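
For anyone unsure where that tag goes, here is a minimal sketch in Python that builds the text of a cc_config.xml carrying the option. It assumes the tag sits inside the <options> section of <cc_config>, like other client options. The function name is made up for illustration, and 256 is simply the limit mentioned earlier in the thread. If you already have a cc_config.xml, add the line by hand rather than replacing the file.

```python
import xml.etree.ElementTree as ET


def build_cc_config(max_tasks_reported: int) -> str:
    """Return cc_config.xml text that caps tasks reported per scheduler request."""
    root = ET.Element("cc_config")
    # Assumption: client options such as this one live under <options>.
    options = ET.SubElement(root, "options")
    ET.SubElement(options, "max_tasks_reported").text = str(max_tasks_reported)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Print the text; saving it into the BOINC data directory is left to the reader.
    print(build_cc_config(256))
```

The file belongs in the BOINC data directory, and the client has to re-read it before the setting takes effect (a client restart is the safe option).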
ID: 1255339 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14658
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1255340 - Posted: 4 Jul 2012, 8:54:33 UTC - in response to Message 1255331.  


Now have 3,540 wus that are not reporting due to HTTP Internal Server Error ...

When can someone do something about this so these wus can report ??

cheers

Only you can at the moment......
Update Boinc to a version that will support max tasks reported and set it at 256 or less.......
Sorry, I do not remember what the minimum version for that is right now.

The easiest one to use is v6.12.34 - that's available via the all versions link on the main BOINC download page, and allows you to return to your v6.10.58 comfort zone afterwards if you want. It's a pretty good and stable version.
ID: 1255340 · Report as offensive
Lionel

Joined: 25 Mar 00
Posts: 680
Credit: 563,640,304
RAC: 597
Australia
Message 1255355 - Posted: 4 Jul 2012, 9:38:46 UTC - in response to Message 1255340.  

guys... remember I did that last time ... I even stayed on 6.12.34 to see how it worked but the cache would not fill past 20% of what it should have been so I went back to 6.10.58 ...

there is an issue with the server that needs to be fixed and at the current rate of dysfunctional activity I'll probably be doing this once a month ... it's not a fix ...

L.


ID: 1255355 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14658
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1255373 - Posted: 4 Jul 2012, 10:18:24 UTC - in response to Message 1255355.  

guys... remember I did that last time ... I even stayed on 6.12.34 to see how it worked but the cache would not fill past 20% of what it should have been so I went back to 6.10.58 ...

there is an issue with the server that needs to be fixed and at the current rate of dysfunctional activity I'll probably be doing this once a month ... it's not a fix ...

Agreed. As with most of these things, there needs to be a two-pronged attack.

You need to find a way of reporting the current tasks before they reach deadline - otherwise they will be reissued, the Berkeley bandwidth will be stressed unnecessarily, electricity will have been wasted and (both last and least) you won't get the credit you think you've earned.

We need further information, please, so that we can file a proper report on the ongoing problem and have another go at getting it fixed at source.

So (with your current BOINC, before any workround):
Which of your hosts is this - ID, please?
How many tasks have you completed - trying to report?
How many tasks do you still have cached - ready to start?
How big is the file 'sched_request_setiathome.berkeley.edu.xml' (in your BOINC data directory)?
ID: 1255373 · Report as offensive
Terror Australis
Volunteer tester

Joined: 14 Feb 04
Posts: 1817
Credit: 262,693,308
RAC: 44
Australia
Message 1255597 - Posted: 4 Jul 2012, 19:10:46 UTC - in response to Message 1255340.  


Now have 3,540 wus that are not reporting due to HTTP Internal Server Error ...

When can someone do something about this so these wus can report ??

cheers

Only you can at the moment......
Update Boinc to a version that will support max tasks reported and set it at 256 or less.......
Sorry, I do not remember what the minimum version for that is right now.

The easiest one to use is v6.12.34 - that's available via the all versions link on the main BOINC download page, and allows you to return to your v6.10.58 comfort zone afterwards if you want. It's a pretty good and stable version.

That's what I had to do with my GTX 580 box. It had a similar amount of WU's to report after last weekend's glitch. When it had reported in I went back to 6.10.58.

Another tip: while you are reporting the backlog, set SAH to "No New Tasks"; then you don't get the "last contact was less than 5 minutes ago" error. When the tasks are reported, you switch back to the earlier BOINC version and allow new tasks.

<I wondered why that box of yours hadn't reported in for a while :-)>

T.A.
ID: 1255597 · Report as offensive
Lionel

Joined: 25 Mar 00
Posts: 680
Credit: 563,640,304
RAC: 597
Australia
Message 1255780 - Posted: 5 Jul 2012, 4:22:09 UTC - in response to Message 1255597.  

now you know why ... :)
ID: 1255780 · Report as offensive
Bernd Noessler

Joined: 15 Nov 09
Posts: 99
Credit: 52,635,434
RAC: 0
Germany
Message 1255789 - Posted: 5 Jul 2012, 5:57:01 UTC

I don't think the problems are related to the number of tasks you have to report.

I have 2 internet connections, one with an upload of 0.5 MBit/s and the other with 10 MBit/s.

One of my machines, connected to the fast upload, couldn't report 19 tasks for hours. I changed this machine to use the slow connection. Reporting worked on the 1st try.

I switched this machine several times between the connections. Always the same: the slow connection works, the fast doesn't.


ID: 1255789 · Report as offensive
Lionel

Joined: 25 Mar 00
Posts: 680
Credit: 563,640,304
RAC: 597
Australia
Message 1255825 - Posted: 5 Jul 2012, 8:56:44 UTC - in response to Message 1255373.  

Richard

Local time 6:46pm Melbourne, Australia

Host ID: 6624102 ... it's the one that has 3 x GTX580s

Tasks trying to report: 4,531

Tasks cached:
good question...
I have 19,182 in progress and 1,780 pending; take out the 4,531 above and that leaves 12,871 ... so allowing for error, 12,500 or thereabouts ...

How big is the file 'sched_request_setiathome.berkeley.edu.xml' ... and the answer is ... 23,627KB

Hope this helps and is there anything else that you would like to know before I do the ol switch a roo ...

Lionel
ID: 1255825 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14658
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1255840 - Posted: 5 Jul 2012, 9:59:23 UTC - in response to Message 1255825.  

Richard

Local time 6:46pm Melbourne, Australia

Host ID: 6624102 ... it's the one that has 3 x GTX580s

Tasks trying to report: 4,531

Tasks cached:
good question...
I have 19,182 in progress and 1,780 pending; take out the 4,531 above and that leaves 12,871 ... so allowing for error, 12,500 or thereabouts ...

How big is the file 'sched_request_setiathome.berkeley.edu.xml' ... and the answer is ... 23,627KB

Hope this helps and is there anything else that you would like to know before I do the ol switch a roo ...

Lionel

Thanks. I can't think of anything else that would help them track it down, except perhaps one last try at reporting (expecting a server error), with as exact a timestamp as possible - ideally in UTC. The servers keep logs of each event, but as you can imagine there are many tens of thousands of entries to look at, so narrowing the search to a specific time will help.

We'll be looking at host 6624102, I take it?

When you switch clients, remember that the reporting limit is 256 now. Let us know how you get on. I'll drop a line to the powers-that-be.
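
For a rough sense of what that limit means for the current backlog, here is a back-of-envelope sketch in Python. It uses the 4,531 tasks quoted above and the 256 limit, and assumes a delay of about five minutes (300 s) between scheduler requests. That delay and the function name are illustrative assumptions, and the backlog keeps growing while the host carries on crunching.

```python
import math


def reporting_plan(backlog: int, per_request: int, delay_s: int) -> tuple[int, float]:
    """Return (scheduler requests needed, minimum minutes spent waiting between them)."""
    requests = math.ceil(backlog / per_request)
    # No wait is needed after the final request.
    wait_minutes = max(requests - 1, 0) * delay_s / 60
    return requests, wait_minutes


requests, wait_minutes = reporting_plan(backlog=4531, per_request=256, delay_s=300)
print(f"{requests} requests, at least {wait_minutes:.0f} minutes of enforced waiting")
```

On those assumptions it comes to 18 requests and about 85 minutes of waiting, so clearing the backlog is an afternoon's job rather than a single click.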
ID: 1255840 · Report as offensive
Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 1255847 - Posted: 5 Jul 2012, 10:35:22 UTC - in response to Message 1255840.  

Thanks. I can't think of anything else that would help them track it down

Perhaps his internet upload speed: how long does it take for him to actually upload the 23MB report file? It may be that the line or the client has timed out by the time his BOINC has uploaded all of that.
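
To put numbers on that question, a small Python sketch of the ideal transfer time for a file the size Lionel reported (23,627 KB). Lionel's actual upload speed has not been posted, so the two speeds below are just the ones Bernd mentioned earlier, used purely as illustrations. The sketch also assumes 1 KB = 1024 bytes and ignores protocol overhead and congestion.

```python
def upload_seconds(size_kb: float, uplink_mbit_s: float) -> float:
    """Ideal transfer time in seconds, ignoring protocol overhead and congestion."""
    size_bits = size_kb * 1024 * 8  # assumes 1 KB = 1024 bytes
    return size_bits / (uplink_mbit_s * 1_000_000)


REQUEST_FILE_KB = 23_627  # size Lionel reported for the request file

for speed in (0.5, 10.0):  # illustrative uplink speeds, not Lionel's
    print(f"{speed:>4} Mbit/s: {upload_seconds(REQUEST_FILE_KB, speed):6.1f} s")
```

Real transfers will take longer than these figures, so they are best read as lower bounds.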
ID: 1255847 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14658
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1255852 - Posted: 5 Jul 2012, 10:56:42 UTC - in response to Message 1255847.  
Last modified: 5 Jul 2012, 11:11:48 UTC

Thanks. I can't think of anything else that would help them track it down

Perhaps his internet upload speed, how long does it take for him to actually upload the 23MB report file, it may be that the line or the client has timed out by the time his BOINC has uploaded all of that.

An extract from the message log, showing the times when the request was submitted and the error reply returned, could be helpful there.

It won't be accessible through BOINC Manager if Lionel has already upgraded (though it doesn't look as if he has yet), but older messages can be recovered from stdoutdae.txt even after a BOINC version change.

Edit - if it is a timing problem related to the size of the request file, one last try at a time when the cricket graph shows comms are quiet might be worthwhile.
ID: 1255852 · Report as offensive
KneeDeep

Joined: 27 Sep 99
Posts: 131
Credit: 4,887,778
RAC: 0
United States
Message 1255898 - Posted: 5 Jul 2012, 14:13:28 UTC - in response to Message 1255852.  

Thanks. I can't think of anything else that would help them track it down

Perhaps his internet upload speed, how long does it take for him to actually upload the 23MB report file, it may be that the line or the client has timed out by the time his BOINC has uploaded all of that.

An extract from the message log, showing the times when the request was submitted and the error reply returned, could be helpful there.

It won't be accessible through BOINC Manager if Lionel has already upgraded (though it doesn't look as if he has yet), but older messages can be recovered from stdoutdae.txt even after a BOINC version change.

Edit - if it is a timing problem related to the size of the request file, one last try at a time when the cricket graph shows comms are quiet might be worthwhile.


I'm curious ... If it is timing out while uploading the request file, would increasing the http_transfer_timeout in cc_config help?
Or does that just apply to the upload/download of wu's?

<http_transfer_timeout>seconds</http_transfer_timeout>
abort HTTP transfers if idle for this many seconds; default 300. New in 6.12.27.

ID: 1255898 · Report as offensive
arkayn
Volunteer tester
Joined: 14 May 99
Posts: 4438
Credit: 55,006,323
RAC: 0
United States
Message 1255911 - Posted: 5 Jul 2012, 16:05:07 UTC - in response to Message 1255898.  

Thanks. I can't think of anything else that would help them track it down

Perhaps his internet upload speed, how long does it take for him to actually upload the 23MB report file, it may be that the line or the client has timed out by the time his BOINC has uploaded all of that.

An extract from the message log, showing the times when the request was submitted and the error reply returned, could be helpful there.

It won't be accessible through BOINC Manager if Lionel has already upgraded (though it doesn't look as if he has yet), but older messages can be recovered from stdoutdae.txt even after a BOINC version change.

Edit - if it is a timing problem related to the size of the request file, one last try at a time when the cricket graph shows comms are quiet might be worthwhile.


I'm curious ... If it is timing out while uploading the request file, would increasing the http_transfer_timeout in cc_config help?
Or does that just apply to the upload/download of wu's?

<http_transfer_timeout>seconds</http_transfer_timeout>
abort HTTP transfers if idle for this many seconds; default 300 New in 6.12.27



Only problem is he is running 6.10.58, so that option is not available either.

ID: 1255911 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14658
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1255925 - Posted: 5 Jul 2012, 16:35:24 UTC - in response to Message 1255911.  

Edit - if it is a timing problem related to the size of the request file, one last try at a time when the cricket graph shows comms are quiet might be worthwhile.

I'm curious ... If it is timing out while uploading the request file, would increasing the http_transfer_timeout in cc_config help?
Or does that just apply to the upload/download of wu's?

<http_transfer_timeout>seconds</http_transfer_timeout>
abort HTTP transfers if idle for this many seconds; default 300 New in 6.12.27


Only problem is he is running 6.10.58, so that option is not available either.

Well, he's going to have to install v6.12.34 anyway, the way things stand, so he could explore this at the same time.

BUT: I don't think this is likely to help. If the problem is a timeout within the client (and hence modifiable by cc_config.xml), it's more likely to result in

21-Jun-2012 23:21:15 [SETI@home] [sched_op] Starting scheduler request
21-Jun-2012 23:21:15 [SETI@home] Sending scheduler request: To fetch work.
21-Jun-2012 23:21:15 [SETI@home] Requesting new tasks for NVIDIA GPU
21-Jun-2012 23:21:15 [SETI@home] [sched_op] CPU work request: 0.00 seconds; 0.00 CPUs
21-Jun-2012 23:21:15 [SETI@home] [sched_op] NVIDIA GPU work request: 10744.10 seconds; 0.00 GPUs
21-Jun-2012 23:26:26 [SETI@home] Scheduler request failed: Timeout was reached
21-Jun-2012 23:26:26 [SETI@home] [sched_op] Deferring communication for 1 min 11 sec
21-Jun-2012 23:26:26 [SETI@home] [sched_op] Reason: Scheduler request failed
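
The gap between the start of that request and the timeout can be read off the timestamps. A short Python sketch follows, with the two relevant log lines copied from the excerpt above. The helper name is made up, and it parses the month by hand so that it does not depend on the machine's locale.

```python
from datetime import datetime

MONTHS = {m: i for i, m in enumerate(
    "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split(), start=1)}


def parse_stamp(line: str) -> datetime:
    """Parse the leading 'DD-Mon-YYYY HH:MM:SS' stamp of a message log line."""
    date_part, time_part = line.split()[:2]
    day, mon, year = date_part.split("-")
    hour, minute, second = (int(x) for x in time_part.split(":"))
    return datetime(int(year), MONTHS[mon], int(day), hour, minute, second)


start = parse_stamp("21-Jun-2012 23:21:15 [SETI@home] [sched_op] Starting scheduler request")
end = parse_stamp("21-Jun-2012 23:26:26 [SETI@home] Scheduler request failed: Timeout was reached")
print(int((end - start).total_seconds()), "seconds elapsed")
```

That gives 311 seconds, a little over the 300-second default quoted earlier, though whether that particular setting is what fired here is an assumption.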

If there is a timing problem with the uploading of the sched_request.xml file, it's more likely that a server setting would have to be changed - and that's in the hands of Matt/Eric/Jeff, not us.

The message log doesn't give us much insight into the transfer of sched_request files, though we might get a little bit more info from an <http_debug> log. I fear we might have to go right down to <http_xfer_debug>, but that gets very verbose - it writes a line to the log at every MTU interval (typically ~1.4KB), so Lionel would have ~17 thousand lines of log. Probably not all that helpful...
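
The ~17 thousand figure is just the request file size divided by the amount logged per line. A quick Python check, assuming one log line per ~1.4 KB and the 23,627 KB file size Lionel reported (the function name is illustrative):

```python
def estimated_log_lines(file_kb: float, kb_per_line: float) -> int:
    """Rough count of http_xfer_debug lines if one line is logged per chunk sent."""
    return round(file_kb / kb_per_line)


print(estimated_log_lines(23_627, 1.4))
```

If the client logs larger chunks per line, the count drops in proportion.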
ID: 1255925 · Report as offensive
Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 1255940 - Posted: 5 Jul 2012, 16:59:24 UTC - in response to Message 1255925.  
Last modified: 5 Jul 2012, 17:00:54 UTC

The message log doesn't give us much insight into the transfer of sched_request files, though we might get a little bit more info from an <http_debug> log. I fear we might have to go right down to <http_xfer_debug>

No need to reinvent the wheel. I ran with a <http_xfer_debug> and <http_debug> last night due to a bug I am tracing in the present client(s) (nothing to do with the present thread). <http_xfer_debug> itself gives amounts of bytes transported, without exactly stating what they come from.
E.g.
03-Jul-2012 23:47:24 [---] [http_xfer] [ID#76] HTTP: wrote 7300 bytes
03-Jul-2012 23:47:24 [---] [http_xfer] [ID#78] HTTP: wrote 5840 bytes
03-Jul-2012 23:47:24 [---] [http_xfer] [ID#78] HTTP: wrote 7300 bytes
03-Jul-2012 23:47:24 [---] [http_xfer] [ID#78] HTTP: wrote 2920 bytes
03-Jul-2012 23:47:24 [---] [http_xfer] [ID#78] HTTP: wrote 8760 bytes


Reporting one task to Seti gave me this info:
03-Jul-2012 23:48:45 [SETI@home] [sched_op] Starting scheduler request
03-Jul-2012 23:48:45 [SETI@home] Sending scheduler request: To report completed tasks.
03-Jul-2012 23:48:45 [SETI@home] Reporting 1 completed tasks
03-Jul-2012 23:48:45 [SETI@home] Not requesting tasks: some download is stalled
03-Jul-2012 23:48:45 [SETI@home] [sched_op] CPU work request: 0.00 seconds; 0.00 devices
03-Jul-2012 23:48:45 [SETI@home] [sched_op] ATI work request: 0.00 seconds; 0.00 devices
03-Jul-2012 23:48:45 [SETI@home] [http] HTTP_OP::init_post(): http://setiboinc.ssl.berkeley.edu/sah_cgi/cgi
03-Jul-2012 23:48:45 [SETI@home] [http] HTTP_OP::libcurl_exec(): ca-bundle set
03-Jul-2012 23:48:45 [SETI@home] [http] [ID#1] Info:  Connection #7 seems to be dead!
03-Jul-2012 23:48:45 [SETI@home] [http] [ID#1] Info:  Closing connection #7
03-Jul-2012 23:48:45 [SETI@home] [http] [ID#1] Info:  Connection #8 seems to be dead!
03-Jul-2012 23:48:45 [SETI@home] [http] [ID#1] Info:  Closing connection #8
03-Jul-2012 23:48:46 [SETI@home] [http] [ID#1] Info:  About to connect() to setiboinc.ssl.berkeley.edu port 80 (#0)
03-Jul-2012 23:48:46 [SETI@home] [http] [ID#1] Info:    Trying 208.68.240.20...
03-Jul-2012 23:48:46 [SETI@home] [http] [ID#1] Info:  Connected to setiboinc.ssl.berkeley.edu (208.68.240.20) port 80 (#0)
03-Jul-2012 23:48:46 [SETI@home] [http] [ID#1] Info:  Connected to setiboinc.ssl.berkeley.edu (208.68.240.20) port 80 (#0)
03-Jul-2012 23:48:46 [SETI@home] [http] [ID#1] Sent header to server: POST /sah_cgi/cgi HTTP/1.1
03-Jul-2012 23:48:46 [SETI@home] [http] [ID#1] Sent header to server: User-Agent: BOINC client (windows_x86_64 7.0.31)
03-Jul-2012 23:48:46 [SETI@home] [http] [ID#1] Sent header to server: Host: setiboinc.ssl.berkeley.edu
03-Jul-2012 23:48:46 [SETI@home] [http] [ID#1] Sent header to server: Accept: */*
03-Jul-2012 23:48:46 [SETI@home] [http] [ID#1] Sent header to server: Accept-Encoding: deflate, gzip
03-Jul-2012 23:48:46 [SETI@home] [http] [ID#1] Sent header to server: Content-Type: application/x-www-form-urlencoded
03-Jul-2012 23:48:46 [SETI@home] [http] [ID#1] Sent header to server: Content-Length: 16955
03-Jul-2012 23:48:46 [SETI@home] [http] [ID#1] Sent header to server: Expect: 100-continue
03-Jul-2012 23:48:46 [SETI@home] [http] [ID#1] Sent header to server: 
03-Jul-2012 23:48:47 [SETI@home] [http] [ID#1] Info:  Done waiting for 100-continue
03-Jul-2012 23:48:49 [SETI@home] [http] [ID#1] Received header from server: HTTP/1.1 100 Continue
03-Jul-2012 23:48:49 [SETI@home] [http] [ID#1] Received header from server: HTTP/1.1 200 OK
03-Jul-2012 23:48:49 [SETI@home] [http] [ID#1] Received header from server: Date: Tue, 03 Jul 2012 21:48:39 GMT
03-Jul-2012 23:48:49 [SETI@home] [http] [ID#1] Received header from server: Server: Apache/2.2.17 (Fedora)
03-Jul-2012 23:48:49 [SETI@home] [http] [ID#1] Received header from server: Connection: close
03-Jul-2012 23:48:49 [SETI@home] [http] [ID#1] Received header from server: Transfer-Encoding: chunked
03-Jul-2012 23:48:49 [SETI@home] [http] [ID#1] Received header from server: Content-Type: text/xml
03-Jul-2012 23:48:49 [SETI@home] [http] [ID#1] Received header from server: 
03-Jul-2012 23:48:49 [---] [http_xfer] [ID#1] HTTP: wrote 1271 bytes
03-Jul-2012 23:48:50 [SETI@home] [checkpoint] result 12fe12ad.5129.23789.14.10.22_2 checkpointed
03-Jul-2012 23:48:50 [SETI@home] [checkpoint] result 13fe12aa.24759.7020.12.10.243_1 checkpointed
03-Jul-2012 23:48:52 [---] [http_xfer] [ID#1] HTTP: wrote 4040 bytes
03-Jul-2012 23:48:52 [SETI@home] [http] [ID#1] Info:  Closing connection #0
03-Jul-2012 23:48:53 [SETI@home] Scheduler request completed
03-Jul-2012 23:48:53 [SETI@home] [sched_op] Server version 701
03-Jul-2012 23:48:53 [SETI@home] Project requested delay of 303 seconds
03-Jul-2012 23:48:53 [SETI@home] [sched_op] handle_scheduler_reply(): got ack for task 21fe11ab.11639.629.8.10.101_2
03-Jul-2012 23:48:53 [SETI@home] [sched_op] Deferring communication for 5 min 3 sec
03-Jul-2012 23:48:53 [SETI@home] [sched_op] Reason: requested by project

So it would seem 1271 + 4040 bytes, times 4531 (the tasks Lionel had at last count), is 24064141 bytes (or 22.9MB, which is about correct).
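
The sum can be checked with a few lines of Python. This only verifies the arithmetic: it treats a MB as 1024 × 1024 bytes and a KB as 1024 bytes (both assumptions), and it says nothing about whether the logged byte counts really scale per task.

```python
BYTES_LOGGED = 1271 + 4040      # the two "wrote N bytes" lines in the log above
TASKS_TO_REPORT = 4531          # Lionel's count of tasks waiting to report
REQUEST_FILE_KB = 23_627        # size Lionel gave for the request file

estimate_bytes = BYTES_LOGGED * TASKS_TO_REPORT
print(estimate_bytes)                                   # 24064141
print(round(estimate_bytes / 1024 ** 2, 1), "MB")       # 22.9 MB
print(round(REQUEST_FILE_KB / 1024, 1), "MB reported")  # 23.1 MB reported
```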
ID: 1255940 · Report as offensive


 