Message boards :
Number crunching :
CPU work units download stuck for 2 days?
woodyrox Send message Joined: 7 Apr 01 Posts: 34 Credit: 16,069,169 RAC: 0 |
I'm unable to get CPU work units for 2 days. GPU tasks downloaded OK in the same time frame. Work units are stuck in "Downloading" status: 0 KB transferred, and the speed is always 0 KBps. The downloads time out and retry, but there's never any progress. The server status page shows that the download servers, anakin & vader, are up. The upload server, bruno, is shown as disabled, but uploads work fine here. Is this a temporary server problem, or has my BOINC gone bonkers? |
rob smith Send message Joined: 7 Mar 03 Posts: 22231 Credit: 416,307,556 RAC: 380 |
There is no difference between MB tasks destined for the CPU and those for the GPU, so it's going to be down to BOINC, at your end, not requesting CPU tasks. A quick look suggests that a couple of your crunchers have had a lot of errors recently, and if the error rate gets too high, BOINC does cut down the request rate for the affected processor. Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? |
woodyrox Send message Joined: 7 Apr 01 Posts: 34 Credit: 16,069,169 RAC: 0 |
Thanks for your reply. This is the cruncher I'm having problems with: http://setiathome.berkeley.edu/results.php?hostid=5047831 The only errors reported are the 8 work units I aborted after waiting for a day without a byte of transfer. After aborting those, my cruncher was given 9 work units this morning and not a byte of data has yet been downloaded. I thought this problem might clear up after project maintenance, but no such luck. 9 stuck. |
HAL9000 Send message Joined: 11 Sep 99 Posts: 6534 Credit: 196,805,888 RAC: 57 |
I'm unable to get CPU work units for 2 days. GPU tasks downloaded OK in the same time frame. Work units are stuck in "Downloading" status: 0 KB transferred, and the speed is always 0 KBps. Downloads time out and retry, always with no progress. The servers can be up and doing their best and still have slow/stuck downloads. You may have heard mention of the cricket graph. Having a look at it, you will notice that the bandwidth has been maxed out for a while. The green shows traffic going out of the lab to the world. The dip in activity today was during the servers being down for weekly maintenance. If your machine is on 24/7, the downloads should get taken care of eventually. If you only have it on for a limited time, you may have to resort to hitting the retry button on the Transfers tab to get them to complete. Sometimes it can be the luck of the draw whether some of them download while others stay stuck. SETI@home classic workunits: 93,865 CPU time: 863,447 hours Join the [url=http://tinyurl.com/8y46zvu]BP6/VP6 User Group[/url] |
Cliff Harding Send message Joined: 18 Aug 99 Posts: 1432 Credit: 110,967,840 RAC: 67 |
I'm unable to get CPU work units for 2 days. GPU tasks downloaded OK in the same time frame. Work units are stuck in "Downloading" status: 0 KB transferred, and the speed is always 0 KBps. Downloads time out and retry, always with no progress. If there were a way to send you some of my VLARs, I would be happy to send them to you. My A-SYS has not had any CUDA work since last Wednesday (7/20), just VLAR & AP work. It keeps asking for CUDA, but the scheduler keeps saying no joy. On the other hand, the B-SYS keeps sucking them up like a vacuum cleaner on steroids, getting over 200 today alone, with 47 since the servers came back online. There have been traffic problems, but I keep abusing the retry button, and eventually everything gets d/l'd. I don't buy computers, I build them!! |
Link Send message Joined: 18 Sep 03 Posts: 834 Credit: 1,807,369 RAC: 0 |
I'm unable to get CPU work units for 2 days. GPU tasks downloaded OK in the same time frame. Work units are stuck in "Downloading" status: 0 KB transferred, and the speed is always 0 KBps. Downloads time out and retry, always with no progress. Hmm... I've been having the same problem for about a day on all my machines; on two of them I get connect() failed, and on my laptop an HTTP error. Even using the retry button during the outage, trying to connect through the university proxy, and all the usual stuff like rebooting didn't help. I can ping and tracert both download servers, but I don't get even a single byte of a WU from them, and not the fedora test page either. |
woodyrox Send message Joined: 7 Apr 01 Posts: 34 Credit: 16,069,169 RAC: 0 |
The servers can be up and doing their best and still have slow/stuck downloads. You may have heard mention of the cricket graph. Having a look at it you will notice that the bandwidth had been maxed out for a while. The green shows traffic going out of the lab to the world. The dip in activity today was during the servers being down for weekly maintenance. Yeah, I've looked at cricket a few times and see the servers are maxed. But I've had problems with downloads before, and symptoms have been different. In the past, the download would stall after a few bytes. Now I'm getting the big goose egg, as in zero bytes for all work units. I haven't seen this before and was wondering if there are possibly other problems. My machine is nearly out of work so I wanted to get ahead of the eventuality of sitting idle. |
Gundolf Jahn Send message Joined: 19 Sep 00 Posts: 3184 Credit: 446,358 RAC: 0 |
But did I get you right that the tasks are assigned to you by the scheduler but aren't downloaded? That means they show up in the Tasks tab as "Downloading", and in the Transfers tab as what: "Suspended", "Download pending", or something else? That sounds very suspicious; I've recently been getting my downloads through with only a few retries, if any. When did you last reboot that machine (or machines) [edit]and the router, as Richard says :-)[/edit]? Do you have any SETI-related entries in your etc\hosts file? Did you try some logging flags in cc_config.xml (like <file_xfer_debug> or <http_xfer_debug>)? Gruß, Gundolf |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14654 Credit: 200,643,578 RAC: 874 |
The servers can be up and doing their best and still have slow/stuck downloads. You may have heard mention of the cricket graph. Having a look at it you will notice that the bandwidth had been maxed out for a while. The green shows traffic going out of the lab to the world. The dip in activity today was during the servers being down for weekly maintenance. I get that sometimes. Rebooting my router (a combination ADSL modem/router/switch) seems to wake up the embedded DNS server, which appears to be what's causing most of the grief these days. |
woodyrox Send message Joined: 7 Apr 01 Posts: 34 Credit: 16,069,169 RAC: 0 |
I get that sometimes. Rebooting my router (a combination ADSL modem/router/switch) seems to wake up the embedded DNS server which seems to be causing most grief these days. Rebooted my router, no joy. Rebooted my computer... still glum. I did update the project to report a finished task, and that went through immediately. So I'm talking to the servers, but they're not giving me any bits. But I got you right that the tasks are assigned to you by the scheduler but aren't downloaded? That means they show up in the Tasks tab as "Downloading" and in the Transfers tab as what "Suspended", "Download pending" or what? Yes, you got that exactly right. The suspicious part is the reason I'm posting about it. The scheduler for sure assigned me tasks. You can see them in my task list at: http://setiathome.berkeley.edu/results.php?hostid=5047831 On the above status screen, all of the tasks are shown "In progress" even though they are still stuck downloading here. In my Tasks tab, the tasks say "Downloading". In the Transfers tab, the status is Download pending, then Downloading, and finally Retry in... Ummm, I just noticed something and don't know if it is significant. The task file names on the SETI task details web page don't match the file names on my Transfers tab. The SETI file names have a "_1" appended to the end. My file names match except there is no ending "_1". Do you have any SETI-related entries in your etc\hosts file? There's only my localhost in my hosts file. I tried adding the debug log levels. I'm not familiar with that file format but looked it up. Here's what I did: <cc_config> <log_flags> <file_xfer_debug> <http_xfer_debug> </log_flags> <options> </options> </cc_config> Here's what my 6.6.9 version of BOINC complained about: Tue 26 Jul 2011 10:30:05 PM EDT Unrecognized tag in cc_config.xml: <file_xfer_debug> Tue 26 Jul 2011 10:30:05 PM EDT Missing end tag in cc_config.xml Tue 26 Jul 2011 10:30:05 PM EDT Starting BOINC client version 6.6.9 for i686-pc-linux-gnu |
woodyrox Send message Joined: 7 Apr 01 Posts: 34 Credit: 16,069,169 RAC: 0 |
So I thought it might be useful to post the message log: Tue 26 Jul 2011 10:40:13 PM EDT SETI@home Started download of 21mr11af.31268.237100.13.10.36 Tue 26 Jul 2011 10:42:13 PM EDT Project communication failed: attempting access to reference site Tue 26 Jul 2011 10:42:13 PM EDT SETI@home Temporarily failed download of 21ap11ac.3769.1860.16.10.163: HTTP error Tue 26 Jul 2011 10:42:13 PM EDT SETI@home Backing off 3 hr 55 min 12 sec on download of 21ap11ac.3769.1860.16.10.163 Tue 26 Jul 2011 10:42:13 PM EDT SETI@home Started download of 21mr11af.30646.238327.12.10.94 Tue 26 Jul 2011 10:42:14 PM EDT Internet access OK - project servers may be temporarily down. Tue 26 Jul 2011 10:42:14 PM EDT SETI@home Temporarily failed download of 21mr11af.31268.237100.13.10.36: HTTP error Tue 26 Jul 2011 10:42:14 PM EDT SETI@home Backing off 3 hr 34 min 26 sec on download of 21mr11af.31268.237100.13.10.36 Tue 26 Jul 2011 10:44:15 PM EDT Project communication failed: attempting access to reference site Tue 26 Jul 2011 10:44:15 PM EDT SETI@home Temporarily failed download of 21mr11af.30646.238327.12.10.94: HTTP error Tue 26 Jul 2011 10:44:15 PM EDT SETI@home Backing off 1 hr 28 min 18 sec on download of 21mr11af.30646.238327.12.10.94 Tue 26 Jul 2011 10:44:17 PM EDT Internet access OK - project servers may be temporarily down. But you see, the file names in the above log are missing the "_1". I noticed that the correct file names are shown in the Tasks tab. Hope this is helpful. This problem started all of a sudden. I haven't changed anything on my end. |
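(Side note on those log lines: the irregular "Backing off 3 hr 55 min..." delays are just the client's randomized retry timer, not information about the server. As a rough illustration only, the idea is randomized exponential backoff; the 60-second base and 4-hour cap below are assumptions for the sketch, not BOINC's actual constants:)

```python
import random

def backoff_delay(failures, base=60.0, max_delay=4 * 3600.0):
    """Randomized exponential backoff, similar in spirit to what the
    BOINC client does for failed file transfers. Constants are
    illustrative assumptions, not BOINC's real values."""
    cap = min(max_delay, base * 2 ** failures)
    # Randomize so a fleet of clients doesn't all retry in lockstep.
    return random.uniform(cap / 2, cap)
```

That randomization is why two stuck downloads on the same machine end up with completely different retry times.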
woodyrox Send message Joined: 7 Apr 01 Posts: 34 Credit: 16,069,169 RAC: 0 |
Just got handed 7 more work units. Same deal, stuck in my craw. |
woodyrox Send message Joined: 7 Apr 01 Posts: 34 Credit: 16,069,169 RAC: 0 |
Joy! Here's what I did: Advanced->Preferences->Clear. This reset my preferences to the global ones. Stopped & restarted the client and wham! All the work units downloaded. The only difference I see is that "Use GPU while computer is in use" is no longer checked. This computer doesn't have a CUDA GPU, so I can't see how that made any difference. Anyway, I'm up and running and did not run out of work units. Thanks for everyone's help. |
Bernd Noessler Send message Joined: 15 Nov 09 Posts: 99 Credit: 52,635,434 RAC: 0 |
I think the webserver on 208.68.240.13 has been down for more than a day now. If your client does not try the second one (208.68.240.18), you cannot download. |
Link Send message Joined: 18 Sep 03 Posts: 834 Credit: 1,807,369 RAC: 0 |
I think the webserver on 208.68.240.13 is down for more than a day now. That seems to be true: according to http_debug, all my computers were trying *.13 all the time, and inserting "208.68.240.18 boinc2.ssl.berkeley.edu" in the hosts file solved the problem. |
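(For anyone else trying Link's workaround: it is a one-line addition to the hosts file, /etc/hosts on Linux or %SystemRoot%\System32\drivers\etc\hosts on Windows. The IP is the one Link reports as working; remember to remove the line once the *.13 server is back, because it permanently overrides DNS while it's there:)

```
# Pin boinc2.ssl.berkeley.edu to the working download server.
# Remove this line once 208.68.240.13 is back in service.
208.68.240.18    boinc2.ssl.berkeley.edu
```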
woodyrox Send message Joined: 7 Apr 01 Posts: 34 Credit: 16,069,169 RAC: 0 |
I think the webserver on 208.68.240.13 is down for more than a day now. This is good to know. I figured out the cc_config.xml file format and got the communications logging working. My file looks like this: <cc_config> <log_flags> <file_xfer_debug>1</file_xfer_debug> <http_xfer_debug>1</http_xfer_debug> </log_flags> <options> </options> </cc_config> So I will look for failed host attempts and will edit the hosts file if needed. Thanks! |
HAL9000 Send message Joined: 11 Sep 99 Posts: 6534 Credit: 196,805,888 RAC: 57 |
Joy! Computers... sometimes you just want to toss them out a window. Glad it wants to play now. For future reference, your cc_config.xml would look something like this for the logging flags, where 1 turns an option on and 0 turns it off: <cc_config> <log_flags> <task>0</task> <file_xfer>0</file_xfer> <sched_ops>1</sched_ops> <coproc_debug>0</coproc_debug> <cpu_sched>0</cpu_sched> <cpu_sched_debug>0</cpu_sched_debug> <dcf_debug>0</dcf_debug> <sched_op_debug>0</sched_op_debug> <state_debug>0</state_debug> <http_debug>0</http_debug> <http_xfer_debug>0</http_xfer_debug> </log_flags> <options> <no_gpus>0</no_gpus> <allow_remote_gui_rpc>1</allow_remote_gui_rpc> <save_stats_days>180</save_stats_days> </options> </cc_config> Edit: Ok, fine, you already figured it out. :) SETI@home classic workunits: 93,865 CPU time: 863,447 hours Join the [url=http://tinyurl.com/8y46zvu]BP6/VP6 User Group[/url] |
Link Send message Joined: 18 Sep 03 Posts: 834 Credit: 1,807,369 RAC: 0 |
Since it's not the first time that we have problems like this here, I wonder if it would cause fewer problems if SETI had two different download server URLs, for example dl1.ssl.berkeley.edu and dl2.ssl.berkeley.edu, and sent both as possible download locations, like Rosetta does, for example: <url>http://srv3.bakerlab.org/rosetta/download/262/avgE_from_pdb.gz</url> So for a SETI WU it could be: <url>http://dl1.ssl.berkeley.edu/sah/download_fanout/61/08ap11ae.3480.1703.14.10.29</url> I don't know how the load balancing works in that case. If the BOINC client picks just one of them, then that would be pretty easy, no need for any big server-side changes. If the client starts from the top and tries one after the other, then the scheduler would have to send dl1,dl2 to all even-numbered results (_0, _2,...) and dl2,dl1 to all odd-numbered results. I think it might work better than the current way... but I might be wrong of course. |
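(The even/odd ordering proposed above fits in a few lines. This is only a sketch of the idea: the dl1/dl2 hostnames are hypothetical, and it assumes the client walks the <url> list top-down, which is exactly the open question in the post:)

```python
def download_urls(result_name, fanout_path):
    """Order the two (hypothetical) mirrors by the result's replica
    number: even results (_0, _2, ...) try dl1 first, odd ones dl2,
    so top-down clients split the load roughly in half."""
    replica = int(result_name.rsplit("_", 1)[1])  # trailing _N
    mirrors = ["dl1.ssl.berkeley.edu", "dl2.ssl.berkeley.edu"]
    if replica % 2:
        mirrors.reverse()
    return ["http://%s/%s" % (host, fanout_path) for host in mirrors]
```

If the client instead picks one URL at random, the scheduler could send both URLs in a fixed order and no ordering logic would be needed at all.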
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.