Posts by woodyrox

21) Message boards : Number crunching : CPU work units download stuck for 2 days? (Message 1132552)
Posted 27 Jul 2011 by woodyrox
Post:
I think the webserver on 208.68.240.13 has been down for more than a day now.

If your client does not try the second one (208.68.240.18) you cannot download.

That seems to be true. According to http debug, all my computers were trying *.13 all the time, and inserting "208.68.240.18 boinc2.ssl.berkeley.edu" in the hosts file solved the problem.


This is good to know. I figured out the cc_config.xml file format and got the communications logs working. My file looks like this:

<cc_config>
	<log_flags>
		<file_xfer_debug>1</file_xfer_debug>
		<http_xfer_debug>1</http_xfer_debug>
	</log_flags>
	
	<options>
		
	</options>
</cc_config>


So I will look for failed host attempts and will edit the hosts file if needed.
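
For anyone wanting to check whether the hosts-file workaround is already in place, here is a minimal sketch in Python. The function name `find_host_mapping` and the sample text are made up for illustration; only the IP address and host name come from the post quoted above. It reads hosts-file text and reports which IP, if any, a name is pinned to.

```python
def find_host_mapping(hosts_text, hostname):
    """Return the IP that hosts-file text maps hostname to, or None."""
    wanted = hostname.lower()
    for line in hosts_text.splitlines():
        # Ignore comments and blank lines.
        entry = line.split("#", 1)[0].strip()
        if not entry:
            continue
        fields = entry.split()
        ip, names = fields[0], [n.lower() for n in fields[1:]]
        if wanted in names:
            return ip
    return None


# Hypothetical hosts-file content; the second line is the entry quoted above.
sample = """127.0.0.1 localhost
208.68.240.18 boinc2.ssl.berkeley.edu
"""
print(find_host_mapping(sample, "boinc2.ssl.berkeley.edu"))
```

On a real machine you would read the text from the system hosts file instead of the sample string; the location varies by operating system.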

thanks
22) Message boards : Number crunching : CPU work units download stuck for 2 days? (Message 1132414)
Posted 27 Jul 2011 by woodyrox
Post:
Joy!

Here's what I did: Advanced->Preferences->Clear

This reset my preferences to global. Stopped & restarted the client and wham! All the work units downloaded. The only difference I see is that "Use GPU while computer is in use" is not checked. This computer doesn't have a CUDA GPU and I can't see how that made any difference. Anyway I'm up and running, and did not run out of work units.

Thanks for everyone's help.
23) Message boards : Number crunching : CPU work units download stuck for 2 days? (Message 1132410)
Posted 27 Jul 2011 by woodyrox
Post:
Just got handed 7 more work units. Same deal, stuck in my craw.
24) Message boards : Number crunching : CPU work units download stuck for 2 days? (Message 1132400)
Posted 27 Jul 2011 by woodyrox
Post:
So I thought it might be useful to post the message log:

Tue 26 Jul 2011 10:40:13 PM EDT	SETI@home	Started download of 21mr11af.31268.237100.13.10.36
Tue 26 Jul 2011 10:42:13 PM EDT		Project communication failed: attempting access to reference site
Tue 26 Jul 2011 10:42:13 PM EDT	SETI@home	Temporarily failed download of 21ap11ac.3769.1860.16.10.163: HTTP error
Tue 26 Jul 2011 10:42:13 PM EDT	SETI@home	Backing off 3 hr 55 min 12 sec on download of 21ap11ac.3769.1860.16.10.163
Tue 26 Jul 2011 10:42:13 PM EDT	SETI@home	Started download of 21mr11af.30646.238327.12.10.94
Tue 26 Jul 2011 10:42:14 PM EDT		Internet access OK - project servers may be temporarily down.
Tue 26 Jul 2011 10:42:14 PM EDT	SETI@home	Temporarily failed download of 21mr11af.31268.237100.13.10.36: HTTP error
Tue 26 Jul 2011 10:42:14 PM EDT	SETI@home	Backing off 3 hr 34 min 26 sec on download of 21mr11af.31268.237100.13.10.36
Tue 26 Jul 2011 10:44:15 PM EDT		Project communication failed: attempting access to reference site
Tue 26 Jul 2011 10:44:15 PM EDT	SETI@home	Temporarily failed download of 21mr11af.30646.238327.12.10.94: HTTP error
Tue 26 Jul 2011 10:44:15 PM EDT	SETI@home	Backing off 1 hr 28 min 18 sec on download of 21mr11af.30646.238327.12.10.94
Tue 26 Jul 2011 10:44:17 PM EDT		Internet access OK - project servers may be temporarily down.


But you see, the file names in the above log are missing the "_1". I noticed that the correct file names are shown in the Tasks tab. Hope this is helpful.

This problem started all of a sudden. I haven't changed anything on my end.
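
A log like the one above gets hard to read once there are dozens of retries. As a rough sketch (Python, with a made-up helper name `backoff_seconds`), the "Backing off" lines can be reduced to one back-off value per file. The pattern is based only on the three lines shown in this post; treating the "hr" and "min" parts as optional is my assumption about how shorter back-offs are printed.

```python
import re

# Pattern inferred from the log lines in this post; "hr" and "min" are
# assumed to be omitted for short back-offs.
BACKOFF = re.compile(
    r"Backing off (?:(\d+) hr )?(?:(\d+) min )?(\d+) sec on download of (\S+)"
)


def backoff_seconds(log_text):
    """Map each file name to its most recent back-off, in seconds."""
    result = {}
    for line in log_text.splitlines():
        match = BACKOFF.search(line)
        if match:
            hours, minutes, seconds, name = match.groups()
            result[name] = (
                int(hours or 0) * 3600 + int(minutes or 0) * 60 + int(seconds)
            )
    return result


# Three lines copied from the message log above.
log = "\n".join([
    "Tue 26 Jul 2011 10:42:13 PM EDT\tSETI@home\tBacking off 3 hr 55 min 12 sec on download of 21ap11ac.3769.1860.16.10.163",
    "Tue 26 Jul 2011 10:42:14 PM EDT\tSETI@home\tBacking off 3 hr 34 min 26 sec on download of 21mr11af.31268.237100.13.10.36",
    "Tue 26 Jul 2011 10:44:15 PM EDT\tSETI@home\tBacking off 1 hr 28 min 18 sec on download of 21mr11af.30646.238327.12.10.94",
])
for name, secs in backoff_seconds(log).items():
    print(name, secs)
```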
25) Message boards : Number crunching : CPU work units download stuck for 2 days? (Message 1132384)
Posted 27 Jul 2011 by woodyrox
Post:
I get that sometimes. Rebooting my router (a combination ADSL modem/router/switch) seems to wake up the embedded DNS server which seems to be causing most grief these days.


Rebooted my router, no joy. Rebooted my computer... still glum. I did update the project to report a finished task, and that went through immediately. I'm talking to the servers, but they're not giving me any bits.

But do I understand correctly that the tasks are assigned to you by the scheduler but aren't downloaded? That means they show up in the Tasks tab as "Downloading"; what do they show in the Transfers tab: "Suspended", "Download pending", or something else?

That sounds very suspicious; lately my downloads have been getting through with only a few retries, if any.


Yes, you got that exactly right. Suspicious is the reason I'm posting about it. The scheduler for sure assigned me tasks. You can see them in my task list at:

http://setiathome.berkeley.edu/results.php?hostid=5047831

On the above status screen, all of the tasks are shown "In progress" even though they are downloading.

In my task tab, the tasks say "Downloading". In the "Transfers" tab, the status is Download pending, then Downloading and finally Retry in...

Ummm, I just noticed something and don't know if this is significant. The task file names on the seti task details web page don't match the file names on my Transfers tab. The seti file names have a "_1" appended to the end. My file names match except there is no ending "_1".
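
A possible explanation for the mismatch, offered as an assumption rather than something confirmed in this thread: as far as I understand BOINC naming, a task name is the workunit name plus a numeric suffix such as "_0" or "_1", while the downloaded data file is named after the workunit itself. If that holds, the missing "_1" would be expected and not a fault. A small Python sketch (the helper name `workunit_name` is made up) for comparing the two lists:

```python
import re


def workunit_name(task_name):
    """Strip a trailing _<number> suffix, if present (assumed naming scheme)."""
    return re.sub(r"_\d+$", "", task_name)


# File name as shown in the Transfers tab, and the same name with the
# suffix the task web page is described as adding.
transfer_name = "21mr11af.31268.237100.13.10.36"
task_name = transfer_name + "_1"
print(workunit_name(task_name) == transfer_name)
```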

Do you have any SETI-related entries in your etc\hosts file?

Did you try some logging flags in cc_config.xml (like <file_xfer_debug>, <http_xfer_debug>)?


There's only my localhost in my hosts file.

I tried adding the debug log levels. I'm not familiar with that file format but looked it up. Here's what I did:

<cc_config>
	<log_flags>
		<file_xfer_debug>
		<http_xfer_debug>
	</log_flags>
	
	<options>
		
	</options>
</cc_config>


Here's what my 6.6.9 version of boinc complained about:

Tue 26 Jul 2011 10:30:05 PM EDT		Unrecognized tag in cc_config.xml: <file_xfer_debug>
Tue 26 Jul 2011 10:30:05 PM EDT		Missing end tag in cc_config.xml
Tue 26 Jul 2011 10:30:05 PM EDT		Starting BOINC client version 6.6.9 for i686-pc-linux-gnu
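
The "Missing end tag" complaint fits the file shown above, where each flag is an opening tag with no value and no closing tag. A quick way to catch this before restarting the client is a generic well-formedness check. The sketch below uses Python's standard `xml.etree.ElementTree`; the client's own parser may be stricter or looser than a general XML parser, so passing this check is necessary but not proof that the file will be accepted. The "fixed" text uses the `<flag>1</flag>` form from the later post in this thread.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# The flags as first written: opening tags only.
broken = """<cc_config>
  <log_flags>
    <file_xfer_debug>
    <http_xfer_debug>
  </log_flags>
</cc_config>"""

# The form that worked later: a value and a closing tag for each flag.
fixed = """<cc_config>
  <log_flags>
    <file_xfer_debug>1</file_xfer_debug>
    <http_xfer_debug>1</http_xfer_debug>
  </log_flags>
</cc_config>"""

print(is_well_formed(broken), is_well_formed(fixed))
```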

26) Message boards : Number crunching : CPU work units download stuck for 2 days? (Message 1132307)
Posted 26 Jul 2011 by woodyrox
Post:
The servers can be up and doing their best and still have slow/stuck downloads. You may have heard mention of the cricket graph. Having a look at it you will notice that the bandwidth had been maxed out for a while. The green shows traffic going out of the lab to the world. The dip in activity today was during the servers being down for weekly maintenance.
If your machine is on 24/7 the downloads should get taken care of eventually. If you only have it on during a limited time you may have to resort to hitting the retry button on the download tab to get them to complete.

Sometimes it can be luck of the draw if some of them download while others stay stuck.


Yeah, I've looked at cricket a few times and see the servers are maxed. But I've had problems with downloads before, and symptoms have been different. In the past, the download would stall after a few bytes. Now I'm getting the big goose egg, as in zero bytes for all work units. I haven't seen this before and was wondering if there are possibly other problems. My machine is nearly out of work so I wanted to get ahead of the eventuality of sitting idle.
27) Message boards : Number crunching : CPU work units download stuck for 2 days? (Message 1132254)
Posted 26 Jul 2011 by woodyrox
Post:
Thanks for your reply. This is the cruncher I'm having problems with:

http://setiathome.berkeley.edu/results.php?hostid=5047831

The only errors reported are the 8 work units I aborted after waiting for a day without a byte of transfer. After aborting those, my cruncher was given 9 work units this morning and not a byte of data has yet been downloaded.

I thought this problem might clear up after project maintenance, but no such luck. 9 stuck.
28) Message boards : Number crunching : CPU work units download stuck for 2 days? (Message 1132229)
Posted 26 Jul 2011 by woodyrox
Post:
I've been unable to get CPU work units for 2 days. GPU tasks downloaded OK in the same time frame. Work units are stuck in "Downloading" status: 0 KB transferred and the speed is always 0 KBps. Downloads time out and retry, always with no progress.

Checking the server status page shows that download servers, anakin & vader, are up. Upload server, bruno, is shown disabled but uploads work fine here.

Is this a temporary server problem or has my boinc gone bonkers?
29) Questions and Answers : GPU applications : Should I abort work units when swapping out my GPU for a different model? (Message 1131672)
Posted 25 Jul 2011 by woodyrox
Post:
Thank you for your reply.

I read the boards after my problematic GPU switch and did in fact roll back drivers all the way to the 180 series. I turned off one of my monitors, etc. and CPU-Z always reported 256MB. I could not make the work units for the 512MB GPU restart in the 256MB GPU. As soon as I put the old GPU back, the tasks started running again.

So if you say it's OK to keep running tasks for another GPU, I can do that and will only abort if they don't restart or if I see computation/validity errors.
30) Questions and Answers : GPU applications : Should I abort work units when swapping out my GPU for a different model? (Message 1131581)
Posted 25 Jul 2011 by woodyrox
Post:
A while ago, I swapped out a slow 512MB GPU with a faster 256MB GPU. All of the in-progress and downloaded work units refused to run and showed a status of "Waiting for memory". So at first I thought seti wouldn't run on my 256MB GPU. But when I put that card in another box, seti recognized it and loaded it with work units. Everything worked fine.

I saw different work units called cuda_fermi on yet another gpu in my network. I concluded that on the server side, seti looks to see what hardware you have and loads appropriate work units.

My question is... how much flexibility do I have in swapping out my GPUs and not upsetting the apple cart? Would it be better to abort unfinished GPU work units and have seti download new ones whenever I swap out a graphics card? Should I instead only do that in cases where the gpu work units won't run?

Thanks in advance.
31) Message boards : Number crunching : Exceeded elapsed time limit... yikes! (Message 1131342)
Posted 24 Jul 2011 by woodyrox
Post:
Hey all, I loaded the 266 drivers and it's a different world! All of the cuda_fermi estimated times to completion are now much more reasonable. They're over an hour now where before they were under 5 minutes. The run status still says 0.52 CPUs + 1.00 NVIDIA GPUs. But the CPU is not being used 50% by the cuda_fermi task. There's plenty of CPU left to decrement the "To completion" time of the CPU task.

The GT 430 reported two cuda_fermi tasks that are awaiting validation. I think one was mostly completed by the CPU, but the second one was completed by the GPU, and fast. The GPU is grinding through another task right now and won't be long.

Bottom line is -- it's working like it's supposed to. I did not update to boinc 6.12 and don't plan to.

Thanks for your help.
32) Message boards : Number crunching : Exceeded elapsed time limit... yikes! (Message 1131331)
Posted 24 Jul 2011 by woodyrox
Post:
Thanks to everyone for your replies.

I don't know if my GPU is bad. BOINC loads it fine and says this:

Sat 23 Jul 2011 08:04:32 PM EDT		NVIDIA GPU 0: GeForce GT 430 (driver version 27533, CUDA version 4000, compute capability 2.1, 962MB, 179 GFLOPS peak)


CPU-Z says I've got a CUDA GPU, but I don't think it tests performance. I ran clinfo from lunatics. Its output log doesn't seem to show any problems.

Two of my now 3 error runs show a debug dump. One simply says time limit exceeded. I don't know what that means.

I have not switched user and I have not run a remote desktop. I use VNCultra for that but have not used it while running these GPU tasks.

I'll update boinc to 6.12 though I doubt it's a boinc problem. I'll also install 266 drivers and tell you what happens.

If it is a GPU problem, how would I go about confirming that? I ran several video benchmarks and they all seem to think my GPU is ok as they report good video performance.
33) Message boards : Number crunching : Exceeded elapsed time limit... yikes! (Message 1131230)
Posted 24 Jul 2011 by woodyrox
Post:
I posted this in the BOINC GPU forum. They suggested I come here...


So I installed a decent gpu in a not so hot box. The GPU is a PNY GT 430 installed in a Celeron D 356 Vista box. Installed Boinc 6.10.60 and got one CUDA-fermi task. GPU started crunching, and right off the bat I thought something was fishy. Both of the "To completion" timers were running forward -- i.e., getting bigger. Also, the CUDA GPU task was showing a very high CPU usage, like 0.52 CPUs + 1.00 NVIDIA GPUs. I don't see CPU numbers this big on my other boinc computers. Like I've got a Pentium E2220 with a lousy 8400 GS GPU and it says 0.09 CPU + 1.00 NVIDIA GPU. So I was surprised to see such a high CPU demand on the low-end box.

I watched the elapsed time & to completion time of the CUDA task and figured that as soon as "Elapsed time" = "To completion", the timer would start counting down. Sure enough that's what happened. That's for the CUDA task timer. The CPU task's estimate was still getting bigger because the CPU task wasn't getting as much CPU as the 1.00 it was expecting. That all made sense.

So then, all of a sudden, the task was finished. I looked at the messages and saw:

Sat 23 Jul 2011 10:36:24 PM EDT SETI@home Aborting task 09mr11ag.20908.17409.16.10.203_1: exceeded elapsed time limit 8658.846851


Just when everything seemed to be going gangbusters, the task went belly up. OK, so then my box begged for some more GPU tasks and got them. It's crunching one right now, but I can see the exact same thing is gonna happen. The counter is running in the wrong direction and its expected completion time won't be met. Mark my words.

So my question... what to do?

thanks

PS: The initial "To completion" time estimate is crazy low -- just a few minutes. Ain't no how, no way this box will crunch these data sets that fast. I need more time. 3 hours not 10 minutes!
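
To put the abort message quoted above in perspective, here is a small Python sketch that pulls the task name and limit out of such a line and converts the limit to hours and minutes. It assumes the number is in seconds, which the log itself does not state; the helper name `parse_abort` is made up, and the pattern is based on the single line in this post.

```python
import re

# Pattern based on the one abort line shown in this post.
LIMIT = re.compile(r"Aborting task (\S+): exceeded elapsed time limit ([\d.]+)")


def parse_abort(line):
    """Return (task name, limit, readable limit), or None if no match."""
    match = LIMIT.search(line)
    if match is None:
        return None
    name, limit = match.group(1), float(match.group(2))
    # Assumption: the limit is expressed in seconds.
    hours, rem = divmod(int(limit), 3600)
    minutes, seconds = divmod(rem, 60)
    return name, limit, f"{hours} hr {minutes} min {seconds} sec"


line = (
    "Sat 23 Jul 2011 10:36:24 PM EDT SETI@home Aborting task "
    "09mr11ag.20908.17409.16.10.203_1: exceeded elapsed time limit 8658.846851"
)
print(parse_abort(line))
```

Under that assumption the limit comes to about 2 hr 24 min, which is shorter than the 3 hours this box appears to need.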




 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.