WTF is Going On???? One Machine running out of WUs, No Changes...

Cruncher-American
Joined: 25 Mar 02
Posts: 1513
Credit: 370,893,186
RAC: 340
United States
Message 1289197 - Posted: 29 Sep 2012, 8:30:41 UTC

I run 2 crunchers, let's call them F(ermibox2) and U(nimatrix002). They are primarily GPU crunchers, very similar (2 x GTX 460) and have been running for a while with queues from 1K to 2K WUs each. Both have had some problems recently with downloads that take a long while to complete.

Yet now, F has over 1000 WUs, with a handful stuck in transit, while U is down to about 50, all stuck in transit. So the machine is effectively dead. Yet nothing has changed and both can access the Internet OK. F has been running 306.23 drivers for a week or two, U is still running 275.xx, if that matters.

Is there anything I can check to find out what's wrong? Are we back to the SETI router d/l problems of a few months ago? ????

Thanks for any suggestions...
ID: 1289197
Wiggo
Joined: 24 Jan 00
Posts: 34744
Credit: 261,360,520
RAC: 489
Australia
Message 1289225 - Posted: 29 Sep 2012, 9:39:59 UTC - in response to Message 1289197.  

F: could be having problems with that driver; 301.xx is supposed to be the choice here, but even with my GTX550 Ti's and GTX560 I'll stick with 285.62 drivers myself.

I won't even say anything about the version of BOINC that you are using as that's already been beaten into the ground enough times. ;)

Cheers.
ID: 1289225
ChrisSibbald
Joined: 23 Jul 11
Posts: 18
Credit: 23,582,502
RAC: 0
Canada
Message 1289245 - Posted: 29 Sep 2012, 11:49:56 UTC - in response to Message 1289225.  

I have not been able to fill my caches on my GPU crunchers in about a week now...very slow downloads. If anyone has any ideas for us to try, they would be much appreciated.

Cheers,
Chris
ID: 1289245
Cruncher-American
Joined: 25 Mar 02
Posts: 1513
Credit: 370,893,186
RAC: 340
United States
Message 1289246 - Posted: 29 Sep 2012, 11:55:30 UTC - in response to Message 1289225.  
Last modified: 29 Sep 2012, 12:14:28 UTC

F: could be having problems with that driver; 301.xx is supposed to be the choice here, but even with my GTX550 Ti's and GTX560 I'll stick with 285.62 drivers myself.

I won't even say anything about the version of BOINC that you are using as that's already been beaten into the ground enough times. ;)

Cheers.


Doubt that it is a driver problem - that wouldn't affect the d/l in any event.

And I don't know if this is relevant, but it sure contributes to the problem - all the WUs I am getting are super-shorties - they run only 2-4 minutes instead of 15-25, so I run out faster even when I manage to d/l a few. Bah!
ID: 1289246
Donald L. Johnson
Joined: 5 Aug 02
Posts: 8240
Credit: 14,654,533
RAC: 20
United States
Message 1289326 - Posted: 29 Sep 2012, 15:11:37 UTC - in response to Message 1289246.  
Last modified: 29 Sep 2012, 15:12:32 UTC

See my post here and Chris S.'s message following it.

It looks like a huge shorty storm. The system is being swamped, and you just have to be patient.
Donald
Infernal Optimist / Submariner, retired
ID: 1289326
spitfire_mk_2
Joined: 14 Apr 00
Posts: 563
Credit: 27,306,885
RAC: 0
United States
Message 1289464 - Posted: 29 Sep 2012, 19:25:05 UTC - in response to Message 1289245.  

I have not been able to fill my caches on my GPU crunchers in about a week now...very slow downloads. If anyone has any ideas for us to try, they would be much appreciated.

Cheers,
Chris

Same here. Today is the first day I finally got a bunch of GPU units.

On the other hand, my Einstein@home average is shooting for the moon. ☺
ID: 1289464
Cruncher-American
Joined: 25 Mar 02
Posts: 1513
Credit: 370,893,186
RAC: 340
United States
Message 1290541 - Posted: 3 Oct 2012, 3:08:27 UTC

Well - now both my machines are drained, except for a few uploads stalled for the last couple of days. A few go through from time to time, but otherwise NOTHING!

Both machines get (on manual Update) "not requesting new work" despite having nothing or very little work - they are both set for 5 or more days of work in the queues, but BOINC is blithely ignoring that.

Is that because of the stalled uploads?

Is there anything I can do besides vacuum out the machines yet again?
ID: 1290541
arkayn
Volunteer tester
Joined: 14 May 99
Posts: 4438
Credit: 55,006,323
RAC: 0
United States
Message 1290544 - Posted: 3 Oct 2012, 3:20:35 UTC - in response to Message 1290541.  



Is that because of the stalled uploads?


Yes, as long as the number of stalled uploads is more than twice the number of processors on the machine.

Earlier today I had my GTX-670 machine down to only 8 uploads stuck. It is already back up to 84 currently.

Of course the machine is not in danger of running out anytime soon as I managed to cache about 2700 WU's over the weekend.

The GTX-560 machine on the other hand only has 180 WU's on hand.

ID: 1290544
Cruncher-American
Joined: 25 Mar 02
Posts: 1513
Credit: 370,893,186
RAC: 340
United States
Message 1290632 - Posted: 3 Oct 2012, 8:27:00 UTC - in response to Message 1290544.  



Yes, as long as the number of stalled uploads is more than twice the number of processors on the machine.



When you say "Processors" here, do you mean CPUs only? Because one of my machines uses 3 CPU threads, and the other, being GPU only, uses 0. Or: if it counts GPUs, too, how does it allow for them? 'Twould seem to be a major flaw if it is CPU only, or if it counts 1 GPU same as a CPU thread, eh? (???).
ID: 1290632
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13720
Credit: 208,696,464
RAC: 304
Australia
Message 1290633 - Posted: 3 Oct 2012, 8:41:44 UTC - in response to Message 1290632.  

or if it counts 1 GPU same as a CPU thread, eh? (???).

Yep.

Grant
Darwin NT
ID: 1290633
Cruncher-American
Joined: 25 Mar 02
Posts: 1513
Credit: 370,893,186
RAC: 340
United States
Message 1290641 - Posted: 3 Oct 2012, 9:12:50 UTC - in response to Message 1290633.  

or if it counts 1 GPU same as a CPU thread, eh? (???).

Yep.


IMHO that is really stupid, as a vid card is typically worth several (as in 5-10) CPU threads. Can't they be clever enough to figure that out... I mean, what's all the performance data they have for each machine used for???
ID: 1290641
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13720
Credit: 208,696,464
RAC: 304
Australia
Message 1290643 - Posted: 3 Oct 2012, 9:17:31 UTC - in response to Message 1290641.  

what's all the performance data they have for each machine used for???

Helping determine how many WUs you need to fill your cache.
Grant
Darwin NT
ID: 1290643
Cruncher-American
Joined: 25 Mar 02
Posts: 1513
Credit: 370,893,186
RAC: 340
United States
Message 1290656 - Posted: 3 Oct 2012, 9:41:54 UTC - in response to Message 1290643.  

what's all the performance data they have for each machine used for???

Helping determine how many WUs you need to fill your cache.


Yah, I knew that, but why can't it also be used to be a little more intelligent about when to ask for work - this current screwup is really a pain...
there's no reason I can't have some work to do now, I would think, given the overall status of the project, yes?


ID: 1290656
arkayn
Volunteer tester
Joined: 14 May 99
Posts: 4438
Credit: 55,006,323
RAC: 0
United States
Message 1290702 - Posted: 3 Oct 2012, 11:45:48 UTC - in response to Message 1290633.  

or if it counts 1 GPU same as a CPU thread, eh? (???).

Yep.


Actually no, it does not count GPU's in that total.

With my Quad-Core, if I have 9+ uploads going it will not request new work.

ID: 1290702
Bernie Vine
Volunteer moderator
Volunteer tester
Joined: 26 May 99
Posts: 9954
Credit: 103,452,613
RAC: 328
United Kingdom
Message 1290703 - Posted: 3 Oct 2012, 11:47:14 UTC - in response to Message 1290656.  

what's all the performance data they have for each machine used for???

Helping determine how many WUs you need to fill your cache.


Yah, I knew that, but why can't it also be used to be a little more intelligent about when to ask for work - this current screwup is really a pain...
there's no reason I can't have some work to do now, I would think, given the overall status of the project, yes?


I don't really think it will make much difference at the moment, downloads aren't downloading and uploads... well you get the picture. Even if you could get work, it takes longer to download than it does for a GPU to crunch.

It has always been said that patience isn't just a virtue on the project, it is a necessity. Just chill out. Whatever the problem, it will eventually be sorted, possibly not today, nor this week or even this month, but it will be sorted.
ID: 1290703
Cruncher-American
Joined: 25 Mar 02
Posts: 1513
Credit: 370,893,186
RAC: 340
United States
Message 1290712 - Posted: 3 Oct 2012, 12:38:41 UTC - in response to Message 1290703.  

Not really impatient - I've been through lots of stuff at SETI in the past. But I thought a lot of this crap was fixed by the newer equipment, etc.

Maybe the huge shorty storm we've had recently has clogged the pipes up. I know I've had literally 100s, if not 1000s, of shorties lately, and each one takes the same number of bytes to d/l and u/l as a regular WU, so it surely places a larger demand on the pipes to SETI...
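
To put rough numbers on that (a back-of-the-envelope sketch, not project data): assume a shorty takes about 3 minutes and a regular WU about 20 minutes on the same GPU, which are the midpoints of the 2-4 and 15-25 minute ranges mentioned earlier in this thread, and assume both kinds of WU are the same size to transfer. The function name below is made up for the illustration.

```python
def transfers_per_hour(minutes_per_wu: float) -> float:
    """WUs that one continuously busy GPU task slot finishes per hour.

    Every finished WU means one download and one upload, so this is also
    the number of transfers in each direction per hour.
    """
    return 60.0 / minutes_per_wu


# Assumed midpoints of the run times quoted in this thread.
regular = transfers_per_hour(20.0)  # regular WU: 15-25 minutes
shorty = transfers_per_hour(3.0)    # shorty: 2-4 minutes

# Same bytes per WU (assumption), so transfer demand scales with WU count.
demand_ratio = shorty / regular

print(f"regular: {regular:.0f} WUs/hour, shorty: {shorty:.0f} WUs/hour")
print(f"shorties need about {demand_ratio:.1f}x the transfers")
```

Under those assumptions each busy slot asks for roughly six to seven times as many transfers per hour during a shorty storm, which fits the idea that the pipes, rather than any one machine, are the bottleneck.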


ID: 1290712
Akio
Joined: 18 May 11
Posts: 375
Credit: 32,129,242
RAC: 0
United States
Message 1290715 - Posted: 3 Oct 2012, 12:41:19 UTC - in response to Message 1290703.  

It has always been said that patience isn't just a virtue on the project, it is a necessity. Just chill out. Whatever the problem, it will eventually be sorted, possibly not today, nor this week or even this month, but it will be sorted.


+1 Well said.
ID: 1290715
tullio
Volunteer tester
Joined: 9 Apr 04
Posts: 8797
Credit: 2,930,782
RAC: 1
Italy
Message 1290716 - Posted: 3 Oct 2012, 12:45:14 UTC

I have two MB jobs on my laptop, one Astropulse on the SUN WS, and a vlar on the Solaris Virtual Machine. I am never out of work since I am running 4 BOINC projects on each Real Machine and one on the Virtual Machine.
Tullio
ID: 1290716
Josef W. Segur
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1290842 - Posted: 3 Oct 2012, 17:45:54 UTC - in response to Message 1290641.  

or if it counts 1 GPU same as a CPU thread, eh? (???).

Yep.


IMHO that is really stupid, as a vid card is typically worth several (as in 5-10) CPU threads. Can't they be clever enough to figure that out... I mean, what's all the performance data they have for each machine used for???

Actually, the count is either CPU or GPU, whichever the host has more of.

The fundamental idea is that it isn't sensible to deliver more work to a host which is processing it faster than results can be returned. The particular multiple used is just what Dr. Anderson originally guessed to be effective for the purpose. The fact that GPUs are faster doesn't alter the fundamental purpose in any way, just makes it likely the limit will be imposed more often.

As long as the downloads are using all available bandwidth, it doesn't make a lot of difference who's getting them from the project view. But longer term, the effect on the morale of the most productive participants may be significant. Then again, if everything ran smoothly all of the time I suspect many participants would get bored.
Joe
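
The rule described in the posts above can be sketched roughly as follows. This only restates what the thread says (work requests stop once stalled uploads exceed twice the device count, where the count is CPUs or GPUs, whichever the host has more of). It is not taken from the BOINC source, and the function and parameter names are invented for the example.

```python
def upload_backlog_blocks_fetch(stalled_uploads: int,
                                n_cpus: int,
                                n_gpus: int,
                                multiple: int = 2) -> bool:
    """Return True if the client would skip asking for new work.

    Hypothetical restatement of the rule from this thread:
      - the device count is max(CPUs, GPUs), per Joe's post
      - the limit is `multiple` times that count (2, per arkayn's post)
      - requests stop when stalled uploads are MORE than the limit
    """
    limit = multiple * max(n_cpus, n_gpus)
    return stalled_uploads > limit


# arkayn's quad-core example: 9 or more stalled uploads block new work.
print(upload_backlog_blocks_fetch(9, n_cpus=4, n_gpus=1))   # over the limit of 8
print(upload_backlog_blocks_fetch(8, n_cpus=4, n_gpus=1))   # exactly at the limit
```

Because the limit depends only on how many devices there are and not on how fast they are, a fast GPU host that produces results quickly will hit it sooner, which matches Joe's remark that the limit is likely to be imposed more often on such hosts.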
ID: 1290842
Cruncher-American
Joined: 25 Mar 02
Posts: 1513
Credit: 370,893,186
RAC: 340
United States
Message 1290894 - Posted: 3 Oct 2012, 20:06:43 UTC

An interesting phenomenon now:

Both machines have (finally!) uploaded all their results, but BOINC thinks that one has 63 WUs "In Progress", and the other, 120. Is this an example of "ghost" WUs that BOINC thinks were sent but were never received by my babies?
ID: 1290894