WTF is Going On???? One Machine running out of WUs, No Changes...


Message boards : Number crunching : WTF is Going On???? One Machine running out of WUs, No Changes...

jravin
Joined: 25 Mar 02
Posts: 941
Credit: 102,369,246
RAC: 87,954
United States
Message 1289197 - Posted: 29 Sep 2012, 8:30:41 UTC

I run 2 crunchers, let's call them F(ermibox2) and U(nimatrix002). They are primarily GPU crunchers, very similar (2 x GTX 460) and have been running for a while with queues from 1K to 2K WUs each. Both have had some problems recently with downloads that take a long while to complete.

Yet now, F has over 1000 WUs, with a handful stuck in transit, while U is down to about 50, all stuck in transit. So the machine is effectively dead. Yet nothing has changed and both can access the Internet OK. F has been running 306.23 drivers for a week or two, U is still running 275.xx, if that matters.

Is there anything I can check to find out what's wrong? Are we back to the SETI router d/l problems of a few months ago?

Thanks for any suggestions...
____________

Wiggo
Joined: 24 Jan 00
Posts: 7331
Credit: 96,775,593
RAC: 67,359
Australia
Message 1289225 - Posted: 29 Sep 2012, 9:39:59 UTC - in response to Message 1289197.

F: could be having problems with that driver. 301.xx is supposed to be the choice here, but even with my GTX550 Ti's and GTX560 I'll stick with 285.62 drivers myself.

I won't even say anything about the version of BOINC that you are using as that's already been beaten into the ground enough times. ;)

Cheers.
____________

ChrisSibbald
Joined: 23 Jul 11
Posts: 18
Credit: 23,582,502
RAC: 0
Canada
Message 1289245 - Posted: 29 Sep 2012, 11:49:56 UTC - in response to Message 1289225.

I have not been able to fill the caches on my GPU crunchers for about a week now... very slow downloads. If anyone has any ideas for us to try, they would be much appreciated.

Cheers,
Chris

jravin
Joined: 25 Mar 02
Posts: 941
Credit: 102,369,246
RAC: 87,954
United States
Message 1289246 - Posted: 29 Sep 2012, 11:55:30 UTC - in response to Message 1289225.
Last modified: 29 Sep 2012, 12:14:28 UTC

F: could be having problems with that driver. 301.xx is supposed to be the choice here, but even with my GTX550 Ti's and GTX560 I'll stick with 285.62 drivers myself.

I won't even say anything about the version of BOINC that you are using as that's already been beaten into the ground enough times. ;)

Cheers.


Doubt that it is a driver problem - that wouldn't affect the d/l in any event.

And I don't know if this is relevant, but it sure contributes to the problem - all the WUs I am getting are super-shorties - they run only 2-4 minutes instead of 15-25, so I run out faster even when I manage to d/l a few. Bah!
____________

Donald L. Johnson (Project donor)
Joined: 5 Aug 02
Posts: 6259
Credit: 736,784
RAC: 1,146
United States
Message 1289326 - Posted: 29 Sep 2012, 15:11:37 UTC - in response to Message 1289246.
Last modified: 29 Sep 2012, 15:12:32 UTC

See my post here and Chris S.'s message following it.

It looks like a huge shorty storm. The system is being swamped, and you just have to be patient.
____________
Donald
Infernal Optimist / Submariner, retired

Sten-Arne
Volunteer tester
Joined: 1 Nov 08
Posts: 3514
Credit: 20,703,370
RAC: 22,945
Sweden
Message 1289353 - Posted: 29 Sep 2012, 16:00:52 UTC - in response to Message 1289326.

See my post here and Chris S.'s message following it.

It looks like a huge shorty storm. The system is being swamped, and you just have to be patient.


Or if you're not capable of being patient, you may find yourself in an institution as a patient :-)
____________

spitfire_mk_2
Joined: 14 Apr 00
Posts: 458
Credit: 12,813,820
RAC: 8,843
United States
Message 1289464 - Posted: 29 Sep 2012, 19:25:05 UTC - in response to Message 1289245.

I have not been able to fill the caches on my GPU crunchers for about a week now... very slow downloads. If anyone has any ideas for us to try, they would be much appreciated.

Cheers,
Chris

Same here. Today is the first day I finally got a bunch of GPU units.

On the other hand, my Einstein@home average is shooting for the moon. ☺
____________

jravin
Joined: 25 Mar 02
Posts: 941
Credit: 102,369,246
RAC: 87,954
United States
Message 1290541 - Posted: 3 Oct 2012, 3:08:27 UTC

Well - now both my machines are drained, except for a few uploads stalled for the last couple of days. A few go through from time to time, but otherwise NOTHING!

Both machines get (on manual Update) "not requesting new work" despite having nothing or very little work - they are both set for 5 or more days of work in the queues, but BOINC is blithely ignoring that.

Is that because of the stalled uploads?

Is there anything I can do besides vacuum out the machines yet again?
____________

arkayn (Project donor)
Volunteer tester
Joined: 14 May 99
Posts: 3688
Credit: 48,724,275
RAC: 6,700
United States
Message 1290544 - Posted: 3 Oct 2012, 3:20:35 UTC - in response to Message 1290541.



Is that because of the stalled uploads?


Yes, as long as the number of stalled uploads is more than twice the number of processors on the machine.

Earlier today I had my GTX-670 machine down to only 8 uploads stuck. It is already back up to 84 currently.

Of course the machine is not in danger of running out anytime soon as I managed to cache about 2700 WU's over the weekend.

The GTX-560 machine on the other hand only has 180 WU's on hand.
____________

jravin
Joined: 25 Mar 02
Posts: 941
Credit: 102,369,246
RAC: 87,954
United States
Message 1290632 - Posted: 3 Oct 2012, 8:27:00 UTC - in response to Message 1290544.



Yes, as long as the number of stalled uploads is more than twice the number of processors on the machine.



When you say "Processors" here, do you mean CPUs only? Because one of my machines uses 3 CPU threads, and the other, being GPU only, uses 0. Or: if it counts GPUs, too, how does it allow for them? 'Twould seem to be a major flaw if it is CPU only, or if it counts 1 GPU same as a CPU thread, eh? (???).
____________

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5864
Credit: 60,501,953
RAC: 47,532
Australia
Message 1290633 - Posted: 3 Oct 2012, 8:41:44 UTC - in response to Message 1290632.

or if it counts 1 GPU same as a CPU thread, eh? (???).

Yep.

____________
Grant
Darwin NT.

jravin
Joined: 25 Mar 02
Posts: 941
Credit: 102,369,246
RAC: 87,954
United States
Message 1290641 - Posted: 3 Oct 2012, 9:12:50 UTC - in response to Message 1290633.

or if it counts 1 GPU same as a CPU thread, eh? (???).

Yep.


IMHO that is really stupid, as a vid card is typically worth several (as in 5-10) CPU threads. Can't they be clever enough to figure that out? I mean, what's all the performance data they have for each machine used for???
____________

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5864
Credit: 60,501,953
RAC: 47,532
Australia
Message 1290643 - Posted: 3 Oct 2012, 9:17:31 UTC - in response to Message 1290641.

what's all the performance data they have for each machine used for???

Helping determine how many WUs you need to fill your cache.
____________
Grant
Darwin NT.

jravin
Joined: 25 Mar 02
Posts: 941
Credit: 102,369,246
RAC: 87,954
United States
Message 1290656 - Posted: 3 Oct 2012, 9:41:54 UTC - in response to Message 1290643.

what's all the performance data they have for each machine used for???

Helping determine how many WUs you need to fill your cache.


Yah, I knew that, but why can't it also be used to be a little more intelligent about when to ask for work - this current screwup is really a pain... There's no reason I can't have some work to do now, I would think, given the overall status of the project, yes?


____________

arkayn (Project donor)
Volunteer tester
Joined: 14 May 99
Posts: 3688
Credit: 48,724,275
RAC: 6,700
United States
Message 1290702 - Posted: 3 Oct 2012, 11:45:48 UTC - in response to Message 1290633.

or if it counts 1 GPU same as a CPU thread, eh? (???).

Yep.


Actually no, it does not count GPUs in that total.

With my Quad-Core, if I have 9+ uploads going, it will not request new work.
____________

Bernie Vine
Volunteer moderator
Volunteer tester
Posts: 7075
Credit: 27,456,913
RAC: 35,482
United Kingdom
Message 1290703 - Posted: 3 Oct 2012, 11:47:14 UTC - in response to Message 1290656.

what's all the performance data they have for each machine used for???

Helping determine how many WUs you need to fill your cache.


Yah, I knew that, but why can't it also be used to be a little more intelligent about when to ask for work - this current screwup is really a pain... There's no reason I can't have some work to do now, I would think, given the overall status of the project, yes?


I don't really think it will make much difference at the moment, downloads aren't downloading and uploads... well you get the picture. Even if you could get work, it takes longer to download than it does for a GPU to crunch.

It has always been said that patience isn't just a virtue on the project, it is a necessity. Just chill out. Whatever the problem, it will eventually be sorted, possibly not today, nor this week or even this month, but it will be sorted.
____________


Today is life, the only life we're sure of. Make the most of today.

jravin
Joined: 25 Mar 02
Posts: 941
Credit: 102,369,246
RAC: 87,954
United States
Message 1290712 - Posted: 3 Oct 2012, 12:38:41 UTC - in response to Message 1290703.

Not really impatient - I've been through lots of stuff at SETI in the past. But I thought a lot of this crap was fixed by the newer equipment, etc.

Maybe the huge shorty storm we've had recently has clogged the pipes up. I know I've had literally 100s, if not 1000s, of shorties lately, and each one takes the same number of bytes to d/l and u/l as a normal WU, so it surely places a larger demand on the pipes to SETI...


____________
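To put rough numbers on the shorty-storm point above, here is a small sketch in Python. It assumes, as the post says, that a shorty costs the same bytes to transfer as a normal WU, and that the GPU is kept busy the whole time. The function names are made up for illustration; the runtimes are taken from the figures quoted in this thread (2-4 minutes versus 15-25 minutes).

```python
def wus_per_hour(runtime_minutes: float, tasks_in_parallel: int = 1) -> float:
    """Work units a device finishes per hour at a given runtime per WU."""
    if runtime_minutes <= 0:
        raise ValueError("runtime_minutes must be positive")
    return tasks_in_parallel * 60.0 / runtime_minutes


def transfer_demand_ratio(normal_minutes: float, shorty_minutes: float) -> float:
    """How many times more transfers per hour shorties need than normal WUs.

    Assumption: every WU is the same size on the wire, so transfers per hour
    is a fair stand-in for bandwidth demand.
    """
    return wus_per_hour(shorty_minutes) / wus_per_hour(normal_minutes)


if __name__ == "__main__":
    # Best and worst cases from the runtimes mentioned in the thread.
    for normal, shorty in ((15, 4), (25, 2)):
        ratio = transfer_demand_ratio(normal, shorty)
        print(f"{normal} min vs {shorty} min WUs: {ratio:.2f}x the transfers")
```

Under those assumptions each host asks for several times more downloads and uploads per hour during a shorty storm, which fits the "clogged pipes" guess without proving it.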

Sliver (Project donor)
Joined: 18 May 11
Posts: 281
Credit: 7,191,152
RAC: 623
United States
Message 1290715 - Posted: 3 Oct 2012, 12:41:19 UTC - in response to Message 1290703.

It has always been said that patience isn't just a virtue on the project, it is a necessity. Just chill out. Whatever the problem, it will eventually be sorted, possibly not today, nor this week or even this month, but it will be sorted.


+1 Well said.
____________

tullio (Project donor)
Joined: 9 Apr 04
Posts: 3754
Credit: 388,028
RAC: 123
Italy
Message 1290716 - Posted: 3 Oct 2012, 12:45:14 UTC

I have two MB jobs on my laptop, one Astropulse on the SUN WS, and a vlar on the Solaris Virtual Machine. I am never out of work since I am running 4 BOINC projects on each Real Machine and one on the Virtual Machine.
Tullio
____________

Josef W. Segur (Project donor)
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 4301
Credit: 1,070,204
RAC: 1,104
United States
Message 1290842 - Posted: 3 Oct 2012, 17:45:54 UTC - in response to Message 1290641.

or if it counts 1 GPU same as a CPU thread, eh? (???).

Yep.


IMHO that is really stupid, as a vid card is typically worth several (as in 5-10) CPU threads. Can't they be clever enough to figure that out? I mean, what's all the performance data they have for each machine used for???

Actually, the count is either CPU or GPU, whichever the host has more of.

The fundamental idea is that it isn't sensible to deliver more work to a host which is processing it faster than results can be returned. The particular multiple used is just what Dr. Anderson originally guessed to be effective for the purpose. The fact that GPUs are faster doesn't alter the fundamental purpose in any way, just makes it likely the limit will be imposed more often.

As long as the downloads are using all available bandwidth, it doesn't make a lot of difference, from the project's point of view, who's getting them. But longer term, the effect on the morale of the most productive participants may be significant. Then again, if everything ran smoothly all of the time I suspect many participants would get bored.
Joe
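
The rule as described in this thread can be written down as a short Python sketch. This is only a model of what the posts say, not the BOINC client's actual code: the function names are hypothetical, the multiple of 2 comes from the "twice the number of processors" remark, and the processor count is taken as the larger of the CPU and GPU counts, per the explanation above.

```python
def upload_backlog_limit(n_cpus: int, n_gpus: int, multiple: int = 2) -> int:
    """Most pending uploads a host may have and still ask for new work.

    Assumption from the thread: the count is CPUs or GPUs, whichever the
    host has more of, times a fixed multiple (2 here).
    """
    return multiple * max(n_cpus, n_gpus)


def may_request_work(pending_uploads: int, n_cpus: int, n_gpus: int,
                     multiple: int = 2) -> bool:
    """True if the upload backlog is small enough to allow a work request."""
    return pending_uploads <= upload_backlog_limit(n_cpus, n_gpus, multiple)


if __name__ == "__main__":
    # Hypothetical quad-core host with one GPU.
    for uploads in (8, 9):
        allowed = may_request_work(uploads, n_cpus=4, n_gpus=1)
        print(f"{uploads} pending uploads -> request work: {allowed}")
```

Under this model a quad-core host stops requesting work once 9 uploads are pending, which matches the "9+ uploads" observation earlier in the thread. A fast GPU host does not get a higher limit, it just reaches the limit sooner.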


Copyright © 2014 University of California