WTF is Going On???? One Machine running out of WUs, No Changes...
Author | Message |
---|---|
Cruncher-American Send message Joined: 25 Mar 02 Posts: 1513 Credit: 370,893,186 RAC: 340 |
I run 2 crunchers, let's call them F(ermibox2) and U(nimatrix002). They are primarily GPU crunchers, very similar (2 x GTX 460) and have been running for a while with queues from 1K to 2K WUs each. Both have had some problems recently with downloads that take a long while to complete. Yet now, F has over 1000 WUs, with a handful stuck in transit, while U is down to about 50, all stuck in transit. So the machine is effectively dead. Yet nothing has changed and both can access the Internet OK. F has been running 306.23 drivers for a week or two, U is still running 275.xx, if that matters. Is there anything I can check to find out what's wrong? Are we back to the SETI router d/l problems of a few months ago? ???? Thanks for any suggestions... |
Wiggo Send message Joined: 24 Jan 00 Posts: 34744 Credit: 261,360,520 RAC: 489 |
F: could be having problems with that driver, 301.xx is supposed to be the choice here but even with my GTX550 TI's and GTX560 I'll stick with 285.62 drivers myself. I won't even say anything about the version of BOINC that you are using as that's already been beaten into the ground enough times. ;) Cheers. |
ChrisSibbald Send message Joined: 23 Jul 11 Posts: 18 Credit: 23,582,502 RAC: 0 |
I have not been able to fill my caches on my GPU crunchers in about a week now...very slow downloads. If anyone has any ideas for us to try it is much appreciated. Cheers, Chris |
Cruncher-American Send message Joined: 25 Mar 02 Posts: 1513 Credit: 370,893,186 RAC: 340 |
F: could be having problems with that driver, 301.xx is supposed to be the choice here but even with my GTX550 TI's and GTX560 I'll stick with 285.62 drivers myself. Doubt that it is a driver problem - that wouldn't affect the d/l in any event. And I don't know if this is relevant, but it sure contributes to the problem - all the WUs I am getting are super-shorties - they run only 2-4 minutes instead of 15-25, so I run out faster even when I manage to d/l a few. Bah! |
Donald L. Johnson Send message Joined: 5 Aug 02 Posts: 8240 Credit: 14,654,533 RAC: 20 |
See my post here and Chris S. message following it. It looks like a huge shorty storm. The system is being swamped, and you just have to be patient. Donald Infernal Optimist / Submariner, retired |
spitfire_mk_2 Send message Joined: 14 Apr 00 Posts: 563 Credit: 27,306,885 RAC: 0 |
I have not been able to fill my caches on my GPU crunchers in about a week now...very slow downloads. If anyone has any ideas for us to try it is much appreciated. Same here. Today is first day I finally got a bunch of GPU units. On the other hand, my Einstein@home average is shooting for the moon. ☺ |
Cruncher-American Send message Joined: 25 Mar 02 Posts: 1513 Credit: 370,893,186 RAC: 340 |
Well - now both my machines are drained, except for a few uploads stalled for the last couple of days. A few go through from time to time, but otherwise NOTHING! Both machines get (on manual Update) "not requesting new work" despite having nothing or very little work - they are both set for 5 or more days of work in the queues, but BOINC is blithely ignoring that. Is that because of the stalled uploads? Is there anything I can do besides vacuum out the machines yet again? |
arkayn Send message Joined: 14 May 99 Posts: 4438 Credit: 55,006,323 RAC: 0 |
Yes, as long as it is more than twice the number of processors on the machine. Earlier today I had my GTX-670 machine down to only 8 uploads stuck. It is already back up to 84 currently. Of course the machine is not in danger of running out anytime soon as I managed to cache about 2700 WU's over the weekend. The GTX-560 machine on the other hand only has 180 WU's on hand. |
Cruncher-American Send message Joined: 25 Mar 02 Posts: 1513 Credit: 370,893,186 RAC: 340 |
When you say "Processors" here, do you mean CPUs only? Because one of my machines uses 3 CPU threads, and the other, being GPU only, uses 0. Or: if it counts GPUs, too, how does it allow for them? 'Twould seem to be a major flaw if it is CPU only, or if it counts 1 GPU same as a CPU thread, eh? (???). |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13722 Credit: 208,696,464 RAC: 304 |
or if it counts 1 GPU same as a CPU thread, eh? (???). Yep. Grant Darwin NT |
Cruncher-American Send message Joined: 25 Mar 02 Posts: 1513 Credit: 370,893,186 RAC: 340 |
or if it counts 1 GPU same as a CPU thread, eh? (???). IMHO that is really stupid, as a vid card is typically worth several (as in 5-10) CPU threads. Can't they be clever enough to figure that out..I mean, what's all the performance data they have for each machine used for??? |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13722 Credit: 208,696,464 RAC: 304 |
what's all the performance data they have for each machine used for??? Helping determine how many WUs you need to fill your cache. Grant Darwin NT |
Cruncher-American Send message Joined: 25 Mar 02 Posts: 1513 Credit: 370,893,186 RAC: 340 |
what's all the performance data they have for each machine used for??? Yah, I knew that, but why can't it also be used to be a little more intelligent about when to ask for work - this current screwup is really a pain... there's no reason I can't have some work to do now, I would think, given the overall status of the project, yes? |
arkayn Send message Joined: 14 May 99 Posts: 4438 Credit: 55,006,323 RAC: 0 |
|
Bernie Vine Send message Joined: 26 May 99 Posts: 9954 Credit: 103,452,613 RAC: 328 |
what's all the performance data they have for each machine used for??? I don't really think it will make much difference at the moment, downloads aren't downloading and uploads... well you get the picture. Even if you could get work, it takes longer to download than it does for a GPU to crunch. It has always been said that patience isn't just a virtue on the project it is a necessity. Just chill out. Whatever the problem it will eventually be sorted, possibly not today, nor this week or even this month, but it will be sorted. |
Cruncher-American Send message Joined: 25 Mar 02 Posts: 1513 Credit: 370,893,186 RAC: 340 |
Not really impatient - I've been through lots of stuff at SETI in the past. But I thought a lot of this crap was fixed by the newer equipment, etc. Maybe the huge shorty storm we've had recently has clogged the pipes up. I know I've had literally 100s, if not 1000s, of shorties lately, and each one takes the same number of bytes to d/l and u/l as a longer WU, so it surely places a larger demand on the pipes to SETI... |
Akio Send message Joined: 18 May 11 Posts: 375 Credit: 32,129,242 RAC: 0 |
It has always been said that patience isn't just a virtue on the project it is a necessity. Just chill out. Whatever the problem it will eventually be sorted, possibly not today, nor this week or even this month, but it will be sorted. +1 Well said. |
tullio Send message Joined: 9 Apr 04 Posts: 8797 Credit: 2,930,782 RAC: 1 |
I have two MB jobs on my laptop, one Astropulse on the SUN WS, and a vlar on the Solaris Virtual Machine. I am never out of work since I am running 4 BOINC projects on each Real Machine and one on the Virtual Machine. Tullio |
Josef W. Segur Send message Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0 |
or if it counts 1 GPU same as a CPU thread, eh? (???). Actually, the count is either CPU or GPU, whichever the host has more of. The fundamental idea is that it isn't sensible to deliver more work to a host which is processing it faster than results can be returned. The particular multiple used is just what Dr. Anderson originally guessed to be effective for the purpose. The fact that GPUs are faster doesn't alter the fundamental purpose in any way, just makes it likely the limit will be imposed more often. As long as the downloads are using all available bandwidth, it doesn't make a lot of difference who's getting them from the project view. But longer term, the effect on the morale of the most productive participants may be significant. Then again, if everything ran smoothly all of the time I suspect many participants would get bored. Joe |
Cruncher-American Send message Joined: 25 Mar 02 Posts: 1513 Credit: 370,893,186 RAC: 340 |
An interesting phenomenon now: Both machines have (finally!) uploaded all their results, but BOINC thinks that one has 63 WUs "In Progress", and the other, 120. Is this an example of "ghost" WUs that BOINC thinks were sent, but never received by my babies? |
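
The stalled-upload rule discussed in the thread (arkayn's "more than twice the number of processors" and Josef W. Segur's note that the count is CPUs or GPUs, whichever the host has more of) can be sketched roughly as below. This is only an illustration of the rule as the posters describe it, not BOINC's actual code; the function name, parameter names, and the host figures in the example are hypothetical.

```python
def work_fetch_blocked(stalled_uploads: int, n_cpus: int, n_gpus: int,
                       multiple: int = 2) -> bool:
    """Return True if new work would NOT be requested.

    Assumption taken from the thread: the client stops asking for work
    once stalled uploads exceed `multiple` times the larger of the CPU
    count and the GPU count. The default multiple of 2 follows arkayn's
    "more than twice the number of processors".
    """
    limit = multiple * max(n_cpus, n_gpus)
    return stalled_uploads > limit


# Hypothetical host: 4 CPU threads, 2 GPUs, so the limit is 2 * 4 = 8.
few_stuck = work_fetch_blocked(stalled_uploads=8, n_cpus=4, n_gpus=2)
many_stuck = work_fetch_blocked(stalled_uploads=84, n_cpus=4, n_gpus=2)
print(few_stuck, many_stuck)
```

Under these assumed numbers, 8 stuck uploads sits exactly at the limit and work is still requested, while 84 stuck uploads blocks work fetch. Because a GPU counts the same as one CPU thread, a fast GPU-only host reaches the limit sooner, which matches the "not requesting new work" messages reported above.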