Panic Mode On (58) Server problems?
Author | Message |
---|---|
Les Joined: 20 May 99 Posts: 53 Credit: 21,062,237 RAC: 18 |
Joe - Thanks for the suggestion and of course you were right for all of the B3_P1 that made it to my computer, 21 to 26 seconds. |
Cosmic_Ocean Joined: 23 Dec 00 Posts: 3027 Credit: 13,516,867 RAC: 13 |
Thanks for that information Joe. Just cleared 8 of mine. I was playing a game for a few hours and finished up and noticed that my 10-day cache was filled. Saw your message about B3_P1 and noticed a few in my cache. Let those run and early-exit, report, and got some new APs right away. Linux laptop: record uptime: 1511d 20h 19m (ended due to the power brick giving-up) |
MikeN Joined: 24 Jan 11 Posts: 319 Credit: 64,719,409 RAC: 85 |
I got 21 APs last night spread over three separate crunchers. Only one problem: they have estimated completion times of 234 hours (i.e. 10 days) when they will actually run in 12 hours. As a result I am now not getting any other work, as BOINC thinks my 10-day cache is full. I am currently running shorties in HP mode. No doubt when I do run the APs, BOINC will recalculate the completion times on my remaining MBs to be 20 times shorter than they will actually take and fill my caches to the 50 WU per core limit. |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14679 Credit: 200,643,578 RAC: 874 |
While not strictly a server problem, the B3_P1 Astropulse problem is here again. The ap_20ap11ad_B3_P1_00338_20111007_07731.wu I'm downloading now has the pattern of damaged data we've seen before on that channel which can be expected to give a "Blanking too much RFI" immediate exit. Might I suggest that people don't try to return too many of these in succession, especially if they're new to AP crunching? So far as I can tell, we still have no outlier detection in the AP validator, so you could still run into APR problems. |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13855 Credit: 208,696,464 RAC: 304 |
Does anyone want to hazard a guess as to when GPU crunching estimates will finally come back to realistic values? The DCF on both of my machines is now around 1.0 & CPU estimates are close enough. But the GPU ones are still way out- and although they slowly get closer to the actual value (although never even remotely near it) as soon as a CPU WU is done, it's back to where it was. Grant Darwin NT |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14679 Credit: 200,643,578 RAC: 874 |
Remember the sequence of events. Problem identified - first attempt at solution - collateral damage - interim precautionary measures - need second (permanent) solution - final staged return to normality - remove precautionary measures. And all sorts of other network issues got in the way, as well. We're part way through several of those interim steps. Precautionary measures (limit on work in progress) - has been lifted from '50 CPU tasks' to '50 tasks per CPU core'. Nobody has mentioned whether the GPU version has been raised to 'per GPU' yet - I can't test (only single GPU rigs here). Permanent solution - we need to detect 'outliers', and exclude them from DCF/APR calculations. That's been done for MB, but not yet (so far as I can tell) for AP. With Joe's news about the stuck bit, there will be a lot of outliers around - we need that finishing off. Staged return to normality - we've had two steps so far, x2 and x5. I'd be happy to argue for another x5 during maintenance this coming Tuesday: I think we're ready for that, provided the servers hold together over the weekend, and they can get the outlier test into the AP validator. But all we can do from the outside is observe, advise, warn, cajole..... The decisions will be taken in Berkeley. |
Blake Bonkofsky Joined: 29 Dec 99 Posts: 617 Credit: 46,383,149 RAC: 0 |
I can comment on limits. My machines have been banging off of the limiters for a while. i7 Quad Core with HyperThreading on (8 cores) + 3x GTX460's has been at 400 CPU/1200 GPU. Quad core + Single GTX460 is at 200/400, and i3 dual core with HT is at 200/400 as well. Sooo, it is back to per core/GPU (50/400) |
kittyman Joined: 9 Jul 00 Posts: 51478 Credit: 1,018,363,574 RAC: 1,004 |
[quote]I can comment on limits. My machines have been banging off of the limiters for a while. i7 Quad Core with HyperThreading on (8 cores) + 3x GTX460's has been at 400 CPU/1200 GPU. Quad core + Single GTX460 is at 200/400, and i3 dual core with HT is at 200/400 as well. Sooo, it is back to per core/GPU (50/400)[/quote] Looks about right, based on cache levels I am seeing here. The kitties are a little happier with a bit more kibble in the bowls. So far, so good with the server code adjustments. "Time is simply the mechanism that keeps everything from happening all at once." |
S@NL Etienne Dokkum Joined: 11 Jun 99 Posts: 212 Credit: 43,822,095 RAC: 0 |
Maybe the guys in the lab could comment on how and when AP validation will resume ??? They're sending out new WU's but still no returned result gets validated... |
kittyman Joined: 9 Jul 00 Posts: 51478 Credit: 1,018,363,574 RAC: 1,004 |
Server status does show the validator running... Could possibly be a return of the AP validator bug that brought things to a crawl some time ago. "Time is simply the mechanism that keeps everything from happening all at once." |
jason_gee Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
Could be nearly time for 'That Duck' ... "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
Wiggo Joined: 24 Jan 00 Posts: 36838 Credit: 261,360,520 RAC: 489 |
[quote]Could be nearly time for 'That Duck' ...[/quote] Ahh.., duck hunting season is soon to begin. :D Cheers. |
kittyman Joined: 9 Jul 00 Posts: 51478 Credit: 1,018,363,574 RAC: 1,004 |
[quote]Could be nearly time for 'That Duck' ...[/quote] Ooooooo noooooooooooooooo. Not the return of 'the little yellow fluffy thing whose name shall not be spoken'............. "Time is simply the mechanism that keeps everything from happening all at once." |
Fred J. Verster Joined: 21 Apr 04 Posts: 3252 Credit: 31,903,643 RAC: 0 |
|
Claggy Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4 |
Yep, it looks that way. I started my XP3200+/HD4650/8400 GS host on the Stock apps the other day; at this time it's got 5 validated 6.03 tasks, one being a -9, and APR does not look hugely different from the anonymous platform entry (I think I was running Stock 6.03, but I only have two validations there). The next question is why 'Number of tasks completed' reads one less than 'Consecutive valid tasks'. I suppose 'Number of tasks completed' included task Number 0. Claggy |
KWSN Ekky Ekky Ekky Joined: 25 May 99 Posts: 944 Credit: 52,956,491 RAC: 67 |
I have plenty of work here but uploads seem to have stopped at this end. Cricket he say good. Cricket speak with forked tongue? |
MikeN Joined: 24 Jan 11 Posts: 319 Credit: 64,719,409 RAC: 85 |
I cannot remember the last time the daily cricket graph looked the way it does at present - a perfect, solid mass of green. Long may it continue. http://fragment1.berkeley.edu/newcricket/grapher.cgi?target=%2Frouter-interfaces%2Finr-250%2Fgigabitethernet2_3;ranges=d%3Aw%3Am%3Ay;view=Octets |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14679 Credit: 200,643,578 RAC: 874 |
[quote]Ok, whoever it is that have "the yellow one" in custody, prepare to bring it out in about 30 minutes.[/quote] You mean ? |
KWSN Ekky Ekky Ekky Joined: 25 May 99 Posts: 944 Credit: 52,956,491 RAC: 67 |
[quote]I cannot remember the last time the daily cricket graph looked the way it does at present - a perfect, solid mass of green. Long may it continue.[/quote] But if nothing is actually uploading? |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14679 Credit: 200,643,578 RAC: 874 |
[quote]Yep, it looks that way, I started my XP3200+/HD4650/8400 GS host on the Stock apps the other day, at this time it's got 5 validated 6.03 tasks, one being a -9,[/quote] As you know, I reverted my host 3751792 to stock when you reported the 'no work to stock GPUs' bug earlier this week. I'm seeing correct runtime estimates with a DCF currently around 1.2 (it'll drop back again when I reach the next batch of shorties). Since I'm running GPU only, and the card is well fast enough to show APR/estimate anomalies, I'm assuming that the 'non-anonymous-platform' bits of [trac]changeset:24217[/trac] were never applied here, even though we know the change in the ratio limit from 2 to 10 is active. I think the verdict on the APR outlier code is the old Scottish standby of 'not proven', but we ought to test it sometime, for both the stock and anonymous platform cases. On the task counts, I guess it's an initialisation issue like the one I got him to fix in [trac]changeset:23637[/trac]. Feel free..... |
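
For anyone wondering why the thread keeps returning to 'outliers': a rough sketch of the idea, not the actual BOINC server or AP validator code. The `Result` class, the `early_exit` flag and the `flops_estimate` value are all made up for illustration; only the 12-hour normal runtime and the roughly 25-second B3_P1 early exit come from the posts above. The point is just that one immediate-exit task, counted as if it had done the full estimated work, inflates the average processing rate (APR) and so shrinks the runtime estimate for everything else.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Result:
    """One returned task (hypothetical structure, not a BOINC type)."""
    flops_estimate: float   # estimated work for the task (assumed value)
    elapsed_seconds: float  # measured run time
    early_exit: bool        # True for an immediate exit such as "Blanking too much RFI"


def average_processing_rate(results: List[Result],
                            exclude_outliers: bool = True) -> Optional[float]:
    """Mean of flops_estimate / elapsed_seconds, optionally skipping early exits."""
    rates = [
        r.flops_estimate / r.elapsed_seconds
        for r in results
        if r.elapsed_seconds > 0 and not (exclude_outliers and r.early_exit)
    ]
    if not rates:
        return None
    return sum(rates) / len(rates)


def estimated_runtime(flops_estimate: float, apr: float) -> float:
    """Runtime estimate in seconds for a new task at the given APR."""
    return flops_estimate / apr


WORK = 4.32e16  # assumed work per AP task, chosen so 12 hours gives 1e12 flops/s

history = [
    Result(WORK, 12 * 3600, early_exit=False),
    Result(WORK, 12 * 3600, early_exit=False),
    Result(WORK, 25, early_exit=True),   # B3_P1-style immediate exit
]

apr_clean = average_processing_rate(history, exclude_outliers=True)
apr_skewed = average_processing_rate(history, exclude_outliers=False)

hours_clean = estimated_runtime(WORK, apr_clean) / 3600
hours_skewed = estimated_runtime(WORK, apr_skewed) / 3600
```

With the early exit excluded, the estimate for the next task stays at about 12 hours; with it included, the estimate collapses to well under a minute, which is the kind of distortion that leads a client to fetch far more work than it can finish.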