Panic Mode On (74) Server problems?



Message boards : Number crunching : Panic Mode On (74) Server problems?

Author Message
msattler
Volunteer tester
Joined: 9 Jul 00
Posts: 37308
Credit: 499,207,246
RAC: 505,917
United States
Message 1229549 - Posted: 9 May 2012, 15:14:34 UTC

My guess is that we are again seeing some kind of scheduler/feeder limitation.
I agree that even with no AP using bandwidth, MB alone has shown the capability of fully saturating the bandwidth.

On the other hand, NOT saturating the bandwidth may actually be making better use of it......
____________
******************
Crunching Seti, loving all of God's kitties.

I have met a few friends in my life.
Most were cats.

HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 3570
Credit: 98,000,723
RAC: 79,184
United States
Message 1229581 - Posted: 9 May 2012, 16:33:38 UTC - in response to Message 1229549.

My guess is that we are again seeing some kind of scheduler/feeder limitation.
I agree that even with no AP using bandwidth, MB alone has shown the capability of fully saturating the bandwidth.

On the other hand, NOT saturating the bandwidth may actually be making better use of it......

My machines are no longer uploading/requesting tasks 1 or 2 at a time, as they seem to have filled to their cache settings. So we may be looking at a normal bandwidth graph again, which is how it would often look in the days before limits, sans AP or shorties.
Not to say all requests are being fulfilled, just that there are not so many transfers in progress to keep the bandwidth pegged 24/7.
____________
SETI@home classic workunits: 93,865 CPU time: 863,447 hours

Join the BP6/VP6 User Group today!

msattler
Volunteer tester
Joined: 9 Jul 00
Posts: 37308
Credit: 499,207,246
RAC: 505,917
United States
Message 1230054 - Posted: 10 May 2012, 16:36:54 UTC
Last modified: 10 May 2012, 16:37:54 UTC

With the increased limits and the scheduler/feeder not having tasks available all the time....

The dang Boinc scheduler bug is kicking up again.

My #1 rig, not banging up against the limits anymore, is getting plenty of work for the GPU, but the scheduler is once again letting the CPUs go idle, not sending them a drop of work because the GPU cache is not full yet.
So the CPUs are twiddling their thumbs.

Dang it, DA....please quit starving the slower resources completely just because the fastest ones do not have their caches full!!!
____________
******************
Crunching Seti, loving all of God's kitties.

I have met a few friends in my life.
Most were cats.

HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 3570
Credit: 98,000,723
RAC: 79,184
United States
Message 1230066 - Posted: 10 May 2012, 17:00:42 UTC - in response to Message 1230054.

With the increased limits and the scheduler/feeder not having tasks available all the time....

The dang Boinc scheduler bug is kicking up again.

My #1 rig, not banging up against the limits anymore, is getting plenty of work for the GPU, but the scheduler is once again letting the CPUs go idle, not sending them a drop of work because the GPU cache is not full yet.
So the CPUs are twiddling their thumbs.

Dang it, DA....please quit starving the slower resources completely just because the fastest ones do not have their caches full!!!

I thought there was talk about that being corrected in the v7 client, but then there is the odd high/low work fetch system it uses.
____________
SETI@home classic workunits: 93,865 CPU time: 863,447 hours

Join the BP6/VP6 User Group today!

msattler
Volunteer tester
Joined: 9 Jul 00
Posts: 37308
Credit: 499,207,246
RAC: 505,917
United States
Message 1230068 - Posted: 10 May 2012, 17:07:14 UTC - in response to Message 1230066.

With the increased limits and the scheduler/feeder not having tasks available all the time....

The dang Boinc scheduler bug is kicking up again.

My #1 rig, not banging up against the limits anymore, is getting plenty of work for the GPU, but the scheduler is once again letting the CPUs go idle, not sending them a drop of work because the GPU cache is not full yet.
So the CPUs are twiddling their thumbs.

Dang it, DA....please quit starving the slower resources completely just because the fastest ones do not have their caches full!!!

I thought there was talk about that being corrected in the v7 client, but then there is the odd high/low work fetch system it uses.


I don't believe this has ANYTHING to do with the Boinc client.
The host continually asks for GPU 'AND' CPU tasks, but is repeatedly ONLY sent GPU work.
____________
******************
Crunching Seti, loving all of God's kitties.

I have met a few friends in my life.
Most were cats.

Alex Storey
Volunteer tester
Joined: 14 Jun 04
Posts: 533
Credit: 1,575,159
RAC: 476
Greece
Message 1230070 - Posted: 10 May 2012, 17:11:00 UTC - in response to Message 1230068.

I don't believe ANYTHING that has to do with the Boinc client.


There, I fixed it:)

red-ray
Joined: 24 Jun 99
Posts: 308
Credit: 9,024,991
RAC: 0
United Kingdom
Message 1230072 - Posted: 10 May 2012, 17:13:57 UTC - in response to Message 1230068.
Last modified: 10 May 2012, 17:52:33 UTC

I thought there was talk about that being corrected in the v7 client, but then there is the odd high/low work fetch system it uses.

No, I have 7.0.25 on my QX6700 and it's got the same problem, so having V7 does not help with this server issue.

I would like to see a bigger fifo so fewer requests are needed to replenish the cache.

msattler
Volunteer tester
Joined: 9 Jul 00
Posts: 37308
Credit: 499,207,246
RAC: 505,917
United States
Message 1230073 - Posted: 10 May 2012, 17:15:03 UTC - in response to Message 1230072.

I thought there was talk about that being corrected in the v7 client, but then there is the odd high/low work fetch system it uses.

No, I have 7.0.25 on my QX6700 and it's got the same problem.

It's not the client....
It's what the scheduler logic does with the client request.
____________
******************
Crunching Seti, loving all of God's kitties.

I have met a few friends in my life.
Most were cats.

Richard Haselgrove
Volunteer tester
Joined: 4 Jul 99
Posts: 8275
Credit: 44,946,776
RAC: 13,604
United Kingdom
Message 1230078 - Posted: 10 May 2012, 17:33:06 UTC - in response to Message 1230073.

I thought there was talk about that being corrected in the v7 client, but then there is the odd high/low work fetch system it uses.

No, I have 7.0.25 on my QX6700 and it's got the same problem.

It's not the client....
It's what the scheduler logic does with the client request.

And by scheduler, Mark means the scheduler that runs on the server - that is indeed where this particular problem lies.

msattler
Volunteer tester
Joined: 9 Jul 00
Posts: 37308
Credit: 499,207,246
RAC: 505,917
United States
Message 1230081 - Posted: 10 May 2012, 17:34:51 UTC - in response to Message 1230078.
Last modified: 10 May 2012, 17:51:47 UTC

I thought there was talk about that being corrected in the v7 client, but then there is the odd high/low work fetch system it uses.

No, I have 7.0.25 on my QX6700 and it's got the same problem.

It's not the client....
It's what the scheduler logic does with the client request.

And by scheduler, Mark means the scheduler that runs on the server - that is indeed where this particular problem lies.

Thank you, Richard.

Of my top 3 rigs, 2 are now running GPU only due to this bug.
The only reason the 3rd is not is that the CPU is running on cached AP work with the manually installed AP app. Otherwise, it would be in the same boat.
____________
******************
Crunching Seti, loving all of God's kitties.

I have met a few friends in my life.
Most were cats.

red-ray
Joined: 24 Jun 99
Posts: 308
Credit: 9,024,991
RAC: 0
United Kingdom
Message 1230092 - Posted: 10 May 2012, 17:57:42 UTC - in response to Message 1230081.
Last modified: 10 May 2012, 18:00:21 UTC

If you stop BOINC and set a biggish duration_correction_factor you will just get CPU work for a while. The reason my 980X gets CPU WUs is that the DCF jumps to 6 when a slow GPU task finishes, and the system just asks for CPU WUs till it drops.

Wow, the 980X has just hit 4,000 WUs cached.
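
For readers unfamiliar with the mechanism red-ray is describing, here is a minimal sketch of how a duration_correction_factor (DCF) spike skews work fetch. This is illustrative Python only, not actual BOINC client code; the update rule and the numbers are simplified assumptions:

```python
# Illustrative sketch of BOINC-style duration correction (not real BOINC code).
# The client scales the server's raw runtime estimate by the host's DCF, which
# is (roughly) pulled up sharply by slow finishes and decays down only slowly.

def estimated_runtime(raw_estimate_hours, dcf):
    """Client-side estimate: raw server estimate scaled by the host DCF."""
    return raw_estimate_hours * dcf

def update_dcf(dcf, raw_estimate_hours, actual_hours):
    """Crude model of the asymmetric DCF update: jump up fast, drift down slowly."""
    ratio = actual_hours / raw_estimate_hours
    if ratio > dcf:
        return ratio                      # one slow task drags DCF straight up
    return dcf + 0.1 * (ratio - dcf)      # fast tasks only pull it down gradually

dcf = 1.0
# One pathologically slow GPU task (estimated 0.5 h, actual 3 h) spikes DCF to 6:
dcf = update_dcf(dcf, 0.5, 3.0)

# With DCF at 6, a GPU cache estimated at 50 raw hours now "looks like" 300 hours
# of work, so the client stops asking for GPU tasks until DCF decays again,
# and only the under-committed resource (the CPU) keeps requesting work.
inflated = estimated_runtime(50, dcf)
print(dcf, inflated)  # 6.0 300.0
```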

msattler
Volunteer tester
Joined: 9 Jul 00
Posts: 37308
Credit: 499,207,246
RAC: 505,917
United States
Message 1230096 - Posted: 10 May 2012, 18:05:48 UTC - in response to Message 1230092.

If you stop BOINC and set a biggish duration_correction_factor you will just get CPU work for a while. The reason my 980X gets CPU WUs is that the DCF jumps to 6 when a slow GPU task finishes, and the system just asks for CPU WUs till it drops.

Wow, the 980X has just hit 4,000 WUs cached.

I have enough GPU work to last a bit, so I am going to do the 'uncheck use nvidia GPU' trick to get some CPU work flowing.

But that is a workaround, and should not be necessary.


____________
******************
Crunching Seti, loving all of God's kitties.

I have met a few friends in my life.
Most were cats.

LadyL
Volunteer tester
Joined: 14 Sep 11
Posts: 1679
Credit: 5,230,097
RAC: 0
Message 1230097 - Posted: 10 May 2012, 18:06:25 UTC - in response to Message 1230072.

I would like to see a bigger fifo so fewer requests are needed to replenish the cache.


It's called the feeder.

The usual workaround is to disable the resource in the project prefs that is getting all the tasks, until the 'slower' has some sort of cache.

The other option would be to reduce cache, allow the slower resource to catch up and then gradually increase cache again.

It will eventually get sorted by itself, but if you have a large cache to fill, it may take quite a while until you have single resource requests again instead of double ones.
____________
I'm not the Pope. I don't speak Ex Cathedra!
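
The workaround LadyL describes can be sketched as follows. This is a hypothetical Python model, not real BOINC scheduler code; the function names and numbers are invented for illustration. The point is that a scheduler which fills the GPU part of a mixed request first can drain the available tasks before the CPU part is considered, while a CPU-only request (GPU unchecked in the project prefs) gets CPU work straight away:

```python
# Hypothetical model of the starvation pattern discussed in this thread
# (not actual BOINC code; names and numbers are made up for illustration).

def build_request(prefs, cpu_shortfall, gpu_shortfall):
    """A host asks for work for every enabled resource that has a shortfall."""
    req = {}
    if cpu_shortfall > 0:
        req["cpu_seconds"] = cpu_shortfall
    if prefs.get("use_gpu", True) and gpu_shortfall > 0:
        req["gpu_seconds"] = gpu_shortfall
    return req

def serve_request(req, feeder_tasks):
    """Toy scheduler that satisfies the GPU part of a request first."""
    sent = {"gpu": 0, "cpu": 0}
    gpu_want = req.get("gpu_seconds", 0)
    cpu_want = req.get("cpu_seconds", 0)
    while feeder_tasks and gpu_want > 0:    # GPU-first ordering is the problem
        gpu_want -= feeder_tasks.pop(0)
        sent["gpu"] += 1
    while feeder_tasks and cpu_want > 0:    # CPU only sees what is left over
        cpu_want -= feeder_tasks.pop(0)
        sent["cpu"] += 1
    return sent

# Ten 1-hour tasks available; the host wants ~11 GPU-hours and ~5.5 CPU-hours.
both = serve_request(build_request({"use_gpu": True}, 20000, 40000), [3600] * 10)
cpu_only = serve_request(build_request({"use_gpu": False}, 20000, 40000), [3600] * 10)
print(both, cpu_only)  # {'gpu': 10, 'cpu': 0} {'gpu': 0, 'cpu': 6}
```

With both resources enabled, the GPU request swallows everything and the CPUs sit idle; turn the GPU off in the prefs and the same host finally gets CPU work.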

Sten-Arne
Volunteer tester
Joined: 1 Nov 08
Posts: 3307
Credit: 16,307,047
RAC: 13,783
Sweden
Message 1230136 - Posted: 10 May 2012, 19:29:33 UTC

Who stole all APs? Or, who stole the AP splitters?
____________

Cosmic_Ocean
Joined: 23 Dec 00
Posts: 2204
Credit: 8,013,519
RAC: 4,280
United States
Message 1230235 - Posted: 10 May 2012, 21:43:44 UTC - in response to Message 1230136.

Who stole all APs? Or, who stole the AP splitters?

It may be that since AP_v505 is now down to 8 still out in the field, new tapes will be held until the last of those 505's comes in and they can do that DB kick thing (which may have already been done since "awaiting validation" isn't 10k+ anymore).

Or.. it could just be that there were so many tapes loaded up in the first place that we're now at that point where we have to sit around and wait for the MB splitters to catch up to get some new tapes loaded for AP to split.
____________

Linux laptop uptime: 1484d 22h 42m
Ended due to UPS failure, found 14 hours after the fact

arkayn
Volunteer tester
Joined: 14 May 99
Posts: 3543
Credit: 46,151,801
RAC: 30,644
United States
Message 1230303 - Posted: 10 May 2012, 23:27:22 UTC - in response to Message 1230235.

Who stole all APs? Or, who stole the AP splitters?

It may be that since AP_v505 is now down to 8 still out in the field, new tapes will be held until the last of those 505's comes in and they can do that DB kick thing (which may have already been done since "awaiting validation" isn't 10k+ anymore).

Or.. it could just be that there were so many tapes loaded up in the first place that we're now at that point where we have to sit around and wait for the MB splitters to catch up to get some new tapes loaded for AP to split.


I am down to 18 from 65 a couple of days ago.
____________

Cosmic_Ocean
Joined: 23 Dec 00
Posts: 2204
Credit: 8,013,519
RAC: 4,280
United States
Message 1230426 - Posted: 11 May 2012, 6:05:02 UTC

Well, this is just starting to be almost slightly irritating. Because of the adjustments to the estimates, I ended up with like a 22-day AP-only cache, and therefore my average turnaround time was in the high teens. The result of this was that most of my wingmates were waiting for me, so I ended up with nearly every reported result being validated immediately.

But since there hasn't been new work going out and my cache is now down in the ~8-day range, I'm starting to pick up more and more pendings when I report. Oh well. That's the way it goes.
____________

Linux laptop uptime: 1484d 22h 42m
Ended due to UPS failure, found 14 hours after the fact

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5564
Credit: 51,346,693
RAC: 40,582
Australia
Message 1230437 - Posted: 11 May 2012, 6:27:44 UTC


I still reckon something's not quite right.
I'm not getting as many "Project has no tasks available" messages as I was, but I'm still getting more than I usually do even when network traffic is maxed out. Given how (relatively) low the traffic has been, I would expect to get hardly any, if any, such messages when requesting work.
____________
Grant
Darwin NT.

Richard Haselgrove
Volunteer tester
Joined: 4 Jul 99
Posts: 8275
Credit: 44,946,776
RAC: 13,604
United Kingdom
Message 1230442 - Posted: 11 May 2012, 6:33:22 UTC - in response to Message 1230437.


I still reckon something's not quite right.
I'm not getting as many "Project has no tasks available" messages as I was, but I'm still getting more than I usually do even when network traffic is maxed out. Given how (relatively) low the traffic has been, I would expect to get hardly any, if any, such messages when requesting work.

Well, the tasks are going out, because we're now over 5.5 million out in the field. I don't know how big that figure can be before the database starts slowing down...

red-ray
Joined: 24 Jun 99
Posts: 308
Credit: 9,024,991
RAC: 0
United Kingdom
Message 1230458 - Posted: 11 May 2012, 8:24:49 UTC - in response to Message 1230437.
Last modified: 11 May 2012, 8:51:52 UTC

I still reckon something's not quite right.
I'm not getting as many "Project has no tasks available" messages as i was, but i'm still getting more than i usually do even when network traffic is maxed out. Given how (relatively) low the traffic has been i would expect to get hardly any, if any, such messages when requesting work.

Now that there are no limits, I expect many hosts are asking for and getting the entire feeder buffer. Getting WUs is going to be a problem till all the caches are full. I feel it would help a lot if the feeder had a bigger buffer.

I am puzzled as to why the Result average turnaround is dropping though.
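
The feeder bottleneck red-ray mentions can be modelled roughly like this. A toy Python sketch with an assumed slot count and simplified refill behaviour (not taken from the actual BOINC server code): a small fixed buffer sits between the database and the scheduler, and one host with an empty cache can drain it, so the next request sees "no tasks available" even though plenty of work exists in the database:

```python
from collections import deque

# Toy model of a BOINC-style feeder (illustrative only; the slot count and
# refill behaviour are assumptions, not taken from the real server code).

class Feeder:
    def __init__(self, slots=100):
        self.slots = slots
        self.buffer = deque()

    def refill(self, db_ready):
        """Periodically top the shared buffer up from the database."""
        while len(self.buffer) < self.slots and db_ready:
            self.buffer.append(db_ready.pop(0))

    def take(self, n):
        """A scheduler instance grabs up to n tasks for one host request."""
        got = []
        while self.buffer and len(got) < n:
            got.append(self.buffer.popleft())
        return got

db = list(range(1000))       # plenty of work sits ready in the database
feeder = Feeder(slots=100)
feeder.refill(db)

big = feeder.take(100)       # one hungry host with an empty cache drains it
small = feeder.take(4)       # the next host sees "no tasks available"
print(len(big), len(small))  # 100 0
```

A bigger buffer (or fewer giant requests while caches refill) would make the "no tasks available" replies rarer, which is exactly the change being wished for here.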


Copyright © 2014 University of California