Panic Mode On (110) Server Problems?
Author | Message |
---|---|
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
"If your CPU caches are full before your GPU caches, you're probably asking for too much work." . . Hi Richard, . . I can't say that I follow that line of thought ... Stephen |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
No, this is the old scheduler issue from a couple of Xmases past that still affects some hosts, most definitely mine. I am very familiar with the symptoms and the fixes necessary. It is a separate issue from the schedulers only issuing Arecibo non-VLAR work from the RTS buffer to Nvidia hosts. The scheduler is just hung up at the moment and we will have to wait for staff to fix things in the morning. They do fix it, but they never post what their fix is, which is what I am most interested in. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14679 Credit: 200,643,578 RAC: 874 |
Well, I just opened up my three micro-managed crunchers for their breakfast, and all three got exactly what they requested at the first attempt. Up to their 'max tasks in progress', and I don't imagine the staff intervened at all on a Sunday evening. |
rob smith Joined: 7 Mar 03 Posts: 22540 Credit: 416,307,556 RAC: 380 |
It's the "you're worrying too much" flag being set to give you something to worry about. Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? |
Bernie Vine Joined: 26 May 99 Posts: 9958 Credit: 103,452,613 RAC: 328 |
It's odd, but I don't think I have ever seen the problem being discussed. Even when I had a couple of problem crunchers and was checking several times a day. Now that I have "retired" all the old machines, I probably check a couple of times a day. I always tend to check before I go to bed and first thing in the morning, and I always see full caches. Currently, as one machine is new and the older one is now a full-time cruncher, I see rising RAC as well!! Both machines have reasonable GPUs and the new one has a better CPU as well. |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
The problem has been discussed ad nauseam here in Number Crunching. I don't know how you could say you had never seen it discussed. The problem goes back to Xmas 2016, when the staff attempted to fix the schedulers for an issue with ATI cards not getting work. It didn't work and they ended up releasing a different application. But the result of their actions was that some hosts with Nvidia cards stopped getting work and instead get "no work is available" messages, along with "no AP tasks are available", when there is plenty of available work in the RTS buffer. The cause is the project preferences setting "If no work for selected applications is available, accept work from other applications?". If you have that set along with AP work, you get the "no work is available" messages at each work request. Until you toggle off that setting and/or the AP setting and do a project Update to change the preferences, you get squat and your caches plummet to zero. You might have to do the Triple Update trick to wake up the schedulers too. Lots of others are affected and have commented on the issue many times, Grant being one of them. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
Here you go. I just set this host to AP: yes, SETI@home v8: no, and "If no work for selected applications is available, accept work from other applications?" = yes. Up until now it's been receiving work; let's see how it works now: https://setiathome.berkeley.edu/results.php?hostid=7769537 It's been a while since I tried it... so anything can happen. |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14679 Credit: 200,643,578 RAC: 874 |
Task 6496227687, work unit 2905174336, sent 19 Mar 2018, 16:31:43 UTC; task 6496227760, work unit 2905174285, sent 19 Mar 2018, 16:31:43 UTC. Both MB, NVidia GPU |
Bernie Vine Joined: 26 May 99 Posts: 9958 Credit: 103,452,613 RAC: 328 |
"I don't know how you could say you had never seen it discussed" Ah, I think there is a little misunderstanding here; let me rephrase. I have never experienced the problem currently being discussed. Is that clearer? |
Brent Norman Joined: 1 Dec 99 Posts: 2786 Credit: 685,657,289 RAC: 835 |
I actually have seen very few problems from the ATI fallout lately. I have not changed my Yes, Yes, No other work settings in ages. The extent of my kicks is to simply suspend 1 task, wait for the 5 minute timer to expire, resume the task, and do a manual update. But that doesn't help when the problem, as it was yesterday, is the server cache being stuffed full of Arecibo VLARs. I didn't run out of tasks, but came very close to it. It's just luck of the draw to get the non-VLARs when they become available. |
Iona Joined: 12 Jul 07 Posts: 790 Credit: 22,438,118 RAC: 0 |
I don't think that is what Bernie meant or tried to say, Keith. What I am pretty sure he is saying is, he has not had (or, seen) the problem being discussed. A slightly different wording perhaps, but the same meaning. Lucky Bernie! Edit. No way this took several minutes to write, either! Don't take life too seriously, as you'll never come out of it alive! |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
Sorry about the misunderstanding. I am sensitive to the problem because I am greatly affected. Not everyone with Nvidia is affected, just some. We have discussed it at length and compared settings and hardware to try to understand why some hosts are affected and others are not. No insight has been gained yet. As I stated in my message, I am very familiar with the symptoms and have a bag of tricks to get the schedulers to acknowledge a task deficit and refill the caches. The one that Brent mentioned is one of them, being a variation on the 'ghost task recovery' protocol. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
"I actually have seen very few problems from the ATI fallout lately. I have not changed my Yes, Yes, No other work settings in ages." Yes, it is the luck of the draw when the RTS buffer has nothing but Arecibo VLARs to send. My luck is crap as usual. I came very close to running out multiple times, with only a trickle of 1 task delivered for every 15 returned. Eventually the mass of Arecibo VLARs cleared out and the usual mix of Arecibo non-VLARs and BLC tasks started to return. It would be nice if they removed the old restriction on sending Arecibo VLARs to Nvidia cards, or gave us a new setting in Preferences to allow it. So there must have been some sort of event in the splitters where nothing was coming out of the BLC tapes and only Arecibo was filling the RTS buffers, to have created the pocket of Arecibo VLARs in the first place. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14679 Credit: 200,643,578 RAC: 874 |
"So there must have been some sort of event in the splitters where nothing was coming out of the BLC tapes and only Arecibo was filling the RTS buffers to have created the pocket of Arecibo VLARs in the first place." I think that we were running pretty close to the high-water mark for RTS yesterday (*). I think the rule for the high-water mark can be summed up as something like "finish whatever you're doing, then go take a break." So the next question is, when they need to come back onstream, who gets first dibs? Consciously or unconsciously, somebody comes first. It might be the index on a database table, in which case Arecibo was here first and is likely to have the low numbers. Or maybe A(recibo) comes before B(reakthrough) in the alphabet. I think any 'event' affecting the Breakthrough splitters is likely to be nothing more suspicious than that they weren't needed, and hence didn't get the 'back to work, lads' message until the last Arecibo tape was finished. Remember Eric made extra Arecibo splitters available when he couldn't get data from Green Bank quickly enough, so they could probably keep up with demand with no RTS shortfall to fill on top. * Confirmed by Haveland. |
Brent Norman Joined: 1 Dec 99 Posts: 2786 Credit: 685,657,289 RAC: 835 |
I have seen quite a few times where the BLC splitters won't start back up when the RTS is full, UNTIL the script runs for timed-out tasks; then they fire back up again. Yesterday, at the end of the VLAR flood, I got lucky and received ~150 resends right before the BLC splitters came back to life. I have often seen one of my computers get resends right before the BLCs restart. |
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
"I don't know how you could say you had never seen it discussed" . . Yep, I think it is, I was confused before as well :) Stephen :) |
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
"I actually have seen very few problems from the ATI fallout lately. I have not changed my Yes, Yes, No other work settings in ages." . . It seems there is a block of Blc01 tapes that are somehow stalled in the splitters, but the more recently mounted Blc02 tapes are splitting nicely and I am currently seeing only Blc02 WUs. Maybe someone needs to give those splitters working on the Blc01 tapes a bit of a kick too :) . . As to removing the bar on sending Arecibo VLARs to Nvidia cards, that could be a bit of a disaster; there are still a lot of older cards out there on rigs running the older CUDA apps. An option to accept them on a rig with Nvidia cards would be useful for people running more modern hardware and apps like SoG and special sauce variants. These can cope quite well with those VLAR tasks even if they are a little slow. Stephen |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14679 Credit: 200,643,578 RAC: 874 |
If there's enough 'ready to send' (which there is), it really doesn't matter which tape is currently active. Ever since Breakthrough Listen came online, it's seemed that the newest tapes are active first. Maybe that's an active decision to do the new stuff quickly, maybe it's a side effect of some other selection process. No matter. They all get done in the end. |
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
"If there's enough 'ready to send' (which there is), it really doesn't matter which tape is currently active. Ever since Breakthrough Listen came online, it's seemed that the newest tapes are active first. Maybe that's an active decision to do the new stuff quickly, maybe it's a side effect of some other selection process. No matter. They all get done in the end." . . It'll be right on the night, you reckon? Stephen :) |
Speedy Joined: 26 Jun 04 Posts: 1643 Credit: 12,921,799 RAC: 89 |
I see that six new 100 GB tapes were loaded this morning. I wonder if these are the new "standard" size? Can anyone remember if the number of channels per tape is still the same? The size of the tapes has doubled, but I think the number of tasks is still the same because there are still only 128 channels. |
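
Richard's high-water-mark explanation in this thread can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not the project's actual splitter code: the `Splitter` class, the `simulate` function, the rates, the tape sizes, and the alphabetical "who gets first dibs" rule are all hypothetical, chosen to show how Arecibo splitters that keep up with demand could leave the BLC splitters idle until the last Arecibo tape is finished.

```python
from dataclasses import dataclass


@dataclass
class Splitter:
    name: str        # illustrative label, e.g. "arecibo" or "blc"
    rate: int        # tasks produced per tick while running
    remaining: int   # tasks left on this splitter's tapes


def simulate(splitters, high_water, demand, ticks):
    """Return, for each tick, the names of the splitters that produced work."""
    buffer = high_water  # start with a full ready-to-send (RTS) buffer
    history = []
    for _ in range(ticks):
        buffer = max(0, buffer - demand)  # hosts download tasks
        active = []
        # Assumed priority rule: alphabetical order decides who is woken first.
        for s in sorted(splitters, key=lambda s: s.name):
            if buffer >= high_water:
                break  # buffer is full again, so later splitters stay idle
            if s.remaining == 0:
                continue  # this splitter's tapes are finished
            made = min(s.rate, s.remaining, high_water - buffer)
            s.remaining -= made
            buffer += made
            active.append(s.name)
        history.append(active)
    return history


# Hypothetical numbers: Arecibo alone can cover the demand of 100 tasks per tick.
splitters = [Splitter("blc", 100, 1000), Splitter("arecibo", 100, 300)]
history = simulate(splitters, high_water=1000, demand=100, ticks=5)
for tick, names in enumerate(history, start=1):
    print(tick, names)
```

In this model the BLC splitter only appears in the per-tick list once the Arecibo tapes are exhausted, which matches the "they weren't needed" reading. If the real selection is driven by a database index or something else entirely, only the sort key would change.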