Message boards :
Number crunching :
The Server Issues / Outages Thread - Panic Mode On! (117)
Author | Message |
---|---|
Unixchick Send message Joined: 5 Mar 12 Posts: 815 Credit: 2,361,516 RAC: 22 |
I've set myself to no new tasks for a while, until the situation gets better. I hope everyone who needs WUs can get them. |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13855 Credit: 208,696,464 RAC: 304 |
"I've set myself to no new tasks for a while, until the situation gets better." Just done that for my Windows system; it's got more than usual to chew on. Just hoping the Linux system can start picking up some work more frequently before it runs out of work again. Edit - too late, it's out of GPU work again. It's odd how one system gets work almost every time, while the other almost never does. Grant Darwin NT |
Unixchick Send message Joined: 5 Mar 12 Posts: 815 Credit: 2,361,516 RAC: 22 |
At what point do the results out in the field begin to be an issue? I'm assuming the db will get too large and the system will crash or slow to a crawl. It is late in California (Friday, 10pm-ish), so hopefully tomorrow someone can look at the issue. I'll just add the disclaimer that while no one guarantees us WUs, and I'm certainly not demanding they come in on the weekend and fix things, I'm sure they want to keep the project up. |
Wiggo Send message Joined: 24 Jan 00 Posts: 36876 Credit: 261,360,520 RAC: 489 |
Well I just set my 3570K rig to no new tasks with both rigs having over 600 tasks each in progress now. :-O "I've set myself to no new tasks for a while, until the situation gets better. I hope everyone who needs WUs can get them." I did that with my 2500K rig when it hit 750 in progress. Cheers. |
Cherokee150 Send message Joined: 11 Nov 99 Posts: 192 Credit: 58,513,758 RAC: 74 |
One of my computers now has 94% more tasks than it is supposed to. My other one is now 122% over the limit. That's 16 hours of non-stop processing for one, and 45 hours for the other! Of course, I have now set them to no new tasks. Much more important, however, is that I think someone really should contact the staff right away. The splitters are now creating as many as 80 tasks per second, and there are over 5.8 million units in the field. If we remember back to when they put the limits on, it was because the number of units out in the field and being returned overloaded the system. SETI was literally choking on units! It caused them a lot of grief and took a long time to clear things out. So, I reiterate, I think that whoever can reach the staff should contact them as soon as possible. Does anyone else concur? |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13855 Credit: 208,696,464 RAC: 304 |
"All hosts are down on GPU work because of the scheduler sending 0 tasks upon request." Yeah, my Linux host is still struggling to get work. Anywhere from 4-8 requests to get any, and then sometimes it's 45, other times only 2. Grant Darwin NT |
Stephen "Heretic" Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
"Well I just set my 3570K rig to no new tasks with both rigs having over 600 tasks each in progress now. :-O ... I did that with my 2500K rig when it hit 750 in progress." . . Well, with the system's WU allocation limiters out on strike, what I am seeing is that I am being assigned the full allocation I am requesting, that is, half a day's work as set in the work fetch settings. So if anyone wishes to limit the size of their cache (and this is recommended), try reducing your work fetch setting to a value close to the limits that should be there. Stephen :) |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13855 Credit: 208,696,464 RAC: 304 |
"Now I have another host with more than 100 CPU tasks. But the host that has been empty of CPU work for hours never gets any when it requests some, and then sets a 1400-second backoff timer. Strange." I reduced my cache setting so my Windows system could get some CPU tasks, but still no joy with the Linux system. The fact is, for every time my Linux system gets work, my Windows system gets it 3-5 times more often. I'll probably have to reduce the cache setting even further so the Linux system can get some CPU work, then gradually bump it up till I can get 24 hours' worth (or as close to it as the new limits will allow). The problem is the ridiculous number of requests it takes to actually get some work... Grant Darwin NT |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14679 Credit: 200,643,578 RAC: 874 |
"Sat 07 Dec 2019 12:57:51 AM PST | SETI@home | Project requested delay of 303 seconds" That's a server backoff. "Sat 07 Dec 2019 12:57:51 AM PST | SETI@home | [work_fetch] backing off CPU 1400 sec" And that's a client backoff. Normal, and unrelated to the server's troubles. |
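For anyone curious how the two differ: the 303-second figure is a fixed delay the scheduler sends back with every reply, while the client-side work-fetch backoff grows after repeated empty replies. A minimal sketch of that client-side pattern, randomized exponential backoff, follows; the constants and function name here are illustrative assumptions, not the actual BOINC source.

```python
import random

# Assumed constants for illustration; BOINC's real min/max values differ.
MIN_BACKOFF = 60        # seconds after the first empty reply
MAX_BACKOFF = 86400     # cap the wait at one day

def next_backoff(failures: int) -> float:
    """Backoff after `failures` consecutive scheduler requests
    that returned no work: exponential growth, capped, randomized."""
    base = min(MIN_BACKOFF * 2 ** (failures - 1), MAX_BACKOFF)
    # Randomize so all clients don't retry at the same moment.
    return random.uniform(base / 2, base)
```

A successful request resets the failure count, which is why a host that finally gets work starts asking frequently again, matching the on-again, off-again fetch behaviour described above.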
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13855 Credit: 208,696,464 RAC: 304 |
"It kinda doesn't make sense. My newer, faster Linux PC should have WAY more WUs than my older one then. It's got less than half what the older one has." If the system isn't meeting the requests for work, then that will happen regardless of what your cache settings and the server limits may be. My systems are a good example: the slower Windows system gets work on pretty much every other request, the faster Linux system on every 4-10 requests. End result: the Windows system has a full cache; the Linux system can't get close to a full cache and is about to run out of CPU work. Grant Darwin NT |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13855 Credit: 208,696,464 RAC: 304 |
Results-out-in-the-field is over 6 million, and still no sign of the Ready-to-send buffer refilling. Grant Darwin NT |
juan BFP Send message Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799 |
"Well, I'm down below 800 tasks on my 2 x GPU machine, and I'm back to 'no work available', with no mention of a limit. Looks like "Zalster's Theorem" is right, but I won't know for certain until I get back from lunch." If that is right, then we reach a new limit for the cache size: 64 x 400 = 25,600 WU!!! LOL Now we are ready for the next generation of GPUs. That raises 2 questions: - What is the new limit for the CPU? 400 WU too? - If you run a >100-thread CPU, will it run a WU on each thread? |
Ian&Steve C. Send message Joined: 28 Sep 99 Posts: 4267 Credit: 1,282,604,591 RAC: 6,640 |
Based on Cliff's info, it appears that the limits are now 200 CPU + 400*[nGPU]. Seti@Home classic workunits: 29,492 CPU time: 134,419 hours |
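If that formula holds, the per-host in-progress cap is easy to work out. A quick sketch, assuming the 200 CPU + 400-per-GPU limits quoted above are correct (the function name is mine, not anything official):

```python
def in_progress_limit(n_gpus: int, cpu_tasks: bool = True) -> int:
    """Per-host in-progress cap, assuming 200 CPU tasks plus 400 per GPU."""
    return (200 if cpu_tasks else 0) + 400 * n_gpus

# These match the observations reported in this thread:
print(in_progress_limit(2, cpu_tasks=False))  # 800: a 2-GPU host running no CPU tasks
print(in_progress_limit(24))                  # 9800: a spoofed 24-GPU host, plus CPU
print(in_progress_limit(64, cpu_tasks=False)) # 25600: the "64 x 400" figure
```

The 2-GPU, no-CPU case lines up with the exactly 800 GPU tasks reported below.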
juan BFP Send message Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799 |
"Based on Cliff's info, it appears that the limits are now 200 CPU + 400*[nGPU]" OK, thanks. To test, I will change my spoofed count to 24, which will give a 9600 GPU + 200 CPU WU cache size... below the 10,000 limit... just in case... One point to be observed during the next outage: what impact could a large number of hosts with such big caches have on the DB? |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14679 Credit: 200,643,578 RAC: 874 |
"Well, I'm down below 800 tasks on my 2 x GPU machine, and I'm back to 'no work available', with no mention of a limit. Looks like "Zalster's Theorem" is right, but I won't know for certain until I get back from lunch." I returned from lunch to find exactly 800 GPU tasks on the 2 x GPU machine. I don't run SETI CPU tasks on that machine, which makes it easier. I think I concur with the 400-per-GPU limit now. |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14679 Credit: 200,643,578 RAC: 874 |
"I doubt the project DBs have been sufficiently upgraded/fixed to be able to cope with this in the long run." I don't think it's quite as bad as that, but I do agree that we should be proceeding with caution, and keeping a careful eye on the database. We had a similar - but much worse - situation in November 2013. That one was worse because comms delays caused a huge number of ghost workunits to be created: they existed in the database, but were not downloaded, so they didn't inhibit clients from requesting more. That was also a weekend, and by the time I wrote (with some trepidation) to Eric, there were over 10.5 million tasks supposedly 'out in the field'. That stat comes from message 1302186: from memory, it took about a week to recover from the mess. That mess-up arose from an ill-judged intervention by David Anderson, and so far David is the only admin to have participated in captainiom / JSM's thread at GitHub. I've posted a consolidated concern which will be notified to both David and Eric, to say "Please keep an eye on the effects of this". Not a nice thing to have to do, again, at a weekend. |
Unixchick Send message Joined: 5 Mar 12 Posts: 815 Credit: 2,361,516 RAC: 22 |
No official answers, but it looks like the group has figured out the new limits. I'll have to think about where I want to set my own limits. In the last 9 hours someone on the SETI end added more files to be split, so someone has been keeping an eye on things. The results out in the field are over 6 million and it is still splitting fine (at a high rate). It still can't keep up, as it has a large backlog of "holes" to fill. I usually set my cache at 10 days, because I knew I wanted the max allowed, and I figured it would never reach that. Is there a daily limit? Will this allow machines with high error rates to go wild? Edit - we are now splitting the blc14s that they pulled a full day ago, so this could add to the issues. |
Unixchick Send message Joined: 5 Mar 12 Posts: 815 Credit: 2,361,516 RAC: 22 |
"I've posted a consolidated concern which will be notified to both David and Eric, to say "Please keep an eye on the effects of this". Not a nice thing to have to do, again, at a weekend." Thank you, Richard. |
Cliff Harding Send message Joined: 18 Aug 99 Posts: 1432 Credit: 110,967,840 RAC: 67 |
I looked at my old log entries and, for me in Eastern time, it seems to have started at: 06-Dec-2019 23:55:00 [SETI@home] Sending scheduler request: To report completed tasks. 06-Dec-2019 23:55:00 [SETI@home] Reporting 2 completed tasks 06-Dec-2019 23:55:00 [SETI@home] Requesting new tasks for CPU and NVIDIA GPU 06-Dec-2019 23:55:02 [SETI@home] Scheduler request completed: got 66 new tasks I don't buy computers, I build them!! |
Bernie Vine Send message Joined: 26 May 99 Posts: 9958 Credit: 103,452,613 RAC: 328 |
"Not a nice thing to have to do, again, at a weekend." At my last company, no new or changed software was ever rolled out on a Friday; Wednesday was preferred, as being the farthest point from a weekend. Monday, people recovering from the weekend; Tuesday, getting up to speed; Thursday, thinking about the weekend; Friday, winding down. ;-) |
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.