Panic Mode On (114) Server Problems?
Author | Message |
---|---|
Unixchick Send message Joined: 5 Mar 12 Posts: 815 Credit: 2,361,516 RAC: 22 |
They started to reduce the backlog of WU and result deletions again when they fixed Georgem. Anytime the deleters crank up you can't get any work. Down to less than a dozen GPU tasks now on one host. Results out in the field has dropped to 4.7 million and it is usually closer to 5. I think it is better than the system crashing though, but it is bothersome when machine caches run dry. |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
Oh well. Machines are running out of work. I just shifted one to Beta. it's nice that BETA doesn't seem to have this problem.....oh wait. Maybe they should look at Why BETA doesn't have this problem? It might be helpful determining Why BETA doesn't have this problem... |
Stephen "Heretic" Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
Oh well. Machines are running out of work. I just shifted one to Beta. it's nice that BETA doesn't seem to have this problem.....oh wait. . . May I suggest because Beta handles a mere fraction of the traffic through main? Stephen ? ? |
Brent Norman Send message Joined: 1 Dec 99 Posts: 2786 Credit: 685,657,289 RAC: 835 |
Panic ... Warning ... INCOMING !!!! |
Keith Myers Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
Splitters are offline now and RTS buffer at 760K. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Jimbocous Send message Joined: 1 Apr 13 Posts: 1856 Credit: 268,616,081 RAC: 1,349 |
Oh well. Machines are running out of work. I just shifted one to Beta. it's nice that BETA doesn't seem to have this problem.....oh wait. Math. What a concept! :) |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
Oh well. Machines are running out of work. I just shifted one to Beta. it's nice that BETA doesn't seem to have this problem.....oh wait. If you would have thought about it first, you probably wouldn't have posted that. Think about it a little. It just started working again, did the traffic slow down any to speak of? It's a problem with computers communicating with each other, like one saying 'got any work', the other saying, 'sure'. That is obviously not working, and traffic has nothing to do with it. |
Keith Myers Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
Work requests are being recognized and filled again. Panic over. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Stephen "Heretic" Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
Oh well. Machines are running out of work. I just shifted one to Beta. it's nice that BETA doesn't seem to have this problem.....oh wait. . . It isn't just about having the number of tasks but moving that data around, both in the local network and across the global network. Handling a tiny amount of the data that has to be shifted in main makes Beta far less prone to congestion issues. But yes, glitches in data transfer between servers are probably at the core of the problem, but comparing it to Beta is not really a balanced comparison. Stephen <shrug> |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
The thing hadn't sent work for around 2 HOURS, yet there was too much traffic for the machines to talk to each other? Please... |
Keith Myers Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
Yes, I think that was the case. When the file deleters kick in, they create huge I/O contention on the servers that starves out other processes. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
Right.... it stops a simple query from one machine to the other, but, other things keep going. Including queries between the same machines from BETA. If you say so.... |
Stephen "Heretic" Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
Right.... it stops a simple query from one machine to the other, but, other things keep going. Including queries between the same machines from BETA. . . OK, have you heard of a DoS attack? Do you understand how that works? Same principle here except that there is no hostile intent and the traffic is real, not fake. But the outcome is the same. Stephen . . |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
Pretty targeted attack wouldn't you say? It Only affects the query from the scheduler and the RTS machine, everything else is unaffected. Ever heard of a typo in the code? It's much more believable, and has happened before. Again, the traffic Stopped for Two Hours and during those Two Hours the machines never communicated on that One query. Other queries were unaffected. Why didn't other events stop if there was too much traffic to communicate? Ever think of that? Oh, and BETA still worked fine during all this alleged traffic, it uses the Same machines BTW. |
Keith Myers Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
I never really paid any attention to the hardware on Beta. Same servers as Main EXCEPT for Oscar, Carolyn, Paddym, Georgem, Marvin, Lando and Centurion. So yes the projects do share some of the same servers, but there is double the amount of I/O going on at Main compared to Beta just in the number of interconnections to databases. Not even accounting for the 10X number of users. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
Main: scheduling server (synergy), scheduler process (synergy), feeder (synergy), db purge (bruno). BETA: Scheduler (bruno), feeder.el6.x86_64 (synergy). But wait... I thought synergy & bruno were incommunicado. Yet BETA communicated with them just fine. Two Hours on main and the same machines weren't talking because of traffic? Rightttttt |
Stephen "Heretic" Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
Pretty targeted attack wouldn't you say? It Only affects the query from the scheduler and the RTS machine, everything else is unaffected. . . I am not convinced on the MORE believable part but it is certainly a possibility. Again it would have to be in a part of the code where it only manifests under some conditions and very specific activity. Either way we are not in a position to even investigate it properly much less do anything about it. Stephen <shrug> |
Kevin Olley Send message Joined: 3 Aug 99 Posts: 906 Credit: 261,085,289 RAC: 572 |
Project has no tasks available. Out of GPU WU's, Einstein is keeping them warm:-) Kevin |
JohnDK Send message Joined: 28 May 00 Posts: 1222 Credit: 451,243,443 RAC: 1,127 |
OK I'll do it: PANIC again again... |
Ian&Steve C. Send message Joined: 28 Sep 99 Posts: 4267 Credit: 1,282,604,591 RAC: 6,640 |
Tasks don't seem to be validating either. I'm still sending the work back, but RAC keeps dropping and pendings keep increasing. Seti@Home classic workunits: 29,492 CPU time: 134,419 hours |
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.