Panic Mode On (116) Server Problems?

Message boards : Number crunching : Panic Mode On (116) Server Problems?

Boiler Paul

Joined: 4 May 00
Posts: 232
Credit: 4,965,771
RAC: 64
United States
Message 1987736 - Posted: 29 Mar 2019, 11:17:17 UTC

I also just get a blank page when trying to view the server status page. Also, my recently reported completed tasks are not showing up on my task page. I reported 26 completed tasks at around 11:00 UTC, but they don't show. Credit shows an increase; the tasks just aren't showing on the task page. Weird.
Profile Unixchick Project Donor
Joined: 5 Mar 12
Posts: 815
Credit: 2,361,516
RAC: 22
United States
Message 1987748 - Posted: 29 Mar 2019, 13:54:07 UTC

Still no stats page, so we are flying blind. The bits of info listed in the posts here point to a problem with Carolyn. The good news is that files are still getting downloaded and the RTS isn't empty.
Profile Someone537833 Project Donor
Volunteer tester

Joined: 3 Dec 17
Posts: 21
Credit: 38,445,632
RAC: 113
Canada
Message 1987751 - Posted: 29 Mar 2019, 14:40:32 UTC

Hello.

Having the servers down seems to be a fairly common occurrence. Has there ever been an attempt at using BOINC distributed computing or other methods to take pressure off the servers?

For example, I have unlimited data on a 75 Mb/s connection and could host files. My ISP offers up to 300 Mb/s, which I could upgrade to.

Another idea: could I download one of the 50 GB files that need to be split, split it on my computer, and then upload the split files?

Another is simply hosting storage, with encryption keys confirming validity, and uploading the data to the SETI servers when needed.
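The "confirming validity" part of this idea can work without trusting the volunteer host: the project publishes a digest of each file, and whoever downloads the file from a volunteer mirror recomputes the digest before using it. Below is a minimal Python sketch of that check, assuming SHA-256 digests; the function names `file_digest` and `verify_file` are hypothetical and are not part of BOINC or SETI@home. Strictly speaking, this uses a hash rather than encryption.

```python
import hashlib
import os
import tempfile


def file_digest(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks so a large file is never loaded whole."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_file(path: str, expected_hex: str) -> bool:
    """Hypothetical check: True only if the file matches the digest published by the project."""
    return file_digest(path) == expected_hex.lower()


if __name__ == "__main__":
    # Demo: the "project" computes the digest, a "volunteer mirror" serves the bytes.
    data = b"example split file contents"
    published = hashlib.sha256(data).hexdigest()
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(data)
        path = tmp.name
    try:
        print(verify_file(path, published))  # True: untouched file
        with open(path, "ab") as f:
            f.write(b"tampered")
        print(verify_file(path, published))  # False: file was modified
    finally:
        os.remove(path)
```

A plain digest only protects the data if the expected value comes from the project's own servers rather than from the same volunteer host; if digests were also distributed through volunteers, they would need a digital signature from the project.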

I'm not familiar with the process of actually splitting the files, and storage, but maybe there are other ideas?

If I'm out of line, please forgive me. Just looking to help.

Thank you.
Profile Bill G Special Project $75 donor
Joined: 1 Jun 01
Posts: 1282
Credit: 187,688,550
RAC: 182
United States
Message 1987752 - Posted: 29 Mar 2019, 14:45:23 UTC - in response to Message 1987748.  
Last modified: 29 Mar 2019, 14:46:18 UTC

Still no stats page, so we are flying blind. The bits of info listed in the posts here point to a problem with Carolyn. The good news is that files are still getting downloaded and the RTS isn't empty.

You can't prove that by me; I can no longer get to the server status page, only a blank page here. (I'm referring to the RTS.)

SETI@home classic workunits 4,019
SETI@home classic CPU time 34,348 hours
Profile Unixchick Project Donor
Joined: 5 Mar 12
Posts: 815
Credit: 2,361,516
RAC: 22
United States
Message 1987758 - Posted: 29 Mar 2019, 15:23:23 UTC

I've got one of the slowest machines... Up until now I haven't been bothered by the recent download errors. I did just get the transient HTTP error this time (only downloading one file). There is still a connection issue, but the RTS still has WUs. I've been getting 28mr19ad recently.
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1987763 - Posted: 29 Mar 2019, 15:36:22 UTC - in response to Message 1987758.  

Even with my workarounds from last night, woke up with all machines in various lengths of backoffs. Up to 4 1/2 hours. Down only a couple hundred tasks or so on each machine. So the backoffs hadn't been too long overnight or the caches would have been down much more.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
Profile Tom M
Volunteer tester

Joined: 28 Nov 02
Posts: 5124
Credit: 276,046,078
RAC: 462
Message 1987765 - Posted: 29 Mar 2019, 15:38:48 UTC - in response to Message 1987751.  


If I'm out of line, please forgive me. Just looking to help.


You're not out of line. Part of the issue is having enough money for a system redesign. Basically, that portion of the Seti@Home project would need a complete redesign to make it modular enough to allow what you are proposing.

If we could get a redesign that provided a true client/server relationship, with us as the clients and Berkeley as a remotely accessible database, then it might very well be possible to redistribute a lot of the back-end data analysis, not just the middle-end data crunching we do.

As I understand the explanation, the back-end data analysis requires access to "all the data". So if the redesign included a true client/server relationship, those of us out in the field could contribute to the analysis, because the database would be accessible via the Internet rather than only from a "local" data center/supercomputer.

I am guessing that this would take a multi-million dollar project to do such a redesign and then do the programming.

Since Seti@Home has been a shoestring project ever since the NSF grant(s) expired and it started shedding staff, I kinda doubt the possibility.

Tom
A proud member of the OFA (Old Farts Association).
Profile Jimbocous Project Donor
Volunteer tester
Joined: 1 Apr 13
Posts: 1853
Credit: 268,616,081
RAC: 1,349
United States
Message 1987772 - Posted: 29 Mar 2019, 15:57:57 UTC - in response to Message 1987765.  
Last modified: 29 Mar 2019, 15:58:34 UTC


If I'm out of line, please forgive me. Just looking to help.
... Basically, that portion of the Seti@Home project would need a complete redesign to make it more modular to allow what you are proposing.... Tom
Correct me if I'm wrong, but this wouldn't be a SETI thing anyway.
It would be a BOINC thing, as the architecture is universal across all BOINC projects. The SETI aspect of it is mainly the apps that deal with the MB, GBT and AP (i.e. project-specific) data.

Look at any other BOINC project out there, and you'll see the same server architecture, doing the same functions and often even with the same or similar page names within the site. Any redesign there would be a DA thing, as far as I can see, and I've never seen much discussion about massive redesigns in the BOINC forums.

Having said that, SETI is probably the only BOINC project large enough to stress an otherwise well-thought-out architecture, so any drive for change would logically begin here.
Profile Siran d'Vel'nahr
Volunteer tester
Joined: 23 May 99
Posts: 7379
Credit: 44,181,323
RAC: 238
United States
Message 1987777 - Posted: 29 Mar 2019, 16:07:32 UTC

Server status page is back.
CAPT Siran d'Vel'nahr - L L & P _\\//
Winders 11 OS? "What a piece of junk!" - L. Skywalker
"Logic is the cement of our civilization with which we ascend from chaos using reason as our guide." - T'Plana-hath
Profile Unixchick Project Donor
Joined: 5 Mar 12
Posts: 815
Credit: 2,361,516
RAC: 22
United States
Message 1987778 - Posted: 29 Mar 2019, 16:08:08 UTC

Status page is back. My personal status page is also starting to catch up, but I think it will take some time to process everything and get it current.
Profile Tom M
Volunteer tester

Joined: 28 Nov 02
Posts: 5124
Credit: 276,046,078
RAC: 462
Message 1987779 - Posted: 29 Mar 2019, 16:09:06 UTC - in response to Message 1987772.  


Having said that, SETI is probably the only BOINC project large enough to stress an otherwise well-thought-out architecture, so any drive for change would logically begin here.


Hmmmm..... Most of the projects don't have regular maintenance on Tuesdays. I thought that was a "size of database" issue? And what you're telling me is Seti has "too much data" :) Sigh.
A proud member of the OFA (Old Farts Association).
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1987782 - Posted: 29 Mar 2019, 16:11:43 UTC - in response to Message 1987778.  
Last modified: 29 Mar 2019, 16:12:38 UTC

Status page is back. My personal status page is also starting to catch up, but I think it will take some time to process everything and get it current.
Just in the nick of time. RTS 29,472, after a 10-hour hiatus (replica behind 37,026 seconds).
Profile Jimbocous Project Donor
Volunteer tester
Joined: 1 Apr 13
Posts: 1853
Credit: 268,616,081
RAC: 1,349
United States
Message 1987790 - Posted: 29 Mar 2019, 16:33:34 UTC - in response to Message 1987779.  

Having said that, SETI is probably the only BOINC project large enough to stress an otherwise well-thought-out architecture, so any drive for change would logically begin here.


Hmmmm..... Most of the projects don't have regular maintenance on Tuesdays. I thought that was a "size of database" issue? And what you're telling me is Seti has "too much data" :) Sigh.
Whenever things get troublesome, it seems the first thing you see on the SSP is the database trying to catch up. I seem to recall some discussion from Eric and others that SAH's needs exceed the capabilities of their current database, and that short of porting to a completely different (and expensive) database solution, anything else that gets done tends to be a band-aid.
All I know is that when you look at other projects, active workunit counts tend to be in the hundreds and thousands, not the multi-millions that are seen here. Victim of our own success, perhaps.
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 1987806 - Posted: 29 Mar 2019, 18:12:51 UTC

Now getting "Project has no tasks available"

No panic, just the latest observation.
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

Profile Unixchick Project Donor
Joined: 5 Mar 12
Posts: 815
Credit: 2,361,516
RAC: 22
United States
Message 1987820 - Posted: 29 Mar 2019, 19:11:50 UTC

Looks like only one splitter is running, plus whatever resends there are.

Hoping they will restart the splitters at full steam once Carolyn catches up.
Cherokee150

Joined: 11 Nov 99
Posts: 192
Credit: 58,513,758
RAC: 74
United States
Message 1987830 - Posted: 29 Mar 2019, 20:06:37 UTC
Last modified: 29 Mar 2019, 20:08:53 UTC

Regarding the continuing data handling problems with SETI:
Has anyone considered splitting the massive databases in half, with each half placed on a separate server? The split might work well if it were done chronologically: say, one half containing data from 1999-2009 and the other containing data from 2010 to the present. This would greatly reduce the load on each database and its server. It would, of course, be necessary to modify any existing software that requires access to both sets. However, some software might need no changes; it could simply be run twice, once per database set, and the results from the two runs combined.
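The "run it once per database set and combine the results" step is essentially scatter-gather over a date-partitioned database. The following is a minimal Python sketch, assuming the two halves behave like independent databases that accept the same query. Two in-memory SQLite databases stand in for the two servers, and the `signals` table and its columns are hypothetical, not SETI@home's actual schema.

```python
import sqlite3
from datetime import date

# Hypothetical schema: one row per candidate signal, with its observation date.
SCHEMA = "CREATE TABLE signals (id INTEGER PRIMARY KEY, observed TEXT, power REAL)"
# Partition boundary from the post: 1999-2009 in one half, 2010-present in the other.
CUTOFF = date(2010, 1, 1)


def make_partition() -> sqlite3.Connection:
    """Create one empty partition (an in-memory stand-in for a separate server)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    return conn


def route(observed: date, old: sqlite3.Connection, new: sqlite3.Connection) -> sqlite3.Connection:
    """Pick the partition that owns a given observation date."""
    return old if observed < CUTOFF else new


def insert(observed: date, power: float, old: sqlite3.Connection, new: sqlite3.Connection) -> None:
    """Write a row to the partition chosen by its date."""
    route(observed, old, new).execute(
        "INSERT INTO signals (observed, power) VALUES (?, ?)",
        (observed.isoformat(), power),
    )


def strongest(conn: sqlite3.Connection, limit: int) -> list:
    """Run the unchanged query against a single partition."""
    return conn.execute(
        "SELECT observed, power FROM signals ORDER BY power DESC LIMIT ?", (limit,)
    ).fetchall()


def strongest_overall(partitions: list, limit: int) -> list:
    """Scatter: query every partition. Gather: merge and re-rank the partial results."""
    rows = [row for conn in partitions for row in strongest(conn, limit)]
    return sorted(rows, key=lambda r: r[1], reverse=True)[:limit]


if __name__ == "__main__":
    old, new = make_partition(), make_partition()
    insert(date(2003, 5, 1), 12.5, old, new)
    insert(date(2008, 7, 9), 30.1, old, new)
    insert(date(2015, 2, 3), 22.0, old, new)
    insert(date(2018, 11, 20), 5.4, old, new)
    print(strongest_overall([old, new], 2))
    # [('2008-07-09', 30.1), ('2015-02-03', 22.0)]
```

Merging works here because the overall top N is always contained in the union of each half's top N. Queries that must relate rows across the cutoff (for example, matching a 2005 signal with a 2015 one) cannot be combined this simply; that is the kind of software the post says would need modification.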

I understand that this database and project software redesign would require some time and effort, but it might well be worth doing, as it is becoming increasingly apparent that the existing software, database, and hardware are approaching their absolute limits of functionality. Should something not be done, SETI may soon be unable to continue, simply because it has been so very successful!

Perhaps some of the truly top-notch IT professionals in our vast community would be willing to help Eric and the others at Berkeley offload some of the burden of this large undertaking. As we in IT know, a team effort makes otherwise overwhelming tasks doable.

I know many of us care enough to offer whatever assistance we can.

I think it would be good to toss this around and see if it might be a solution to SETI's ongoing difficulties.
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1987841 - Posted: 29 Mar 2019, 21:22:54 UTC - in response to Message 1987666.  

It's getting late and I still have download problems. The Hammer deal works for me; it's about all that does work at the moment. The biggest problem is the Mac with 5 GPUs and a 500 WU cache: it can't seem to make it 5 minutes without stalling a download. This morning it was out of work with a cache full of stalled downloads. I can't leave it more than a few hours or it stops working.


. . Yes this d/l problem means the rigs need constant babysitting here too. I find if you catch the issue when it first starts (just one or two stalled d/ls) they restart more easily and the others tend not to stall once the data is flowing.

Stephen

. . :(
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1987844 - Posted: 29 Mar 2019, 21:28:12 UTC - in response to Message 1987691.  

Nice Keith. My systems have been pretty hands off for me all day. Once I changed it back to max xfers 2, I pretty much didn’t have to touch it.

Now I wonder what’s going on with the stagnant RAC. Prior to the outage on Tuesday, my RAC was steadily climbing. I took the hit from the outage and the beast running out of work. But expected RAC to recover after a day or two like it usually does. But still RAC has been stagnant for several days now.


. . My RACs on all machines have been steadily diving since the download problem started. Like you, I expected them to dive for a day or two and then recover, but the graph has gone steadily downhill since then.

Stephen

:(
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1987847 - Posted: 29 Mar 2019, 21:33:13 UTC - in response to Message 1987721.  

And the Haveland graphs are starved for data as well.
Yes, the data sources went dark at - it seems - exactly 05:00 UTC.

But in the two hours before that, the replica database started to fall behind and the MB result creation rate fell to near zero. Not looking good.


. . I have been getting 'no tasks available' for the last 5 hours plus.

Stephen

? ?
kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1987854 - Posted: 29 Mar 2019, 21:50:39 UTC

Sent a message to Eric.

Meow.
"Freedom is just Chaos, with better lighting." Alan Dean Foster



 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.