Panic Mode On (108) Server Problems?

Message boards : Number crunching : Panic Mode On (108) Server Problems?

Jeff Buck
Volunteer tester
Joined: 11 Feb 00
Posts: 1433
Credit: 146,932,558
RAC: 193,269
United States
Message 1904294 - Posted: 2 Dec 2017, 2:09:44 UTC - in response to Message 1904284.  

Uhh, where on the SSP is the information requested? I see no mention of the OS or database software each server is running.
I don't know about the OS, but the DB software is mentioned in those brief DB descriptions at the very beginning of the Glossary: "mysql" for the master and "informix" for the science DBs.
Keith Myers
Volunteer tester
Joined: 29 Apr 01
Posts: 3051
Credit: 203,094,642
RAC: 286,327
United States
Message 1904298 - Posted: 2 Dec 2017, 2:25:07 UTC - in response to Message 1904294.  

Ahh, thanks for pointing that out to me, Jeff. I was looking for that information in the Hosts table describing the hardware.
Seti@Home classic workunits:20,676 CPU time:74,226 hours
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 9171
Credit: 118,616,085
RAC: 50,639
Australia
Message 1904301 - Posted: 2 Dec 2017, 2:43:52 UTC - in response to Message 1904292.  
Last modified: 2 Dec 2017, 2:45:05 UTC

Or maybe we set a minimum deadline of 1 week, to allow for Seti server issues as well as local issues: any system that returns work within 1 week or less gets a 1-week deadline. For systems with an average turnaround time of over 1 week, the deadline is their average turnaround time + 1 week.
So a system that takes 8 days would get a deadline of 8 + 7 = 15 days; one that takes 23 days to return work would get 30 days, and so on.

I don't know that it even needs to be cut that close. Heck, even a three or four week cushion (for the longer-running tasks) would effect a significant improvement, I think.

Even the longest running tasks generally don't take that long to run as such, so more time for longer tasks isn't really necessary; it'd be better just to have a larger minimum return time to allow for problems.

My C2D is presently doing its last CPU tasks, taking about 3 hrs 15 min for Arecibo VLARs (when it's also processing GPU work, those can take up to 7 hours).
The main issue for long return times isn't so much how long it takes to actually process a WU, but how long it takes on systems that aren't on 24/7, or that have rather silly (IMHO) BOINC settings (e.g. Suspend when non-BOINC CPU usage is above 25%), combined with running multiple projects.
So to allow for system outages (both cruncher and Seti), you could go with the present minimum of 2 weeks, or even bump it up to 3 weeks, but still base it on the application's average turnaround time. For applications with a turnaround time of 3 weeks or less, make the deadline 3 weeks. For those that take longer than 3 weeks, the deadline should be their application turnaround time + 3 weeks.
Basically all work has a 3-week safety margin.
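A minimal sketch of the deadline rule described above, in Python for illustration only. The function name and parameters are hypothetical, and this is not how the SETI@home/BOINC scheduler actually sets deadlines; it just applies the rule as stated: work whose average turnaround is at or under the margin gets the margin itself as its deadline, and slower work gets its average turnaround plus the margin. The quoted 1-week variant is the same rule with a 7-day margin.

```python
"""Hypothetical sketch of the task-deadline rule proposed in this thread.

Illustration only; not how the SETI@home / BOINC scheduler actually
assigns deadlines.
"""


def proposed_deadline_days(avg_turnaround_days: float, margin_days: float = 21.0) -> float:
    """Return a task deadline, in days, under the proposed rule.

    avg_turnaround_days: the average turnaround time the rule is based on
        (hypothetical input; the post suggests the application's average).
    margin_days: the safety margin (21 days in the 3-week proposal,
        7 days in the quoted 1-week variant).
    """
    if avg_turnaround_days < 0 or margin_days < 0:
        raise ValueError("turnaround and margin must be non-negative")
    # Fast returners: the margin alone is the deadline.
    if avg_turnaround_days <= margin_days:
        return margin_days
    # Slow returners: their average turnaround plus the margin.
    return avg_turnaround_days + margin_days


if __name__ == "__main__":
    # Quoted 1-week variant: 8 days -> 8 + 7 = 15, 23 days -> 23 + 7 = 30.
    for days in (3, 8, 23):
        print(f"1-week margin, {days}-day turnaround -> {proposed_deadline_days(days, margin_days=7)} days")
    # 3-week proposal: fast hosts get 21 days, a 30-day host gets 51 days.
    for days in (5, 30):
        print(f"3-week margin, {days}-day turnaround -> {proposed_deadline_days(days):g} days")
```

Read literally, the rule jumps at the margin: with a 21-day margin, a 21-day average turnaround gets a 21-day deadline, while a 22-day average gets 43 days.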

This should significantly reduce the time it takes to get a validated result without affecting the slower systems' ability to do Seti work, and it would also reduce the load on the main database (which we know isn't the issue this time around, but has been several times in the past).
Grant
Darwin NT
Keith Myers
Volunteer tester
Joined: 29 Apr 01
Posts: 3051
Credit: 203,094,642
RAC: 286,327
United States
Message 1904305 - Posted: 2 Dec 2017, 3:00:01 UTC

All very sensible suggestions today about how to achieve slimmer databases and shorten the deadlines. I saw a post that as long as Eric was happy with the deadlines, all was good. My opinion is that the "grrrr -grumble" factor of being matched up with slow wingmen and a large pending count would dramatically go down if the deadlines were shortened.

But we aren't here to make setizens happy ... are we?
betreger (Project Donor)
Joined: 29 Jun 99
Posts: 6748
Credit: 16,373,082
RAC: 18,353
United States
Message 1904308 - Posted: 2 Dec 2017, 3:34:01 UTC - in response to Message 1904305.  
Last modified: 2 Dec 2017, 3:36:50 UTC

NO! We are here to find ETI.
Jeff Buck
Volunteer tester
Joined: 11 Feb 00
Posts: 1433
Credit: 146,932,558
RAC: 193,269
United States
Message 1904313 - Posted: 2 Dec 2017, 4:19:30 UTC - in response to Message 1904240.  

One thing I think I've noticed about the Android devices, also, is that many of them are sent tasks that fail every time, simply because the devices don't support the assigned app. I've seen some that have never returned an error-free task, even after many months of crunching, yet their owners don't seem to notice.
Just to put a little something behind this statement I made earlier, here are three such machines that are/were wingmen on some of my currently assigned tasks:

http://setiathome.berkeley.edu/show_host_detail.php?hostid=8262256
http://setiathome.berkeley.edu/show_host_detail.php?hostid=8365042
http://setiathome.berkeley.edu/show_host_detail.php?hostid=8369269

They each have ZERO total credits, despite faithfully cycling through tasks pretty much on a daily basis. The first listed host has been at it for nearly 7 months. (I've seen them with over a year of such futility, but don't happen to have any on my wing at the moment.)

If Eric is truly so concerned about embracing low-volume crunchers such as these, he needs to see to it that the scheduler only sends them tasks they can actually process. Having a 14 Jan 2018 deadline is no help to a device that has no hope of ever returning a valid task. I'm sure these folks sincerely believe they're making a worthwhile contribution to the project but, through no fault of their own (as far as I know), they're not.
Darth Beaver
Joined: 20 Aug 99
Posts: 6675
Credit: 20,532,454
RAC: 16,123
Australia
Message 1904316 - Posted: 2 Dec 2017, 4:37:06 UTC

If you really wish to help, crunch for another project while this one is down, or start mining crypto and donate the proceeds to the project.

If enough people do it for the next 3 weeks, up to $400,000 could be raised, and with that sort of cash they could fix the problems and maybe even set up a separate server for all the very slow machines without causing problems for the main project.
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 9171
Credit: 118,616,085
RAC: 50,639
Australia
Message 1904333 - Posted: 2 Dec 2017, 7:39:44 UTC

Other than the AP splitters, Server Status has gone green, and one of my systems just picked up a dozen WUs.
Ghia
Joined: 7 Feb 17
Posts: 105
Credit: 10,051,843
RAC: 20,714
Norway
Message 1904345 - Posted: 2 Dec 2017, 8:24:11 UTC - in response to Message 1904333.  

Yes, the guys at Berkeley must have worked late. 53 WUs have landed here. Not rejoicing quite yet, but things are looking up :)


...Ghia...
Humans may rule the world...but bacteria run it...
Keith Myers
Volunteer tester
Joined: 29 Apr 01
Posts: 3051
Credit: 203,094,642
RAC: 286,327
United States
Message 1904347 - Posted: 2 Dec 2017, 8:41:27 UTC - in response to Message 1904333.  

Obviously, the guys in the lab were burning the midnight oil. I just restarted machines and one of them picked up 87 tasks right away.
David@home
Volunteer tester
Joined: 16 Jan 03
Posts: 730
Credit: 3,806,913
RAC: 10,347
United Kingdom
Message 1904348 - Posted: 2 Dec 2017, 8:45:06 UTC

Great news that the SETI team managed to solve the database problem.

Had to do a manual update as BOINC manager was in a 4 hour deep sleep. Picked up GPU work only but hopefully will pick up CPU work as the splitters catch up with demand.

Wonder who the lucky ones were that got a cache of all those Astropulse redos that were building up, I missed out on those.
Tutankhamon
Volunteer tester
Joined: 1 Nov 08
Posts: 6871
Credit: 43,231,559
RAC: 14,173
Sweden
Message 1904351 - Posted: 2 Dec 2017, 9:19:54 UTC

Back too early. I think I'll wait until Wednesday to start requesting new work.
Sid
Volunteer tester
Joined: 12 Jun 07
Posts: 16
Credit: 9,244,953
RAC: 13,572
United States
Message 1904352 - Posted: 2 Dec 2017, 9:48:15 UTC
Last modified: 2 Dec 2017, 9:52:45 UTC

167 on both Linux and Windoze machines... looks like we're back.
kittyman
Volunteer tester
Joined: 9 Jul 00
Posts: 49256
Credit: 888,730,956
RAC: 157,694
United States
Message 1904357 - Posted: 2 Dec 2017, 10:09:07 UTC

Well. Meowmeowmeow.
Middle of a Friday night and Seti comes back online?
Kudos to whoever was working on things this late!!
Thankyouthankyouthankyou!
Meow!
A kitty keeps loneliness away.
More meowing, less hissing. I speak meow, do you?

Have made friends in this life.
Most were cats.
Tutankhamon
Volunteer tester
Joined: 1 Nov 08
Posts: 6871
Credit: 43,231,559
RAC: 14,173
Sweden
Message 1904358 - Posted: 2 Dec 2017, 10:13:35 UTC

But, as expected, there's trouble in paradise.
SSP stopped updating: [As of 2 Dec 2017, 9:40:04 UTC]

And what usually follows after that.....well, you all know that :-(
kittyman
Volunteer tester
Joined: 9 Jul 00
Posts: 49256
Credit: 888,730,956
RAC: 157,694
United States
Message 1904360 - Posted: 2 Dec 2017, 10:15:44 UTC - in response to Message 1904358.  
Last modified: 2 Dec 2017, 10:16:17 UTC

Well, I am hoping the success was not that short-lived, and the SSP snag is just due to the heavy load things must be under.
Kitties are hopeful.
Meow.
Tutankhamon
Volunteer tester
Joined: 1 Nov 08
Posts: 6871
Credit: 43,231,559
RAC: 14,173
Sweden
Message 1904361 - Posted: 2 Dec 2017, 10:17:55 UTC - in response to Message 1904360.  
Last modified: 2 Dec 2017, 10:18:11 UTC

The SSP just updated. Thanks Dog for that :-)
kittyman
Volunteer tester
Joined: 9 Jul 00
Posts: 49256
Credit: 888,730,956
RAC: 157,694
United States
Message 1904362 - Posted: 2 Dec 2017, 10:19:49 UTC - in response to Message 1904361.  

I prefer to thank the kitties.
Meow!
Jimbocous (Project Donor)
Volunteer tester
Joined: 1 Apr 13
Posts: 1030
Credit: 95,289,645
RAC: 78,473
United States
Message 1904363 - Posted: 2 Dec 2017, 10:25:39 UTC

All nice, full caches on all machines. Definitely came roaring back ...
kittyman
Volunteer tester
Joined: 9 Jul 00
Posts: 49256
Credit: 888,730,956
RAC: 157,694
United States
Message 1904366 - Posted: 2 Dec 2017, 10:33:11 UTC - in response to Message 1904363.  
Last modified: 2 Dec 2017, 10:34:57 UTC

Then you got lucky and hit the servers just after they came back up.
Now it's going to be like the server lottery trying to get work with all the hungry computers to feed.
And THAT depends on things staying glued together under the heavy load.
Meowpatience.


 
©2018 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.