Message boards : Number crunching : Panic Mode On (108) Server Problems?
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13835 Credit: 208,696,464 RAC: 304 |
Since the issue is with the SETI@home science database and not the master or replica databases, I'm not sure how the deadlines would even be relevant. The reason for the server-side limits was the load on the master database, which kept falling over or just plain getting bogged down. Grant Darwin NT |
HAL9000 Send message Joined: 11 Sep 99 Posts: 6534 Credit: 196,805,888 RAC: 57 |
Since the issue is with the SETI@home science database and not the master or replica databases, I'm not sure how the deadlines would even be relevant. That is correct. Either the MySQL database software or the hardware the BOINC master db system is using couldn't handle scanning a table of 11,000,000+ tasks several hundred (or thousand?) times a second. Which, again, is not the science database; that one runs Informix. SETI@home classic workunits: 93,865 CPU time: 863,447 hours Join the [url=http://tinyurl.com/8y46zvu]BP6/VP6 User Group[/url] |
Speedy Send message Joined: 26 Jun 04 Posts: 1643 Credit: 12,921,799 RAC: 89 |
Can somebody please give me a link to the official message saying it will be down until next year? I haven't been able to find anything. |
Dr.Diesel Send message Joined: 14 May 99 Posts: 41 Credit: 123,695,755 RAC: 139 |
Which, again, is not the science database; that one runs Informix. Is this, along with each system's OS and primary DB, documented anywhere? |
HAL9000 Send message Joined: 11 Sep 99 Posts: 6534 Credit: 196,805,888 RAC: 57 |
Which, again, is not the science database; that one runs Informix. You can check the Server Status page for most of that information: https://setiathome.berkeley.edu/show_server_status.php SETI@home classic workunits: 93,865 CPU time: 863,447 hours Join the [url=http://tinyurl.com/8y46zvu]BP6/VP6 User Group[/url] |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13835 Credit: 208,696,464 RAC: 304 |
There has been no message, so people are making it up as they feel like it. I'm voting for next week. Grant Darwin NT |
Speedy Send message Joined: 26 Jun 04 Posts: 1643 Credit: 12,921,799 RAC: 89 |
Thank you, Grant, for the clarification. Thanks to Eric & the team for working on this issue. |
Keith Myers Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
Which, again, is not the science database; that one runs Informix. Uhh, where on the SSP is the information requested? I see no mention of the OS or database software each server is running. Seti@Home classic workunits: 20,676 CPU time: 74,226 hours A proud member of the OFA (Old Farts Association) |
Jeff Buck Send message Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0 |
My very first message on this subject stated: Since the issue is with the SETI@home science database and not the master or replica databases, I'm not sure how the deadlines would even be relevant. ...speeding up the turnaround would also lessen the storage requirements of the master and replica databases, an issue that seems to rear its head quite often. And I tried in a subsequent message to reiterate that distinction: If, by "main" database, you mean the science database, that's true, and it's the science database that crashed this week. But the master and replica databases are the ones impacted by the active task and WU volume, and also the ones most often running into space-related problems. Why does the science database keep getting dragged in? |
Jeff Buck Send message Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0 |
Or maybe we set it so there is a minimum deadline of 1 week, to allow for Seti server issues as well as local issues, so for any system that returns work within 1 week or less, that will be the deadline. For systems that take over 1 week average turnaround time, their deadline is their average turnaround time + 1 week. I don't know that it even needs to be cut that close. Heck, even a three or four week cushion (for the longer-running tasks) would effect a significant improvement, I think. People do go on vacation, or off on business trips, or shut down their machines for a while for other reasons. It would be nice if, before they did so, they drained their queues, but that tends not to happen and the system should allow sufficient latitude for that. Heck, I had a significant unplanned outage across the board last February, when a major storm knocked out the electricity in my area for five and a half days. So, the turnaround average on my main crunchers took a big hit. :^) Another thing that I think would help, but perhaps just to a small degree, would be a way for conscientious users to abandon or abort tasks via the web site. I mention this because it seems we periodically see questions and/or apologies in the forum from users whose systems have irrevocably died, or met with some extreme reconfiguration, while still having lots of tasks in the queue. There's currently no alternative but to simply let those tasks time out. |
Jeff Buck Send message Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0 |
Uhh, where on the SSP is the information requested? I see no mention of the OS or database software each server is running. I don't know about the OS, but the DB software is mentioned in those brief DB descriptions at the very beginning of the Glossary: "mysql" for the master and "informix" for the science DBs. |
Keith Myers Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
Uhh, where on the SSP is the information requested? I see no mention of the OS or database software each server is running. I don't know about the OS, but the DB software is mentioned in those brief DB descriptions at the very beginning of the Glossary: "mysql" for the master and "informix" for the science DBs. Ahh, thanks for pointing that out to me, Jeff. I was looking for that information in the Hosts table describing the hardware. Seti@Home classic workunits: 20,676 CPU time: 74,226 hours A proud member of the OFA (Old Farts Association) |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13835 Credit: 208,696,464 RAC: 304 |
Or maybe we set it so there is a minimum deadline of 1 week, to allow for Seti server issues as well as local issues, so for any system that returns work within 1 week or less, that will be the deadline. For systems that take over 1 week average turnaround time, their deadline is their average turnaround time + 1 week. Even the longest running tasks generally don't take that long to run as such, so more time for longer tasks isn't really necessary; it'd be better just to have a larger minimum return time to allow for problems. My C2D is presently doing its last CPU tasks, taking about 3hrs 15min for Arecibo VLARs (when it's also processing GPU work those can take up to 7 hours). The main issue for long return times isn't so much how long it takes to actually process a WU, but how long it takes for systems that aren't on 24/7, or have rather silly (IMHO) settings for BOINC (e.g. suspend when non-BOINC CPU usage is above 25%), combined with running multiple projects. So to allow for system outages (both cruncher and Seti) you could go with the present minimum of 2 weeks, or even bump it up to 3 weeks, but still base it on the application's average turnaround time. For applications with 3 weeks or less turnaround time, make the deadline 3 weeks. For those that take longer than 3 weeks, the deadline should be their application turnaround time + 3 weeks. Basically, all work has a 3 week safety margin. This should significantly reduce the time it takes to get a validated result, but still not impact the slower systems' ability to do Seti work, as well as reduce the load on the main database (which we know isn't the issue this time around, but has been the case several times in the past). Grant Darwin NT |
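As a rough illustration of the deadline rule Grant describes above, here is a minimal Python sketch. The three-week margin and the function and parameter names are assumptions for illustration only; this is not actual BOINC scheduler code.

```python
from datetime import timedelta

# Assumed safety margin, following the 2-3 week cushion suggested above.
SAFETY_MARGIN = timedelta(weeks=3)

def proposed_deadline(avg_turnaround: timedelta) -> timedelta:
    """Hypothetical per-host, per-application deadline under the rule above.

    Work that normally comes back within the safety margin gets the flat
    three-week deadline; slower hosts get their average turnaround time
    plus the same three-week cushion.
    """
    if avg_turnaround <= SAFETY_MARGIN:
        return SAFETY_MARGIN
    return avg_turnaround + SAFETY_MARGIN

# A host averaging 5 weeks per task would get an 8-week deadline,
# while anything at or under 3 weeks gets exactly 3 weeks.
print(proposed_deadline(timedelta(weeks=5)).days)   # 56
print(proposed_deadline(timedelta(days=10)).days)   # 21
```

The point of the rule is that the deadline scales with each host's demonstrated turnaround rather than with a task's nominal run time, so fast hosts stop waiting months on slow wingmen while slow-but-reliable hosts keep their cushion.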
Keith Myers Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
All very sensible suggestions today about how to achieve slimmer databases and shorter deadlines. I saw a post saying that as long as Eric was happy with the deadlines, all was good. My opinion is that the "grrrr-grumble" factor of being matched up with slow wingmen and a large pending count would go down dramatically if the deadlines were shortened. But we aren't here to make setizens happy ... are we? Seti@Home classic workunits: 20,676 CPU time: 74,226 hours A proud member of the OFA (Old Farts Association) |
betreger Send message Joined: 29 Jun 99 Posts: 11408 Credit: 29,581,041 RAC: 66 |
But we aren't here to make setizens happy ... are we? NO! We are here to find ETI. |
Jeff Buck Send message Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0 |
One thing I think I've noticed about the Android devices, also, is that many of them are sent tasks that fail every time, simply because the devices don't support the assigned app. I've seen some that have never returned an error-free task, even after many months of crunching, yet their owners don't seem to notice. Just to put a little something behind this statement I made earlier, here are three such machines that are/were wingmen on some of my currently assigned tasks: http://setiathome.berkeley.edu/show_host_detail.php?hostid=8262256 http://setiathome.berkeley.edu/show_host_detail.php?hostid=8365042 http://setiathome.berkeley.edu/show_host_detail.php?hostid=8369269 They each have ZERO total credits, despite faithfully cycling through tasks pretty much on a daily basis. The first listed host has been at it for nearly 7 months. (I've seen them with over a year of such futility, but don't happen to have any on my wing at the moment.) If Eric is truly so concerned about embracing low-volume crunchers such as these, he needs to see to it that the scheduler only sends them tasks they can actually process. Having a 14 Jan 2018 deadline is no help to a device that has no hope of ever returning a valid task. I'm sure these folks sincerely believe they're making a worthwhile contribution to the project but, through no fault of their own (as far as I know), they're not. |
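Purely as an illustration of the kind of scheduler-side guard Jeff is asking for, a check like the following could refuse to assign an application to a host that has attempted plenty of its tasks and never returned a valid result. The threshold and counters here are hypothetical; this is not how the actual BOINC scheduler is implemented.

```python
# Hypothetical threshold: how many attempts before concluding that the
# host cannot run this application at all.
MIN_ATTEMPTS = 20

def eligible_for_app(valid_results: int, errored_results: int) -> bool:
    """Illustrative guard for a host/application pair.

    A host that has tried many tasks for an application and validated
    none of them is presumed unable to run that application, so it is
    not sent any more of its tasks until it returns something valid.
    """
    attempts = valid_results + errored_results
    return not (attempts >= MIN_ATTEMPTS and valid_results == 0)

# The zero-credit hosts linked above would fail this check after their
# first few weeks of errors, sparing their wingmen the wait.
print(eligible_for_app(0, 150))   # False
print(eligible_for_app(12, 3))    # True
```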
Darth Beaver Send message Joined: 20 Aug 99 Posts: 6728 Credit: 21,443,075 RAC: 3 |
If you really wish to help while the project is down, do another project or start mining crypto and then donate the proceeds to the project. If enough people do it for the next 3 weeks, there could be up to $400,000 raised, and with that sort of cash they would be able to fix the problems and maybe set up a separate server for all the very slow machines without causing a problem for the main project. |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13835 Credit: 208,696,464 RAC: 304 |
Other than the AP splitters, Server Status has gone green, and one of my systems just picked up a dozen WUs. Grant Darwin NT |
Ghia Send message Joined: 7 Feb 17 Posts: 238 Credit: 28,911,438 RAC: 50 |
Other than the AP splitters, Server Status has gone green, and one of my systems just picked up a dozen WUs. Yes, the guys at Berkeley must have worked late. 53 WUs have landed here. Not rejoicing quite yet, but things are looking up :) ...Ghia... Humans may rule the world...but bacteria run it... |
Keith Myers Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
Obviously, the guys in the lab were burning the midnight oil. I just restarted my machines and one of them picked up 87 tasks right away. Seti@Home classic workunits: 20,676 CPU time: 74,226 hours A proud member of the OFA (Old Farts Association) |