Panic Mode On (108) Server Problems?

Message boards : Number crunching : Panic Mode On (108) Server Problems?

Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 9900
Credit: 128,487,763
RAC: 81,242
Australia
Message 1904237 - Posted: 1 Dec 2017, 22:19:27 UTC - in response to Message 1904235.  

Take those and pre-Core 2 Duo architectures out of the picture and you could reduce the deadlines by at least 2/3.

Well, you could suggest that to Eric, but don't expect him to do anything remotely like that.
Since he went to the effort of developing applications that would run on phones, I don't think it's very likely either.
But if he really wants to increase the amount of work being processed, and not have the servers collapse under the increased load, it's something he's going to have to give serious consideration to in the not-too-distant future.
Grant
Darwin NT
ID: 1904237
Profile Jeff Buck Special Project $250 donor
Volunteer tester

Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 2
United States
Message 1904240 - Posted: 1 Dec 2017, 22:25:51 UTC - in response to Message 1904229.  

Jeff, I agree that the deadlines are way too long for tasks. It just makes no sense to me why, with a 10 max cache setting, the deadlines are as much as 8 weeks. But your numbers are way out. What is being flagged as a resend by timing out is part of the normal 4.5M tasks that are usually in the field, not the 2.5M that are left, and yes, pendings should be included too. AP tasks usually have a shorter deadline of around 25 days, MB much longer. I think a reasonable deadline would be 20 days, 30 at most. Not 8 weeks.

But yes, it would help a small amount for the database size.
Ah, good point about the 4.5M. I had overlooked just how much that and, probably, the "awaiting validation" count had shrunk even before the scheduler was shut down. But the 25K per day (actually now up to 27K, and it will probably add another K or two before we even hit the 24-hour mark) should still be a valid number, even if the percentage is smaller.

I don't know where the happy medium would be, but 8 weeks is definitely excessive, even for Android devices, I would think. Even cutting the longer deadlines back by a week (and the shorties by a corresponding percentage) should ease the database load. I also wonder why deadlines couldn't be based on a host's turnaround time, with a shorter deadline for the faster hardware. Of course, that would mean that any given WU would likely have different deadlines for each task, but so what?

One other thing I think I've noticed about the Android devices is that many of them are sent tasks that fail every time, simply because the devices don't support the assigned app. I've seen some that have never returned an error-free task, even after many months of crunching, yet their owners don't seem to notice.
ID: 1904240
Profile Jeff Buck Special Project $250 donor
Volunteer tester

Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 2
United States
Message 1904242 - Posted: 1 Dec 2017, 22:31:27 UTC - in response to Message 1904230.  

Further to Mark's comment.
SETI has a far lower timeout ratio than most other projects, because those projects have deadlines that are demonstrably too short. Why send out 800 hours' worth of tasks to a computer that will only complete 600 hours of processing before the deadline is reached?

Actually, long deadlines do not impact the size of the main database, but the size of one of the intermediate databases.
What are those timeout ratios? I've never seen them anywhere, which is why this buildup of timeout resends seemed like a good time to bring up the topic.

If, by "main" database, you mean the science database, that's true, and it's the science database that crashed this week. But the master and replica databases are the ones impacted by the active task and WU volume, and also the ones most often running into space-related problems.
ID: 1904242
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 11991
Credit: 118,675,413
RAC: 40,489
United Kingdom
Message 1904244 - Posted: 1 Dec 2017, 22:39:16 UTC

Don't forget it isn't just the raw crunching time you need to consider for deadlines - it's also all the dead time when the computer is switched off or in use. And for Android, when it's away from the charger.
ID: 1904244
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 4485
Credit: 279,197,494
RAC: 628,305
United States
Message 1904246 - Posted: 1 Dec 2017, 22:45:34 UTC - in response to Message 1904227.  

I too have wondered why SETI has stuck with the extremely long deadlines, which I assume were implemented for the original hardware used on the project. That kind of hardware is 18 years in the past and does not need to continue to be supported. I agree with you, Jeff; I would expect the sizes of the databases, and the strain they put on the project, would be greatly lessened if the deadlines were reduced by a month, let's say, from the current 2-month deadline.

As I understand it, the reason there has been no adjustment is because Eric does not wish to disenfranchise anybody from participating in this project.

And that would include folks with very meager hardware resources. Not everybody can afford what some of us are able to.

That is why.

But if my 5-year-old phone can easily finish tasks within a month, and Android devices must be among the weakest hardware in use, is it reasonable to send them tasks with a deadline 8 weeks out from when they were sent?
Seti@Home classic workunits:20,676 CPU time:74,226 hours
ID: 1904246
Profile Jeff Buck Special Project $250 donor
Volunteer tester

Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 2
United States
Message 1904248 - Posted: 1 Dec 2017, 22:53:21 UTC - in response to Message 1904244.  

Don't forget it isn't just the raw crunching time you need to consider for deadlines - it's also all the dead time when the computer is switched off or in use. And for Android, when it's away from the charger.
Agreed. That's why I mentioned turnaround time as a possible yardstick (or meterstick, if you prefer). I'm sure there are some who just run S@h as it was originally designed, as a pretty screensaver, for short bursts here and there. But assigning deadlines to everyone based on those "lowest common denominators" seems like a horrible waste.
ID: 1904248
Profile Dr.Diesel Special Project $75 donor

Joined: 14 May 99
Posts: 35
Credit: 39,546,940
RAC: 41,129
United States
Message 1904255 - Posted: 1 Dec 2017, 23:29:31 UTC - in response to Message 1904101.  

Eric mentioned Informix, which I suspect is IBM's Informix DB; they're probably waiting on a new build from IBM, or at minimum awaiting an undocumented config switch as a temporary workaround. There was some backend work happening last night; perhaps they gave it a go and ran into further issues.

It is that, and last I heard we are on an older version of it.
I would hope it's a well-known issue they ran into, and that time, because of the large size of the db, is the only factor.


Having worked with IBM in the past, I further feel for Eric's sanity.
ID: 1904255
Profile HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6530
Credit: 185,655,154
RAC: 44,478
United States
Message 1904260 - Posted: 1 Dec 2017, 23:46:02 UTC - in response to Message 1904227.  

I too have wondered why SETI has stuck with the extremely long deadlines, which I assume were implemented for the original hardware used on the project. That kind of hardware is 18 years in the past and does not need to continue to be supported. I agree with you, Jeff; I would expect the sizes of the databases, and the strain they put on the project, would be greatly lessened if the deadlines were reduced by a month, let's say, from the current 2-month deadline.

As I understand it, the reason there has been no adjustment is because Eric does not wish to disenfranchise anybody from participating in this project.

And that would include folks with very meager hardware resources. Not everybody can afford what some of us are able to.

That is why.

Since the issue is with the SETI@home science database and not the master or replica databases, I'm not sure how the deadlines would even be relevant, given the data is stored in the science database after it has been validated.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group today!
ID: 1904260
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 11991
Credit: 118,675,413
RAC: 40,489
United Kingdom
Message 1904266 - Posted: 1 Dec 2017, 23:51:57 UTC - in response to Message 1904248.  

Don't forget it isn't just the raw crunching time you need to consider for deadlines - it's also all the dead time when the computer is switched off or in use. And for Android, when it's away from the charger.
Agreed. That's why I mentioned turnaround time as a possible yardstick (or meterstick, if you prefer). I'm sure there are some who just run S@h as it was originally designed, as a pretty screensaver, for short bursts here and there. But assigning deadlines to everyone based on those "lowest common denominators" seems like a horrible waste.
If a computer already has a much faster recorded average turnaround, it won't be pushing deadlines. 100 tasks per device, with 7-week deadlines, equates to about 12 hours per task.
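(To check that arithmetic, here is a quick back-of-envelope sketch in Python; the 100-task limit and 7-week deadline are the figures from this post, not verified server settings:)

```python
# Time budget per task implied by a full cache at the longest deadline.
tasks_per_device = 100                     # server-side per-device task limit
deadline_weeks = 7                         # longest MB deadline discussed here

hours_available = deadline_weeks * 7 * 24  # 7 weeks = 1,176 hours
hours_per_task = hours_available / tasks_per_device

print(f"{hours_per_task:.2f} hours per task")  # 11.76, i.e. about 12 hours
```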

The other thing that people in this conversation often forget is that many users share their available resources across multiple projects. That increases deadline pressure too - especially for the 'other' projects with shorter deadlines. And the whole deadline question only applies to the BOINC transaction database - which has been running pretty smoothly, apart from the disk failure recently (and that might have been in the file storage array, not the database). Either way, nothing to do with the Informix science database which is the source of today's malaise.

If Eric is happy with the current deadlines, then I'm satisfied by that.
ID: 1904266
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 9900
Credit: 128,487,763
RAC: 81,242
Australia
Message 1904268 - Posted: 2 Dec 2017, 0:06:03 UTC - in response to Message 1904248.  
Last modified: 2 Dec 2017, 0:09:32 UTC

But assigning deadlines to everyone based on those "lowest common denominators" seems like a horrible waste.

I guess it depends on how it is implemented.
Keep the present per-WU deadlines (2-8 weeks depending on expected run time).
However, when a WU is allocated to a system, the actual deadline for that allocated WU would be based on the host's actual average turnaround time for each application.

That appears to vary from 1.5hrs to over 3 weeks (the longest I've seen).

So for a monster host (such as Petri's), that would be a 1.5-hour deadline for GPU work (with CPU deadlines based on CPU application turnaround times). After that, the WU would be re-issued to another host.
But what if something happens to one of those faster hosts? Here in Darwin, the power going out for up to 6 hours at a time isn't that unusual. It would be a bit rough for all of those WUs to error out when the power finally comes back, just because of a short deadline the host can normally meet.
And what if the Seti servers go down in a screaming heap, such as has occurred this time? Would the servers declare all of the hosts' cached work as having missed the deadline before the hosts are able to report the work they have completed?

Do they need to make an addition to the server code that adds the server downtime + 1 to 12 hours (depending on how long the servers were down for) to all the WU deadlines, and amends those deadlines before the servers allow any Scheduler requests, so people's work isn't marked as "No response by deadline date"? A lot of effort for little gain, IMHO.

Or maybe we set it so there is a minimum deadline of 1 week, to allow for Seti server issues as well as local issues, so for any system that returns work within 1 week or less, that will be the deadline. For systems whose average turnaround time is over 1 week, their deadline is their average turnaround time + 1 week.
So a system that takes 8 days would get a deadline of 8+7 = 15 days; one that takes 23 days to return work would get a 30-day deadline, and so on.
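(As an illustrative sketch only, not anything in the actual BOINC server code, the rule described above could be written as:)

```python
def proposed_deadline_days(avg_turnaround_days, floor_days=7):
    """Hypothetical deadline rule from the post above: hosts that turn
    work around within a week get the one-week minimum (headroom for
    server or local outages); slower hosts get their average turnaround
    time plus a week."""
    if avg_turnaround_days <= floor_days:
        return floor_days
    return avg_turnaround_days + floor_days

print(proposed_deadline_days(8))       # 8 + 7 = 15 days
print(proposed_deadline_days(23))      # 23 + 7 = 30 days
print(proposed_deadline_days(0.0625))  # a 1.5-hour host still gets 7 days
```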
Grant
Darwin NT
ID: 1904268
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 9900
Credit: 128,487,763
RAC: 81,242
Australia
Message 1904269 - Posted: 2 Dec 2017, 0:08:20 UTC - in response to Message 1904260.  

Since the issue is with the SETI@home science database and not the master or replica databases, I'm not sure how the deadlines would even be relevant, given the data is stored in the science database after it has been validated.

The reason for the server-side limits was the load on the master database, which kept falling over or just plain getting bogged down.
Grant
Darwin NT
ID: 1904269
Profile HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6530
Credit: 185,655,154
RAC: 44,478
United States
Message 1904273 - Posted: 2 Dec 2017, 0:25:04 UTC - in response to Message 1904269.  

Since the issue is with the SETI@home science database and not the master or replica databases, I'm not sure how the deadlines would even be relevant, given the data is stored in the science database after it has been validated.

The reason for the server-side limits was the load on the master database, which kept falling over or just plain getting bogged down.

That is correct. Either the MySQL database software or the hardware the BOINC master db system is using couldn't handle scanning a table of 11,000,000+ tasks several hundred (or thousand?) times a second.

Which, again, is in fact not the science database, which runs Informix.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group today!
ID: 1904273
Speedy
Volunteer tester
Joined: 26 Jun 04
Posts: 1012
Credit: 8,992,251
RAC: 3,501
New Zealand
Message 1904275 - Posted: 2 Dec 2017, 0:33:28 UTC - in response to Message 1904218.  


Down until the new year...

Can somebody please give me a link to the official message saying it will be down until next year? I haven't been able to find anything.
ID: 1904275
Profile Dr.Diesel Special Project $75 donor

Joined: 14 May 99
Posts: 35
Credit: 39,546,940
RAC: 41,129
United States
Message 1904276 - Posted: 2 Dec 2017, 0:33:30 UTC - in response to Message 1904273.  

Which, again, is in fact not the science database, which runs Informix.


Is this, along with each system's OS and primary DB, documented anywhere?
ID: 1904276
Profile HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6530
Credit: 185,655,154
RAC: 44,478
United States
Message 1904277 - Posted: 2 Dec 2017, 0:41:29 UTC - in response to Message 1904276.  
Last modified: 2 Dec 2017, 0:43:47 UTC

Which, again, is in fact not the science database, which runs Informix.


Is this, along with each system's OS and primary DB, documented anywhere?

You can check the Server Status for most of that information.
https://setiathome.berkeley.edu/show_server_status.php
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group today!
ID: 1904277
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 9900
Credit: 128,487,763
RAC: 81,242
Australia
Message 1904279 - Posted: 2 Dec 2017, 0:46:18 UTC - in response to Message 1904275.  


Down until the new year...

Can somebody please give me a link to the official message saying it will be down until next year? I haven't been able to find anything.

There has been no message, so people are making it up as they feel like it.
I'm voting for next week.
Grant
Darwin NT
ID: 1904279
Speedy
Volunteer tester
Joined: 26 Jun 04
Posts: 1012
Credit: 8,992,251
RAC: 3,501
New Zealand
Message 1904281 - Posted: 2 Dec 2017, 0:54:43 UTC - in response to Message 1904279.  


Down until the new year...

Can somebody please give me a link to the official message saying it will be down until next year? I haven't been able to find anything.

There has been no message, so people are making it up as they feel like it.
I'm voting for next week.

Thank you Grant for the clarification.
Thanks to Eric & the team for working on this issue.
ID: 1904281
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 4485
Credit: 279,197,494
RAC: 628,305
United States
Message 1904284 - Posted: 2 Dec 2017, 1:06:42 UTC - in response to Message 1904277.  

Which, again, is in fact not the science database, which runs Informix.


Is this, along with each system's OS and primary DB, documented anywhere?

You can check the Server Status for most of that information.
https://setiathome.berkeley.edu/show_server_status.php

Uhh, where on the SSP is the information requested? I see no mention of the OS or database software each server is running.
Seti@Home classic workunits:20,676 CPU time:74,226 hours
ID: 1904284
Profile Jeff Buck Special Project $250 donor
Volunteer tester

Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 2
United States
Message 1904289 - Posted: 2 Dec 2017, 1:39:30 UTC - in response to Message 1904273.  

Since the issue is with the SETI@home science database and not the master or replica databases, I'm not sure how the deadlines would even be relevant, given the data is stored in the science database after it has been validated.

The reason for the server-side limits was the load on the master database, which kept falling over or just plain getting bogged down.

That is correct. Either the MySQL database software or the hardware the BOINC master db system is using couldn't handle scanning a table of 11,000,000+ tasks several hundred (or thousand?) times a second.

Which, again, is in fact not the science database, which runs Informix.
My very first message on this subject stated:
...speeding up the turnaround would also lessen the storage requirements of the master and replica databases, an issue that seems to rear its head quite often.
...
Now, I don't know what percentage of the master database is occupied by task data, versus workunit data, account data, host data, and so on,...
And I tried in a subsequent message to reiterate that distinction:
If, by "main" database, you mean the science database, that's true, and it's the science database that crashed this week. But the master and replica databases are the ones impacted by the active task and WU volume, and also the ones most often running into space-related problems.
Why does the science database keep getting dragged in?
ID: 1904289
Profile Jeff Buck Special Project $250 donor
Volunteer tester

Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 2
United States
Message 1904292 - Posted: 2 Dec 2017, 2:04:55 UTC - in response to Message 1904268.  

Or maybe we set it so there is a minimum deadline of 1 week, to allow for Seti server issues as well as local issues, so for any system that returns work within 1 week or less, that will be the deadline. For systems whose average turnaround time is over 1 week, their deadline is their average turnaround time + 1 week.
So a system that takes 8 days would get a deadline of 8+7 = 15 days; one that takes 23 days to return work would get a 30-day deadline, and so on.
I don't know that it even needs to be cut that close. Heck, even a three- or four-week cushion (for the longer-running tasks) would effect a significant improvement, I think. People do go on vacation, or off on business trips, or shut down their machines for a while for other reasons. It would be nice if they drained their queues before doing so, but that tends not to happen, and the system should allow sufficient latitude for it. Heck, I had a significant unplanned outage across the board last February, when a major storm knocked out the electricity in my area for five and a half days, so the turnaround average on my main crunchers took a big hit. :^)

Another thing that I think would help, though perhaps just to a small degree, would be a way for conscientious users to abandon or abort tasks via the web site. I mention this because we periodically see questions and/or apologies in the forum from users whose systems have irrevocably died, or undergone some extreme reconfiguration, while still having lots of tasks in the queue. There's currently no alternative but to simply let those tasks time out.
ID: 1904292
©2018 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.