Panic Mode On (108) Server Problems?


Profile Jeff Buck Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1904212 - Posted: 1 Dec 2017, 21:10:53 UTC
Last modified: 1 Dec 2017, 21:16:56 UTC

It has been suggested a number of times, by a number of Setizens, that the project's task deadlines are much too long compared with other projects. The argument is that shortening the deadlines could speed up the turnaround time for the tasks that inevitably time out due to lapsed or drive-by users, who fail to clear their caches before dropping out of the project, as well as the "ghost" tasks that we see all the time. Speeding up the turnaround would also lessen the storage requirements of the master and replica databases, an issue that seems to rear its head quite often.

The shutdown of the scheduling server has given us an opportunity to get an idea as to how much of an impact these deadline timeouts actually have. At the present time, over 25,000 tasks have accumulated in the RTS buffer, all of them likely tasks that have passed their deadlines since the scheduling server was shut down less than 24 hours ago. That's out of 2,453,080 tasks the SSP shows as being "out in the field". So, in less than one day, more than 1% of the outstanding tasks have timed out. Since every one of those tasks has also been causing one (or more) of that WU's wingman's tasks to be held in the database in a pending status, that represents 2% of the outstanding tasks that have been in limbo for the full duration of the original deadline.

Now, I don't know what percentage of the master database is occupied by task data, versus workunit data, account data, host data, and so on, but it seems to me that if 2% of the storage space occupied by the task data could be freed up by just shortening the deadlines by a single day, it's something that should be worth exploring. Does that number scale? Could shortening the deadlines by 5 days reduce the requirements by 10%, or making them 10 days shorter cut 20%? I don't know, but having actual numbers like this to look at should certainly be helpful in making such an assessment. (Of course, the project admins could likely have always extracted such numbers at any time they wanted, with just a simple database query.)

Food for thought.....I hope. :^)

EDIT: The actual total number of tasks ("results") in the database should include the 3,193,665 "returned and awaiting validation", which is where the wingmen's pending tasks would be counted. So, my stated percentages are too high, but still significant, I think.
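
The percentages in this post can be re-derived from the quoted figures. The following is only a rough sketch of that arithmetic: the variable names are made up for illustration, and the doubling step simply takes the post's assumption that each timed-out task holds at least one wingman task in pending status.

```python
# Figures quoted in the post (server status page, 1 Dec 2017).
timed_out = 25_000               # tasks back in the RTS buffer after < 24 hours
in_field = 2_453_080             # tasks "out in the field"
awaiting_validation = 3_193_665  # "returned and awaiting validation"


def pct(part, whole):
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


# Original estimate: timeouts measured against in-field tasks only.
share_in_field = pct(timed_out, in_field)

# Estimate after the EDIT: measure against all tasks ("results") in the database.
all_tasks = in_field + awaiting_validation
share_all = pct(timed_out, all_tasks)

# Assumption taken from the post: each timed-out task keeps at least one
# wingman task pending, so the affected share is roughly double.
print(f"in-field only: {share_in_field:.2f}% timed out, ~{2 * share_in_field:.2f}% affected")
print(f"all tasks:     {share_all:.2f}% timed out, ~{2 * share_all:.2f}% affected")
```

Measured against the larger total, the one-day timeout share drops to well under 1%, which matches the EDIT's remark that the stated percentages are too high.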
ID: 1904212 · Report as offensive
Profile betreger Project Donor
Joined: 29 Jun 99
Posts: 11361
Credit: 29,581,041
RAC: 66
United States
Message 1904214 - Posted: 1 Dec 2017, 21:24:00 UTC
Last modified: 1 Dec 2017, 21:24:23 UTC

On the same subject of database size: when data is rerun on a newer app, do they purge the original result? If not, that's a lot of useless baggage.
ID: 1904214 · Report as offensive
moomin
Joined: 21 Oct 17
Posts: 6204
Credit: 38,420
RAC: 0
Sweden
Message 1904216 - Posted: 1 Dec 2017, 21:32:20 UTC

What?
SETI now doesn't accept my already crunched files ready to report.
https://setiathome.berkeley.edu/results.php?userid=10596805
ID: 1904216 · Report as offensive
Profile Advent42
Joined: 23 Mar 17
Posts: 175
Credit: 4,015,683
RAC: 0
Ireland
Message 1904218 - Posted: 1 Dec 2017, 21:36:40 UTC - in response to Message 1904216.  

Have just returned from a leave........so no tasks whatsoever......
Down until the new year...
Too bad....oh well....back to scanning the skies with my telescope...:-)
ID: 1904218 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1904225 - Posted: 1 Dec 2017, 21:57:25 UTC - in response to Message 1904212.  

I too have wondered why SETI has stuck with the extremely long deadlines, which I assess were implemented for the original hardware used on the project. That kind of hardware is 18 years in the past and does not need to continue to be supported. I agree with you, Jeff: I would expect the sizes of the databases, and the strain they put on the project, would be greatly lessened if the deadlines were reduced by a month, let's say, from the current 2-month deadline.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1904225 · Report as offensive
kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1904227 - Posted: 1 Dec 2017, 22:00:50 UTC - in response to Message 1904225.  

I too have wondered why SETI has stuck with the extremely long deadlines, which I assess were implemented for the original hardware used on the project. That kind of hardware is 18 years in the past and does not need to continue to be supported. I agree with you, Jeff: I would expect the sizes of the databases, and the strain they put on the project, would be greatly lessened if the deadlines were reduced by a month, let's say, from the current 2-month deadline.

As I understand it, the reason there has been no adjustment is because Eric does not wish to disenfranchise anybody from participating in this project.

And that would include folks with very meager hardware resources. Not everybody can afford what some of us are able to.

That is why.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1904227 · Report as offensive
Profile Brent Norman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1904229 - Posted: 1 Dec 2017, 22:07:57 UTC - in response to Message 1904212.  

Jeff, I agree that the deadlines are way too long for tasks. It makes no sense to me why, with a 10-day max cache setting, the deadlines are as much as 8 weeks. But your numbers are way out. What is being flagged as a resend by timing out is part of the 4.5M tasks that are normally in the field, not the 2.5M that are left, and yes, pendings should be included too. AP tasks usually have a shorter deadline of around 25 days, MB much longer. I think a reasonable deadline would be 20 days, 30 at most. Not 8 weeks.

But yes, it would help a small amount for the database size.
ID: 1904229 · Report as offensive
rob smith Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer moderator
Volunteer tester

Joined: 7 Mar 03
Posts: 22184
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1904230 - Posted: 1 Dec 2017, 22:08:57 UTC

Further to Mark's comment.
SETI has a far lower timeout ratio than most other projects, because those projects have deadlines that are demonstrably too short. Why send out 800 hours' worth of tasks to a computer that will only complete 600 hours of processing before the deadline is reached?

Actually long deadlines do not impact on the size of the main database but on the size of one of the intermediate databases.
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1904230 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13727
Credit: 208,696,464
RAC: 304
Australia
Message 1904233 - Posted: 1 Dec 2017, 22:12:56 UTC - in response to Message 1904227.  

I too have wondered why SETI has stuck with the extremely long deadlines, which I assess were implemented for the original hardware used on the project. That kind of hardware is 18 years in the past and does not need to continue to be supported. I agree with you, Jeff: I would expect the sizes of the databases, and the strain they put on the project, would be greatly lessened if the deadlines were reduced by a month, let's say, from the current 2-month deadline.

As I understand it, the reason there has been no adjustment is because Eric does not wish to disenfranchise anybody from participating in this project.

And that would include folks with very meager hardware resources.

Such as phones and low end to mid range tablets.
Take those and pre-Core 2 Duo architectures out of the picture and you could reduce the deadlines by at least 2/3.
Grant
Darwin NT
ID: 1904233 · Report as offensive
kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1904235 - Posted: 1 Dec 2017, 22:15:11 UTC - in response to Message 1904233.  

I too have wondered why SETI has stuck with the extremely long deadlines, which I assess were implemented for the original hardware used on the project. That kind of hardware is 18 years in the past and does not need to continue to be supported. I agree with you, Jeff: I would expect the sizes of the databases, and the strain they put on the project, would be greatly lessened if the deadlines were reduced by a month, let's say, from the current 2-month deadline.

As I understand it, the reason there has been no adjustment is because Eric does not wish to disenfranchise anybody from participating in this project.

And that would include folks with very meager hardware resources.

Such as phones and low end to mid range tablets.
Take those and pre-Core 2 Duo architectures out of the picture and you could reduce the deadlines by at least 2/3.

Well, you could suggest that to Eric, but don't expect him to do anything remotely like that.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1904235 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13727
Credit: 208,696,464
RAC: 304
Australia
Message 1904237 - Posted: 1 Dec 2017, 22:19:27 UTC - in response to Message 1904235.  

Take those and pre-Core 2 Duo architectures out of the picture and you could reduce the deadlines by at least 2/3.

Well, you could suggest that to Eric, but don't expect him to do anything remotely like that.
Since he went to the effort to develop applications that would run on phones I don't think it's very likely either.
But if he really wants to increase the amount of work being processed, and not have the servers collapse under the increased load, it's something he's going to have to give serious consideration to in the not too distant future.
Grant
Darwin NT
ID: 1904237 · Report as offensive
Profile Jeff Buck Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1904240 - Posted: 1 Dec 2017, 22:25:51 UTC - in response to Message 1904229.  

Jeff, I agree that the deadlines are way too long for tasks. It makes no sense to me why, with a 10-day max cache setting, the deadlines are as much as 8 weeks. But your numbers are way out. What is being flagged as a resend by timing out is part of the 4.5M tasks that are normally in the field, not the 2.5M that are left, and yes, pendings should be included too. AP tasks usually have a shorter deadline of around 25 days, MB much longer. I think a reasonable deadline would be 20 days, 30 at most. Not 8 weeks.

But yes, it would help a small amount for the database size.
Ah, good point about the 4.5M. I had overlooked just how much that and, probably, the "awaiting validation" had shrunk even before the scheduler was shut down. But the 25K per day (actually now up to 27K and probably will add another K or two before we even hit the 24 hour mark) should still be a valid number, even if the percentage is smaller.

I don't know where the happy medium would be, but 8 weeks is definitely excessive, even for Android devices I would think. Even cutting it back by a week on the longer deadlines (and by a corresponding percentage on the shorties) should ease the database load. I also wonder why deadlines couldn't be based on a host's turnaround time, with a shorter deadline for the faster hardware. Of course, that would mean that any given WU would likely have different deadlines for each task but, so what?

One thing I think I've noticed about the Android devices, also, is that many of them are sent tasks that fail every time, simply because the devices don't support the assigned app. I've seen some that have never returned an error-free task, even after many months of crunching, yet their owners don't seem to notice.
ID: 1904240 · Report as offensive
Profile Jeff Buck Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1904242 - Posted: 1 Dec 2017, 22:31:27 UTC - in response to Message 1904230.  

Further to Mark's comment.
SETI has a far lower timeout ratio than most other projects, because those projects have deadlines that are demonstrably too short. Why send out 800 hours' worth of tasks to a computer that will only complete 600 hours of processing before the deadline is reached?

Actually long deadlines do not impact on the size of the main database but on the size of one of the intermediate databases.
What are those timeout ratios? I've never seen them anywhere, which is why this buildup of timeout resends seemed like a good time to bring up the topic.

If, by "main" database, you mean the science database, that's true, and it's the science database that crashed this week. But the master and replica databases are the ones impacted by the active task and WU volume, and also the ones most often running into space-related problems.
ID: 1904242 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1904244 - Posted: 1 Dec 2017, 22:39:16 UTC

Don't forget it isn't just the raw crunching time you need to consider for deadlines - it's also all the dead time when the computer is switched off or in use. And for Android, when it's away from the charger.
ID: 1904244 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1904246 - Posted: 1 Dec 2017, 22:45:34 UTC - in response to Message 1904227.  

I too have wondered why SETI has stuck with the extremely long deadlines, which I assess were implemented for the original hardware used on the project. That kind of hardware is 18 years in the past and does not need to continue to be supported. I agree with you, Jeff: I would expect the sizes of the databases, and the strain they put on the project, would be greatly lessened if the deadlines were reduced by a month, let's say, from the current 2-month deadline.

As I understand it, the reason there has been no adjustment is because Eric does not wish to disenfranchise anybody from participating in this project.

And that would include folks with very meager hardware resources. Not everybody can afford what some of us are able to.

That is why.

But if my 5-year-old phone can easily finish tasks within a month, and Android devices must be the weakest hardware in use, is it reasonable to send it tasks that have a deadline 8 weeks out from when they were sent?
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1904246 · Report as offensive
Profile Jeff Buck Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1904248 - Posted: 1 Dec 2017, 22:53:21 UTC - in response to Message 1904244.  

Don't forget it isn't just the raw crunching time you need to consider for deadlines - it's also all the dead time when the computer is switched off or in use. And for Android, when it's away from the charger.
Agreed. That's why I mentioned turnaround time as a possible yardstick (or meterstick, if you prefer). I'm sure there are some who just run S@h as it was originally designed, as a pretty screensaver, for short bursts here and there. But assigning deadlines to everyone based on those "lowest common denominators" seems like a horrible waste.
ID: 1904248 · Report as offensive
Profile Dr.Diesel Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor

Joined: 14 May 99
Posts: 41
Credit: 123,695,755
RAC: 139
United States
Message 1904255 - Posted: 1 Dec 2017, 23:29:31 UTC - in response to Message 1904101.  

Eric mentioned Informix, which I suspect is IBM's Informix DB, probably waiting on a new build from them. Or at minimum awaiting an undocumented config switch as a temporary workaround. There was some backend work happening last night; perhaps they gave it a go and ran into further issues.

It is that, and last I heard we are on an older version of it.
I would hope it is a well-known issue they ran into, and that time, because of the large size of the db, is the only factor.


Having worked with IBM in the past, I further feel for Eric's sanity.
ID: 1904255 · Report as offensive
Profile HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1904260 - Posted: 1 Dec 2017, 23:46:02 UTC - in response to Message 1904227.  

I too have wondered why SETI has stuck with the extremely long deadlines, which I assess were implemented for the original hardware used on the project. That kind of hardware is 18 years in the past and does not need to continue to be supported. I agree with you, Jeff: I would expect the sizes of the databases, and the strain they put on the project, would be greatly lessened if the deadlines were reduced by a month, let's say, from the current 2-month deadline.

As I understand it, the reason there has been no adjustment is because Eric does not wish to disenfranchise anybody from participating in this project.

And that would include folks with very meager hardware resources. Not everybody can afford what some of us are able to.

That is why.

Since the issue is with the SETI@home science database and not the master or replica databases, I'm not sure how the deadlines would even be relevant, given that the data is stored in the science database after it has been validated.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group: http://tinyurl.com/8y46zvu
ID: 1904260 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1904266 - Posted: 1 Dec 2017, 23:51:57 UTC - in response to Message 1904248.  

Don't forget it isn't just the raw crunching time you need to consider for deadlines - it's also all the dead time when the computer is switched off or in use. And for Android, when it's away from the charger.
Agreed. That's why I mentioned turnaround time as a possible yardstick (or meterstick, if you prefer). I'm sure there are some who just run S@h as it was originally designed, as a pretty screensaver, for short bursts here and there. But assigning deadlines to everyone based on those "lowest common denominators" seems like a horrible waste.
If a computer has a much faster recorded average turnround already, it won't be pushing deadlines. 100 tasks per device, with 7 week deadlines, equates to about 12 hours per task.

The other thing that people in this conversation often forget is that many users share their available resources across multiple projects. That increases deadline pressure too - especially for the 'other' projects with shorter deadlines. And the whole deadline question only applies to the BOINC transaction database - which has been running pretty smoothly, apart from the disk failure recently (and that might have been in the file storage array, not the database). Either way, nothing to do with the Informix science database which is the source of today's malaise.

If Eric is happy with the current deadlines, then I'm satisfied by that.
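
The "about 12 hours per task" figure follows from dividing the deadline window by the per-device task limit. A minimal sketch of that arithmetic, assuming the 100-task limit and 7-week deadline quoted in the post (the function name is made up for illustration):

```python
HOURS_PER_DAY = 24
DAYS_PER_WEEK = 7


def hours_per_task(tasks_per_device, deadline_weeks):
    """Average wall-clock hours available per task if a full cache
    of tasks has to be returned before the deadline."""
    deadline_hours = deadline_weeks * DAYS_PER_WEEK * HOURS_PER_DAY
    return deadline_hours / tasks_per_device


# 7 weeks = 1176 hours, shared across 100 tasks.
budget = hours_per_task(100, 7)
print(f"{budget:.2f} hours per task")
```

This is wall-clock time, not crunching time, so it also has to cover the periods when the host is switched off, in use, or working for other projects.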
ID: 1904266 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13727
Credit: 208,696,464
RAC: 304
Australia
Message 1904268 - Posted: 2 Dec 2017, 0:06:03 UTC - in response to Message 1904248.  
Last modified: 2 Dec 2017, 0:09:32 UTC

But assigning deadlines to everyone based on those "lowest common denominators" seems like a horrible waste.

I guess it depends on how it is implemented.
Keep the present per WU deadlines (2-8 weeks depending on expected run time).
However, when a WU is allocated to a system, the actual deadline for that allocated WU is based on the host's actual average turnaround time for each application.

That appears to vary from 1.5hrs to over 3 weeks (the longest I've seen).

So for a monster host (such as Petri's) that would be a 1.5hr deadline for GPU work (CPU deadlines based on CPU application turnaround times). After that, the WU would be re-issued to another host.
But what if something happens to one of those faster hosts? Here in Darwin the power going out for up to 6 hours at a time isn't that unusual. It would be a bit rough for all of those WUs to error out when they finally get power back just because of a short deadline that they can normally meet.
And what if the Seti servers go down in a screaming heap, such as has occurred this time? Do the servers declare that all of a host's cache has missed the deadline before it is able to report the work it has completed?

Do they need to make an addition to the server code that adds the server downtime + 1 to 12 hours (depending on how long the servers were down for) to all the WU deadlines, and amends those deadlines before the servers allow any Scheduler requests, so people's work isn't marked as "No response by deadline date"? A lot of effort for little gain IMHO.

Or maybe we set it so there is a minimum deadline of 1 week, to allow for Seti server issues as well as local issues, so for any system that returns work within 1 week or less, that will be the deadline. For systems that take over 1 week average turnaround time, their deadline is their average turnaround time + 1 week.
So for a system that takes 8 days, the deadline will be 8 + 7 = 15 days; for one that takes 23 days to return work, it will be 30 days, etc.
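
The rule described in the last two paragraphs can be written down in a few lines. This is only a sketch of the idea as stated in the post, not anything the BOINC scheduler does: the function and constant names are hypothetical, and it ignores the existing per-WU deadlines.

```python
MIN_DEADLINE_DAYS = 7  # one-week floor suggested in the post


def proposed_deadline_days(avg_turnaround_days):
    """Deadline in days for a host, from its average turnaround time.

    Hosts that return work within a week get the one-week minimum;
    slower hosts get their average turnaround plus one week.
    """
    if avg_turnaround_days <= MIN_DEADLINE_DAYS:
        return MIN_DEADLINE_DAYS
    return avg_turnaround_days + MIN_DEADLINE_DAYS


# Turnarounds mentioned in the post: 1.5 hours, 8 days, 23 days.
for turnaround in (1.5 / 24, 8, 23):
    print(f"{turnaround:6.2f} days turnaround -> {proposed_deadline_days(turnaround):.2f} day deadline")
```

Read literally, the rule jumps from 7 days to just over 14 days as soon as a host's average turnaround passes one week; whether that step is intended is not clear from the post.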
Grant
Darwin NT
ID: 1904268 · Report as offensive


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.