About Deadlines or Database reduction proposals

TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 2034263 - Posted: 27 Feb 2020, 20:12:18 UTC - in response to Message 2034257.  
Last modified: 27 Feb 2020, 20:15:57 UTC

The way you get 14 million waiting is if a large number of tasks in the field are Resends, with multiple Hosts per Work Unit, waiting on a slow Host to run the task. A shorter Work Cache would help in that scenario....maybe a day or so.
ID: 2034263
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14679
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2034270 - Posted: 27 Feb 2020, 20:30:36 UTC - in response to Message 2034257.  
Last modified: 27 Feb 2020, 20:48:20 UTC

If you are getting work with a long deadline, isn't that new work?
No, the deadline at this project is fixed by the AR of the data. The time allowed for processing - from issue of your copy of the WU to your deadline date - is the same for all volunteers, whether they are processing one of the initial pair of tasks, or a later resend.

Which reminds me - BOINC project servers have a configuration option for Accelerating retries, which includes sending out the resends with a reduced_delay_bound [aka deadline]. We could use that now.
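To make the effect of a reduced delay bound concrete, here is a minimal sketch of the idea, not BOINC's actual implementation. The function name, the `reduction_factor` parameter and its 0.5 value are assumptions for illustration only; the 52.75-day allowance is the VLAR figure quoted later in this thread.

```python
from datetime import datetime, timedelta, timezone


def task_deadline(issued_at, delay_bound_days, is_resend, reduction_factor=0.5):
    """Return the deadline for one copy of a workunit.

    Initial copies get the full delay bound. Resends get a shortened one,
    scaled by reduction_factor (a hypothetical knob, not a real BOINC option name).
    """
    if not 0 < reduction_factor <= 1:
        raise ValueError("reduction_factor must be in (0, 1]")
    days = delay_bound_days * reduction_factor if is_resend else delay_bound_days
    return issued_at + timedelta(days=days)


issued = datetime(2020, 2, 27, 20, 30, tzinfo=timezone.utc)
initial = task_deadline(issued, 52.75, is_resend=False)
resend = task_deadline(issued, 52.75, is_resend=True)
print("initial copy due:", initial.isoformat())
print("resend due:      ", resend.isoformat())
```

With these assumed numbers a resend issued at the same moment would be due in half the time of an initial copy; the real factor would be whatever the project staff chose to configure.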
ID: 2034270
W-K 666 Project Donor
Volunteer tester

Joined: 18 May 99
Posts: 19407
Credit: 40,757,560
RAC: 67
United Kingdom
Message 2034275 - Posted: 27 Feb 2020, 20:46:45 UTC

Question.
Why is the setting for cache size measured in time, but the restriction on cache size is measured in tasks?

This, IMHO, seems to go against all logic: a slow host, whatever limits it (hardware or the time it spends crunching), can ask for a ten-day cache and get more tasks than it will ever crunch, while the top hosts with multiple GPUs, which can crunch a task in ~2 mins on each GPU, are restricted to 150 tasks shared across those GPUs.

Surely it would be better for the project to restrict the max cache size on time rather than tasks, so that the fast host might actually get the 12-hour cache required to cover the outages, and the slow host will not get the 150 tasks it has no hope of completing before deadline.
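A rough worked example of the mismatch, as a sketch only. The 150-task limit, the ~2 minutes per task and the 12-hour target come from the post above; the 4-GPU count and the 6 hours per task for the slow host are assumptions picked for illustration.

```python
def cache_hours(task_limit, minutes_per_task, concurrent_tasks=1):
    """Hours of work that a cache of task_limit tasks represents."""
    if concurrent_tasks < 1:
        raise ValueError("concurrent_tasks must be at least 1")
    return task_limit * minutes_per_task / concurrent_tasks / 60


def tasks_for_hours(hours, minutes_per_task, concurrent_tasks=1):
    """Number of tasks needed to keep a host busy for the given hours."""
    return hours * 60 / minutes_per_task * concurrent_tasks


# Fast host: 4 GPUs (assumed), ~2 minutes per task on each GPU.
fast_hours = cache_hours(150, 2, concurrent_tasks=4)
# Slow host: one task at a time, 6 hours per task (assumed).
slow_hours = cache_hours(150, 6 * 60)
# Tasks the fast host would need for a 12-hour cache.
fast_needed = tasks_for_hours(12, 2, concurrent_tasks=4)

print(f"fast host: 150 tasks last {fast_hours:.2f} hours")
print(f"slow host: 150 tasks last {slow_hours / 24:.1f} days")
print(f"fast host needs {fast_needed:.0f} tasks for 12 hours")
```

Under these assumptions the same 150-task limit is a little over an hour of work for the fast host and more than a month for the slow one.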

Another question, which no doubt the software boffins will say is too difficult to implement now, is:
Why is the data contained on the "Computer Details" screen not used to more effect?
It describes the CPU, so could have been used in the recent AMD problem.
It could also still be used to solve the Nvidia driver problem, by not issuing tasks to hosts with drivers on a "banned list".
ID: 2034275
rob smith Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer moderator
Volunteer tester

Joined: 7 Mar 03
Posts: 22539
Credit: 416,307,556
RAC: 380
United Kingdom
Message 2034288 - Posted: 27 Feb 2020, 22:22:22 UTC
Last modified: 27 Feb 2020, 22:28:19 UTC

Cache size is the lesser of the time selected by the user and the limits set by the server.

edit to add:
Your second question - technically quite feasible to do, but probably not done as these were seen as short-term problems, and thus not worth changing the server code for, given the constraints on the project staff (not enough of them).
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 2034288
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13855
Credit: 208,696,464
RAC: 304
Australia
Message 2034361 - Posted: 28 Feb 2020, 4:28:55 UTC - in response to Message 2034226.  
Last modified: 28 Feb 2020, 4:44:45 UTC

Reducing the deadline will result in some number of current slow computers to cease to contribute to SETI (even slow computers process WUs).
Not so.
The slowest of systems can do a WU in 2 days. Even if it's not on all day every day, 1 month is plenty of time for it to return work.
If it's not, then there really is no point in catering to a system that can contribute so little, when catering to it causes problems such as we're having now with the database.
I really don't consider a system that returns 1 WU a month to be contributing, but that's what setting a 28 day deadline will still enable. It will still be able to participate in Seti.
Grant
Darwin NT
ID: 2034361
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13855
Credit: 208,696,464
RAC: 304
Australia
Message 2034363 - Posted: 28 Feb 2020, 4:32:28 UTC - in response to Message 2034231.  

Richard's final comment is one that is well worth remembering in these discussions:
neither the project, nor the other cruncher, can speed up the missing volunteer.
But reducing the deadline will (once all the existing outstanding ones have been cleared out, of course).
Grant
Darwin NT
ID: 2034363
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13855
Credit: 208,696,464
RAC: 304
Australia
Message 2034364 - Posted: 28 Feb 2020, 4:37:14 UTC - in response to Message 2034233.  

Eventually the inactive caches will timeout, and the WUs will be available for resend. It has taken many years for the data to arrive on earth - we can wait a few more weeks to process it.
But the database can't. That's part of the present problem: work is being sent out, but no result is returned. So you have to wait for the deadline to be reached for it to be sent to another system to provide a valid result. Oh, that system didn't provide a result either, so wait till the deadline expires again & resend it. Or a system provided a result that didn't validate, so it is sent out again (hopefully this time it doesn't time out & the returned result validates). If not, do it again.
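A back-of-envelope sketch of how long one workunit can sit in the database under that cycle. It is a simplification that assumes every resend gets the full deadline and is issued the moment the previous copy times out; the function name and the 1-day final turnaround are assumptions, while 53.39 and 28 days are figures quoted elsewhere in this thread.

```python
def days_open(deadline_days, timeouts, final_turnaround_days):
    """Days a workunit stays unresolved when `timeouts` copies expire in a row
    before a final copy is returned after final_turnaround_days."""
    if timeouts < 0:
        raise ValueError("timeouts cannot be negative")
    return timeouts * deadline_days + final_turnaround_days


# Two copies time out in sequence, then a third host returns within a day.
current = days_open(53.39, timeouts=2, final_turnaround_days=1)
proposed = days_open(28, timeouts=2, final_turnaround_days=1)
print(f"current deadline:  {current:.2f} days in the database")
print(f"28-day deadline:   {proposed:.2f} days in the database")
```

The point of the sketch is only that the waiting time scales with the deadline multiplied by the number of failed attempts, so a shorter deadline shrinks every round of the cycle.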
Grant
Darwin NT
ID: 2034364
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13855
Credit: 208,696,464
RAC: 304
Australia
Message 2034367 - Posted: 28 Feb 2020, 4:46:21 UTC - in response to Message 2034270.  

Which reminds me - BOINC project servers have a configuration option for Accelerating retries, which includes sending out the resends with a reduced_delay_bound [aka deadline]. We could use that now.
That sounds like a very good suggestion.
Grant
Darwin NT
ID: 2034367
Profile Wiggo
Joined: 24 Jan 00
Posts: 36865
Credit: 261,360,520
RAC: 489
Australia
Message 2034370 - Posted: 28 Feb 2020, 5:09:00 UTC
Last modified: 28 Feb 2020, 5:12:28 UTC

By Richard Haselgrove
Actually I think the deadlines were set, after the upgrade that produced the large differences in crunch times at different Angle Ranges, based on the work done by Joe Segur.
2007 - postid692684 - Estimates and Deadlines revisited
And back in 2007 you could still use a pre SSE instruction CPU. ;-)
Now remember, those numbers were set at a time when some here were still using, or had farms of, old Socket 486-Super 7 machines.

Also by Richard Haselgrove
We were reminded recently of Estimates and Deadlines revisited from 2008 (just before GPUs were introduced). That link drops you in on the final outcome, but here's a summary of Joe's table of deadlines:
Angle     Deadline (days from issue)
0.001     23.25
0.05      23.25 (VLAR)
0.0501    27.16
0.22548   16.85
0.22549   32.23
0.295     27.76
0.385     24.38
0.41      23.70 (common from Arecibo)
1.12744    7.00 (VHAR)
10         7.00
Since then, we've had two big increases in crunching time, due to increases in search sensitivity, and each has been accompanied by an extension of deadlines. So the table now looks something like:
Angle     Deadline (days from issue)
0.05      52.75 (VLAR)
0.425     53.39 (nearest from Arecibo)
1.12744   20.46 (VHAR)
So, deadlines overall have more than doubled since 2008, without any allowance for the faster average computer available now. I think we could safely halve the current figures, as the simplest adjustment.
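The "more than doubled" claim and the proposed halving can be checked directly from the two tables. This is just arithmetic on the quoted figures; the dictionary keys are labels chosen here, and the Arecibo rows are only approximately comparable because the angle ranges differ slightly (0.41 vs 0.425).

```python
# Deadlines in days from issue, copied from the two tables quoted above.
deadlines_2008 = {"VLAR": 23.25, "Arecibo": 23.70, "VHAR": 7.00}
deadlines_now = {"VLAR": 52.75, "Arecibo": 53.39, "VHAR": 20.46}

# Growth factor since 2008 and the result of simply halving today's values.
growth = {k: deadlines_now[k] / deadlines_2008[k] for k in deadlines_now}
halved = {k: deadlines_now[k] / 2 for k in deadlines_now}

for k in deadlines_now:
    print(f"{k:8s} x{growth[k]:.2f} since 2008, halved -> {halved[k]:.2f} days")
```

Every growth factor comes out above 2, and halving would still leave each deadline longer than its 2008 value.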
Now why have they doubled since then?

As a second thought, why is CreditNew so out of whack now?

Is it a problem caused by those old pre-SSE CPUs no longer being in the system these days, when they were a good part of the factoring in back then?

I'm sorry, I'm just starting to see a pattern (though it may be just my imagination), but I thought that I'd just blurt it out anyway. ;-)

Cheers.
ID: 2034370
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13855
Credit: 208,696,464
RAC: 304
Australia
Message 2034371 - Posted: 28 Feb 2020, 5:31:23 UTC

No one seems to have an issue with the shortest WU's deadline being 7 days?

On one of my systems that type of WU takes the CPU 30min to process. The longest running WU I've seen on the same system took 1hr 45min (most long running WUs are done in 1hr 30 or so).
That's 3.5 times longer than the shortest.

3.5 * 7 gives us 24.5 days if the deadline were in proportion to that for the shortest non-noisy WU. Add on another few days as a buffer, and that gives us 28 days as a maximum deadline.
In line with what has been proposed.
And as I mentioned in another post, I don't consider a system that returns 1 result in a month to really be contributing anything to the project, but that is what a 28 day deadline will mean as a lower limit of what a system has to be able to process to do work for Seti. As long as it can do 1 WU a month, it can take part in the project.
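The proportional-deadline arithmetic above, spelled out as a short sketch. All numbers are taken from the post; only the variable names are invented here.

```python
shortest_minutes = 30        # shortest WU on this system
longest_minutes = 105        # longest WU seen: 1hr 45min
shortest_deadline_days = 7   # current deadline for the shortest WUs
proposed_max_days = 28       # proposed maximum deadline

# How much longer the longest WU runs than the shortest.
ratio = longest_minutes / shortest_minutes
# Deadline for the longest WU if it scaled with runtime.
proportional_days = ratio * shortest_deadline_days
# Slack left over under the proposed 28-day maximum.
buffer_days = proposed_max_days - proportional_days

print(f"runtime ratio: {ratio}")
print(f"proportional deadline: {proportional_days} days")
print(f"buffer within 28 days: {buffer_days} days")
```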
Grant
Darwin NT
ID: 2034371
Profile Brent Norman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 2034379 - Posted: 28 Feb 2020, 7:07:20 UTC
Last modified: 28 Feb 2020, 7:10:23 UTC

Finally found it ....
A year ago Eric posted a graph of the age of workunits (tasks) residing on the server RAID giving a good snapshot of the client turn around times.

Even those tasks which the graph shows as hanging around for 60 days wouldn't be there if the client had a smaller cache in the first place.

Thanks Richard for those server settings. Now if we could just find someone with some time to try different settings and document the results of each.

When shrinking cache size, it might be best to do it in 3.5d steps every day, so that we don't get that HUGE outbound of resends that Richard noted would happen.
ID: 2034379
W-K 666 Project Donor
Volunteer tester

Joined: 18 May 99
Posts: 19407
Credit: 40,757,560
RAC: 67
United Kingdom
Message 2034380 - Posted: 28 Feb 2020, 7:26:06 UTC - in response to Message 2034379.  

Thanks for that graph.
A quick look says a 10 day deadline would probably be workable. But if the powers that be want to be cautious then a 25 day deadline would be sufficient.
ID: 2034380
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13855
Credit: 208,696,464
RAC: 304
Australia
Message 2034381 - Posted: 28 Feb 2020, 7:36:32 UTC - in response to Message 2034380.  

Thanks for that graph.
A quick look says a 10 day deadline would probably be workable. But if the powers that be want to be cautious then a 25 day deadline would be sufficient.
Even 5 days would be plenty, but the project does want to make it possible for even low powered infrequently used systems to contribute.
And crunchers do have issues beyond their control: floods, fires, storms, and power, network or system issues even without those other events occurring. So 28 days gives people plenty of time to deal with problems & return work before it times out, and still get Credit for it.
Grant
Darwin NT
ID: 2034381
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 2034390 - Posted: 28 Feb 2020, 11:37:48 UTC - in response to Message 2034226.  
Last modified: 28 Feb 2020, 12:04:55 UTC

Reducing the deadline will result in some number of current slow computers to cease to contribute to SETI (even slow computers process WUs). Aren't the splitters already faster than processors? We artificially throttle the splitter processing because we aren't validating the processed WUs quickly enough.

If the "goal" is a smaller waiting validation queue without regard to the overall number of WUs processed, then throttling back the processing will do that. The big systems will like it, but the participation in SETI will decrease.

Sitting in a user's


. . You are ignoring the fact that it is the database size compared to available memory that is part of the cause of the validation process being so slow, and it is the large number of tasks that are "in progress" that is part of that database bloat. Reducing the amount of WUs sitting for long periods in 'in progress' queues will reduce the size of the database hopefully to a size that will fit in memory available and speed up that validation process.

Stephen

. .
ID: 2034390
rob smith Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer moderator
Volunteer tester

Joined: 7 Mar 03
Posts: 22539
Credit: 416,307,556
RAC: 380
United Kingdom
Message 2034394 - Posted: 28 Feb 2020, 12:03:33 UTC

You are ignoring the fact that it is the database size compared to available memory that is causing the validation process to be slow

This statement is wrong - the validators are more than capable of keeping up with all the data thrown at them - it is the rate of return of work by users that limits the validation rate.
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 2034394
W-K 666 Project Donor
Volunteer tester

Joined: 18 May 99
Posts: 19407
Credit: 40,757,560
RAC: 67
United Kingdom
Message 2034397 - Posted: 28 Feb 2020, 12:36:36 UTC
Last modified: 28 Feb 2020, 12:38:34 UTC

As I was just having some lunch, I had a look at why there are so many Valid tasks showing in my account. It turns out that for nearly 600 of the total of 1020, it is over 24 hours since they were validated. I didn't check all 600, but didn't see any in the 10% (2/page) that I did look at.

So why haven't they been purged, or is it that they are part of the 4 million awaiting Assimilation?

If everybody's accounts are the same and they are still awaiting Assimilation, then that number should be below 2.5 million.
ID: 2034397
rob smith Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer moderator
Volunteer tester

Joined: 7 Mar 03
Posts: 22539
Credit: 416,307,556
RAC: 380
United Kingdom
Message 2034410 - Posted: 28 Feb 2020, 14:31:19 UTC

We might have a smoking gun - the assimilator queue should be fairly small, certainly not in the millions. Thought - are the assimilators being throttled at the same time as the splitters?
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 2034410
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14679
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2034411 - Posted: 28 Feb 2020, 14:40:23 UTC - in response to Message 2034410.  

We might have a smoking gun - the assimilator queue should be fairly small, certainly not in the millions. Thought - are the assimilators being throttled at the same time as the splitters?
Mind you, the assimilator queue is subject to yet another constraint - the processing speed of the Science database, as distinct from the BOINC database that we've been discussing up till now.

Anything which slows down the science database - like, for example, taking a fresh snapshot for processing with Nebula over at Einstein/ATLAS cluster - will likely bork assimilation for a while. And I do see backroom tweaks being applied to Nebula periodically - it's not dead.
ID: 2034411
Ville Saari
Joined: 30 Nov 00
Posts: 1158
Credit: 49,177,052
RAC: 82,530
Finland
Message 2034439 - Posted: 28 Feb 2020, 19:11:32 UTC

Validation is working just fine but the server status page is displaying the data with slightly misleading labels. Tasks labeled 'Results returned and awaiting validation' are really all the returned tasks that haven't been assimilated yet.

Out of those 14 million results labeled as 'Results returned and awaiting validation', about 5 million are results from hosts whose wingmen are still crunching their task for the same workunit. Only about two thousand are tasks that are ready to be validated but that the validators haven't got to yet, and 9 million are validated tasks that are waiting to be assimilated.

The real problem is assimilation being unable to process the results at the rate they are being returned.
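As a quick sanity check on that breakdown, the sketch below adds up the three rounded figures from this post and works out each one's share. The dictionary keys are labels made up here, and the inputs are the approximate counts given above, not values read from the server.

```python
# Approximate counts quoted in this post.
breakdown = {
    "waiting_for_wingman": 5_000_000,
    "ready_to_validate": 2_000,
    "validated_awaiting_assimilation": 9_000_000,
}

total = sum(breakdown.values())
shares = {name: count / total for name, count in breakdown.items()}

print(f"total: {total:,}")
for name, share in shares.items():
    print(f"{name:34s} {share:7.2%}")
```

The three parts add up to roughly the 14 million shown on the status page, with the assimilation backlog making up nearly two thirds of it and the genuine validation backlog a tiny fraction.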
ID: 2034439
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 2034440 - Posted: 28 Feb 2020, 19:24:32 UTC - in response to Message 2034411.  

...Anything which slows down the science database - like, for example, taking a fresh snapshot for processing with Nebula over at Einstein/ATLAS cluster - will likely bork assimilation for a while...
Is anyone else allowed access to the Science database? Hopefully only a very few are allowed access if it borks the system for a while....
ID: 2034440


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.