About Deadlines or Database reduction proposals

rob smith Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer moderator
Volunteer tester

Send message
Joined: 7 Mar 03
Posts: 22491
Credit: 416,307,556
RAC: 380
United Kingdom
Message 2035011 - Posted: 2 Mar 2020, 10:13:01 UTC

the Resends could be preferentially sent to "reliable" (BOINC scheduler code speak for "a system with
a short turnaround and few errors") systems for reprocessing.


There are at least two flaws in your argument. First, cherry-picking which host runs which task is not a philosophy that SETI accepts. Second, we very recently saw what happens when a class of host goes rogue. It was bad enough when it was a class with a low population; now imagine it had been one of the large-population classes. Randomness significantly reduces the probability of such unfortunate events.
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 2035011 · Report as offensive     Reply Quote
Darrell Wilcox Project Donor
Volunteer tester

Send message
Joined: 11 Nov 99
Posts: 303
Credit: 180,954,940
RAC: 118
Vietnam
Message 2035012 - Posted: 2 Mar 2020, 10:13:13 UTC - in response to Message 2035001.  

@ rob smith
Well, the working database has millions of tasks at various stages ...
Yes, 23,769,626 from the server statistics page I just opened. Correct? Don't know.
... this temporary table is far too big to sit in memory ...
Can you show me your calculations for this? I estimate that 24*10^6 entries x a size per table entry (wild-ass guess here) of 10^3 bytes
would only give 24*10^9 bytes, or 24 GB. I KNOW there are many more things in the system, so no stones please. Just some
facts and data would be nice.
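The estimate above can be written out as a few lines of Python. This is only a sketch: the bytes-per-entry values are guesses (the 1,000-byte figure is the same wild guess as in the post, the larger ones are added purely to show sensitivity), and `table_size_gb` is a made-up helper name, not anything from the BOINC code.

```python
# Back-of-envelope estimate of how much RAM the task table would need.
# ASSUMPTION: the bytes-per-entry values are guesses, not measured row sizes.

TASKS_IN_DB = 23_769_626  # from the server status page, 2 Mar 2020


def table_size_gb(entries: int, bytes_per_entry: int) -> float:
    """Return the table size in decimal gigabytes (10^9 bytes)."""
    return entries * bytes_per_entry / 1e9


# Try the original 1,000-byte guess plus two larger guesses to see
# how sensitive the result is to the size of one entry.
estimates = {size: table_size_gb(TASKS_IN_DB, size) for size in (1_000, 2_000, 4_000)}

for size, gb in estimates.items():
    print(f"{size:>5} bytes/entry -> {gb:6.1f} GB")
```

The result scales linearly with the guessed entry size, so the entry size is the number that would need checking against the real schema.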
... has to be "paged" in and out to allow work to be processed ...
True enough, when it doesn't have enough RAM to hold everything it is using. And I am not fully convinced that the system is being much impacted
right now, since the server is processing about 20,000 MORE tasks now than it was in November last year when the DB was much smaller
(15,031,843 on 11/13/2019 from the Wayback Machine).
... a process cannot work on a task that is marked as being used in an earlier step, or indeed go back and look at one it's already dealt with.
There is a situation or two where this actually can occur, but not regularly.
There is the odd issue in that the validators do not always correctly flag tasks they've dealt with, and this appears to coincide with
either a major server "bump", or being processed as they are stopped. Richard(?) did do some digging on this a long time ago, but since
this was such a rare occurrence nothing appears to have been done about it.
This is exactly the kind of thing I enjoyed doing when I was the systems programmer for some large IBM mainframes. I wish I could get
the logs and work on it.
ID: 2035012 · Report as offensive     Reply Quote
Darrell Wilcox Project Donor
Volunteer tester

Send message
Joined: 11 Nov 99
Posts: 303
Credit: 180,954,940
RAC: 118
Vietnam
Message 2035013 - Posted: 2 Mar 2020, 10:25:01 UTC - in response to Message 2035011.  

@ rob smith
First is that cherry-picking which host runs which task as a philosophy is not accepted by SETI
"Cherry picking" is when only the best based on some criteria (ripest, reddest, biggest, ...) are
selected. Resends are normally neither the best nor the worst.
... we very recently saw what happens when a class of host goes rogue ...
There are also parameters in the server config.xml to limit the number of WUs/tasks that a given host (or user)
can get each day. Appropriate fencing would indeed be prudent with that change.
ID: 2035013 · Report as offensive     Reply Quote
Grant (SSSF)
Volunteer tester

Send message
Joined: 19 Aug 99
Posts: 13841
Credit: 208,696,464
RAC: 304
Australia
Message 2035014 - Posted: 2 Mar 2020, 10:27:47 UTC - in response to Message 2035013.  

There are also parameters in the server config.xml to limit the number of WUs/tasks that a given host (or user) can get each day. Appropriate fencing would indeed be prudent with that change.
Unfortunately that system doesn't work as well as it should. It too has been the subject of discussion many times in the past.
Grant
Darwin NT
ID: 2035014 · Report as offensive     Reply Quote
Darrell Wilcox Project Donor
Volunteer tester

Send message
Joined: 11 Nov 99
Posts: 303
Credit: 180,954,940
RAC: 118
Vietnam
Message 2035017 - Posted: 2 Mar 2020, 10:31:58 UTC - in response to Message 2035009.  

@ Grant (SSSF)
I've already posted my suggested system somewhere around these forums (it can handle 3TB of RAM,
sufficient improvement over the current system's 96GB to be good for a couple of years i should think).
Ouch! I saw your dream machine specs and wished it were possible quickly. I was thinking more along
the lines of 32GB RAM or such. How much does RAM cost for those servers? More than for my home PCs,
I'll bet. Also, how much RAM can be added to "Synergy"?

Start a longer fund raiser for the dream machine server.

OK crunchers, what say you all?
ID: 2035017 · Report as offensive     Reply Quote
Darrell Wilcox Project Donor
Volunteer tester

Send message
Joined: 11 Nov 99
Posts: 303
Credit: 180,954,940
RAC: 118
Vietnam
Message 2035019 - Posted: 2 Mar 2020, 10:34:23 UTC - in response to Message 2035014.  

@ Grant (SSSF)
the subject of discussion many times in the past.
In this forum? Do you know when and/or where the discussions are so that I can study them?
ID: 2035019 · Report as offensive     Reply Quote
rob smith Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer moderator
Volunteer tester

Send message
Joined: 7 Mar 03
Posts: 22491
Credit: 416,307,556
RAC: 380
United Kingdom
Message 2035020 - Posted: 2 Mar 2020, 10:36:30 UTC

Yesterday I had a look through all the tasks on my cruncher for "_2" and greater.
I was expecting a large number of them, and the majority to be "time outs", but... (the total number of tasks, re-sends or not, was ~2700)
Total "_2" & "_3" was ~165
In total, across in-progress, pending, inconclusive and valid, the vast majority were down to inconclusive re-sends, ~130
Errors were next at 30
the rest being invalid, time-out and abort, totaling 9 between them.

(For the small number of "_3" I counted both triggers, so a "_3" task on my list that had an error followed by an abort was counted twice, once under each heading, but the task appears on my list only once)

Remember these are triggers for a task being re-sent to me.
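For anyone who wants the proportions, the counts above can be tallied in a few lines of Python. This is just a sketch using the approximate figures from this post; the dictionary keys and variable names are made up for illustration, and since the inputs are "~" values the output is only indicative.

```python
# Tally of why "_2"/"_3" tasks were re-sent, using the approximate
# counts quoted in the post (all figures are rough).
resend_triggers = {
    "inconclusive": 130,
    "error": 30,
    "invalid / time-out / abort": 9,
}
TOTAL_TASKS = 2700   # approximate number of tasks on the host
RESENT_TASKS = 165   # approximate number of "_2" and "_3" tasks

trigger_total = sum(resend_triggers.values())
# Triggers can exceed tasks because each "_3" task contributes two triggers.
double_counted = trigger_total - RESENT_TASKS

print(f"re-sends are {RESENT_TASKS / TOTAL_TASKS:.1%} of all tasks")
for reason, count in resend_triggers.items():
    print(f"{reason:<28} {count:>4}  {count / trigger_total:.0%} of triggers")
print(f"triggers counted twice: ~{double_counted}")
```

The triggers add up to slightly more than the number of re-sent tasks, which is consistent with the "_3" tasks being counted under two headings.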
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 2035020 · Report as offensive     Reply Quote
Grant (SSSF)
Volunteer tester

Send message
Joined: 19 Aug 99
Posts: 13841
Credit: 208,696,464
RAC: 304
Australia
Message 2035021 - Posted: 2 Mar 2020, 10:50:23 UTC - in response to Message 2035019.  

@ Grant (SSSF)
the subject of discussion many times in the past.
In this forum? Do you know when and/or where the discussions are so that I can study them?
Not a clue. They've cropped up from time to time over the years.
I don't think there has ever been a dedicated thread about the shortcomings of BOINC's handling of problem systems; it's just developed each time in an existing thread, such as one about a misbehaving system, and gone on from there.
Grant
Darwin NT
ID: 2035021 · Report as offensive     Reply Quote
Grant (SSSF)
Volunteer tester

Send message
Joined: 19 Aug 99
Posts: 13841
Credit: 208,696,464
RAC: 304
Australia
Message 2035022 - Posted: 2 Mar 2020, 10:55:10 UTC - in response to Message 2035017.  
Last modified: 2 Mar 2020, 11:08:24 UTC

How much does RAM cost for those servers? More than for my home PCs, I'll bet.
Not to mention the Server itself & the CPUs.


Also, how much RAM can be added to "Synergy"?
The present database server is maxed out. That's the problem.
Otherwise a quick whip around would've had more RAM in there in no time at all.



Edit- that server is overkill- it'd meet our requirements for the next 50 years, and then some.
Just a basic single socket EPYC Rome server would kill the existing server performance-wise, and probably use less power, and support heaps more RAM.

Grant
Darwin NT
ID: 2035022 · Report as offensive     Reply Quote
Richard Haselgrove Project Donor
Volunteer tester

Send message
Joined: 4 Jul 99
Posts: 14676
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2035038 - Posted: 2 Mar 2020, 12:13:16 UTC - in response to Message 2035030.  

no analyzing will ever happen.

Unfortunately, I fear that analyzing (the most important part of this, now that we have so many results) will never happen, since this has now become a "bread and butter factory" for some people.
Not for Eric:

Author: korpela
Date: 2020-02-18 14:43:33 -0800 (Tue, 18 Feb 2020)
New Revision: 4112

Changed pulse and triplet period matching to use a better estimate of
pulse period error and when comparing pulses against triplets, using the error
in the triplet period which will always be large compared to the pulse period
error.
ID: 2035038 · Report as offensive     Reply Quote
Richard Haselgrove Project Donor
Volunteer tester

Send message
Joined: 4 Jul 99
Posts: 14676
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2035046 - Posted: 2 Mar 2020, 12:39:59 UTC - in response to Message 2035043.  

It's about Nebula.
ID: 2035046 · Report as offensive     Reply Quote
Profile Mr. Kevvy Crowdfunding Project Donor*Special Project $250 donor
Volunteer moderator
Volunteer tester

Send message
Joined: 15 May 99
Posts: 3799
Credit: 1,114,826,392
RAC: 3,319
Canada
Message 2035056 - Posted: 2 Mar 2020, 14:03:37 UTC

My half-baked hypothesis is that we'll have a very extended (days or more) outage this Tuesday. The team promised they'd post pics, stats etc. of the new device for the 16TB drives we just fundraised for, and as we haven't seen anything on it yet this may be what it's about. I would expect that if they are consolidating all of the project into one box it will be a good time for a complete database rebuild to get rid of bad records, including all the junk entries from the RX 5700 cards and any other detritus that has built up over the years. We don't want that in Nebula.
ID: 2035056 · Report as offensive     Reply Quote
Richard Haselgrove Project Donor
Volunteer tester

Send message
Joined: 4 Jul 99
Posts: 14676
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2035059 - Posted: 2 Mar 2020, 14:14:09 UTC - in response to Message 2035056.  

If they do that, I hope they give us enough notice to switch to backup projects, and flush through all cached tasks before the outage starts - so we don't start filling the new database with crud as soon as it comes back online.

And I hope two more things as well:
1) I hope setizens do something similar, rather than trying to grab oodles of work before the outage starts.
2) and I hope our staff warn all the other BOINC project administrators what's about to hit them!
ID: 2035059 · Report as offensive     Reply Quote
rob smith Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer moderator
Volunteer tester

Send message
Joined: 7 Mar 03
Posts: 22491
Credit: 416,307,556
RAC: 380
United Kingdom
Message 2035069 - Posted: 2 Mar 2020, 15:37:49 UTC

2) and I hope our staff warn all the other BOINC project administrators what's about to hit them!

and not just sit back and laugh at the chaos/panic caused by having several thousand very hungry hosts arrive on their doorsteps demanding instant feeding....
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 2035069 · Report as offensive     Reply Quote
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Send message
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 2035078 - Posted: 2 Mar 2020, 16:24:23 UTC - in response to Message 2035000.  
Last modified: 2 Mar 2020, 16:24:54 UTC

@ W-K 666
Are you sure? Eric's graph https://setiathome.berkeley.edu/forum_thread.php?id=83848&postid=1978208#1978208 indicates 98% of all tasks get returned in 3 days.
These are my calculated %'s
"tasks returned" estimated by eyeball.


. . That is results returned on a single host basis, not results validated. The majority of the slow hosts affect fast hosts and so reduce the early returns in terms of validated tasks, making those early returns sit in limbo until the slowies get their work back.
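To put a rough number on the difference between "returned" and "validated", here is a small Python sketch. It assumes a workunit needs results from two hosts before it can validate, that the hosts behave independently, and that each returns within 3 days with the 98% probability read off the graph quoted above. The quorum size and the independence are assumptions made for illustration, and `p_all_returned` is a made-up helper name.

```python
# Chance that every result needed for validation is back within 3 days.
# ASSUMPTIONS: hosts are independent, and each returns its task within
# 3 days with probability 0.98 (the eyeballed figure quoted above).

def p_all_returned(p_single: float, quorum: int) -> float:
    """Probability that all `quorum` hosts have returned their result."""
    return p_single ** quorum


P_SINGLE = 0.98

for quorum in (1, 2, 3):
    p = p_all_returned(P_SINGLE, quorum)
    print(f"quorum {quorum}: {p:.2%} complete, {1 - p:.2%} still waiting")
```

Under these assumptions the share of workunits still waiting on somebody after 3 days is roughly double the share of individual tasks that are late, and it grows again whenever a re-send is needed.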

Stephen

:(
ID: 2035078 · Report as offensive     Reply Quote
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Send message
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 2035081 - Posted: 2 Mar 2020, 16:34:44 UTC - in response to Message 2035013.  

@ rob smith
First is that cherry-picking which host runs which task as a philosophy is not accepted by SETI
"Cherry picking" is when only the best based on some criteria (ripest, reddest, biggest, ...) are
selected. Resends are normally neither the best nor the worst.
... we very recently saw what happens when a class of host goes rogue ...
There are also parameters in the server config.xml to limit the number of WUs/tasks that a given host (or user)
can get each day. Appropriate fencing would indeed be prudent with that change.


. . Well Darrell,

. . If you truly believe what you are suggesting then lead by example and drop all your machines' work fetch sizes. So instead of holding 600 WUs like most good producers, show us how well things run when your machines have only 20 or 60 WUs cached. That is the easiest change of all and you can do it yourself. 5 secs per machine and you can demonstrate how effectively machines operate with minuscule caches. Give it a go ...

Stephen

. .
ID: 2035081 · Report as offensive     Reply Quote
Profile ML1
Volunteer moderator
Volunteer tester

Send message
Joined: 25 Nov 01
Posts: 21092
Credit: 7,508,002
RAC: 20
United Kingdom
Message 2035088 - Posted: 2 Mar 2020, 17:29:01 UTC

Just a thought on identifying/solving the source bottleneck rather than merely massaging the symptoms:


The entire Boinc system uses a singular central database as the "state machine" that keeps all activity, everything, coherent.

Are we bottlenecked on the maximum rate of transactions that the database can process?

And is the database itself bottlenecked on the maximum IOPS of the underlying filesystem(s)?

And is there a critical bottleneck on the transaction rate possible feeding into a (singular?) log file for the database?



Just a few thoughts...

Keep searchin',
Martin
See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
ID: 2035088 · Report as offensive     Reply Quote
rob smith Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer moderator
Volunteer tester

Send message
Joined: 7 Mar 03
Posts: 22491
Credit: 416,307,556
RAC: 380
United Kingdom
Message 2035101 - Posted: 2 Mar 2020, 18:32:07 UTC

Well, yes and no - each project (or group of projects) uses its own set of servers; the overall scores shown on the BOINC website are fed there by queries run periodically on each project server. BOINC central does nothing to control or even influence the operation of the individual projects (sometimes I wish it did).
Each host is running a BOINC client application, which provides local-to-the-host scheduling, and manages communication between the host and each of the project servers. Communication with the project servers is on a "push to server" basis - host requests something, project server responds, be it work, news etc.
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 2035101 · Report as offensive     Reply Quote
Profile ML1
Volunteer moderator
Volunteer tester

Send message
Joined: 25 Nov 01
Posts: 21092
Credit: 7,508,002
RAC: 20
United Kingdom
Message 2035110 - Posted: 2 Mar 2020, 18:54:06 UTC - in response to Message 2035101.  
Last modified: 2 Mar 2020, 18:54:53 UTC

Thanks for the good answer to what I wrote...

However: What I was meaning/intending to describe was for the database central to a project such as here at s@h.

Hence substitute "Boinc individual project" for the misworded Boinc ('central')!

Hence, is s@h transaction-limited by the IO speed of a single thread writing the database transaction log file?


Keep searchin'
Martin
See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
ID: 2035110 · Report as offensive     Reply Quote
Boiler Paul

Send message
Joined: 4 May 00
Posts: 232
Credit: 4,965,771
RAC: 64
United States
Message 2035180 - Posted: 2 Mar 2020, 21:36:22 UTC
Last modified: 2 Mar 2020, 21:37:44 UTC

Since seti@home is shutting down at the end of March, the points raised in this thread are moot.

https://setiathome.berkeley.edu/forum_thread.php?id=85267#2035179
ID: 2035180 · Report as offensive     Reply Quote
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.