About Deadlines or Database reduction proposals
rob smith, Joined: 7 Mar 03, Posts: 22491, Credit: 416,307,556, RAC: 380
Quote: "the Resends could be preferentially sent to 'reliable' (BOINC scheduler code speak for 'a system with ..." There are at least two flaws in your argument. First is that cherry-picking which host runs which task as a philosophy is not accepted by SETI; second, we very recently saw what happens when a class of host goes rogue. It was bad enough when it was a class with a low population; now, what if it had been one of the large-population classes? Randomness significantly reduces the probability of such unfortunate events. Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe?
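rob smith's argument about randomness can be put into numbers with a deliberately simplified model. This is an assumption for illustration, not how the BOINC scheduler actually assigns work. Suppose a rogue class of hosts makes up a fraction p of the population and each replica of a two-result quorum goes to an independently and uniformly chosen host. Then both replicas land in that class with probability p^2. The function names below are invented for this sketch, which checks the formula against a small simulation.

```python
import random


def p_both_in_class(p: float) -> float:
    """Probability that both replicas of a two-result quorum go to hosts in a
    class holding fraction p of all hosts, assuming independent, uniform
    host selection for each replica (a simplification)."""
    return p * p


def simulate_both_in_class(p: float, trials: int = 100_000, seed: int = 1) -> float:
    """Monte Carlo estimate of the same probability under the same assumption."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        first = rng.random() < p   # replica 0 lands on the rogue class?
        second = rng.random() < p  # replica 1, drawn independently
        if first and second:
            hits += 1
    return hits / trials


for share in (0.01, 0.10, 0.30):
    print(f"class share {share:.0%}: analytic {p_both_in_class(share):.4f}, "
          f"simulated {simulate_both_in_class(share):.4f}")
```

Under this model the risk grows quadratically with the class's share. It is about 0.01% for a class holding 1% of hosts, but 9% for a class holding 30%. Steering resends toward a favoured group of hosts breaks the independence assumption, which is the concern raised above.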
Darrell Wilcox, Joined: 11 Nov 99, Posts: 303, Credit: 180,954,940, RAC: 118
rob smith wrote: "Well, the working database has millions of tasks at various stages ..." Yes, 23,769,626, from the server statistics page I just opened. Correct? Don't know. Quote: "... this temporary table is far too big to sit in memory ..." Can you show me your calculations for this? I estimate 24*10^6 tasks x (wild-ass guess here) 10^3 bytes per table entry, which would only give 24*10^9 bytes, or 24 GB. I KNOW there are many more things in the system, so no stones please. Just some facts and data would be nice. Quote: "... has to be 'paged' in and out to allow work to be processed ..." True enough, when it doesn't have enough RAM to hold everything it is using. But I am not fully convinced that the system is being much impacted right now, since the server is processing about 20,000 MORE tasks now than it was in November last year, when the DB was much smaller (15,031,843 on 11/13/2019, per the Wayback Machine). Quote: "... a process cannot work on a task that is marked as being used in an earlier step, or indeed go back and look at one it's already dealt with. There is a situation or two where this actually can occur, but not regularly. There is the odd issue in that the validators do not always correctly flag tasks they've dealt with, and this appears to coincide with ..." This is exactly the kind of thing I enjoyed doing when I was the systems programmer for some large IBM mainframes. I wish I could get the logs and work on it.
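Darrell's back-of-envelope figure can be reproduced directly. The 1,000 bytes per row is his self-described guess rather than a measured row size. The total also ignores indexes and every other table, which are the "many more things" he mentions. The two task counts are the ones quoted in the post, and the constant and function names are illustrative only.

```python
# Task counts as quoted in the post; BYTES_PER_ROW is an assumed average.
TASKS_NOW = 23_769_626        # server status page, as quoted
TASKS_NOV_2019 = 15_031_843   # Wayback Machine snapshot of 11/13/2019, as quoted
BYTES_PER_ROW = 1_000         # guess, not a measured row size


def table_size_bytes(rows: int, bytes_per_row: int) -> int:
    """Raw data size only: no indexes, no other tables, no engine overhead."""
    return rows * bytes_per_row


size = table_size_bytes(TASKS_NOW, BYTES_PER_ROW)
growth = TASKS_NOW / TASKS_NOV_2019

print(f"{size / 1e9:.1f} GB ({size / 2**30:.1f} GiB)")  # 23.8 GB (22.1 GiB)
print(f"growth since Nov 2019: {growth:.2f}x")          # growth since Nov 2019: 1.58x
```

On these assumptions the raw task rows alone come to a few tens of GB. Whether the real working set fits in RAM depends on the actual row sizes and indexes, which only the project's own database statistics could show.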
Darrell Wilcox, Joined: 11 Nov 99, Posts: 303, Credit: 180,954,940, RAC: 118
rob smith wrote: "First is that cherry-picking which host runs which task as a philosophy is not accepted by SETI" 'Cherry picking' is when only the best by some criterion (ripest, reddest, biggest, ...) are selected. Resends are normally neither the best nor the worst. Quote: "... we very recently saw what happens when a class of host goes rogue ..." There are also parameters in the server cc_config.xml to limit the number of WUs/tasks that a given host (or user) can get each day. Appropriate fencing would indeed be prudent with that change.
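The "fencing" Darrell mentions can be pictured as a per-host daily cap. The class below is a hypothetical sketch; the names `DailyQuota` and `try_issue` are invented for illustration. It deliberately does not reproduce BOINC's own quota logic, which has its shortcomings according to the reply that follows. Note also that in BOINC, cc_config.xml is the client's configuration file. Project-side limits are set in the project's config.xml.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class DailyQuota:
    """Hypothetical per-host cap on tasks issued per calendar day."""
    limit: int
    issued: dict = field(default_factory=dict)  # host_id -> (day, count)

    def try_issue(self, host_id: int, today: date) -> bool:
        """Return True and record the task if the host is under its cap."""
        day, count = self.issued.get(host_id, (today, 0))
        if day != today:       # first request of a new day: reset the count
            day, count = today, 0
        if count >= self.limit:
            return False       # cap reached; caller should offer the task elsewhere
        self.issued[host_id] = (day, count + 1)
        return True


quota = DailyQuota(limit=2)
day1, day2 = date(2020, 3, 1), date(2020, 3, 2)
print([quota.try_issue(42, day1) for _ in range(3)])  # [True, True, False]
print(quota.try_issue(42, day2))                      # True
```

A fixed cap like this is the simplest form of fencing. It limits how much damage one misbehaving host can do in a day, whether the tasks it receives are fresh or resends.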
Grant (SSSF), Joined: 19 Aug 99, Posts: 13841, Credit: 208,696,464, RAC: 304
Quote: "There are also parameters in the server cc_config.xml to limit the number of WUs/tasks that a given host (or user) can get each day. Appropriate fencing would indeed be prudent with that change." Unfortunately that system doesn't work as well as it should. It too has been the subject of discussion many times in the past. Grant Darwin NT
Darrell Wilcox, Joined: 11 Nov 99, Posts: 303, Credit: 180,954,940, RAC: 118
Grant (SSSF) wrote: "I've already posted my suggested system somewhere around these forums (it can handle 3TB of RAM, ..." Ouch! I saw your dream machine specs and wished it were possible quickly. I was thinking more along the lines of 32GB of RAM or so. How much does RAM cost for those servers? More than for my home PCs, I'll bet. Also, how much RAM can be added to "Synergy"? Start a longer fundraiser for the dream machine server. OK crunchers, what say you all?
Darrell Wilcox, Joined: 11 Nov 99, Posts: 303, Credit: 180,954,940, RAC: 118
Grant (SSSF) wrote: "... the subject of discussion many times in the past." In this forum? Do you know when and/or where those discussions took place, so that I can study them?
rob smith, Joined: 7 Mar 03, Posts: 22491, Credit: 416,307,556, RAC: 380
Yesterday I looked through all the tasks on my cruncher for "_2" and higher. I was expecting a large number of them, and the majority to be "time outs", but... The total number of tasks, resends or not, was ~2700; the total of "_2" and "_3" was ~165. In total, across in-progress, pending, inconclusive and valid, the vast majority (~130) were down to inconclusive resends. Errors were next at 30, and the rest (invalid, time-out and abort) totaled 9 between them. (For the small number of "_3" tasks I counted both triggers. A "_3" task on my list that had an error followed by an abort was counted twice, once under each heading, but appears on my list only once.) Remember, these are the triggers for a task being re-sent to me.
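A tally like this can be automated along the following lines. This is a hypothetical sketch. It assumes result names end in `_<n>`, with `_0` and `_1` as the initial replicas, so `_2` and above are resends. The function names, task names and trigger labels in the example are made up. Like the manual count, it counts every trigger but each task only once. With the figures in the post, the triggers add up to 130 + 30 + 9 = 169, slightly more than the ~165 resent tasks, which is consistent with that double counting.

```python
import re
from collections import Counter

# Assumed naming: "<workunit>_<n>", where n is the replica index.
SUFFIX = re.compile(r"_(\d+)$")


def replica_index(task_name: str) -> int:
    match = SUFFIX.search(task_name)
    if match is None:
        raise ValueError(f"no replica suffix in {task_name!r}")
    return int(match.group(1))


def tally_resends(tasks):
    """tasks: iterable of (task_name, [trigger, ...]) pairs.

    Returns (number of resent tasks, Counter of triggers). A task with two
    triggers (e.g. an error then an abort) counts once as a task and once
    under each trigger."""
    resent = 0
    triggers = Counter()
    for name, causes in tasks:
        if replica_index(name) >= 2:
            resent += 1
            triggers.update(causes)
    return resent, triggers


sample = [  # made-up names and triggers
    ("wu_a_2", ["inconclusive"]),
    ("wu_b_2", ["error"]),
    ("wu_c_3", ["error", "abort"]),
    ("wu_d_1", []),  # initial replica, not a resend
]
resent, triggers = tally_resends(sample)
print(resent, dict(triggers))  # 3 {'inconclusive': 1, 'error': 2, 'abort': 1}
```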
Grant (SSSF), Joined: 19 Aug 99, Posts: 13841, Credit: 208,696,464, RAC: 304
Not a clue. They've cropped up from time to time over the years. I don't think there has ever been a dedicated thread about the shortcomings of BOINC's handling of problem systems. The topic has just come up each time in an existing thread (such as one about a misbehaving system) and gone on from there.
Grant (SSSF), Joined: 19 Aug 99, Posts: 13841, Credit: 208,696,464, RAC: 304
Quote: "How much does RAM cost for those servers? More than for my home PCs, I'll bet." Not to mention the server itself and the CPUs. Quote: "Also, how much RAM can be added to 'Synergy'?" The present database server is maxed out. That's the problem; otherwise a quick whip-around would've had more RAM in there in no time at all. Edit: the server I suggested is overkill; it'd meet our requirements for the next 50 years, and then some. Just a basic single-socket EPYC Rome server would kill the existing server performance-wise, probably use less power, and support heaps more RAM.
Richard Haselgrove, Joined: 4 Jul 99, Posts: 14676, Credit: 200,643,578, RAC: 874
Quote: "no analyzing will ever happen." Not for Eric: Author: korpela
Richard Haselgrove, Joined: 4 Jul 99, Posts: 14676, Credit: 200,643,578, RAC: 874
It's about Nebula.
Mr. Kevvy, Joined: 15 May 99, Posts: 3799, Credit: 1,114,826,392, RAC: 3,319
My half-baked hypothesis is that we'll have a very extended (days or more) out
Richard Haselgrove, Joined: 4 Jul 99, Posts: 14676, Credit: 200,643,578, RAC: 874
If they do that, I hope they give us enough notice to switch to backup projects, and flush through all cached tasks before the outage starts - so we don't start filling the new database with crud as soon as it comes back online. And I hope two more things as well: 1) I hope setizens do something similar, rather than trying to grab oodles of work before the outage starts. 2) and I hope our staff warn all the other BOINC project administrators what's about to hit them!
rob smith, Joined: 7 Mar 03, Posts: 22491, Credit: 416,307,556, RAC: 380
Quote: "2) and I hope our staff warn all the other BOINC project administrators what's about to hit them!" And not just sit back and laugh at the chaos/panic caused by having several thousand very hungry hosts arrive on their doorsteps demanding instant feeding....
Stephen "Heretic", Joined: 20 Sep 12, Posts: 5557, Credit: 192,787,363, RAC: 628
@ W-K 666 . . That is results returned on a single-host basis, not results validated. Most of the slow hosts are paired with fast hosts, and so they hold back the fast hosts' early returns in terms of validated tasks. Those early returns sit in limbo until the slowies get their work back. Stephen :(
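The "limbo" described above follows from how a quorum validates: it can only validate once the slowest required result is in. The function below is a minimal sketch; its name is invented. It assumes a quorum of two agreeing, valid results and ignores errors, resends and deadlines. The return times are hypothetical.

```python
def pending_hours(return_times):
    """Hours each result waits between being returned and the workunit
    validating, assuming validation happens as soon as the last result of
    the quorum arrives and all results agree."""
    validated_at = max(return_times)
    return [validated_at - t for t in return_times]


# Hypothetical workunit: a fast host returns after 5 hours,
# its slow wingman after 228 hours (9.5 days).
print(pending_hours([5, 228]))  # [223, 0]
```

In this example the fast host's result waits 223 hours, essentially the slow host's whole turnaround. A host's returned count can therefore be high while its validated count lags well behind.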
Stephen "Heretic", Joined: 20 Sep 12, Posts: 5557, Credit: 192,787,363, RAC: 628
@ rob smith . . Well Darrell, . . If you truly believe what you are suggesting, then lead by example and reduce the work-fetch (cache) size on all your machines. So instead of holding 600 WUs like most good producers, show us how well things run when your machines have only 20 or 60 WUs cached. That is the easiest change of all, and you can make it yourself. It takes 5 seconds per machine, and you can demonstrate how effectively machines operate with minuscule caches. Give it a go ... Stephen . .
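The cache-size trade-off can be quantified with Little's law, W = L / λ. In steady state, the average time a task spends on a host (waiting plus running) equals the number of tasks held divided by the rate at which the host completes them. The sketch assumes a steady state in which the cache is topped up as tasks complete. The function name is invented, and the 200 tasks/day throughput is a hypothetical figure.

```python
def turnaround_days(cached_tasks: float, tasks_per_day: float) -> float:
    """Little's law: average time in system W = L / lambda."""
    if tasks_per_day <= 0:
        raise ValueError("throughput must be positive")
    return cached_tasks / tasks_per_day


# Hypothetical host completing 200 tasks per day.
for cache in (600, 60, 20):
    print(f"{cache:>3} tasks cached -> "
          f"{turnaround_days(cache, 200):.1f} days average turnaround")
```

At 200 tasks/day, a 600-task cache means each task spends about 3 days on the host on average, versus about 0.1 days with 20 cached. The host's throughput is the same either way. What shrinks is the time each result is outstanding, and with it the number of tasks out in the field at any moment.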
ML1, Joined: 25 Nov 01, Posts: 21092, Credit: 7,508,002, RAC: 20
Just a thought toward identifying and solving the source of the bottleneck, rather than merely massaging the symptoms: the entire BOINC system uses a single central database as the "state machine" that keeps all activity, everything, coherent. Are we bottlenecked on the maximum rate of transactions that the database can process? Is the database itself bottlenecked by the maximum IOPS of the underlying filesystem(s)? And is there a critical bottleneck in the transaction rate that can be fed into a (single?) log file for the database? Just a few thoughts... Keep searchin', Martin See new freedom: Mageia Linux Take a look for yourself: Linux Format The Future is what We all make IT (GPLv3)
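Martin's questions can be framed as two ceilings on transaction throughput. If every commit must wait for its own flush of a single, serial log, the commit rate cannot exceed 1 / (fsync latency). Batching several commits into one flush, the general technique known as group commit, raises that ceiling proportionally. Separately, if each transaction needs k random I/Os, the rate cannot exceed IOPS / k. The sketch below computes both and measures fsync latency on whatever disk it runs on. The function names are invented, and the example figures are hypothetical, not measurements of the s@h servers.

```python
import os
import tempfile
import time


def measure_fsync_latency(n: int = 50, record: bytes = b"x" * 512) -> float:
    """Average seconds per write+fsync of a small record on the local disk,
    a rough stand-in for flushing one entry of a transaction log."""
    with tempfile.NamedTemporaryFile() as f:
        start = time.perf_counter()
        for _ in range(n):
            f.write(record)
            f.flush()              # push Python's buffer to the OS
            os.fsync(f.fileno())   # force the OS to write it to storage
        return (time.perf_counter() - start) / n


def log_bound_tps(fsync_latency_s: float, commits_per_flush: int = 1) -> float:
    """Ceiling on commits/s when a single serial log must be flushed."""
    return commits_per_flush / fsync_latency_s


def iops_bound_tps(iops: float, ios_per_tx: float) -> float:
    """Ceiling on transactions/s when each needs ios_per_tx random I/Os."""
    return iops / ios_per_tx


# Hypothetical figures: 2 ms per log flush, 8 commits per flush with group
# commit, a 5,000 IOPS array and 4 random I/Os per transaction.
ceiling = min(log_bound_tps(0.002), iops_bound_tps(5_000, 4))
grouped = min(log_bound_tps(0.002, 8), iops_bound_tps(5_000, 4))
print(f"no batching: {ceiling:.0f} tx/s; with group commit: {grouped:.0f} tx/s")
print(f"this machine's fsync latency: {measure_fsync_latency() * 1e3:.2f} ms")
```

Whichever ceiling is lower is the one that binds. In this made-up example the log flush limits throughput until commits are batched; after that, the IOPS budget does.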
rob smith, Joined: 7 Mar 03, Posts: 22491, Credit: 416,307,556, RAC: 380
Well, yes and no. Each project (or group of projects) uses its own set of servers; the overall scores shown on the BOINC website are fed there by queries run periodically on each project server. BOINC central does nothing to control or even influence the operation of the individual projects (sometimes I wish it did). Each host runs a BOINC client application, which provides local-to-the-host scheduling and manages communication between the host and each of the project servers. Communication with the project servers is on a "push to server" basis: the host requests something and the project server responds, be it with work, news, etc.
ML1, Joined: 25 Nov 01, Posts: 21092, Credit: 7,508,002, RAC: 20
Thanks for the good answer to what I wrote... However, what I meant to describe was the database at the centre of a project such as s@h here. So substitute "individual BOINC project" for my misworded "Boinc" ("central")! Hence: is s@h transaction-limited by the I/O speed of a single thread writing the database transaction log file?
Boiler Paul, Joined: 4 May 00, Posts: 232, Credit: 4,965,771, RAC: 64
Since SETI@home is shutting down at the end of March, the points raised in this thread are moot. https://setiathome.berkeley.edu/forum_thread.php?id=85267#2035179
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.