Message boards :
Number crunching :
The Server Issues / Outages Thread - Panic Mode On! (118)
Unixchick · Joined: 5 Mar 12 · Posts: 815 · Credit: 2,361,516 · RAC: 22

And for extremely slow, rarely-on systems, 1 month is plenty of time for them to return a WU. It's actually plenty of time for them to return many WUs. +1
Grant (SSSF) · Joined: 19 Aug 99 · Posts: 14016 · Credit: 208,696,464 · RAC: 304

Theoretically they could replace all of these systems with just a couple of AMD Epyc-based servers. A single modern dual-socket Epyc server can have more cores than all those listed servers combined! There are even several single-core chips in there; those must be really ancient.

The present bottleneck is storage I/O. HDDs are not good for non-sequential work. While the recent plans to move to a new storage unit with larger-capacity (and so much faster) HDDs may help, the server accessing the data can no longer keep it all cached in memory. Hence the present issues.

If the project were to put out a wish list for a new SSD-based storage unit, and a new database server with significantly more RAM and faster CPUs (and ideally the replica as well), I'm sure the community would come to the party. That would well and truly fix the present database backlog issues.

Of course, once you remove one bottleneck, others will show up. As it is, the download servers have repeated issues meeting current demand, the upload server is continually having issues (any news on its replacement?), and the scheduler is always having random issues meeting demand. Replacing the existing database server with something much better would allow the replaced hardware to be used for those functions that are already showing signs of not coping. As old and limited as it is, the present database server is much more powerful and has much more RAM than most of the other servers.

Grant
Darwin NT
rob smith · Joined: 7 Mar 03 · Posts: 22958 · Credit: 416,307,556 · RAC: 380

Yes, SETI@Home only gets a small fraction of the total Breakthrough Listen project's data: it only gets the bit that it is interested in and is capable of processing. A lot of the data collected is from the wrong frequency ranges, or the telescope doesn't have the equipment required to deliver the data to S@H. Just think how long it has taken to (not) gain access to the data from Parkes; technical issues have obstructed that feed. Then there is LOFAR, which produces data at a set of frequencies unusable by S@H but is a contributor to the BTLP, or the UK's Lovell telescope, or MERLIN (and many other telescopes that don't operate in the manner required by S@H, but contribute data to the BTLP).

Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
Tom M · Joined: 28 Nov 02 · Posts: 5126 · Credit: 276,046,078 · RAC: 462

....they really just need better hardware...

We might be able to raise the money, but how can we help with the time? I know, let's put together a Shadow Data Center that keeps a full copy of the Primary, but do it with more modern hardware. Then, as the other computers die, the Shadow takes over. After all, "The Shadow knows...."

Actually, if we could come up with a way to remotely replicate the Primary without impacting its performance, that might not be a half-bad idea. Once we are "up" they could take down the old system and move the Shadow hardware into place. Let's see now.... if we get X number of dual-socket Epycs with "maximum" memory for all 8 of those channels.... ;)

Tom
A proud member of the OFA (Old Farts Association).
AllgoodGuy · Joined: 29 May 01 · Posts: 293 · Credit: 16,348,499 · RAC: 266

It would be grand if the project could meter work allocation out to the host computers based on their ability to return processed work. Would you honestly tell me that we don't have a single mind, either in the community at large or among the big brains in the postgraduate computer science classes at Berkeley, who could use the available information about each host to create an algorithm, in a matter of hours, that limits every host to a commensurate and sane number? These are the tidbits of information we have:

Created: 24 Nov 2019, 19:42:47 UTC
Average credit: 46,727.03
Average turnaround time: 0.37 days
Last time contacted server: 22 Feb 2020, 0:38:19 UTC
Fraction of time BOINC is running: 99.66%
While BOINC is running, fraction of time computing is allowed: 100.00%

We already process the necessary information to justify daily download numbers. Based on each individual computer's participation in the project, given a fixed limit of four weeks per work unit, we could come up with just a simple number each computer's statistics could hold: tasks per unit of time.
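The metering idea above can be sketched in a few lines. This is only an illustration, not anything the project actually runs: the field names mirror the host details quoted in the post, the four-week deadline is the fixed limit mentioned, and `base_rate` and `hard_cap` are made-up tuning knobs.

```python
from dataclasses import dataclass

DEADLINE_DAYS = 28.0  # the fixed four-week deadline mentioned above


@dataclass
class HostStats:
    # Hypothetical fields mirroring the host details quoted in the post.
    avg_turnaround_days: float     # "Average turnaround time"
    frac_boinc_running: float      # "Fraction of time BOINC is running"
    frac_computing_allowed: float  # "...fraction of time computing is allowed"


def daily_task_limit(host: HostStats, base_rate: float = 1.0,
                     hard_cap: int = 1000) -> int:
    """Cap daily work by how fast this host actually returns results.

    A host turning work around in 0.37 days at ~100% availability gets a
    large allocation; a host that takes weeks and is rarely on gets a
    trickle, so it can't hoard tasks it will never return on time.
    """
    availability = host.frac_boinc_running * host.frac_computing_allowed
    if host.avg_turnaround_days <= 0 or availability <= 0:
        return 1  # new or unknown host: hand out one task and measure
    # Tasks per day this host demonstrably completes, scaled by availability.
    throughput = availability / host.avg_turnaround_days
    # Never hand out more than it could plausibly return before the deadline.
    limit = min(throughput * DEADLINE_DAYS * base_rate, hard_cap)
    return max(1, int(limit))


fast = HostStats(0.37, 0.9966, 1.0)   # the host quoted above
slow = HostStats(20.0, 0.25, 0.5)     # a rarely-on straggler
print(daily_task_limit(fast))         # large allocation
print(daily_task_limit(slow))         # minimal allocation
```

The point of the sketch is that every input it needs is already in the host records the scheduler keeps, which is exactly the post's argument.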
juan BFP · Joined: 16 Mar 07 · Posts: 9786 · Credit: 572,710,851 · RAC: 3,799

Still no new work? Where is the <panic> button?
Grant (SSSF) · Joined: 19 Aug 99 · Posts: 14016 · Credit: 208,696,464 · RAC: 304

Still no new work? Where is the <panic> button?

Every now and then some turn up; you just have to be extremely lucky with the timing of your request (even luckier than after the weekly outages).

Grant
Darwin NT
Wiggo · Joined: 24 Jan 00 · Posts: 38678 · Credit: 261,360,520 · RAC: 489

I don't know if there's a proper answer or solution to anything here, but remember that our deadlines were (and still are) based on what was considered fair in the pre-BOINC days. What I was using back then took almost 10 days (222 hrs, in fact) to complete a CPU task (a pre-MMX job), while these days any processor that doesn't support at least SSE instructions will not work here (not that you'd want to run one of them now anyway). Plus, the use of GPUs/iGPUs wasn't even thought of back then.

On a side note, my 2 little rigs (just in huge cases) have been able to stay at or close to their cache limits for the last 9.5 hrs (they were almost out of GPU work when I got up an hour before that), but maybe my 2 rigs are sitting in some sort of "sweet spot" (who actually knows anything these days?).

Cheers.
W-K 666 · Joined: 18 May 99 · Posts: 19988 · Credit: 40,757,560 · RAC: 67

Actually, I think the deadlines were set after the upgrade that produced the large differences in crunch times at different Angle Ranges, based on the work done by Joe Segur.
2007 - postid692684 - Estimates and Deadlines revisited
Unixchick · Joined: 5 Mar 12 · Posts: 815 · Credit: 2,361,516 · RAC: 22

Good news: Workunits waiting for assimilation is less than 4 million, and Results returned and awaiting validation is less than 14 million. Progress.
Wiggo · Joined: 24 Jan 00 · Posts: 38678 · Credit: 261,360,520 · RAC: 489

Actually I think the deadlines were set, after the upgrade that produced the large differences in crunch times at different Angle Ranges, based on the work done by Joe Segur.

And back in 2007 you could still use a pre-SSE instruction CPU. ;-)

Cheers
Ghia · Joined: 7 Feb 17 · Posts: 238 · Credit: 28,911,438 · RAC: 50

I've been running Boinc/Seti 24/7 for 3 years and haven't seen a CUDA task since my system stabilized with the SoG app. Imagine my surprise when I discovered 17 of them this morning. What's up with that? I've made no changes to that system in ages....

Humans may rule the world...but bacteria run it...
rob smith · Joined: 7 Mar 03 · Posts: 22958 · Credit: 416,307,556 · RAC: 380

Every now and then the servers will send out a few tasks destined for other apps to confirm that you are still using the best performing one.

Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
rob smith · Joined: 7 Mar 03 · Posts: 22958 · Credit: 416,307,556 · RAC: 380

We might be able to raise the money but how can we help with the Time?

A fairly simple solution. Recently we've been looking at the capital money (e.g. buying new disks), but there also needs to be a fair bit of money for wages, rent, power etc.: the revenue spend. My figures might be a bit off, but I would guess it would cost about $100k to cover salary and all other costs for a single person for a year. While not as glamorous as a lump of hardware, having another pair of hands would ease the burden of daily server tasks and allow the likes of Eric to develop the software, prepare grant applications, etc.

Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
Ghia · Joined: 7 Feb 17 · Posts: 238 · Credit: 28,911,438 · RAC: 50

Every now and then the servers will send out a few tasks destined for other apps to confirm that you are still using the best performing one.

Tnx...makes sense...just finding it funny that it took 3 years... :)

Humans may rule the world...but bacteria run it...
Ville Saari · Joined: 30 Nov 00 · Posts: 1158 · Credit: 49,177,052 · RAC: 82,530

Tnx...makes sense...just finding it funny that it took 3 years... :)

It just took 3 years for you to notice. I guess you didn't watch your queue all the time during those 3 years to make sure no non-SoG task slipped through.
Kissagogo27 · Joined: 6 Nov 99 · Posts: 717 · Credit: 8,032,827 · RAC: 62

I found one stuck since the end of January: https://setiathome.berkeley.edu/show_host_detail.php?hostid=8884799 and I'm still waiting on it: https://setiathome.berkeley.edu/workunit.php?wuid=3861245798 xD
rob smith · Joined: 7 Mar 03 · Posts: 22958 · Credit: 416,307,556 · RAC: 380

It hasn't reached its deadline, so why panic? Actually, that is one of mine, and as I'm not going to be able to get back to it for a few weeks to sort out its power supply, you'll just have to be patient.

Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
Kissagogo27 · Joined: 6 Nov 99 · Posts: 717 · Credit: 8,032,827 · RAC: 62

I don't panic at all ;)
juan BFP · Joined: 16 Mar 07 · Posts: 9786 · Credit: 572,710,851 · RAC: 3,799

The splitters are down..... again..... This is going to be a long weekend of ups & downs. No <panic> yet; the WU cache is holding. Need to go get some more six-packs for the festivities.

<edit> Just as I wrote this message they came up again. Could be a hell of a coincidence, but apparently each time the AP splitters start, the others go down for some time.
©2026 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.