The Server Issues / Outages Thread - Panic Mode On! (119)
Author | Message |
---|---|
rob smith Send message Joined: 7 Mar 03 Posts: 22225 Credit: 416,307,556 RAC: 380 |
There are ways..... But less impressive is the number of errors due to late returns - over 300 and counting :-( Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? |
Keith Myers Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
The One Wingman didn't agree with the One Overflow marked Valid Most of my six pages of Quorum=1 tasks are just regular, normal tasks and not overflows. Also not many are AMD either. A lot are just regular CPU tasks. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Stephen "Heretic" Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
. . Wouldn't that introduce a large number of rubbish results into the science database? This way may be slow and painful but it seems to me it is necessary ... Stephen ? ? |
Stephen "Heretic" Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
Looking at my older pendings, I found this interesting computer: . . It is hosts such as this one that justify the suggested change of forcing all existing outstanding WUs to a deadline of one week at this stage, which would force resends and maybe get results before the curtain comes down. His return rates are 17 days for CPU tasks and 55 days for SoG tasks on a 1080 Ti. That is very, very wrong. 15,000 cached tasks? Crazy!! Stephen :( |
Stephen "Heretic" Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
The One Wingman didn't agree with the One Overflow marked Valid . . But how hard would it be to sort the 'normal' CPU or SoG results from those of less confident hosts? Because it would be unwise to just accept single host results from such machines. Not just AMD 57nn cards but any hosts with a suspect result quality. Stephen ? ? |
Speedy Send message Joined: 26 Jun 04 Posts: 1643 Credit: 12,921,799 RAC: 89 |
Lots of new files added to the splitters now. Question is, will we get these all processed within the next 13 to 14 days? In my eyes there has been progress made because "results waiting DB purge" has risen to over a million. I do not know the last time I saw this. |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
. . Wouldn't that introduce a large number of rubbish results into the science database? This way may be slow and painful but it seems to me it is necessary ... The Results from the "Bad" AMD 5700s will Never be used to Search for ET as the Results from those machines are ALL at Chirp = 0, and will be Removed as RFI even if they do make it to the Database. The other Machines sent the minimum quorum = 1 WUs are reliable machines and do Not produce "rubbish", that's Why they were sent those WUs. At this point it would be Much better to FIX the Database than try to save a few WUs that will be trashed as RFI. Trying to save a few WUs is preventing Thousands from being completed. Why try to save a few if it means Missing Thousands? I think any changes to the quorums should be reverted back to where they were before the problem started in early Dec. I'd much rather have a few WUs listed as RFI than have the fastest machines go DAYS producing Nothing instead of Thousands of completed tasks a day. You do realize the Problem with the AMD 5700s was going on for Months before the changes were made to the quorum count, right? It's not as if they haven't already loaded the database with loads of RFI WUs. |
AllgoodGuy Send message Joined: 29 May 01 Posts: 293 Credit: 16,348,499 RAC: 266 |
Looking at my older pendings, I found this interesting computer: How does this happen? 15,000 tasks for a 1K credit person? |
Keith Myers Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
The client operation can be broken in a multitude of ways with a host that does not fit within the norms. So the default braking mechanism fails spectacularly in some cases. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
AllgoodGuy Send message Joined: 29 May 01 Posts: 293 Credit: 16,348,499 RAC: 266 |
The client operation can be broken in a multitude of ways with a host that does not fit within the norms. So the default braking mechanism fails spectacularly in some cases. Obviously :). Spectacularly is a very fitting and proper word here. LOL. |
Unixchick Send message Joined: 5 Mar 12 Posts: 815 Credit: 2,361,516 RAC: 22 |
There is a healthy sized ready to send queue and the system is running well. The replica is also keeping up. I'm puzzled as certain numbers seem high, so I don't know what magic was worked. I'm glad it is running well now though. |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13746 Credit: 208,696,464 RAC: 304 |
No outage again this Tuesday? It would appear not. So far things are continuing to struggle along. Whenever we have an outage it takes over 12 hours once the system comes back up for things to settle down anyway. And until no new work is issued (or the variable Quorum's changed back to a fixed 2), none of the backlogs or bloat are going to clear. So we might as well just stagger along towards the finish line, with only 2 weeks to go now. Grant Darwin NT |
Unixchick Send message Joined: 5 Mar 12 Posts: 815 Credit: 2,361,516 RAC: 22 |
How are files added to be split?? I think the Arecibo files are automatically added. Not sure how the blc files arrive. Can the seti crew add files to be split remotely?? How is this affecting the data collection sites? Scientists who have time on telescopes may not be able to travel to get there. |
Ville Saari Send message Joined: 30 Nov 00 Posts: 1158 Credit: 49,177,052 RAC: 82,530 |
Can the seti crew add files to be split remotely?? Server computers are generally boxes stacked in server racks with no keyboards or monitors, so they are always accessed remotely. |
juan BFP Send message Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799 |
According to the SSP, all unstarted tapes have been removed. The beginning of the end? |
Ian&Steve C. Send message Joined: 28 Sep 99 Posts: 4267 Credit: 1,282,604,591 RAC: 6,640 |
even if we run out of BLC work, it looks like the Arecibo automation is still running. 2 more tapes have been added. it also looks like they removed any throttling. they put the brick on the gas pedal and let it go! hahaha. Seti@Home classic workunits: 29,492 CPU time: 134,419 hours |
Freewill Send message Joined: 19 May 99 Posts: 766 Credit: 354,398,348 RAC: 11,693 |
|
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14654 Credit: 200,643,578 RAC: 874 |
I'll accept 16mr20af as automation, but I think 16se11ab must have been manually chosen. |
Speedy Send message Joined: 26 Jun 04 Posts: 1643 Credit: 12,921,799 RAC: 89 |
even if we run out of BLC work, it looks like the Arecibo automation is still running. 2 more tapes have been added. Also, I just noticed after the latest update of the SSP that the deleters seem to be doing their job as well, because results waiting to be purged is back under a million. |
juan BFP Send message Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799 |
we may indeed go out with a bang! Where is the kaboom? It's supposed to be a kaboom! New WUs are flowing without restrictions, the size of the DB WU count is >20MM and there are no problems at all. Did they finally find the fix for the DB size problem? Just now when the curtains are ready to close? |
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.