The Server Issues / Outages Thread - Panic Mode On! (119)
| Author | Message |
|---|---|
W-K 666 Joined: 18 May 99 Posts: 13797 Credit: 40,757,560 RAC: 151
|
From what I remember, the people with the AMD 5700s were cranking one out every 20 seconds or so. Those machines would be the ones to investigate. It was really quite alarming to see so many clearly False Valids being generated. I had a look to see if they were from any of the known problems or 'noise bombs', and that doesn't seem to be the case. Some of mine are BLCs, the others are from Arecibo, and all look like they ran full distance. |
|
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 6,279
|
From what I remember, the people with the AMD 5700s were cranking one out every 20 seconds or so. Those machines would be the ones to investigate. It was really quite alarming to see so many clearly False Valids being generated. |
|
Grant (SSSF) Joined: 19 Aug 99 Posts: 12990 Credit: 208,696,464 RAC: 690
|
Forums have almost ground to a halt. So the Scheduler should go MIA again any minute now... Edit: yep, fail, fail, fail. And even the web site is barely responding. 7/03/2020 15:35:06 | SETI@home | Scheduler request failed: Couldn't connect to server 7/03/2020 15:36:58 | SETI@home | Scheduler request failed: Couldn't connect to server 7/03/2020 15:40:28 | SETI@home | Scheduler request failed: Couldn't connect to server Grant Darwin NT |
W-K 666 Joined: 18 May 99 Posts: 13797 Credit: 40,757,560 RAC: 151
|
An unknown variable. If you only have two with an RAC over 1.6 million, but I have ten with an RAC of 26,400, then it is probably too difficult to make an accurate guess. But I doubt it is anywhere near a million. [edit] Not all from the same tape, just all split on the 30th Jan |
|
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 6,279
|
Ah, many are now gone, leaving two that are listed as Validated with a "minimum quorum 1". I wonder how many of those are still lurking around? Is it Validated or what? https://setiathome.berkeley.edu/workunit.php?wuid=3861283408 granted credit 104.20 minimum quorum : 1 initial replication : 2 Task Computer Sent Time reported Status Runtime CPUtime Credit Application 8493614556 8097309 30 Jan 2020, 17:37:30 UTC 31 Jan 2020, 10:07:46 UTC Completed and validated 259.93 244.61 104.20 SETI@home v8 v8.11 (cuda42_mac) x86_64-apple-darwin 8493614557 8743335 30 Jan 2020, 17:37:22 UTC 23 Mar 2020, 9:03:18 UTC In progress --- --- --- SETI@home v8 v8.24 (opencl_ati5_SoG_nocal) windows_intelx86 Millions? |
W-K 666 Joined: 18 May 99 Posts: 13797 Credit: 40,757,560 RAC: 151
|
You do realize Eric ran that script many hours ago, right? I'll give you another 2.5 hours though. The script is probably still running, and will be until Eric gets up and takes a look at the progress. It is going to take some time to remove the 12 million tasks in the bloat. |
|
Grumpy Swede Joined: 1 Nov 08 Posts: 8170 Credit: 49,849,242 RAC: 147
|
I give up!!! Believe what you want, I don't care. Geeze.... |
|
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 6,279
|
You do realize Eric ran that script many hours ago, right? I'll give you another 2.5 hours though. Every WU older than 29 Feb... |
|
Grumpy Swede Joined: 1 Nov 08 Posts: 8170 Credit: 49,849,242 RAC: 147
|
Every single WU on every page? It goes on for pages, starting with Feb 29, https://setiathome.berkeley.edu/results.php?hostid=8097309&offset=3160&state=4 Exactly!! |
W-K 666 Joined: 18 May 99 Posts: 13797 Credit: 40,757,560 RAC: 151
|
Every single WU on every page? It goes on for pages, starting with Feb 29, https://setiathome.berkeley.edu/results.php?hostid=8097309&offset=3160&state=4 That is a demonstration of progress. The listing of valids comes from the replica, where the task is still visible. But the workunit page comes from the master, and the workunit has already been purged there. |
|
Grumpy Swede Joined: 1 Nov 08 Posts: 8170 Credit: 49,849,242 RAC: 147
|
Every single WU on every page? It goes on for pages, starting with Feb 29, https://setiathome.berkeley.edu/results.php?hostid=8097309&offset=3160&state=4 Yes, until the replica catches up with the moment in time when the tasks in question were deleted. Remember, Eric did run a script, and the replica was way behind even then. Just watch what happens in the coming hours with those pages. But of course, since this has become an endless discussion club with wild and sometimes uninformed theories, you do not need to believe me, since that would stop the endless discussions and speculations of what's going on :-) |
|
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 6,279
|
Every single WU on every page? It goes on for pages, starting with Feb 29, https://setiathome.berkeley.edu/results.php?hostid=8097309&offset=3160&state=4 |
|
Grumpy Swede Joined: 1 Nov 08 Posts: 8170 Credit: 49,849,242 RAC: 147
|
I tried it with a Host with many Valids and it's still spinning. I then tried it with a smaller number and reached a large number of WUs dated 18 Feb that all fail to open with the error, The replica is over 8000 seconds behind, and you're trying to open something that in reality is no longer there, it's in fact already deleted. I've seen the same behaviour over the years many times when the replica is behind, so nothing new there. |
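The replica-lag explanation above can be illustrated with a minimal sketch. It assumes a simplified model in which the replica shows the master's state as it was `replica_lag_seconds` ago; the function name and the example timestamps are hypothetical, and only the 8000-second lag figure comes from the thread.

```python
from datetime import datetime, timedelta


def still_listed_on_replica(deleted_on_master_at, now, replica_lag_seconds):
    """Return True while the replica has not yet replayed the deletion.

    Simplified model (an assumption): the replica reflects the master's
    state as of `now - replica_lag_seconds`.
    """
    replica_caught_up_to = now - timedelta(seconds=replica_lag_seconds)
    return replica_caught_up_to < deleted_on_master_at


now = datetime(2020, 3, 7, 12, 0, 0)       # hypothetical current time
deleted_at = now - timedelta(hours=1)      # hypothetical purge time on the master

# With the 8000 s lag quoted in the thread, a task purged an hour ago
# is still listed by the replica even though the master no longer has it.
print(still_listed_on_replica(deleted_at, now, 8000))
# With a small lag (600 s, hypothetical) the listing would already be gone.
print(still_listed_on_replica(deleted_at, now, 600))
```

Under this model a task list served from the replica can link to a workunit page that the master can no longer find, which would match the "can't find workunit" error described in the thread.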
kittyman Joined: 9 Jul 00 Posts: 50494 Credit: 1,018,363,574 RAC: 2,276
|
LOL, I suppose that would be true. Meow. "Learn from yesterday. Live for today. Hope for tomorrow." Albert Einstein "With cats." kittyman
|
|
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 6,279
|
I tried it with a Host with many Valids and it's still spinning. I then tried it with a smaller number and reached a large number of WUs dated 18 Feb that all fail to open with the error, Unable to handle request can't find workunit It's just a WAG, but, I would imagine it would be difficult to Assimilate something that can't be found.... I dunno. See if you can open this inside of a few minutes, https://setiathome.berkeley.edu/results.php?hostid=8097309&offset=3260&state=4 |
|
Ville Saari Joined: 30 Nov 00 Posts: 1119 Credit: 48,373,696 RAC: 74,889
|
No, they haven't, go to the very end of the listing for my MB Valid tasks and you will find, 10 tasks issued 30 Jan 2020 I can't browse the listing of valid tasks alone. Only the 'all tasks' list really works. Trying to choose anything else just leaves the browser loading the page forever without ever getting anything. Even when I try to click my invalid task list that has only two tasks in it. |
|
Ville Saari Joined: 30 Nov 00 Posts: 1119 Credit: 48,373,696 RAC: 74,889
|
This graph shows that 'waiting for validation' on SSP really means 'waiting for validation or assimilation': The purple curve is the validation queue size as shown on SSP. The green curve is the assimilation queue size from SSP multiplied by 2.2 to scale it from workunits to results. The blue curve is their difference, i.e. the true number of results waiting for validation. The blue curve looks very much like the validation queue before the assimilation problem started: stable around 5 million, with a sharp spike just after each weekly downtime when everyone reports the results they crunched during the downtime. We also see that the spike drops as fast as it climbed, so validation has worked fine, but simultaneously with this drop, the assimilation curve climbs and then stays there. So the validated results get stuck in the assimilation queue. The assimilation queue descends much more slowly, so slowly that the next downtime hits before it has reached the level it had before the previous downtime. So every downtime pushes it higher and higher. |
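The arithmetic behind the blue curve can be sketched as follows. This is only an illustration of the subtraction described in the post: the function name is hypothetical, the 2.2 results-per-workunit factor is the one quoted above, and the sample queue sizes are assumed values chosen to land near the "around 5 million" level mentioned in the post, not real SSP readings.

```python
def true_validation_backlog(ssp_waiting_for_validation,
                            ssp_waiting_for_assimilation_wus,
                            results_per_workunit=2.2):
    """Estimate the number of results genuinely waiting for validation.

    ssp_waiting_for_validation: results count shown on SSP (purple curve).
    ssp_waiting_for_assimilation_wus: workunit count shown on SSP.
    results_per_workunit: scale factor from workunits to results
    (2.2 is the figure used in the post).
    """
    assimilation_in_results = ssp_waiting_for_assimilation_wus * results_per_workunit  # green curve
    return ssp_waiting_for_validation - assimilation_in_results  # blue curve


# Assumed sample values, for illustration only.
backlog = true_validation_backlog(15_000_000, 4_500_000)
print(f"{backlog:,.0f} results really waiting for validation")
```

With these assumed inputs the estimate comes out at roughly 5.1 million results, i.e. most of the displayed validation queue would actually be assimilation backlog.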
W-K 666 Joined: 18 May 99 Posts: 13797 Credit: 40,757,560 RAC: 151
|
Assimilation logjam holds tasks in that 15 million result SSP slot for about three days only. So the quorum 1 tasks have been assimilated and deleted long ago. No, they haven't, go to the very end of the listing for my MB Valid tasks and you will find, 10 tasks issued 30 Jan 2020, 1:55:47 UTC thru 30 Jan 2020, 11:30:16 UTC all validated by one result, mine, wingman still not reported. Deadlines are the 22nd or 23rd March. |
|
Ville Saari Joined: 30 Nov 00 Posts: 1119 Credit: 48,373,696 RAC: 74,889
|
Assimilation queue size is about three days' worth of production, and my tasks also disappear from the web site about three days after they have been validated. It seems to work on a FIFO principle. |
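The reasoning here is that, in a first-in-first-out queue drained at a steady rate, the time an item waits is roughly the queue length divided by the throughput. A minimal sketch of that estimate follows; the function name and both sample figures are hypothetical and are chosen only so the result matches the "about three days" observation in the post.

```python
def estimated_fifo_wait_days(queue_size_results, results_processed_per_day):
    """Estimate how long a newly queued result waits in a FIFO queue.

    Assumes a steady drain rate and strict first-in-first-out order,
    which is the behaviour the post infers, not a confirmed server detail.
    """
    if results_processed_per_day <= 0:
        raise ValueError("throughput must be positive")
    return queue_size_results / results_processed_per_day


# Assumed figures, for illustration only.
wait = estimated_fifo_wait_days(9_900_000, 3_300_000)
print(f"A result entering the queue now would wait about {wait:.1f} days")
```

If tasks really do vanish from the web site about as long after validation as this estimate predicts, that is consistent with FIFO processing, though it does not prove it.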
|
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 6,279
|
Are you sure they are less than 3 days old? How could you test how old they are? I was just giving an example of a task that could cause problems, of course, it could be anything causing the problem. |