The Server Issues / Outages Thread - Panic Mode On! (119)

Message boards : Number crunching : The Server Issues / Outages Thread - Panic Mode On! (119)

Previous · 1 . . . 94 · 95 · 96 · 97 · 98 · 99 · 100 . . . 108 · Next

AuthorMessage
W-K 666 Project Donor
Volunteer tester

Joined: 18 May 99
Posts: 13797
Credit: 40,757,560
RAC: 151
United Kingdom
Message 2036453 - Posted: 7 Mar 2020, 6:11:29 UTC - in response to Message 2036452.  

From what I remember, the people with the AMD 5700s were cranking one out every 20 seconds or so. Those machines would be the ones to investigate. It was really quite alarming to see so many clearly False Valids being generated.

I had a look to see if they were from any of the known problems or 'noise bombs', and it doesn't seem to be the case. Some of mine are blc tasks, the others are from Arecibo, and all look like they ran the full distance.
ID: 2036453
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 6,279
United States
Message 2036452 - Posted: 7 Mar 2020, 6:07:02 UTC - in response to Message 2036449.  

From what I remember, the people with the AMD 5700s were cranking one out every 20 seconds or so. Those machines would be the ones to investigate. It was really quite alarming to see so many clearly False Valids being generated.
ID: 2036452
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 12990
Credit: 208,696,464
RAC: 690
Australia
Message 2036451 - Posted: 7 Mar 2020, 6:02:56 UTC
Last modified: 7 Mar 2020, 6:16:16 UTC

Forums have almost ground to a halt. So the Scheduler should go MIA again any minute now...


Edit: yep, fail, fail, fail. And even the web site is barely responding.
7/03/2020 15:35:06 | SETI@home | Scheduler request failed: Couldn't connect to server
7/03/2020 15:36:58 | SETI@home | Scheduler request failed: Couldn't connect to server
7/03/2020 15:40:28 | SETI@home | Scheduler request failed: Couldn't connect to server
Grant
Darwin NT
ID: 2036451
W-K 666 Project Donor
Volunteer tester

Joined: 18 May 99
Posts: 13797
Credit: 40,757,560
RAC: 151
United Kingdom
Message 2036449 - Posted: 7 Mar 2020, 5:57:18 UTC - in response to Message 2036446.  
Last modified: 7 Mar 2020, 6:01:13 UTC

An unknown variable.
If you only have two with a RAC over 1.6 million, but I have ten with a RAC of 26,400, then it's probably too difficult to make an accurate guess.

But I doubt it is anywhere near a million. They are probably all from the same tape that was split on 30 Jan.

[edit]
Not all from the same tape, just all split on 30 Jan.
ID: 2036449
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 6,279
United States
Message 2036446 - Posted: 7 Mar 2020, 5:33:26 UTC
Last modified: 7 Mar 2020, 5:46:26 UTC

Ah, many are now gone, leaving two that are listed as Validated with a "minimum quorum 1"
I wonder how many of those are still lurking around? Is it Validated or what? https://setiathome.berkeley.edu/workunit.php?wuid=3861283408
granted credit 	104.20
minimum quorum : 1
initial replication : 2

Task        Computer  Sent                       Time reported or deadline  Status                   Run time (s)  CPU time (s)  Credit  Application
8493614556  8097309   30 Jan 2020, 17:37:30 UTC  31 Jan 2020, 10:07:46 UTC  Completed and validated  259.93        244.61        104.20  SETI@home v8 v8.11 (cuda42_mac)x86_64-apple-darwin
8493614557  8743335   30 Jan 2020, 17:37:22 UTC  23 Mar 2020, 9:03:18 UTC   In progress              ---           ---           ---     SETI@home v8 v8.24 (opencl_ati5_SoG_nocal)windows_intelx86
Millions?
ID: 2036446
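The workunit above shows minimum quorum 1 with initial replication 2: one reported result was enough to mark it validated, while the second copy stays In progress until its deadline. A minimal sketch of that bookkeeping, assuming a simplified model that only counts reported results; `validation_state` is a hypothetical helper, not BOINC's validator code, which also checks the returned results themselves.

```python
def validation_state(min_quorum: int, initial_replication: int, reported: int) -> str:
    """Hypothetical, simplified model of quorum bookkeeping for one workunit.

    Only counts reported results; it does not compare result contents
    the way a real validator would.
    """
    if reported < min_quorum:
        return "waiting for validation"
    outstanding = initial_replication - reported  # copies sent out but not yet returned
    if outstanding > 0:
        return f"validated, {outstanding} of {initial_replication} copies still in progress"
    return "validated"


# The workunit in the post: quorum 1, replication 2, one result reported.
print(validation_state(min_quorum=1, initial_replication=2, reported=1))
# prints "validated, 1 of 2 copies still in progress"

# A normal quorum-2 workunit with the same single report would still be waiting.
print(validation_state(min_quorum=2, initial_replication=2, reported=1))
# prints "waiting for validation"
```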
W-K 666 Project Donor
Volunteer tester

Joined: 18 May 99
Posts: 13797
Credit: 40,757,560
RAC: 151
United Kingdom
Message 2036442 - Posted: 7 Mar 2020, 4:46:54 UTC - in response to Message 2036439.  
Last modified: 7 Mar 2020, 4:51:47 UTC

You do realize Eric ran that script many hours ago, right? I'll give you another 2.5 hours though.
Every WU older than 29 Feb...

The script is probably still running and will until Eric gets up and takes a look at the progress.

It is going to take some time to remove the 12 million tasks in the bloat.
ID: 2036442
Grumpy Swede
Volunteer tester

Joined: 1 Nov 08
Posts: 8170
Credit: 49,849,242
RAC: 147
Sweden
Message 2036441 - Posted: 7 Mar 2020, 4:44:57 UTC

I give up!!!
Believe what you want, I don't care.

Geeze....
ID: 2036441
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 6,279
United States
Message 2036439 - Posted: 7 Mar 2020, 4:43:10 UTC - in response to Message 2036434.  

You do realize Eric ran that script many hours ago, right? I'll give you another 2.5 hours though.
Every WU older than 29 Feb...
ID: 2036439
Grumpy Swede
Volunteer tester

Joined: 1 Nov 08
Posts: 8170
Credit: 49,849,242
RAC: 147
Sweden
Message 2036436 - Posted: 7 Mar 2020, 4:39:44 UTC - in response to Message 2036435.  

Every single WU on every page? It goes on for pages, starting with Feb 29, https://setiathome.berkeley.edu/results.php?hostid=8097309&offset=3160&state=4

That is a demonstration of progress.
The listing of valids comes from the Replica, where the task is still visible.
But the workunit page comes from the master and the workunit has been purged.

Exactly!!
ID: 2036436
W-K 666 Project Donor
Volunteer tester

Joined: 18 May 99
Posts: 13797
Credit: 40,757,560
RAC: 151
United Kingdom
Message 2036435 - Posted: 7 Mar 2020, 4:38:38 UTC - in response to Message 2036433.  

Every single WU on every page? It goes on for pages, starting with Feb 29, https://setiathome.berkeley.edu/results.php?hostid=8097309&offset=3160&state=4

That is a demonstration of progress.
The listing of valids comes from the Replica, where the task is still visible.
But the workunit page comes from the master and the workunit has been purged.
ID: 2036435
Grumpy Swede
Volunteer tester

Joined: 1 Nov 08
Posts: 8170
Credit: 49,849,242
RAC: 147
Sweden
Message 2036434 - Posted: 7 Mar 2020, 4:30:13 UTC - in response to Message 2036433.  
Last modified: 7 Mar 2020, 4:38:21 UTC

Every single WU on every page? It goes on for pages, starting with Feb 29, https://setiathome.berkeley.edu/results.php?hostid=8097309&offset=3160&state=4

Yes, until the replica catches up with the moment in time when the tasks in question were deleted.
Remember, Eric did run a script, and the replica was way behind even then.

Just watch what happens in the coming hours with those pages.

But of course, since this has become an endless discussion club with wild and sometimes uninformed theories,
you do not need to believe me, since that would stop the endless discussions and speculations of what's going on :-)
ID: 2036434
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 6,279
United States
Message 2036433 - Posted: 7 Mar 2020, 4:27:36 UTC - in response to Message 2036432.  

Every single WU on every page? It goes on for pages, starting with Feb 29, https://setiathome.berkeley.edu/results.php?hostid=8097309&offset=3160&state=4
ID: 2036433
Grumpy Swede
Volunteer tester

Joined: 1 Nov 08
Posts: 8170
Credit: 49,849,242
RAC: 147
Sweden
Message 2036432 - Posted: 7 Mar 2020, 4:18:14 UTC - in response to Message 2036430.  
Last modified: 7 Mar 2020, 4:22:38 UTC

I tried it with a Host with many Valids and it's still spinning. I then tried it with a smaller number and reached a large number of WUs dated 18 Feb that all fail to open with the error,
Unable to handle request
can't find workunit

It's just a WAG, but, I would imagine it would be difficult to Assimilate something that can't be found....I Dunno

The replica is over 8000 seconds behind, and you're trying to open something that in reality is no longer there, it's in fact already deleted.

I've seen the same behaviour over the years many times when the replica is behind, so nothing new there.
ID: 2036432
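As explained in the thread, the valid-task listings are served from the replica while the workunit page reads the master. A minimal sketch of why a purged task can still be listed, assuming the replica replays the master's changes in order with a constant lag; `replica_still_lists` and the deletion times are hypothetical, and only the 8000-second lag (about 2 h 13 min) comes from the post.

```python
from datetime import datetime, timedelta, timezone


def replica_still_lists(deleted_at: datetime, now: datetime, replica_lag: timedelta) -> bool:
    """Would a row deleted on the master still appear when read from the replica?

    Assumes the replica applies the master's changes in order and is exactly
    `replica_lag` behind; real lag changes from minute to minute.
    """
    replica_clock = now - replica_lag  # point in the master's history the replica has reached
    return replica_clock < deleted_at  # deletion not yet replayed on the replica


now = datetime(2020, 3, 7, 4, 18, 14, tzinfo=timezone.utc)  # time of the post
lag = timedelta(seconds=8000)                                # "over 8000 seconds behind"
print(lag)  # 2:13:20

# Hypothetical deletion times: one hour ago is still visible, three hours ago is not.
print(replica_still_lists(now - timedelta(hours=1), now, lag))  # True
print(replica_still_lists(now - timedelta(hours=3), now, lag))  # False
```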
kittyman Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester

Joined: 9 Jul 00
Posts: 50494
Credit: 1,018,363,574
RAC: 2,276
United States
Message 2036431 - Posted: 7 Mar 2020, 4:09:16 UTC - in response to Message 2036430.  


It's just a WAG, but, I would imagine it would be difficult to Assimilate something that can't be found....

LOL, I suppose that would be true.

Meow.
"Learn from yesterday. Live for today. Hope for tomorrow." Albert Einstein
"With cats." kittyman

ID: 2036431
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 6,279
United States
Message 2036430 - Posted: 7 Mar 2020, 4:05:29 UTC - in response to Message 2036424.  

I tried it with a Host with many Valids and it's still spinning. I then tried it with a smaller number and reached a large number of WUs dated 18 Feb that all fail to open with the error,
Unable to handle request
can't find workunit

It's just a WAG, but, I would imagine it would be difficult to Assimilate something that can't be found....I Dunno

See if you can open this inside of a few minutes, https://setiathome.berkeley.edu/results.php?hostid=8097309&offset=3260&state=4
ID: 2036430
Ville Saari

Joined: 30 Nov 00
Posts: 1119
Credit: 48,373,696
RAC: 74,889
Finland
Message 2036424 - Posted: 7 Mar 2020, 3:39:05 UTC - in response to Message 2036421.  

No, they haven't, go to the very end of the listing for my MB Valid tasks and you will find, 10 tasks issued 30 Jan 2020

I can't browse the listing of valid tasks alone. Only the 'all tasks' list really works. Trying to choose anything else just leaves the browser loading the page forever without ever getting anything. Even when I try to click my invalid task list that has only two tasks in it.
ID: 2036424
Ville Saari

Joined: 30 Nov 00
Posts: 1119
Credit: 48,373,696
RAC: 74,889
Finland
Message 2036423 - Posted: 7 Mar 2020, 3:32:55 UTC

This graph shows that 'waiting for validation' on SSP really means 'waiting for validation or assimilation':



The purple curve is the validation queue size as shown on SSP.
The green curve is the assimilation queue size from SSP, multiplied by 2.2 to scale it from workunits to results.
The blue curve is their difference, i.e. the true number of results waiting for validation.
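The curve arithmetic above, written out as a minimal sketch. It assumes, as the post argues, that the SSP 'waiting for validation' figure counts results in both states, and that 2.2 results per workunit is a fair average; the function name and the input numbers are illustrative, not readings from the graph.

```python
RESULTS_PER_WORKUNIT = 2.2  # scale factor from the post: average results per workunit (approximate)


def true_validation_queue(ssp_waiting_for_validation: float,
                          ssp_assimilation_queue: float,
                          results_per_wu: float = RESULTS_PER_WORKUNIT) -> float:
    """Results genuinely waiting for validation.

    Assumes the SSP 'waiting for validation' figure (in results) also counts
    results whose workunits are queued for assimilation (in workunits).
    """
    assimilation_in_results = results_per_wu * ssp_assimilation_queue  # green curve
    return ssp_waiting_for_validation - assimilation_in_results        # blue = purple - green


# Illustrative inputs only: 15 million results shown on SSP, 4.5 million workunits awaiting assimilation.
print(true_validation_queue(15_000_000, 4_500_000))
```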

The blue curve looks very much like the validation queue before the assimilation problem started. Stable around 5 million with a sharp spike just after each weekly downtime when everyone reports their results crunched during the downtime.

We also see that the spike drops as fast as it climbed, so validation has worked fine, but simultaneously with this drop the assimilation curve climbs and then stays there. So the validated results get stuck in the assimilation queue. The assimilation queue drains much more slowly, so slowly that the next downtime hits before it has returned to the level it had before the previous downtime. So every downtime pushes it higher and higher.
ID: 2036423
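The ratchet described in the post can be reproduced with a toy model: every weekly downtime adds a burst of results, and between downtimes the queue only shrinks by its spare capacity (drain rate minus new arrivals). All parameters here are hypothetical, in arbitrary units such as millions of results, and `level_before_each_downtime` is an illustrative name; the point is only that when a week's spare capacity is smaller than the burst, the level before each downtime climbs week after week.

```python
def level_before_each_downtime(weeks: int, burst: float, inflow_per_day: float,
                               drain_per_day: float, start: float = 0.0) -> list[float]:
    """Toy model of a queue hit by a burst at every weekly downtime.

    Assumptions (all hypothetical): the post-downtime reporting burst lands in
    the queue at once; afterwards new items arrive at `inflow_per_day` and the
    queue is drained by at most `drain_per_day`.
    """
    queue = start
    levels = []
    for _week in range(weeks):
        queue += burst                               # results reported after the outage
        for _day in range(7):
            queue += inflow_per_day                  # normal daily arrivals
            queue = max(0.0, queue - drain_per_day)  # drain limited by capacity
        levels.append(queue)                         # level just before the next downtime
    return levels


# Spare capacity of 0.25/day gives only 1.75 per week against a burst of 3, so the level climbs.
print(level_before_each_downtime(3, burst=3.0, inflow_per_day=1.0, drain_per_day=1.25))
# prints [1.25, 2.5, 3.75]

# With enough spare capacity the queue empties within each week.
print(level_before_each_downtime(3, burst=3.0, inflow_per_day=1.0, drain_per_day=2.0))
# prints [0.0, 0.0, 0.0]
```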
W-K 666 Project Donor
Volunteer tester

Joined: 18 May 99
Posts: 13797
Credit: 40,757,560
RAC: 151
United Kingdom
Message 2036421 - Posted: 7 Mar 2020, 3:26:10 UTC - in response to Message 2036411.  
Last modified: 7 Mar 2020, 3:30:46 UTC

Assimilation logjam holds tasks in that 15 million result SSP slot for about three days only. So the quorum 1 tasks have been assimilated and deleted long ago.

No, they haven't, go to the very end of the listing for my MB Valid tasks and you will find, 10 tasks issued 30 Jan 2020, 1:55:47 UTC thru 30 Jan 2020, 11:30:16 UTC all validated by one result, mine, wingman still not reported.

Deadlines are the 22nd or 23rd March.
ID: 2036421
Ville Saari
Avatar

Send message
Joined: 30 Nov 00
Posts: 1119
Credit: 48,373,696
RAC: 74,889
Finland
Message 2036416 - Posted: 7 Mar 2020, 3:11:37 UTC
Last modified: 7 Mar 2020, 3:12:37 UTC

Assimilation queue size is about three days' worth of production, and my tasks also disappear from the web site about three days after they have been validated. It seems to work on a FIFO principle.
ID: 2036416
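A queue about three days' worth of production deep, with tasks vanishing about three days after validation, is what Little's law predicts for a steady queue drained at roughly the production rate; FIFO ordering is what makes each individual task wait about that long rather than only on average. A minimal sketch, assuming a fixed daily drain rate; both function names are hypothetical, and the assimilator's real ordering is not confirmed in the thread.

```python
from collections import deque


def little_wait_days(queue_size: float, throughput_per_day: float) -> float:
    """Little's law for a steady queue: average wait = items waiting / items served per day."""
    return queue_size / throughput_per_day


def day_served_fifo(items_ahead: int, served_per_day: int) -> int:
    """Simulate a FIFO queue drained at a fixed daily rate.

    Returns the day (1-based) on which a newly added item leaves the queue.
    """
    if served_per_day <= 0:
        raise ValueError("served_per_day must be positive")
    queue = deque(range(items_ahead))  # items already waiting, oldest first
    queue.append("new")                # the item we are tracking joins at the back
    day = 0
    while True:
        day += 1
        for _ in range(min(served_per_day, len(queue))):
            if queue.popleft() == "new":  # oldest item is always served first
                return day


# A queue holding three days of production, drained at the production rate.
print(little_wait_days(3 * 1_000_000, 1_000_000))  # 3.0
# Six items ahead, two served per day: the new item leaves on day 4, after three full days.
print(day_served_fifo(6, 2))  # 4
```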
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 6,279
United States
Message 2036413 - Posted: 7 Mar 2020, 2:50:52 UTC - in response to Message 2036411.  

Are you sure they are less than 3 days old? How could you test how old they are? I was just giving an example of a task that could cause problems, of course, it could be anything causing the problem.
ID: 2036413
 
©2020 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.