The Server Issues / Outages Thread - Panic Mode On! (118)

Profile Wiggo
Joined: 24 Jan 00
Posts: 34872
Credit: 261,360,520
RAC: 489
Australia
Message 2030413 - Posted: 1 Feb 2020, 23:59:28 UTC

Well my 3570K rig is now idling while my 2500K loads up.

Cheers.
ID: 2030413 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2030417 - Posted: 2 Feb 2020, 0:21:07 UTC

Wow, there actually has been progress in reducing Results returned and awaiting validation since I last looked. Down about 600K from the previous snapshot several hours ago.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2030417 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13746
Credit: 208,696,464
RAC: 304
Australia
Message 2030418 - Posted: 2 Feb 2020, 0:24:04 UTC - in response to Message 2030417.  

Wow, there actually has been progress in reducing Results returned and awaiting validation since I last looked. Down about 600K from the previous snapshot several hours ago.
The result of no new work for an extended period.
Now that new work is going out, say goodbye to that progress.
Grant
Darwin NT
ID: 2030418 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13746
Credit: 208,696,464
RAC: 304
Australia
Message 2030419 - Posted: 2 Feb 2020, 0:33:55 UTC
Last modified: 2 Feb 2020, 0:47:45 UTC

Well, it's gone from downloads timing out almost instantly, to sticky downloads.
5+ minutes of elapsed time counting away, and not so much as a bit transferred on this latest batch. And the usual fix for this problem (suspending and then re-enabling network activity) isn't having any effect.

Even the uploads are having more issues than usual with the instant timeouts.


For Eric's to-do list.
Once they get the database issues sorted out, then it's time to work on the download server & upload server issues.

Edit: 10+ minutes, and nothing transferred.

12 min for things to finally start, 5 more minutes of suspending & enabling network access to get them all downloaded.
Grant
Darwin NT
ID: 2030419 · Report as offensive
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 2030421 - Posted: 2 Feb 2020, 0:47:48 UTC - in response to Message 2030349.  

Ok, I've had it. Enough is enough.....



. . Yep it does get that way ...

. . I'm taking the time to try and rebuild/update my Core2 Duo rig.

. . Later versions of Linux and BOINC and special sauce. It's going to be a headache because the existing version is a repository BOINC in the system part of the disk. So moving it to a nice quick SSD in the user part of the new drive will be a headache, but if I follow the Juan method I should be OK.

. . So in due course; the fullness of time; further down the track; sometime in the distant future ... I should be able to return this unit to SETI when there is more work being sent out and with a neater, slightly quicker system. This optimism is unlike me but what the heck ...

Stephen

<shrug>
ID: 2030421 · Report as offensive
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 2030423 - Posted: 2 Feb 2020, 0:50:46 UTC - in response to Message 2030386.  

Running empty again. Shooting down the host to save some electric power.


. . Following Grumpy's example then ? :)

Stephen
ID: 2030423 · Report as offensive
Profile Freewill Project Donor
Joined: 19 May 99
Posts: 766
Credit: 354,398,348
RAC: 11,693
United States
Message 2030424 - Posted: 2 Feb 2020, 0:53:01 UTC - in response to Message 2030423.  

Running empty again. Shooting down the host to save some electric power.


. . Following Grumpy's example then ? :)

Stephen

It seems a bit extreme unless you wanted to build a new system anyway. ;)
ID: 2030424 · Report as offensive
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 2030425 - Posted: 2 Feb 2020, 0:55:39 UTC - in response to Message 2030409.  

Yeah Matt is really missed in times like these and he would've had those MBv7's put to bed long ago.
Wait, I've been out of the loop for a while. Matt left?
Yeah Matt has been over at the Breakthrough Listen project for a couple of years now.
Or maybe just at the Breakthrough Listen office down on Campus - that's where I found him back in July.

https://i.imgur.com/Lninw9X.jpg
And now and again out at Parkes.

Cheers.


. . Not recently I don't think :(

Stephen

? ?
ID: 2030425 · Report as offensive
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 2030426 - Posted: 2 Feb 2020, 0:56:48 UTC - in response to Message 2030410.  

A word of warning: if you do manage to score some work, be prepared to have to Retry pending transfers a few hundred (it feels like a thousand) times.
The download servers are borked as well as everything else.


. . been there, done that, over and over and over and ... oh what the heck ...

Stephen

:(
ID: 2030426 · Report as offensive
Dave Stegner
Volunteer tester
Joined: 20 Oct 04
Posts: 540
Credit: 65,583,328
RAC: 27
United States
Message 2030433 - Posted: 2 Feb 2020, 1:32:58 UTC

Just ran across something new.

I looked at my pending page and found a workunit at random.
It said "completed, waiting validation".

But when I clicked on it to see the status, I found this


https://setiathome.berkeley.edu/workunit.php?wuid=3863183873

Reported by 3 units and validated.

Something is borked.
Dave

ID: 2030433 · Report as offensive
Profile betreger Project Donor
Joined: 29 Jun 99
Posts: 11362
Credit: 29,581,041
RAC: 66
United States
Message 2030436 - Posted: 2 Feb 2020, 1:59:53 UTC - in response to Message 2030433.  

That's not unusual. The first 2 tasks did not match but were close. The 3rd was close enough to validate the first 2.
ID: 2030436 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2030437 - Posted: 2 Feb 2020, 2:01:52 UTC - in response to Message 2030433.  

Reported by 3 units and validated.

Something is borked.

Side effect of the minimum 3 quorum for early overflows and the flakey AMD card problem.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2030437 · Report as offensive
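
To illustrate the two replies above: with a minimum quorum of 3, two results that are close but do not match cannot validate on their own, while a third result that is close to both can settle it. The following is only a rough sketch of that idea, not the project's actual validator. The function name `find_canonical`, the reduction of each result to a single number, and the tolerance value are all assumptions made for illustration.

```python
from typing import List, Optional


def find_canonical(results: List[float], min_quorum: int = 3,
                   tolerance: float = 0.06) -> Optional[int]:
    """Return the index of a result that agrees with enough others, or None.

    Hypothetical model: each result is reduced to one number, and two
    results "agree" when they differ by no more than `tolerance`.
    """
    if len(results) < min_quorum:
        return None  # not enough results returned yet
    for i, candidate in enumerate(results):
        agreeing = sum(
            1 for j, other in enumerate(results)
            if j != i and abs(candidate - other) <= tolerance
        )
        # The candidate plus its agreeing partners must fill the quorum.
        if agreeing + 1 >= min_quorum:
            return i
    return None


# Two results alone cannot validate under a quorum of 3.
print(find_canonical([1.00, 1.10]))
# The third result is close to both earlier ones, so it becomes canonical.
print(find_canonical([1.00, 1.10, 1.05]))
```

In this toy model the first two values differ by more than the tolerance, yet the third sits within tolerance of each, which is the situation described for the workunit linked above.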
Profile Tom M
Volunteer tester

Joined: 28 Nov 02
Posts: 5124
Credit: 276,046,078
RAC: 462
Message 2030439 - Posted: 2 Feb 2020, 2:23:01 UTC

I am getting a very small flow of Seti@Home tasks.

So I have NNTed both E@H and WCG in hopes of sucking more down :)
Some of the weather tasks run more than half a day, so it will take a while to whittle about half of them down.

Tom
A proud member of the OFA (Old Farts Association).
ID: 2030439 · Report as offensive
Dave Stegner
Volunteer tester
Joined: 20 Oct 04
Posts: 540
Credit: 65,583,328
RAC: 27
United States
Message 2030459 - Posted: 2 Feb 2020, 3:34:33 UTC - in response to Message 2030433.  

I guess I was not clear.

Looking at my pending page, it says that the workunit is "completed, waiting validation"

YET

looking at the workunit, it says validated.

Just ran across something new.

I looked at my pending page and found a workunit at random.
It said "completed, waiting validation".

But when I clicked on it to see the status, I found this


https://setiathome.berkeley.edu/workunit.php?wuid=3863183873

Reported by 3 units and validated.

Something is borked.

Dave

ID: 2030459 · Report as offensive
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 2030462 - Posted: 2 Feb 2020, 3:57:11 UTC - in response to Message 2030433.  
Last modified: 2 Feb 2020, 4:02:33 UTC

Just ran across something new.

I looked at my pending page and found a workunit at random.
It said "completed, waiting validation".

But when I clicked on it to see the status, I found this


https://setiathome.berkeley.edu/workunit.php?wuid=3863183873

Reported by 3 units and validated.

Something is borked.


. . Not sure what your issue is with that WU. Normally a WU will linger in the system for approx 24 hours after validation; currently, with the problems everyone has been discussing and is very concerned about, they are hanging about for much longer. That unit has only just validated so I would not expect to wave goodbye to it for a day or 3 yet ...

. . OH, maybe you missed the discussion of the change that was introduced because of NAVI cards whereby overflow results are being triple checked for validation.

{edit}
. . The misleading listing on your stats page may be due to the lag with the replica database??

Stephen

:(
ID: 2030462 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2030463 - Posted: 2 Feb 2020, 3:58:40 UTC - in response to Message 2030459.  

Yes, things are borked. That is what this thread has been discussing for the past two weeks. Also the replica database is 8000 seconds behind. So what you see on your page is already over 2 hours old.

There is nothing normal about the current situation so no reason to expect normal classifications. I would just not worry about it since there is nothing you can do on your end to change anything.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2030463 · Report as offensive
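
For anyone wanting to turn the replica lag figure quoted above into something more readable, here is a small sketch of the arithmetic. The helper name `format_lag` is made up for this example, and the 8000 second value is simply the one quoted in the post.

```python
def format_lag(seconds: int) -> str:
    """Convert a lag given in whole seconds to hours, minutes and seconds."""
    hours, remainder = divmod(seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours} h {minutes} min {secs} s"


# Replica lag quoted in the post above.
print(format_lag(8000))  # 2 h 13 min 20 s
```

So a page served from the replica at that moment would be a little over two hours stale.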
Profile Unixchick Project Donor
Joined: 5 Mar 12
Posts: 815
Credit: 2,361,516
RAC: 22
United States
Message 2030464 - Posted: 2 Feb 2020, 4:33:30 UTC

There will also be weirdness from the replica db not being caught up with the main db.
ID: 2030464 · Report as offensive
Cruncher-American Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor

Joined: 25 Mar 02
Posts: 1513
Credit: 370,893,186
RAC: 340
United States
Message 2030465 - Posted: 2 Feb 2020, 4:42:57 UTC

Tired of this current snafu. Shut my crunchers down and will hold off until SETI has a couple of days of normal work flow. Good crunching, all.
ID: 2030465 · Report as offensive
Ville Saari
Joined: 30 Nov 00
Posts: 1158
Credit: 49,177,052
RAC: 82,530
Finland
Message 2030468 - Posted: 2 Feb 2020, 5:09:37 UTC - in response to Message 2030373.  

I am seeing no reductions in the size of the database with all the task counts at all time highs. Nothing is going to happen until we fall below the magic 20M number.
We fell below that at 02:50 UTC, but nothing has happened yet at the splitters.

Assimilation queue seems to be slowly going down now. Two steps down, one up.
ID: 2030468 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13746
Credit: 208,696,464
RAC: 304
Australia
Message 2030471 - Posted: 2 Feb 2020, 5:20:08 UTC - in response to Message 2030468.  

Assimilation queue seems to be slowly going down now. Two steps down, one up.
Until we can get "Results returned and awaiting validation" down to around 3.5 million (given the present amount of Work in progress, so 7 million to go), and the "Workunits waiting for assimilation" back down to 0 (3.7 million to go), any new work just causes those numbers to climb.
And ideally we'd want the purge numbers to be within about 500k of the In progress numbers (I think that was the general ballpark when things were working).
Grant
Darwin NT
ID: 2030471 · Report as offensive
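
The targets in the post above imply the current queue sizes: a target of 3.5 million with 7 million to go means roughly 10.5 million results awaiting validation, and 3.7 million workunits waiting for assimilation. A minimal sketch of that bookkeeping follows. The function names and the idea of a single pass/fail check are assumptions for illustration only; the project's server status page does not necessarily work this way.

```python
def implied_current(target: int, to_go: int) -> int:
    """Current queue size implied by a target and the remaining distance to it."""
    return target + to_go


def backlog_cleared(awaiting_validation: int, waiting_assimilation: int,
                    validation_target: int = 3_500_000,
                    assimilation_target: int = 0) -> bool:
    """True when both queues are at or below the targets named in the post."""
    return (awaiting_validation <= validation_target
            and waiting_assimilation <= assimilation_target)


# Figures taken from the post: 7 million and 3.7 million still to go.
validation_now = implied_current(3_500_000, 7_000_000)
assimilation_now = implied_current(0, 3_700_000)
print(validation_now, assimilation_now)
print(backlog_cleared(validation_now, assimilation_now))
```

The purge figure is left out of the check because the post does not give the In progress or purge counts themselves, only the rough 500k gap between them.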


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.