The Server Issues / Outages Thread - Panic Mode On! (118)

Message boards : Number crunching : The Server Issues / Outages Thread - Panic Mode On! (118)

Previous · 1 . . . 67 · 68 · 69 · 70 · 71 · 72 · 73 . . . 94 · Next

Unixchick Project Donor
Joined: 5 Mar 12
Posts: 815
Credit: 2,361,516
RAC: 22
United States
Message 2030222 - Posted: 1 Feb 2020, 1:53:32 UTC

What WUs are coming through here are all crap short ones. Nothing to heat the house. I would prefer some resends instead of these crap new ones.
Mr. Kevvy Crowdfunding Project Donor*Special Project $250 donor
Volunteer moderator
Volunteer tester
Joined: 15 May 99
Posts: 3776
Credit: 1,114,826,392
RAC: 3,319
Canada
Message 2030223 - Posted: 1 Feb 2020, 1:54:20 UTC - in response to Message 2030220.  

> Maybe this will be a "dry" Weekend (eg. no drinking?)


Called that one hours ago. :^)
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13750
Credit: 208,696,464
RAC: 304
Australia
Message 2030227 - Posted: 1 Feb 2020, 2:12:49 UTC - in response to Message 2030220.  
Last modified: 1 Feb 2020, 2:13:26 UTC

> Fri 31 Jan 2020 07:44:42 PM CST | SETI@home | Project has no tasks available
Splitters have been down for a few hours now.
Getting the odd resend here & there (many of which are just noise bombs).
Grant
Darwin NT
Cliff Harding
Volunteer tester
Joined: 18 Aug 99
Posts: 1432
Credit: 110,967,840
RAC: 67
United States
Message 2030237 - Posted: 1 Feb 2020, 5:04:33 UTC - in response to Message 2030232.  

There have been a lot of ups and downs lately, but it seems that the downs are winning.

So, is this going to be the new normal for the weekends now?


I don't buy computers, I build them!!
Ville Saari
Joined: 30 Nov 00
Posts: 1158
Credit: 49,177,052
RAC: 82,530
Finland
Message 2030238 - Posted: 1 Feb 2020, 5:11:03 UTC - in response to Message 2030237.  

> So, is this going to be the new normal for the weekends now?
Doesn't look like that. Last week the weekend was the only time of the week my computers could consistently keep their caches full.
Ville Saari
Joined: 30 Nov 00
Posts: 1158
Credit: 49,177,052
RAC: 82,530
Finland
Message 2030240 - Posted: 1 Feb 2020, 6:09:21 UTC

We are drifting further and further from a situation where the splitters could start running. There are now almost 21 million results in the db, and the count is still rising despite the splitters having been stopped for 10 hours. So many resends...

About 9 million of those 21 million results are probably being held hostage by the ever-growing assimilation queue.
Dave Stegner
Volunteer tester
Joined: 20 Oct 04
Posts: 540
Credit: 65,583,328
RAC: 27
United States
Message 2030242 - Posted: 1 Feb 2020, 6:19:36 UTC

Interesting

"Results in the field" is DROPPING

"Results received in last hour" is SLOWING

AND

"Results returned and awaiting validation" is RISING

Doesn't seem like "awaiting validation" should be going up if "results received" is slowing.
Dave

Dave Stegner
Volunteer tester
Joined: 20 Oct 04
Posts: 540
Credit: 65,583,328
RAC: 27
United States
Message 2030250 - Posted: 1 Feb 2020, 7:53:07 UTC - in response to Message 2030249.  

It sure did...
Dave

Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13750
Credit: 208,696,464
RAC: 304
Australia
Message 2030255 - Posted: 1 Feb 2020, 8:30:12 UTC

Scheduler was MIA for a while there, but now it's back to "Project has no tasks available"
Grant
Darwin NT
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14654
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2030258 - Posted: 1 Feb 2020, 8:50:08 UTC

OK, we're back - it all went very black there for a few minutes, didn't it?

I'm trying to think through what could possibly have caused all this database bloat. My suggestion of re-running the transitioner to pick up tasks which should have validated, but didn't, seems to have flushed out a few - but not as many as I expected. So what else has gone wrong?

First oddity: WU 3781186004. The middle guy, who seems to be a perfectly respectable cruncher with a decent rig, and a team member, got his task on 9 December (soon after the limits were raised) and has done nothing with it. Why? He's returning good work, quickly, now, and has lots of credit at other projects.

The only finger of suspicion I can see right now is 'Driver version 432.00' on Windows 10. And he's returned about 80 good tasks, all of a similar age, in the last day. Did he realise that everything was stuck and downgrade the driver? Could all of this be down to Microsoft (auto-update), NVidia (bad driver), and our own long deadlines?

Preserving the task list here:
Task        Computer  Sent                       Time reported              Status                              Runtime  CPU time  Credit  Application
8317964641  8873167   9 Dec 2019, 10:04:20 UTC   10 Dec 2019, 4:31:07 UTC   Completed and validated             253.39   128.74    126.77  SETI@home v8 Anonymous platform (NVIDIA GPU)
8317964642  8272778   9 Dec 2019, 10:04:17 UTC   31 Jan 2020, 15:04:19 UTC  Not started by deadline - canceled  0.00     0.00      ---     SETI@home v8 v8.22 (opencl_nvidia_SoG) windows_intelx86
8497103820  8313133   31 Jan 2020, 15:04:03 UTC  31 Jan 2020, 22:52:18 UTC  Completed and validated             765.91   709.91    126.77  SETI@home v8 Anonymous platform (NVIDIA GPU)
BetelgeuseFive Project Donor
Volunteer tester
Joined: 6 Jul 99
Posts: 158
Credit: 17,117,787
RAC: 19
Netherlands
Message 2030262 - Posted: 1 Feb 2020, 9:50:12 UTC - in response to Message 2030260.  
Last modified: 1 Feb 2020, 9:55:09 UTC

Also got a bunch, but not all of them resends. Even got an Astropulse (_0).
Looks like things are moving again ...

Edit: downloads are extremely slow

Tom
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14654
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2030264 - Posted: 1 Feb 2020, 10:09:37 UTC
Last modified: 1 Feb 2020, 10:45:41 UTC

And I got a BLC35_1 about an hour ago. The replica has fallen behind again, so I can't (yet) see when it was split - but I hope it wasn't recently.

Edit - it's gone now. 36 seconds on an old, tired, slow CPU. We really should stop splitting these noisy tapes while we're still in deep doo-doo.
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13750
Credit: 208,696,464
RAC: 304
Australia
Message 2030268 - Posted: 1 Feb 2020, 11:09:11 UTC - in response to Message 2030260.  

> Just got a bunch of WU's now, but all are resends _2 or higher.
> But downloading them, now that is another thing :-)
I managed to score 50 resends on one of my systems. When they finally downloaded, all done in under 4 minutes. Only 3 of them weren't noise bombs.
Grant
Darwin NT
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13750
Credit: 208,696,464
RAC: 304
Australia
Message 2030270 - Posted: 1 Feb 2020, 11:15:59 UTC

And to add to the issues, it appears that "Result files waiting for deletion" has developed issues for both MB & AP. Both have gone from effectively 0 to over 510k & 13.7k respectively.
Grant
Darwin NT
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 2030277 - Posted: 1 Feb 2020, 11:56:19 UTC
Last modified: 1 Feb 2020, 12:12:29 UTC

My Inconclusive results are going up too, even though I've only had a handful of tasks since last night. Last night I had a large number of Inconclusive results that said 'minimum quorum 1' and listed only a single Inconclusive host. I didn't see how a task with a single Inconclusive host could ever validate. Now it's very difficult to bring up my Inconclusive task lists, but it seems those tasks are now listed like this: https://setiathome.berkeley.edu/workunit.php?wuid=3862758806
minimum quorum 1
initial replication 3
   Task    Computer            Sent                  Time reported                 Status        Runtime CPUtime Credit             Application
8495599283  1473578  31 Jan 2020, 5:02:48 UTC  31 Jan 2020, 21:47:15 UTC  Completed and validated  15.36  12.61   3.59  SETI@home v8 v8.20 (opencl_ati5_mac) x86_64-apple-darwin
8498611906  6796479   1 Feb 2020, 3:00:50 UTC   1 Feb 2020, 4:00:03 UTC   Completed and validated   4.10   1.93   3.59  SETI@home v8 v8.11 (cuda42_mac) x86_64-apple-darwin
8498669733  8673543   1 Feb 2020, 4:01:52 UTC   1 Feb 2020, 5:29:49 UTC   Completed and validated  15.11  13.09   3.59  SETI@home v8 v8.22 (opencl_nvidia_SoG)
So the single-host workunits now have three hosts, but they are still just sitting there: a number of them show one or two hosts Completed and waiting for validation, and some show one or two Inconclusive hosts.
Freewill Project Donor
Joined: 19 May 99
Posts: 766
Credit: 354,398,348
RAC: 11,693
United States
Message 2030278 - Posted: 1 Feb 2020, 11:58:34 UTC - in response to Message 2030268.  

>> Just got a bunch of WU's now, but all are resends _2 or higher.
>> But downloading them, now that is another thing :-)
> I managed to score 50 resends on one of my systems. When they finally downloaded, all done in under 4 minutes. Only 3 of them weren't noise bombs.

How does one tell if the jobs are resends?
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14654
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2030282 - Posted: 1 Feb 2020, 12:11:44 UTC - in response to Message 2030278.  

> How does one tell if the jobs are resends?
In this case, we're talking about new, extra replications of existing tasks, not about resending lost tasks.

You tell from the task name, as shown in BOINC Manager. You need to be able to see the very end of the name - so use advanced view, and make the column as wide as you need. The last two characters are as follows:

_0 - always the first time a workunit has been sent to a cruncher. Every WU has a task _0
_1 - in normal times, usually created at the same time as _0 and sent out straight away. At the moment, some are being created and distributed later.
_2 onwards - probably a new replication, because the first two failed to validate (either because they returned different answers, or one of them never returned at all). But again, just at the moment, some results are untrustworthy, so _2 may be created for a safety-check.
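The suffix rules above can be sketched in Python for anyone who wants to sort a long task list by replication number. This is a minimal sketch, assuming every task name ends in an underscore followed by the replication number (the _0, _1, _2 suffixes just described); the helper names `replication_index` and `classify_task` and the example task names are hypothetical, and the labels only restate the rules of thumb above, not anything the server itself reports.

```python
import re

# Assumption: the trailing "_<digits>" of a task name is its replication number.
_SUFFIX = re.compile(r"_(\d+)$")


def replication_index(task_name: str) -> int:
    """Return the trailing replication number of a task name (hypothetical helper)."""
    match = _SUFFIX.search(task_name)
    if match is None:
        raise ValueError(f"no _N suffix in task name: {task_name!r}")
    return int(match.group(1))


def classify_task(task_name: str) -> str:
    """Map the suffix to the rules of thumb from the post (not authoritative)."""
    index = replication_index(task_name)
    if index == 0:
        return "first copy"         # every workunit has a _0 task
    if index == 1:
        return "second copy"        # normally created alongside _0
    return "extra replication"      # _2 onwards: failed validation or a safety check


if __name__ == "__main__":
    # Hypothetical task names; only the final suffix matters here.
    for name in ("example_workunit_0", "example_workunit_1", "example_workunit_3"):
        print(name, "->", classify_task(name))
```

Only the last underscore-separated field is examined, so the rest of the name (tape, channel, split position) can be anything.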
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14654
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2030283 - Posted: 1 Feb 2020, 12:18:29 UTC - in response to Message 2030264.  
Last modified: 1 Feb 2020, 12:20:00 UTC

> And I got a BLC35_1 about an hour ago. The replica has fallen behind again, so I can't (yet) see when it was split - but I hope it wasn't recently.
>
> Edit - it's gone now. 36 seconds on an old, tired, slow CPU. We really should stop splitting these noisy tapes while we're still in deep doo-doo.
OK, it's visible now - WU 3863162100. It turns out to be a replication for a task created as a singleton yesterday morning. And because it overflowed, it's been sent to a third host for checking. And it's gone to a host running Windows 7 - no driver problems. Phew.
Freewill Project Donor
Joined: 19 May 99
Posts: 766
Credit: 354,398,348
RAC: 11,693
United States
Message 2030284 - Posted: 1 Feb 2020, 12:20:05 UTC - in response to Message 2030282.  

Thanks, Richard!
Tom M
Volunteer tester
Joined: 28 Nov 02
Posts: 5124
Credit: 276,046,078
RAC: 462
Message 2030288 - Posted: 1 Feb 2020, 13:11:41 UTC

Sat 01 Feb 2020 07:08:16 AM CST | SETI@home | Scheduler request completed: got 0 new tasks
Sat 01 Feb 2020 07:08:16 AM CST | SETI@home | [sched_op] Server version 709
Sat 01 Feb 2020 07:08:16 AM CST | SETI@home | Project has no tasks available


The good news is my system(s) are happily crunching along (E@H and WCG). The bad news is "the dry spell" is really dry..... ;)

Tom
A proud member of the OFA (Old Farts Association).



 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.