The Server Issues / Outages Thread - Panic Mode On! (118)

Message boards : Number crunching : The Server Issues / Outages Thread - Panic Mode On! (118)

Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14690
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2030315 - Posted: 1 Feb 2020, 15:23:19 UTC - in response to Message 2030313.  

Despite the huge disparity in run times between your personal build and your wingmate's CPU offering, that one looks likely to validate when the transitioner reaches it. Others - affected by the faulty drivers - may be affected by the new confidence rules on overflows. But they should be looked at, and processed accordingly.
You might want to look at that workunit one more time - it has already validated. All it needs to do now is go away. Same story with thousands of other workunits in my backlog.
TBar is talking about another problem, where validation is delayed by some hours.
Do you happen to know when that WU validated - was it on 15 January, yesterday, or five minutes before you posted? It might be an early success of the transitioner scan, but unless you've seen it before, we'll never know. Time of validation might be in the server logs, but it's not recorded anywhere that we can see.
ID: 2030315 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14690
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2030318 - Posted: 1 Feb 2020, 15:25:57 UTC - in response to Message 2030314.  

As of now it is nearly one day that none of my 14 machines have gotten any new jobs... And yet I find no one posting a similar complaint... WHAT IS IT??? AM I BEING TARGETED??? 4 of my higher machines are only running single GPU jobs and even those are going to finish... WHAT IS GOING ON??? ANYONE???
None of us are getting any tasks - it's not targeted on you. But many of us feel that we've posted everything we can on that subject, and have moved on to trying to think of ways we can help the system to recover.
ID: 2030318 · Report as offensive
Oddbjornik Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 15 May 99
Posts: 220
Credit: 349,610,548
RAC: 1,728
Norway
Message 2030322 - Posted: 1 Feb 2020, 15:32:09 UTC - in response to Message 2030315.  

Do you happen to know when that WU validated - was it on 15 January, yesterday, or five minutes before you posted? It might be an early success of the transitioner scan, but unless you've seen it before, we'll never know. Time of validation might be in the server logs, but it's not recorded anywhere that we can see.
Unfortunately I don't know, but my validated task count has been bloated for months, so I suspect it was validated on 15 January, and that the problem is not the validators but the assimilators.
Also, as the Munin graphs show, the assimilator queue has been growing (un-)steadily since week 2.
ID: 2030322 · Report as offensive
Profile Mr. Kevvy Crowdfunding Project Donor*Special Project $250 donor
Volunteer moderator
Volunteer tester
Joined: 15 May 99
Posts: 3866
Credit: 1,114,826,392
RAC: 3,319
Canada
Message 2030324 - Posted: 1 Feb 2020, 15:35:12 UTC - in response to Message 2030314.  
Last modified: 1 Feb 2020, 15:36:26 UTC

And yet I find no one posting a similar complaint...


I am going to go out on a limb here and suggest that your search was less than complete. :^)

As I noted earlier, keep a backup project that you like (a second favorite) enabled in BOINC, but set its resource share to zero in the project preferences. (Most of us end up with Einstein@Home.) Then, if SETI@Home is out of work, BOINC will download just enough backup work to keep your CPU/GPU(s) busy, with no cache. That way, if work appears here, you'll get it and not be overloaded with backup project work.
ID: 2030324 · Report as offensive
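
For readers who prefer the backup-project advice above as logic rather than prose, here is a toy model in Python. It is only a sketch of the behaviour described in the post, not BOINC's actual work-fetch code; the `Project` class, its fields, and `choose_fetch_target` are hypothetical names made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Project:
    """Hypothetical stand-in for an attached project (not a BOINC type)."""
    name: str
    resource_share: int   # 0 marks a backup project
    has_work: bool        # whether the project's server currently has tasks


def choose_fetch_target(projects: List[Project]) -> Optional[Project]:
    """Pick where to ask for work: backups only when no primary has any."""
    primaries = [p for p in projects if p.resource_share > 0 and p.has_work]
    if primaries:
        # Assumption for this sketch: prefer the primary with the largest share.
        return max(primaries, key=lambda p: p.resource_share)
    backups = [p for p in projects if p.resource_share == 0 and p.has_work]
    return backups[0] if backups else None


attached = [
    Project("SETI@home", resource_share=100, has_work=False),
    Project("Einstein@Home", resource_share=0, has_work=True),
]
print(choose_fetch_target(attached).name)  # backup is used while the primary is dry
```

As soon as the primary project reports work again, the same function returns it instead of the backup, which is the point of the zero-share setting.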
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 2030326 - Posted: 1 Feb 2020, 15:37:51 UTC
Last modified: 1 Feb 2020, 15:41:12 UTC

I've noticed the number of Valid results on my hosts has risen by dozens in the past 30 minutes, so I assume 'forgotten' tasks are now validating. The page I was looking at also shows that tasks have been validated over the past hour; you just have to click on the work unit, as the list page still shows most of them as "Completed, waiting for validation". Once the work unit is opened, the tasks are shown as "Completed and validated".
ID: 2030326 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14690
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2030329 - Posted: 1 Feb 2020, 16:04:52 UTC - in response to Message 2030326.  

Or, remember that the task lists are driven off the replica database, which is now shown as being almost two hours behind the master. If different pages are driven off different versions of the database, there could easily be a discrepancy between them.
ID: 2030329 · Report as offensive
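
The replica-lag explanation above can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the `ToyDatabase` class, the timestamps, the two-hour lag, and the idea that the detail page reads the master are not taken from the project's actual server setup.

```python
from typing import List, Optional, Tuple


class ToyDatabase:
    """Hypothetical master plus a replica that trails it by `lag` seconds."""

    def __init__(self, lag: float) -> None:
        self.lag = lag
        self.writes: List[Tuple[float, str, str]] = []  # (time, task, state)

    def write(self, when: float, task: str, state: str) -> None:
        self.writes.append((when, task, state))

    def _state_at(self, task: str, cutoff: float) -> Optional[str]:
        state = None
        for when, name, value in sorted(self.writes):
            if name == task and when <= cutoff:
                state = value  # keep the latest write visible at the cutoff
        return state

    def read_master(self, task: str, now: float) -> Optional[str]:
        return self._state_at(task, now)

    def read_replica(self, task: str, now: float) -> Optional[str]:
        # The replica has only applied writes older than `lag` seconds.
        return self._state_at(task, now - self.lag)


db = ToyDatabase(lag=2 * 3600)           # replica two hours behind
db.write(0, "task-1", "waiting for validation")
db.write(3 * 3600, "task-1", "validated")

now = 4 * 3600
print(db.read_replica("task-1", now))    # a page fed by the replica
print(db.read_master("task-1", now))     # a page fed by the master (assumed)
```

In this toy run the two reads disagree about the same task, which is the kind of discrepancy between pages that the post describes.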
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2030334 - Posted: 1 Feb 2020, 16:49:34 UTC - in response to Message 2030258.  

Only finger of suspicion I can see right now is 'Driver version 432.00' on Windows 10. And he's returned about 80 good tasks - all of a similar age - in the last day. Did he realise that everything was stuck and downgrade the driver? Could all of this be down to Microsoft (auto update), NVidia (bad driver), and our own long deadlines?

I've been seeing lots of these hosts with this very strange version number (432.00). That is not an official Nvidia version number, as Nvidia's versions always have an XXX.dd point-release number. This looks like it might be a Windows-derived version or something. It is also ABOVE the recommended version cutoff for avoiding the stalled VHAR tasks, which I'm pretty sure is the 431.60 standard version.

If a ton of Windows users had their Nvidia driver automatically updated by Microsoft and then tried to run the huge amount of Arecibo work we have had over the past month, it could be another reason why the database is so bloated with resends from inconclusives.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2030334 · Report as offensive
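
The version comparison in the post above can be checked mechanically. The following Python sketch assumes the cutoff really is 431.60, which the poster is only "pretty sure" about, and that driver strings have the form `major.minor`; the function names are hypothetical.

```python
from typing import Tuple


def parse_driver_version(text: str) -> Tuple[int, int]:
    """Split a 'major.minor' driver string into comparable integers."""
    major, _, minor = text.strip().partition(".")
    return int(major), int(minor or 0)


def is_above_cutoff(version: str, cutoff: str = "431.60") -> bool:
    """True if `version` is strictly newer than the assumed cutoff."""
    return parse_driver_version(version) > parse_driver_version(cutoff)


for reported in ("432.00", "431.60", "430.86"):
    print(reported, is_above_cutoff(reported))
```

Comparing the parsed tuples rather than the raw strings avoids ordering mistakes when the two versions have different digit counts.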
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14690
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2030337 - Posted: 1 Feb 2020, 17:04:48 UTC - in response to Message 2030334.  

Keith - please check message 2030335. I've sent you a PM as well.
ID: 2030337 · Report as offensive
Ville Saari
Joined: 30 Nov 00
Posts: 1158
Credit: 49,177,052
RAC: 82,530
Finland
Message 2030338 - Posted: 1 Feb 2020, 17:11:44 UTC - in response to Message 2030329.  

Or, remember that the task lists are driven off the replica database, which is now shown as being almost two hours behind the master. If different pages are driven off different versions of the database, there could easily be a discrepancy between them.
Stuff can also be updated between you opening the list page and the individual task.
ID: 2030338 · Report as offensive
Boiler Paul

Joined: 4 May 00
Posts: 232
Credit: 4,965,771
RAC: 64
United States
Message 2030343 - Posted: 1 Feb 2020, 17:54:15 UTC

Finally received some new work but, unfortunately, the tasks were all BLC35 and all noise bombs.
ID: 2030343 · Report as offensive
Profile Freewill Project Donor
Joined: 19 May 99
Posts: 766
Credit: 354,398,348
RAC: 11,693
United States
Message 2030345 - Posted: 1 Feb 2020, 18:00:10 UTC

Just started getting "Scheduler request failed: Timeout was reached" notices.
ID: 2030345 · Report as offensive
JohnDK Crowdfunding Project Donor*Special Project $250 donor
Volunteer tester
Joined: 28 May 00
Posts: 1222
Credit: 451,243,443
RAC: 1,127
Denmark
Message 2030346 - Posted: 1 Feb 2020, 18:04:46 UTC

And "Scheduler request failed: Server returned nothing (no headers, no data)"
ID: 2030346 · Report as offensive
Profile Freewill Project Donor
Joined: 19 May 99
Posts: 766
Credit: 354,398,348
RAC: 11,693
United States
Message 2030350 - Posted: 1 Feb 2020, 18:33:00 UTC

What if the aliens are gumming up the system because we're close to finding them? Hmmm.
ID: 2030350 · Report as offensive
Profile HAL
Joined: 18 May 99
Posts: 535
Credit: 8,246,955
RAC: 3
United States
Message 2030351 - Posted: 1 Feb 2020, 18:41:26 UTC

Out of work for 4 Raspberry Pis, a laptop, and a dedicated Linux SETI project computer. Shutting them down and going on to other projects until they fix things ... will return then.
I'm putting myself to the fullest possible use, which is all, I think, that any conscious entity can ever hope to do.
ID: 2030351 · Report as offensive
Boiler Paul

Joined: 4 May 00
Posts: 232
Credit: 4,965,771
RAC: 64
United States
Message 2030353 - Posted: 1 Feb 2020, 18:44:21 UTC

Now getting this:

2/1/2020 12:18:55 PM | SETI@home | Scheduler request completed: got 0 new tasks
2/1/2020 12:18:55 PM | SETI@home | Server can't open database
ID: 2030353 · Report as offensive
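
Log lines like the two quoted above are easy to pick apart when tracking how often a given error shows up. This Python sketch assumes the three fields are separated by `|` and that the timestamp is in US month/day/year order with a 12-hour clock, as the quoted lines suggest; `LogEntry` and `parse_event_line` are hypothetical names, not part of BOINC.

```python
from datetime import datetime
from typing import NamedTuple


class LogEntry(NamedTuple):
    when: datetime
    project: str
    message: str


def parse_event_line(line: str) -> LogEntry:
    """Split 'timestamp | project | message' into typed fields."""
    stamp, project, message = (part.strip() for part in line.split("|", 2))
    # Assumed format: month/day/year, 12-hour clock with AM/PM.
    when = datetime.strptime(stamp, "%m/%d/%Y %I:%M:%S %p")
    return LogEntry(when, project, message)


lines = [
    "2/1/2020 12:18:55 PM | SETI@home | Scheduler request completed: got 0 new tasks",
    "2/1/2020 12:18:55 PM | SETI@home | Server can't open database",
]
entries = [parse_event_line(line) for line in lines]
errors = [e for e in entries if "can't open database" in e.message]
print(len(errors), errors[0].when.isoformat())
```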
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2030362 - Posted: 1 Feb 2020, 19:28:53 UTC

All I seem to get is early overflow resends from BLC35.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2030362 · Report as offensive
Miklos M.

Joined: 5 May 99
Posts: 955
Credit: 136,115,648
RAC: 73
Hungary
Message 2030370 - Posted: 1 Feb 2020, 20:07:44 UTC

Only getting a few tasks for my 2080s, but mostly none. Watching my RAC drop.
ID: 2030370 · Report as offensive
Profile Peter

Joined: 12 Feb 14
Posts: 19
Credit: 1,385,738
RAC: 6
Slovakia
Message 2030371 - Posted: 1 Feb 2020, 20:18:52 UTC
Last modified: 1 Feb 2020, 20:19:20 UTC

Since this morning, local time, almost no tasks until now :(
ID: 2030371 · Report as offensive
Profile Schatten

Joined: 12 Oct 02
Posts: 18
Credit: 14,047,388
RAC: 9
Germany
Message 2030372 - Posted: 1 Feb 2020, 20:23:16 UTC

We are all in the same boat. I hope the situation changes soon.
ID: 2030372 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2030373 - Posted: 1 Feb 2020, 20:32:56 UTC

I am seeing no reduction in the size of the database, with all the task counts at all-time highs. Nothing is going to happen until we fall below the magic 20M number.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2030373 · Report as offensive


 
©2025 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.