The Server Issues / Outages Thread - Panic Mode On! (118)

Message boards : Number crunching : The Server Issues / Outages Thread - Panic Mode On! (118)

Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 2028330 - Posted: 18 Jan 2020, 15:47:30 UTC
Last modified: 18 Jan 2020, 16:01:48 UTC

Replica continues to climb in time behind master. Up to over 10hrs now

I’m also not getting very many tasks. Most requests end up with project has no tasks. Occasionally I’ll get 1 task. Nothing substantial for several hours now.
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 2028330
Lewbylews6
Joined: 17 Jan 20
Posts: 1
Credit: 29,096
RAC: 0
Message 2028331 - Posted: 18 Jan 2020, 15:54:23 UTC

On my account it states that I currently have 232 tasks in progress, but my BOINC Manager isn't processing anything. When I request an update, the most recent entries in my Event Log say the following:

18/01/2020 15:48:59 | SETI@home | update requested by user
18/01/2020 15:49:01 | SETI@home | Sending scheduler request: Requested by user.
18/01/2020 15:49:01 | SETI@home | Requesting new tasks for CPU and NVIDIA GPU
18/01/2020 15:49:04 | SETI@home | Scheduler request completed: got 0 new tasks
18/01/2020 15:49:04 | SETI@home | Project has no tasks available

Can anyone help in getting me more work to process?

This is from my account:
All tasks for Lewbylews6

State: All (241) · In progress (232) · Validation pending (0) · Validation inconclusive (0) · Valid (9) · Invalid (0) · Error (0)
Application: All (241) · AstroPulse v7 (0) · SETI@home v8 (241)

Thanks,
L
ID: 2028331
Profile Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 2028334 - Posted: 18 Jan 2020, 16:07:54 UTC - in response to Message 2028331.  

We're all in that boat. The project has problems with its servers, you can check that on this page: https://setiathome.berkeley.edu/show_server_status.php.
The "Results ready to send" count is incredibly low while all the other numbers are high, so something is wrong with the back-end and we'll just have to wait until that's fixed.
ID: 2028334
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2028339 - Posted: 18 Jan 2020, 16:37:31 UTC - in response to Message 2028291.  

Although no new workunits are being split, I've been getting substantial numbers of replacement _2 tasks. That implies that the validators are working, and failing substantial numbers of matches (it may be that many of my _2s turn out to be overflows, and vanish in a flash).

Somebody mentioned 'initial replication'. I have a dim memory that we discussed this years ago, and found that the number should more accurately be called 'current replication' - the figure you see may not necessarily be the true initial number. It would be hard to check that until the databases sync up, and we can check newly-split work (ha!) in real time.

Edit - see Initial Replication of FOUR?? Any comments

Wow, that thread goes back a long way. And the issue comes down to a linguistic choice: what is the proper definition of "initial"?

In my previous example of WU https://setiathome.berkeley.edu/workunit.php?wuid=3756812713 it is apparent that initial replication was 2 as evidenced by the sequential WU numbers originally generated. Only when the second original was not returned in time did the replacement get generated and now the IR number was bumped. So it looks like the code that generates the IR is ancient and has been discussed in antiquity.

So I take back my assertion that the recent AMD code change instantly created 3 sequential WUs in initial replication. Now we just wait and see whether the recent changes have ANY effect on the database size.
Seti@Home classic workunits: 20,676 CPU time: 74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2028339
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14654
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2028341 - Posted: 18 Jan 2020, 16:42:54 UTC - in response to Message 2028339.  

Wow, that thread goes back a long way.
I have a dim memory, but this message board remembers everything. And it has a search tool!
ID: 2028341
Stephen "Heretic" Crowdfunding Project Donor* Special Project $75 donor Special Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 2028343 - Posted: 18 Jan 2020, 16:59:22 UTC - in response to Message 2028330.  

Replica continues to climb in time behind master. Up to over 10hrs now

I’m also not getting very many tasks. Most requests end up with project has no tasks. Occasionally I’ll get 1 task. Nothing substantial for several hours now.


. . Same here, this machine was completely OOW on the GPUs so I had to cannibalise the CPU queue. But it is still "no tasks available" and will be OOW again very soon ....

Stephen

:(
ID: 2028343
Profile Tom M
Volunteer tester
Joined: 28 Nov 02
Posts: 5124
Credit: 276,046,078
RAC: 462
Message 2028349 - Posted: 18 Jan 2020, 17:18:30 UTC

Maybe the colder weather is slowing the speed of the electrons down?

Tom
A proud member of the OFA (Old Farts Association).
ID: 2028349
Profile Siran d'Vel'nahr
Volunteer tester
Joined: 23 May 99
Posts: 7379
Credit: 44,181,323
RAC: 238
United States
Message 2028350 - Posted: 18 Jan 2020, 17:21:33 UTC - in response to Message 2028349.  

Maybe the colder weather is slowing the speed of the electrons down?

Tom

Hi Tom,

In California? I would think not. ;)

Have a great day! :)

Siran
CAPT Siran d'Vel'nahr - L L & P _\\//
Winders 11 OS? "What a piece of junk!" - L. Skywalker
"Logic is the cement of our civilization with which we ascend from chaos using reason as our guide." - T'Plana-hath
ID: 2028350
Profile Siran d'Vel'nahr
Volunteer tester
Joined: 23 May 99
Posts: 7379
Credit: 44,181,323
RAC: 238
United States
Message 2028351 - Posted: 18 Jan 2020, 17:25:15 UTC

Greetings,

My last post took about a minute or longer to post. :|

Did our weekly weekend fiasco start a bit early this week? ;) GPU OOW and CPU about 15 hours worth of work.

Have a great day! :)

Siran
CAPT Siran d'Vel'nahr - L L & P _\\//
Winders 11 OS? "What a piece of junk!" - L. Skywalker
"Logic is the cement of our civilization with which we ascend from chaos using reason as our guide." - T'Plana-hath
ID: 2028351
Brandaan
Joined: 5 Jan 20
Posts: 17
Credit: 384,179
RAC: 0
Belgium
Message 2028352 - Posted: 18 Jan 2020, 17:47:01 UTC - in response to Message 2028351.  

Got some work about six hours ago; since then, nothing, and I'm unable to report any completed work. I feel like they will need to do some real cleaning up and figure out a solution.
ID: 2028352
Profile Mr. Kevvy Crowdfunding Project Donor* Special Project $250 donor
Volunteer moderator
Volunteer tester
Joined: 15 May 99
Posts: 3776
Credit: 1,114,826,392
RAC: 3,319
Canada
Message 2028353 - Posted: 18 Jan 2020, 17:54:02 UTC
Last modified: 18 Jan 2020, 17:56:52 UTC

These issues have been ongoing since the day-long Tuesday outrage... it's as if the system has been overloaded since then just trying to catch up with the demand the extended outage imposed, with scheduler timeouts, slow forums, up/downloads failing and never enough work to keep my faster computers from regularly running out of it.

All I can hope for is that the next Tuesday outrage in two days is of normal duration and resets everything so these issues are put to rest.
ID: 2028353
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14654
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2028354 - Posted: 18 Jan 2020, 18:01:45 UTC - in response to Message 2028353.  

It might even be worth declaring a splitter holiday and allowing work allocated before the outage to report, validate, assimilate, delete and purge before starting to fill up the database again.
ID: 2028354
Profile Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 2028355 - Posted: 18 Jan 2020, 18:09:18 UTC

Why is it all of a sudden forbidden for people to ask questions (about the work related troubles) in the help desk forums? Why do they all need to be moved into this thread?

Then ask Fred to kill the help desk forums if you don't want people to post there. Or lighten up, not everyone knows there's all these forums at the front, some don't care, others refuse to post here. Leave them be in the help desk, what are those forums otherwise for?
ID: 2028355
Profile Mr. Kevvy Crowdfunding Project Donor* Special Project $250 donor
Volunteer moderator
Volunteer tester
Joined: 15 May 99
Posts: 3776
Credit: 1,114,826,392
RAC: 3,319
Canada
Message 2028361 - Posted: 18 Jan 2020, 18:54:37 UTC - in response to Message 2028354.  
Last modified: 18 Jan 2020, 18:55:35 UTC

It might even be worth declaring a splitter holiday and allowing work allocated before the outage to report, validate, assimilate, delete and purge before starting to fill up the database again.


Coincidentally (or not lol) Dr. Korpela has updated the News feed:

For a couple of reasons, the result table has grown to the point where it no longer fits in main memory. That has been slowing the validators and assimilators, which is causing the result table to grow further.

We'd like to get it down to a manageable size before our Tuesday outage. To that end we are throttling work generation to a rate at which the table size is shrinking. We hope that this rate will increase as the table gets smaller.

So for the next few days work will be hard to come by (but not zero).


Just what we were asking for... a simple brief update if there are issues.
ID: 2028361
Profile Freewill Project Donor
Joined: 19 May 99
Posts: 766
Credit: 354,398,348
RAC: 11,693
United States
Message 2028362 - Posted: 18 Jan 2020, 18:54:47 UTC
Last modified: 18 Jan 2020, 18:56:33 UTC

I'm now getting messages that the project is down for maintenance. Looks like someone came in on their weekend to kick the servers.

*edit* Just saw Mr. Kevvy's message right after I posted. Great to hear they're on the job. And gave us a brief update. :)
ID: 2028362
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14654
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2028365 - Posted: 18 Jan 2020, 19:06:05 UTC - in response to Message 2028361.  

It might even be worth declaring a splitter holiday and allowing work allocated before the outage to report, validate, assimilate, delete and purge before starting to fill up the database again.
Coincidentally (or not lol) Dr. Korpela has updated the News feed:
As far as I'm concerned, it's a coincidence. THERE WAS NO COLLUSION (to quote somebody-or-other).

But I did post in public, and somebody might be reading.
ID: 2028365
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 2028369 - Posted: 18 Jan 2020, 19:41:58 UTC - in response to Message 2028361.  

It might even be worth declaring a splitter holiday and allowing work allocated before the outage to report, validate, assimilate, delete and purge before starting to fill up the database again.


Coincidentally (or not lol) Dr. Korpela has updated the News feed:

For a couple of reasons, the result table has grown to the point where it no longer fits in main memory. That has been slowing the validators and assimilators, which is causing the result table to grow further.

We'd like to get it down to a manageable size before our Tuesday outage. To that end we are throttling work generation to a rate at which the table size is shrinking. We hope that this rate will increase as the table gets smaller.

So for the next few days work will be hard to come by (but not zero).


Just what we were asking for... a simple brief update if there are issues.


memory upgrade fundraiser?

if it's already maxed out... server upgrade fundraiser?

:)
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 2028369
Profile Tom M
Volunteer tester
Joined: 28 Nov 02
Posts: 5124
Credit: 276,046,078
RAC: 462
Message 2028374 - Posted: 18 Jan 2020, 20:10:52 UTC - in response to Message 2028369.  
Last modified: 18 Jan 2020, 20:11:10 UTC


memory upgrade fundraiser?

if it's already maxed out... server upgrade fundraiser?

:)


I think you could probably get a fund raiser started on that. Maybe talk to the guy who originated our HD upgrade for the Seti project.

Tom
A proud member of the OFA (Old Farts Association).
ID: 2028374
Profile Tom M
Volunteer tester
Joined: 28 Nov 02
Posts: 5124
Credit: 276,046,078
RAC: 462
Message 2028383 - Posted: 18 Jan 2020, 20:29:58 UTC

It's alive!
I actually caught 6 GPU tasks processing in the middle of the Einstein@Home stuff I am running, and I have 11 SETI@home CPU tasks running. The GPUs have all reverted to the backup project. Again.

Tom
A proud member of the OFA (Old Farts Association).
ID: 2028383
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13751
Credit: 208,696,464
RAC: 304
Australia
Message 2028384 - Posted: 18 Jan 2020, 20:41:12 UTC - in response to Message 2028305.  

Reducing the server-side limits doesn't seem to have helped things much, if at all.
The effect of reducing the limits will only be seen in days (or weeks), due to the way SETI works.
The In progress numbers were already 2 million lower. With the splitter problems & scheduler problems they were even lower than that when I went to bed, yet it's having no impact on any of the other backlogs, which indicates there are system issues at play beyond just the amount of work in progress.
Grant
Darwin NT
ID: 2028384

©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.