Panic Mode On (114) Server Problems?

Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 11657
Credit: 174,029,673
RAC: 119,776
Australia
Message 1980053 - Posted: 12 Feb 2019, 11:29:57 UTC
Last modified: 12 Feb 2019, 11:33:58 UTC

I see the problems are continuing.

From the looks of the Ready-to-send buffer, I'm thinking there's some process that has run away and is hogging CPU resources.
The Splitters run on past their usual shutdown point, then take ages to restart after falling well below their usual start point. Scheduler allocation of work is sporadic at best; at worst it's hours between getting any work allocated (regardless of how much there is to be had).

The Deleters have finally cleared their backlog, but now the Purgers are struggling to clear theirs. The amount of work returned per hour is quite low compared to past periods, when volumes were much larger and the system was still able to keep up.
A lot of the issues seemed to resurface when the Replica came back online. Maybe see how the system goes with the Replica offline for 12-24 hours? See if it can recover, or whether there is something else at play.
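
A minimal sketch of the start/stop behaviour being described, assuming the splitters are governed by simple high/low water marks on the Ready-to-send buffer. The thresholds and names here are invented for illustration; the real logic and values live in the project's server code:

    # Hysteresis sketch: stop splitting above the high mark,
    # restart only once the buffer falls below the low mark.
    LOW_WATER = 200_000    # hypothetical "usual start point"
    HIGH_WATER = 600_000   # hypothetical "usual shutdown point"

    def splitters_should_run(ready_to_send, currently_running):
        if currently_running:
            return ready_to_send < HIGH_WATER  # keep running up to the high mark
        return ready_to_send <= LOW_WATER      # restart only at/below the low mark

The complaint above is that both edges are overshooting: the splitters keep running well past the high mark and take ages to restart well below the low mark, which is consistent with something starving the controlling process of CPU.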


Edit: and even viewing my limited number of Tasks is becoming almost impossible (although, due to the Deleter and then Purger backlogs, those numbers are way higher than usual).
Grant
Darwin NT
ID: 1980053
Stephen "Heretic" · Special Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 4698
Credit: 151,061,759
RAC: 240,050
Australia
Message 1980055 - Posted: 12 Feb 2019, 11:45:38 UTC - in response to Message 1980053.  

. . I'm surprised I have work on my rigs. Since I got home, every work request has returned "no tasks" :(

Stephen

:(
ID: 1980055
Brent Norman · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester

Joined: 1 Dec 99
Posts: 2766
Credit: 573,020,100
RAC: 911,194
Canada
Message 1980078 - Posted: 12 Feb 2019, 13:02:02 UTC - in response to Message 1980053.  

It is interesting to note that the assimilation backlog (which has been averaging 400k lately) started when the replica was brought back online.
The Assimilation and Replica graphs look very similar over time.


ID: 1980078
Bill · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 30 Nov 05
Posts: 248
Credit: 3,959,779
RAC: 4,803
United States
Message 1980080 - Posted: 13 Feb 2019, 13:23:16 UTC

It's alive!!!!

Well, the forums at least.
Seti@home classic: 1,456 results, 1.613 years CPU time
ID: 1980080
Lane42

Joined: 17 May 99
Posts: 55
Credit: 221,433,871
RAC: 153,125
United States
Message 1980081 - Posted: 13 Feb 2019, 13:32:43 UTC - in response to Message 1980080.  

(:
ID: 1980081
Unixchick · Project Donor
Joined: 5 Mar 12
Posts: 568
Credit: 1,950,441
RAC: 853
United States
Message 1980089 - Posted: 13 Feb 2019, 13:55:39 UTC

It is good to see the system back. Thanks to the people who work so hard to keep it going. It is going to be a rough recovery after such a long downtime.
ID: 1980089
Gone with the wind · Crowdfunding Project Donor · Special Project $75 donor
Volunteer tester

Joined: 19 Nov 00
Posts: 41577
Credit: 41,999,167
RAC: 646
Message 1980094 - Posted: 13 Feb 2019, 14:13:27 UTC

Seti@home is what it is. Old kit, minimal staff, no money.

The Lab staff are scientists and astronomers researching many UCB projects; they are not supposed to be spending their time being computer technicians.

We simply have to accept these outages, unpalatable though they may be.
ID: 1980094
Brent Norman · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester

Joined: 1 Dec 99
Posts: 2766
Credit: 573,020,100
RAC: 911,194
Canada
Message 1980097 - Posted: 13 Feb 2019, 14:20:22 UTC
Last modified: 13 Feb 2019, 14:31:08 UTC

Woo Hoo Finally!
I received a single teaser task.
LMAO
EDIT: That fun lasted a whole 79 seconds :(
ID: 1980097
rob smith · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer moderator
Volunteer tester

Joined: 7 Mar 03
Posts: 17795
Credit: 407,146,831
RAC: 147,016
United Kingdom
Message 1980101 - Posted: 13 Feb 2019, 14:23:56 UTC

Place a big mark in your diary, and frame the task ;-)
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1980101
EdwardPF
Volunteer tester

Joined: 26 Jul 99
Posts: 383
Credit: 211,195,442
RAC: 100,065
United States
Message 1980105 - Posted: 13 Feb 2019, 14:32:55 UTC
Last modified: 13 Feb 2019, 15:00:06 UTC

3,866,416 ready to send ??!!

OK, back to a normal 808,351.
ID: 1980105
Cliff Harding
Volunteer tester
Joined: 18 Aug 99
Posts: 1423
Credit: 104,100,396
RAC: 40,818
United States
Message 1980108 - Posted: 13 Feb 2019, 14:39:47 UTC

Even though I finally got through to the scheduler and got no tasks, should this be a concern --

02/13/2019 09:34:19 | SETI@home | Requesting new tasks for CPU and NVIDIA GPU
02/13/2019 09:34:19 | SETI@home | [http] HTTP_OP::init_post(): http://setiboinc.ssl.berkeley.edu/sah_cgi/cgi
02/13/2019 09:34:19 | SETI@home | [http] HTTP_OP::libcurl_exec(): ca-bundle set
02/13/2019 09:34:19 | SETI@home | [http] [ID#1] Info: Connection 723 seems to be dead!
02/13/2019 09:34:19 | SETI@home | [http] [ID#1] Info: Closing connection 723
02/13/2019 09:34:19 | SETI@home | [http] [ID#1] Info: TLSv1.2 (OUT), TLS alert, Client hello (1):
02/13/2019 09:34:19 | SETI@home | [http] [ID#1] Info: Connection 721 seems to be dead!
02/13/2019 09:34:19 | SETI@home | [http] [ID#1] Info: Closing connection 721
02/13/2019 09:34:19 | SETI@home | [http] [ID#1] Info: Connection 722 seems to be dead!
02/13/2019 09:34:19 | SETI@home | [http] [ID#1] Info: Closing connection 722
02/13/2019 09:34:19 | SETI@home | [http] [ID#1] Info: Trying 208.68.240.126...
02/13/2019 09:34:19 | SETI@home | [http] [ID#1] Info: Connected to setiboinc.ssl.berkeley.edu (208.68.240.126) port 80 (#724)


I don't buy computers, I build them!!
ID: 1980108
NorthCup

Joined: 6 Jun 99
Posts: 42
Credit: 44,354,307
RAC: 40,262
Germany
Message 1980109 - Posted: 13 Feb 2019, 14:47:09 UTC

Look at your task overview - my last 550 WUs are marked as bad - joy in the day ;-)
Klaus
ID: 1980109
Brent Norman · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester

Joined: 1 Dec 99
Posts: 2766
Credit: 573,020,100
RAC: 911,194
Canada
Message 1980111 - Posted: 13 Feb 2019, 14:49:25 UTC - in response to Message 1980108.  

208.68.240.126 is the Scheduler, and that is probably just an indication of a lot of traffic hitting it.
ID: 1980111
Stephen "Heretic" · Special Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 4698
Credit: 151,061,759
RAC: 240,050
Australia
Message 1980112 - Posted: 13 Feb 2019, 14:51:42 UTC - in response to Message 1980108.  

Even though I finally got through to the scheduler and got no tasks, should this be a concern --

02/13/2019 09:34:19 | SETI@home | Requesting new tasks for CPU and NVIDIA GPU
02/13/2019 09:34:19 | SETI@home | [http] HTTP_OP::init_post(): http://setiboinc.ssl.berkeley.edu/sah_cgi/cgi
02/13/2019 09:34:19 | SETI@home | [http] HTTP_OP::libcurl_exec(): ca-bundle set
02/13/2019 09:34:19 | SETI@home | [http] [ID#1] Info: Connection 723 seems to be dead!
02/13/2019 09:34:19 | SETI@home | [http] [ID#1] Info: Closing connection 723
02/13/2019 09:34:19 | SETI@home | [http] [ID#1] Info: TLSv1.2 (OUT), TLS alert, Client hello (1):
02/13/2019 09:34:19 | SETI@home | [http] [ID#1] Info: Connection 721 seems to be dead!
02/13/2019 09:34:19 | SETI@home | [http] [ID#1] Info: Closing connection 721
02/13/2019 09:34:19 | SETI@home | [http] [ID#1] Info: Connection 722 seems to be dead!
02/13/2019 09:34:19 | SETI@home | [http] [ID#1] Info: Closing connection 722
02/13/2019 09:34:19 | SETI@home | [http] [ID#1] Info: Trying 208.68.240.126...
02/13/2019 09:34:19 | SETI@home | [http] [ID#1] Info: Connected to setiboinc.ssl.berkeley.edu (208.68.240.126) port 80 (#724)


. . I wish I knew. Are you using a proxy ??

Stephen

?
ID: 1980112
Ian&Steve C.
Joined: 28 Sep 99
Posts: 1807
Credit: 770,869,416
RAC: 2,617,502
United States
Message 1980116 - Posted: 13 Feb 2019, 15:11:25 UTC
Last modified: 13 Feb 2019, 15:13:31 UTC

I've managed to send back all my completed tasks (from the two biggest systems), but I'm getting "project has no tasks available", so it seems they haven't opened the door yet.

Set NNT (No New Tasks) and you'll be able to report old tasks; that will get you past the "HTTP internal error". You don't need to wait out the 5-minute timer between reports - just click "Update" over and over until all of your old tasks are sent back. I set max_tasks_reported to 250; even thousands of tasks only take a few minutes to report.

Then wait patiently for more work lol.
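
For anyone wanting to replicate this: max_tasks_reported is a standard BOINC client option in cc_config.xml (the 250 is just the value used above), and NNT plus the repeated updates can also be driven from the command line with boinccmd. A minimal sketch, using SETI@home's project URL:

    <!-- cc_config.xml (in the BOINC data directory):
         cap how many completed tasks are reported per scheduler RPC -->
    <cc_config>
      <options>
        <max_tasks_reported>250</max_tasks_reported>
      </options>
    </cc_config>

    # From a shell: reload the config, set No New Tasks, then poke the
    # scheduler repeatedly until everything is reported.
    boinccmd --read_cc_config
    boinccmd --project http://setiathome.berkeley.edu nomorework
    boinccmd --project http://setiathome.berkeley.edu update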
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 1980116
Brent Norman · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester

Joined: 1 Dec 99
Posts: 2766
Credit: 573,020,100
RAC: 911,194
Canada
Message 1980121 - Posted: 13 Feb 2019, 15:53:05 UTC

I have a few tasks starting to come in now.
ID: 1980121
Ian&Steve C.
Joined: 28 Sep 99
Posts: 1807
Credit: 770,869,416
RAC: 2,617,502
United States
Message 1980123 - Posted: 13 Feb 2019, 16:03:39 UTC

I got 15 on one update on one system, and maybe some more on a slower system. Trickling in, it seems.
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 1980123
TBar
Volunteer tester

Joined: 22 May 99
Posts: 4864
Credit: 589,762,704
RAC: 1,393,978
United States
Message 1980124 - Posted: 13 Feb 2019, 16:11:14 UTC - in response to Message 1980053.  
Last modified: 13 Feb 2019, 16:13:54 UTC

I see the problems are continuing....
Scheduler allocation of work is sporadic, at best. At worst, it's hours between getting any work allocated (regardless of how much is there to be had).....
Actually, this problem goes back years; it's just that for the first couple of years few people were reporting it. Logs were posted, for years, showing the Scheduler simply ignoring work requests and just sending a few random tasks every so often. It first started with the introduction of the BLC tasks in 2016. Finally enough people were reporting the problem that a serious attempt was made to solve it. It was found that as long as there was just One type of task in the cache there wasn't a problem; the problem started as soon as a different type of task was added to the cache. At that point the cache was mainly filled with BLC tasks, and only a couple of Arecibo files would be added per week.

It was found the problem was solved by sending Arecibo VLARs to the GPUs, and this seemed to work as long as only a few Arecibo files were split per week. Recently many more Arecibo files have been added to the cache than just a couple a week; in fact, there is a constant supply. Now the Exact Same Problem has returned, the same problem that has been present for years. It would seem the process that chooses which task to send hangs at some point, and no task is chosen to be sent. As long as there is just one type of task in the cache, the problem doesn't appear. This is how BETA works, BTW: only one type of task is in the cache at a time. For the past few weeks BETA has only been sending Arecibo tasks; when that file is finished at BETA, a BLC file will probably be loaded. Again, one type of file present in the cache at a time.

I'm willing to bet that if there were only One type of file in the SETI Main cache, the recent Scheduler problems would disappear, just as has been the case for years.
Hopefully those 'Lost' Arecibo files will be finished SOON and we will go back to having only a couple of Arecibo files a week. Or someone could manually stop the Arecibo files long enough to determine whether that is indeed the problem.
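
A toy model of this hypothesis, assuming the scheduler fills each request from a small slot window of the cache and that GPU requests skip Arecibo VLARs (the pre-fix behaviour described above). This is not the actual BOINC scheduler code; every name and number is invented for illustration:

    # Toy model: type-filtered selection over a small window of a huge
    # cache can return "no tasks" even though the cache is full.
    import random

    SLOTS = 100  # the scheduler only sees a small window at a time

    def request_work(cache, wants_gpu):
        window = random.sample(cache, min(SLOTS, len(cache)))
        for task in window:
            if wants_gpu and task == "arecibo_vlar":
                continue  # hypothetical filter: VLARs withheld from GPUs
            return task
        return None  # reply: "no tasks available"

    mixed = ["arecibo_vlar"] * 9950 + ["blc"] * 50
    misses = sum(request_work(mixed, True) is None for _ in range(1000))
    print(misses, "of 1000 GPU requests got no work despite a full cache")

Run against a single-type cache, the same loop returns work on the first slot every time; dominate the cache with the filtered type and most requests come back empty, which matches the "one type of file at a time" observation from BETA.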
ID: 1980124
Keith Myers · Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 9871
Credit: 930,418,214
RAC: 1,524,429
United States
Message 1980128 - Posted: 13 Feb 2019, 16:47:44 UTC

I have just finished reporting. Got a few tasks in download pending, with stuck and slow downloads.
Seti@Home classic workunits:20,676 CPU time:74,226 hours
ID: 1980128
Unixchick · Project Donor
Joined: 5 Mar 12
Posts: 568
Credit: 1,950,441
RAC: 853
United States
Message 1980130 - Posted: 13 Feb 2019, 17:01:29 UTC

Much-needed db purging is happening ... from 9 million down to 7 million, and hopefully lower. I think splitting and handing out WUs will get better once the db purging is not the main focus.
ID: 1980130