The Server Issues / Outages Thread - Panic Mode On! (117)

Message boards : Number crunching : The Server Issues / Outages Thread - Panic Mode On! (117)

Previous · 1 . . . 41 · 42 · 43 · 44 · 45 · 46 · 47 . . . 52 · Next

Profile Jan Henrik
Joined: 13 Jul 14
Posts: 13
Credit: 5,769,438
RAC: 42
Message 2023686 - Posted: 19 Dec 2019, 8:17:27 UTC

Server status has lots of red and orange . . . but luckily I'm slow and still have some in the cache . . .
"less than a pixel"
ID: 2023686
Profile Wiggo
Joined: 24 Jan 00
Posts: 36781
Credit: 261,360,520
RAC: 489
Australia
Message 2023689 - Posted: 19 Dec 2019, 9:06:17 UTC

Work is flowing out again.

For the moment anyway.

Cheers.
ID: 2023689
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13854
Credit: 208,696,464
RAC: 304
Australia
Message 2023691 - Posted: 19 Dec 2019, 10:20:50 UTC
Last modified: 19 Dec 2019, 10:34:05 UTC

The Ready-to-send buffer is now down to 550k, so it would be nice if the splitters would start producing some more work before it makes its way to zero again.

Edit-
and bingo. Post about the problem, and it gets sorted. Ready-to-send now down to 470k, but at least the splitters have fired up again.
Grant
Darwin NT
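For a rough sense of how fast the buffer was draining between the two observations in this post (550k at 10:20 UTC, 470k at the 10:34 UTC edit), here is a back-of-envelope sketch; the assumption of a linear drop is mine:

```python
# Ready-to-send buffer drain rate from the two figures quoted above:
# 550,000 results at 10:20 UTC and 470,000 at 10:34 UTC (assumed linear).
drop = 550_000 - 470_000          # results consumed between observations
minutes = 34 - 20                 # elapsed time between the two readings
rate = drop / minutes             # about 5,714 results per minute
hours_to_empty = 470_000 / rate / 60
print(f"~{rate:,.0f} results/min; empty in ~{hours_to_empty:.1f} hours at this rate")
```

At that pace the buffer would hit zero in well under two hours unless the splitters kept up.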
ID: 2023691
Profile Jimbocous Project Donor
Volunteer tester
Joined: 1 Apr 13
Posts: 1856
Credit: 268,616,081
RAC: 1,349
United States
Message 2023692 - Posted: 19 Dec 2019, 10:46:30 UTC

Looking good here, at this point.
ID: 2023692
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14679
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2023693 - Posted: 19 Dec 2019, 10:54:04 UTC - in response to Message 2023691.  

... and bingo. Post about the problem, and it gets sorted. Ready-to-send now down to 470k, but at least the splitters have fired up again.
I think it takes the splitters some time to get out of bed, after they receive the 'wake up' call. I have some sympathy with their point of view.
ID: 2023693
Profile Freewill Project Donor
Joined: 19 May 99
Posts: 766
Credit: 354,398,348
RAC: 11,693
United States
Message 2023696 - Posted: 19 Dec 2019, 11:33:58 UTC

Might be time to consider panic. It's 6:30 am local time, USA east coast. My top machine is down to 75% of its in-progress tasks and just got "no tasks available" from the servers. The server status page shows about 400,000 available.
ID: 2023696
juan BFP Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 2023700 - Posted: 19 Dec 2019, 12:11:37 UTC
Last modified: 19 Dec 2019, 13:00:09 UTC

Results out in the field	0	54,630	6,877,527	2m
Results received in last hour **	0	1,160	150,141	1h
Result turnaround time (last hour average) **	0.00 hours	38.91 hours	35.51 hours	1h
Results returned and awaiting validation	0	41,513	8,732,314	2m
Workunits waiting for validation	0	0	86,180	2m
Workunits waiting for assimilation	0	6	1,461,188	2m
Workunit files waiting for deletion	0	16,546	908,196	2m
Result files waiting for deletion	0	0	290	2m
Workunits waiting for db purging	0	6,832	2,586,456	2m
Results waiting for db purging	71	49,274	7,288,759	2m


With such high numbers, the servers are surely having a "hard time".

Maybe that proves we were right to keep the spoofed builds restricted to a small group, exactly to avoid this.

Even a small increase in the WU cache, multiplied by the huge number of active hosts, could push the servers to their limits.

Now just imagine if, instead of a 300 WU limit, you had 6,400 without control... a sure server crash.

Hope they can find a way to resolve this.

my 0.02

<edit>BTW, I tried to do my part by stopping my lone host from contacting the servers for the rest of the day, or until I see the issue solved.
Maybe some others with big crunchers and large caches could do the same to take some load off the servers at this time.
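The scaling worry above can be sketched with some back-of-envelope arithmetic; the 100,000 active-host figure is an illustrative assumption, not a project statistic — only the 100/300/6,400 per-host limits come from the thread:

```python
# Rough estimate of how many in-progress result rows the database must
# track under different per-host task limits. The host count is an
# assumption for illustration, not an official SETI@home figure.
ACTIVE_HOSTS = 100_000  # assumed number of hosts holding work

for per_host_limit in (100, 300, 6_400):
    rows = ACTIVE_HOSTS * per_host_limit
    print(f"limit {per_host_limit:>5}: up to {rows:,} results in the field")
```

Under these assumptions an uncontrolled 6,400-task limit implies over 20x the database load of the 300-task limit, which is the crash scenario described above.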
ID: 2023700
Profile Tom M
Volunteer tester
Joined: 28 Nov 02
Posts: 5126
Credit: 276,046,078
RAC: 462
Message 2023707 - Posted: 19 Dec 2019, 13:11:25 UTC

Got up this morning (Thursday) and I have no GPU tasks running. Just CPU tasks, and a couple hundred "waiting to report" in the queue.

I think we should put off panicking until about 10 am PST Friday.

It would be interesting to see an analysis of the system logs that shows exactly what is bottlenecking now, because all we really have are "black box" reports.

Just bumped the update manually and a bunch of GPU tasks started downloading. <Shrug> I dunno....

Tom
A proud member of the OFA (Old Farts Association).
ID: 2023707
juan BFP Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 2023708 - Posted: 19 Dec 2019, 13:35:25 UTC - in response to Message 2023707.  

Got up this morning (Thursday) and I have no GPU tasks running. Just CPU tasks, and a couple hundred "waiting to report" in the queue.

I think we should put off panicking until about 10 am PST Friday.

It would be interesting to see an analysis of the system logs that shows exactly what is bottlenecking now, because all we really have are "black box" reports.

Just bumped the update manually and a bunch of GPU tasks started downloading. <Shrug> I dunno....

Tom

That is exactly what happens when you have an unstable system: sometimes it works, sometimes it doesn't. I agree they need time to go through the logs and find the bottleneck. Hope they can find and solve the issue soon.
ID: 2023708
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2023712 - Posted: 19 Dec 2019, 14:32:25 UTC
Last modified: 19 Dec 2019, 14:43:02 UTC

All my caches are down and my hosts are being met with "no tasks available", with the RTS at 720K.

Something is still plugging up the servers.

Database can't handle any host lookup. Driving blind now.
Seti@Home classic workunits: 20,676 CPU time: 74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2023712
Profile Unixchick Project Donor
Joined: 5 Mar 12
Posts: 815
Credit: 2,361,516
RAC: 22
United States
Message 2023713 - Posted: 19 Dec 2019, 14:42:38 UTC

Plenty in the RTS
Results ready to send 0 3,418 765,855 5m
It just isn't doing much sending :-(

I've set myself to NNT again until things are better.
ID: 2023713
Profile Siran d'Vel'nahr
Volunteer tester
Joined: 23 May 99
Posts: 7379
Credit: 44,181,323
RAC: 238
United States
Message 2023715 - Posted: 19 Dec 2019, 15:19:26 UTC

Greetings,

A few minutes ago I checked BT and saw that I had a boat load of tasks to report. I left it as-is (I did not use my trigger finger). ;)

I came here and checked a few things out, then went into BOINC to check my Event Log. I saw that downloads were happening. I just got a boatload of replacement WUs to chew on. :) I don't know whether it was just a fluke that I got downloads or not. But it does seem that WUs are going out to those in need of them.

I believe they should further reduce the limit, or even go back to the 100-per-device limit. I never saw this much of a problem before, unless there was a server crash like the scheduler one a while back. All seemed to run reasonably smoothly when there were no problems server side and we had the 100 limit.

Have a great day! :)

Siran
CAPT Siran d'Vel'nahr - L L & P _\\//
Winders 11 OS? "What a piece of junk!" - L. Skywalker
"Logic is the cement of our civilization with which we ascend from chaos using reason as our guide." - T'Plana-hath
ID: 2023715
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 2023726 - Posted: 19 Dec 2019, 16:56:45 UTC

Seems we are slowly coming back to life.
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 2023726
JohnDK Crowdfunding Project Donor*Special Project $250 donor
Volunteer tester
Joined: 28 May 00
Posts: 1222
Credit: 451,243,443
RAC: 1,127
Denmark
Message 2023734 - Posted: 19 Dec 2019, 18:07:20 UTC

Yes, but with stuck downloads as a result...
ID: 2023734
Profile Freewill Project Donor
Joined: 19 May 99
Posts: 766
Credit: 354,398,348
RAC: 11,693
United States
Message 2023736 - Posted: 19 Dec 2019, 18:17:33 UTC

This is like playing whack-a-mole.
ID: 2023736
Boiler Paul
Joined: 4 May 00
Posts: 232
Credit: 4,965,771
RAC: 64
United States
Message 2023759 - Posted: 19 Dec 2019, 20:05:58 UTC

is it really down again?!?

12/19/2019 2:03:54 PM | SETI@home | Sending scheduler request: To fetch work.
12/19/2019 2:03:54 PM | SETI@home | Requesting new tasks for CPU
12/19/2019 2:03:55 PM | SETI@home | Scheduler request completed: got 0 new tasks
12/19/2019 2:03:55 PM | SETI@home | Project is temporarily shut down for maintenance
ID: 2023759
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 2023760 - Posted: 19 Dec 2019, 20:15:25 UTC - in response to Message 2023759.  

is it really down again?!?

12/19/2019 2:03:54 PM | SETI@home | Sending scheduler request: To fetch work.
12/19/2019 2:03:54 PM | SETI@home | Requesting new tasks for CPU
12/19/2019 2:03:55 PM | SETI@home | Scheduler request completed: got 0 new tasks
12/19/2019 2:03:55 PM | SETI@home | Project is temporarily shut down for maintenance


. . That's what the servers are telling us ...

Stephen

ID: 2023760
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14679
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2023761 - Posted: 19 Dec 2019, 20:15:29 UTC - in response to Message 2023759.  

Yup - the scheduler (and a lot of the workunit related daemons) are showing 'disabled' now, although - as you can see - the message boards are still up. Must be trying to sort something out in the data flow.
ID: 2023761
Boiler Paul
Joined: 4 May 00
Posts: 232
Credit: 4,965,771
RAC: 64
United States
Message 2023762 - Posted: 19 Dec 2019, 20:16:55 UTC

It is down for now. Lots of disabled servers. Ah well..... s--t happens.
ID: 2023762
juan BFP Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 2023763 - Posted: 19 Dec 2019, 20:17:36 UTC

OK, back to Suspend Network Activity until fixed. Again...
ID: 2023763


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.