The Server Issues / Outages Thread - Panic Mode On! (118)

Ville Saari
Joined: 30 Nov 00
Posts: 1158
Credit: 49,177,052
RAC: 82,530
Finland
Message 2031555 - Posted: 9 Feb 2020, 10:41:28 UTC - in response to Message 2031554.  

Sunday, and the replica is falling behind again.
As of now, 3,176 seconds (53 minutes) behind, and that number is getting bigger fast.
(no wonder it looks as if I had not returned anything for a while, when looking at my task list)
This has happened on many weekends, at approximately the same time as the total outages we experienced on many consecutive Sundays last September and October. I wonder if the causes are related.
ID: 2031555 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13904
Credit: 208,696,464
RAC: 304
Australia
Message 2031755 - Posted: 10 Feb 2020, 6:09:50 UTC

The Replica has had its break & is now catching up again.
Grant
Darwin NT
ID: 2031755 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13904
Credit: 208,696,464
RAC: 304
Australia
Message 2031766 - Posted: 10 Feb 2020, 9:03:04 UTC

A few noisy WUs in the current Arecibo group, and more than the usual number of uploads timing out instantly.
Grant
Darwin NT
ID: 2031766 · Report as offensive
Ville Saari
Joined: 30 Nov 00
Posts: 1158
Credit: 49,177,052
RAC: 82,530
Finland
Message 2031817 - Posted: 10 Feb 2020, 16:21:19 UTC

The assimilation queue is still going down at a steady rate. If the same rate continues, the backlog will be gone in about 8 days.
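
For anyone wanting to redo that estimate as the numbers change, here is a minimal sketch of the straight-line extrapolation. The function name and the sample figures are hypothetical (the post does not give the actual queue sizes), and it assumes the queue keeps shrinking at a constant rate.

```python
def days_until_clear(earlier_size: int, later_size: int, days_between: float) -> float:
    """Extrapolate linearly from two queue-size samples to an empty queue.

    Returns the number of days after the later sample, assuming the
    drain rate stays constant.
    """
    drained_per_day = (earlier_size - later_size) / days_between
    if drained_per_day <= 0:
        raise ValueError("queue is not shrinking, so it never clears at this rate")
    return later_size / drained_per_day


# Hypothetical samples, NOT taken from the server status page:
# 3,000,000 workunits two days ago, 2,400,000 now.
estimate = days_until_clear(3_000_000, 2_400_000, days_between=2)
print(f"Backlog clears in about {estimate:.0f} days")
```

Any new noise-bomb data or an outage would change the rate, so the result is only as good as the constant-rate assumption.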
ID: 2031817 · Report as offensive
AllgoodGuy

Joined: 29 May 01
Posts: 293
Credit: 16,348,499
RAC: 266
United States
Message 2031876 - Posted: 10 Feb 2020, 21:07:00 UTC
Last modified: 10 Feb 2020, 21:14:11 UTC

Did anyone get anything other than noise bombs for ap_29ja16ad? Must have been something going on that day.

Edit: only thing I see on any cosmic calendars is the moon being at its apogee.
ID: 2031876 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2031888 - Posted: 10 Feb 2020, 22:04:51 UTC - in response to Message 2031876.  

Did anyone get anything other than noise bombs for ap_29ja16ad? Must have been something going on that day.

Edit: only thing I see on any cosmic calendars is the moon being at its apogee.

Haven't done any of them yet. Still in progress.

It could be caused by any number of things. The radar could have been on. Terrestrial interference. Any number of transmitting satellites could have passed by in the aperture capture window. etc. etc.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2031888 · Report as offensive
AllgoodGuy

Joined: 29 May 01
Posts: 293
Credit: 16,348,499
RAC: 266
United States
Message 2031896 - Posted: 10 Feb 2020, 22:35:51 UTC - in response to Message 2031888.  

It could be caused by any number of things. The radar could have been on. Terrestrial interference. Any number of transmitting satellites could have passed by in the aperture capture window. etc. etc.


Thanks, just unusual to get an sse, sse2, and 4 OpenCL_ati_mac WUs, just to have them all bomb out.
ID: 2031896 · Report as offensive
Profile Wiggo
Joined: 24 Jan 00
Posts: 37608
Credit: 261,360,520
RAC: 489
Australia
Message 2031899 - Posted: 10 Feb 2020, 22:44:41 UTC - in response to Message 2031896.  

It could be caused by any number of things. The radar could have been on. Terrestrial interference. Any number of transmitting satellites could have passed by in the aperture capture window. etc. etc.
Thanks, just unusual to get an sse, sse2, and 4 OpenCL_ati_mac WUs, just to have them all bomb out.
It's nothing uncommon when the tasks are 100% blanked.

Cheers.
ID: 2031899 · Report as offensive
Ville Saari
Joined: 30 Nov 00
Posts: 1158
Credit: 49,177,052
RAC: 82,530
Finland
Message 2031938 - Posted: 11 Feb 2020, 4:18:32 UTC - in response to Message 2031888.  

It could be caused by any number of things. The radar could have been on. Terrestrial interference. Any number of transmitting satellites could have passed by in the aperture capture window. etc. etc.
This will become more and more common over time as Elon Musk and his competitors are spamming the low Earth orbit with thousands of internet satellites.
ID: 2031938 · Report as offensive
W-K 666 Project Donor
Volunteer tester

Joined: 18 May 99
Posts: 19553
Credit: 40,757,560
RAC: 67
United Kingdom
Message 2031951 - Posted: 11 Feb 2020, 7:48:15 UTC

Milestone As of 11 Feb 2020, 7:30:05 UTC

"Results returned and awaiting validation" is below 10,000,000.
Exact figure: 9,990,238.
ID: 2031951 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13904
Credit: 208,696,464
RAC: 304
Australia
Message 2031952 - Posted: 11 Feb 2020, 7:53:25 UTC - in response to Message 2031951.  
Last modified: 11 Feb 2020, 7:56:11 UTC

Milestone As of 11 Feb 2020, 7:30:05 UTC

"Results returned and awaiting validation" is below 10,000,000.
exact figure 9,990,238.
Only another 5.2 million to go (and another 2.41 million to go to clear the Assimilation backlog).
Grant
Darwin NT
ID: 2031952 · Report as offensive
Ville Saari
Joined: 30 Nov 00
Posts: 1158
Credit: 49,177,052
RAC: 82,530
Finland
Message 2031970 - Posted: 11 Feb 2020, 13:26:03 UTC - in response to Message 2031952.  

Only another 5.2 million to go (and another 2.41 million to go to clear the Assimilation backlog).
Those are essentially the same thing. Each workunit in the assimilation queue is preventing, on average, about 2.2 results from transitioning to the 'waiting for db purging' state: 5.2 / 2.41 ≈ 2.16.
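
The ratio can be checked in a couple of lines. This is only a sketch of the arithmetic using the two figures quoted in the thread; the variable and function names are illustrative.

```python
# Figures quoted in the thread, in millions.
results_to_clear = 5.2          # results awaiting validation still to go
workunits_to_assimilate = 2.41  # workunits in the assimilation backlog


def results_per_workunit(results_millions: float, workunits_millions: float) -> float:
    """Average number of results held back by each queued workunit."""
    return results_millions / workunits_millions


ratio = results_per_workunit(results_to_clear, workunits_to_assimilate)
print(f"{ratio:.2f} results per workunit")
```

A value a little above 2 fits workunits that are mostly sent out twice, with some extra results from resends.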
ID: 2031970 · Report as offensive
Profile Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 2031971 - Posted: 11 Feb 2020, 22:00:49 UTC

and we are back....
ID: 2031971 · Report as offensive
AllgoodGuy

Joined: 29 May 01
Posts: 293
Credit: 16,348,499
RAC: 266
United States
Message 2031985 - Posted: 11 Feb 2020, 23:12:17 UTC

That didn't take too long today.
ID: 2031985 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2031987 - Posted: 11 Feb 2020, 23:27:15 UTC - in response to Message 2031985.  

That didn't take too long today.

About double what it should be, going by what it was in the past.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2031987 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13904
Credit: 208,696,464
RAC: 304
Australia
Message 2032038 - Posted: 12 Feb 2020, 4:05:25 UTC - in response to Message 2031987.  
Last modified: 12 Feb 2020, 4:08:49 UTC

That didn't take too long today.
About double what it should be from what it was in the past.
But still way better than it has been recently.
Got home to find one system with a full cache, and the other system with a bunch of downloads in extended backoff mode, but one Retry & everything came down OK. Since then the Scheduler has been dishing out work, and it hasn't taken any effort on my part to download it. First time for several weeks.


Edit- although it looks like we are about to run out of work; the splitters are still having issues getting going again after an outage.
And one of my systems has a tonne of Shorties in its cache- that's not going to help the Validation & Assimilation backlogs clear.
Grant
Darwin NT
ID: 2032038 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2032040 - Posted: 12 Feb 2020, 4:16:25 UTC - in response to Message 2032038.  

True, better than the past couple of weeks, but nowhere near the Tuesday outage boilerplate of a 4-5 hour outage for database backup.

I still have one system that stubbornly refuses to get any CPU work even though it requests it, and only gets CPU tasks once the GPU cache is filled. Infuriating when the CPUs provide a good portion of the house heating during winter.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2032040 · Report as offensive
Profile Jimbocous Project Donor
Volunteer tester
Joined: 1 Apr 13
Posts: 1859
Credit: 268,616,081
RAC: 1,349
United States
Message 2032048 - Posted: 12 Feb 2020, 6:08:21 UTC

Best recovery I've seen in quite a while.
Hopefully this means they've gotten a handle on the issues.
Perfect? No, but I can live with this ...
ID: 2032048 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13904
Credit: 208,696,464
RAC: 304
Australia
Message 2032051 - Posted: 12 Feb 2020, 6:47:12 UTC - in response to Message 2032048.  

Best recovery I've seen in quite a while.
Hopefully this means they've gotten a handle on the issues.
It's just a case of finally no more BLC35 files being split & putting out pretty much nothing but noise bombs. The fact is the backlogs from that (and the added replication for the RX5000 series issues) are still to be cleared; luckily they're presently low enough not to cause everything to fall over or come to a grinding halt.
Grant
Darwin NT
ID: 2032051 · Report as offensive
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 2032052 - Posted: 12 Feb 2020, 6:54:05 UTC - in response to Message 2032048.  

Best recovery I've seen in quite a while.
Hopefully this means they've gotten a handle on the issues.
Perfect? No, but I can live with this ...


. . Hi Jimbo,

. . Less than 9 hours is a pleasant change from recent outages.

. . But the recovery has been very smooth with only a few niggling 'http internal error' messages.

Stephen

:)
ID: 2032052 · Report as offensive


 