The Server Issues / Outages Thread - Panic Mode On! (117)

Message boards : Number crunching : The Server Issues / Outages Thread - Panic Mode On! (117)
Message board moderation

To post messages, you must log in.

Previous · 1 . . . 10 · 11 · 12 · 13 · 14 · 15 · 16 . . . 52 · Next

AuthorMessage
Profile Siran d'Vel'nahr
Volunteer tester
Avatar

Send message
Joined: 23 May 99
Posts: 7379
Credit: 44,181,323
RAC: 238
United States
Message 2012088 - Posted: 15 Sep 2019, 19:12:57 UTC

Greetings,

Ok, now my other Linux PC is out of GPU WUs. I have 36 CPU WUs left on my Linux/Winders PC. My other Linux PC has twice that many. Minty-Winders will be out of work in the next few hours. :(

Have a great day, if'n you can! :)

Siran
CAPT Siran d'Vel'nahr - L L & P _\\//
Winders 11 OS? "What a piece of junk!" - L. Skywalker
"Logic is the cement of our civilization with which we ascend from chaos using reason as our guide." - T'Plana-hath
ID: 2012088 · Report as offensive
Cosmic_Ocean
Avatar

Send message
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 2012091 - Posted: 15 Sep 2019, 19:48:45 UTC

Haven't been here in a while. What's going on with Sundays now?

SSP looks like things should be running.. but I haven't had a good scheduler contact for 10 hours now. Uploads go through just fine though.

(times are in UTC -4)
2019-09-15 15:44:59 SETI@home update requested by user
2019-09-15 15:45:00 SETI@home [sched_op_debug] Fetching master file
2019-09-15 15:45:00 SETI@home Fetching scheduler list
2019-09-15 15:45:02 Project communication failed: attempting access to reference site
2019-09-15 15:45:02 SETI@home [sched_op_debug] Deferring communication for 1 days 0 hr 0 min 0 sec
2019-09-15 15:45:02 SETI@home [sched_op_debug] Reason: 5 consecutive failures fetching scheduler list
2019-09-15 15:45:03 Internet access OK - project servers may be temporarily down.


I've still got a good bit of cache left, but I don't know what's going on.
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)
ID: 2012091 · Report as offensive
Profile mohavewolfpup
Avatar

Send message
Joined: 20 Oct 18
Posts: 32
Credit: 3,666,574
RAC: 24
United States
Message 2012107 - Posted: 15 Sep 2019, 20:59:07 UTC - in response to Message 2012091.  

I've got lots of "Ready to report" sitting in queue currently and communication keeps getting put off. No reports or unit downloads, so something is up
Historian for the Defunct Riviera Hotel and Casino, Former Classic Seti@home user for Team Art Bell. Greetings from the High Desert!
ID: 2012107 · Report as offensive
Profile Siran d'Vel'nahr
Volunteer tester
Avatar

Send message
Joined: 23 May 99
Posts: 7379
Credit: 44,181,323
RAC: 238
United States
Message 2012112 - Posted: 15 Sep 2019, 21:40:42 UTC

Greetings,

@Richard: Somehow I don't believe the "goin' to the pub" routine is gonna work this time. ;) Oh well. lol :(

Have a great day! :)

Siran
CAPT Siran d'Vel'nahr - L L & P _\\//
Winders 11 OS? "What a piece of junk!" - L. Skywalker
"Logic is the cement of our civilization with which we ascend from chaos using reason as our guide." - T'Plana-hath
ID: 2012112 · Report as offensive
Profile betreger Project Donor
Avatar

Send message
Joined: 29 Jun 99
Posts: 11361
Credit: 29,581,041
RAC: 66
United States
Message 2012114 - Posted: 15 Sep 2019, 21:59:33 UTC

It is beginning to look like this is the new normal, practicing for the Tuesday outage.
ID: 2012114 · Report as offensive
Profile betreger Project Donor
Avatar

Send message
Joined: 29 Jun 99
Posts: 11361
Credit: 29,581,041
RAC: 66
United States
Message 2012116 - Posted: 15 Sep 2019, 22:31:23 UTC

Reporting finished work still doesn't work and there is very little we can do about it. All I can think of is set some hair an fire and howl at the moon.


ID: 2012116 · Report as offensive
Profile betreger Project Donor
Avatar

Send message
Joined: 29 Jun 99
Posts: 11361
Credit: 29,581,041
RAC: 66
United States
Message 2012119 - Posted: 15 Sep 2019, 22:51:47 UTC

It worked!
ID: 2012119 · Report as offensive
Profile HAL
Avatar

Send message
Joined: 18 May 99
Posts: 535
Credit: 8,246,955
RAC: 3
United States
Message 2012127 - Posted: 15 Sep 2019, 23:30:33 UTC

Coming back to life here. :-)
I'm putting myself to the fullest possible use, which is all, I think, that any conscious entity can ever hope to do.
ID: 2012127 · Report as offensive
Profile Jimbocous Project Donor
Volunteer tester
Avatar

Send message
Joined: 1 Apr 13
Posts: 1853
Credit: 268,616,081
RAC: 1,349
United States
Message 2012130 - Posted: 15 Sep 2019, 23:50:40 UTC

Staggering back to life, slowly.
Know the feeling ... :|
ID: 2012130 · Report as offensive
Profile Wiggo
Avatar

Send message
Joined: 24 Jan 00
Posts: 34744
Credit: 261,360,520
RAC: 489
Australia
Message 2012139 - Posted: 16 Sep 2019, 0:10:01 UTC

Caches here are now full and heat is being generated again (it was a cold morning here).

Cheers.
ID: 2012139 · Report as offensive
juan BFP Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 2012146 - Posted: 16 Sep 2019, 1:26:52 UTC - in response to Message 2012062.  

Oh dear. You're not saying I have to go and force down another Sunday roast, are you? The sacrifices I go through for this project...

Your technique works perfect. Just arrived from the bar, and the work is flowing normally.

I know how hard are the sacrifices you do for the project.
ID: 2012146 · Report as offensive
Cosmic_Ocean
Avatar

Send message
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 2012268 - Posted: 17 Sep 2019, 2:07:16 UTC - in response to Message 2012091.  

Haven't been here in a while. What's going on with Sundays now?

SSP looks like things should be running.. but I haven't had a good scheduler contact for 10 hours now. Uploads go through just fine though.

(times are in UTC -4)
2019-09-15 15:44:59 SETI@home update requested by user
2019-09-15 15:45:00 SETI@home [sched_op_debug] Fetching master file
2019-09-15 15:45:00 SETI@home Fetching scheduler list
2019-09-15 15:45:02 Project communication failed: attempting access to reference site
2019-09-15 15:45:02 SETI@home [sched_op_debug] Deferring communication for 1 days 0 hr 0 min 0 sec
2019-09-15 15:45:02 SETI@home [sched_op_debug] Reason: 5 consecutive failures fetching scheduler list
2019-09-15 15:45:03 Internet access OK - project servers may be temporarily down.


I've still got a good bit of cache left, but I don't know what's going on.

Hmm.. this is still going on for me. I'm not sure what the problem is now.

I've flushed the DNS cache, done a release/renew of DHCP. That machine can get to the internet just fine. But for some reason, it fails at fetching the scheduler list. Windows machine gets the scheduler just fine.

Might just be time for a restart.. it's only been up for 194 days.
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)
ID: 2012268 · Report as offensive
Profile betreger Project Donor
Avatar

Send message
Joined: 29 Jun 99
Posts: 11361
Credit: 29,581,041
RAC: 66
United States
Message 2012271 - Posted: 17 Sep 2019, 2:26:11 UTC - in response to Message 2012268.  

Hmm.. this is still going on for me. I'm not sure what the problem is now.

Obviously this is a training run for the Tuesday outage.
ID: 2012271 · Report as offensive
Cosmic_Ocean
Avatar

Send message
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 2012276 - Posted: 17 Sep 2019, 3:41:17 UTC
Last modified: 17 Sep 2019, 3:59:03 UTC

Alright, so I don't know what the problem is now. I really don't.

I've restarted, I've done all the updates that were available, i've restarted again.

I flushed DNS cache.

I even found some guide about disabling the default of using localhost as a DNS server to *force* it to use outside servers.. and after a restart.. it STILL defaults to using localhost for queries.

I think there's something flawed in how it is trying to look the scheduler address up and that's causing it to not be able to find it.

That's what I think is going on, is what this write-up explains: https://moss.sh/name-resolution-issue-systemd-resolved/

I was able to upload all the finished results, but I have a full cache of 100 to report, but the scheduler can't be reached for some reason.

All other websites work in a browser, or just pinging them in a terminal.

I miss when network config was so much simpler in older linux. When you could just put three DNS servers in /etc/resolv.conf and be done with it. Now there's 15 layers of complexity that all try to take control over everything. Urg.

I don't know what else to do now. Maybe there's just a bad DNS record or something? The default time-out on those is 7 days. Maybe I just have to let the clock run out on that.

edit: Okay, I did manage to disable the localhost being a server.. it is now actually obeying the addresses listed by DHCP. But it still isn't working. Can't fetch scheduler list.

edit2: so in linux, nslookup returns 208.68.240.126 for setiboinc.ssl.berkeley.edu, which is what is listed in client_state.xml for scheduler.

Can't ping in either linux or windows.. but windows machine connects just fine to it? Linux machine doesn't.

And then boinc2.ssl.berkeley.edu behaves the same way.

Can we not ping those anymore or something?

But I see the placeholder page for Apache if I go to http://setiboinc.ssl.berkeley.edu , so... the site exists.

Is that what is going on, or what?
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)
ID: 2012276 · Report as offensive
Cosmic_Ocean
Avatar

Send message
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 2012282 - Posted: 17 Sep 2019, 5:33:04 UTC

I think it's a SSL certificate problem.


That's what an 'update now' looks like in wireshark.

That's the only thing I can think of at this point is that it's a bad certificate. It looks like the server IS being contacted and responding, so it's not a network problem.
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)
ID: 2012282 · Report as offensive
Profile Unixchick Project Donor
Avatar

Send message
Joined: 5 Mar 12
Posts: 815
Credit: 2,361,516
RAC: 22
United States
Message 2012283 - Posted: 17 Sep 2019, 5:37:12 UTC

Cosmic Ocean...read this thread in the recent past...someone else just had this problem
ID: 2012283 · Report as offensive
Cosmic_Ocean
Avatar

Send message
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 2012286 - Posted: 17 Sep 2019, 7:13:14 UTC - in response to Message 2012283.  
Last modified: 17 Sep 2019, 7:46:02 UTC

Cosmic Ocean...read this thread in the recent past...someone else just had this problem

Yeah, I see that now. Looks like some older clients are starting to act up with expiring certificates.

I went and grabbed the current latest one from: http://curl.haxx.se/ and overwrote the old one that was in /etc/ssl/certs, but I'm still getting a/the same problem.

2019-09-17 02:47:49 SETI@home update requested by user
2019-09-17 02:47:54 SETI@home [sched_op_debug] Fetching master file
2019-09-17 02:47:54 [http_debug] HTTP_OP::init_get(): http://setiathome.berkeley.edu/
2019-09-17 02:47:54 SETI@home Fetching scheduler list
2019-09-17 02:47:54 [http_debug] [ID#1] Info: About to connect() to setiathome.berkeley.edu port 80 (#0)
2019-09-17 02:47:54 [http_debug] [ID#1] Info: Trying 208.68.240.110...
2019-09-17 02:47:54 [http_debug] [ID#1] Info: Connected to setiathome.berkeley.edu (208.68.240.110) port 80 (#0)
2019-09-17 02:47:54 [http_debug] [ID#1] Sent header to server: GET / HTTP/1.1

2019-09-17 02:47:54 [http_debug] [ID#1] Info: Connected to setiathome.berkeley.edu (208.68.240.110) port 443 (#0)
2019-09-17 02:47:54 [http_debug] [ID#1] Info: successfully set certificate verify locations:
2019-09-17 02:47:54 [http_debug] [ID#1] Info: CAfile: ca-bundle.crt
2019-09-17 02:47:54 [http_debug] [ID#1] Info: CApath: none
2019-09-17 02:47:54 [http_debug] [ID#1] Info: SSLv2, Client hello (1):
2019-09-17 02:47:54 [http_debug] [ID#1] Info: SSLv3, TLS handshake, Server hello (2):
2019-09-17 02:47:54 [http_debug] [ID#1] Info: SSLv3, TLS handshake, CERT (11):
2019-09-17 02:47:54 [http_debug] [ID#1] Info: SSLv3, TLS alert, Server hello (2):
2019-09-17 02:47:54 [http_debug] [ID#1] Info: error:0D0C50A1:asn1 encoding routines:ASN1_item_verify:unknown message digest algorithm
2019-09-17 02:47:54 [http_debug] HTTP error: SSL connect error


In the boinc data directory, ca-bundle.crt a symbolic link to /etc/ssl/certs/ca-certificates.crt. I changed the link and the CAfile listed in the output there changed to a non-existent path for where it would be if curl was being used.. so it DOES rely on it being in the data dir.

And as a test, I downloaded the latest BOINC for x64 linux and pulled ca-bundle.crt out of that, and it still does the same thing.

So now what?
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)
ID: 2012286 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2012287 - Posted: 17 Sep 2019, 7:15:45 UTC - in response to Message 2012283.  

Cosmic Ocean...read this thread in the recent past...someone else just had this problem

https://manpages.ubuntu.com/manpages/bionic/man8/update-ca-certificates.8.html
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2012287 · Report as offensive
Cosmic_Ocean
Avatar

Send message
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 2012289 - Posted: 17 Sep 2019, 7:45:28 UTC - in response to Message 2012287.  

Cosmic Ocean...read this thread in the recent past...someone else just had this problem

https://manpages.ubuntu.com/manpages/bionic/man8/update-ca-certificates.8.html

root@taurus:~# update-ca-certificates -v
Updating certificates in /etc/ssl/certs...
0 added, 0 removed; done.
Running hooks in /etc/ca-certificates/update.d...
done.
root@taurus:~#

Don't know what that did, if anything. I see the file size and modified date of /etc/ssl/certs/ca-certificates.crt changed, and I restarted boinc-client, and it still gives the same error: unknown message digest algorithm.

I don't know if it means anything, but in my googling, I found a boinc forum post from 2009 where someone was asking about this very thing for WCG, and one of the devs replied and said that the cert WCG relies-on is good until "2019".

Maybe we've reached the point in "2019" for that particular cert, perhaps? I know we're not WCG, but it seems to line up.

here's that particular thread: https://boinc.berkeley.edu/dev/forum_thread.php?id=4133
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)
ID: 2012289 · Report as offensive
Oddbjornik Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 15 May 99
Posts: 220
Credit: 349,610,548
RAC: 1,728
Norway
Message 2012290 - Posted: 17 Sep 2019, 7:45:30 UTC - in response to Message 2012286.  

So now what?
I had a problem very similar to this on my Mint hosts. That problem went away when I installed the libnsspem library.
sudo apt-get install libnsspem
As I recall it, the http_debug output in my case complained that it couldn't find libnsspem.so, so it may not have been exactly the same as your problem...
ID: 2012290 · Report as offensive
Previous · 1 . . . 10 · 11 · 12 · 13 · 14 · 15 · 16 . . . 52 · Next

Message boards : Number crunching : The Server Issues / Outages Thread - Panic Mode On! (117)


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.