Panic Mode On (106) Server Problems?

Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1868306 - Posted: 20 May 2017, 0:58:52 UTC - in response to Message 1868303.  

It would seem that the staff in the "centre" are not quite up to speed because, like Grant, my caches were COMPLETELY EMPTY. Zero tasks of any kind. Not for GPU nor CPU. Zilch. Every request met with "No tasks available". Whatever the cause it was NOT a shorty storm.

They were talking about the return times for completed work after they got the servers going again.
Grant
Darwin NT
ID: 1868306
Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1868308 - Posted: 20 May 2017, 1:02:20 UTC - in response to Message 1868280.  


As you may recall, I'm typically the guy saying that I'm experiencing none of these issues, and generally have no trouble getting the caches full.
Since I did the whole GuppiRescheduler/QOpt thing, I use Windoze' Task Scheduler to shut down BOINC periodically (either every 4 hrs or every 8 hrs, depending on the machine) to run GR.
Wonder if there's any correlation there? Might be worth a look-see, for anyone experiencing this. Just a thought.


Generally I just run the rescheduler in place and let it do its thing of stopping BOINC and continuing on. It's only one computer that doesn't like that if it has an Einstein task running. That will almost certainly cause a TDR fault in the video driver. That also means I lose the overclock on the video card driving the monitor. I've learned to fully exit BOINC before running the rescheduler on that machine if it has an Einstein task running. I never got around to running the Windows Task Scheduler or creating a PowerShell script to fully automate the process. I've been hands on so far and it has worked out for me.
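
For anyone who wants to automate the manual routine described above (fully exit BOINC, run the rescheduler, start BOINC again), here is a minimal Python sketch. `boinccmd --quit` is the standard BOINC command-line call that asks the running client to exit. Everything else is an assumption: the rescheduler path, the client path, and the settle delay are hypothetical placeholders, not values taken from this thread.

```python
import subprocess
import time
from typing import Callable, Sequence

# boinccmd ships with BOINC; --quit asks the running client to exit.
QUIT_CMD: Sequence[str] = ["boinccmd", "--quit"]
# Hypothetical path: the thread does not say where the rescheduler is installed.
RESCHEDULER_CMD: Sequence[str] = [r"C:\BOINC\tools\GuppiRescheduler.exe"]
# Assumed client location; adjust to the local install.
START_CMD: Sequence[str] = [r"C:\Program Files\BOINC\boinc.exe"]


def reschedule(
    run: Callable = subprocess.run,
    start: Callable = subprocess.Popen,
    sleep: Callable[[float], None] = time.sleep,
    settle_seconds: float = 10.0,
) -> None:
    """Fully exit BOINC, run the rescheduler, then start BOINC again."""
    # Ask the client to exit so no science app is holding the GPU.
    run(QUIT_CMD, check=True)
    # Give running tasks a moment to checkpoint and exit (assumed delay).
    sleep(settle_seconds)
    try:
        # Run the rescheduler and wait for it to finish.
        run(RESCHEDULER_CMD, check=True)
    finally:
        # Start the client again even if the rescheduler failed.
        # Popen does not wait, so this script can exit while BOINC keeps running.
        start(START_CMD)
```

Calling `reschedule()` from a script that Windows Task Scheduler runs every 4 or 8 hours would approximate the setup quoted at the top of this post. The `run`, `start`, and `sleep` parameters exist only so the sequence can be dry-run with stand-in functions before pointing it at a real machine.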

I can't say that I can correlate fully exiting BOINC with that computer starting to get tasks again after any prolonged period of receiving "no tasks available" messages. The preference flip has almost always worked for me. In fact, today I think was the first time I've experienced any ineffectiveness with that method.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1868308
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1868314 - Posted: 20 May 2017, 1:12:41 UTC - in response to Message 1868306.  


They were talking about the return times for completed work after they got the servers going again.


. . OK, my bad ...

. . Put it down to frustration ......... :(

Stephen

:(
ID: 1868314
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1868340 - Posted: 20 May 2017, 5:25:09 UTC
Last modified: 20 May 2017, 5:40:29 UTC

OK, the web site is back, but I'm still getting "Project has no tasks available" on my work requests. That's when it doesn't result in a Scheduler error.

EDIT- changed application preferences & down the work came.
Since installing the AP application this had been a very minor issue & didn't occur too often, or very severely. Now it's becoming as bad as it was before installing AP.
Grant
Darwin NT
ID: 1868340
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1868358 - Posted: 20 May 2017, 8:37:13 UTC
Last modified: 20 May 2017, 8:40:43 UTC

"Couldn't connect to server" on one system, "HTTP service unavailable" then "Couldn't connect to server" on the other for the last 2 Scheduler requests.
Third time lucky?

EDIT-
Third time lucky.
Grant
Darwin NT
ID: 1868358
kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1868670 - Posted: 21 May 2017, 20:25:11 UTC

The kittyman is alive and 'well' in Kittyland, Wisconsin, US of A.
Going through a few changes and just kinda layin' low and chillin' with the kitties for a bit.
Thanks to those who noticed.
If you are inclined, you may read a little more in my kittyman thread in the cafe.

Meowfornow.
Meow.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1868670
kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1868683 - Posted: 21 May 2017, 21:24:02 UTC
Last modified: 21 May 2017, 21:25:37 UTC

Anybody else notice that the Seti server timebase is almost 2 minutes ahead of UTC?
Not the one used for the timestamps on these forum posts, but when I look at 'all tasks' and the time sent.............almost 2 minutes fast.
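
A quick way to put a number on an offset like this is to subtract a trusted UTC reading from the timestamp the server reports. The Python sketch below assumes the 'time sent' value uses the same text format as the forum timestamps in this thread; the two example timestamps in the usage note are made up for illustration and are not measurements.

```python
from datetime import datetime, timedelta, timezone

# Format of the timestamps shown in this thread, e.g. "21 May 2017, 21:24:02 UTC".
# Assumption: the 'time sent' column on the task pages uses the same format.
STAMP_FORMAT = "%d %b %Y, %H:%M:%S UTC"


def parse_utc(stamp: str) -> datetime:
    """Parse a forum-style timestamp into a timezone-aware UTC datetime."""
    return datetime.strptime(stamp, STAMP_FORMAT).replace(tzinfo=timezone.utc)


def clock_offset(server_stamp: str, reference_stamp: str) -> timedelta:
    """Return how far the server clock is ahead of the reference clock.

    A positive result means the server is running fast.
    """
    return parse_utc(server_stamp) - parse_utc(reference_stamp)
```

For example, `clock_offset("21 May 2017, 21:25:55 UTC", "21 May 2017, 21:24:02 UTC")` gives a `timedelta` of 113 seconds, which would count as "almost 2 minutes fast". The reference reading needs to come from a clock that is itself synced to UTC, or the result only shows the difference between two unknowns.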
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1868683
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1868701 - Posted: 21 May 2017, 22:30:17 UTC - in response to Message 1868670.  

. . Well meow and brrrp!

Stephen

..
ID: 1868701
EdwardPF
Volunteer tester
Joined: 26 Jul 99
Posts: 389
Credit: 236,772,605
RAC: 374
United States
Message 1868718 - Posted: 22 May 2017, 1:29:12 UTC - in response to Message 1868256.  

I also note there are four BLC splitter jobs that haven't progressed in several hours,

More like a couple of weeks.


blc02_2bit_guppi_57835_15340_HIP48113_0051	52.39 GB	 (66) 	
blc02_2bit_guppi_57835_15675_HIP49197_0052	52.39 GB	 (40) 	
blc02_2bit_guppi_57835_16015_HIP48183_0053	52.39 GB	 (20) 	
blc02_2bit_guppi_57835_16355_HIP49197_0054	52.39 GB	 (3) 


bump
ID: 1868718
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1868740 - Posted: 22 May 2017, 4:38:13 UTC - in response to Message 1868718.  

I also note there are four BLC splitter jobs that haven't progressed in several hours,

More like a couple of weeks.


blc02_2bit_guppi_57835_15340_HIP48113_0051	52.39 GB	 (66) 	
blc02_2bit_guppi_57835_15675_HIP49197_0052	52.39 GB	 (40) 	
blc02_2bit_guppi_57835_16015_HIP48183_0053	52.39 GB	 (20) 	
blc02_2bit_guppi_57835_16355_HIP49197_0054	52.39 GB	 (3) 


bump

I think it might be a case of processing order for the splitters. Those files that have been sitting there for a while now partly processed got dropped when new files were loaded; for some reason the splitters decided the newer files need processing first. And for whatever reason all the later files since then have been the ones chosen to split, hence the partially split files sitting up the top by themselves.
Grant
Darwin NT
ID: 1868740
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1868751 - Posted: 22 May 2017, 8:23:12 UTC

Well, that was interesting.
The Seti web site went AWOL, so did the data servers. BOINC went AWOL, as did the Berkeley IST pages & web site, as did Berkeley.edu itself.
Grant
Darwin NT
ID: 1868751
Brent Norman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1868753 - Posted: 22 May 2017, 8:42:38 UTC - in response to Message 1868751.  

Yea, we need a "Berkeley is Down" café when The Seti is Down Café is unreachable,
LOL
ID: 1868753
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1868755 - Posted: 22 May 2017, 8:56:35 UTC - in response to Message 1868753.  
Last modified: 22 May 2017, 9:07:45 UTC

Yea, we need a "Berkeley is Down" café when The Seti is Down Café is unreachable,
LOL


. . There is a "SETI is down Cafe"? You learn something every day :)

Stephen

:)

[edit] I had a look and I don't think I will visit there very much, the current topic is rather sad.
ID: 1868755
Brent Norman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1868757 - Posted: 22 May 2017, 9:14:40 UTC - in response to Message 1868755.  

When seti is down it becomes active - usually.
ID: 1868757
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1868758 - Posted: 22 May 2017, 9:16:17 UTC - in response to Message 1868757.  
Last modified: 22 May 2017, 9:16:33 UTC

When seti is down it becomes active - usually.

...as usually it's only Seti that is down, not everything like this last (luckily short) outage.
Grant
Darwin NT
ID: 1868758
Wiggo
Joined: 24 Jan 00
Posts: 34744
Credit: 261,360,520
RAC: 489
Australia
Message 1868768 - Posted: 22 May 2017, 10:23:08 UTC

Well, I wonder if that server MIA put any more completed w/u's into pending waiting mode.

I did a check of my pendings on 1 rig and found 8 batches (of varying sizes) in that state, though I'll have near on 100 of them clearing in the next 18hrs.

Cheers.
ID: 1868768
Gary Charpentier Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 25 Dec 00
Posts: 30648
Credit: 53,134,872
RAC: 32
United States
Message 1868835 - Posted: 22 May 2017, 19:35:54 UTC - in response to Message 1868751.  

Well, that was interesting.
The Seti web site went AWOL, so did the data servers. BOINC went AWOL, as did the Berkeley IST pages & web site, as did Berkeley.edu itself.

I do see that the Earl Warren Data Center load balancers are due for some work. Someone might have been poking around making sure that they have a written config to fall back on and, while in there, froze a box by accident. Anyway, if it all crashes on Thursday, we have had a warning.
http://systemstatus.berkeley.edu/
ID: 1868835
Brent Norman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1868945 - Posted: 23 May 2017, 12:09:16 UTC
Last modified: 23 May 2017, 12:09:32 UTC

Time to see if I have enough tasks to make it this week - see you in 13 hours.
LOL
ID: 1868945
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1868963 - Posted: 23 May 2017, 23:45:39 UTC - in response to Message 1868945.  

Time to see if I have enough tasks to make it this week - see you in 13 hours.
LOL



. . . And ... we're back!!

. . Less than 12 hours this week, I am amazed :)

Stephen

:)
ID: 1868963
Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1868967 - Posted: 23 May 2017, 23:57:52 UTC - in response to Message 1868963.  

Yes, amazed too. Caught me off guard. It looks like I finally banked enough tasks to get through the outage, and didn't make any ghosts. I barely squeaked through with the Ryzen system for CPU tasks. Probably wouldn't have made it through the typical 13 hour long outages of late.
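
Whether a cache will outlast an outage is simple arithmetic: cached tasks times average run time, divided by how many tasks run at once. The Python sketch below shows that estimate. All figures in the usage note are hypothetical and are not taken from any machine in this thread.

```python
import math


def hours_of_work(cached_tasks: int, avg_task_minutes: float, concurrent_tasks: int) -> float:
    """Estimate how many hours a cache keeps a device busy."""
    if concurrent_tasks <= 0:
        raise ValueError("concurrent_tasks must be positive")
    return cached_tasks * avg_task_minutes / concurrent_tasks / 60.0


def tasks_needed(outage_hours: float, avg_task_minutes: float, concurrent_tasks: int) -> int:
    """Estimate the minimum cache size needed to ride out an outage."""
    if avg_task_minutes <= 0:
        raise ValueError("avg_task_minutes must be positive")
    return math.ceil(outage_hours * 60.0 * concurrent_tasks / avg_task_minutes)
```

With made-up numbers: 100 cached CPU tasks at 60 minutes each, running 8 at a time, give `hours_of_work(100, 60, 8)` of 12.5 hours, while `tasks_needed(13, 60, 8)` says a 13 hour outage would need 104 tasks. Real run times vary a lot by task type, so treat the result as a rough guide only.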
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1868967


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.