Panic Mode On (111) Server Problems?

Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1929210 - Posted: 10 Apr 2018, 7:55:45 UTC - in response to Message 1929209.  

The host needs reboot or maintenance.

Internal self-checking within Raistmer's SETI application has spotted something odd. Sometimes that check is a little too twitchy, and it goes through at the next attempt with no problems. But it can also mean that your hardware is running on the ragged margin, probably because of over-heating or over-clocking. The first maintenance task would be blowing the dust bunnies out of the cooling fans - take it from there.
ID: 1929210 · Report as offensive
Profile Stargate (SA)
Volunteer tester
Joined: 4 Mar 10
Posts: 1854
Credit: 2,258,721
RAC: 0
Australia
Message 1929211 - Posted: 10 Apr 2018, 8:04:49 UTC

Thanks Richard
ID: 1929211 · Report as offensive
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1929213 - Posted: 10 Apr 2018, 8:26:58 UTC - in response to Message 1929195.  

I think he pretty much gave up after the download error and just decided to prepare for tomorrow and go to bed. Sounds good to me.
Most of my new CPU tasks are Arecibo VLARs, and that means there are quite a few of them around. We could be VLARed at any time.

I've been getting a few stuck downloads. If a download gets stuck while I'm sleeping, I could wake up to an empty cache and a down project :-(

There seems to be another one in progress:
6408360733 	8028144 	15 Feb 2018, 19:49:52 UTC 	10 Apr 2018, 0:49:34 UTC 	Timed out - no response 	0.00 	0.00 	--- 	SETI@home v8 v8.22 (opencl_nvidia_SoG) windows_intelx86
6408360734 	8230200 	15 Feb 2018, 19:49:44 UTC 	15 Feb 2018, 22:11:54 UTC 	Completed, waiting for validation 	646.30 	628.58 	pending 	SETI@home v8 v8.22 (opencl_nvidia_SoG) windows_intelx86
6553586461 	8061725 	10 Apr 2018, 3:19:07 UTC 	10 Apr 2018, 4:19:59 UTC 	Error while downloading 	0.00 	0.00 	--- 	SETI@home v8 v8.22 (opencl_nvidia_SoG) windows_intelx86
6554106299 	6796479 	10 Apr 2018, 7:12:24 UTC 	10 Apr 2018, 7:17:39 UTC 	Error while downloading 	0.00 	0.00 	--- 	SETI@home v8 Anonymous platform (NVIDIA GPU)
6554499124 	--- 	--- 	--- 	Unsent 	--- 	--- 	--- 	---
Maybe more of these than expected.... Here in Feb, gone by Apr.
I'm going to bed.
ID: 1929213 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1929215 - Posted: 10 Apr 2018, 8:55:23 UTC - in response to Message 1929213.  

'Getting stuck' is one thing. It will be retried (admittedly at longer and longer intervals) until it eventually gets through. It's very unusual these days for SETI to have single-task comms failures: sometimes we lose an entire download server, but that affects all users, all tasks.

The tasks you linked are different.

WU download error: couldn't get input files:
<file_xfer_error>
  <file_name>blc11_2bit_guppi_58137_40453_HIP63529_0040.7731.409.22.45.231.vlar</file_name>
  <error_code>-224 (permanent HTTP error)</error_code>
  <error_message>permanent HTTP error</error_message>
</file_xfer_error>
The data file is MIA - it has gone for ever. That task will be abandoned immediately, and a replacement requested at the next work fetch. No time will be lost, and nothing will be 'stuck'.
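
For anyone who wants to sort these out of a log automatically, here is a minimal Python sketch that reads the error block quoted above and separates a permanent failure from one that would be retried. The function names and the rule "the message contains the word permanent" are illustrative assumptions, not how the BOINC client itself decides.

```python
import re
import xml.etree.ElementTree as ET

# Error block copied from the post above.
ERROR_XML = """<file_xfer_error>
  <file_name>blc11_2bit_guppi_58137_40453_HIP63529_0040.7731.409.22.45.231.vlar</file_name>
  <error_code>-224 (permanent HTTP error)</error_code>
  <error_message>permanent HTTP error</error_message>
</file_xfer_error>"""


def parse_xfer_error(xml_text):
    """Return (file_name, numeric_code, message) from a <file_xfer_error> block."""
    root = ET.fromstring(xml_text)
    file_name = root.findtext("file_name", default="").strip()
    code_text = root.findtext("error_code", default="")
    message = root.findtext("error_message", default="").strip()
    # The code field mixes a number and a description, e.g. "-224 (permanent HTTP error)".
    match = re.match(r"\s*(-?\d+)", code_text)
    code = int(match.group(1)) if match else None
    return file_name, code, message


def is_permanent(message):
    """Hypothetical rule: unrecoverable if the message says 'permanent'."""
    return "permanent" in message.lower()


file_name, code, message = parse_xfer_error(ERROR_XML)
status = "abandoned (permanent)" if is_permanent(message) else "will be retried"
print(file_name, code, status)
```

A message such as 'transient http error' would fall into the "will be retried" branch under the same assumed rule.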
ID: 1929215 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1929216 - Posted: 10 Apr 2018, 9:29:17 UTC - in response to Message 1929194.  

I suggest that Keith is not telling the WHOLE truth with his complaint - as I type he has the following numbers of tasks in progress:

All tasks for computer 5741129 = In progress (1000) (expected = 400)
All tasks for computer 8306366 = In progress (1381) (expected = 400)
All tasks for computer 8480062 = In progress (1318) (expected = 500)
All tasks for computer 6279633 = In progress (1020) (expected = 400)
All tasks for computer 8030022 = In progress (1014) (expected = 400)

Bunkering and rescheduling can have some very strange effects on the servers and you appear to have triggered one of them - work outside the rules and expect to get bitten.
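
The size of the overshoot in the list above can be worked out with a short Python sketch. The host IDs, in-progress counts and expected limits are copied from the quoted list; the function name `over_limit` is made up for illustration.

```python
# (host id, tasks in progress, expected limit), copied from the list above.
HOSTS = [
    (5741129, 1000, 400),
    (8306366, 1381, 400),
    (8480062, 1318, 500),
    (6279633, 1020, 400),
    (8030022, 1014, 400),
]


def over_limit(hosts):
    """Return (host_id, excess) for every host holding more tasks than expected."""
    return [
        (host, in_progress - expected)
        for host, in_progress, expected in hosts
        if in_progress > expected
    ]


for host, excess in over_limit(HOSTS):
    print(f"computer {host}: {excess} tasks over the expected limit")

total_excess = sum(excess for _, excess in over_limit(HOSTS))
print(f"total excess across all hosts: {total_excess}")
```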
Thanks Rob - that's very illuminating.

I gave my test machine a rest overnight, but set it back up to fetch 200 this morning - and there it's stayed. My fears about a coding error have subsided. So, if Keith is trying to game the system, and failing, I'm going to put it down to an EBCAK and relax.

If Keith wants to institute a policy change at SETI, he can ask Eric direct. But judging by how busy Eric (SETIguy) was on his other machine during a short but very intense video conference call last night (#2447), I don't think it'll be high up his priority list.
ID: 1929216 · Report as offensive
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1929217 - Posted: 10 Apr 2018, 9:29:31 UTC - in response to Message 1929215.  

Yes Richard, I'm getting Stuck downloads as well.
Lost Tasks
Stuck Downloads
Reported tasks Not being replaced
I'm seeing ALL of the above. Quite a Cluster I'd say.
Hopefully it will be somewhat fixed by around 2200 EDT. See you then.

BTW, that's my SECOND Lost task, I wonder how many more are lurking out there.
ID: 1929217 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1929218 - Posted: 10 Apr 2018, 9:38:09 UTC - in response to Message 1929217.  

Stuck Downloads
I thought you were "sitting on top of a major FiOS cable" (message 1928603). What does <http_debug> say? If there's a 'transient http error', what's the logged cause?
ID: 1929218 · Report as offensive
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1929224 - Posted: 10 Apr 2018, 12:11:53 UTC - in response to Message 1929185.  

I'm set for 1.0 day + 0.01 day additional cache limit. No, I am nowhere near my cache's limit.

Just picked up my first task error since I reset the project back in March


. . I very rarely get errors and almost never on Donkey (C2D) but I have had two today, one compute error and one download error.

Stephen

?
ID: 1929224 · Report as offensive
Profile Mr. Kevvy Crowdfunding Project Donor*Special Project $250 donor
Volunteer moderator
Volunteer tester
Joined: 15 May 99
Posts: 3776
Credit: 1,114,826,392
RAC: 3,319
Canada
Message 1929242 - Posted: 10 Apr 2018, 13:53:29 UTC

Almost 10h00 EDT and no outrage yet?


ID: 1929242 · Report as offensive
juan BFP Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1929249 - Posted: 10 Apr 2018, 14:21:25 UTC - in response to Message 1929242.  

Almost 10h00 EDT and no outrage yet?



The evil elephant hid the Illudium Pu-36 Explosive Space Modulator...
ID: 1929249 · Report as offensive
kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1929253 - Posted: 10 Apr 2018, 14:52:54 UTC

Perhaps the outage procedures or routines are needing some adjustments due to recent reconfigurations?
Or perhaps this is the 'new outage-less Seti'? Oh, yeah.....LOL.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1929253 · Report as offensive
kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1929258 - Posted: 10 Apr 2018, 15:07:58 UTC - in response to Message 1929257.  

Perhaps the outage procedures or routines are needing some adjustments due to recent reconfigurations?
Or perhaps this is the 'new outage-less Seti'? Oh, yeah.....LOL.

They need a long and hard outage-free burn-in week, after last week's database reorg. :-)

Well, that would be the cat's meow! Not that I am anticipating such a thing.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1929258 · Report as offensive
Profile Mr. Kevvy Crowdfunding Project Donor*Special Project $250 donor
Volunteer moderator
Volunteer tester
Joined: 15 May 99
Posts: 3776
Credit: 1,114,826,392
RAC: 3,319
Canada
Message 1929266 - Posted: 10 Apr 2018, 20:33:51 UTC

Appears we had the shortest Kaboom in a long time... back to that original "four hour" it started as. Nice work on that optimization. :^)
ID: 1929266 · Report as offensive
juan BFP Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1929267 - Posted: 10 Apr 2018, 20:34:04 UTC

Flash Outage?
ID: 1929267 · Report as offensive
JohnDK Crowdfunding Project Donor*Special Project $250 donor
Volunteer tester
Joined: 28 May 00
Posts: 1222
Credit: 451,243,443
RAC: 1,127
Denmark
Message 1929269 - Posted: 10 Apr 2018, 20:47:21 UTC

Hopefully a new norm.
ID: 1929269 · Report as offensive
Sirius B Project Donor
Volunteer tester
Joined: 26 Dec 00
Posts: 24879
Credit: 3,081,182
RAC: 7
Ireland
Message 1929273 - Posted: 10 Apr 2018, 21:22:33 UTC

10/04/2018 22:21:00 | SETI@home | Sending scheduler request: To fetch work.
10/04/2018 22:21:00 | SETI@home | Reporting 36 completed tasks
10/04/2018 22:21:00 | SETI@home | Requesting new tasks for CPU
10/04/2018 22:21:03 | SETI@home | Scheduler request completed: got 0 new tasks
10/04/2018 22:21:03 | SETI@home | Project has no tasks available

Oops.
ID: 1929273 · Report as offensive
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1929275 - Posted: 10 Apr 2018, 21:30:37 UTC - in response to Message 1929266.  

Appears we had the shortest Kaboom in a long time... back to that original "four hour" it started as. Nice work on that optimization. :^)


. . Pretty close to it: it didn't start until 1:50 am AEST and was over by 6:50 am AEST. That makes 5 hours, but with 1-hour backoffs it could have actually ended any time within that last hour. I thought someone had slipped me a Mickey in my hot chocolate.
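
The arithmetic above can be written out as a small Python sketch. The two times and the 1-hour backoff come from the post; the calendar date is a placeholder assumption, since only the times of day matter here.

```python
from datetime import datetime, timedelta

# Times as reported in the post (AEST). The date is a placeholder assumption.
start = datetime(2018, 4, 11, 1, 50)
first_success = datetime(2018, 4, 11, 6, 50)
backoff = timedelta(hours=1)  # retry interval mentioned in the post

# The servers may have come back at any point since the previous failed retry,
# so the observed 5 hours is only an upper bound on the outage length.
longest = first_success - start
shortest = longest - backoff

print(f"outage lasted between {shortest} and {longest}")
```

So the outage seen from this host was somewhere between 4 and 5 hours, which is consistent with the "four hour" figure mentioned earlier.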

. . Your Marvin piccy sums it up perfectly ...

. . If this is to be the new norm (fingers crossed) let's have a party ...

Stephen

:)
ID: 1929275 · Report as offensive
Sirius B Project Donor
Volunteer tester
Joined: 26 Dec 00
Posts: 24879
Credit: 3,081,182
RAC: 7
Ireland
Message 1929283 - Posted: 10 Apr 2018, 22:07:05 UTC
Last modified: 10 Apr 2018, 22:20:11 UTC

Since reporting the 36, it has reported 3 individual wu, then this...

10/04/2018 23:03:14 | SETI@home | Sending scheduler request: To fetch work.
10/04/2018 23:03:14 | SETI@home | Reporting 1 completed tasks
10/04/2018 23:03:14 | SETI@home | Requesting new tasks for CPU
10/04/2018 23:03:17 | SETI@home | Scheduler request completed: got 3 new tasks

...what about replacing the 36? :-(
Edit: Just noticed this...
10/04/2018 22:34:24 | SETI@home | Scheduler request completed: got 37 new tasks

:-)
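
Rather than eyeballing the event log, the reported and received counts can be tallied with a short Python sketch. It only uses the log lines quoted in these two posts (the three single-task reports mentioned above are not quoted, so they are not counted), and the function name `tally` is made up for illustration.

```python
import re

# Event log lines quoted in the two posts above, in time order.
LOG = """\
10/04/2018 22:21:00 | SETI@home | Reporting 36 completed tasks
10/04/2018 22:21:03 | SETI@home | Scheduler request completed: got 0 new tasks
10/04/2018 22:34:24 | SETI@home | Scheduler request completed: got 37 new tasks
10/04/2018 23:03:14 | SETI@home | Reporting 1 completed tasks
10/04/2018 23:03:17 | SETI@home | Scheduler request completed: got 3 new tasks
"""

REPORTED = re.compile(r"Reporting (\d+) completed tasks")
RECEIVED = re.compile(r"got (\d+) new tasks")


def tally(log_text):
    """Return (tasks reported, tasks received) summed over all log lines."""
    reported = received = 0
    for line in log_text.splitlines():
        # Each line is "timestamp | project | message".
        parts = [part.strip() for part in line.split("|", 2)]
        if len(parts) != 3:
            continue
        message = parts[2]
        match = REPORTED.search(message)
        if match:
            reported += int(match.group(1))
        match = RECEIVED.search(message)
        if match:
            received += int(match.group(1))
    return reported, received


reported, received = tally(LOG)
print(f"reported {reported}, received {received}")
```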
ID: 1929283 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1929289 - Posted: 10 Apr 2018, 22:21:51 UTC - in response to Message 1929194.  

I suggest that Keith is not telling the WHOLE truth with his complaint - as I type he has the following numbers of tasks in progress:

All tasks for computer 5741129 = In progress (1000) (expected = 400)
All tasks for computer 8306366 = In progress (1381) (expected = 400)
All tasks for computer 8480062 = In progress (1318) (expected = 500)
All tasks for computer 6279633 = In progress (1020) (expected = 400)
All tasks for computer 8030022 = In progress (1014) (expected = 400)

Bunkering and rescheduling can have some very strange effects on the servers and you appear to have triggered one of them - work outside the rules and expect to get bitten.

My last posts at and before 03:00 UTC were for just regular requests of work on normal cache limits. Everything went topsy-turvy as soon as that Arecibo tape hit the splitters and started dumping into the RTS buffer. Things got a bit better once I turned off AP work.

I don't normally start bunkering until 05:00 UTC so the overnight consumption doesn't eat too far into the cache while I sleep and I can start with enough work to make it through the outage. Didn't need the usual amount obviously since they started the outage late and ended much earlier than usual.

That was the seventh permanent download error task I've had since the database re-organization. I had four the next day after the project returned but they cleared the database quickly. I picked up two more later in the week that I posted to TBar's "Lost Task" thread. Got the latest last night before I started to bunker. And later in the evening before I turned in for bed, the RTS buffer had returned to solid BLC tasks for me and I had no issues getting work to bunker. Only when there was Arecibo work in the buffer did I suffer.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1929289 · Report as offensive
Profile Wiggo
Joined: 24 Jan 00
Posts: 34744
Credit: 261,360,520
RAC: 489
Australia
Message 1929295 - Posted: 10 Apr 2018, 22:39:37 UTC

Now there's an outrage I could sleep through again, it's been a very long time since that last happened, and all rigs have full caches.

Cheers.
ID: 1929295 · Report as offensive


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.