Panic Mode On (114) Server Problems?

Profile Unixchick Project Donor
Joined: 5 Mar 12
Posts: 815
Credit: 2,361,516
RAC: 22
United States
Message 1969838 - Posted: 10 Dec 2018, 22:48:31 UTC

The RTS is increasing, so something is working at the moment. The "out in the field" count never dropped, so no recovery needed. Whatever happened, let's hope it is fixed now.
ID: 1969838 · Report as offensive
Profile Jimbocous Project Donor
Volunteer tester
Joined: 1 Apr 13
Posts: 1849
Credit: 268,616,081
RAC: 1,349
United States
Message 1969851 - Posted: 11 Dec 2018, 0:41:54 UTC

Was running low, but caches are now full again.
Seems to me that there are some tasks happening in the background that now have priority over splitting, especially some that seem to happen around 0500Z daily.
ID: 1969851 · Report as offensive
Profile Tom M
Volunteer tester

Joined: 28 Nov 02
Posts: 5124
Credit: 276,046,078
RAC: 462
Message 1969929 - Posted: 12 Dec 2018, 0:29:04 UTC

So some are getting tasks?

I am not. Yet.

Tom
A proud member of the OFA (Old Farts Association).
ID: 1969929 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13161
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1969930 - Posted: 12 Dec 2018, 0:31:49 UTC

All reported but nothing coming in.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1969930 · Report as offensive
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1969936 - Posted: 12 Dec 2018, 1:21:06 UTC - in response to Message 1969929.  

So some are getting tasks?

I am not. Yet.

Tom


. . Nor I ..

Stephen
ID: 1969936 · Report as offensive
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 1969937 - Posted: 12 Dec 2018, 1:24:08 UTC

starting to trickle in over here. got my first batch of 11 lol.
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 1969937 · Report as offensive
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1969940 - Posted: 12 Dec 2018, 1:58:32 UTC

. . Damn, I finally got 2 whole tasks and they won't download ... :(

. . Out of work very soon ...

Stephen

:(
ID: 1969940 · Report as offensive
Profile Jimbocous Project Donor
Volunteer tester
Joined: 1 Apr 13
Posts: 1849
Credit: 268,616,081
RAC: 1,349
United States
Message 1969943 - Posted: 12 Dec 2018, 2:09:36 UTC

Full cachies :)
ID: 1969943 · Report as offensive
Profile betreger Project Donor
Joined: 29 Jun 99
Posts: 11360
Credit: 29,581,041
RAC: 66
United States
Message 1969946 - Posted: 12 Dec 2018, 2:18:08 UTC - in response to Message 1969940.  

ATM downloading is an issue here also.
ID: 1969946 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13161
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1969947 - Posted: 12 Dec 2018, 2:24:46 UTC - in response to Message 1969946.  

ATM downloading is an issue here also.

Same here.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1969947 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13161
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1969949 - Posted: 12 Dec 2018, 2:53:15 UTC

Looks like the downloads are getting unstuck. Starting to refill caches.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1969949 · Report as offensive
Profile Wiggo
Joined: 24 Jan 00
Posts: 34744
Credit: 261,360,520
RAC: 489
Australia
Message 1969956 - Posted: 12 Dec 2018, 3:56:46 UTC

Mine refilled up quite quickly here. :-)

Cheers.
ID: 1969956 · Report as offensive
Profile Chris904395093209d Project Donor
Volunteer tester

Joined: 1 Jan 01
Posts: 112
Credit: 29,923,129
RAC: 6
United States
Message 1969957 - Posted: 12 Dec 2018, 3:58:58 UTC

My caches have refilled, but 3 splitters are offline [As of 12 Dec 2018, 3:50:04 UTC]
~Chris

ID: 1969957 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13161
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1969958 - Posted: 12 Dec 2018, 4:01:30 UTC - in response to Message 1969957.  

My caches have refilled, but 3 splitters are offline [As of 12 Dec 2018, 3:50:04 UTC]

Maybe those were the ones that were causing the splitting errors earlier.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1969958 · Report as offensive
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1969966 - Posted: 12 Dec 2018, 4:37:41 UTC - in response to Message 1969956.  

Mine refilled up quite quickly here. :-)

Cheers.


. . Maybe I should move to the highlands ... :)

Stephen

:)
ID: 1969966 · Report as offensive
Gene Project Donor

Joined: 26 Apr 99
Posts: 150
Credit: 48,393,279
RAC: 118
United States
Message 1969976 - Posted: 12 Dec 2018, 6:37:27 UTC - in response to Message 1969812.  

One possible explanation of that is the client request behaviour.

1) If the client requests work, and gets none, it goes into (increasing) backoffs
2) If the client requests work, and gets some, it's free to ask again
3) If the client completes an allocated task, all backoffs are cleared, and it can report the completed task immediately and request more work at the same time.

Collectively, these mean that if you're completely dry, it takes ages to get started.

But if you have a few dregs to complete, or you get a few tasks at an early request, you have a much better chance of asking, and asking, and asking, until you get more.
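
The three rules quoted above can be sketched as a toy model. This is only an illustration: the class name, the doubling schedule, and the 60-second and one-hour limits are assumptions made for the example, not values taken from the BOINC client.

```python
from dataclasses import dataclass


@dataclass
class ToyClient:
    """Hypothetical model of the three quoted rules (not BOINC's real code)."""

    base_backoff: float = 60.0    # assumed first delay, in seconds
    max_backoff: float = 3600.0   # assumed upper limit, in seconds
    backoff: float = 0.0          # current delay before the next work request
    empty_replies: int = 0        # consecutive requests that returned no work

    def on_scheduler_reply(self, tasks_received: int) -> None:
        if tasks_received == 0:
            # Rule 1: no work received, so the backoff grows (doubling is assumed).
            self.empty_replies += 1
            delay = self.base_backoff * 2 ** (self.empty_replies - 1)
            self.backoff = min(delay, self.max_backoff)
        else:
            # Rule 2: some work received, so the client is free to ask again.
            self.empty_replies = 0
            self.backoff = 0.0

    def on_task_completed(self) -> None:
        # Rule 3: finishing a task clears all backoffs, so the client can
        # report and request more work in the same contact.
        self.empty_replies = 0
        self.backoff = 0.0


client = ToyClient()
for _ in range(3):
    client.on_scheduler_reply(0)   # a completely dry host keeps backing off
print("backoff after three empty replies:", client.backoff)
client.on_task_completed()         # a host with a few dregs gets reset
print("backoff after completing a task:", client.backoff)
```

In this toy model a dry host only ever sees its delay grow, while a host with even one task left gets reset every time that task finishes, which is the asymmetry the quote describes.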


But, alas, it gets even worse when one has two projects running on the same host. When, for example, the Seti cache runs empty, the "other" project is free to fill the cache to capacity, and the host will not even ask for more Seti work because the cache is full. Today I forgot to set Einstein to NNT while the Seti maintenance took place; as a result I have 58 Einstein GPU tasks and 95 Einstein CPU tasks in the cache.

The good news - Einstein GPU cache is (roughly) 58 hours of work with a 14-day deadline, so Seti scheduler is picking up enough GPU work to keep busy; the bad news - Einstein CPU cache is (roughly) 760 cpu-hours with a 12-day deadline, i.e. heavily over-committed, and in this condition the host/client will not even ask for any CPU tasks. (Resource allocation Seti/Einstein 92%/8% and cache limits of 1.0+0.5.)

Of course this is NOT a SETI server problem. It is just the way Boinc Manager works. After the 12 days have passed, the cache will fall under the configuration limit and Seti CPU work fetch will resume - unless I forget again(!) to set Einstein NNT on Tuesdays to stop this cycle. My hunch - having not looked at the boinc-mgr code - is that the project work fetch considers only the cc_config count of cores/cpus and does not look at the resource allocation, with the result for me that Einstein fetches 12 times more tasks than it really should. (Long ago there was a thread elsewhere regarding exactly what "resource allocation" should do.)
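
The "12 times" figure is consistent with simple arithmetic on the numbers above, if one assumes (as the hunch does) that the fetch fills the whole cache and ignores the 8% share. A minimal Python sketch; the function names are hypothetical and this is not BOINC's actual work-fetch logic:

```python
def share_aware_hours(fetched_hours: float, share: float) -> float:
    """CPU-hours the project would hold if the fetch were scaled by its share."""
    return fetched_hours * share


def overfetch_factor(share: float) -> float:
    """How many times too much work is fetched when the share is ignored."""
    return 1.0 / share


einstein_cpu_hours = 760.0   # rough Einstein CPU cache size from the post
einstein_share = 0.08        # Einstein's 8% resource allocation from the post

print(round(share_aware_hours(einstein_cpu_hours, einstein_share), 1), "cpu-hours at an 8% share")
print(round(overfetch_factor(einstein_share), 1), "times over-fetched if the share is ignored")
```

With an 8% share the factor is 1/0.08, about 12.5, which matches the "12 times" estimate.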

I have been admonished many times to "not try to micro manage Boinc Manager." But sometimes I just can't help myself.... :(
ID: 1969976 · Report as offensive
Profile Tom M
Volunteer tester

Joined: 28 Nov 02
Posts: 5124
Credit: 276,046,078
RAC: 462
Message 1969988 - Posted: 12 Dec 2018, 10:21:15 UTC - in response to Message 1969976.  


I have been admonished many times to "not try to micro manage Boinc Manager." But sometimes I just can't help myself.... :(



When I end up with lots of Einstein@Home I simply go to NNT, period. Not even letting it run during maintenance. It probably doesn't "help" but at least the tasks don't keep refilling.

Tom
A proud member of the OFA (Old Farts Association).
ID: 1969988 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13161
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1970026 - Posted: 12 Dec 2018, 20:17:27 UTC

The replica database lag is making a VERY slow crawl downwards after the outage. I think getting it down to zero is going to need one of the cron job glitches in the other server processes.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1970026 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13161
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1970044 - Posted: 13 Dec 2018, 0:06:25 UTC

The replica is almost caught up. Should be easier to view your tasks now.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1970044 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13720
Credit: 208,696,464
RAC: 304
Australia
Message 1970096 - Posted: 13 Dec 2018, 5:54:14 UTC

Now, if only the Validator would get on top of things...
Grant
Darwin NT
ID: 1970096 · Report as offensive
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.