Panic Mode On (97) Server Problems?

rob smith (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer moderator
Volunteer tester

Joined: 7 Mar 03
Posts: 22199
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1666782 - Posted: 18 Apr 2015, 11:24:00 UTC

With very few tasks being delivered my crunchers are heading back to their back-up projects
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1666782 · Report as offensive
kittyman (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1666818 - Posted: 18 Apr 2015, 14:56:16 UTC - in response to Message 1666803.  

Totally unresponsive servers all over the project now. This is a meaningless waste of electricity. I will shut down for a couple of days, or a couple of months.

The last DB fix wasn't a fix at all, as I suspected. The reliability of this project went downhill on 4 Nov 2014, 23:30:05 UTC, and hasn't recovered since then.

Geeze...

I am not fretting about it....
If they run out of work, the GPUs ramp down and I save on the power bill a bit.
I am not quite sure if the DB fix is considered complete yet.
If it is, I am sure Matt will continue to whack away at problems as they come up.
I think some more DB tinkering is on the way in the future.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1666818 · Report as offensive
WezH
Volunteer tester

Joined: 19 Aug 99
Posts: 576
Credit: 67,033,957
RAC: 95
Finland
Message 1666823 - Posted: 18 Apr 2015, 15:09:33 UTC

Panic Mode On!

My #1 host is running its last workunits...

Backup projects are ready...
"Please keep Your signature under four lines so Internet traffic doesn't go up too much"

- In 1992 when I had my first e-mail address -
ID: 1666823 · Report as offensive
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1666826 - Posted: 18 Apr 2015, 15:27:33 UTC - in response to Message 1666803.  

Totally unresponsive servers all over the project now. This is a meaningless waste of electricity. I will shut down for a couple of days, or a couple of months.

The last DB fix wasn't a fix at all, as I suspected. The reliability of this project went downhill on 4 Nov 2014, 23:30:05 UTC, and hasn't recovered since then.

Geeze...

Actually it began a few weeks earlier with the introduction of Server version 705. It had been going downhill all summer, but Server 705 caused bad reactions on my Mac. As usual, you have a tendency to just see the tip of the iceberg. The unidentified problems you see are just an indication of things lurking below. One item has been changed back, now you receive GPU APs first, so now it's the CPUs that sit and wait for the GPU cache to fill. Unfortunately, my Mac GPU cache hasn't been filled since shortly after Server 705, so now, unless you play with settings you won't see any AP CPU work. Macs still have to have at least BOINC 7.2.42 to receive stock Beta work, so, I'm assuming they still require 7.2.42 to receive Main stock GPU tasks as well. Totally bogus, no other platform requires 7.2.42 to receive GPU work or to attach to Beta. What's lurking below?
Too bad you couldn't just drag Server 704 off the shelf and reinstall that...
ID: 1666826 · Report as offensive
Profile Cactus Bob
Joined: 19 May 99
Posts: 209
Credit: 10,924,287
RAC: 29
Canada
Message 1666932 - Posted: 18 Apr 2015, 18:51:19 UTC

In the last hour I got a chunk of 32 MBs (V7 7.07 opencl_ati_cat132) and 10 minutes later another chunk of 7 (V7 7.07 opencl_ati5_nocal). So it is spitting out some work; I'm just not sure why it reports 0 tasks available most of the time. I haven't gotten any CPU tasks; they are all GPU. It will give me a couple more hours of crunching before I head to another project, which is what I was doing when the MBs downloaded. I had been at 0 tasks for 15 minutes and was configuring my GPU for the backup project.

Bob
-----------
Sig file - under construction
Sometimes I wonder, what happened to all the people I gave directions to?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
SETI@home classic workunits 4,321
SETI@home classic CPU time 22,169 hours
ID: 1666932 · Report as offensive
Phil Burden

Joined: 26 Oct 00
Posts: 264
Credit: 22,303,899
RAC: 0
United Kingdom
Message 1666949 - Posted: 18 Apr 2015, 19:33:36 UTC - in response to Message 1666932.  
Last modified: 18 Apr 2015, 19:34:39 UTC

[quote]In the last hour I got a chunk of 32 MBs (V7 7.07 opencl_ati_cat132) and 10 minutes later another chunk of 7 (V7 7.07 opencl_ati5_nocal). So it is spitting out some work; I'm just not sure why it reports 0 tasks available most of the time. I haven't gotten any CPU tasks; they are all GPU. It will give me a couple more hours of crunching before I head to another project, which is what I was doing when the MBs downloaded. I had been at 0 tasks for 15 minutes and was configuring my GPU for the backup project.[/quote]

Aye, just got 20 wu's (GPU) myself, and back to "no tasks available"

P.

ps. and every web page is loading slow, like wading through molasses.
ID: 1666949 · Report as offensive
Profile betreger Project Donor
Joined: 29 Jun 99
Posts: 11361
Credit: 29,581,041
RAC: 66
United States
Message 1666955 - Posted: 18 Apr 2015, 19:52:06 UTC - in response to Message 1666949.  

It seems I can't get new work. My CPU is down to its last task, Einstein will benefit, and I'm getting ready for a small panic.
ID: 1666955 · Report as offensive
Profile Donald L. Johnson
Joined: 5 Aug 02
Posts: 8240
Credit: 14,654,533
RAC: 20
United States
Message 1667010 - Posted: 18 Apr 2015, 21:33:20 UTC - in response to Message 1666932.  

In the last hour I got a chunk of 32 MBs (V7 7.07 opencl_ati_cat132) and 10 minutes later another chunk of 7 (V7 7.07 opencl_ati5_nocal). So it is spitting out some work; I'm just not sure why it reports 0 tasks available most of the time. I haven't gotten any CPU tasks; they are all GPU. It will give me a couple more hours of crunching before I head to another project, which is what I was doing when the MBs downloaded. I had been at 0 tasks for 15 minutes and was configuring my GPU for the backup project.

Remember, the flow of Tasks from the Scheduler is not a continuous feed. The Feeder only holds about 200 Tasks at a time. It refills every 5 seconds or so, but with so many hosts asking for work at any given time, once that 200 is allocated, everybody else gets the "No Tasks Available" message until the Feeder refills. It's all a matter of timing.
Donald
Infernal Optimist / Submariner, retired
ID: 1667010 · Report as offensive
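The timing effect described in the post above can be illustrated with a toy simulation in Python. This is only a sketch: the 200 slots and the 5-second refill come from the post, while the request rate of 100 per second and the one-task-per-request rule are made-up assumptions, not measurements of the real scheduler.

```python
def simulate_feeder(slots=200, refill_every=5, requests_per_second=100, seconds=10):
    """Count single-task requests that are served vs. told "no tasks available".

    slots and refill_every follow the figures quoted in the thread;
    requests_per_second is a hypothetical load chosen for illustration.
    """
    available = 0
    served = 0
    refused = 0
    for t in range(seconds):
        if t % refill_every == 0:
            available = slots  # the feeder is topped back up to capacity
        for _ in range(requests_per_second):
            if available > 0:
                available -= 1
                served += 1
            else:
                refused += 1  # this host would see "no tasks available"
    return served, refused


print(simulate_feeder())  # (400, 600)
```

Under these assumed numbers the feeder empties two seconds after each refill, so 60% of requests arrive at an empty feeder even though work exists. With a load of 10 requests per second, nobody is refused.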
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1667017 - Posted: 18 Apr 2015, 21:48:21 UTC - in response to Message 1667010.  
Last modified: 18 Apr 2015, 22:04:15 UTC

Remember, the flow of Tasks from the Scheduler is not a continuous feed. The Feeder only holds about 200 Tasks at a time. It refills every 5 seconds or so, but with so many hosts asking for work at any given time, once that 200 is allocated, everybody else gets the "No Tasks Available" message until the Feeder refills. It's all a matter of timing.

Very much so, due to the present issues.
One of my systems still has GPU work, the other is out. The only difference is that one was lucky enough to get some GPU work on some of its requests; the other didn't, so the backoffs increased until it pretty much stopped asking for work.
Whatever it is, it's seriously broken.


EDIT- After hitting retry every 6 minutes for way too long I finally got some GPU work on that system.
1 single, solitary WU.
Grant
Darwin NT
ID: 1667017 · Report as offensive
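The growing backoffs mentioned in the post above can be pictured with a generic exponential backoff. This is a sketch of the general technique only; it is not the BOINC client's actual algorithm, and the base delay, growth factor and cap are hypothetical values picked for illustration.

```python
def backoff_delays(failures, base=60.0, factor=2.0, cap=86400.0):
    """Return the wait (in seconds) before each retry after consecutive empty replies.

    base, factor and cap are illustrative assumptions, not BOINC settings.
    """
    return [min(base * factor ** n, cap) for n in range(failures)]


print(backoff_delays(5))  # [60.0, 120.0, 240.0, 480.0, 960.0]
```

With doubling delays, a host that misses a handful of refills in a row soon waits hours between requests, which would match the behaviour of a system that "pretty much stopped asking for work".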
Profile betreger Project Donor
Joined: 29 Jun 99
Posts: 11361
Credit: 29,581,041
RAC: 66
United States
Message 1667019 - Posted: 18 Apr 2015, 21:49:26 UTC - in response to Message 1667010.  

And my timing has been bad all day.
ID: 1667019 · Report as offensive
Profile Oz
Joined: 6 Jun 99
Posts: 233
Credit: 200,655,462
RAC: 212
United States
Message 1667051 - Posted: 18 Apr 2015, 23:47:14 UTC - in response to Message 1667010.  
Last modified: 19 Apr 2015, 0:09:13 UTC

Remember, the flow of Tasks from the Scheduler is not a continuous feed. The Feeder only holds about 200 Tasks at a time. It refills every 5 seconds or so, but with so many hosts asking for work at any given time, once that 200 is allocated, everybody else gets the "No Tasks Available" message until the Feeder refills. It's all a matter of timing.


So the system saturates at 200*12*60 or 144,000 WU/hour? Results returned are currently ~48K/hour, leaving excess capacity of nearly 100K/hour. If the feeder empties before all requests are filled, perhaps it should refresh more rapidly than about every 5 seconds... (it does not escape me that this is approximately equal to the maximum result creation rate)
Also, there is 35-65MB/sec of data leaving the server and not (visibly) going to the lab. That's a lot of work units going somewhere, considering all of my boxes are getting the "no work available" message, as is nearly everyone else. Maybe Charles Long bought all the servers in New Mexico...
Member of the 20 Year Club



ID: 1667051 · Report as offensive
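The arithmetic in the post above can be checked in a few lines of Python. The inputs are the figures quoted in the thread (200 slots, a refill roughly every 5 seconds, about 48K results returned per hour); the function name is just a label chosen for this sketch, and the result is an upper bound that assumes every refill is fully handed out.

```python
def feeder_ceiling_per_hour(slots, refill_interval_s):
    """Upper bound on tasks handed out per hour if every refill is fully drained."""
    refills_per_hour = 3600 // refill_interval_s
    return slots * refills_per_hour


ceiling = feeder_ceiling_per_hour(200, 5)   # 200 * 720
headroom = ceiling - 48_000                 # minus the ~48K/hour being returned
print(ceiling, headroom)  # 144000 96000
```

So the "nearly 100K/hour" of spare capacity is 96,000 per hour under these assumptions.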
Profile JaundicedEye
Joined: 14 Mar 12
Posts: 5375
Credit: 30,870,693
RAC: 1
United States
Message 1667062 - Posted: 19 Apr 2015, 1:00:57 UTC

My cache has gone from a low of 119 this morning back to 189 just now, I guess I'm a 'lucky' updater. My one running rig updates about every 5 to 10 minutes as it goes through CUDA 50 tasks about that fast.

"Sour Grapes make a bitter Whine." <(0)>
ID: 1667062 · Report as offensive
Victor Wedge
Joined: 3 Apr 04
Posts: 28
Credit: 12,569,503
RAC: 0
Message 1667065 - Posted: 19 Apr 2015, 1:11:37 UTC

I ran out of CUDA WUs almost nine hours ago. I just happened to have the TThrottle graph running and it caught my eye. Otherwise it might have been a day or so before I knew. Dang, one month since I upgraded my GPU and already two outages.
ID: 1667065 · Report as offensive
Admiral Gloval
Joined: 31 Mar 13
Posts: 20257
Credit: 5,308,449
RAC: 0
United States
Message 1667080 - Posted: 19 Apr 2015, 3:07:24 UTC

Have 7 vlar tasks left. Running 4 at a time on cpu. Then the last three. Have almost 8 hours left before backup time.

ID: 1667080 · Report as offensive
Profile Jimbocous Project Donor
Volunteer tester
Joined: 1 Apr 13
Posts: 1853
Credit: 268,616,081
RAC: 1,349
United States
Message 1667104 - Posted: 19 Apr 2015, 6:09:28 UTC

Definitely something funky going on. Can't say I've ever seen that many ready to send and yet getting nothing ready. Has to be other network traffic throwing a wrench in the works ...
ID: 1667104 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1667109 - Posted: 19 Apr 2015, 6:34:34 UTC - in response to Message 1667104.  
Last modified: 19 Apr 2015, 6:35:17 UTC

Has to be other network traffic throwing a wrench in the works ...

It's well less than 50% of the available bandwidth.
It's a server-side issue.

Although it has picked up some work since things fell over again, my system that still has GPU work will be out of it in the next few hours. My other system that's already out of GPU work will be out of CPU work before morning.
At least the Validator/Assimilator cleanup is still chugging along.
Grant
Darwin NT
ID: 1667109 · Report as offensive
Brent Norman (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester

Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1667115 - Posted: 19 Apr 2015, 6:47:28 UTC

I'm starting to think it might be worth considering that, after processing 50k tasks, the servers gracefully stop tasks in progress, reboot (scripted) and come back ready to kick some ass :)
ID: 1667115 · Report as offensive
Profile Cactus Bob
Joined: 19 May 99
Posts: 209
Credit: 10,924,287
RAC: 29
Canada
Message 1667148 - Posted: 19 Apr 2015, 8:48:37 UTC
Last modified: 19 Apr 2015, 8:49:45 UTC

No more WU's.
Am doing backup projects now.
No matter what I try I can not get any tasks from Seti. The web pages are running like molasses as Phil Burden mentioned about 10 posts ago.
It's a sad day in Mudville; the mighty Casey has struck out.

Good night all.
Bob

--------
Too tired to care about the sig file
Sometimes I wonder, what happened to all the people I gave directions to?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
SETI@home classic workunits 4,321
SETI@home classic CPU time 22,169 hours
ID: 1667148 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1667150 - Posted: 19 Apr 2015, 9:22:44 UTC
Last modified: 19 Apr 2015, 9:26:54 UTC

According to the server's statistics, there are plenty of tasks available:
Results ready to send 361,603 1 3m
Current result creation rate 0.2828/sec 0.0310/sec 5m

But it constantly replies no tasks:
19/04/2015 12:15:00 | SETI@home | [sched_op] Starting scheduler request
19/04/2015 12:15:00 | SETI@home | Sending scheduler request: To fetch work.
19/04/2015 12:15:00 | SETI@home | Requesting new tasks for CPU and ATI
19/04/2015 12:15:00 | SETI@home | [sched_op] CPU work request: 2391244.91 seconds; 0.00 devices
19/04/2015 12:15:00 | SETI@home | [sched_op] ATI work request: 614471.59 seconds; 0.00 devices
19/04/2015 12:15:03 | SETI@home | Scheduler request completed: got 0 new tasks
19/04/2015 12:15:03 | SETI@home | [sched_op] Server version 705
19/04/2015 12:15:03 | SETI@home | Project has no tasks available
19/04/2015 12:15:03 | SETI@home | Project requested delay of 303 seconds

Seems a scripted reboot is the right idea. The servers just can't run flawlessly for too long.
P.S. And with the current reliability, a local cache of only 100 tasks per device definitely hurts project performance. Such a cache can't survive long enough on an average modern PC to be useful in case of server issues.
ID: 1667150 · Report as offensive
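For anyone who wants to tally requests like the ones in the log above, here is a small Python sketch that pulls out the seconds of work requested per resource and the number of tasks received. The pipe-separated layout is inferred only from the lines quoted in this thread, and the function and pattern names are made up for this example, so treat it as a starting point rather than a parser for every client log format.

```python
import re

# Layout assumed from the quoted lines: "<date> <time> | <project> | <message>"
LINE_RE = re.compile(r"^(?P<stamp>\S+ \S+) \| (?P<project>[^|]+?) \| (?P<text>.*)$")
REQ_RE = re.compile(r"\[sched_op\] (?P<resource>.+?) work request: (?P<seconds>[\d.]+) seconds")
GOT_RE = re.compile(r"Scheduler request completed: got (\d+) new tasks")

SAMPLE = """\
19/04/2015 12:15:00 | SETI@home | [sched_op] CPU work request: 2391244.91 seconds; 0.00 devices
19/04/2015 12:15:00 | SETI@home | [sched_op] ATI work request: 614471.59 seconds; 0.00 devices
19/04/2015 12:15:03 | SETI@home | Scheduler request completed: got 0 new tasks
"""


def summarize(log_text):
    """Return ({resource: seconds requested}, total tasks received)."""
    requested = {}
    got = 0
    for line in log_text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip anything that does not match the assumed layout
        text = m.group("text")
        req = REQ_RE.search(text)
        if req:
            requested[req.group("resource")] = float(req.group("seconds"))
        done = GOT_RE.search(text)
        if done:
            got += int(done.group(1))
    return requested, got


print(summarize(SAMPLE))
```

Run over a longer log, the same function shows at a glance how much work was asked for and how little came back.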
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1667160 - Posted: 19 Apr 2015, 10:09:07 UTC

My laptop was almost dry, but then

19/04/2015 10:19:16 | SETI@home | [sched_op] NVIDIA GPU work request: 27162.41 seconds; 0.00 devices
19/04/2015 10:19:18 | SETI@home | Scheduler request completed: got 6 new tasks
19/04/2015 10:19:18 | SETI@home | [sched_op] estimated total NVIDIA GPU task duration: 27485 seconds

(my TZ UTC+1, so less than an hour ago)

Work exists, work is flowing - but slowly/rarely. A problem re-filling the feeder from the database?
ID: 1667160 · Report as offensive


 