Panic Mode On (54) Server problems?


Firehawk
Volunteer tester
Joined: 21 May 99
Posts: 1730
Credit: 257,174,787
RAC: 148,229
Brazil
Message 1153589 - Posted: 18 Sep 2011, 14:06:44 UTC

It's not just a client problem. Even if you request work and it's available, it's not being assigned. It keeps sending just 0 or 1 even if you are drained.
____________

Khangollo
Joined: 1 Aug 00
Posts: 245
Credit: 36,410,524
RAC: 0
Slovenia
Message 1153592 - Posted: 18 Sep 2011, 14:21:48 UTC
Last modified: 18 Sep 2011, 14:22:58 UTC

I wouldn't be surprised if a lot of feeder slots are permanently occupied by APs no one wants to have (because no doubt AP task duration is badly F-ed up, too).
____________

WinterKnight
Volunteer tester
Joined: 18 May 99
Posts: 8630
Credit: 23,717,435
RAC: 19,020
United Kingdom
Message 1153597 - Posted: 18 Sep 2011, 14:28:55 UTC
Last modified: 18 Sep 2011, 14:29:18 UTC

If you are only getting 1 or 2 tasks on your requests when you have a large shortfall, then take note of Richard's post 1153104 in request issues.

I'm pretty sure that request for 1 second, when there's a near-30,000 second shortfall, is a DCF safety.

Edit - confirmed: I edited DCF by a factor of ten - took out a zero, so 0.013... became 0.13...

Sten-Arne
Volunteer tester
Joined: 1 Nov 08
Posts: 3404
Credit: 19,614,037
RAC: 18,573
Sweden
Message 1153600 - Posted: 18 Sep 2011, 14:47:16 UTC - in response to Message 1152837.
Last modified: 18 Sep 2011, 14:50:17 UTC

Asks for work for days, get nothing, still happy, refuse to whine :-)

END

Edit, added: Come to think of it, if I don't get any work by Sunday evening (local time), I might start some mini whining.



So there, it's Sunday evening and still not one single AP received.

As promised: Whine, whine, whine.


LOL
____________

Richard Haselgrove (Project donor)
Volunteer tester
Joined: 4 Jul 99
Posts: 8465
Credit: 48,923,262
RAC: 76,017
United Kingdom
Message 1153602 - Posted: 18 Sep 2011, 14:52:23 UTC - in response to Message 1153597.

And conveniently, just as I was reading that, a host I've been monitoring confirmed it:

18/09/2011 15:35:36 | | [work_fetch] NVIDIA GPU: shortfall 132825.49 nidle 0.00 saturated 44294.51 busy 0.00 RS fetchable 100.00 runnable 100.00
18/09/2011 15:35:36 | SETI@home | [work_fetch] NVIDIA GPU: fetch share 1.00 LTD 0.00 backoff dt 0.00 int 0.00
18/09/2011 15:35:36 | | [work_fetch] No project chosen for work fetch
18/09/2011 15:35:50 | SETI@home | Computation for task 12jl11ad.18873.476.9.10.189_0 finished
18/09/2011 15:35:50 | SETI@home | [dcf] DCF: 0.016567->0.021427, raw_ratio 0.021427, adj_ratio 1.293360
18/09/2011 15:36:02 | | [work_fetch] NVIDIA GPU: shortfall 119854.60 nidle 0.00 saturated 57265.40 busy 0.00 RS fetchable 100.00 runnable 100.00
18/09/2011 15:36:02 | SETI@home | [work_fetch] request: CPU (0.00 sec, 0.00 inst) NVIDIA GPU (119854.60 sec, 0.00 inst)
18/09/2011 15:36:02 | SETI@home | Reporting 10 completed tasks, requesting new tasks for NVIDIA GPU
18/09/2011 15:36:02 | SETI@home | [sched_op] CPU work request: 0.00 seconds; 0.00 CPUs
18/09/2011 15:36:02 | SETI@home | [sched_op] NVIDIA GPU work request: 119854.60 seconds; 0.00 GPUs

I reckon this weekend goes down as 'revenge of the little guys'.

That host is a 9800GT - as you see, it's teetering above and below the 0.02 DCF 'work fetch' cutoff value - I'm still clearing out some work assigned with stock estimates, VHAR drives DCF below 0.02, mid-AR takes it back above.

In about 20 minutes, the optimised app APR kicks in, with tasks given twice the stock speed estimate. That'll do nicely, and I've got a good big run of shorties lined up (the best part of 200) - they'll be reporting, one or two every six minutes, all evening I reckon.

That's why the "results returned per hour" is so high. Stock crunchers, and the people with lesser CUDA cards, are having a field-day with a download pipe mercifully clear of AP, and plenty of shorties between the VLARs.
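
For anyone who wants to watch their own host teeter around that value, here is a rough sketch that pulls the DCF change and the GPU work request out of log lines like the ones quoted above. The 0.02 cutoff is the figure given in this post, not something read from the BOINC source, and the helper names and the "at or above" comparison are assumptions made for illustration.

```python
import re

# Three lines copied from the log quoted in this post.
LOG = """\
18/09/2011 15:35:36 | | [work_fetch] No project chosen for work fetch
18/09/2011 15:35:50 | SETI@home | [dcf] DCF: 0.016567->0.021427, raw_ratio 0.021427, adj_ratio 1.293360
18/09/2011 15:36:02 | SETI@home | [sched_op] NVIDIA GPU work request: 119854.60 seconds; 0.00 GPUs
"""

DCF_RE = re.compile(r"\[dcf\] DCF: ([\d.]+)->([\d.]+)")
REQ_RE = re.compile(r"\[sched_op\] NVIDIA GPU work request: ([\d.]+) seconds")

# Cutoff as described in the post (assumption, not taken from BOINC code).
DCF_CUTOFF = 0.02


def dcf_changes(log):
    """Return (old, new) DCF pairs found in the log text."""
    return [(float(m.group(1)), float(m.group(2))) for m in DCF_RE.finditer(log)]


def gpu_requests(log):
    """Return the GPU work requests, in seconds, found in the log text."""
    return [float(m.group(1)) for m in REQ_RE.finditer(log)]


def above_cutoff(dcf, cutoff=DCF_CUTOFF):
    """True when DCF is at or above the cutoff (exact comparison is a guess)."""
    return dcf >= cutoff


if __name__ == "__main__":
    for old, new in dcf_changes(LOG):
        print(f"DCF {old} -> {new}: above cutoff before={above_cutoff(old)}, after={above_cutoff(new)}")
    print("GPU requests (s):", gpu_requests(LOG))
```

In the quoted log, the "No project chosen" line comes while DCF is 0.016567, and the full 119854.60 second request follows once DCF has moved to 0.021427.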

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5791
Credit: 58,011,663
RAC: 47,915
Australia
Message 1153661 - Posted: 18 Sep 2011, 18:46:27 UTC - in response to Message 1153589.

It's not just a client problem. Even if you request work and it's available, it's not being assigned. It keeps sending just 0 or 1 even if you are drained.

My problem isn't getting 1 or 2 for the GPU, it's getting any at all. Sometimes I get a couple, sometimes a dozen, sometimes a couple of dozen. But invariably they're all crunched before I can download any more.
"No tasks sent" is the usual message, but there are plenty of "Project has no tasks available" there as well.
____________
Grant
Darwin NT.

Cosmic_Ocean
Joined: 23 Dec 00
Posts: 2245
Credit: 8,596,056
RAC: 4,332
United States
Message 1153666 - Posted: 18 Sep 2011, 19:13:46 UTC - in response to Message 1153450.

When BOINC asks for new work, the return message has been "no work available" for AP since last Tuesday.

Either the estimated completion times for new AP work are so huge you can only get a couple at a time, or something got borked with the Scheduler when they put in the patch. The AP Ready to Send buffer just continues to grow, as the Work in Progress gets less & less.

I'm thinking this is probably the case. Does anyone with a nearly-empty 10+10 cache of AP-only get any new APs? I know the ETA would be astronomical, but if you can normally run through one in ~15 hours, surely you should be able to pick up at least one with a 480-hour work request, unless the ETA is up by 30x.

If it's not that, then feeder is borked.
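
The arithmetic behind that can be sketched quickly. This assumes a task is only sent if its estimated runtime fits inside the work request, which is a simplification and may not be how the scheduler really decides; the function name is made up for the example.

```python
# Figures from the post: a "10+10" day cache and ~15 hours per AP task.
CACHE_DAYS = 10 + 10
REQUEST_HOURS = CACHE_DAYS * 24   # 480-hour work request
NORMAL_TASK_HOURS = 15


def tasks_that_fit(inflation):
    """Whole tasks that fit in the request if estimates are inflated by `inflation`.

    Assumption: a task is sent only when its estimate fits in the request.
    """
    return int(REQUEST_HOURS // (NORMAL_TASK_HOURS * inflation))


# Inflation factor at which exactly one task still fits.
BREAK_EVEN = REQUEST_HOURS / NORMAL_TASK_HOURS

if __name__ == "__main__":
    for factor in (1, 10, 30, 32, 33):
        print(f"estimate x{factor}: {tasks_that_fit(factor)} task(s) fit")
    print("break-even inflation:", BREAK_EVEN)
```

Under that assumption, one AP still fits at 30x (450 hours against a 480-hour request), and the estimate would have to be inflated by more than 32x before nothing fits.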
____________

Linux laptop uptime: 1484d 22h 42m
Ended due to UPS failure, found 14 hours after the fact

soft^spirit
Joined: 18 May 99
Posts: 6374
Credit: 28,631,059
RAC: 94
United States
Message 1153680 - Posted: 18 Sep 2011, 20:11:46 UTC - in response to Message 1153553.

It is on automatic, I have not messed with it. Machine is not having "too full" issues at all. Just getting measly amounts of work irregularly. I keep asking, server keeps saying 0. or 1. Occasional 30-40.. after it has gone completely dry again.

I wasn't suggesting you had fiddled with it, I was just asking what it is.

It could be that stopping BOINC, editing DCF to a realistic value, then restarting BOINC might fix it.

I'm not having a problem d/loading, not as many as normal agreed, but getting more than you. But that is due to the, now, publicised problem I am having with the AP APR. Because as soon as an AP task completes it punches my DCF up to 1.5 min.


I see no flops entry in the app_info.
____________

Janice

Claggy (Project donor)
Volunteer tester
Joined: 5 Jul 99
Posts: 4067
Credit: 32,897,427
RAC: 7,597
United Kingdom
Message 1153721 - Posted: 18 Sep 2011, 23:37:54 UTC

Uploads have dropped to Zero.

Claggy

Kevin Olley
Joined: 3 Aug 99
Posts: 368
Credit: 35,233,741
RAC: 1,619
United Kingdom
Message 1153722 - Posted: 18 Sep 2011, 23:39:57 UTC

Uploads have stalled and server status page is not updating.


____________
Kevin


Sliver (Project donor)
Joined: 18 May 11
Posts: 281
Credit: 7,058,946
RAC: 915
United States
Message 1153723 - Posted: 18 Sep 2011, 23:40:30 UTC - in response to Message 1153721.

Aye. Uploads are nil. Cricket has plummeted.
____________

Robert Pick
Joined: 21 May 05
Posts: 11
Credit: 2,237,491
RAC: 196
United States
Message 1153724 - Posted: 18 Sep 2011, 23:40:55 UTC

Same here!!!!!
____________

Wembley
Volunteer tester
Joined: 16 Sep 09
Posts: 415
Credit: 888,257
RAC: 0
United States
Message 1153726 - Posted: 18 Sep 2011, 23:48:29 UTC

Yay! The upload server has died again! Which means my BOINC will soon stop requesting work because of the 2*numprocessors limit!
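
As a rough illustration of that limit, assuming the client simply stops asking for work once stalled uploads exceed twice the processor count (the function name and the strict "greater than" comparison are guesses, not taken from the BOINC client code):

```python
def uploads_block_work_fetch(pending_uploads, num_processors):
    """True if stalled uploads would stop new work requests.

    Assumption: the limit is 2 * num_processors, as described in the post,
    and the block starts once that number is exceeded.
    """
    return pending_uploads > 2 * num_processors


if __name__ == "__main__":
    # Hypothetical 4-processor host with a growing pile of stalled uploads.
    for pending in (4, 8, 9):
        print(pending, "uploads pending ->", uploads_block_work_fetch(pending, 4))
```

So on a quad-core, a ninth finished task stuck waiting on the upload server would be enough to stop further work requests under this reading.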

____________


Donate with your searches and online buys:
http://www.goodsearch.com/toolbar/university-of-california-setihome

arkayn (Project donor)
Volunteer tester
Joined: 14 May 99
Posts: 3622
Credit: 48,546,981
RAC: 30,899
United States
Message 1153727 - Posted: 18 Sep 2011, 23:55:38 UTC - in response to Message 1153680.

It is on automatic, I have not messed with it. Machine is not having "too full" issues at all. Just getting measly amounts of work irregularly. I keep asking, server keeps saying 0. or 1. Occasional 30-40.. after it has gone completely dry again.

I wasn't suggesting you had fiddled with it, I was just asking what it is.

It could be that stopping BOINC, editing DCF to a realistic value, then restarting BOINC might fix it.

I'm not having a problem d/loading, not as many as normal agreed, but getting more than you. But that is due to the, now, publicised problem I am having with the AP APR. Because as soon as an AP task completes it punches my DCF up to 1.5 min.


I see no flops entry in the app_info.


You will have to manually add the info.
http://setiathome.berkeley.edu/forum_thread.php?id=62293#1055179

Seems to be fairly close after I changed my DCF back to 1.000000 again.
____________

Iona
Joined: 12 Jul 07
Posts: 551
Credit: 2,769,915
RAC: 2,173
United Kingdom
Message 1153728 - Posted: 18 Sep 2011, 23:59:45 UTC

Is it time for that dreaded water-fowl to present itself?



____________
Don't take life too seriously, as you'll never come out of it alive!

W5DMG - Dave
Joined: 19 May 99
Posts: 155
Credit: 32,508,360
RAC: 10,759
United States
Message 1153740 - Posted: 19 Sep 2011, 0:59:42 UTC - in response to Message 1153728.

Uploads not working.. :(

Cosmic_Ocean
Joined: 23 Dec 00
Posts: 2245
Credit: 8,596,056
RAC: 4,332
United States
Message 1153750 - Posted: 19 Sep 2011, 1:39:42 UTC

Maybe uploads died because APs aren't being handed out and the storage got full? That's happened numerous times. Or did they go and put uploads and WU storage on separate volumes? I thought I remembered reading something about that.
____________

Linux laptop uptime: 1484d 22h 42m
Ended due to UPS failure, found 14 hours after the fact

clive G1FYE
Volunteer moderator
Joined: 4 Nov 04
Posts: 1300
Credit: 23,054,144
RAC: 0
United Kingdom
Message 1153751 - Posted: 19 Sep 2011, 1:48:09 UTC - in response to Message 1153728.

Is it time for that dreaded water-fowl to present itself?


If it's the one I think you mean, our fowl watery fiend only comes out to play when the grass is green :¬)

But I think our crickets have turned into locusts and there will be nothing green left before long; the uploads line seems to have crashed :¬(

msattler (Project donor)
Volunteer tester
Joined: 9 Jul 00
Posts: 38921
Credit: 578,552,517
RAC: 516,837
United States
Message 1153765 - Posted: 19 Sep 2011, 3:32:36 UTC

I am actually surprised it held as long as it did.
Luckily tomorrow is Monday, and somebody should be in the lab to set things upright again....
Then we wait for the Boinc server code to be straightened out.
____________
*********************************************
Embrace your inner kitty...ya know ya wanna!

I have met a few friends in my life.
Most were cats.

Mike (Project donor)
Volunteer tester
Joined: 17 Feb 01
Posts: 23799
Credit: 32,617,636
RAC: 23,694
Germany
Message 1153776 - Posted: 19 Sep 2011, 4:06:47 UTC

Astropulses are turned off and the servers don't survive the weekend.
I'm wondering.......

____________


Copyright © 2014 University of California