Panic Mode On (54) Server problems?

Message boards : Number crunching : Panic Mode On (54) Server problems?

W-K 666 Project Donor
Volunteer tester

Joined: 18 May 99
Posts: 19043
Credit: 40,757,560
RAC: 67
United Kingdom
Message 1153597 - Posted: 18 Sep 2011, 14:28:55 UTC
Last modified: 18 Sep 2011, 14:29:18 UTC

If you are only getting 1 or 2 tasks per request when you have a large shortfall, then take note of Richard's post 1153104 in request issues.

I'm pretty sure that request for 1 second, when there's a near-30,000 second shortfall, is a DCF safety.

Edit - confirmed: I edited DCF by a factor of ten - took out a zero, so 0.013... became 0.13...
ID: 1153597 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1153602 - Posted: 18 Sep 2011, 14:52:23 UTC - in response to Message 1153597.  

And conveniently, just as I was reading that, a host I've been monitoring confirmed it:

18/09/2011 15:35:36 | | [work_fetch] NVIDIA GPU: shortfall 132825.49 nidle 0.00 saturated 44294.51 busy 0.00 RS fetchable 100.00 runnable 100.00
18/09/2011 15:35:36 | SETI@home | [work_fetch] NVIDIA GPU: fetch share 1.00 LTD 0.00 backoff dt 0.00 int 0.00
18/09/2011 15:35:36 | | [work_fetch] No project chosen for work fetch
18/09/2011 15:35:50 | SETI@home | Computation for task 12jl11ad.18873.476.9.10.189_0 finished
18/09/2011 15:35:50 | SETI@home | [dcf] DCF: 0.016567->0.021427, raw_ratio 0.021427, adj_ratio 1.293360
18/09/2011 15:36:02 | | [work_fetch] NVIDIA GPU: shortfall 119854.60 nidle 0.00 saturated 57265.40 busy 0.00 RS fetchable 100.00 runnable 100.00
18/09/2011 15:36:02 | SETI@home | [work_fetch] request: CPU (0.00 sec, 0.00 inst) NVIDIA GPU (119854.60 sec, 0.00 inst)
18/09/2011 15:36:02 | SETI@home | Reporting 10 completed tasks, requesting new tasks for NVIDIA GPU
18/09/2011 15:36:02 | SETI@home | [sched_op] CPU work request: 0.00 seconds; 0.00 CPUs
18/09/2011 15:36:02 | SETI@home | [sched_op] NVIDIA GPU work request: 119854.60 seconds; 0.00 GPUs
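For anyone wanting to sanity-check numbers like these, here is a rough Python sketch (not BOINC code) that pulls shortfall and saturated out of [work_fetch] status lines in the format shown above. The regular expression and the name `work_fetch_totals` are made up for illustration. In both samples the two figures sum to 177120.00 seconds (49.2 hours); reading that sum as the host's cache target is an assumption, not something the log states.

```python
import re

# Pattern for the per-resource [work_fetch] status line, as seen in the log above.
# Hypothetical helper for illustration only, not part of BOINC.
STATUS = re.compile(
    r"\[work_fetch\] (?P<resource>[^:|]+): shortfall (?P<shortfall>[\d.]+) "
    r"nidle [\d.]+ saturated (?P<saturated>[\d.]+)"
)

def work_fetch_totals(lines):
    """Return (resource, shortfall, saturated, shortfall + saturated) per status line."""
    rows = []
    for line in lines:
        m = STATUS.search(line)
        if m is None:
            continue  # fetch-share, request and sched_op lines are skipped
        shortfall = float(m["shortfall"])
        saturated = float(m["saturated"])
        rows.append((m["resource"], shortfall, saturated,
                     round(shortfall + saturated, 2)))
    return rows

log = [
    "18/09/2011 15:35:36 | | [work_fetch] NVIDIA GPU: shortfall 132825.49 nidle 0.00 saturated 44294.51 busy 0.00 RS fetchable 100.00 runnable 100.00",
    "18/09/2011 15:36:02 | | [work_fetch] NVIDIA GPU: shortfall 119854.60 nidle 0.00 saturated 57265.40 busy 0.00 RS fetchable 100.00 runnable 100.00",
]
for row in work_fetch_totals(log):
    print(row)
```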

I reckon this weekend goes down as 'revenge of the little guys'.

That host is a 9800GT - as you see, it's teetering above and below the 0.02 DCF 'work fetch' cutoff value - I'm still clearing out some work assigned with stock estimates, VHAR drives DCF below 0.02, mid-AR takes it back above.
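As a rough illustration of that teetering, the sketch below parses a [dcf] line of the form shown in the log and compares the old and new values with the 0.02 figure quoted in this post. It is a minimal Python sketch, not BOINC code: `parse_dcf` and `WORK_FETCH_CUTOFF` are hypothetical names, and the cutoff is taken from the post rather than checked against the client source. In the sample line, adj_ratio matches raw_ratio divided by the old DCF.

```python
import re

# Cutoff value as quoted in the post; assumed here, not verified against BOINC.
WORK_FETCH_CUTOFF = 0.02

# Matches the "[dcf] DCF: old->new, raw_ratio X, adj_ratio Y" line from the log.
DCF_LINE = re.compile(
    r"\[dcf\] DCF: (?P<old>[\d.]+)->(?P<new>[\d.]+), "
    r"raw_ratio (?P<raw>[\d.]+), adj_ratio (?P<adj>[\d.]+)"
)

def parse_dcf(line):
    """Return the four numbers from a [dcf] log line as floats, or None if no match."""
    m = DCF_LINE.search(line)
    if m is None:
        return None
    return {name: float(value) for name, value in m.groupdict().items()}

line = ("18/09/2011 15:35:50 | SETI@home | [dcf] DCF: 0.016567->0.021427, "
        "raw_ratio 0.021427, adj_ratio 1.293360")
dcf = parse_dcf(line)
print("old below cutoff:", dcf["old"] < WORK_FETCH_CUTOFF)
print("new below cutoff:", dcf["new"] < WORK_FETCH_CUTOFF)
print("raw_ratio / old DCF:", round(dcf["raw"] / dcf["old"], 4))
```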

In about 20 minutes, the optimised app APR kicks in, with tasks given twice the stock speed estimate. That'll do nicely, and I've got a good big run of shorties lined up (the best part of 200) - they'll be reporting, one or two every six minutes, all evening I reckon.

That's why the "results returned per hour" is so high. Stock crunchers, and the people with lesser CUDA cards, are having a field-day with a download pipe mercifully clear of AP, and plenty of shorties between the VLARs.
ID: 1153602 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13727
Credit: 208,696,464
RAC: 304
Australia
Message 1153661 - Posted: 18 Sep 2011, 18:46:27 UTC - in response to Message 1153589.  

It's not just a client problem. Even if you request work and it's available, it's not being assigned. It just keeps sending 0 or 1 even if you are drained.

My problem isn't getting 1 or 2 for the GPU, it's getting any at all. Sometimes I get a couple, sometimes a dozen, sometimes a couple of dozen. But invariably they're all crunched before I can download any more.
"No tasks sent" is the usual message, but there are plenty of "Project has no tasks available" there as well.
Grant
Darwin NT
ID: 1153661 · Report as offensive
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 1153666 - Posted: 18 Sep 2011, 19:13:46 UTC - in response to Message 1153450.  

When BOINC asks for new work, the return message has been "no work available" for AP since last Tuesday.

Either the estimated completion times for new AP work are so huge you can only get a couple at a time, or something got borked with the Scheduler when they put in the patch. The AP Ready to Send buffer just continues to grow, as the Work in Progress gets less & less.

I'm thinking this is probably the case. Does anyone with a nearly-empty 10+10 cache of AP-only get any new APs? I know the ETA would be astronomical, but if you can normally run through one in ~15 hours, surely you should be able to pick up at least one with a 480-hour work request, unless the ETA is up by 30x.
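The arithmetic behind that can be sketched in a few lines of Python. It assumes the scheduler simply fills the request with whole tasks by estimated run time, which is a simplification rather than a description of the actual scheduler, and `tasks_that_fit` is a made-up name. With the figures in the post (a 10+10 day cache, so 20 × 24 = 480 hours, and about 15 hours per AP), one task still fits at a 30x estimate (450 hours); the request would only come up empty once the estimate passes 32x.

```python
def tasks_that_fit(request_hours, task_hours, estimate_multiplier=1.0):
    """Whole tasks that fit in a work request, given an inflated run-time estimate.

    Simplified model for illustration: ignores quotas, deadlines and DCF.
    """
    estimated_hours = task_hours * estimate_multiplier
    return int(request_hours // estimated_hours)

request_hours = (10 + 10) * 24  # 10+10 day cache from the post, i.e. 480 hours
for multiplier in (1, 30, 32, 33):
    print(multiplier, tasks_that_fit(request_hours, 15, multiplier))
```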

If it's not that, then feeder is borked.
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)
ID: 1153666 · Report as offensive
soft^spirit
Joined: 18 May 99
Posts: 6497
Credit: 34,134,168
RAC: 0
United States
Message 1153680 - Posted: 18 Sep 2011, 20:11:46 UTC - in response to Message 1153553.  

It is on automatic, I have not messed with it. Machine is not having "too full" issues at all. Just getting measly amounts of work irregularly. I keep asking, server keeps saying 0. or 1. Occasional 30-40.. after it has gone completely dry again.

I wasn't suggesting you had fiddled with it, I was just asking what it is.

It could be that stopping BOINC, editing DCF to a realistic value, then re-starting BOINC might fix it.

I'm not having a problem d/loading, not as many as normal agreed, but getting more than you. But that is due to the, now, publicised problem I am having with the AP APR. Because as soon as an AP task completes it punches my DCF up to 1.5 min.


I see no flops entry in the app_info.
Janice
ID: 1153680 · Report as offensive
Claggy
Volunteer tester

Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1153721 - Posted: 18 Sep 2011, 23:37:54 UTC

Uploads have dropped to Zero.

Claggy
ID: 1153721 · Report as offensive
Kevin Olley

Joined: 3 Aug 99
Posts: 906
Credit: 261,085,289
RAC: 572
United Kingdom
Message 1153722 - Posted: 18 Sep 2011, 23:39:57 UTC

Uploads have stalled and server status page is not updating.


Kevin


ID: 1153722 · Report as offensive
Akio
Joined: 18 May 11
Posts: 375
Credit: 32,129,242
RAC: 0
United States
Message 1153723 - Posted: 18 Sep 2011, 23:40:30 UTC - in response to Message 1153721.  

Aye. Uploads are nil. Cricket has plummeted.
ID: 1153723 · Report as offensive
Robert Pick
Joined: 21 May 05
Posts: 11
Credit: 6,592,540
RAC: 18
United States
Message 1153724 - Posted: 18 Sep 2011, 23:40:55 UTC

Same here!!!!!
ID: 1153724 · Report as offensive
Wembley
Volunteer tester
Joined: 16 Sep 09
Posts: 429
Credit: 1,844,293
RAC: 0
United States
Message 1153726 - Posted: 18 Sep 2011, 23:48:29 UTC

Yay! The upload server has died again! Which means my BOINC will soon stop requesting work because of the 2*numprocessors limit!
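Roughly, the rule being referred to looks like the Python sketch below. The function name and the strict greater-than comparison are assumptions for illustration; the post only gives the 2*numprocessors figure, and the real client logic may differ in detail.

```python
def work_fetch_blocked(pending_uploads, num_processors):
    """True when stalled uploads exceed the 2 * numprocessors limit mentioned above.

    Hypothetical helper: the exact comparison used by the client is assumed.
    """
    return pending_uploads > 2 * num_processors

# Example: a 4-processor host with 9 results stuck uploading.
print(work_fetch_blocked(pending_uploads=9, num_processors=4))
```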

ID: 1153726 · Report as offensive
arkayn
Volunteer tester
Joined: 14 May 99
Posts: 4438
Credit: 55,006,323
RAC: 0
United States
Message 1153727 - Posted: 18 Sep 2011, 23:55:38 UTC - in response to Message 1153680.  

It is on automatic, I have not messed with it. Machine is not having "too full" issues at all. Just getting measly amounts of work irregularly. I keep asking, server keeps saying 0. or 1. Occasional 30-40.. after it has gone completely dry again.

I wasn't suggesting you had fiddled with it, I was just asking what it is.

It could be that stopping BOINC, editing DCF to a realistic value, then re-starting BOINC might fix it.

I'm not having a problem d/loading, not as many as normal agreed, but getting more than you. But that is due to the, now, publicised problem I am having with the AP APR. Because as soon as an AP task completes it punches my DCF up to 1.5 min.


I see no flops entry in the app_info.


You will have to manually add the info.
http://setiathome.berkeley.edu/forum_thread.php?id=62293#1055179

Seems to be fairly close after I changed my DCF back to 1.000000 again.

ID: 1153727 · Report as offensive
Iona
Joined: 12 Jul 07
Posts: 790
Credit: 22,438,118
RAC: 0
United Kingdom
Message 1153728 - Posted: 18 Sep 2011, 23:59:45 UTC

Is it time for that dreaded water-fowl to present itself?



Don't take life too seriously, as you'll never come out of it alive!
ID: 1153728 · Report as offensive
W5DMG - Dave

Joined: 19 May 99
Posts: 155
Credit: 33,162,251
RAC: 0
United States
Message 1153740 - Posted: 19 Sep 2011, 0:59:42 UTC - in response to Message 1153728.  

Uploads not working.. :(
ID: 1153740 · Report as offensive
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 1153750 - Posted: 19 Sep 2011, 1:39:42 UTC

Maybe uploads died because APs aren't being handed out and the storage got full? That's happened numerous times. Or did they go and put uploads and WU storage on separate volumes? I thought I remembered reading something about that.
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)
ID: 1153750 · Report as offensive
.clair.

Joined: 4 Nov 04
Posts: 1300
Credit: 55,390,408
RAC: 69
United Kingdom
Message 1153751 - Posted: 19 Sep 2011, 1:48:09 UTC - in response to Message 1153728.  

Is it time for that dreaded water-fowl to present itself?


If it's the one I think you mean, our fowl watery fiend only comes out to play when the grass is green :¬)

But I think our crickets have turned into locusts and there will be nothing green left before long, the uploads line seems to have crashed :¬(
ID: 1153751 · Report as offensive
kittyman Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1153765 - Posted: 19 Sep 2011, 3:32:36 UTC

I am actually surprised it held as long as it did.
Luckily tomorrow is Monday, and somebody should be in the lab to set things upright again....
Then we wait for the Boinc server code to be straightened out.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1153765 · Report as offensive
Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 34255
Credit: 79,922,639
RAC: 80
Germany
Message 1153776 - Posted: 19 Sep 2011, 4:06:47 UTC

Astropulses are turned off and the servers don't survive the weekend.
I'm wondering.......



With each crime and every kindness we birth our future.
ID: 1153776 · Report as offensive
W-K 666 Project Donor
Volunteer tester

Joined: 18 May 99
Posts: 19043
Credit: 40,757,560
RAC: 67
United Kingdom
Message 1153802 - Posted: 19 Sep 2011, 6:13:25 UTC - in response to Message 1153776.  

Astropulses are turned off and the servers don't survive the weekend.
I'm wondering.......

Too many tasks stored in the database, maybe.
Lots more MB tasks have been downloaded and returned, and a high number of APs have not even been sent out.
ID: 1153802 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13727
Credit: 208,696,464
RAC: 304
Australia
Message 1153824 - Posted: 19 Sep 2011, 10:12:01 UTC - in response to Message 1153802.  


The MB Assimilators didn't appear to be working- the backlog was growing minute by minute.
Grant
Darwin NT
ID: 1153824 · Report as offensive
geoff

Joined: 25 Apr 00
Posts: 123
Credit: 34,100,351
RAC: 18
United Kingdom
Message 1153829 - Posted: 19 Sep 2011, 10:36:27 UTC

I am in the process of downloading 6 AP WUs with completion times of 10x. I thought AP downloads were turned off.
ID: 1153829 · Report as offensive


 