Panic Mode On (54) Server problems?

IFRS
Volunteer tester
Joined: 21 May 99
Posts: 1731
Credit: 258,892,465
RAC: 0
Brazil
Message 1153589 - Posted: 18 Sep 2011, 14:06:44 UTC

It's not just a client problem. Even if you request work and it's available, it's not being assigned. It keeps sending just 0 or 1 even if you are drained.

Khangollo
Joined: 1 Aug 00
Posts: 245
Credit: 36,410,524
RAC: 0
Slovenia
Message 1153592 - Posted: 18 Sep 2011, 14:21:48 UTC
Last modified: 18 Sep 2011, 14:22:58 UTC

I wouldn't be surprised if a lot of feeder slots are permanently occupied by APs no one wants to have (because no doubt AP task duration is badly F-ed up, too).

WinterKnight
Volunteer tester
Joined: 18 May 99
Posts: 10194
Credit: 30,545,402
RAC: 3,336
United Kingdom
Message 1153597 - Posted: 18 Sep 2011, 14:28:55 UTC
Last modified: 18 Sep 2011, 14:29:18 UTC

If you are only getting 1 or 2 tasks per request when you have a large shortfall, then take note of Richard's post 1153104 in request issues:

I'm pretty sure that request for 1 second, when there's a near-30,000 second shortfall, is a DCF safety.

Edit - confirmed: I edited DCF by a factor of ten - took out a zero, so 0.013... became 0.13...

Tutankhamon "Communist"
Volunteer tester
Joined: 1 Nov 08
Posts: 6091
Credit: 37,739,432
RAC: 17,815
Sweden
Message 1153600 - Posted: 18 Sep 2011, 14:47:16 UTC - in response to Message 1152837.  
Last modified: 18 Sep 2011, 14:50:17 UTC

Asks for work for days, get nothing, still happy, refuse to whine :-)

END

Edit, added: Come to think of it, if I don't get any work by Sunday evening (local time), I might start some mini whining.



So there, it's Sunday evening and still not one single AP received.

As promised: Whine, whine, whine.


LOL
This is a test of the Emergency Moron System. Had there been a real moron in the room, there would've been a small mushroom cloud in the place where the idiot had been standing.

Richard Haselgrove
Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 11143
Credit: 83,851,479
RAC: 46,532
United Kingdom
Message 1153602 - Posted: 18 Sep 2011, 14:52:23 UTC - in response to Message 1153597.  

And conveniently, just as I was reading that, a host I've been monitoring confirmed it:

18/09/2011 15:35:36 | | [work_fetch] NVIDIA GPU: shortfall 132825.49 nidle 0.00 saturated 44294.51 busy 0.00 RS fetchable 100.00 runnable 100.00
18/09/2011 15:35:36 | SETI@home | [work_fetch] NVIDIA GPU: fetch share 1.00 LTD 0.00 backoff dt 0.00 int 0.00
18/09/2011 15:35:36 | | [work_fetch] No project chosen for work fetch
18/09/2011 15:35:50 | SETI@home | Computation for task 12jl11ad.18873.476.9.10.189_0 finished
18/09/2011 15:35:50 | SETI@home | [dcf] DCF: 0.016567->0.021427, raw_ratio 0.021427, adj_ratio 1.293360
18/09/2011 15:36:02 | | [work_fetch] NVIDIA GPU: shortfall 119854.60 nidle 0.00 saturated 57265.40 busy 0.00 RS fetchable 100.00 runnable 100.00
18/09/2011 15:36:02 | SETI@home | [work_fetch] request: CPU (0.00 sec, 0.00 inst) NVIDIA GPU (119854.60 sec, 0.00 inst)
18/09/2011 15:36:02 | SETI@home | Reporting 10 completed tasks, requesting new tasks for NVIDIA GPU
18/09/2011 15:36:02 | SETI@home | [sched_op] CPU work request: 0.00 seconds; 0.00 CPUs
18/09/2011 15:36:02 | SETI@home | [sched_op] NVIDIA GPU work request: 119854.60 seconds; 0.00 GPUs
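The [dcf] line in the log above can be sanity-checked with a few lines of Python. This is only a sketch: it assumes adj_ratio is simply raw_ratio divided by the previous DCF, which is an inference from the numbers in this one log line and not something taken from the BOINC source. The regular expression and the name parse_dcf_line are made up for illustration.

```python
import re

# Hypothetical pattern for the "[dcf]" line format seen in the log above.
DCF_LINE = re.compile(
    r"\[dcf\] DCF: (?P<old>[\d.]+)->(?P<new>[\d.]+), "
    r"raw_ratio (?P<raw>[\d.]+), adj_ratio (?P<adj>[\d.]+)"
)

def parse_dcf_line(line):
    """Return the four numbers from a [dcf] log line as floats, or None."""
    m = DCF_LINE.search(line)
    if m is None:
        return None
    return {key: float(value) for key, value in m.groupdict().items()}

line = ("18/09/2011 15:35:50 | SETI@home | [dcf] DCF: 0.016567->0.021427, "
        "raw_ratio 0.021427, adj_ratio 1.293360")
d = parse_dcf_line(line)

# Assumption: adj_ratio = raw_ratio / previous DCF.
implied_adj = d["raw"] / d["old"]
print(f"DCF {d['old']} -> {d['new']}, implied adj_ratio {implied_adj:.4f}, "
      f"logged adj_ratio {d['adj']:.4f}")
```

The implied and logged values agree to within the rounding of the printed figures, which is consistent with the assumption but does not prove it.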

I reckon this weekend goes down as 'revenge of the little guys'.

That host is a 9800GT - as you see, it's teetering above and below the 0.02 DCF 'work fetch' cutoff value - I'm still clearing out some work assigned with stock estimates, VHAR drives DCF below 0.02, mid-AR takes it back above.
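A rough Python sketch of the DCF safety discussed in this thread: when DCF is below the cutoff, the client asks for a token 1 second instead of the real shortfall. The 0.02 threshold, the 1-second request, the near-30,000 second shortfall and the 0.013 / 0.13 DCF values are all taken from posts in this thread; the function name and structure are hypothetical, and the real client certainly has more conditions than this.

```python
DCF_CUTOFF = 0.02          # 'work fetch' cutoff value quoted in this thread
TOKEN_REQUEST_SECS = 1.0   # the "request for 1 second" seen when DCF is too low

def work_request_secs(shortfall_secs, dcf):
    """Seconds of work to request (hypothetical helper, not BOINC's real code)."""
    if shortfall_secs <= 0:
        return 0.0
    if dcf < DCF_CUTOFF:
        # Runtime estimates are treated as unreliable: make only a token request.
        return TOKEN_REQUEST_SECS
    return shortfall_secs

# Before and after editing DCF by a factor of ten, as described earlier.
print(work_request_secs(30000.0, 0.013))   # below the cutoff
print(work_request_secs(30000.0, 0.13))    # above the cutoff
```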

In about 20 minutes, the optimised app APR kicks in, with tasks given twice the stock speed estimate. That'll do nicely, and I've got a good big run of shorties lined up (the best part of 200) - they'll be reporting, one or two every six minutes, all evening I reckon.

That's why the "results returned per hour" is so high. Stock crunchers, and the people with lesser CUDA cards, are having a field-day with a download pipe mercifully clear of AP, and plenty of shorties between the VLARs.

Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 7495
Credit: 91,182,298
RAC: 46,074
Australia
Message 1153661 - Posted: 18 Sep 2011, 18:46:27 UTC - in response to Message 1153589.  

It's not just a client problem. Even if you request work and it's available, it's not being assigned. It keeps sending just 0 or 1 even if you are drained.

My problem isn't getting 1 or 2 for the GPU, it's getting any at all. Sometimes I get a couple, sometimes a dozen, sometimes a couple of dozen. But invariably they're all crunched before I can download any more.
"No tasks sent" is the usual message, but there are plenty of "Project has no tasks available" there as well.
Grant
Darwin NT

Cosmic_Ocean
Joined: 23 Dec 00
Posts: 2871
Credit: 10,622,529
RAC: 334
United States
Message 1153666 - Posted: 18 Sep 2011, 19:13:46 UTC - in response to Message 1153450.  

When BOINC asks for new work, the return message has been "no work available" for AP since last Tuesday.

Either the estimated completion times for new AP work are so huge you can only get a couple at a time, or something got borked with the Scheduler when they put in the patch. The AP Ready to Send buffer just continues to grow, as the Work in Progress gets less & less.

I'm thinking this is probably the case. Does anyone with a nearly-empty 10+10 cache of AP-only get any new APs? I know the ETA would be astronomical, but if you can normally run through one in ~15 hours, surely you should be able to pick up at least one with a 480-hour work request, unless the ETA is up by 30x.

If it's not that, then feeder is borked.
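The arithmetic behind that guess, as a quick Python sketch. It assumes the "10+10" cache means 10 + 10 days of work (hence the 480-hour request) and that a task is only sent if its estimated runtime fits inside the request. The helper name and that fit rule are simplifying assumptions, not confirmed scheduler behaviour.

```python
def max_inflation_for_one_task(request_hours, real_task_hours):
    """Largest ETA multiplier at which a single task still fits in the request."""
    return request_hours / real_task_hours

cache_days = 10 + 10                  # the "10+10" cache, read as days
request_hours = cache_days * 24       # size of the work request in hours
limit = max_inflation_for_one_task(request_hours, 15)   # ~15 h per AP task

print(f"request: {request_hours} h; one 15 h AP fits "
      f"unless the ETA is inflated by more than {limit:.0f}x")
```

Under those assumptions the break-even is 32x, so the "30x" above is in the right ballpark.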
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving up)

soft^spirit
Joined: 18 May 99
Posts: 6438
Credit: 31,852,385
RAC: 6,801
United States
Message 1153680 - Posted: 18 Sep 2011, 20:11:46 UTC - in response to Message 1153553.  

It is on automatic, I have not messed with it. Machine is not having "too full" issues at all. Just getting measly amounts of work irregularly. I keep asking, server keeps saying 0. or 1. Occasional 30-40.. after it has gone completely dry again.

I wasn't suggesting you had fiddled with it, I was just asking what it is.

It could be that stopping BOINC, editing DCF to a realistic value, then re-starting BOINC might fix it.

I'm not having a problem d/loading, not as many as normal agreed, but getting more than you. But that is due to the, now, publicised problem I am having with the AP APR. Because as soon as an AP task completes it punches my DCF up to 1.5 min.


I see no flops entry in the app_info.

Janice

Claggy
Project Donor
Volunteer tester
Joined: 5 Jul 99
Posts: 4623
Credit: 46,353,695
RAC: 2,927
United Kingdom
Message 1153721 - Posted: 18 Sep 2011, 23:37:54 UTC

Uploads have dropped to Zero.

Claggy

Kevin Olley
Joined: 3 Aug 99
Posts: 502
Credit: 46,959,610
RAC: 13,286
United Kingdom
Message 1153722 - Posted: 18 Sep 2011, 23:39:57 UTC

Uploads have stalled and server status page is not updating.


Kevin



Akio
Project Donor
Joined: 18 May 11
Posts: 373
Credit: 26,523,588
RAC: 12,894
United States
Message 1153723 - Posted: 18 Sep 2011, 23:40:30 UTC - in response to Message 1153721.  

Aye. Uploads are nil. Cricket has plummeted.

Robert Pick
Joined: 21 May 05
Posts: 11
Credit: 3,350,959
RAC: 786
United States
Message 1153724 - Posted: 18 Sep 2011, 23:40:55 UTC

Same here!!!!!

Wembley
Volunteer tester
Joined: 16 Sep 09
Posts: 429
Credit: 1,613,642
RAC: 1,363
United States
Message 1153726 - Posted: 18 Sep 2011, 23:48:29 UTC

Yay! The upload server has died again! Which means my BOINC will soon stop requesting work because of the 2*numprocessors limit!
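As a sketch of the limit mentioned here (Python, hypothetical names): the client stops asking for work once the number of results stuck uploading passes twice the processor count. Only the 2*numprocessors figure comes from this post; whether the real comparison is strict and how GPUs are counted are assumptions.

```python
def work_fetch_blocked_by_uploads(n_uploading, n_processors):
    """True if pending uploads exceed 2*numprocessors (strict '>' is assumed)."""
    return n_uploading > 2 * n_processors

# Illustrative numbers only: a quad-core host with a growing upload backlog.
for pending in (4, 8, 9):
    blocked = work_fetch_blocked_by_uploads(pending, 4)
    print(f"{pending} uploads pending -> work fetch blocked: {blocked}")
```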


arkayn
Project Donor
Volunteer tester
Joined: 14 May 99
Posts: 4098
Credit: 51,576,341
RAC: 968
United States
Message 1153727 - Posted: 18 Sep 2011, 23:55:38 UTC - in response to Message 1153680.  

It is on automatic, I have not messed with it. Machine is not having "too full" issues at all. Just getting measly amounts of work irregularly. I keep asking, server keeps saying 0. or 1. Occasional 30-40.. after it has gone completely dry again.

I wasn't suggesting you had fiddled with it, I was just asking what it is.

It could be that stopping BOINC, editing DCF to a realistic value, then re-starting BOINC might fix it.

I'm not having a problem d/loading, not as many as normal agreed, but getting more than you. But that is due to the, now, publicised problem I am having with the AP APR. Because as soon as an AP task completes it punches my DCF up to 1.5 min.


I see no flops entry in the app_info.


You will have to manually add the info.
http://setiathome.berkeley.edu/forum_thread.php?id=62293#1055179

Seems to be fairly close after I changed my DCF back to 1.000000 again.
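For context, a hedged Python sketch of why the flops entry and DCF interact. It assumes the commonly described client estimate, runtime = rsc_fpops_est / flops * DCF. Every number below is invented for illustration and is not taken from a real task or app_info.xml.

```python
def estimated_runtime_secs(rsc_fpops_est, flops, dcf=1.0):
    """Estimated runtime under the assumed formula fpops_est / flops * DCF."""
    return rsc_fpops_est / flops * dcf

fpops_est = 3.0e13      # invented task size in floating-point operations
real_flops = 1.0e10     # invented true speed of the optimised app
wrong_flops = 1.0e9     # invented too-low speed used when no flops entry is set

# Without a sensible flops value the estimate is ten times too long, and DCF
# has to drift down to 0.1 to compensate.
print(estimated_runtime_secs(fpops_est, wrong_flops))        # inflated estimate
print(estimated_runtime_secs(fpops_est, wrong_flops, 0.1))   # DCF compensating

# With a realistic flops value, DCF can stay at 1.0.
print(estimated_runtime_secs(fpops_est, real_flops, 1.0))
```

The point of the sketch is only that a realistic flops figure lets DCF sit near 1.0 instead of absorbing the whole error.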


Iona
Joined: 12 Jul 07
Posts: 625
Credit: 5,015,755
RAC: 0
United Kingdom
Message 1153728 - Posted: 18 Sep 2011, 23:59:45 UTC

Is it time for that dreaded water-fowl to present itself?



Don't take life too seriously, as you'll never come out of it alive!

W5DMG - Dave
Joined: 19 May 99
Posts: 155
Credit: 33,162,251
RAC: 0
United States
Message 1153740 - Posted: 19 Sep 2011, 0:59:42 UTC - in response to Message 1153728.  

Uploads not working.. :(

Cosmic_Ocean
Joined: 23 Dec 00
Posts: 2871
Credit: 10,622,529
RAC: 334
United States
Message 1153750 - Posted: 19 Sep 2011, 1:39:42 UTC

Maybe uploads died because APs aren't being handed out and the storage got full? That's happened numerous times. Or did they go and put uploads and WU storage on separate volumes? I thought I remembered reading something about that.
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving up)

.clair.
Joined: 4 Nov 04
Posts: 1300
Credit: 39,919,434
RAC: 28,318
United Kingdom
Message 1153751 - Posted: 19 Sep 2011, 1:48:09 UTC - in response to Message 1153728.  

Is it time for that dreaded water-fowl to present itself?


If it's the one I think you mean, our fowl watery fiend only comes out to play when the grass is green :¬)

But I think our crickets have turned into locusts and there will be nothing green left before long, the uploads line seems to have crashed :¬(

kittyman
Project Donor
Volunteer tester
Joined: 9 Jul 00
Posts: 45949
Credit: 815,454,810
RAC: 124,547
United States
Message 1153765 - Posted: 19 Sep 2011, 3:32:36 UTC

I am actually surprised it held as long as it did.
Luckily tomorrow is Monday, and somebody should be in the lab to set things upright again....
Then we wait for the Boinc server code to be straightened out.
Always remember.....kitties are all Angels with fur.

Have made friends in this life.
Most were cats.

Mike
Volunteer tester
Joined: 17 Feb 01
Posts: 29588
Credit: 49,133,116
RAC: 17,246
Germany
Message 1153776 - Posted: 19 Sep 2011, 4:06:47 UTC

Astropulses are turned off and the servers don't survive the weekend.
I'm wondering...

With each crime and every kindness we birth our future.
 
©2016 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.