Short estimated runtimes - don't panic

Message boards : Number crunching : Short estimated runtimes - don't panic

red-ray
Joined: 24 Jun 99
Posts: 308
Credit: 9,029,848
RAC: 0
United Kingdom
Message 1222765 - Posted: 24 Apr 2012, 11:51:10 UTC - in response to Message 1222731.  
Last modified: 24 Apr 2012, 11:54:32 UTC

Most of those were run on your GT 520 - I forget where that comes in the speed range. And all of them have 'difficult' ARs, which extend the runtime a long way beyond expectations.

In your rather specialised environment (2 x GTX 460, GT 430, GT 520), you may have to help BOINC out by setting a <flops> value closer to the speed of the slowest, rather than relying on the APR which will be heavily weighted by the two fast cards.

I had <flops> set to 6.0e09, and given that the GT 520 is rated at 156 GFLOPS, I would have thought that would be low enough. I have just changed it to 3.0e09, which is starting to move the DCF down generally. What value do you think I need to set, please?

Given that BOINC knows the peak GFLOPS for all the GPUs, I feel it should automatically allow for the slow ones. I suspect the code to fix this is at most 20 lines, and it would also give a steady DCF and better estimates. On my other system, with a GTX 680 and a GTX 460, the DCFs are 0.7 and 1.4, which would also be addressed by such a change.

24/04/2012 01:28:43 |  | NVIDIA GPU 0: GeForce GTX 460 (driver version 285.62, CUDA version 4.10, compute capability 2.1, 1024MB, 790MB available, 1025 GFLOPS peak)
24/04/2012 01:28:43 |  | NVIDIA GPU 1: GeForce GT  430 (driver version 285.62, CUDA version 4.10, compute capability 2.1,  512MB, 384MB available,  269 GFLOPS peak)
24/04/2012 01:28:43 |  | NVIDIA GPU 2: GeForce GTX 460 (driver version 285.62, CUDA version 4.10, compute capability 2.1, 1024MB, 835MB available, 1025 GFLOPS peak)
24/04/2012 01:28:43 |  | NVIDIA GPU 3: GeForce GT  520 (driver version 285.62, CUDA version 4.10, compute capability 2.1,  512MB, 362MB available,  156 GFLOPS peak)
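
For reference, a <flops> override of this kind goes inside the matching <app_version> block of app_info.xml (anonymous platform). The sketch below is only illustrative - the app name, version, plan class and file name are placeholders, not taken from any particular installation:

    <app_version>
        <app_name>setiathome_enhanced</app_name>
        <version_num>610</version_num>
        <plan_class>cuda_fermi</plan_class>
        <!-- flops is in floating-point operations per second, so 3.0e9 = 3 GFLOPS.
             BOINC divides a task's estimated fpops by this value to get the runtime
             estimate, so a lower figure produces longer (safer) estimates. -->
        <flops>3.0e9</flops>
        <coproc>
            <type>CUDA</type>
            <count>1</count>
        </coproc>
        <file_ref>
            <file_name>MB_CUDA_app.exe</file_name>
            <main_program/>
        </file_ref>
    </app_version>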
ID: 1222765
Richard Haselgrove
Volunteer tester
Joined: 4 Jul 99
Posts: 14679
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1222897 - Posted: 24 Apr 2012, 22:12:32 UTC

Looks like the APR adjustment has gone ahead as planned - I'm back in the land of the 36-second shorties. There'll just be a bit of heavy over-fetching for a day or so - apologies while we hog the bandwidth again.
ID: 1222897
SciManStev
Volunteer tester
Joined: 20 Jun 99
Posts: 6658
Credit: 121,090,076
RAC: 0
United States
Message 1222900 - Posted: 24 Apr 2012, 22:15:39 UTC - in response to Message 1222897.  

Looks like the APR adjustment has gone ahead as planned - I'm back in the land of the 36-second shorties. There'll just be a bit of heavy over-fetching for a day or so - apologies while we hog the bandwidth again.

Apologies are not required. I am so happy about your and LadyL's efforts to finally begin the process of restoring things to a more normal order.

Thank you!

Steve
Warning, addicted to SETI crunching!
Crunching as a member of GPU Users Group.
GPUUG Website
ID: 1222900
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13855
Credit: 208,696,464
RAC: 304
Australia
Message 1222927 - Posted: 24 Apr 2012, 23:15:28 UTC - in response to Message 1222897.  

Looks like the APR adjustment has gone ahead as planned - I'm back in the land of the 36-second shorties. There'll just be a bit of heavy over-fetching for a day or so

Did they up the server side limits again, or just tweak the APR numbers?

Grant
Darwin NT
ID: 1222927
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13855
Credit: 208,696,464
RAC: 304
Australia
Message 1223029 - Posted: 25 Apr 2012, 3:15:40 UTC - in response to Message 1222927.  


Doesn't appear any different to me - estimated times are about the same (maybe a few minutes less), and the server-side limits are still the same as before.
Grant
Darwin NT
ID: 1223029
soft^spirit
Joined: 18 May 99
Posts: 6497
Credit: 34,134,168
RAC: 0
United States
Message 1223044 - Posted: 25 Apr 2012, 4:22:20 UTC

http://setiathome.berkeley.edu/workunit.php?wuid=945683558
It is the exceptionally short deadlines that bother me more...


Janice
ID: 1223044
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13855
Credit: 208,696,464
RAC: 304
Australia
Message 1223049 - Posted: 25 Apr 2012, 4:31:42 UTC - in response to Message 1223044.  


Short deadlines?
The shortest deadline I've noticed is still 3 weeks away.

What you've linked to looks like VLARs that were allocated to the CPU, didn't get downloaded & became ghosts. When BOINC tried again, the server re-issued them, but to the GPU. GPU & VLAR don't go together, so it was an automatic timeout.
That's my guess anyway.
Grant
Darwin NT
ID: 1223049
LadyL
Volunteer tester
Joined: 14 Sep 11
Posts: 1679
Credit: 5,230,097
RAC: 0
Message 1223118 - Posted: 25 Apr 2012, 10:01:57 UTC - in response to Message 1223044.  

http://setiathome.berkeley.edu/workunit.php?wuid=945683558
It is the exceptionally short deadlines that bother me more..


Grant is correct. See http://lunatics.kwsn.net/1-discussion-forum/faq-read-only.msg47867.html#msg47867

You'll only see shorter estimates if your DCF was below 1, because of a fast GPU.

Limits on tasks in progress have to stay in place for a while longer, but then we'll lobby for their removal.
I'm not the Pope. I don't speak Ex Cathedra!
ID: 1223118
soft^spirit
Joined: 18 May 99
Posts: 6497
Credit: 34,134,168
RAC: 0
United States
Message 1223241 - Posted: 25 Apr 2012, 18:37:29 UTC - in response to Message 1223049.  


Short deadlines?
The shortest deadline I've noticed is still 3 weeks away.

What you've linked to looks like VLARs that were allocated to the CPU, didn't get downloaded & became ghosts. When BOINC tried again, the server re-issued them, but to the GPU. GPU & VLAR don't go together, so it was an automatic timeout.
That's my guess anyway.

If you will notice, it was sent approximately 10 minutes before expiration... and several at once. That was one example.

You will not find these in your BOINC client because they have already expired.
Janice
ID: 1223241
kittyman
Volunteer tester
Joined: 9 Jul 00
Posts: 51478
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1223574 - Posted: 26 Apr 2012, 14:38:41 UTC - in response to Message 1223118.  

http://setiathome.berkeley.edu/workunit.php?wuid=945683558
It is the exceptionally short deadlines that bother me more..


Grant is correct. See http://lunatics.kwsn.net/1-discussion-forum/faq-read-only.msg47867.html#msg47867

You'll only see shorter estimates if your DCF was below 1, because of a fast GPU.

Limits on tasks in progress have to stay in place for a while longer, but then we'll lobby for their removal.

Even if they would consider doubling the current limits for a while before removing them, it would be a welcome step towards normalcy.

Meow.
"Time is simply the mechanism that keeps everything from happening all at once."

ID: 1223574
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13855
Credit: 208,696,464
RAC: 304
Australia
Message 1224079 - Posted: 27 Apr 2012, 20:54:57 UTC - in response to Message 1223574.  


Has there been another change to the APR numbers?
Checked my machines this morning, and one of them had a DCF of 2. Edited it back to one and, lo and behold, the CPU & GPU estimated times to completion are pretty damn close to what they will actually be. Checked my second machine: DCF is still around 1, and for it too the CPU & GPU estimates are pretty damn close.
Last night they were still way out.
Grant
Darwin NT
ID: 1224079
Richard Haselgrove
Volunteer tester
Joined: 4 Jul 99
Posts: 14679
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1224095 - Posted: 27 Apr 2012, 21:36:56 UTC - in response to Message 1224079.  


Has there been another change to the APR numbers?
Checked my machines this morning, and one of them had a DCF of 2. Edited it back to one and, lo and behold, the CPU & GPU estimated times to completion are pretty damn close to what they will actually be. Checked my second machine: DCF is still around 1, and for it too the CPU & GPU estimates are pretty damn close.
Last night they were still way out.

I don't think so, and it's too late here to check now - I'll be crashing soon.

What can happen is that a CUDA GPU encounters a rare job which is close to, but not quite, VLAR. Such jobs take much longer than normal to process, and your local BOINC client - protectively - raises DCF in case there are more of the same in your cache. And because of BOINC's design, raising DCF raises the estimates for all jobs, whichever processor they're going to use.

Rogue tasks like this do tend to occur in clumps, so it may not help to reduce DCF manually like you did - it'll just get bumped up again when the next one finishes. But if you can see there is one task with an exceptionally long runtime - which will probably not be reported for a long time - and no more like it, then it may be worth doing. Whichever way you choose to play it, BOINC will probably adjust DCF back to normal levels - and that means 1.0000 or near enough - gradually over the course of the next 15-20 tasks.
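
For anyone who does want to nudge DCF by hand, it is stored per project in client_state.xml. A rough sketch of the relevant fragment, with surrounding tags trimmed and the value purely illustrative:

    <project>
        <master_url>http://setiathome.berkeley.edu/</master_url>
        ...
        <!-- Task estimates are multiplied by this factor. Stop BOINC before
             editing, change only the number (e.g. back towards 1.000000),
             then restart - the client will keep refining it from there. -->
        <duration_correction_factor>1.000000</duration_correction_factor>
        ...
    </project>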
ID: 1224095
Iona
Joined: 12 Jul 07
Posts: 790
Credit: 22,438,118
RAC: 0
United Kingdom
Message 1224544 - Posted: 28 Apr 2012, 19:51:34 UTC

Well, as I know only too well, according to S@H's wonderfully accurate figures, which were arrived at through some equally enlightened logic, my GPU has an alleged APR of 1390.280653876 for the Lunatics MB app, which probably makes it the fastest GPU that ATI (or anyone else) has ever produced! I admit that I do 'under-clock' it a bit when doing GPU work, but there is no way I can crank it up to about 10 GHz to complete any of the GPU work downloaded this week. I've got the latest version of BOINC and the latest Lunatics apps, and I'm now seeing estimated completion times of between 1 min 30 secs or so and 4 mins 30 secs or so. Needless to say, everything I have tried to complete has errored with the 'timed out' Exit status -177 (0xffffffffffffff4f) ERR_RSC_LIMIT_EXCEEDED. If work cannot be completed, then there is no chance of the overly high APR figure reaching its proper level. Therefore, respectfully, Captain Mainwaring, would you mind awfully if I joined Corporal Jones and panicked?

Yours etc, etc, Sgt. Wilson


Don't take life too seriously, as you'll never come out of it alive!
ID: 1224544
kittyman
Volunteer tester
Joined: 9 Jul 00
Posts: 51478
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1224546 - Posted: 28 Apr 2012, 19:59:27 UTC - in response to Message 1224544.  

Well, as I know only too well, according to S@H's wonderfully accurate figures, which were arrived at through some equally enlightened logic, my GPU has an alleged APR of 1390.280653876 for the Lunatics MB app, which probably makes it the fastest GPU that ATI (or anyone else) has ever produced! I admit that I do 'under-clock' it a bit when doing GPU work, but there is no way I can crank it up to about 10 GHz to complete any of the GPU work downloaded this week. I've got the latest version of BOINC and the latest Lunatics apps, and I'm now seeing estimated completion times of between 1 min 30 secs or so and 4 mins 30 secs or so. Needless to say, everything I have tried to complete has errored with the 'timed out' Exit status -177 (0xffffffffffffff4f) ERR_RSC_LIMIT_EXCEEDED. If work cannot be completed, then there is no chance of the overly high APR figure reaching its proper level. Therefore, respectfully, Captain Mainwaring, would you mind awfully if I joined Corporal Jones and panicked?

Yours etc, etc, Sgt. Wilson


And whose delightfully enlighten'd logic might you be referring to??

LO F'ing Loud.

Rich.
"Time is simply the mechanism that keeps everything from happening all at once."

ID: 1224546
Richard Haselgrove
Volunteer tester
Joined: 4 Jul 99
Posts: 14679
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1224549 - Posted: 28 Apr 2012, 20:09:33 UTC - in response to Message 1224544.  

Well, as I know only too well, according to S@H's wonderfully accurate figures, which were arrived at through some equally enlightened logic, my GPU has an alleged APR of 1390.280653876 for the Lunatics MB app, which probably makes it the fastest GPU that ATI (or anyone else) has ever produced! I admit that I do 'under-clock' it a bit when doing GPU work, but there is no way I can crank it up to about 10 GHz to complete any of the GPU work downloaded this week. I've got the latest version of BOINC and the latest Lunatics apps, and I'm now seeing estimated completion times of between 1 min 30 secs or so and 4 mins 30 secs or so. Needless to say, everything I have tried to complete has errored with the 'timed out' Exit status -177 (0xffffffffffffff4f) ERR_RSC_LIMIT_EXCEEDED. If work cannot be completed, then there is no chance of the overly high APR figure reaching its proper level. Therefore, respectfully, Captain Mainwaring, would you mind awfully if I joined Corporal Jones and panicked?

Yours etc, etc, Sgt. Wilson

LOL.

But if you wanted, I can think of one thing you could do: fool BOINC into thinking that you're trying to cheat.

Then, it would give you a new HostID, and a brand new APR record to train up from scratch.

The easiest way I know of deliberately changing the HostID is this:

Stop BOINC
Edit client_state.xml
Find the setiathome <project> section
About a dozen lines into it, find the tag <rpc_seqno>
Reduce the numeric value, and save the file
Restart BOINC
Update the project

You shouldn't need to re-install any programs - you'll still have Lunatics on board. But you should get fresh work, probably over-estimated at first, and the opportunity to return it without error.
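
To show roughly what to look for, the tag sits near the top of the SETI@home <project> block in client_state.xml - a sketch with illustrative values only:

    <project>
        <master_url>http://setiathome.berkeley.edu/</master_url>
        <project_name>SETI@home</project_name>
        ...
        <!-- Lower this value slightly (e.g. 1234 -> 1200) while BOINC is stopped;
             as described in the steps above, the next scheduler contact should
             then come back with a new HostID and a fresh APR record. -->
        <rpc_seqno>1234</rpc_seqno>
        ...
    </project>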
ID: 1224549
Iona
Joined: 12 Jul 07
Posts: 790
Credit: 22,438,118
RAC: 0
United Kingdom
Message 1224826 - Posted: 29 Apr 2012, 9:21:47 UTC - in response to Message 1224549.  

I had a feeling that's what I would have to do - in fact, I was looking around in the client state file last night with 'sabotage' in mind. I'll complete the CPU WUs and the GPU AP WU, do just that, and give the PC another name for good measure afterwards. On the other PC, which has the quad core in it, since MB and AP GPU work was only enabled via the Lunatics app approximately three months ago, there is no APR at all for any of the GPU apps... APR is not even mentioned! Needless to say, that machine has no problems at all. A pity about having to 'cheat' my way to a new Host ID - I'd almost got attached to the current one. Thanks awfully for your responses and solutions; I'm off down to the local to have a drink with Corporal Jones, the butcher - the poor chap has gone right off his head, yelling, "don't panic, don't panic, the weekend isn't over"! Whatever could he mean?

Yours etc, etc, respectfully, Sgt. F. Wilson (Home Guard, retd)


Don't take life too seriously, as you'll never come out of it alive!
ID: 1224826
kittyman
Volunteer tester
Joined: 9 Jul 00
Posts: 51478
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1224838 - Posted: 29 Apr 2012, 10:26:33 UTC - in response to Message 1224826.  

I had a feeling that's what I would have to do - in fact, I was looking around in the client state file last night with 'sabotage' in mind. I'll complete the CPU WUs and the GPU AP WU, do just that, and give the PC another name for good measure afterwards. On the other PC, which has the quad core in it, since MB and AP GPU work was only enabled via the Lunatics app approximately three months ago, there is no APR at all for any of the GPU apps... APR is not even mentioned! Needless to say, that machine has no problems at all. A pity about having to 'cheat' my way to a new Host ID - I'd almost got attached to the current one. Thanks awfully for your responses and solutions; I'm off down to the local to have a drink with Corporal Jones, the butcher - the poor chap has gone right off his head, yelling, "don't panic, don't panic, the weekend isn't over"! Whatever could he mean?

Yours etc, etc, respectfully, Sgt. F. Wilson (Home Guard, retd)


Ahh.....another of my dear gurly friends has posted.

How are Ya?

The kitties don't worry 'bout flips and flops.
And they seem to be doing just fine.
All time high RACs here.
What're we doing wrong? LOL.
"Time is simply the mechanism that keeps everything from happening all at once."

ID: 1224838
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13855
Credit: 208,696,464
RAC: 304
Australia
Message 1227469 - Posted: 5 May 2012, 0:27:01 UTC - in response to Message 1224838.  


Are we hopeful of another tweak to the APR numbers & the possibility of the server-side limits being raised again next week?
Grant
Darwin NT
ID: 1227469
kittyman
Volunteer tester
Joined: 9 Jul 00
Posts: 51478
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1227474 - Posted: 5 May 2012, 0:49:50 UTC - in response to Message 1227469.  


Are we hopeful of another tweak to the APR numbers & the possibility of the server-side limits being raised again next week?

IF they do another tweak to the APR, I suspect they would still have to wait at least another 2-3 weeks for hosts to settle before adjusting the limits.
"Time is simply the mechanism that keeps everything from happening all at once."

ID: 1227474
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13855
Credit: 208,696,464
RAC: 304
Australia
Message 1227481 - Posted: 5 May 2012, 1:19:54 UTC - in response to Message 1227474.  


Are we hopeful of another tweak to the APR numbers & the possibility of the server-side limits being raised again next week?

If they do another tweak to the APR, I suspect they would still have to wait at least another 2-3 weeks for hosts to settle before adjusting the limits.

But there was no adjustment to the server-side limits with the last tweak (which was very, very small), so I can't see any problem with doubling the present limits, or at the very least upping them by half.
Even if they were to triple the current limits, I couldn't imagine even the slowest of hosts running into missed-deadline issues.
Grant
Darwin NT
ID: 1227481