To Many ERRORS
Author | Message |
---|---|
.clair. Joined: 4 Nov 04 Posts: 1300 Credit: 55,390,408 RAC: 69 |
Every system has its own best setting. Allocate an extra thread every day or two until the errors stop, then work around that point. Are you using the -hp (-high_priority) switch in the app_info command line? |
skildude Joined: 4 Oct 00 Posts: 9541 Credit: 50,759,529 RAC: 60 |
No, I don't have -hp set. Would that solve some of the problem? In a rich man's house there is no place to spit but his face. Diogenes Of Sinope |
Mike Joined: 17 Feb 01 Posts: 34283 Credit: 79,922,639 RAC: 80 |
No. With each crime and every kindness we birth our future. |
Fred J. Verster Joined: 21 Apr 04 Posts: 3252 Credit: 31,903,643 RAC: 0 |
First of all you quoted my reply to skildude. Sorry for the misunderstanding. I tried different values for period_iterations (2, 4, 5, 8, 10, 12, 15, 16, 18, 20, 30, 40, 50), none of which helps, although GPU load is ~90%. So I'll free up 2 CPU cores for the GPUs and see how this behaves. It was running with period_iterations 50; I changed that back to 20 and will let it run until I see any change for the better. GPU use is ~95%. I also switched off all unnecessary programs, like CoreTemp and Clock. |
.clair. Joined: 4 Nov 04 Posts: 1300 Credit: 55,390,408 RAC: 69 |
@ Mike, I thought that the -hp switch was there to help the GPU grab some more CPU time: not a cure for the errors, but a little bit of help. |
Mike Joined: 17 Feb 01 Posts: 34283 Credit: 79,922,639 RAC: 80 |
Yes, that's true. But with the new syncing method inside the drivers it doesn't help with the low GPU usage bug. Let's say it doesn't hurt if you set it. I don't use it anymore on my card. With multiple cards it might help a little. With each crime and every kindness we birth our future. |
Fred J. Verster Joined: 21 Apr 04 Posts: 3252 Credit: 31,903,643 RAC: 0 |
@ Mike, I'm still using the Cat.12.4 driver and now using 2 cores for 2 GPUs, so it might help to use the -hp switch. GPU usage is ~95%. VLAR and ~0.4 AR WUs still give about 40% errors and take ~3500 seconds of runtime, plus 150 to 200 seconds on the CPU. By the way, the other 2 cores (that is, 4 threads) are doing 4 SETI MB WUs. |
Mike Joined: 17 Feb 01 Posts: 34283 Credit: 79,922,639 RAC: 80 |
It's getting a bit complicated now. The times are nearly normal for your cards. The question is at what percentage the units err. You can free up more cores until the errors stop. Also, modifying DCF in client_state.xml would help. But be careful: have you ever tried that? Stop BOINC first! How many PCI lanes does the second slot have? With each crime and every kindness we birth our future. |
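For anyone who has not edited the DCF before, here is a minimal sketch of the change Mike describes. It assumes the value sits in a `<duration_correction_factor>` element inside client_state.xml and that BOINC is fully stopped before the real file is touched. The helper name `set_dcf` is made up for illustration, and the sample text is a cut-down stand-in for the real file, using the DCF value reported later in this thread.

```python
import re

def set_dcf(client_state_xml: str, new_dcf: float = 1.0) -> str:
    """Return the XML text with every <duration_correction_factor> set to new_dcf.

    Hypothetical helper: it rewrites the value for ALL projects found in the text,
    so trim the input or adapt the pattern if only one project should change.
    """
    pattern = re.compile(
        r"(<duration_correction_factor>)[^<]*(</duration_correction_factor>)"
    )
    return pattern.sub(
        lambda m: f"{m.group(1)}{new_dcf:.6f}{m.group(2)}", client_state_xml
    )

# Cut-down stand-in for client_state.xml (assumed layout, not a full file).
sample = """<project>
    <duration_correction_factor>9.011675</duration_correction_factor>
</project>"""

patched = set_dcf(sample, 1.0)
print(patched)
```

The sketch works on a string on purpose: read the file, keep a backup copy, and only write the patched text back while the client is not running, otherwise the client may overwrite the change.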
skildude Joined: 4 Oct 00 Posts: 9541 Credit: 50,759,529 RAC: 60 |
BTW, my 7970 is completing non-blanked AP WUs in about 30-45 minutes, 3 at a time. My wingman's i7 980 took 100,000 seconds. In a rich man's house there is no place to spit but his face. Diogenes Of Sinope |
KneeDeep Joined: 27 Sep 99 Posts: 131 Credit: 4,887,778 RAC: 0 |
LadyL asked ... I don't see where this was answered and it seems the most likely reason to me. |
Mike Joined: 17 Feb 01 Posts: 34283 Credit: 79,922,639 RAC: 80 |
Downclocking is not an issue in that case. With each crime and every kindness we birth our future. |
Fred J. Verster Joined: 21 Apr 04 Posts: 3252 Credit: 31,903,643 RAC: 0 |
Gonna give it one more try with 1 instance_per_device and 1 core, out of 4, for the GPUs?! The error rate is still climbing and is a waste of resources, until I find what's going terribly wrong. Another thing: all 3 rigs are running in High Priority again and are also making errors, except the GTX470, which is running at 800MHz instead of 1400MHz. |
Mike Joined: 17 Feb 01 Posts: 34283 Credit: 79,922,639 RAC: 80 |
"Gonna give it one more try with 1 instance_per_device and 1 core, out of 4, for ..." Did you read my earlier comment? With each crime and every kindness we birth our future. |
Fred J. Verster Joined: 21 Apr 04 Posts: 3252 Credit: 31,903,643 RAC: 0 |
"Gonna give it one more try with 1 instance_per_device and 1 core, out of 4, for ..." @ Mike, about modifying DCF (to 1, for instance): yes, it's in the client_state.xml file. It looks like I found what caused these "time exceeded" errors: I have to use period_iterations 2(!), which also (?) produces almost no lag. Doing 2 instances_per_device gives a load of 98% (device 0) and a load swinging between 97% and 47% for device 1. Estimates were also shorter than the runtime, DCF=9.011675! CPU estimates are way off: a VHAR is estimated at 9000 seconds. Estimates on GPUs and runtime are more or less equal. 1 core (2 threads) for 2 GPUs appears to be enough: during the first 20 seconds this core is at 100%, then GPU load rises to 98%. I'll let it run for now; I have seen the first task validated after the change to p_i 2. If more failures occur, shall I change DCF to 1? |
skildude Joined: 4 Oct 00 Posts: 9541 Credit: 50,759,529 RAC: 60 |
Mike is correct: you need the 2 cores if you wish to run multiple WUs on the ATI GPU. I found this out the hard way. It works, just do it. In a rich man's house there is no place to spit but his face. Diogenes Of Sinope |
Mike Joined: 17 Feb 01 Posts: 34283 Credit: 79,922,639 RAC: 80 |
@ Fred Please answer a few questions first. What motherboard are you using? How many PCI-E slots? I will look into the details, but don't change so much. Period_iterations_num is fine when you don't have any lags. Don't change it anymore. At what percentage are the units erring (failing)? With each crime and every kindness we birth our future. |
Fred J. Verster Joined: 21 Apr 04 Posts: 3252 Credit: 31,903,643 RAC: 0 |
Mike, sorry for my late reply. I'm using an Intel DP67BG mobo with 2 PCIe 2.0 x16 slots, which run at x8 if both are used. Using period_iterations 32, which gives the least lag. Task results: Valid (213) · Invalid (0) · Error (166). (I even set the base clock from 100 to 102MHz, giving higher FLOPS from the CPU; maybe I should OC the GPUs?!) The errors all have: Exit status 197 (0xc5) EXIT_TIME_LIMIT_EXCEEDED. Also leaving 1 core (2 threads) free for the GPUs. |
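Mike's question about the failure percentage can be answered straight from the counts Fred posted. A small sketch, with the helper name `error_rate` made up for illustration:

```python
def error_rate(valid: int, invalid: int, error: int) -> float:
    """Percentage of finished tasks that ended in an error."""
    total = valid + invalid + error
    if total == 0:
        raise ValueError("no finished tasks to compute a rate from")
    return 100.0 * error / total

# Counts from Fred's task list: Valid (213), Invalid (0), Error (166).
rate = error_rate(valid=213, invalid=0, error=166)
print(f"{rate:.1f}% of finished tasks errored")
```

That works out to roughly 44%, in line with the "about 40% errors" Fred mentioned earlier.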
Mike Joined: 17 Feb 01 Posts: 34283 Credit: 79,922,639 RAC: 80 |
@Fred I have calculated something for you. Please add this line to your app_info, below your command line: <flops>509408724.212160</flops> This should increase estimates, reduce your DCF and limit the errors. Beware, it's a very low value on purpose. With each crime and every kindness we birth our future. |
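A rough sketch of why a deliberately low `<flops>` value can stop EXIT_TIME_LIMIT_EXCEEDED. It assumes, as far as I understand the BOINC client, that the runtime estimate is rsc_fpops_est / flops scaled by DCF, and that a task is aborted once its elapsed time passes rsc_fpops_bound / flops. The workunit bound and the "before" speed below are hypothetical numbers picked only for illustration; only the suggested flops value and the ~3500 second runtime come from this thread.

```python
def estimated_runtime(rsc_fpops_est: float, flops: float, dcf: float = 1.0) -> float:
    """Runtime estimate in seconds: work estimate / claimed speed, scaled by DCF."""
    return rsc_fpops_est / flops * dcf

def time_limit(rsc_fpops_bound: float, flops: float) -> float:
    """Elapsed seconds after which the task would be aborted (assumed rule)."""
    return rsc_fpops_bound / flops

RSC_FPOPS_BOUND = 1.0e13             # hypothetical workunit bound, not from the thread
FLOPS_BEFORE = 1.0e10                # hypothetical speed claimed before the change
FLOPS_SUGGESTED = 509408724.212160   # the value Mike posted
RUNTIME = 3500.0                     # seconds, runtime Fred reported for VLAR tasks

for label, flops in (("before", FLOPS_BEFORE), ("suggested", FLOPS_SUGGESTED)):
    limit = time_limit(RSC_FPOPS_BOUND, flops)
    verdict = "aborted" if RUNTIME > limit else "allowed to finish"
    print(f"{label}: limit {limit:.0f} s, a {RUNTIME:.0f} s task is {verdict}")
```

Both quantities scale with 1 / flops, so a lower claimed speed lengthens the estimate and the abort limit together, which matches Mike's "increase estimates ... limit the errors".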
Fred J. Verster Joined: 21 Apr 04 Posts: 3252 Credit: 31,903,643 RAC: 0 |
I'll try it. The value is 6x lower than the FLOPS (Whetstone) value of 1 CPU core, but since it's the CPU that has ~6 times higher estimates, it could/should work. Thanks for figuring this out. GPU estimates are within +/- 10% of actual runtime! Since I've set 1 instance_per_device and no_cpu_lock, together with 2 free cores each feeding a GPU, errors have stopped and runtimes have decreased to 50% compared to doing 2 instances_per_device. But APR doesn't change, so this might help. |
skildude Joined: 4 Oct 00 Posts: 9541 Credit: 50,759,529 RAC: 60 |
Here's how well my 7970 is working so far. It and the 6 cores of the FX-8150 are currently doubling the production of my second-best machine (AMD 630 w/ ATI 5850 GPU). I also play a lot of video games on the 7970 rig, so it ends up having less running time than my other rigs. Still a smashing success, and again a great big thanks to Mike. In a rich man's house there is no place to spit but his face. Diogenes Of Sinope |