Too Many ERRORS

Author	Message
.clair. Send message Joined: 4 Nov 04 Posts: 1300 Credit: 55,390,408 RAC: 69	Message 1263495 - Posted: 22 Jul 2012, 21:25:54 UTC Every system has its own best setting. Allocate an extra thread every day or two until the errors stop, then work around that point. Are you using the -hp (-high_priority) switch in the app_info command line? ID: 1263495 ·

skildude Send message Joined: 4 Oct 00 Posts: 9541 Credit: 50,759,529 RAC: 60	Message 1263552 - Posted: 23 Jul 2012, 1:53:07 UTC - in response to Message 1263495. No, I don't have -hp set. Would that solve some of the problem? In a rich man's house there is no place to spit but his face. Diogenes Of Sinope ID: 1263552 ·

Mike Volunteer tester Send message Joined: 17 Feb 01 Posts: 34283 Credit: 79,922,639 RAC: 80	Message 1263562 - Posted: 23 Jul 2012, 3:38:47 UTC No. With each crime and every kindness we birth our future. ID: 1263562 ·

Fred J. Verster Volunteer tester Send message Joined: 21 Apr 04 Posts: 3252 Credit: 31,903,643 RAC: 0	Message 1263613 - Posted: 23 Jul 2012, 9:10:51 UTC - in response to Message 1263463. Last modified: 23 Jul 2012, 10:09:48 UTC "First of all you quoted my reply to skildude. So I got confused. Anyway, I fear you need to free 1 physical core per GPU, not one thread. Please try it to see if this helps. It certainly should." Sorry for the misunderstanding. I tried different values for period_iterations (2, 4, 5, 8, 10, 12, 15, 16, 18, 20, 30, 40, 50), which doesn't help, although GPU load is ~90%. So I'll free up 2 CPU cores for the GPUs and see how this behaves. I had period_iterations at 50, changed it back to 20, and will let it run until I see any change for the better. GPU use is ~95%. I also switched off all unnecessary programs, like CoreTemp and Clock. ID: 1263613 ·

.clair. Send message Joined: 4 Nov 04 Posts: 1300 Credit: 55,390,408 RAC: 69	Message 1263628 - Posted: 23 Jul 2012, 10:55:56 UTC @ Mike, I thought that the -hp switch was to help the gpu grab some more cpu time, not as a cure for the errors but a little bit of help. ID: 1263628 ·

Mike Volunteer tester Send message Joined: 17 Feb 01 Posts: 34283 Credit: 79,922,639 RAC: 80	Message 1263648 - Posted: 23 Jul 2012, 12:10:26 UTC - in response to Message 1263628. "@ Mike, I thought that the -hp switch was to help the gpu grab some more cpu time, not as a cure for the errors but a little bit of help." Yes, that's true. But with the new syncing method inside the drivers it doesn't help with the low GPU usage bug. Let's say it doesn't hurt if you set it. I don't use it anymore on my card. With multiple cards it might help a little. With each crime and every kindness we birth our future. ID: 1263648 ·

Fred J. Verster Volunteer tester Send message Joined: 21 Apr 04 Posts: 3252 Credit: 31,903,643 RAC: 0	Message 1263723 - Posted: 23 Jul 2012, 15:24:29 UTC - in response to Message 1263648. Last modified: 23 Jul 2012, 15:32:17 UTC "'@ Mike, I thought that the -hp switch was to help the gpu grab some more cpu time, not as a cure for the errors but a little bit of help.' Yes, that's true. But with the new syncing method inside the drivers it doesn't help with the low GPU usage bug. Let's say it doesn't hurt if you set it. I don't use it anymore on my card. With multiple cards it might help a little." I'm still using the Cat. 12.4 driver and now using 2 cores for 2 GPUs; it might help to use the -hp switch. GPU usage is ~95%. VLAR and ~0.4 AR WUs still give about 40% errors and take ~3500 seconds of runtime, plus 150 to 200 seconds on the CPU. By the way, the other 2 cores (that is, 4 threads) are doing 4 SETI MB WUs. ID: 1263723 ·
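
Editor's note: the difference between freeing one thread and freeing one physical core can be put in numbers. The sketch below assumes a 4-core, 8-thread host (inferred from "the other 2 cores (that is, 4 threads)" in the post above) and assumes the cores are freed through a BOINC "use at most N% of the processors" style preference, which the thread does not actually state. The function name is made up for illustration.

```python
def percent_of_cpus_to_use(logical_cpus, threads_to_free):
    """Return the percentage of logical CPUs left for CPU tasks.

    Assumption: the value goes into a "use at most N% of the processors"
    style preference that counts logical CPUs (threads), not cores.
    """
    if not 0 <= threads_to_free <= logical_cpus:
        raise ValueError("threads_to_free must be between 0 and logical_cpus")
    return 100.0 * (logical_cpus - threads_to_free) / logical_cpus


LOGICAL_CPUS = 8      # assumed: 4 physical cores with 2 threads each
THREADS_PER_CORE = 2  # assumed: Hyper-Threading enabled

one_thread = percent_of_cpus_to_use(LOGICAL_CPUS, 1)
one_core = percent_of_cpus_to_use(LOGICAL_CPUS, 1 * THREADS_PER_CORE)
two_cores = percent_of_cpus_to_use(LOGICAL_CPUS, 2 * THREADS_PER_CORE)

print(f"free 1 thread: use {one_thread}% of the processors")
print(f"free 1 core:   use {one_core}% of the processors")
print(f"free 2 cores:  use {two_cores}% of the processors")
```

Under these assumptions, one physical core per GPU on a two-GPU host means half of the logical CPUs stay out of CPU work, which matches the 4 CPU tasks mentioned above.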

Mike Volunteer tester Send message Joined: 17 Feb 01 Posts: 34283 Credit: 79,922,639 RAC: 80	Message 1263752 - Posted: 23 Jul 2012, 16:43:33 UTC Last modified: 23 Jul 2012, 16:44:36 UTC It's getting a bit complicated now. The times are nearly normal for your cards. The question is at what percentage the units error out. You can free up more cores until the errors stop. Modifying DCF in client_state.xml would also help, but be careful: have you ever tried that? Stop BOINC first! How many PCI lanes does the second slot have? With each crime and every kindness we birth our future. ID: 1263752 ·

skildude Send message Joined: 4 Oct 00 Posts: 9541 Credit: 50,759,529 RAC: 60	Message 1263775 - Posted: 23 Jul 2012, 17:24:19 UTC Last modified: 23 Jul 2012, 17:26:20 UTC BTW, my 7970 is completing non-blanked AP WUs in about 30-45 minutes, 3 at a time. My wingman's i7 980 took 100,000 seconds. In a rich man's house there is no place to spit but his face. Diogenes Of Sinope ID: 1263775 ·

KneeDeep Send message Joined: 27 Sep 99 Posts: 131 Credit: 4,887,778 RAC: 0	Message 1263803 - Posted: 23 Jul 2012, 18:29:31 UTC - in response to Message 1262218. LadyL asked ... The card may be intermittently downclocking for some reason - any chance you can monitor that host to see if tasks are actually progressing and check the system for anomalies once a task goes past normal runtimes? I don't see where this was answered and it seems the most likely reason to me. ID: 1263803 ·

Mike Volunteer tester Send message Joined: 17 Feb 01 Posts: 34283 Credit: 79,922,639 RAC: 80	Message 1263827 - Posted: 23 Jul 2012, 20:04:00 UTC Downclocking is not an issue in that case. With each crime and every kindness we birth our future. ID: 1263827 ·

Fred J. Verster Volunteer tester Send message Joined: 21 Apr 04 Posts: 3252 Credit: 31,903,643 RAC: 0	Message 1263840 - Posted: 23 Jul 2012, 20:56:05 UTC - in response to Message 1263723. Gonna give it one more try with 1 instance_per_device and 1 core, out of 4, for the GPUs?! The error rate is still climbing and is a waste of resources, until I find what's going terribly wrong. Another thing: all 3 rigs are running in High Priority again and are also making errors, except the GTX470 running at 800MHz instead of 1400MHz. ID: 1263840 ·

Mike Volunteer tester Send message Joined: 17 Feb 01 Posts: 34283 Credit: 79,922,639 RAC: 80	Message 1263844 - Posted: 23 Jul 2012, 21:06:21 UTC - in response to Message 1263840. "Gonna give it one more try with 1 instance_per_device and 1 core, out of 4, for the GPUs?! The error rate is still climbing and is a waste of resources, until I find what's going terribly wrong. Another thing: all 3 rigs are running in High Priority again and are also making errors, except the GTX470 running at 800MHz instead of 1400MHz." Did you read my earlier comment? With each crime and every kindness we birth our future. ID: 1263844 ·

Fred J. Verster Volunteer tester Send message Joined: 21 Apr 04 Posts: 3252 Credit: 31,903,643 RAC: 0	Message 1264058 - Posted: 24 Jul 2012, 12:30:28 UTC - in response to Message 1263844. Last modified: 24 Jul 2012, 13:05:48 UTC "'Gonna give it one more try with 1 instance_per_device and 1 core, out of 4, for the GPUs?! The error rate is still climbing and is a waste of resources, until I find what's going terribly wrong. Another thing: all 3 rigs are running in High Priority again and are also making errors, except the GTX470 running at 800MHz instead of 1400MHz.' Did you read my earlier comment?" @ Mike: about modifying DCF, to 1 for instance, yes, it's in the client_state.xml file. It looks like I found what caused these time-exceeded errors: I have to use period_iterations 2(!), which also (?) produces almost no lag. Doing 2 instances_per_device gives a load of 98% (device 0) and a load swinging between 97% and 47% for device 1. Estimates were also shorter than the runtime, DCF=9.011675! CPU estimates are way off; a VHAR is estimated at 9000 seconds. Estimates on GPUs and runtimes are roughly equal. 1 core, 2 threads, for 2 GPUs appears to be enough; during the first 20 seconds this core is at 100%, then GPU load rises to 98%. I'll let it run for now; I have seen the first task validated after the change to p_i 2. If more failures occur, shall I change DCF to 1? ID: 1264058 ·
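
Editor's note: the DCF figure above explains why the CPU estimates look so far off. The sketch below assumes the behaviour of BOINC clients of that era, where a single per-project duration correction factor (DCF) multiplies the raw runtime estimate of every task, CPU and GPU alike. Only the DCF value and the 9000-second VHAR estimate come from the post; the function names are illustrative.

```python
def displayed_estimate(raw_seconds, dcf):
    """Estimate shown by the client: raw estimate scaled by the project DCF."""
    return raw_seconds * dcf


def raw_estimate(displayed_seconds, dcf):
    """Invert the scaling to recover the uncorrected estimate."""
    return displayed_seconds / dcf


DCF = 9.011675             # value reported in the post above
CPU_VHAR_DISPLAYED = 9000  # seconds, CPU VHAR estimate reported in the post

raw = raw_estimate(CPU_VHAR_DISPLAYED, DCF)
print(f"uncorrected CPU VHAR estimate: about {raw:.0f} s")
print(f"estimate with DCF reset to 1:  about {displayed_estimate(raw, 1.0):.0f} s")
```

If the assumption holds, the GPU tasks that overran their estimates pushed the shared DCF up to about 9, and the CPU tasks inherited the same multiplier even though their own raw estimates were reasonable.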

skildude Send message Joined: 4 Oct 00 Posts: 9541 Credit: 50,759,529 RAC: 60	Message 1264082 - Posted: 24 Jul 2012, 14:01:59 UTC - in response to Message 1264058. Mike is correct: you need the 2 cores if you wish to run multiple WUs on the ATI GPU. I found this out the hard way. It works, just do it. In a rich man's house there is no place to spit but his face. Diogenes Of Sinope ID: 1264082 ·

Mike Volunteer tester Send message Joined: 17 Feb 01 Posts: 34283 Credit: 79,922,639 RAC: 80	Message 1264084 - Posted: 24 Jul 2012, 14:05:24 UTC @ Fred Please answer a few questions first. What motherboard are you using? How many PCI-E slots? I will look into the details, but don't change so much. Period_iterations_num is fine when you don't have any lag. Don't change it anymore. At what percentage are the units erroring (failing)? With each crime and every kindness we birth our future. ID: 1264084 ·

Fred J. Verster Volunteer tester Send message Joined: 21 Apr 04 Posts: 3252 Credit: 31,903,643 RAC: 0	Message 1266158 - Posted: 31 Jul 2012, 11:06:08 UTC - in response to Message 1264084. Last modified: 31 Jul 2012, 11:11:25 UTC "@ Fred Please answer a few questions first. What motherboard are you using? How many PCI-E slots? I will look into the details, but don't change so much. Period_iterations_num is fine when you don't have any lag. Don't change it anymore. At what percentage are the units erroring (failing)?" Mike, sorry for my late reply. I'm using an Intel DP67BG mobo with 2 PCIe 2.0 x16 slots, running at x8 if both are used. Using period_iterations 32, giving the least lag. Errors: Valid (213) · Invalid (0) · Error (166). (I even set the base clock from 100 to 102MHz, giving higher FLOPS from the CPU; maybe I should OC the GPUs?!) The errors are all: Exit status 197 (0xc5) EXIT_TIME_LIMIT_EXCEEDED. I'm also leaving 1 core (2 threads) free for the GPUs. ID: 1266158 ·
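
Editor's note: the task counts in the post above answer the question about the failure percentage. A minimal calculation, using only the three counts reported there (the function name is illustrative):

```python
def error_rate_percent(valid, invalid, error):
    """Share of finished tasks that ended in an error, in percent."""
    total = valid + invalid + error
    if total == 0:
        raise ValueError("no finished tasks to compute a rate from")
    return 100.0 * error / total


# Counts reported in the post: Valid (213), Invalid (0), Error (166)
rate = error_rate_percent(valid=213, invalid=0, error=166)
print(f"error rate: {rate:.1f}%")  # 166 of 379 tasks
```

That works out to roughly 44%, in line with the "about 40% errors" reported earlier in the thread.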

Mike Volunteer tester Send message Joined: 17 Feb 01 Posts: 34283 Credit: 79,922,639 RAC: 80	Message 1267499 - Posted: 4 Aug 2012, 12:29:39 UTC Last modified: 4 Aug 2012, 12:32:25 UTC @Fred I have calculated something for you. Please add this line to your app_info: <flops>509408724.212160</flops> below your command line. This should increase estimates, reduce your DCF and limit the errors. Beware, it's a very low value on purpose. With each crime and every kindness we birth our future. ID: 1267499 ·
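
Editor's note: a rough sketch of why a deliberately low `<flops>` value should reduce EXIT_TIME_LIMIT_EXCEEDED errors. It assumes, as far as the BOINC client's behaviour is understood here, that the raw runtime estimate is the task's `rsc_fpops_est` divided by the app version's flops, and that the task is aborted once elapsed time exceeds `rsc_fpops_bound` divided by the same flops. The two fpops figures and the "higher" flops value are hypothetical placeholders, not values from this thread; only Mike's flops value is taken from the post.

```python
def seconds_for(fpops, flops):
    """Time in seconds to perform `fpops` operations at `flops` per second."""
    if flops <= 0:
        raise ValueError("flops must be positive")
    return fpops / flops


MIKE_FLOPS = 509408724.212160    # value suggested in the post above
HIGHER_FLOPS = 10 * MIKE_FLOPS   # hypothetical: an over-optimistic speed figure

RSC_FPOPS_EST = 2.0e13           # hypothetical task size estimate
RSC_FPOPS_BOUND = 2.0e14         # hypothetical abort bound (10x the estimate)

for label, flops in (("higher flops", HIGHER_FLOPS), ("Mike's flops", MIKE_FLOPS)):
    estimate = seconds_for(RSC_FPOPS_EST, flops)
    limit = seconds_for(RSC_FPOPS_BOUND, flops)
    print(f"{label}: estimate {estimate:,.0f} s, time limit {limit:,.0f} s")
```

Under these assumptions both the estimate and the abort limit scale inversely with the flops value, so lowering it lengthens the time a task may run before it is killed, and the longer estimates leave less for DCF to correct.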

Fred J. Verster Volunteer tester Send message Joined: 21 Apr 04 Posts: 3252 Credit: 31,903,643 RAC: 0	Message 1268349 - Posted: 6 Aug 2012, 12:28:31 UTC - in response to Message 1267499. Last modified: 6 Aug 2012, 12:37:27 UTC "@Fred I have calculated something for you. Please add this line to your app_info: <flops>509408724.212160</flops> below your command line. This should increase estimates, reduce your DCF and limit the errors. Beware, it's a very low value on purpose." I'll try it. The value is 6x lower than the FLOPS (Whetstone) value of 1 CPU core, but since it's the CPU that has ~6 times higher estimates, it could/should work. Thanks for figuring this out. GPU estimates are within +/- 10% of actual runtime! Since I've set 1 instance_per_device and no_cpu_lock together with 2 free cores, each feeding a GPU, errors have stopped and runtimes have decreased to 50% compared to doing 2 instances_per_device. But APR doesn't change, so this might help. ID: 1268349 ·

skildude Send message Joined: 4 Oct 00 Posts: 9541 Credit: 50,759,529 RAC: 60	Message 1268387 - Posted: 6 Aug 2012, 14:55:29 UTC - in response to Message 1268349. Here's how well my 7970 is working so far: it and the 6 cores of the FX-8150 are currently doubling the production of my second-best machine (AMD 630 w/ ATI 5850 GPU). I also play a lot of video games on the 7970 rig, so it ends up having less running time than my other rigs. Still a smashing success, and again a great big thanks to Mike. In a rich man's house there is no place to spit but his face. Diogenes Of Sinope ID: 1268387 ·
