Message boards :
Number crunching :
CUDA Versions
Mike (Joined: 17 Feb 01, Posts: 34559, Credit: 79,922,639, RAC: 80)

> Yes, I'm running all NVidia cards with the cmdline_switch examples suggested in the ReadMe according to the GPU capabilities now (both AP and mbcuda config files).

You use neither the tune switch nor use_sleep. Example: `-unroll 12 -ffa_block 8192 -ffa_block_fetch 4096 -tune 1 64 4 1 -use_sleep`. This should reduce CPU usage significantly and speed up processing time.

With each crime and every kindness we birth our future.
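For readers following along: that whole switch set goes into the Astropulse OpenCL command-line file as a single line. A sketch of the file's contents, assuming the NV filename mentioned later in this thread (ap_cmdline_win_x86_SSE2_OpenCL_NV.txt, placed in the project directory alongside app_info.xml):

```
-unroll 12 -ffa_block 8192 -ffa_block_fetch 4096 -tune 1 64 4 1 -use_sleep
```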
juan BFP (Joined: 16 Mar 07, Posts: 9786, Credit: 572,710,851, RAC: 3,799)

@Mike: I use `-unroll 12` on the 670/690/780. Did you suggest I use different settings for each model, or is 12 OK for all of them?
Mike (Joined: 17 Feb 01, Posts: 34559, Credit: 79,922,639, RAC: 80)

> @Mike: I use -unroll 12 on the 670/690/780 …

It's O.K. on all of them. Of course, you could increase the FFA settings on the 780 to 16384/8192.
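Spelled out, a 780-only command line with those larger FFA buffers would then read as follows. This assumes the two numbers are `-ffa_block` and `-ffa_block_fetch` respectively, matching the paired values used elsewhere in this thread; that reading is an interpretation, not confirmed by the poster:

```
-unroll 12 -ffa_block 16384 -ffa_block_fetch 8192 -tune 1 64 4 1 -use_sleep
```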
Falconfly (Joined: 5 Oct 99, Posts: 394, Credit: 18,053,892, RAC: 0)

> You use neither the tune switch nor use_sleep.

Alright, I'll give that combo a shot after reverting the GTX 750 Ti/GTX 780 to an older driver in a few hours. :) (Since my only GTX 780 is in a mixed system with one of the GTX 750 Tis, the 750s set the pace for all the tweaks I've used so far.)
juan BFP (Joined: 16 Mar 07, Posts: 9786, Credit: 572,710,851, RAC: 3,799)

@Mike: Thanks. And on the 670/690, keep the 12288/6144 or use a little less?
Falconfly (Joined: 5 Oct 99, Posts: 394, Credit: 18,053,892, RAC: 0)

Hmpf... even with the older driver, I get the same results when using `-use_sleep`: the CPU load drops as advertised, but so does the GPU load, and presumably performance(?)
Mike (Joined: 17 Feb 01, Posts: 34559, Credit: 79,922,639, RAC: 80)

> Even with the older driver, I get the same results when using -use_sleep …

There shouldn't be much difference on real tasks. You can also try `-unroll 12 -ffa_block 8192 -ffa_block_fetch 4096 -tune 1 128 2 1 -use_sleep` or `-unroll 12 -ffa_block 8192 -ffa_block_fetch 4096 -tune 1 32 8 1 -use_sleep`.
Mike (Joined: 17 Feb 01, Posts: 34559, Credit: 79,922,639, RAC: 80)

> @Mike: Thanks. And on the 670/690, keep the 12288/6144 …

If you don't have issues, just keep it.
Falconfly (Joined: 5 Oct 99, Posts: 394, Credit: 18,053,892, RAC: 0)

> There shouldn't be much difference on real tasks.

So you're saying that despite the significant drop in GPU load, performance remains unaffected? (I was looking at real tasks running while monitoring GPU load.)
Mike (Joined: 17 Feb 01, Posts: 34559, Credit: 79,922,639, RAC: 80)

> So you're saying that despite the significant drop in GPU load, performance remains unaffected?

Let's say most hosts that changed to those settings didn't notice a big difference. A little fine-tuning might be necessary. Finish a few tasks and I will check.
Mike (Joined: 17 Feb 01, Posts: 34559, Credit: 79,922,639, RAC: 80)

@Falconfly: use_sleep isn't in place; check for typos.
(Joined: 18 Aug 99, Posts: 1432, Credit: 110,967,840, RAC: 67)

Here's my 2 cents using 2 x GTX 750 Ti FTW @ 2048 MB each on an i7-4770K, NVidia driver 340.52; the "% of processors" setting is ignored.

While still using v0.41, the ap_config.xml specified 3 AP tasks with 1 core for each task and 4 MB tasks with .5 core for each task. AP tasks were using r1843, and the CPU and GPU run times were almost the same. I manually installed r2399 and inserted the sleep option into the AP command-line file. After restarting BOINC, the CPU run times dropped considerably, from 1+ hr. to 5-10 min. per task, and total run times remained approximately the same. After v0.42, total run times stayed approximately the same.

Because of the speed of the processor, I've reduced the CPU spec from 1 to .5 in the ap_config.xml. I consider the total run times to be appropriate and something I can live with; with the change, more cores are available for CPU processing. Some would say that I'm not running at peak total run times with 3 AP tasks, as they average 1.45 to 2 hrs., but my thinking is that my AP queue lasts longer between feeding times and my RAC does not drop as much when crunching on v7 only.

I'm still not sure how the tune option fits into all of this, as I haven't tried it yet. IMHO, the sleep option should be required for higher-end processors that run higher-end GPUs.

I don't buy computers, I build them!!
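The task-count and CPU-reservation settings described above map onto BOINC's standard app_config.xml syntax roughly as follows. This is an illustrative sketch rather than the poster's actual file: the app name and the per-GPU reading of the task count are assumptions.

```xml
<app_config>
  <app>
    <!-- app name is an assumption; it must match the name in app_info.xml -->
    <name>astropulse_v7</name>
    <gpu_versions>
      <!-- 0.33 => up to 3 concurrent tasks per GPU -->
      <gpu_usage>0.33</gpu_usage>
      <!-- reserve half a CPU core per GPU task, as described above -->
      <cpu_usage>0.5</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
```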
Falconfly (Joined: 5 Oct 99, Posts: 394, Credit: 18,053,892, RAC: 0)

> use_sleep isn't in place; check for typos.

Just updated, so all results coming in from now on should reflect it. CPU load seems very low now; it went from 98% per task to more like 10% per task on Astropulse. Visible performance at least seems normal, judging by the workunits in progress.
Falconfly (Joined: 5 Oct 99, Posts: 394, Credit: 18,053,892, RAC: 0)

Hmpf, why do I have to come up with new questions after the edit period is over? :p

I've searched the forums for use of the command-line switches and found only CUDA/NVidia-related examples. Since I have two AMD/ATI-based hosts as well, this is what I'm using so far (based on the ReadMe recommendations):

`-unroll 12 -ffa_block 8192 -ffa_block_fetch 4096` (mixed system with HD 7970 + HD 7790)

`-unroll 10 -ffa_block 6144 -ffa_block_fetch 1536 -hp` (mixed system with HD 7850 + HD 7750)

Are any kernel tuning sets known good for these GPU combinations? I would assume `-tune 1 64 4 1` should work at least on the first combo running the more potent cards; I'm not sure about the weaker combo, though (I don't want to "break" the running config by feeding it bad tuning parameters). That's why I haven't implemented any so far. I just don't know what difference the ATI cards make in that regard vs. NVidia cards; at least CPU usage isn't a factor with them.

PS: Sorry for the n00bish questions; I got back into SETI too late to experiment myself (which I normally do extensively).
Bill Greene (Joined: 3 Jul 99, Posts: 80, Credit: 116,047,529, RAC: 61)

Back on the driver load for the SSD: Windows picked up the Samsung SSD without a driver load on this new system, just like a hard drive. Loaded it up without difficulty and proceeded to work with the GPUs.
Mike (Joined: 17 Feb 01, Posts: 34559, Credit: 79,922,639, RAC: 80)

> Just updated, so all results coming in from now on should reflect it …

Yes, looks good now.
Mike (Joined: 17 Feb 01, Posts: 34559, Credit: 79,922,639, RAC: 80)

> Hmpf, why do I have to come up with new questions after the edit period is over? :p

Yes, that's correct. You can probably increase ffa_fetch on the first host. `-tune 1 64 4 1` is correct for those hosts. On slower cards, `-tune 1 32 8 1` can be better. Don't worry, I'm here to help.
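Putting that advice together with the ATI settings quoted earlier in the thread, the two hosts' command-line files would end up roughly as follows (first line: HD 7970 + HD 7790 host; second line: HD 7850 + HD 7750 host). This is a sketch; the ffa_fetch increase Mike mentions for the first host is left at its original value here, since no specific number was given:

```
-unroll 12 -ffa_block 8192 -ffa_block_fetch 4096 -tune 1 64 4 1

-unroll 10 -ffa_block 6144 -ffa_block_fetch 1536 -hp -tune 1 32 8 1
```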
Falconfly (Joined: 5 Oct 99, Posts: 394, Credit: 18,053,892, RAC: 0)

Thanks, I'll update to those values on my ATI rigs.

The NVidia hosts look fine now; I only see occasional tasks still taking unusually high (>50%) CPU time and generally running much longer. I guess those might just be the "blanked" workunits I've heard some chatter about(?)
Mike (Joined: 17 Feb 01, Posts: 34559, Credit: 79,922,639, RAC: 80)

> Thanks, I'll update to those values on my ATI rigs. …

Yes, that's the reason.
Bill Greene (Joined: 3 Jul 99, Posts: 80, Credit: 116,047,529, RAC: 61)

> Ok, let's try one step at a time... and please forgive me for any mistakes... Use these instructions with NV GPUs only.

Well, I have taken several actions. I moved the NV driver to 337.88 (working with drivers is always a hassle), am now running Lunatics 0.42, and have dropped the number of WUs per GPU to 4 (vs. 5). Per your advice to others, I've been reading up on the parameters as described in ReadMe_AstroPulse_OpenCL_NV.txt and decided, again per your advice, to use your set `-use_sleep -unroll 12 -ffa_block 12288 -ffa_block_fetch 6144 -tune 1 64 4 1` as a starting point.

However, I do not yet have a handle on the parameters, since I lack an understanding of "kernel call" (to what?) and FFA, as in ffa_block_fetch. I'm therefore unable to judge the relevance of changing a command-line switch value. Perhaps that isn't necessary, or it will surface with experimentation, but it would be useful to know at least the most relevant value to work with.

Additionally, I'm unsure how the command-switch construct above, as inserted in ap_cmdline_win_x86_SSE2_OpenCL_NV.txt, is engaged. I assume that it finds the proper place (in app_info.xml?) via aimerge.cmd or, as you suggest, with an exit and restart of BOINC. I may give the latter a try while awaiting your response, but would appreciate your thoughts on the above: the command-switch values in your construct, and how the construct is engaged.

Finally, I had been advised elsewhere to use EVGA PrecisionX 15 to monitor GPU performance, but it has been withdrawn due to some plagiarism issues. How do you measure GPU and CPU performance changes? I assume that WU feeds to the CPU must be turned off in order to see changes from the -use_sleep command; otherwise the CPU will be running full bore (100%) executing WUs. Alternatively, I suppose I could wait for an average of CPU times once -use_sleep is turned on.

As always, responses valued...
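On the "how is it engaged" question, the thread's earlier advice suggests this flow: the switches sit on a single line in ap_cmdline_win_x86_SSE2_OpenCL_NV.txt in the project directory, the file is presumably read when the app starts, and so exiting and restarting BOINC should be enough to pick it up; a separate aimerge.cmd pass should not be needed. A sketch of the one-line file Bill describes (unverified, per his own starting-point values):

```
-use_sleep -unroll 12 -ffa_block 12288 -ffa_block_fetch 6144 -tune 1 64 4 1
```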
©2025 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.