Message boards : Number crunching : CUDA Versions
Bill Greene · Joined: 3 Jul 99 · Posts: 80 · Credit: 116,047,529 · RAC: 61
I'm presently running 5 on the dual-Ti version but have noticed display stalls and at least one driver failure (that auto-recovered). Think I'll be dropping back to 4 after seeing where it peaks at 5 (just brought it online).

Interesting input and I hope Zalster is reading this as well. Just did some averaging of about 20 mp (all GPU) WUs, both Run Time and CPU Time. It would seem that the closer the Run Time is to CPU Time (on average), the more efficient WU turn-around becomes. Does that make sense? Right now the Run Time average for the 20 mp WUs is about 25 minutes; the average CPU Time is 3.9 minutes. Would like to hear from others on this, for I suspect, as highlighted above, that the GPU execution stream is being excessively interrupted.
juan BFP · Joined: 16 Mar 07 · Posts: 9786 · Credit: 572,710,851 · RAC: 3,799
Exactly: because of that problem, the developer of the crunching program created the -use_sleep switch. For some reason the NV GPUs waste CPU time in some kind of waiting loop (that's not happening on ATI or iGPUs). To avoid that he developed a new version of the software, and we now use the -use_sleep switch. Look at my already-crunched WU times; you will see it's totally different - the CPU time is only about 10-20% of the total time (and don't forget my CPU is a slow i5).
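To illustrate what that waiting loop costs, here is a minimal hand-written sketch of a busy-wait versus the sleep-based wait that -use_sleep enables. It is not the actual application code (the real app polls the CUDA/OpenCL driver rather than a flag), but the CPU-usage difference is the same:

    // busy_vs_sleep.cpp - stand-in for GPU completion polling.
    // A worker thread plays the role of the GPU kernel.
    #include <atomic>
    #include <chrono>
    #include <thread>

    std::atomic<bool> gpu_done{false};

    void fake_gpu_kernel() {
        std::this_thread::sleep_for(std::chrono::seconds(2));  // "kernel" runs
        gpu_done = true;
    }

    int main() {
        std::thread gpu(fake_gpu_kernel);

        // Busy-wait: polls millions of times per second and pins one CPU
        // core at 100% even though there is nothing useful to do.
        //   while (!gpu_done) { /* spin */ }

        // Sleep-based wait: yield the core between polls. CPU usage drops
        // to near zero, at the cost of up to ~1 ms extra latency per poll.
        while (!gpu_done) {
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }

        gpu.join();
        return 0;
    }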
Zalster · Joined: 27 May 99 · Posts: 5517 · Credit: 528,817,460 · RAC: 242
Hi Bill,

I was following, but it has been a long night and day these last 24 hours, so it has taken me a while to get back here.

First, I was doing 2 APs or 2 MBs at a time with the 750s. I increased my percentage of CPU usage to compensate for what I thought was a handicap of them being AMD rather than Intel chips. I just purchased several GTX 780s (4 of them; think I must have caught a virus, hmm.. inside joke) and now I run 3 APs on each of those, or 3 MBs. Since the 750s share space and time with the 780s, I've increased the number of APs and MBs on them to 3 at a time as well. I still use a larger percentage of CPU than what is recommended, but again, I think I am compensating for the AMD chips (maybe I'm not, but it seems like the APs finish faster). My average time is 53 minutes for APs now and 19 minutes for MBs (CUDA 50). Of course those 780s are punching out work faster and the 750s are slower, so overall it's a win.

I use the modification for the command line that Juan mentioned. It really did cut down on the overall CPU usage. If I still did CPU crunching that would have freed up at least 3 cores for other things, but I prefer to leave these 2 machines as pure GPU crunchers.

Will check back later, got to run.. busy day
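For anyone wanting to reproduce that several-tasks-per-GPU setup: with the Lunatics installs it is usually set through the <count> value in the <coproc> section of app_info.xml, but a BOINC 7.x client also accepts an app_config.xml in the project folder. A sketch of the latter, with an illustrative app name (use the <name> values from your own app_info.xml):

    <app_config>
      <app>
        <name>astropulse_v6</name>        <!-- illustrative; match your app_info.xml -->
        <gpu_versions>
          <gpu_usage>0.33</gpu_usage>     <!-- 3 tasks share each GPU -->
          <cpu_usage>0.5</cpu_usage>      <!-- CPU budgeted per task; tune to taste -->
        </gpu_versions>
      </app>
    </app_config>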
Grant (SSSF) · Joined: 19 Aug 99 · Posts: 13936 · Credit: 208,696,464 · RAC: 304
Interesting input and I hope Zalster is reading this as well. Just did some averaging of about 20 mp (all GPU) WUs, both Run Time and CPU Time. It would seem that the closer the Run Time is to CPU Time (on average), the more efficient WU turn-around becomes. Does that make sense? Right now the Run Time average for the 20 mp WUs is about 25 minutes; the average CPU Time is 3.9 minutes. Would like to hear from others on this, for I suspect, as highlighted above, that the GPU execution stream is being excessively interrupted.

mp = MB = MultiBeam?

I have 2 systems: an E6600 (dual core, no HT) with 2 GTX 750 Tis, and an i7 2600K (quad core, HT on) with 2 GTX 750 Tis. Both are MB-only, both are using the latest optimised applications (CUDA50 for the GPUs), and both are running the same video drivers (335.23). The E6600 runs Vista, the i7 Win7.

The video cards on the E6600 use about twice the CPU time that the ones on the i7 do:
E6600: 17% peak, usually around 11-12%
i7: 7% peak, usually around 4-5%

The video cards on the E6600 put out more work, but not a lot more; run times are about 2 minutes less than on the i7 for both shorties & longer-running WUs.

When I added the 2nd video card to each system, the processing time for CPU WUs increased, with the biggest impact on the E6600. I've played around with the mbcuda.cfg files - it had no effect. I also reduced CPU crunching, freeing up cores; it made no difference.

Given the similarity in hardware, applications & drivers, I suspect the differences are due to the underlying display driver model used by the OS.

Grant
Darwin NT
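For reference, the mbcuda.cfg Grant mentions is the tuning file shipped with the Lunatics CUDA MultiBeam builds. From memory of the bundled readme it exposes only a few knobs, roughly like the sketch below; treat the names and values as illustrative and check the readme in your own install:

    [mbcuda]
    processpriority = abovenormal   ; belownormal / normal / abovenormal / high
    pfblockspersm = 8               ; pulse-find blocks per multiprocessor
    pfperiodsperlaunch = 200        ; pulse-find periods per kernel launch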
Bill Greene · Joined: 3 Jul 99 · Posts: 80 · Credit: 116,047,529 · RAC: 61
Exactly: because of that problem, the developer of the crunching program created the -use_sleep switch. For some reason the NV GPUs waste CPU time in some kind of waiting loop (that's not happening on ATI or iGPUs). To avoid that he developed a new version of the software, and we now use the -use_sleep switch. Look at my already-crunched WU times; you will see it's totally different - the CPU time is only about 10-20% of the total time (and don't forget my CPU is a slow i5).

I'm taking notes on possible tuning actions and, while I understand the wait-loop issue being generated by the NV GPUs, I'm at a loss as to how the -use_sleep switch is engaged. Must I take specific action there, or is that part of the recent Lunatics upgrade? Thanks ...
Bill Greene · Joined: 3 Jul 99 · Posts: 80 · Credit: 116,047,529 · RAC: 61
Hi Bill,

I gather that you now have a mix of 750s and 780s in those 2 machines. That's a big chunk of power you've brought on: 4 x 780s. I want to make sure I'm making proper comparisons ... do the average times you cite above come from CPU Time, Run Time, or both? It will be interesting to see where these 2 machines end up after some wind-up time.
Zalster · Joined: 27 May 99 · Posts: 5517 · Credit: 528,817,460 · RAC: 242
It's a command line that you can add to your app_info to help. I don't know the specifics; I only know that it works, lol. You would need to edit the file called ap_cmdline_win_x86_SSE2_OpenCL_NV.txt in the seti@home project folder. Right-click it and use Notepad to open it. There are several different settings:

High-end cards (more than 12 compute units):
-unroll 12 -ffa_block 8192 -ffa_block_fetch 4096 -hp

Mid-range cards (less than 12 compute units):
-unroll 10 -ffa_block 6144 -ffa_block_fetch 1536 -hp

Entry-level GPUs (less than 6 compute units):
-unroll 4 -ffa_block 2048 -ffa_block_fetch 1024 -hp

Since I have the 780 (along with the 750s in there), I use the high-end code along with the -use_sleep switch, so mine looks like:

-use_sleep -unroll 12 -ffa_block 12288 -ffa_block_fetch 6144 -tune 1 64 4 1

One thing I did notice is that this computer is slower to respond when doing other things. Not a big deal, as all this computer does is crunch, so I couldn't care less. The 750s seem to do OK, however... I used this same line in my home computer (it only has one 750) and it became almost unresponsive. It was taking so long that I ended up removing it, and I will be replacing the command line with the middle sequence when I get home later today. So I would suggest using the middle sequence along with -use_sleep if you plan to go this route.

Zalster

Edit: just saw your post... That is Run Time; CPU Time is much less. You can look at some of my finished APs and see the CPU time in the Stderr of the work unit. The 2 machines both have 2 780s and 1 750 FTW; only the #1 machine has an extra 750 in there as well. My #2 is slowly climbing, but I'm going to have to switch out the PSU and case on Sunday.
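A side note on where those switches can live: the ap_cmdline_win_x86_SSE2_OpenCL_NV.txt file is read by the AstroPulse application itself, but anonymous-platform users can usually pass the same switches per application version through the <cmdline> element in app_info.xml instead. An illustrative fragment (the other fields are elided; keep whatever your app_info.xml already has, and note the app name is just an example):

    <app_version>
        <app_name>astropulse_v6</app_name>
        ...
        <cmdline>-use_sleep -unroll 10 -ffa_block 6144 -ffa_block_fetch 1536 -hp</cmdline>
    </app_version>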
Bill Greene · Joined: 3 Jul 99 · Posts: 80 · Credit: 116,047,529 · RAC: 61
Interesting input and I hope Zalster is reading this as well. Just did some averaging of about 20 mp (all GPU) WUs, both Run Time and CPU Time. It would seem that the closer the Run Time is to CPU Time (on average), the more efficient WU turn-around becomes. Does that make sense? Right now the Run Time average for the 20 mp WUs is about 25 minutes; the average CPU Time is 3.9 minutes. Would like to hear from others on this, for I suspect, as highlighted above, that the GPU execution stream is being excessively interrupted.

Well, I've certainly learned to pay closer attention to Run Time/CPU Time figures rather than wait for a change in RAC when making adjustments. I will probably roll back the GPU driver from the most current 340.52, based on responses from others. It's pretty clear from your results that additional video cards noticeably affect CPU performance. But I find your last 2 statements most interesting: while I suspect that the effect of adjustments such as you mention varies with system configuration, we shouldn't be surprised when such adjustments have no effect. And based on my experience with large machines, it is unlikely that 2 identical machines will ever produce identical results. Thanks for the insight.
Bill Greene · Joined: 3 Jul 99 · Posts: 80 · Credit: 116,047,529 · RAC: 61
It's a command line that you can add to your app_info to help. I don't know the specifics; I only know that it works, lol. You would need to edit the file called ...

Great ... but before I make this adjustment, please tell me which NV driver version you are using. I may need to roll back from the most current driver before making the change.

Bill
Zalster · Joined: 27 May 99 · Posts: 5517 · Credit: 528,817,460 · RAC: 242
Lol. That's just it: I'm using the most current one. These changes reduce the CPU demand of the GPU. Only 1 computer is still using the last version, but I'll be updating it here in an hour and adding the command line soon afterwards.

Good luck ;) and happy crunching...

Zalster
Joined: 5 Oct 99 · Posts: 394 · Credit: 18,053,892 · RAC: 0
Can someone give me a quick brush-up on how to tell if the 340.52 driver / -use_sleep combination is causing problems? What are the symptoms/results if that problem occurs?

I've just used the switch on two of my hosts and the CPU load immediately dropped by almost 40%. So far it looks good to me (and I have the 340.52 driver).

Hardware it's used on: GT610, GTX 750 Ti and GTX 780
Joined: 5 Oct 99 · Posts: 394 · Credit: 18,053,892 · RAC: 0
Hm, I think I see the problem now. With the 340.52 driver installed and 2 AP tasks running per GPU, this is what GPU-Z shows for my GTX 780:

-use_sleep disabled ... 98% GPU load
-use_sleep enabled .... 49-69% GPU load (fluctuating)

That certainly means significantly higher runtimes, so I'd better leave the switch alone as long as the 340.52 driver is installed.
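If you prefer a command-line check over GPU-Z, the NVIDIA driver package includes nvidia-smi, which can sample the same load figures once per second (on some GeForce cards certain fields may report N/A):

    nvidia-smi --query-gpu=utilization.gpu,utilization.memory --format=csv -l 1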
juan BFP · Joined: 16 Mar 07 · Posts: 9786 · Credit: 572,710,851 · RAC: 3,799
No, on the contrary: it's better to downgrade the driver and keep -use_sleep. I use 337.88. The problem has already been reported, and surely there is a fix on the way, but that could take some time.
Zalster · Joined: 27 May 99 · Posts: 5517 · Credit: 528,817,460 · RAC: 242
Bill, go with Juan's suggestion and stay with the old driver.
Bill Greene · Joined: 3 Jul 99 · Posts: 80 · Credit: 116,047,529 · RAC: 61
Interesting dialog this has created ... and helpful. I'll take it one step at a time, first rolling back the driver, then adding the sleep-switch constructs. But this all starts tomorrow, when I'll hopefully take you up on that Happy Crunching.
Joined: 5 Oct 99 · Posts: 394 · Credit: 18,053,892 · RAC: 0
Since the affected systems exclusively run as many AP tasks as they have CPU cores, I think I'm okay... unless someone can say that there's a performance improvement coming from the -use_sleep option... Then I could be tempted to change the setup one last time *g*
Mike · Joined: 17 Feb 01 · Posts: 34559 · Credit: 79,922,639 · RAC: 80
Since the affected systems exclusively run as many AP tasks as they have CPU cores, I think I'm okay...

Of course there is an improvement possible, but that requires some testing. Did you read my tips in the OpenCL readme? It gives some ideas, and I'm always here to help.

With each crime and every kindness we birth our future.
Joined: 5 Oct 99 · Posts: 394 · Credit: 18,053,892 · RAC: 0
Yes, I'm now running all NVidia cards with the cmdline-switch examples suggested in the ReadMe according to the GPU capabilities (both the AP and mbcuda config files). The -use_sleep switch was the last one I was experimenting with. For any advanced kernel tuning and longer-running benches I don't have time anymore (WOW 2014 race starting within hours ;) )

The two affected systems do have iGPUs (AMD APUs), though, which for now are not used anymore, as the iGPU task was competing as a 5th task with the 4 fast GPU tasks over CPU time on the quad-core CPUs. Those iGPUs I could still activate if I can manage to free some additional CPU time; I'll likely give that a shot today with a previous NVidia driver.

Other than that, I think I'm set and have gotten just about the most out of the systems in the little time I had setting them up and finding the sweet spot for performance.
juan BFP · Joined: 16 Mar 07 · Posts: 9786 · Credit: 572,710,851 · RAC: 3,799
Mike could explain better why, but from my user perspective, a freed core = more production, since I can attach more GPUs to the same host (or crunch on the CPU, something I don't do). Look at my hosts, for example: a single (slow, 4-core-only) i5 powering 2 x GTX 690s (actually 4 GPUs), running up to 3 WUs at a time on each GPU, for a total of up to 12 simultaneous AP WUs. It's amazing - something I never imagined possible without the -use_sleep switch - and it even lets me use the host for other jobs.
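To put rough numbers on that, using the 10-20% CPU figure quoted earlier in the thread: without -use_sleep, each of the 12 concurrent AP tasks busy-waits on a full core, so together they would demand roughly 12 cores from the 4-core i5. With -use_sleep at, say, 15% CPU per task, 12 x 0.15 = 1.8 cores, which fits on the same quad core with room to spare.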