AstroPulse v7 v7.10 (opencl_nvidia_100)
Author | Message |
---|---|
Brent Norman Send message Joined: 1 Dec 99 Posts: 2786 Credit: 685,657,289 RAC: 835 |
I finally got some AP tasks today and have been trying to figure out the best CPU/GPU usage for them. I know previous AP GPU tasks were CPU intensive, so I shut down a CPU if 1 of the 2 tasks running was AP. No problem there. But I must say ... holy crap, the CPU usage is nuts with v7.10! It's taking a complete core (100% use of 2.9GHz) to feed each of 2 tasks to a 750ti. So that's 2 full cores out of 4. On my XP 4200+ 2 core, it's impossible to run 2 GPU + 1 CPU task. The main reason is that v7.10 is set at 'below normal' priority, and CPU tasks are all 'low' priority. So 2 GPU tasks completely choke out any CPU task (0-2% use), and there is no way for it to run. My CPU tasks would all sit idle until I ran out of AP GPU tasks, and would likely time out. So I lowered it to 1 task only if AP is running. I'm not in favor of that, but it's all I can do. Is it bad programming or what that v7.10 needs that much CPU to feed the GPU? And why is the priority set higher than CPU tasks? |
Zalster Send message Joined: 27 May 99 Posts: 5517 Credit: 528,817,460 RAC: 242 |
Are you using any commandlines for the APs? I know I had discussed this before about CPU usage of the new APs. |
Brent Norman Send message Joined: 1 Dec 99 Posts: 2786 Credit: 685,657,289 RAC: 835 |
Current config for my i5, same command line for MB and AP. GPU 750ti.
|
Brent Norman Send message Joined: 1 Dec 99 Posts: 2786 Credit: 685,657,289 RAC: 835 |
For a test I completely removed the command line for AP (yes, I waited for new tasks to start). Running 2 tasks on the 750ti with the i5 2.9GHz CPU, it still uses 2 full cores at 100% usage to feed 2 GPU tasks. That is just crazy CPU usage for a stock config! |
Sutaru Tsureku Send message Joined: 6 Apr 07 Posts: 7105 Credit: 147,663,825 RAC: 5 |
It's not the fault of the OpenCL app for NVidia GPUs. AFAIK, NVidia decided to make their drivers buggy when running OpenCL apps (OpenCL being ATI/AMD's 'kingdom'). The last good driver for OpenCL apps was 263.06, for (just pre-Fermi?)* NV GPUs. Because of this you 'must' use at least '-use_sleep' in the cmdline to free up your CPU. BTW, the SETIv7 app for NVidia GPUs, which is the CUDA app, doesn't work with the kind of settings you use in your app_config.xml file. [* In my case this was WinXP x86 with a GTX285.] |
Sutaru Tsureku Send message Joined: 6 Apr 07 Posts: 7105 Credit: 147,663,825 RAC: 5 |
BTW, I forgot: the priority of the CUDA or OpenCL GPU app should be higher than that of the CPU apps, because the GPU normally has more performance than the installed CPU. So it's good for the GPU to get more support from the CPU. |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
Brent Norman wrote: "Current config for my i5, same command line for MB and AP. GPU 750ti." It might be a good idea to read the manuals about now. Most people place those command lines in separate files with names like ap_cmdline_win_x86_SSE2_OpenCL_NV.txt and mb_cmdline_win_x86_SSE2_OpenCL_ATI.txt, or mbcuda.cfg for CUDA MB. AP ReadMe: http://lunatics.kwsn.info/downloads/v0.43a_ReadMe_AstroPulse_OpenCL_NV.txt <app_config> instructions: http://boinc.berkeley.edu/wiki/Client_configuration#Application_configuration There should also be ReadMes in your setiathome.berkeley.edu folder. |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
TBar wrote: "It might be a good idea to read the manuals about now." But as Brent Norman said, a stock app should run 'adequately' (my word, not his) straight out of the box, like any other BOINC application. All this messing around with ReadMes in what would normally be hidden data folders, manual configuration of complex command lines etc. etc. betrays the application's origin as "for advanced users only" - geeks and enthusiasts. It would be good to concentrate for a while on the 'stockness' of the applications, and try to find a default distribution model which allows it to run with the minimum of interference with either the user or with other projects' BOINC (CPU) applications. |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
TBar wrote: "It might be a good idea to read the manuals about now." As soon as Brent Norman added a flawed <app_config> to his 'hidden folder' it became something other than 'stock'. If you are going to add something to 'stock', you Should read the manual first, or at least after it doesn't work as expected. If you want 'stock', then don't complain about a nVidia driver 'feature' that has existed since Driver 266.58. |
Brent Norman Send message Joined: 1 Dec 99 Posts: 2786 Credit: 685,657,289 RAC: 835 |
Yes, I understand some but not all of the configs; this command line was recommended to me for this card, and the options make sense to me for what I'm running. I'm now at a command line of "-use_sleep" and nothing more. CPU usage hasn't changed from a full command line, to nothing, to this one. And thanks Richard for understanding what I'm saying. My main point is ... Why does the app need a full core of 2.9GHz to feed 1 task? That is just silly. Sure, MBs take 2-5% and AP is more aggressive, so 20-25% would be OK, but a full CPU? I would bet if I had 8GHz, it would take all that too for 1 GPU task. |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
I think we should leave the MB / AP comparison out of this discussion for the time being - that comes from using different underlying programming languages for the two applications. TBar wrote: "If you want 'stock', then don't complain about a nVidia driver 'feature' that has existed since Driver 266.58." And I have never seen an adequate exploration of why that seems to have happened, and what steps have been taken to understand and mitigate it. I'm not thinking specifically of AP, or SETI, or BOINC, here. OpenCL for GPUs is a general-purpose tool. There must be a wider user community, developing other applications for other uses. Is this CPU usage since 266.58 universal across all developments? Does it apply to both Windows XP and the Windows Vista/7/8 ranges, which use very different underlying driver models? Somebody, somewhere, must have asked themselves those questions, and come up with some (at least speculative) answers. Where are they - can anyone link? |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
Well, Raistmer has explained it a few times here, as you are aware. He appears to be lurking about, so I'll let him explain it again. I have run across other people complaining about the CPU use in other locations. Here are a few posts about it; this one sounds related: "Increased CPU usage with last drivers starting from..." Oh my, it's Raistmer... Why does FireFox keep reverting to the Yahoo search engine??? |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
TBar wrote: "Well, Raistmer has explained it a few times here, as you are aware. He appears to be lurking about, so I'll let him explain it again. I have run across other people complaining about the CPU use in other locations. Here's a few posts about it, this one sounds related; Increased CPU usage with last drivers starting from..." That's the problem. Only Raistmer seems to be experiencing/talking about it. And using the word "bug" in your search term is pre-determining the outcome. I'm seeking a more fundamental understanding, open to the possibility that a new 'feature' (in the true sense) might require an update to previously-established programming techniques. |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
No comments. |
Mike Send message Joined: 17 Feb 01 Posts: 34258 Credit: 79,922,639 RAC: 80 |
Raistmer wrote: "No comments." Innit. With each crime and every kindness we birth our future. |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
Raistmer wrote: "No comments." OK, I can see how you may have read it differently than intended. I'll rephrase it. When I searched for 'bugs', the 4th post down was "Increased CPU usage with last drivers starting from...". I clicked on it but didn't read the author. I thought it would be a good link and made the post. Later I went back, read it, and saw that it was your post from 3 years ago. What I should have said was: Oh my, it's Raistmer from three years ago, trying to get nVidia to suggest a fix for this problem that a number of different developers are having... Happy? ;-) |
Josef W. Segur Send message Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0 |
Richard Haselgrove wrote: "I think we should leave the MB / AP comparison out of this discussion for the time being - that comes from using different underlying programming languages for the two applications." IMO, NVIDIA simply chose to assume that those running OpenCL apps on their GPUs would want the ultimate possible performance. So if OpenCL enqueues another kernel before a previous kernel is complete, their implementation uses something like a CPU spin loop to provide the least possible delay in getting the new kernel started. With -use_sleep, Raistmer's apps delay enqueueing a new kernel until after the previous kernel is done (for the most time-consuming kernels), thereby avoiding most of that extra CPU "usage". But the sleep doesn't end at exactly the right time, so there's added latency; some tuning can minimize that. Brent's problem is that although <cmdline> can be used within an app_config.xml, it is only valid within an <app_version> section. I'd be guessing if I tried to suggest the exact content of that <app_version> section, so I won't. Joe |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
Josef W. Segur wrote: "Brent's problem is that although <cmdline> can be used within an app_config.xml it is only valid within an <app_version> section. I'd be guessing if I tried to suggest the exact content of that <app_version> section, so I won't. Joe" It's documented at http://boinc.berkeley.edu/wiki/Client_configuration#Application_configuration Most, except <app_name>, is optional, and <plan_class> can be copied from the apps page, or re-typed from the BOINC Manager display. |
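For anyone following along, here is a rough sketch of what an `<app_version>` section of the kind discussed above might look like, built with Python's standard `xml.etree.ElementTree` so the structure can be checked before it is saved as app_config.xml. It is a guess under stated assumptions, not a tested config: the app name `astropulse_v7` is assumed and should be checked against your own client_state.xml, the plan class is taken from the thread title, and the `avg_ncpus` / `ngpus` values simply mirror the "2 tasks per GPU, 1 core each" setup described in this thread. The helper name `build_app_config` is made up for the example.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name, plan_class, cmdline, avg_ncpus=1.0, ngpus=0.5):
    """Return an <app_config> element holding a single <app_version> section."""
    root = ET.Element("app_config")
    version = ET.SubElement(root, "app_version")
    # <app_name> is the only required child; the rest are optional per the BOINC wiki.
    ET.SubElement(version, "app_name").text = app_name
    ET.SubElement(version, "plan_class").text = plan_class
    ET.SubElement(version, "avg_ncpus").text = str(avg_ncpus)
    ET.SubElement(version, "ngpus").text = str(ngpus)
    # <cmdline> sits inside <app_version>, which is the point made in the thread.
    ET.SubElement(version, "cmdline").text = cmdline
    return root


config = build_app_config(
    app_name="astropulse_v7",        # ASSUMPTION: verify the short name in client_state.xml
    plan_class="opencl_nvidia_100",  # from the thread title
    cmdline="-use_sleep",            # the switch suggested earlier in the thread
)
ET.indent(config)  # pretty-print; requires Python 3.9+
xml_text = ET.tostring(config, encoding="unicode")
print(xml_text)
```

The printed XML would go into app_config.xml in the project folder; the client then needs to re-read its config files (or be restarted) and, as noted earlier in the thread, new tasks have to start before a changed command line takes effect.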
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
Richard Haselgrove wrote: "I think we should leave the MB / AP comparison out of this discussion for the time being - that comes from using different underlying programming languages for the two applications." That's the closest so far to what I'd expect the real choices/decisions might have involved, considering that kind of decision-making goes on behind closed doors. A subtle irony in all this is that I've been working toward getting rid of 'old-school-cuda-blocking-synch' in the MB Cuda application, because frankly it's not the most efficient way to go either. Unsurprisingly, the cheapest/most-efficient synchronisation methods seem to be turning out to be graphics-api like ones, such as precision multimedia timer-based or frame renderloop based ones. It's easy to forget these devices evolved from, and are, graphics devices, drivers and apis. So expecting them to behave otherwise as 'pure abstract opencl virtual machines' is probably a little unrealistic. With something as low level as OpenCL, as with Cuda driver api, you're expected to be able to 'roll your own', or use a higher level api & libraries. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
Brent Norman Send message Joined: 1 Dec 99 Posts: 2786 Credit: 685,657,289 RAC: 835 |
I find this interesting... 2 computers: XP 4200+ 2 core; 8,1 2.9GHz 4 core. Both with 750ti running identical configs, 1 AP task on GPU, same command line. As expected, they both took over 1 core for the GPU task with v7.10. But run times were also identical. So it leads me to believe that v7.10 is hungry for CPU time and not processing power. For the hell of it I tried 2 AP 7.10 + 1 MB Cuda50 on my 4200+ 2 core, and the 2 AP tasks completely took over the CPU, and choked out the Cuda50 task along with the CPU. Even though the Cuda50 task had the same CPU priority as the v7.10 tasks, it could not get any processing time to run. Something is wrong with v7.10 I say. |
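The observation above (a full core busy on both machines, yet identical run times) fits the spin-loop explanation given earlier in the thread: a core that is only waiting looks 100% busy no matter how fast it is. The following is a small simulation of that idea in plain Python, with no GPU, OpenCL, or driver involved. Everything in it is an assumption made for illustration: the function names, the 0.2 s stand-in for a GPU kernel, and the 1 ms poll interval. It only contrasts a busy-wait with a sleep-based wait and does not claim to reproduce what the NVIDIA driver or `-use_sleep` actually do internally.

```python
import time


def wait_spinning(duration_s):
    """Busy-wait: check the clock in a tight loop until the 'kernel' is done."""
    deadline = time.perf_counter() + duration_s
    while time.perf_counter() < deadline:
        pass  # burns CPU the whole time, regardless of CPU speed


def wait_sleeping(duration_s, poll_s=0.001):
    """Sleep-poll: give the CPU back between checks, at the cost of some latency."""
    deadline = time.perf_counter() + duration_s
    while time.perf_counter() < deadline:
        time.sleep(poll_s)  # may overshoot the deadline slightly


def cpu_seconds(wait_fn, duration_s):
    """CPU time (not wall time) this process consumed while waiting."""
    start = time.process_time()
    wait_fn(duration_s)
    return time.process_time() - start


KERNEL_S = 0.2  # hypothetical duration of one GPU kernel

spin_cpu = cpu_seconds(wait_spinning, KERNEL_S)
sleep_cpu = cpu_seconds(wait_sleeping, KERNEL_S)

print(f"spin wait : {spin_cpu:.3f} s of CPU for a {KERNEL_S} s wait")
print(f"sleep wait: {sleep_cpu:.3f} s of CPU for a {KERNEL_S} s wait")
```

On a typical machine the spin version should report CPU time close to the full wait, while the sleep version reports far less. The trade-off is the one Joe describes: the sleeping wait frees the core for other tasks but wakes up a little late, which is why some tuning of the sleep is needed.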
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.