21)
Message boards :
Number crunching :
Lunatics Windows Installer v0.42 Release Notes
(Message 1559709)
Posted 20 Aug 2014
Post: I need some quick advice for a team member: he's running a Mobility Radeon HD5470 in a Win7 notebook (Intel i3 M330 CPU) and encountered errored-out/abandoned AP workunits after installing v0.42. It being a mobile GPU, is there any known limitation of the Mobility HD5470 "Cedar" GPU concerning AP or OpenCL in general? I assume it should be able to run:
- AP CPU SSE3 r2163
- MB CPU SSE4.2 r2549
- AP GPU SSE2 OpenCL_ATI r2399
- MB GPU SSE OpenCL_ATI r2489
(HD4xxx I presume, since it's a mobile GPU that was apparently still based on the same older Mobility HD4xxx generation GPUs)
22)
Message boards :
Number crunching :
CUDA Versions
(Message 1559163)
Posted 19 Aug 2014
Post: From my understanding, that switch removes a significant amount of CPU load (which does effectively nothing for the processing but wait, while blocking that core as a resource for other CPU loads). It doesn't work as expected on the latest 340.xx drivers, so to use that switch it's highly recommended to use the previous (337.88) driver instead.

Once it runs, it frees up a lot of CPU time; in most standard configurations that will also remove the competition for CPU time on the AP tasks in general, as they're very CPU-intensive in keeping the GPUs loaded. In effect, I think that will speed up overall performance, especially when running multiple AP tasks per GPU, in multi-GPU configurations, or any time you're running AP workunits with heavy blanking (which are the most extreme CPU/GPU hogs and take by far the longest to complete).
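To illustrate what the switch changes (a minimal, schematic C++ sketch - not the actual Lunatics code, and the names and timings here are made up): the default polling loop spins on the CPU until the GPU finishes, while a -use_sleep-style loop yields the core between checks.

    // Schematic only: a worker thread stands in for the asynchronous GPU kernel.
    #include <atomic>
    #include <chrono>
    #include <thread>

    std::atomic<bool> gpu_done{false};

    void fake_gpu_kernel() {                     // pretend GPU work
        std::this_thread::sleep_for(std::chrono::seconds(2));
        gpu_done = true;
    }

    void wait_busy() {                           // default behaviour: burns a full CPU core
        while (!gpu_done) { /* spin */ }
    }

    void wait_with_sleep() {                     // -use_sleep style: yields the core between checks
        while (!gpu_done)
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }

    int main() {
        std::thread gpu(fake_gpu_kernel);
        wait_with_sleep();                       // swap in wait_busy() to compare CPU load
        gpu.join();
    }

Both variants finish at practically the same time here; the only difference is how much CPU the waiting itself eats, which is exactly the CPU time the switch is meant to hand back to the other tasks.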
23)
Message boards :
Number crunching :
CUDA Versions
(Message 1558222)
Posted 17 Aug 2014
Post: @Falconfly I just lowered it to the mainstream standard settings to see if it made any performance difference. But then, maybe I'm just getting paranoid about any cards running way too slow. At times it seems that whenever I'm monitoring a system to check it against expected performance, it happens to be running very slow workunits right then, freaking me out :p After another night of troubleshooting hardware configs, I'm possibly also just too tired. I guess I'm better off letting the running systems run their course and simply stop meddling with them.

-- edit --

Alright...
- reverted the 750Ti's back to their former 12/8192/4096
- set the R9 290 and GTX780 to 12/16384/8192

On the MB config files, I don't know if any further change would do anything. I mainly just set them to -sbs 256 (on many cards I could go far higher than that with 2 tasks/GPU... but I've read nothing suggesting it would make much difference).
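(For anyone reading along: the shorthand above is unroll / ffa_block / ffa_block_fetch. Assuming the usual mapping onto the AP command-line switches, the two settings written out would be

    -unroll 12 -ffa_block 8192 -ffa_block_fetch 4096       (the 750Ti's)
    -unroll 12 -ffa_block 16384 -ffa_block_fetch 8192      (the R9 290 / GTX780)

in the respective AP command-line files, while -sbs 256 goes into the separate MB command-line file.)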
24)
Message boards :
Number crunching :
CUDA Versions
(Message 1558170)
Posted 17 Aug 2014
Post: I actually did - but then refrained from using the numbers, as I never knew how many workunits/GPU that user had running at the time (and whether the parameters reflected that). To me it seems impossible to judge the effect of the visible parameters vs. the achieved runtimes without knowing that; the variable workunit runtimes add to it. It could look awesome but just be the result of running only 1 task/GPU and/or a quick workunit - or terrible because someone ran a lot of tasks in parallel.

I just didn't have enough time to dig very deep into all these details. Some of my older hardware was giving me terrible headaches in getting it back to work after reassembling what I had left after years of inactivity. That cost me a shitload of unplanned time debugging those configs ;)

And just to show how my luck goes: just yesterday, I was able to get my hands on 2 R9 290 cards.
...just to find out one did not fit the intended case (my bad, should have known these beasts are long :p )
...and the other one had a rare manufacturing error, with one of the PCIe power plugs soldered ~1/4 inch out of place, making it impossible to accept power over that plug :p

Took me all night to reconfigure cards to where they fit and get at least the one R9 290 to work...
...just to find out that the system stripped of its GTX780 (the 750Ti remained in place) was now overcommitted on CUDA tasks and did not load a single fresh workunit for the R9 290 - which sat completely idle all night until an hour ago.

30 mins ago, I saw one running task had vanished from the network. Turns out BOINC on one host had continuously refreshed itself on CUDA tasks but fully depleted all tasks for the running AMD APU - which is now sitting idle (the machine of course states "Host has reached its daily limits on tasks").

Just another typical day in my hardware lab *g*
25)
Message boards :
Number crunching :
CUDA Versions
(Message 1558166)
Posted 17 Aug 2014
Post: Oh, *lol*... I never looked there for this figure - since when I was setting all those GPUs up, they naturally didn't have a single result done yet ;)

Maybe that's something that could be included in a future Lunatics reference document. (I do remember searching for hours for the "Compute Units" of the GTX 750Ti (actually didn't find them), the GT610 and the GT720M. That's likely why I eventually just went with "the high performance defaults should do" on the 750Ti ;) )

There doesn't seem to be a universal/fixed "CUDA cores / X = Compute Units" formula or rule of thumb. If there is one, that would help a lot as well.
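(Hedged guess, going by the published per-SM core counts: the divisor does seem to exist, it just changes per generation - Kepler packs 192 CUDA cores per SMX, first-generation Maxwell 128 per SMM - which would make the GTX 780 2304 / 192 = 12 compute units and the GTX 750Ti 640 / 128 = 5, matching the 5 CU / 12 CU figures discussed in the next post down.)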
26)
Message boards :
Number crunching :
CUDA Versions
(Message 1558161)
Posted 17 Aug 2014
Post: "@falconfly - Please forgive my intrusion, but I still believe unroll -12 is too high for the 750Ti, since it appears to have only 5 CU (12 is OK on the 780, which has 12 CU). Maybe that is the reason for the random hung tasks. Could anyone confirm whether I'm right or wrong, please?"

Hm, not too sure about that. From my few days of experience with the GTX750Ti's, they did seem to deliver what was expected. So far I haven't run into problems anymore (since setting Windows to the high performance profile). Lacking good comparison numbers for expected runtimes vs. mine, that's all I can tell so far.

Overall, I also found it very difficult to find out how many CUs each NVidia GPU has; almost all specifications I read on review pages or on the NVidia site itself never speak of Compute Units, only CUDA or shader cores. Therefore I just went by how powerful the NVidia GPU is in general. Since the GTX 750Ti has 640 CUDA cores, I initially went with the Lunatics ReadMe recommendation for the high performance cards (which still lists last-gen cards as reference examples). I do see your point though, comparing the 750 to the 780. Running 4 instead of 2 tasks/GPU on them also resulted in a significant slowdown (a distinct overall performance loss), which I think can indeed be credited to the -12 unroll figure being too high for the 750's.

--------------

I did try -ffa_block 16384 -ffa_block_fetch 8192 on the GTX 780 once (as it has a massive 2304 CUDA cores, to whatever number of Compute Units that translates into). But I noted a drastic slowdown using these figures, and performance went right back to normal after reverting to the old figures. Not sure if that was an exception, as I also seem to have observed that workunits in progress apparently (it looked that way to me) react very sensitively to being restarted with different tuning parameters. The only workunits that have errored out on me so far were ones that were resumed with changed settings. It could be that fresh workunits started with the abovementioned settings would show the performance improvements you're suggesting (given the power of the 780; at least I had the same idea with the same numbers ;) )

--------------

Overall, from using many different tuning parameters, I got the impression that the possible performance gain from finding the perfect combination is limited (a few % at best, I guess). However, when overdoing it, it appeared to me that the risk of a far greater performance loss and possibly even instability is comparably significant and often outweighs any potential gains; especially with so little time to build experience with it. (The highly variable and relatively difficult-to-compare runtimes of both MB and AP workunits don't help that case either.) That's why I eventually reverted to the known, more or less failsafe figures as they are stated in the various Lunatics ReadMe files. I've already lost far more output due to bad tuning settings than newer, perfect settings could recover for me by now.

I'm now giving the 750's -unroll 10 -ffa_block 6144 -ffa_block_fetch 1536 and will see how that works. (According to the readme, that should suit midrange cards, which I'd now count the 750Ti among; I initially misjudged the Maxwell GPU as more powerful.) They're still running 2 AP or MB tasks/GPU; they should be able to handle that.
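(For reference, and hedging since installations differ: the 2-vs-4 tasks/GPU count isn't one of the tuning switches above but comes from the GPU count in the anonymous-platform app_info.xml that the installer writes. A generic illustration of that element - not a copy of my file - is

    <coproc>
        <type>CUDA</type>
        <count>0.5</count>
    </coproc>

inside each app_version block, where a count of 0.5 means 2 tasks per GPU and 0.25 would mean 4.)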
27)
Message boards :
Number crunching :
CUDA Versions
(Message 1558139)
Posted 17 Aug 2014
Post: That's the point. Hmmm... So what about the Turbo of modern AMD CPUs? Although it's only a few hundred MHz, they do tend to still vary in clock even in the high performance profile. So far I haven't encountered any more hung tasks... so for now I think I'm well set.
28)
Message boards :
Number crunching :
CUDA Versions
(Message 1557918)
Posted 16 Aug 2014
Post: Hmm, I stumbled over a nasty surprise this morning: on this host I found two AP workunits that had gotten stuck overnight and had been wasting GPU time on the GTX750Ti for many hours without making any further progress. Is that a known contingency requiring frequent monitoring/caretaking? Upon quitting and restarting BOINC, they finished normally and with the expected performance.

Parameters for the Lunatics v0.42 app I'm using on that host:
-unroll 12 -ffa_block 8192 -ffa_block_fetch -hp -tune 1 64 4 1 -use_sleep

2 AP tasks were running in parallel, one using >40% CPU and one using only 0.5% CPU, but both were definitely stuck. Anything I could do to prevent that in the future? The GPUs are very well cooled and hardly exceed 50 deg C.

For now, I've set the Windows power management to the high performance profile (it basically already was; I ran the balanced profile with every power-saving feature turned off, with the exception of allowing the CPU to clock down when able to).
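(Side note while pasting that line: -ffa_block_fetch normally takes a numeric value of its own, so if it really is missing in the file it may be getting ignored or mis-parsed. Using the readme's example pairing of 8192/4096 purely as an illustration - the 4096 is the readme figure, not necessarily what belongs on this host - the complete line would read

    -unroll 12 -ffa_block 8192 -ffa_block_fetch 4096 -hp -tune 1 64 4 1 -use_sleep )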
29)
Message boards :
Number crunching :
CUDA Versions
(Message 1557555)
Posted 15 Aug 2014
Post: Thanks, I'll update to those values on my ATI rigs. And the NVidia hosts look fine; now I only see occasional tasks still taking an unusually high share of CPU time (>50%) and generally lasting much longer. I guess those might just be the "blanked" workunits I've heard some chatter about (?)
30)
Message boards :
Number crunching :
CUDA Versions
(Message 1557497)
Posted 15 Aug 2014
Post: Hmpf, why do I have to come up with new questions after the edit period is over? :p

I've searched the forums for the use of the command-line switches and found only CUDA/NVidia-related examples. Since I have 2 AMD/ATI-based hosts as well, this is what I'm using so far (based on the ReadMe recommendations):

-unroll 12 -ffa_block 8192 -ffa_block_fetch 4096 (mixed system with HD7970 + HD7790)
-unroll 10 -ffa_block 6144 -ffa_block_fetch 1536 -hp (mixed system with HD7850 + HD7750)

Are any kernel tuning sets known to be good for these GPU combinations? I would assume -tune 1 64 4 1 should work at least on the first combo running the more potent cards; I'm not sure though about the weaker combo (and I don't want to "break" a running config by feeding it bad tuning parameters). That's why I haven't implemented any so far. I just don't know what difference the ATI cards make in that regard vs. NVidia cards; at least CPU usage isn't a factor with them.

PS. Sorry for the n00bish questions, I just got back into SETI too late to experiment myself (which I normally do extensively)
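(Written out, in case it helps anyone answering: the combination I'd be trying on the stronger host is simply the existing line with the tune set appended - untested on my side and based only on the readme/forum examples, so treat it as a sketch rather than a known-good ATI setting:

    -unroll 12 -ffa_block 8192 -ffa_block_fetch 4096 -tune 1 64 4 1

in the ap_cmdline file for the OpenCL_ATI app.)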
31)
Message boards :
Number crunching :
CUDA Versions
(Message 1557446)
Posted 15 Aug 2014
Post: @Falconfly Just updated, so all results coming in from now on should reflect them. CPU load seems very low now, went from 98%/task to more like 10%/task on Astropulse. Visible performance at least seems normal, just by looking at workunits in progress.
32)
Message boards :
Number crunching :
CUDA Versions
(Message 1557372)
Posted 15 Aug 2014
Post: Hmpf... So you're saying despite the significant drop in GPU load, the performance remains unaffected? (I was looking at real tasks running while monitoring GPU loads)
33)
Message boards :
Number crunching :
CUDA Versions
(Message 1557356)
Posted 15 Aug 2014
Post: Hmpf... Even with the older driver, I get the same results when using -use_sleep: the CPU load drops as advertised - but so does GPU load, and presumably performance (?)
34)
Message boards :
Number crunching :
CUDA Versions
(Message 1557304)
Posted 15 Aug 2014
Post: "You neither use the tune switch nor use_sleep."

Alright, I'll give that combo a shot after reverting the GTX750Ti/GTX780 to an older driver in a few hours :) (Since my only GTX780 is in a mixed system with one of the GTX750Ti's, the 750 sets the pace for all tweaks I've used so far.)
35)
Message boards :
Number crunching :
CUDA Versions
(Message 1557298)
Posted 15 Aug 2014
Post: Yes, I'm running all NVidia cards with the cmdline_switch examples suggested in the ReadMe, according to the GPU capabilities (both the AP and mbcuda config files). The use_sleep switch was the last one I was experimenting with. For any advanced kernel tuning and longer-running benches I don't have time anymore (the WOW 2014 race starts within hours ;) )

The two affected systems do have iGPUs (AMD APUs), though, which for now are not used anymore, as they were competing (as a 5th task) with the 4 fast GPU tasks for CPU time on the quad-core CPUs. Those iGPUs I could still activate, if I can manage to free up some additional CPU time. I'll likely give that a shot today with a previous NVidia driver.

Other than that, I think I'm set and have gotten just about the most out of the systems in the little time I had for setting them up & finding the sweet spot for performance.
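(For the CPU-budgeting part, one hedged option - a generic BOINC mechanism rather than anything from the Lunatics readme, and the app name below is only a guess that would need checking against client_state.xml: an app_config.xml in the project directory can reserve a full core per GPU task, e.g.

    <app_config>
        <app>
            <name>astropulse_v6</name>
            <gpu_versions>
                <gpu_usage>0.5</gpu_usage>
                <cpu_usage>1.0</cpu_usage>
            </gpu_versions>
        </app>
    </app_config>

so the scheduler stops overcommitting the quad-core before the iGPU gets switched back on.)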
36)
Message boards :
Number crunching :
CUDA Versions
(Message 1557285)
Posted 15 Aug 2014
Post: Since the only systems affected exclusively run a number of CPU cores equal to the number of AP tasks, I think I'm okay... Unless someone can say that there's a performance improvement coming from the -use_sleep option... Then I could be tempted to change the setup one last time *g*
37)
Message boards :
Number crunching :
CUDA Versions
(Message 1557091)
Posted 15 Aug 2014
Post: Hm, I think I see the problem now. With the 340.52 driver installed and 2 AP tasks running per GPU, this is what GPU-Z shows for my GTX780:

-use_sleep disabled ... 98% GPU load
-use_sleep enabled .... 49-69% GPU load (fluctuating)

That certainly equals significantly higher runtimes, so I'd better leave the switch alone with my installed 340.52 driver.
38)
Message boards :
Number crunching :
CUDA Versions
(Message 1557084)
Posted 15 Aug 2014
Post: Can someone give me a quick brush-up on how to tell whether the 340.52 driver / -use_sleep combination is causing problems? What are the symptoms/results if that problem occurs? I've just used the switch on two of my hosts and the CPU load immediately dropped by almost 40%. So far it looks good to me (and I have the 340.52 driver).

Hardware it's used on: GT610, GTX750Ti and GTX780
39)
Message boards :
Number crunching :
SETI@Home Wow!-Event 2014
(Message 1556767)
Posted 14 Aug 2014
Post: "I thought this was starting today. Until I realized that the 15th was Friday. Doh!"

Hehe, happens ;) This is going to be a really good race I think, can't wait for it to start :D

PS. 646 contestants as of right now - still some ~23 hrs left to join the frenzy.
40)
Message boards :
Number crunching :
Lunatics Windows Installer v0.42 Release Notes
(Message 1555416)
Posted 12 Aug 2014
Post: -- edit -- Disregard, found it :)