Summarize Cuda?
Geek@Play · Joined: 31 Jul 01 · Posts: 2467 · Credit: 86,146,931 · RAC: 0

Wow..........this is going to be way over my head to write an app_info that works. I'll leave it to Jason and the pros.

Boinc....Boinc....Boinc....Boinc....
jason_gee · Joined: 24 Nov 06 · Posts: 7489 · Credit: 91,093,184 · RAC: 0

> Wow..........this is going to be way over my head to write an app_info that works. I'll leave it to Jason and the pros.

Nah, not that hard, though I have managed to flush my cache twice trying things out :C. Here are the relevant bits you need to insert that *appear* to be working for me (no warranty, explicit or implied, use at your own risk):

    ...
    <file_info>
        <name>setiathome_6.05_windows_intelx86__cuda.exe</name>
        <executable/>
    </file_info>
    <file_info>
        <name>cudart.dll</name>
        <executable/>
    </file_info>
    <file_info>
        <name>cufft.dll</name>
        <executable/>
    </file_info>
    <file_info>
        <name>libfftw3f-3-1-1a_upx.dll</name>
        <executable/>
    </file_info>
    ...

    ...
    <app_version>
        <app_name>setiathome_enhanced</app_name>
        <version_num>605</version_num>
        <plan_class>cuda</plan_class>
        <avg_ncpus>0.025947</avg_ncpus>
        <max_ncpus>0.025947</max_ncpus>
        <flops>3702857142.857143</flops>
        <api_version>6.3.22</api_version>
        <coproc>
            <type>CUDA</type>
            <count>1</count>
        </coproc>
        <file_ref>
            <file_name>setiathome_6.05_windows_intelx86__cuda.exe</file_name>
            <main_program/>
        </file_ref>
        <file_ref>
            <file_name>cudart.dll</file_name>
            <open_name>cudart.dll</open_name>
        </file_ref>
        <file_ref>
            <file_name>cufft.dll</file_name>
            <open_name>cufft.dll</open_name>
        </file_ref>
        <file_ref>
            <file_name>libfftw3f-3-1-1a_upx.dll</file_name>
            <open_name>libfftw3f-3-1-1a_upx.dll</open_name>
        </file_ref>
    </app_version>

"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to Live By: The Computer Science of Human Decisions.
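If it helps to see where those fragments sit, the overall shape of a combined app_info.xml is sketched below, pieced together from the snippets above and the Anonymous_platform wiki layout linked further down the thread. Treat it as a guide rather than a tested file; the placeholders stand in for the blocks already shown.

    <app_info>
        <app>
            <name>setiathome_enhanced</name>
        </app>
        <!-- one <file_info> block per file, exactly as in the snippet above -->
        <file_info>
            <name>setiathome_6.05_windows_intelx86__cuda.exe</name>
            <executable/>
        </file_info>
        ...
        <!-- the CUDA <app_version> block from the snippet above -->
        <app_version>
            ...
        </app_version>
        <!-- any existing Astropulse <app>, <file_info> and <app_version>
             sections continue here, unchanged -->
    </app_info>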
Euan Holton · Joined: 4 Sep 99 · Posts: 65 · Credit: 17,441,343 · RAC: 0

Fair enough about the libfft thing; I don't mind being wrong. In that sample code, would you say that lines like the following are strictly necessary:

    <avg_ncpus>0.025947</avg_ncpus>

They seem like they would be pretty machine-specific. And I take it the API line is there because the CUDA app uses a different version?
jason_gee · Joined: 24 Nov 06 · Posts: 7489 · Credit: 91,093,184 · RAC: 0

The ncpus fields are something to do with Boinc's application scheduling, which doesn't quite seem to work properly yet, but I gather they determine how many apps to run (in concert with the coproc section). Leaving them out would probably imply the app needs a whole CPU core, so with that extra it will probably default to running separate apps on all the cores + the GPU every time, properly. Unfortunately it doesn't seem to work within the same application version domain, so on a quad that would probably mean 4x Astropulse + 1x MB CUDA.

I believe the coproc stuff & extra fields have a minimum Boinc API version they were introduced in, so they require functionality from Boinc not found in earlier versions. I dug this one out of the client state, or a similar location, but that seems the reasonable explanation.

The <flops> has no effect I can discern yet, but I'd expect it ultimately, in an updated Boinc, to be used for scheduling between multiple apps for the same & other projects, by calculating the best throughput combo available with the given CPUs & coprocessors installed. ... doesn't work yet AFAICT.

Jason

"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to Live By: The Computer Science of Human Decisions.
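Putting that explanation next to the fields in question (the values are simply copied from the sample above, and the comments only summarise the intent described in this thread, not official documentation):

    <!-- fraction of a CPU core this app version is expected to tie up;
         omit these and Boinc will likely reserve a whole core for it -->
    <avg_ncpus>0.025947</avg_ncpus>
    <max_ncpus>0.025947</max_ncpus>

    <!-- claimed speed of this app version, in floating-point ops per second;
         intended for run-time estimates and scheduling, little effect so far -->
    <flops>3702857142.857143</flops>

    <!-- this app version needs one CUDA-capable GPU -->
    <coproc>
        <type>CUDA</type>
        <count>1</count>
    </coproc>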
1mp0£173 · Joined: 3 Apr 99 · Posts: 8423 · Credit: 356,897 · RAC: 0

> Could someone knowledgeable summarize how cuda is architected, to save the rest of us some time?

The short answer is: NVIDIA publishes an API, and the SETI application calls that API with the appropriate data. A decent API (and I expect NVIDIA to do a decent API) is language-neutral. They may (and likely do) provide libraries for popular compilers that call their API.

The rest depends on how BOINC supports coprocessors, and how SETI wrote the code. It may be possible to run entirely in the GPU, but I'm sure that is a lot more work than having the CPU feed the calculations to the video card.
Daniel · Joined: 21 May 07 · Posts: 562 · Credit: 437,494 · RAC: 0

> Well, I'm not going to be able to try this new CUDA stuff until my new video card arrives either Friday or Monday.

Did you order from Tiger? If so, you might not want to look at this link... http://www.newegg.com/Product/Product.aspx?Item=N82E16814130391

Daniel
Geek@Play · Joined: 31 Jul 01 · Posts: 2467 · Credit: 86,146,931 · RAC: 0

Why is..........

    <file_ref>
        <file_name>libfftw3f-3-1-1a_upx.dll</file_name>
        <open_name>libfftw3f-3-1-1a_upx.dll</open_name>
    </file_ref>

required for the CUDA MB section of the app_info when.........

    <file_ref>
        <file_name>libfftw3f-3-1-1a_upx.dll</file_name>
    </file_ref>

works just fine in the AP 5.00 section of an app_info file??

Boinc....Boinc....Boinc....Boinc....
W-K 666 · Joined: 18 May 99 · Posts: 19407 · Credit: 40,757,560 · RAC: 67

In the CUDA FAQ it says:

Q) Does SETI@home run GPU and CPU versions simultaneously?
Richard Haselgrove · Joined: 4 Jul 99 · Posts: 14679 · Credit: 200,643,578 · RAC: 874

> Why is..........

I don't think the <open_name> construct is needed unless it's an alias - but I could be wrong.
jason_gee · Joined: 24 Nov 06 · Posts: 7489 · Credit: 91,093,184 · RAC: 0

Not required, I think. It's another field that just came out of the stock config files and seems to do no harm. It probably gives Boinc some friendly name to refer to in logs etc. It is present, along with the other fields, in the example app_info in the Boinc wiki at: https://boinc.berkeley.edu/wiki/Anonymous_platform

"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to Live By: The Computer Science of Human Decisions.
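For what it's worth, <open_name> in a BOINC file_ref is meant to give the logical name the science application opens the file under, so it only really earns its keep when that name differs from the physical file name, which matches Richard's "unless it's an alias" reading. A hypothetical illustration (the alias name here is made up for the example, not something the CUDA app actually asks for):

    <file_ref>
        <!-- physical name of the file on disk -->
        <file_name>libfftw3f-3-1-1a_upx.dll</file_name>
        <!-- logical name the app opens; hypothetical alias, only needed
             when it differs from the physical name -->
        <open_name>libfftw3f.dll</open_name>
    </file_ref>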
Euan Holton · Joined: 4 Sep 99 · Posts: 65 · Credit: 17,441,343 · RAC: 0

Thanks for the info, jason_gee. I'll try what you've posted when I get home.
Geek@Play · Joined: 31 Jul 01 · Posts: 2467 · Credit: 86,146,931 · RAC: 0

Absolutely, thanks to jason_gee, for this and all the work you have done for us.

Boinc....Boinc....Boinc....Boinc....
Andrew Mueller · Joined: 29 Jun 06 · Posts: 4 · Credit: 1,022,895 · RAC: 0

CUDA was running 3+1 originally, then I updated my user preferences to utilize 5 cores. CUDA then ran 4+1, but now for some reason my client is only running 3+1 again. How do I fix this? :(
Fred W · Joined: 13 Jun 99 · Posts: 2524 · Credit: 11,954,210 · RAC: 0

> CUDA was running 3+1 originally, then I updated my user preferences to utilize 5 cores. CUDA then ran 4+1, but now for some reason my client is only running 3+1 again. How do I fix this? :(

Sounds like the +1 might be running in EDF, in which case, as I understand it, it claims a whole CPU to ensure that it completes as quickly as possible. You may well find that it returns to 4 + 1 in the fullness of time.

F.
Andrew Mueller · Joined: 29 Jun 06 · Posts: 4 · Credit: 1,022,895 · RAC: 0

I don't know what EDF is, but I've read in other threads that if work units are due in only a few days, BOINC runs them in "high priority", and dedicates an entire CPU to working with the GPU, so once you have "high priority" work units, you only run 3+1 instead of 4+1. I only have 80% processor utilization. ;(
Cosmic_Ocean · Joined: 23 Dec 00 · Posts: 3027 · Credit: 13,516,867 · RAC: 13

> I don't know what EDF is, but I've read in other threads that if work units are due in only a few days, BOINC runs them in "high priority", and dedicates an entire CPU to working with the GPU, so once you have "high priority" work units, you only run 3+1 instead of 4+1. I only have 80% processor utilization. ;(

EDF is short for Earliest Deadline First, i.e. high priority mode.

Linux laptop: record uptime: 1511d 20h 19m (ended due to the power brick giving up)
Andrew Mueller · Joined: 29 Jun 06 · Posts: 4 · Credit: 1,022,895 · RAC: 0

Well, NOW it's running 4x high-priority threads on my quad-core, and an extra thread on it for the GPU which is just "running" (not high-priority). Interesting. In other words, it's back to 4+1!
Euan Holton · Joined: 4 Sep 99 · Posts: 65 · Credit: 17,441,343 · RAC: 0

In the interest of keeping the total number of CUDA threads manageable, I'll post my CUDA thoughts and experiences from the last couple of days here.

First, some background on the box I'm running CUDA on. Last weekend, I finally got the last pieces for my first major upgrade in four years: Core i7, lots of memory, GTX260 (216) GPU. The main aim of the box is gaming, with idle time used for BOINC projects. Unlike my previous main box, an elderly but dignified hyper-threading Pentium 4, I'm happy to leave BOINC running while gaming on the Monolith, as the four cores / eight threads can easily cope with the demands of almost all games with plenty of CPU capacity to spare.

I installed the AK optimised apps and saw wonderful throughput. I was intrigued by the CUDA release, but had some reservations when I read the FAQ, specifically on the point that the 6.05 app will only process on the GPU, meaning only one Enhanced workunit at a time would be run on my machine. Still, Astropulse and Einstein could keep the rest of the CPU humming while the GPU knocks out CUDA units.

In my haste to experiment with the CUDA app, I accidentally trashed the app_info.xml file too soon and nuked a bunch of Enhanced WUs. No biggie; I detached and reattached to let the server know I wouldn't be processing them. The machine started getting CUDA WUs and processed them in a serial fashion. At this point I noted a couple of things. First, while the CPU time for a WU does accurately reflect how much actual CPU time it consumes, it doesn't accurately reflect how much time it is using compute resources on my machine (180 seconds of CPU time during the 10 - 15 minute run-time, and it's undoubtedly exercising the GPU for most of that time). Second, and more seriously, expected completion times were way, way off from what I'd come to expect from the previous few days of crunching. I took a peek at SETI's records on the computer and was surprised to see the Duration Correction Factor had changed from around 0.14 to 100! No wonder every unit was being run in EDF mode, and GPU units were consuming - and underusing - an entire thread of CPU resources. Worse, two Astropulse units had been downloaded and, with 'expected durations' of 4400 hours, were also being run in EDF mode and thus were blocking BOINC from downloading any new SETI data! And, of course, they were not using an optimised application, for reasons I'll get into in a moment.

I also had some issues with the CUDA application itself. The AK SSE4.1 optimised app had proven to be completely reliable; however, the CUDA app aborted some WUs early (the -9 too many results issue) and even crashed my video driver with compute errors on a number of occasions. This may be down to the fact that I am using a WHQL 180 series driver, meaning there are possible CUDA 2.1 issues, but I need that driver version for proper X58 functionality with nVidia cards. The video driver crashes appeared to be mitigated by a reboot, though, so there may have been something else up I wasn't aware of.

With how new the app was, I decided to hold back a day on investigating what would be needed in app_info.xml to allow optimised Astropulse and stock CUDA to run together. I tried last night to implement jason_gee's suggestions above but could not get it to work (and wasn't able to dedicate the concentration needed to the task for various reasons), accidentally trashing more WUs in the process.

After detaching and reattaching again to make sure the affected WUs were quickly farmed out again for processing, I did some thinking about my experiences and came to the following conclusions:

1) The CUDA app is too unstable for production work on my machine, producing a significant number of -9 erroneous results and creating compute error conditions that force Vista into resetting the video driver.

2) DCF issues result in underutilisation of processor threads and, between that and the way that machine resources are reported, I think I may have been getting better Enhanced throughput with the regular AK optimised apps.

3) Until I better understand how the video card resource contention between games and CUDA apps resolves, I won't leave BOINC running while gaming unless I suspend SETI@home, something I don't want to be forced into doing. The reason I bought a powerful GPU is to give me top-notch eye candy and frame rates; CUDA is a bonus, but not much of one if running CUDA science apps and games simultaneously degrades my play experience.

4) Even if I did get it running, I am loath to use app_info to allow stock CUDA and optimised Astropulse to run together, as I am sure the stock CUDA app will get some updates in the days and weeks ahead; since I can't get to the box while I'm at work, there would be a real prospect of missing a new stock CUDA release for a number of hours, resulting in potentially unnecessary video driver crashes and -9 aborted WUs.

5) The BOINC client's science application handling and resource allocation does not appear to be flexible enough to allow what would, for me and I am sure others, be the ideal combination of applications: a stock CUDA application, an optimised Enhanced application to process some units on spare CPU threads, and an optimised Astropulse application.

The conclusions led me to decide that CUDA was not ready for prime time - I have to wonder if there was some pressure from nVidia PR which led to the premature release - and I have reverted the Monolith to "traditional" AK optimised applications until the CUDA implementation stabilises and becomes more flexible.
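As a rough sanity check on those numbers (illustrative arithmetic only, not figures read from Euan's host): BOINC's runtime estimate in that era was roughly the workunit's estimated flop count divided by the app version's claimed speed, scaled by the host's Duration Correction Factor, so a DCF jump from about 0.14 to 100 inflates every estimate by a factor of roughly 700.

    estimated runtime ≈ (rsc_fpops_est / flops) × DCF
    inflation factor  ≈ 100 / 0.14 ≈ 714
    e.g. an Astropulse task that would have shown roughly 6 hours at DCF ≈ 0.14
         shows roughly 6 h × 714 ≈ 4,400 hours at DCF = 100

Which is consistent with the 4400-hour Astropulse estimates and the wholesale drop into EDF described above.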
ML1 · Joined: 25 Nov 01 · Posts: 21247 · Credit: 7,508,002 · RAC: 20

> ... The conclusions led me to decide that CUDA was not ready for prime time - I have to wonder if there was some pressure from nVidia PR which led to the premature release...

Possibly. However, the present exposure should promote some rapid development and fixes.

Note that s@h on Boinc should really be considered as 'experimental'. We were warned (on these forums at least) that the CUDA stuff was very new and 'exciting' (in all ways)!

Hang in there,

Happy fast crunchin',
Martin

See new freedom: Mageia Linux · Take a look for yourself: Linux Format · The Future is what We all make IT (GPLv3)
Crunch3r · Joined: 15 Apr 99 · Posts: 1546 · Credit: 3,438,823 · RAC: 0

More likely it's a combination of Nvidia's PR and the donation drive, to attract as many users as possible who fall for the GPU hype and squeeze money out of them...

Join BOINC United now!