Summarize Cuda?

Profile Geek@Play
Volunteer tester
Avatar

Send message
Joined: 31 Jul 01
Posts: 2467
Credit: 86,146,931
RAC: 0
United States
Message 841513 - Posted: 18 Dec 2008, 17:57:02 UTC

Wow..........this is going to be way over my head to write an app_info that works. I'll leave it to Jason and the pros.
Boinc....Boinc....Boinc....Boinc....
ID: 841513 · Report as offensive
Profile jason_gee
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 841518 - Posted: 18 Dec 2008, 18:06:48 UTC - in response to Message 841513.  
Last modified: 18 Dec 2008, 18:07:26 UTC

Wow..........this is going to be way over my head to write an app_info that works. I'll leave it to Jason and the pros.


Nah, not that hard, though I have managed to flush my cache twice trying things out :C.

Here are the relevant bits you need to insert, which *appear* to be working for me. (No warranty, explicit or implied; use at your own risk.)
...
...
    <file_info>
        <name>setiathome_6.05_windows_intelx86__cuda.exe</name>
        <executable/>
    </file_info>
    <file_info>
        <name>cudart.dll</name>
        <executable/>
    </file_info>
    <file_info>
        <name>cufft.dll</name>
        <executable/>
    </file_info>
    <file_info>
        <name>libfftw3f-3-1-1a_upx.dll</name>
        <executable/>
    </file_info>

..
..

    <app_version>
        <app_name>setiathome_enhanced</app_name>
        <version_num>605</version_num>
        <plan_class>cuda</plan_class>
        <avg_ncpus>0.025947</avg_ncpus>
        <max_ncpus>0.025947</max_ncpus>
        <flops>3702857142.857143</flops>
        <api_version>6.3.22</api_version>
        <coproc>
            <type>CUDA</type>
            <count>1</count>
        </coproc>
        <file_ref>
            <file_name>setiathome_6.05_windows_intelx86__cuda.exe</file_name>
            <main_program/>
        </file_ref>
        <file_ref>
            <file_name>cudart.dll</file_name>
            <open_name>cudart.dll</open_name>
        </file_ref>
        <file_ref>
            <file_name>cufft.dll</file_name>
            <open_name>cufft.dll</open_name>
        </file_ref>
        <file_ref>
            <file_name>libfftw3f-3-1-1a_upx.dll</file_name>
            <open_name>libfftw3f-3-1-1a_upx.dll</open_name>
        </file_ref>
    </app_version>
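One way to avoid flushing the cache with a typo: check that the edited app_info.xml at least parses as XML before restarting BOINC. A minimal sketch in Python (the filename and the plan-class listing are just illustrative checks, not anything the client itself does):

```python
# Sanity-check that an app_info.xml file is well-formed XML before
# restarting BOINC with it. A parse error here likely means the client
# would reject the file and abandon the cached work.
import xml.etree.ElementTree as ET

def check_app_info(path="app_info.xml"):
    """Parse the file; return the <plan_class> of each <app_version> found.

    Raises xml.etree.ElementTree.ParseError if the XML is malformed.
    """
    tree = ET.parse(path)
    root = tree.getroot()  # expected to be the single <app_info> root element
    return [av.findtext("plan_class", default="(none)")
            for av in root.iter("app_version")]

# Usage: check_app_info("app_info.xml") -> e.g. ['cuda']
```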

"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 841518 · Report as offensive
Profile Euan Holton
Avatar

Send message
Joined: 4 Sep 99
Posts: 65
Credit: 17,441,343
RAC: 0
United Kingdom
Message 841531 - Posted: 18 Dec 2008, 18:43:06 UTC - in response to Message 841518.  

Fair enough about the libfft thing; don't mind being wrong.

In that sample code, would you say that the following lines are strictly necessary:

<avg_ncpus>0.025947</avg_ncpus>

<max_ncpus>0.025947</max_ncpus>

<flops>3702857142.857143</flops>


As they seem like they would be pretty machine-specific.

And I take it the API line is there because CUDA uses a different version?
ID: 841531 · Report as offensive
Profile jason_gee
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 841539 - Posted: 18 Dec 2008, 18:55:32 UTC - in response to Message 841531.  

The ncpus fields are something to do with BOINC's application scheduling, which doesn't quite seem to work properly yet, but I gather they determine how many apps to run (in concert with the coproc section). Leaving them out would probably imply the app needs a whole CPU core; with them in, it will probably default to running separate apps on all the cores plus the GPU every time. Unfortunately it doesn't seem to work within the same application version domain, so on a quad that would probably mean 4x Astropulse + 1x MB CUDA.

I believe the coproc stuff & extra fields have a minimum BOINC API version in which they were introduced, so they require functionality not found in earlier BOINC versions. I dug this one out of the client state file, or a similar location, but that seems the reasonable explanation.

The <flops> field has no effect I can discern yet, but I'd expect it ultimately, in an updated BOINC, to be used for scheduling between multiple apps for the same & other projects, by calculating the best-throughput combination available with the given CPUs & coprocessors installed. ...doesn't work yet, AFAICT.
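For illustration, the role I'd expect <flops> to play in duration estimates can be sketched as follows (an assumption about the mechanism, with a made-up workunit size; the real client logic may differ):

```python
# Hedged sketch: how a <flops> value could feed a task-duration estimate.
# Assumed relationship: estimated seconds = workunit's estimated
# floating-point ops / device speed in flops. Numbers are illustrative.

def estimated_duration(rsc_fpops_est, flops):
    """Naive duration estimate in seconds for a task of rsc_fpops_est ops."""
    return rsc_fpops_est / flops

FLOPS_CUDA = 3702857142.857143   # the value from the app_info above (~3.7 GFLOPS)
WU_FPOPS = 2.5e12                # hypothetical multibeam workunit estimate

secs = estimated_duration(WU_FPOPS, FLOPS_CUDA)
print(f"~{secs / 60:.1f} minutes")  # prints ~11.3 minutes
```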

Jason

"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 841539 · Report as offensive
1mp0£173
Volunteer tester

Send message
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 841541 - Posted: 18 Dec 2008, 18:57:13 UTC - in response to Message 841426.  

Could someone knowledgeable summarize how cuda is architected, to save the rest of us some time?

For example, it seems that the application is written in VS/C++. So that is a Windows platform. How does it gain access to the GPU resources?

Does the cuda app appear in the task manager as separate process? That is, say, on a quad would I see 4 processes running on the quad and 1 more running, which is the cuda? Or would I see 4 'normal' and 4 'cudas'?

Is there a problem with buying a cuda-capable card and putting it into my oldest machine (dual P-II) for seti-only purposes?

By the way, is the VS/C++ cuda application actually a .NET application, too?

The short answer is: NVIDIA publishes an API, and the SETI application calls that API with the appropriate data.

A decent API (and I expect NVIDIA to do a decent API) is language-neutral. They may (and likely do) provide libraries for popular compilers that call their API.

The rest depends on how BOINC supports coprocessors, and how SETI wrote the code. It may be possible to run entirely in the GPU, but I'm sure that is a lot more work than having the CPU feed the calculations to the video card.
ID: 841541 · Report as offensive
Profile Daniel
Volunteer tester
Avatar

Send message
Joined: 21 May 07
Posts: 562
Credit: 437,494
RAC: 0
United States
Message 841551 - Posted: 18 Dec 2008, 19:10:47 UTC - in response to Message 841504.  
Last modified: 18 Dec 2008, 19:11:26 UTC

Well, I'm not going to be able to try this new CUDA stuff until my new video card arrives either Friday or Monday.


Did you order from Tiger? If so, you might not want to look at this link..

http://www.newegg.com/Product/Product.aspx?Item=N82E16814130391
Daniel

ID: 841551 · Report as offensive
Profile Geek@Play
Volunteer tester
Avatar

Send message
Joined: 31 Jul 01
Posts: 2467
Credit: 86,146,931
RAC: 0
United States
Message 841552 - Posted: 18 Dec 2008, 19:11:00 UTC

Why is..........
<file_ref>
    <file_name>libfftw3f-3-1-1a_upx.dll</file_name>
    <open_name>libfftw3f-3-1-1a_upx.dll</open_name>
</file_ref>

required for the CUDA MB section of the app_info when.........
<file_ref>
    <file_name>libfftw3f-3-1-1a_upx.dll</file_name>
</file_ref>

works just fine in the AP 5.00 section of an app_info file??
Boinc....Boinc....Boinc....Boinc....
ID: 841552 · Report as offensive
W-K 666 Project Donor
Volunteer tester

Send message
Joined: 18 May 99
Posts: 19407
Credit: 40,757,560
RAC: 67
United Kingdom
Message 841559 - Posted: 18 Dec 2008, 19:26:28 UTC

in the CUDA FAQ it says:
Q) Does SETI@home run GPU and CPU versions simultaneously?
No. If BOINC determines your GPU is capable of running the CUDA version, only the CUDA version of SETI@home will run. One copy will run on each GPU you have installed. If you want to keep your CPUs occupied at the same time, you can join another BOINC project.
ID: 841559 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Send message
Joined: 4 Jul 99
Posts: 14679
Credit: 200,643,578
RAC: 874
United Kingdom
Message 841562 - Posted: 18 Dec 2008, 19:28:05 UTC - in response to Message 841552.  

Why is..........
<file_ref>
    <file_name>libfftw3f-3-1-1a_upx.dll</file_name>
    <open_name>libfftw3f-3-1-1a_upx.dll</open_name>
</file_ref>

required for the CUDA MB section of the app_info when.........
<file_ref>
    <file_name>libfftw3f-3-1-1a_upx.dll</file_name>
</file_ref>

works just fine in the AP 5.00 section of an app_info file??

I don't think the <open_name> construct is needed unless it's an alias - but I could be wrong.
ID: 841562 · Report as offensive
Profile jason_gee
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 841565 - Posted: 18 Dec 2008, 19:29:46 UTC - in response to Message 841552.  
Last modified: 18 Dec 2008, 19:35:30 UTC

Not required, I think. It's another field that just came out of the stock config files and seems to do no harm. It probably gives BOINC a friendly name to refer to in logs etc.

It is present, along with the other fields, in the example app_info in the boinc wiki at:

https://boinc.berkeley.edu/wiki/Anonymous_platform
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 841565 · Report as offensive
Profile Euan Holton
Avatar

Send message
Joined: 4 Sep 99
Posts: 65
Credit: 17,441,343
RAC: 0
United Kingdom
Message 841587 - Posted: 18 Dec 2008, 19:49:59 UTC - in response to Message 841565.  

Thanks for the info, jason_gee. I'll try what you've posted when I get home.
ID: 841587 · Report as offensive
Profile Geek@Play
Volunteer tester
Avatar

Send message
Joined: 31 Jul 01
Posts: 2467
Credit: 86,146,931
RAC: 0
United States
Message 841589 - Posted: 18 Dec 2008, 19:54:39 UTC

Absolutely thanks to jason_gee. For this and all the work you have done for us.

Boinc....Boinc....Boinc....Boinc....
ID: 841589 · Report as offensive
Andrew Mueller

Send message
Joined: 29 Jun 06
Posts: 4
Credit: 1,022,895
RAC: 0
United States
Message 841622 - Posted: 18 Dec 2008, 20:45:10 UTC

CUDA was running 3+1 originally, then I updated my user preferences to utilize 5 cores. CUDA then ran 4+1, but now for some reason my client is only running 3+1 again. How do I fix this? :(
ID: 841622 · Report as offensive
Fred W
Volunteer tester

Send message
Joined: 13 Jun 99
Posts: 2524
Credit: 11,954,210
RAC: 0
United Kingdom
Message 841666 - Posted: 18 Dec 2008, 21:57:55 UTC - in response to Message 841622.  

CUDA was running 3+1 originally, then I updated my user preferences to utilize 5 cores. CUDA then ran 4+1, but now for some reason my client is only running 3+1 again. How do I fix this? :(

Sounds like the +1 might be running in EDF, in which case, as I understand it, it claims a whole CPU to ensure that it completes as quickly as possible. You may well find that it returns to 4 + 1 in the fullness of time.

F.
ID: 841666 · Report as offensive
Andrew Mueller

Send message
Joined: 29 Jun 06
Posts: 4
Credit: 1,022,895
RAC: 0
United States
Message 841675 - Posted: 18 Dec 2008, 22:06:54 UTC

I don't know what EDF is, but I've read in other threads that if work units are due in only a few days, BOINC runs them in "high priority", and dedicates an entire CPU to working with the GPU, so once you have "high priority" work units, you only run 3+1 instead of 4+1. I only have 80% processor utilization. ;(
ID: 841675 · Report as offensive
Cosmic_Ocean
Avatar

Send message
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 841708 - Posted: 18 Dec 2008, 23:34:00 UTC - in response to Message 841675.  

I don't know what EDF is, but I've read in other threads that if work units are due in only a few days, BOINC runs them in "high priority", and dedicates an entire CPU to working with the GPU, so once you have "high priority" work units, you only run 3+1 instead of 4+1. I only have 80% processor utilization. ;(

EDF is short for Earliest Deadline First.. high priority mode.
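As a toy sketch of the idea (not BOINC's actual scheduler, and the 90% safety margin is invented): tasks whose estimated runtime threatens their deadline jump the queue, earliest deadline first.

```python
# Toy illustration of Earliest Deadline First (EDF) / "high priority" mode.
# Not BOINC's real scheduler: just the core idea that tasks whose estimated
# finish time threatens their deadline get run first, in deadline order.

def edf_order(tasks, now=0.0):
    """tasks: list of (name, est_runtime_hours, deadline_hours_from_now).

    Returns task names with at-risk tasks first (earliest deadline first),
    then the remainder in submitted order. A task is "at risk" if its
    estimated finish lands past 90% of its deadline (invented margin).
    """
    at_risk = [t for t in tasks if now + t[1] > t[2] * 0.9]
    normal = [t for t in tasks if t not in at_risk]
    return ([t[0] for t in sorted(at_risk, key=lambda t: t[2])]
            + [t[0] for t in normal])

tasks = [("mb_1", 2.0, 200.0),     # plenty of slack
         ("ap_1", 4400.0, 720.0),  # absurd estimate (DCF blow-out): misses deadline
         ("mb_2", 3.8, 4.0)]       # tight deadline
print(edf_order(tasks))            # prints ['mb_2', 'ap_1', 'mb_1']
```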
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving up)
ID: 841708 · Report as offensive
Andrew Mueller

Send message
Joined: 29 Jun 06
Posts: 4
Credit: 1,022,895
RAC: 0
United States
Message 841810 - Posted: 19 Dec 2008, 4:37:34 UTC

Well, NOW it's running 4x high-priority threads on my quad-core, and an extra thread on it for the GPU which is just "running" (not high-priority). Interesting. In other words, it's back to 4+1!
ID: 841810 · Report as offensive
Profile Euan Holton
Avatar

Send message
Joined: 4 Sep 99
Posts: 65
Credit: 17,441,343
RAC: 0
United Kingdom
Message 842116 - Posted: 19 Dec 2008, 18:10:09 UTC

In the interest of keeping the total number of CUDA threads manageable, I'll post my CUDA thoughts and experiences from the last couple of days here.

First, some background on the box I'm running CUDA on. Last weekend, I finally got the last pieces for my first major upgrade in four years. Core i7, lots of memory, GTX260 (216) GPU. The main aim of the box is gaming, with idle time used for BOINC projects. Unlike my previous main box, an elderly but dignified hyper-threading Pentium 4, I'm happy to leave BOINC running while gaming on the Monolith, as the four cores / eight threads can easily cope with the demands of almost all games with plenty of CPU capacity to spare. I installed the AK Optimised Apps, and saw wonderful throughput.

I was intrigued by the CUDA release, but had some reservations when I read the FAQ, specifically on the point that the 6.05 app will only process on the GPU, meaning only one Enhanced workunit at a time would be run on my machine. Still, Astropulse and Einstein could keep the rest of the CPU humming while the GPU knocks out CUDA units.

In my haste to experiment with the CUDA app, I accidentally trashed the app_info.xml file too soon and nuked a bunch of Enhanced WUs. No biggy; I detached and reattached to let the server know I wouldn't be processing them. The machine started getting CUDA WUs and processed them in a serial fashion.

At this point I noted a couple of things: 1) while the CPU time for the WU does accurately reflect how much actual CPU time it consumes, it doesn't accurately reflect how much time it is using compute resources on my machine (180 seconds of CPU time during the 10 - 15 minute run-time, and it's undoubtedly exercising the GPU for most of that time); 2) more seriously, expected completion times were way, way off from what I'd come to expect from the previous few days of crunching. I took a peek at SETI's records on the computer and was surprised to see the Duration Correction Factor had changed from around 0.14 to 100! No wonder every unit was being run in EDF mode, and GPU units were consuming - and underusing - an entire thread of CPU resources. Worse, two Astropulse units had been downloaded and, with 'expected durations' of 4400 hours, were also being run in EDF mode, thus blocking BOINC from downloading any new SETI data! And, of course, I was not using an optimised application, for reasons I'll get into in a moment.
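To spell out the scale of that jump, assuming the displayed estimate is simply the raw estimate multiplied by the DCF (my reading of it, not confirmed against the client source):

```python
# Sketch of how a Duration Correction Factor (DCF) blow-out wrecks estimates.
# Assumed relationship: displayed estimate = raw estimate * DCF.
# The raw Astropulse figure is hypothetical; the DCF values are from the post.

def corrected_estimate(raw_hours, dcf):
    """Apply the duration correction factor to a raw runtime estimate."""
    return raw_hours * dcf

raw_ap = 44.0             # hypothetical raw Astropulse estimate, in hours
old_dcf, new_dcf = 0.14, 100.0

inflation = new_dcf / old_dcf
print(f"inflation ~{inflation:.0f}x")                       # prints inflation ~714x
print(f"AP estimate: {corrected_estimate(raw_ap, new_dcf):.0f} hours")  # 4400 hours
```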

I also had some issues with the CUDA application. The AK SSE4.1 optimised app had proven to be completely reliable; however, the CUDA app aborted some WUs early (-9 too many results issue) and even crashed my video driver with compute errors on a number of occasions. This may be down to the fact that I am using a WHQL 180 series driver, meaning there are possible CUDA 2.1 issues, but I need that driver version for proper X58 functionality with nVidia cards. The video driver crashes appeared to be mitigated by a reboot, though, so there may have been something else up I wasn't aware of.

With how new the app was, I decided to hold back a day before investigating what would be needed in app_info.xml to allow optimised Astropulse and stock CUDA to run together. I tried last night to implement jason_gee's suggestions above but could not get it to work (and wasn't able to dedicate the concentration the task needed, for various reasons), accidentally trashing more WUs in the process. After detaching and reattaching again to make sure the affected WUs were quickly farmed out for processing, I did some thinking about my experiences and came to the following conclusions:

1) The CUDA app is too unstable for production work on my machine, producing a significant number of -9 erroneous results and creating compute error conditions that force Vista into resetting the video driver.
2) DCF issues result in underutilisation of processor threads and, between that and the way that machine resources are reported, I think I may have been getting better Enhanced throughput with the regular AK optimised apps.
3) Until I better understand how the video card resource contention between games and CUDA apps resolves, I won't leave BOINC running while gaming unless I suspend SETI@Home, something I don't want to be forced into doing. The reason I bought a powerful GPU is to give me top notch eye candy and frame rates; CUDA is a bonus, but not much of one if running CUDA science apps and games simultaneously degrades my play experience.
4) Even if I did get it running, I am loath to use app_info to allow stock CUDA and optimised Astropulse to run together, as I am sure the stock CUDA app will get some updates in the days and weeks ahead; as I can't get to the box while I'm at work, there would be a real prospect of missing a new stock CUDA release for a number of hours, resulting in the potential for unnecessary video driver crashes and unnecessary -9 aborted WUs.
5) The BOINC client science application handling and resource allocation does not appear to be flexible enough to allow what would, for me and I am sure others, be the ideal combination of applications: a stock CUDA application, an optimised Enhanced application to process some units on spare CPU threads, and an optimised Astropulse application.

The conclusions led me to decide that CUDA was not ready for prime time - I have to wonder if there was some pressure from nVidia PR which led to the premature release - and I have reverted the Monolith to "traditional" AK optimised applications until the CUDA implementation stabilises and becomes more flexible.
ID: 842116 · Report as offensive
Profile ML1
Volunteer moderator
Volunteer tester

Send message
Joined: 25 Nov 01
Posts: 21247
Credit: 7,508,002
RAC: 20
United Kingdom
Message 842152 - Posted: 19 Dec 2008, 19:53:46 UTC - in response to Message 842116.  

... The conclusions led me to decide that CUDA was not ready for prime time - I have to wonder if there was some pressure from nVidia PR which led to the premature release...

Possibly.

However, the present exposure should promote some rapid development and fixes. Note that s@h on Boinc should really be considered as 'experimental'. We were warned (on these forums at least) that the CUDA stuff was very new and 'exciting' (in all ways)!

Hang in there,

Happy fast crunchin',
Martin


See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
ID: 842152 · Report as offensive
Profile Crunch3r
Volunteer tester
Avatar

Send message
Joined: 15 Apr 99
Posts: 1546
Credit: 3,438,823
RAC: 0
Germany
Message 842159 - Posted: 19 Dec 2008, 20:06:56 UTC - in response to Message 842116.  


The conclusions led me to decide that CUDA was not ready for prime time - I have to wonder if there was some pressure from nVidia PR which led to the premature release -


More likely it's a combination of Nvidia's PR and the donation drive, to attract as many users as possible who fall for the GPU hype and squeeze money out of them...


Join BOINC United now!
ID: 842159 · Report as offensive
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.