CUDA MB V12b rebuild supposed to work with Fermi GPUs

Message boards : Number crunching : CUDA MB V12b rebuild supposed to work with Fermi GPUs

Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 3386
Credit: 46,240,518
RAC: 7,411
Russia
Message 989916 - Posted: 17 Apr 2010, 21:14:49 UTC
Last modified: 17 Apr 2010, 21:47:19 UTC

You can download the V12b build, rebuilt against the CUDA 3.0 SDK, here: http://files.mail.ru/VH0VC2
It should work with older GPUs too (but don't expect a speed increase, at least not a big one) and is supposed to work with Fermi GPUs, but the latter still needs checking. I have no access to a Fermi GPU, so if anyone uses these binaries on Fermi, please post your observations here.

BTW, note that a driver version of 196 or higher is required for this build.

Highlander
Joined: 5 Oct 99
Posts: 143
Credit: 31,058,002
RAC: 8
Germany
Message 989947 - Posted: 17 Apr 2010, 23:03:53 UTC
Last modified: 17 Apr 2010, 23:04:39 UTC

I installed V12b for testing; nothing negative or positive to report so far on my

GTX 260-216 / NVidia 197.13

running at stock speed.

This is the first pending WU:

http://setiathome.berkeley.edu/workunit.php?wuid=598972611
http://setiathome.berkeley.edu/result.php?resultid=1577896960

Hopefully validation will be no problem either.
____________

Profile Todd Hebert
Volunteer tester
Joined: 16 Jun 00
Posts: 647
Credit: 217,127,962
RAC: 0
United States
Message 989974 - Posted: 18 Apr 2010, 0:45:52 UTC

Good day,
Any direction on how to install this new build would be great, as I have three idle GTX 480s in my Skulltrail system. They really want to be put to work :)
Todd
____________

Profile jason_gee (Project donor)
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 4955
Credit: 72,933,478
RAC: 12,348
Australia
Message 989982 - Posted: 18 Apr 2010, 1:23:17 UTC
Last modified: 18 Apr 2010, 1:29:54 UTC

OK @ all watching this thread,
A couple of things have been worked out so far in very limited internal testing with shortened test WUs.
Firstly, the CUDA 3.0 build results don't 'seem to' be stable on any driver older than the latest current WHQL driver for the card being run on, i.e. 197.45 for pre-Fermi cards (light tests show OK), and likely (but untested) 197.41 for the new Fermi cards. 197.13 will probably not suffice in either case (going by the behaviour I've seen, with this driver results may intermittently differ from your wingman's without any obvious outward sign of a problem).
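The driver guidance above can be sketched as a small check. This is only an illustration: the table and function names are hypothetical, the 197.45 figure is the one reported as lightly tested for pre-Fermi cards, and the 197.41 figure for Fermi is explicitly untested.

```python
# Suggested minimum drivers taken from the post above. The Fermi value is
# "likely but untested", so treat this table as an assumption, not a spec.
MIN_DRIVER = {
    "pre-fermi": (197, 45),
    "fermi": (197, 41),
}


def parse_driver(version):
    """Turn a driver string such as '197.13' into a comparable (197, 13) tuple."""
    major, minor = version.strip().split(".")
    return int(major), int(minor)


def driver_ok(version, family):
    """Return True if the driver is at least the suggested minimum for the family."""
    return parse_driver(version) >= MIN_DRIVER[family]


if __name__ == "__main__":
    for version, family in [("197.13", "pre-fermi"), ("197.45", "pre-fermi")]:
        status = "OK" if driver_ok(version, family) else "too old"
        print(version, family, status)
```

Comparing integer tuples rather than floats or strings keeps versions ordered by their major and minor parts.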

Secondly, wet blanket time. This is really a pre-alpha test/proof-of-concept build, meant to identify whether the drivers, SDK, hardware and application code even work at all. If you aren't particularly comfortable with manually installing an app by placing files in the correct locations and modifying app_info.xml by hand, decide for yourself whether you want to try, but be sure you know what you are doing and what to look for: if it doesn't work quite right (which is highly possible, even probable), you could dispose of a great number of tasks quite quickly.

If you are game, and can monitor what happens, then looking at an existing app_info.xml for V12 installation and changing the entries to the appropriate ones in the download should be fairly straightforward.

Please resist the urge to post app_info.xml files here, because on more than one occasion this has resulted in novice users trashing many tasks through BOINC-incompatible clipboard pasting (from browsers).

Thanks,
Jason
P.S. These are my opinions & recommendations, to be taken with whatever sized grain of salt you choose.
____________
"It is not the strongest of the species that survives, nor the most intelligent that survives. It is the one that is the most adaptable to change."
Charles Darwin

Profile Todd Hebert
Volunteer tester
Joined: 16 Jun 00
Posts: 647
Credit: 217,127,962
RAC: 0
United States
Message 989988 - Posted: 18 Apr 2010, 1:54:32 UTC

I'm more than game to change the app_info.xml file, but I don't know exactly what to add. If someone wants to PM more detailed instructions, that would be great, so that, as the previous poster warned, a bunch of WUs don't get trashed.

I am happy to monitor my single system with the Fermi cards and will report my experience and findings so that other users can benefit going forward. It doesn't appear that many people have these cards as of yet and I would love to get them going.

Thanks!
____________

Profile jason_gee (Project donor)
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 4955
Credit: 72,933,478
RAC: 12,348
Australia
Message 989989 - Posted: 18 Apr 2010, 1:56:03 UTC - in response to Message 989988.

Great! .. will PM more detailed instructions shortly.

Jason
____________
"It is not the strongest of the species that survives, nor the most intelligent that survives. It is the one that is the most adaptable to change."
Charles Darwin

Profile Todd Hebert
Volunteer tester
Joined: 16 Jun 00
Posts: 647
Credit: 217,127,962
RAC: 0
United States
Message 989991 - Posted: 18 Apr 2010, 2:04:33 UTC

Thanks Jason!
I'll wait for them, since I am at my office and can jump on making the correct changes and see where things are at.
Thanks again!
____________

Profile SciManStev (Project donor)
Volunteer tester
Joined: 20 Jun 99
Posts: 4818
Credit: 80,608,100
RAC: 35,478
United States
Message 989997 - Posted: 18 Apr 2010, 2:18:39 UTC - in response to Message 989989.

Thank you from me too!

Steve
____________
Warning, addicted to SETI crunching!
Crunching as a member of GPU Users Group.
GPUUG Website

Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 3386
Credit: 46,240,518
RAC: 7,411
Russia
Message 990034 - Posted: 18 Apr 2010, 6:14:06 UTC
Last modified: 18 Apr 2010, 6:18:04 UTC

For multi-GPU Fermi-based hosts, this x4 build may improve speed: http://files.mail.ru/PXERAL
[Another possible optimisation is to dedicate a complete CPU core to each GPU. Which works better is worth checking in each case.]

Profile Todd Hebert
Volunteer tester
Joined: 16 Jun 00
Posts: 647
Credit: 217,127,962
RAC: 0
United States
Message 990050 - Posted: 18 Apr 2010, 7:36:09 UTC - in response to Message 990034.

I have allocated one CPU core to each card at Jason's suggestion, so we will see what comes of that. I have also installed the x4 build that was posted, but so far I am seeing some failures, with WUs completing in 11-12 seconds. The same thing was happening before, until I got the app_info.xml file squared away. I did change it to reference the _x4 file, and that executable shows as active in Task Manager. Are there any other changes I should make?
Thanks!
Todd
____________

Profile Todd Hebert
Volunteer tester
Joined: 16 Jun 00
Posts: 647
Credit: 217,127,962
RAC: 0
United States
Message 990052 - Posted: 18 Apr 2010, 7:45:10 UTC - in response to Message 990050.

I went back to the original Fermi file and changed app_info.xml again, but now I am seeing all WUs fail. The x4 build did the same after I let it run for a while. For now I have it suspended so it doesn't run through my cache.

Any suggestions?
____________

Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 3386
Credit: 46,240,518
RAC: 7,411
Russia
Message 990053 - Posted: 18 Apr 2010, 7:55:22 UTC - in response to Message 990052.

I'll check what the error was; meanwhile, you could reboot your host.

Profile Todd Hebert
Volunteer tester
Joined: 16 Jun 00
Posts: 647
Credit: 217,127,962
RAC: 0
United States
Message 990054 - Posted: 18 Apr 2010, 7:57:47 UTC - in response to Message 990053.

I did reboot the machine following both changes to the app_info file. Still no change in behavior.
____________

Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 3386
Credit: 46,240,518
RAC: 7,411
Russia
Message 990055 - Posted: 18 Apr 2010, 8:04:14 UTC - in response to Message 990054.
Last modified: 18 Apr 2010, 8:06:48 UTC

I see many -9 overflows in your results.
I'm not sure this can be connected with the x4 version in any way (because it changes the app's CPU-related behavior, not its GPU behavior), but let's restore the status quo:
1) Stop BOINC.
2) Place all 3 files from the initial archive in the SETI project folder (the FERMI version without the x4 suffix).
3) Edit app_info.xml to reference the corresponding binary.
4) Reboot the host.

EDIT: in general, a -9 error means the GPU memory buffer was corrupted with some random/invalid data. The app then starts to find many signals in that junk and ends the task prematurely. Since a normal task can also contain too many signals, it is the high rate of -9 results that definitely points to GPU memory buffer corruption.
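The "high rate of -9 results" reasoning above can be sketched as a simple tally. This is only an illustration: the list of exit codes is assumed to be collected by hand from the host's result pages, the function names are hypothetical, and the 0.5 threshold is an arbitrary assumption rather than a project rule.

```python
from collections import Counter

OVERFLOW = -9  # the result-overflow code discussed in the post above


def overflow_rate(exit_codes):
    """Return the fraction of results that ended with the -9 overflow code."""
    if not exit_codes:
        return 0.0
    return Counter(exit_codes)[OVERFLOW] / len(exit_codes)


def looks_corrupted(exit_codes, threshold=0.5):
    """Flag a host whose share of -9 results exceeds an assumed threshold."""
    return overflow_rate(exit_codes) > threshold


if __name__ == "__main__":
    codes = [-9, -9, 0, -9]  # hypothetical sample: three overflows, one normal result
    print(overflow_rate(codes), looks_corrupted(codes))
```

An occasional -9 among mostly normal results stays below the threshold, while a host that overflows on nearly every task is flagged.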

Profile Todd Hebert
Volunteer tester
Joined: 16 Jun 00
Posts: 647
Credit: 217,127,962
RAC: 0
United States
Message 990056 - Posted: 18 Apr 2010, 8:05:58 UTC - in response to Message 990055.

Very well - doing that now.
____________

Profile Todd Hebert
Volunteer tester
Joined: 16 Jun 00
Posts: 647
Credit: 217,127,962
RAC: 0
United States
Message 990059 - Posted: 18 Apr 2010, 8:27:10 UTC

I made sure that everything was changed back: the three files have all been overwritten, app_info.xml references the original files, and the system has been rebooted. I just reported six WUs that completed in 12 seconds, so something is amiss.

Here is the app_info.xml file:

<app_info>
<app>
<name>setiathome_enhanced</name>
</app>
<file_info>
<name>MB_6.09_CUDA_V12b_FERMI.exe</name>
<executable/>
</file_info>
<file_info>
<name>cudart32_30_14.dll</name>
<executable/>
</file_info>
<file_info>
<name>cufft32_30_14.dll</name>
<executable/>
</file_info>
<file_info>
<name>libfftw3f-3-1-1a_upx.dll</name>
<executable/>
</file_info>
<app_version>
<app_name>setiathome_enhanced</app_name>
<version_num>608</version_num>
<plan_class>cuda</plan_class>
<avg_ncpus>1.000000</avg_ncpus>
<max_ncpus>1.000000</max_ncpus>
<coproc>
<type>CUDA</type>
<count>1</count>
</coproc>
<file_ref>
<file_name>MB_6.09_CUDA_V12b_FERMI.exe</file_name>
<main_program/>
</file_ref>
<file_ref>
<file_name>cudart32_30_14.dll</file_name>
</file_ref>
<file_ref>
<file_name>cufft32_30_14.dll</file_name>
</file_ref>
<file_ref>
<file_name>libfftw3f-3-1-1a_upx.dll</file_name>
</file_ref>
</app_version>
</app_info>

Todd
____________
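A file like the app_info.xml posted above can be sanity-checked before BOINC is restarted. The following is a minimal sketch using Python's standard xml.etree.ElementTree module. The function name and the choice of checks are assumptions made for illustration (well-formed XML, and every file_ref and app_name pointing at something declared in the same file). It is not a BOINC tool and does not cover everything the client validates.

```python
import xml.etree.ElementTree as ET


def check_app_info(xml_text):
    """Return a list of problems found in app_info.xml text (empty if none)."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return ["not well-formed XML: %s" % exc]

    problems = []
    if root.tag != "app_info":
        problems.append("unexpected root element <%s>" % root.tag)

    # Names declared by <file_info> and <app> blocks.
    declared_files = {fi.findtext("name", "").strip() for fi in root.findall("file_info")}
    declared_apps = {app.findtext("name", "").strip() for app in root.findall("app")}

    for version in root.findall("app_version"):
        app_name = version.findtext("app_name", "").strip()
        if app_name not in declared_apps:
            problems.append("app_version names undeclared app '%s'" % app_name)
        for ref in version.findall("file_ref"):
            file_name = ref.findtext("file_name", "").strip()
            if file_name not in declared_files:
                problems.append("file_ref '%s' has no matching file_info" % file_name)
    return problems


if __name__ == "__main__":
    sample = (
        "<app_info><app><name>demo</name></app>"
        "<file_info><name>demo.exe</name><executable/></file_info>"
        "<app_version><app_name>demo</app_name>"
        "<file_ref><file_name>demo.exe</file_name><main_program/></file_ref>"
        "</app_version></app_info>"
    )
    print(check_app_info(sample))
```

An empty list means only that the cross-references are consistent. It says nothing about whether the binaries themselves work on a given card or driver.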

Profile jason_gee (Project donor)
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 4955
Credit: 72,933,478
RAC: 12,348
Australia
Message 990062 - Posted: 18 Apr 2010, 8:41:05 UTC
Last modified: 18 Apr 2010, 8:43:52 UTC

Just been through the reams of results. All the successfully completed ones appear to be very high angle range ('Shorties') which leads me to believe something's cranky in the pulsefinding code ( Which has given us heaps of sleepless nights already ... LoL ).

@Raistmer: As a last resort, in case it's needed, I'll repost my V13 experimental one in case avoiding long pulsefinds helps (In our original private Lunatics thread). Doubtful, but I suppose possible, so if you could rehost for Todd if needed, I'll be grateful to eliminate another possibility. I fear physical constraints of Todd's machine might be outweighing 32 bit build capacity, but am not certain exactly how the WoW64/CudaRT32/3 x 1.5Gig cards/4 gig address space, half being system is supposed to work ;)

Reposting back in the original place & going to sleep :P
____________
"It is not the strongest of the species that survives, nor the most intelligent that survives. It is the one that is the most adaptable to change."
Charles Darwin

Profile Todd Hebert
Volunteer tester
Joined: 16 Jun 00
Posts: 647
Credit: 217,127,962
RAC: 0
United States
Message 990063 - Posted: 18 Apr 2010, 8:42:27 UTC - in response to Message 990059.

Sorry - I didn't realize that the formatting wouldn't come through properly - the structure of the file is proper however.
Todd
____________

Profile Gundolf Jahn
Joined: 19 Sep 00
Posts: 3184
Credit: 357,745
RAC: 38
Germany
Message 990071 - Posted: 18 Apr 2010, 9:08:21 UTC - in response to Message 990063.

When using the Quote button on the post with app_info.xml, one can see the structure. To make it visible in the post too, you'll have to use the [pre][/pre] BBCode tags around the code.

Regards,
Gundolf
____________
Computers aren't everything in life. (Just kidding)

SETI@home classic workunits 3,758
SETI@home classic CPU time 66,520 hours

Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 3386
Credit: 46,240,518
RAC: 7,411
Russia
Message 990083 - Posted: 18 Apr 2010, 9:59:08 UTC - in response to Message 990063.
Last modified: 18 Apr 2010, 10:14:31 UTC

Well, an AR 0.44 task, a midrange one that is usually the best case for the GPU, fails with -9 on the 480, and the Lunatics site is down (at least I can't reach it).
Let's try another trick for now, then.
Try to disable all GPUs but one (this can be done via BOINC settings, by physically removing the other GPUs, or by suspending all CUDA MB tasks but one).
Will a single GPU produce the same error on these tasks?

EDIT: also, try updating to the 197.45 driver if it's available for your GPU.

EDIT2:
Rather high AR tasks fail too; for example, the task with
WU true angle range is : 1.373955
hit an invalid overflow.

Copyright © 2014 University of California