s@h on GPU CUDA now?!

Profile Byron S Goodgame
Volunteer tester
Joined: 16 Jan 06
Posts: 1145
Credit: 3,936,993
RAC: 0
United States
Message 850340 - Posted: 7 Jan 2009, 2:16:09 UTC - in response to Message 850330.  
Last modified: 7 Jan 2009, 2:20:22 UTC

can I fix it somehow?

You could try a different driver for the card. That has been known to help others with similar problems. It's kind of a hit and miss with that though. What version are you using now and which ones have you tried before?

Also there's a method in this thread that you can use to keep false results from making it to the server.
ID: 850340 · Report as offensive
Zoran Kirsic

Joined: 22 May 99
Posts: 34
Credit: 102,258
RAC: 0
Croatia
Message 850354 - Posted: 7 Jan 2009, 2:40:50 UTC

I have NVIDIA drivers 178.24, but I'm wondering whether this is a problem with SETI or with me. These are my first results in CUDA processing, so maybe I've done something wrong!?
ID: 850354 · Report as offensive
Profile Byron S Goodgame
Volunteer tester
Joined: 16 Jan 06
Posts: 1145
Credit: 3,936,993
RAC: 0
United States
Message 850358 - Posted: 7 Jan 2009, 2:50:47 UTC - in response to Message 850354.  
Last modified: 7 Jan 2009, 2:52:45 UTC

I have NVIDIA drivers 178.24, but I'm wondering whether this is a problem with SETI or with me. These are my first results in CUDA processing, so maybe I've done something wrong!?

It's probably not because you've done something wrong, but more likely either the app or the driver. You could try doing some more tasks and see if you still keep getting similar errors, though from my own experience I'd try a beta driver like one of the 180.xx and see what results come from that.

Whether you continue using the driver you have or go to a different one, it would be helpful to stop network communication and follow the instructions mentioned in the post of my previous message, since there's a good chance you'll continue getting false results.
ID: 850358 · Report as offensive
Zoran Kirsic

Joined: 22 May 99
Posts: 34
Credit: 102,258
RAC: 0
Croatia
Message 850360 - Posted: 7 Jan 2009, 2:54:01 UTC - in response to Message 850358.  

OK, thanks. I'll try the beta drivers and let you know.
ID: 850360 · Report as offensive
Harper101
Volunteer tester

Joined: 13 Jan 08
Posts: 3
Credit: 11,658,919
RAC: 0
United Kingdom
Message 850782 - Posted: 8 Jan 2009, 9:07:44 UTC - in response to Message 850360.  

Grrr.. they bring out a CUDA version of Seti 3 months after I upgrade my PC.. where I changed my GPU from Nvidia to ATi!!
Anyone know if they are working on a version that I can run on my ATi card? (I know the Folding@home guys have versions available for ATi and Nvidia.. though I realise they have a massive amount more funding than my beloved Seti@home!)
ID: 850782 · Report as offensive
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 850848 - Posted: 8 Jan 2009, 14:27:09 UTC - in response to Message 850782.  

Grrr.. they bring out a CUDA version of Seti 3 months after I upgrade my PC.. where I changed my GPU from Nvidia to ATi!!
Anyone know if they are working on a version that I can run on my ATi card? (I know the Folding@home guys have versions available for ATi and Nvidia.. though I realise they have a massive amount more funding than my beloved Seti@home!)

As far as I know, there are just rumors that ATi has said they are interested in doing something CUDA-like, but the problem is that CUDA is nvidia-specific. What would be smart for everyone to push for is something like OpenCL. This can make it so you program in one unified and consistent language and it works on different "makes and models" of hardware.
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)
ID: 850848 · Report as offensive
Zoran Kirsic

Joined: 22 May 99
Posts: 34
Credit: 102,258
RAC: 0
Croatia
Message 851490 - Posted: 10 Jan 2009, 1:09:01 UTC - in response to Message 850358.  

I have NVIDIA drivers 178.24, but I'm wondering whether this is a problem with SETI or with me. These are my first results in CUDA processing, so maybe I've done something wrong!?

It's probably not because you've done something wrong, but more likely either the app or the driver. You could try doing some more tasks and see if you still keep getting similar errors, though from my own experience I'd try a Beta driver like one of the 180.xx and see what results come from that.

Whether you continue using the driver you have or go to a different one, it would be helpful to stop network communication and follow the instructions mentioned in the post of my previous message, since there's a good chance you'll continue getting false results.



I tried the new driver (180.84), but it still doesn't work. At 31 seconds the screen flickers and the CPU time stops advancing. I had an answer from Raistmer: "There is bug in 6.06 sources so both stock and my builds will have this error overflows time to time. Debugging going.."
I've switched to processing WUs with AK_v8_win_SSE3.exe for now. Thx
ID: 851490 · Report as offensive
Profile perryjay
Volunteer tester
Joined: 20 Aug 02
Posts: 3377
Credit: 20,676,751
RAC: 0
United States
Message 851523 - Posted: 10 Jan 2009, 2:10:25 UTC - in response to Message 851490.  

Have you tried this?
<cc_config>
<options>
<ncpus>5</ncpus>
</options>
</cc_config>


Create it in Notepad, choose "All Files" as the file type, and save it as cc_config.xml in the C:\Documents and Settings\All Users\Application Data\BOINC folder. Then in the BOINC Manager Advanced menu click on Read config file.

I was having the flickering problem and getting the messages you were but this seemed to fix it. I had to change the number 5 to 3 since I only have a C2D. I had left it at 5 at first and was running 5 WUs at a time but slowly. :)
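
For anyone who would rather script this than edit the file by hand, here is a minimal Python sketch of the same change. It only builds or updates the XML text. The function names build_cc_config and set_ncpus are made up for illustration, and you would still need to save the result as cc_config.xml in the BOINC data folder and click Read config file afterwards.

```python
import xml.etree.ElementTree as ET


def build_cc_config(ncpus):
    """Build a cc_config document that contains only the <ncpus> option."""
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    ET.SubElement(options, "ncpus").text = str(ncpus)
    return ET.tostring(root, encoding="unicode")


def set_ncpus(existing_xml, ncpus):
    """Set <ncpus> in an existing cc_config, leaving the other options alone."""
    root = ET.fromstring(existing_xml)
    options = root.find("options")
    if options is None:
        options = ET.SubElement(root, "options")
    node = options.find("ncpus")
    if node is None:
        node = ET.SubElement(options, "ncpus")
    node.text = str(ncpus)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # 3 is the value that worked on the Core 2 Duo mentioned above.
    print(build_cc_config(3))
```

The second function is the safer one if you already have a cc_config.xml with other settings in it, since it changes only the ncpus value.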


PROUD MEMBER OF Team Starfire World BOINC
ID: 851523 · Report as offensive
Zoran Kirsic

Joined: 22 May 99
Posts: 34
Credit: 102,258
RAC: 0
Croatia
Message 851770 - Posted: 10 Jan 2009, 17:04:53 UTC - in response to Message 851523.  

Have you tried this?
<cc_config>
<options>
<ncpus>5</ncpus>
</options>
</cc_config>


Create it in Notepad, choose "All Files" as the file type, and save it as cc_config.xml in the C:\Documents and Settings\All Users\Application Data\BOINC folder. Then in the BOINC Manager Advanced menu click on Read config file.

I was having the flickering problem and getting the messages you were but this seemed to fix it. I had to change the number 5 to 3 since I only have a C2D. I had left it at 5 at first and was running 5 WUs at a time but slowly. :)



My cc_config.xml:

<cc_config>
<log_flags>
</log_flags>
<options>
<client_version_check_url>http://www.worldcommunitygrid.org/download.php?xml=1</client_version_check_url>
<client_download_url>http://www.worldcommunitygrid.org/download.php</client_download_url>
<network_test_url>http://www.ibm.com/</network_test_url>
<start_delay>15</start_delay>
<ncpus>3</ncpus>
</options>
</cc_config>

I tried this change at the beginning (<ncpus>3</ncpus>), but it made no difference!
ID: 851770 · Report as offensive
Profile John

Joined: 5 Jun 99
Posts: 30
Credit: 77,663,734
RAC: 236
United States
Message 851777 - Posted: 10 Jan 2009, 17:23:08 UTC

I have changed my preferences to not receive any more CUDA units, as there seem to be stability problems on two of my NVIDIA-supported computers. If there is any interruption of DSL service, the computer will dramatically cease to respond until some sort of timeout; then I get one chance to abort before waiting for another timeout. If I restart without rebooting, there are massive artifacts on the display that flash on and off like Christmas lights. On the other computer I could play games and run BOINC at the same time before CUDA; now the artifact thing seems to work itself in at times and, once started, does not stop until the computer is rebooted. This computer is for games and BOINC is secondary, so I removed the CUDA option under preferences. Will try again later, perhaps after the bugs are worked out. Later. John
ID: 851777 · Report as offensive
Zoran Kirsic

Joined: 22 May 99
Posts: 34
Credit: 102,258
RAC: 0
Croatia
Message 853429 - Posted: 14 Jan 2009, 18:33:44 UTC

What about the new 181.20 drivers? They came out 5 days ago. Any difference or what!??
ID: 853429 · Report as offensive
Profile skildude
Joined: 4 Oct 00
Posts: 9541
Credit: 50,759,529
RAC: 60
Yemen
Message 853447 - Posted: 14 Jan 2009, 19:31:21 UTC - in response to Message 853429.  

or what... Most of the posts on CUDA are finally finding their way to the Q & A section, but it's still very buggy and requires a lot of TLC to take care of freeze-ups, lock-ups and such. Clearly CUDA isn't for the inexperienced user.


In a rich man's house there is no place to spit but his face.
Diogenes Of Sinope
ID: 853447 · Report as offensive
Tronic

Joined: 23 Mar 03
Posts: 8
Credit: 10,599,675
RAC: 0
Chile
Message 864263 - Posted: 11 Feb 2009, 6:18:40 UTC - in response to Message 851777.  

(...) On the other computer I could play games and run BOINC at the same time before CUDA; now the artifact thing seems to work itself in at times and, once started, does not stop until the computer is rebooted. This computer is for games and BOINC is secondary, so I removed the CUDA option under preferences. Will try again later, perhaps after the bugs are worked out. Later. John


Remember that by using CUDA, your GPU temperature will increase dramatically. I have two 8800GTS 640MB with watercooling in series (there is a single high-performance radiator between the CPU and GPUs and a triple radiator between GPUs and CPU). The first card jumped from 49°C to 58°C, while the second one jumped from 49°C to 62°C. This increase is more than when playing Oblivion.
(By the way, my E8400 CPU cores jumped from 44°C to 52°C with only one Astropulse process.)
Remember this is with watercooling. Air-cooled video cards will have a much bigger increase in temperature, not to mention passively cooled video cards.
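
To put numbers on those jumps, here is a small Python sketch that works out the rise for each reading quoted above. The labels in the dictionary and the helper name temperature_rise are only illustrative.

```python
# Idle and loaded temperatures in degrees C, as quoted in this post.
readings = {
    "8800GTS card 1": (49, 58),
    "8800GTS card 2": (49, 62),
    "E8400 CPU": (44, 52),
}


def temperature_rise(idle_c, load_c):
    """Return how many degrees C the part warmed up under load."""
    return load_c - idle_c


rises = {name: temperature_rise(idle, load) for name, (idle, load) in readings.items()}

if __name__ == "__main__":
    for name, rise in rises.items():
        print(f"{name}: +{rise} C")
```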
ID: 864263 · Report as offensive
Profile ML1
Volunteer moderator
Volunteer tester

Joined: 25 Nov 01
Posts: 20147
Credit: 7,508,002
RAC: 20
United Kingdom
Message 864402 - Posted: 11 Feb 2009, 19:52:51 UTC - in response to Message 864263.  

Remember that by using CUDA, your GPU temperature will increase dramatically. ...

That all depends on your graphics card and how good your cooling is...

I'm seeing a mere 10 deg C increase which is still well below the limits set in the nVidia software.

More of a problem is a slowdown for other graphics when Boinc-CUDA is running. We really do need a one-button suspend all Boinc CUDA for when trying to do other graphics work!

Happy crunchin',
Martin


See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
ID: 864402 · Report as offensive
angler

Joined: 19 Oct 00
Posts: 33
Credit: 880,214
RAC: 0
United States
Message 865472 - Posted: 14 Feb 2009, 20:49:47 UTC

been pretty stable but my first CUDA glitch

http://setiathome.berkeley.edu/result.php?resultid=1159831901

<core_client_version>6.4.5</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
setiathome_CUDA: Found 1 CUDA device(s):
Device 1 : GeForce 9600 GT
totalGlobalMem = 536870912
sharedMemPerBlock = 16384
regsPerBlock = 8192
warpSize = 32
memPitch = 262144
maxThreadsPerBlock = 512
clockRate = 1625000
totalConstMem = 65536
major = 1
minor = 1
textureAlignment = 256
deviceOverlap = 0
multiProcessorCount = 8
setiathome_CUDA: CUDA Device 1 specified, checking...
Device 1: GeForce 9600 GT is okay
SETI@home using CUDA accelerated device GeForce 9600 GT
setiathome_enhanced 6.03 Visual Studio/Microsoft C++
libboinc: 6.3.22

Work Unit Info:
...............
WU true angle range is : 0.447866
Optimal function choices:
-----------------------------------------------------
name
-----------------------------------------------------
v_BaseLineSmooth (no other)
v_GetPowerSpectrum 0.00024 0.00000
v_ChirpData 0.01821 0.00000
v_Transpose4 0.00930 0.00000
FPU opt folding 0.00460 0.00000
Cuda error 'cufftExecC2C' in file 'c:/sw/gpgpu/seti/seti_boinc/client/cuda/cudaAcc_fft.cu' in line 63 : unknown error.
Cuda error 'cudaAcc_GetPowerSpectrum_kernel' in file 'c:/sw/gpgpu/seti/seti_boinc/client/cuda/cudaAcc_PowerSpectrum.cu' in line 56 : unknown error.
Cuda error 'cudaAcc_GetPowerSpectrum_kernel' in file 'c:/sw/gpgpu/seti/seti_boinc/client/cuda/cudaAcc_PowerSpectrum.cu' in line 56 : unknown error.
Cuda error 'cudaAcc_summax32_kernel' in file 'c:/sw/gpgpu/seti/seti_boinc/client/cuda/cudaAcc_summax.cu' in line 147 : unknown error.
Cuda error 'cudaAcc_summax32_kernel' in file 'c:/sw/gpgpu/seti/seti_boinc/client/cuda/cudaAcc_summax.cu' in line 147 : unknown error.
Cuda error 'cudaMemcpy(PowerSpectrumSumMax, dev_PowerSpectrumSumMax, cudaAcc_NumDataPoints / fftlen * sizeof(*dev_PowerSpectrumSumMax), cudaMemcpyDeviceToHost)' in file 'c:/sw/gpgpu/seti/seti_boinc/client/cuda/cudaAcc_summax.cu' in line 160 : unknown error.

</stderr_txt>
]]>

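When a result page carries a long stderr dump like the one above, the first "Cuda error" line is usually the one worth quoting, since the later ones tend to be fallout from it. Below is a minimal Python sketch that pulls those lines out of pasted stderr text. The regular expression is written only against the line format visible in this log, and the names parse_cuda_errors and first_failure are hypothetical helpers, not part of BOINC or the SETI@home app.

```python
import re

# Matches lines of the form seen in the log above, for example:
# Cuda error 'cufftExecC2C' in file '.../cudaAcc_fft.cu' in line 63 : unknown error.
CUDA_ERROR = re.compile(
    r"^Cuda error '(?P<call>.+)' in file '(?P<file>[^']+)' "
    r"in line (?P<line>\d+) : (?P<msg>.+)$"
)


def parse_cuda_errors(stderr_text):
    """Return (call, source file name, line number, message) per CUDA error line."""
    errors = []
    for raw in stderr_text.splitlines():
        match = CUDA_ERROR.match(raw.strip())
        if match:
            # Keep only the file name, not the build machine's full path.
            source = match.group("file").rsplit("/", 1)[-1]
            errors.append(
                (match.group("call"), source, int(match.group("line")), match.group("msg"))
            )
    return errors


def first_failure(stderr_text):
    """Return the earliest CUDA error in the text, or None if there is none."""
    errors = parse_cuda_errors(stderr_text)
    return errors[0] if errors else None
```

In the log above, the earliest match is the cufftExecC2C call in cudaAcc_fft.cu at line 63.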
ID: 865472 · Report as offensive
Zydor

Joined: 4 Oct 03
Posts: 172
Credit: 491,111
RAC: 0
United Kingdom
Message 865536 - Posted: 14 Feb 2009, 23:10:34 UTC - in response to Message 865472.  

An 8800GTS will idle at 45C and go to 70C under full load in its stock state, so the temps given are way below any kind of hassle, which they should be if water cooled.

I run a 9800GTX 7x24 on one of my machines under full load, it happily purrs away at around 62C no water, well below hassle point. If I go to "normal" use not under full load it drops to around 55C.

Such rises of circa 10C are normal when a GPU is placed under full load. It's not "because" of BOINC or CUDA; any demanding application or game will have the same result. What can happen is that perceptions of the max allowable temp in a GPU under load are usually grossly understated - they are resilient beasts and designed for high temperatures.

With friends' PCs I have looked at that are behaving "weirdly", my first reaction is to put my hand on the exit fan grill, and if it's warmer than a tepid cup of coffee, the cover comes off with the air can at the ready. 7 times out of 10 there's more dust inside than in a municipal dust cart; some have so much caked on the CPU fan that it has virtually stopped because it can't get through the piled-up clag! In those situations a 10C rise can be a killer if they start using it at full load, but that's not the fault of the software or hardware, it's the user not maintaining the machine. Too few realise that dust actually accumulates inside them (and quickly!), let alone go to the trouble of opening it up once every couple of months, cleaning the inside case fan filters, and using an air can on the CPU and PSU etc.

Re drivers, I am using 181.22 (with 6.5.0), and have been since I returned to Seti crunching 10 days ago after an extended absence - I drifted away on the changeover to BOINC - CUDA tempted me back, meant I could use the spare capacity in the card when not gaming. Not noticed any undue hassles apart from the work/fetch routine which is definitely not right yet. I also use the "set for 10 days" work-around to get hold of CUDA WUs when I am short. I had one hassle that resulted in losing some WUs, and using the abort button, but that was my fault, not Seti's.

Frankly, I reckon the guys do a great job keeping the ship running. Compared to 5 or 6 years ago they are serving and shovelling out a staggering volume of work on the same shoestring budget they have always had, given the results they are expected to achieve. Gets my vote .....
ID: 865536 · Report as offensive
Tronic

Joined: 23 Mar 03
Posts: 8
Credit: 10,599,675
RAC: 0
Chile
Message 865758 - Posted: 15 Feb 2009, 14:59:37 UTC
Last modified: 15 Feb 2009, 15:00:14 UTC

My point was intended more for overclockers and passively cooled graphics cards. GPUs are designed to operate at higher temps than CPUs. My two 8800GTS are factory overclocked (by 10% and 13%) and I've been able to overclock them 20% with watercooling.

I've seen people in game forums argue that a game has problems because they cannot overclock their GPU when playing that specific game, while they have no problem in other games. I specifically remember complaints in "The Elder Scrolls: Oblivion" forums. It turns out that the problem is not the game's stability, but that the game is better compiled to use the full GPU potential, making it rise to higher temps than other games.
So if video artifacts or instability start to appear when using CUDA, I'd point to a GPU cooling problem rather than a CUDA or SETI problem.

On another matter, I'm having some CUDA problems which seem to arise when accessing the PC through windows XP "remote desktop". nVidia control panel (181.22) doesn't show any of my 2 8800GTS, so boinc doesn't see them either and CUDA reverts back to CPU processing, taking up both my CPU cores, but not pausing the astropulse workunit which stays as an active process but is hardly assigned any CPU time.

Problem seems to be resolved by exiting boinc (with stop tasks option) and restarting it. If the session was (re)started through a remote desktop, the only solution seems to be restarting the session. But in both cases this has to be done on the PC itself and not through the remote desktop. I suppose it's a problem with the nvidia driver or windows XP itself and has nothing to do with boinc.

Maybe this workunit shows the problem, which includes a manual stop and restart done by me after a remote desktop access done a couple of hours earlier:
http://setiathome.berkeley.edu/result.php?resultid=1157030215

I also had to disable SLI for boinc to recognize both GPUs and run 2 CUDA tasks simultaneously. I know that enabling SLI uses only one of the GPUs (work is not distributed between the two) because only one of the GPUs shows a temperature rise.
ID: 865758 · Report as offensive
Profile perryjay
Volunteer tester
Joined: 20 Aug 02
Posts: 3377
Credit: 20,676,751
RAC: 0
United States
Message 865767 - Posted: 15 Feb 2009, 15:37:28 UTC - in response to Message 865758.  

The remote desktop uses its own drivers, circumventing the NVIDIA drivers. That's what the problem is there. And you are right about having to disable SLI to get both GPUs crunching.


PROUD MEMBER OF Team Starfire World BOINC
ID: 865767 · Report as offensive
Profile Geek@Play
Volunteer tester
Joined: 31 Jul 01
Posts: 2467
Credit: 86,146,931
RAC: 0
United States
Message 865769 - Posted: 15 Feb 2009, 15:49:17 UTC

On another matter, I'm having some CUDA problems which seem to arise when accessing the PC through windows XP "remote desktop". nVidia control panel (181.22) doesn't show any of my 2 8800GTS, so boinc doesn't see them either and CUDA reverts back to CPU processing, taking up both my CPU cores, but not pausing the astropulse workunit which stays as an active process but is hardly assigned any CPU time.

Problem seems to be resolved by exiting boinc (with stop tasks option) and restarting it. If the session was (re)started through a remote desktop, the only solution seems to be restarting the session. But in both cases this has to be done on the PC itself and not through the remote desktop. I suppose it's a problem with the nvidia driver or windows XP itself and has nothing to do with boinc.


perryjay is correct........

I had the same problems, and after a lot of messing around I found an MS page that said MS remote desktop loads its own video driver, ignoring the installed driver. I then disabled MS remote desktop and installed VNC Viewer. No problems since doing this.


Boinc....Boinc....Boinc....Boinc....
ID: 865769 · Report as offensive
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.