Someone, need some cuda help here

SlyWolf
Volunteer tester
Joined: 9 Jan 03
Posts: 17
Credit: 1,660,449
RAC: 0
United States
Message 844141 - Posted: 23 Dec 2008, 12:38:29 UTC

Okay, a few questions...
With CUDA active, it only allows 1 WU to run at a time. I have a dual-core CPU and dual video cards running in SLI. I've also tried it with SLI disabled, and no change.
Second, the timer is really, really wrong. My first WU with CUDA said it took around 4 minutes to complete, when in reality it was more like 1 to 1 1/2 hours. It's doing the same thing with the next WU also.
Third, it seemed so much faster without CUDA running. The dual cores could kick out 2 WUs in about 1 to 2 hours. Now that it's only letting me run one at a time, and taking longer for that one, I don't get the hype.
Can someone help me understand this? Is there a setting, a bug, a glitch, anything?
~SlyWolf
ID: 844141
Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 844147 - Posted: 23 Dec 2008, 12:53:28 UTC - in response to Message 844141.  

We have a whole CUDA forum, but you prefer the Windows forum... figures.

With CUDA active, it only allows 1 WU to run at a time. I have a dual-core CPU and dual video cards running in SLI. I've also tried it with SLI disabled, and no change.

How many cards are detected by BOINC when you disable or enable SLI?
Did you restart BOINC after disabling SLI?

For now SETI can only be crunched either on the GPU (with small use of one CPU) or on your CPUs. It can only do both at the same time if you allow Astropulse and SETI Enhanced work. AP will then run on the CPUs, SE on the GPU. It can't do SE on the CPU and on the GPU at the same time.

Then again, there are problems getting the work out at a fast enough pace, so that may account for why you're without work at times.

Second, the timer is really, really wrong. My first WU with CUDA said it took around 4 minutes to complete, when in reality it was more like 1 to 1 1/2 hours. It's doing the same thing with the next WU also.

That's CPU time that you see there, not the GPU time. BOINC can't count GPU time (yet). With each GPU task about 4% of the CPU is used, so the time it shows as running is flawed.
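
As a rough sanity check on the numbers in this thread, here is a small sketch. It assumes the displayed time is simply the wall-clock time multiplied by the roughly 4% CPU share mentioned above; the function name is made up for illustration and is not part of BOINC. Under that assumption, "around 4 minutes" shown for a task that really ran 1 to 1 1/2 hours is about what you would expect.

```python
def cpu_time_shown(wall_minutes: float, cpu_fraction: float) -> float:
    """Estimate the minutes a CPU-only timer would report.

    Assumption: the timer counts CPU time only, so it sees
    wall-clock time scaled by the CPU share the GPU task uses.
    """
    return wall_minutes * cpu_fraction


# "about 4% of the CPU" per GPU task, as stated in the reply above
CPU_FRACTION = 0.04

# Wall-clock range reported in the first post: 1 to 1 1/2 hours
for wall in (60, 90):
    shown = cpu_time_shown(wall, CPU_FRACTION)
    print(f"{wall} min wall clock -> about {shown:.1f} min shown")
```

So the displayed figure is not a broken timer as such; it is just measuring the small CPU share of a task that mostly runs on the GPU.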

Third, it seemed so much faster without CUDA running. The dual cores could kick out 2 WUs in about 1 to 2 hours.

That probably depended on the tasks. SETI has ultra-short to ultra-long tasks, with a lot of possibilities in between. Wait until the other computer returns the tasks to be able to compare.
ID: 844147
SlyWolf
Volunteer tester
Joined: 9 Jan 03
Posts: 17
Credit: 1,660,449
RAC: 0
United States
Message 844153 - Posted: 23 Dec 2008, 13:03:38 UTC - in response to Message 844147.  
Last modified: 23 Dec 2008, 13:03:57 UTC

First of all, soorrryy for posting on the wrong forum. I've only been up for 27 hours taking care of a family member who had surgery.

It only detects 1 card. Doesn't matter if SLI is enabled or disabled, and yes, a reboot is required. Shows 1 CUDA device found.

I'm also judging the time by the percentage shown, and no, not faster. Could be the batch of WUs, this is possible.

Okay, good to know the timer is somewhat flawed.
~SlyWolf
ID: 844153
Robert P. Herbst
Volunteer tester
Joined: 10 Jun 03
Posts: 45
Credit: 64,523,408
RAC: 142
United States
Message 844154 - Posted: 23 Dec 2008, 13:04:12 UTC - in response to Message 844141.  

If you start getting a message that your NVLDDMKM driver has quit and recovered and you are using an EVGA GT graphics card, open your computer and check the graphics card for ruptured capacitors. I run six multi-core computers and I have had to shut down BOINC on three of them that had the 8 and 9 series GeForce GT cards after switching over to the new software.
Other than that, I have also noticed that the WU is posted as saying it will take several thousand hours to complete, but usually takes less than 100 hours, according to the computer.
Please Visit Mount Perry, Florida
Home to Florida's Only Snow Capped Mountain
www.mountperryfl.com
ID: 844154
Robert P. Herbst
Volunteer tester
Joined: 10 Jun 03
Posts: 45
Credit: 64,523,408
RAC: 142
United States
Message 844155 - Posted: 23 Dec 2008, 13:09:51 UTC - in response to Message 844147.  

BOINC/SETI have stopped sending out any new work, I suspect because of this problem, and I have had to remove BOINC from three of my six multi-core computers, one of which had no SLI at all. You need to check your graphics cards for blown or ruptured capacitors, as once the graphics begin to fail, the card degenerates to complete failure rapidly.
ID: 844155
SlyWolf
Volunteer tester
Joined: 9 Jan 03
Posts: 17
Credit: 1,660,449
RAC: 0
United States
Message 844157 - Posted: 23 Dec 2008, 13:10:31 UTC - in response to Message 844154.  

If you start getting a message that your NVLDDMKM driver has quit and recovered and you are using an EVGA GT graphics card, open your computer and check the graphics card for ruptured capacitors. I run six multi-core computers and I have had to shut down BOINC on three of them that had the 8 and 9 series GeForce GT cards after switching over to the new software.
Other than that, I have also noticed that the WU is posted as saying it will take several thousand hours to complete, but usually takes less than 100 hours, according to the computer.

Good god, I hope not, $3,000 laptop. It has Nvidia GF8700m Gt's x 2. Hear of any problems with those? Otherwise the CUDA driver's getting the boot real fast.

~SlyWolf
ID: 844157
SlyWolf
Volunteer tester
Joined: 9 Jan 03
Posts: 17
Credit: 1,660,449
RAC: 0
United States
Message 844158 - Posted: 23 Dec 2008, 13:16:27 UTC - in response to Message 844155.  

And yeah, I did have that error with the new driver about an hour ago. I don't THINK I was running BOINC....
~SlyWolf
ID: 844158
Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 844161 - Posted: 23 Dec 2008, 13:29:38 UTC - in response to Message 844157.  

It has Nvidia GF8700m Gt's x 2.

Ah, I was already about to ask if they were physical cards or embedded GPUs. That answered that, they're embedded.

When in SLI, does the OS report two cards? Do games? Does GPU-Z?

I see you're using Vista. When you installed BOINC, you did make sure it was not installed as a service (Protected Application Execution was off)?
ID: 844161
SlyWolf
Volunteer tester
Joined: 9 Jan 03
Posts: 17
Credit: 1,660,449
RAC: 0
United States
Message 844164 - Posted: 23 Dec 2008, 13:33:59 UTC - in response to Message 844161.  

It has Nvidia GF8700m Gt's x 2.

Ah, I was already about to ask if they were physical cards or embedded GPUs. That answered that, they're embedded.

When in SLI, does the OS report two cards? Do games? Does GPU-Z?

I see you're using Vista. When you installed BOINC, you did make sure it was not installed as a service (Protected Application Execution was off)?

They're supposed to be standalone cards. All the programs read as SLI.
I made sure it wasn't installed as a service. Had to reload it the other day. Now I'm freaking out and trying to very, very carefully open this thing up to look at the cards... not sure the warranty will cover this event or not.
~SlyWolf
ID: 844164
SlyWolf
Volunteer tester
Joined: 9 Jan 03
Posts: 17
Credit: 1,660,449
RAC: 0
United States
Message 844173 - Posted: 23 Dec 2008, 13:43:39 UTC

Okay, got it open. If anyone could see any capacitors on these cards the way they're installed, I would be amazed. Can't see anything with this config, except a lot more dust on the dual fans than I thought there would be after only 4 months.
~SlyWolf
ID: 844173
Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 844174 - Posted: 23 Dec 2008, 13:44:26 UTC - in response to Message 844164.  

not sure the warranty will cover this event or not.

Then do not open it up, but leave it to someone qualified. The screws are usually put in place with some goo that gets rubbed off when opened. A qualified person can then see that you tampered with it.

The error you got may just as well have been a driver side effect, as more people have reported it, and I doubt there's a whole influx of bad capacitors out there all of a sudden. This happened in the past, but since that time Taiwan has checks in place. It's possible the EVGA cards Robert talks about just have a bad batch of them. No need to panic.

I have reported that error to the developers already.
ID: 844174
SlyWolf
Volunteer tester
Joined: 9 Jan 03
Posts: 17
Credit: 1,660,449
RAC: 0
United States
Message 844179 - Posted: 23 Dec 2008, 13:50:03 UTC - in response to Message 844174.  

Thank god. I don't have to worry about opening it voiding the warranty though; to change memory or swap drives you have to take the whole underside of the computer off anyway, and instructions are in the book for it. Vague, but in there, so no worries there.
Just to be on the safe side though, I've removed the driver and gone back to the original one.
Nice to know what the thing has under the hood though lol.
Thanks though Jord, hopefully you're right and it's just the driver doing it. It did it once and hasn't since. I haven't noticed any games going crazy, so I think I'm clear.
~SlyWolf
ID: 844179
SlyWolf
Volunteer tester
Joined: 9 Jan 03
Posts: 17
Credit: 1,660,449
RAC: 0
United States
Message 844180 - Posted: 23 Dec 2008, 13:52:32 UTC

I build my own computers normally, but the laptop was a special treat. After the floods here in Iowa, my family and I lost most everything we owned, so I figured I'd spoil myself and buy a computer instead of building another one.
~SlyWolf
ID: 844180
Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 844181 - Posted: 23 Dec 2008, 13:54:44 UTC - in response to Message 844180.  

Sorry to hear. But at least a laptop is easier to take along when running for a flood... not that it's as important as photos and papers... or kids. :-)
ID: 844181
SlyWolf
Volunteer tester
Joined: 9 Jan 03
Posts: 17
Credit: 1,660,449
RAC: 0
United States
Message 844186 - Posted: 23 Dec 2008, 14:00:28 UTC - in response to Message 844181.  

Sorry to hear. But at least a laptop is easier to take along when running for a flood... not that it's as important as photos and papers... or kids. :-)


Very Very True :) and the 5 dogs, 4 cats, 1 rabbit, 4 turtles :)

Luckily the very heavy desktop was on the second floor. It wouldn't have floated anywhere, but I doubt that kind of water would have been acceptable for a watercooler :)
~SlyWolf
ID: 844186
Robert P. Herbst
Volunteer tester
Joined: 10 Jun 03
Posts: 45
Credit: 64,523,408
RAC: 142
United States
Message 844192 - Posted: 23 Dec 2008, 14:10:26 UTC - in response to Message 844174.  

The blown capacitors are not a new problem. I've already had to replace four of these graphics cards because of them rupturing over the past year. Now I sit here looking at another card with blown capacitors. If they have already replaced four cards for this reason, doesn't it make sense they would know there is a problem?
The series of events goes as follows. First the graphics start doing flaky things like flickering on and off while you are working in a program, then you start getting the message, "The NVLDDMKM driver has shut down and restarted." A short time later the video degenerates to a multicolored snowstorm, but if you reboot it goes away for a time. Then you get no video at all, just a black screen. The card is dead.
The combination of BOINC version 5.6.0 for VISTA 64 bit and the 180.48_VISTA 64 GeForce update seems to bring the whole thing to a head, and the graphics card fails rapidly.
ID: 844192
SlyWolf
Volunteer tester
Joined: 9 Jan 03
Posts: 17
Credit: 1,660,449
RAC: 0
United States
Message 844198 - Posted: 23 Dec 2008, 14:23:27 UTC - in response to Message 844192.  

Well, the driver's now gone, and my NVIDIA control panel is on the fritz, so I'm attempting a system restore to see if that fixes the problem. I still haven't seen the error message pop back up.
~SlyWolf
ID: 844198
www.WowTattoos.com (≈ #275/1.1mil Users In Average Credit)
Joined: 16 Nov 00
Posts: 2
Credit: 16,811,496
RAC: 0
United States
Message 844958 - Posted: 25 Dec 2008, 11:06:18 UTC

Alright, well, in light of all the CUDA problems a LOT of people are having, I'm ditching CUDA for now and going back to straight CPU work units. I'm getting the same nvlddmkm failure errors on about 1 of every 3-4 WUs. And I'm quite frankly a little worried about the continued stress of running 100% workload on my 8800 GTX for hours on end. The fan is running like a leaf blower for hours at a time. I don't think the extra work units a day are worth the continued stress on my GPU. Let me know your thoughts.
ID: 844958