@Pre-FERMI nVidia GPU users: Important warning

Message boards : Number crunching : @Pre-FERMI nVidia GPU users: Important warning

Jacob Klein
Volunteer tester
Joined: 15 Apr 11
Posts: 149
Credit: 9,783,406
RAC: 9
United States
Message 1597217 - Posted: 5 Nov 2014, 15:44:48 UTC

Okay. That helps a little. Is there any chance you can isolate an exact operation that is returning an incorrect result?
ID: 1597217
styxdogg
Joined: 7 Mar 01
Posts: 3
Credit: 1,540,836
RAC: 0
United States
Message 1597238 - Posted: 5 Nov 2014, 16:59:00 UTC - in response to Message 1572552.  

I am really only running lately on a school machine I won't be upgrading. I got a message from someone as well, so I reverted to 337 on that machine. I am still getting invalid results on some apps, according to the records I looked up.

opencl_nvidia_cc1 comes up ok:

3816687402 1613138364 6773137 4 Nov 2014, 15:01:46 UTC 5 Nov 2014, 7:05:34 UTC Completed and validated 11,223.64 10,908.10 585.17 AstroPulse v7 v7.05 (opencl_nvidia_cc1)

but cuda23 is still being pulled and fails:

3817796369 1632271283 6773137 5 Nov 2014, 14:42:50 UTC 5 Nov 2014, 15:06:52 UTC Error while computing 0.00 0.00 --- SETI@home v7 v7.00 (cuda23)

I went to my "Separate preferences for school" and set:
SETI@home v7: no

Is this the right thing to do? Any other suggestions? Thanks.
ID: 1597238
Jacob Klein
Volunteer tester
Joined: 15 Apr 11
Posts: 149
Credit: 9,783,406
RAC: 9
United States
Message 1597264 - Posted: 5 Nov 2014, 17:54:17 UTC
Last modified: 5 Nov 2014, 17:54:49 UTC

Raistmer,

Can you please relay that those 4 SDK examples are failing, to the NVIDIA team? You could copy/paste what I wrote (so they will have my GPU make, driver version info, OS version, etc.)

Additionally, I am considering logging my own request with them, to get it fixed. I have a high suspicion that whatever is causing those SDK examples to fail, is also responsible for making your application fail.
ID: 1597264
Richard Haselgrove
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1597278 - Posted: 5 Nov 2014, 18:24:12 UTC - in response to Message 1597264.  

Raistmer,

Can you please relay that those 4 SDK examples are failing, to the NVIDIA team? You could copy/paste what I wrote (so they will have my GPU make, driver version info, OS version, etc.)

Additionally, I am considering logging my own request with them, to get it fixed. I have a high suspicion that whatever is causing those SDK examples to fail, is also responsible for making your application fail.

I've posted the first three samples in the development area at Lunatics, from a confirmation run on my 9800GT with drivers 337.88 (test apps ran successfully) and 340.52 (tests failed). I'd completed that run and dismounted the card again before I saw that you'd found a fourth example, so I'll complete the set later when the test machine is free again.

I think that the best approach - as I've also written at Lunatics - would be to concentrate for the time being on convincing the ODE driver team that a fix is necessary: these test failures with NVidia's own demonstration suite are a very potent weapon in that battle. Any overlap we can identify between the functions implicated in the test suite failures, and the failure of Raistmer's Astropulse application, will be of most use in phase two, when we pressure NVidia to replicate the ODE fixes into consumer drivers.

BTW, Raistmer works during the week (in the sense of earning a living), and I don't think he's seen my report yet.
ID: 1597278
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1597285 - Posted: 5 Nov 2014, 18:57:56 UTC - in response to Message 1597217.  
Last modified: 5 Nov 2014, 19:15:06 UTC

Okay. That helps a little. Is there any chance you can isolate an exact operation that is returning an incorrect result?

Not a priority for now.

Guys, it seems you don't understand how nVidia treats
1) OpenCL
2) old pre-FERMI cards.

EDIT: try reading these forums a little: https://devtalk.nvidia.com/

As a first goal, try to find an OpenCL subforum there...
ID: 1597285
Josef W. Segur
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1597301 - Posted: 5 Nov 2014, 19:43:16 UTC - in response to Message 1597172.  

Thomas Arnold wrote:
Hello, I need your insight and help.
I am using this Video card, NVIDIA GeForce GTX 260 (896MB) driver: 311.06 OpenCL: 1.0

In the past we have never had a problem but now we are receiving
Computation error running seti@home v7 7.00 (cuda22)

We are not too familiar with much of the program but we support the efforts to run the data sets. Can you please tell me if we need to change something with our setup or will these errors clear themselves or just continue to build up in the task tab?

The driver is old enough so it doesn't have the issue which started this thread. I don't know why all SETI@home v7 7.00 windows_intelx86 (cuda22) and (cuda23) tasks are failing on your host 6648399, but it does very well on (cuda32), (cuda42), and (cuda50). Perhaps one of the CUDA experts here can figure out why the servers aren't sending tasks for the plan classes which work well.
Joe
ID: 1597301
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1597374 - Posted: 5 Nov 2014, 21:38:15 UTC - in response to Message 1597301.  
Last modified: 5 Nov 2014, 21:40:56 UTC

Thomas Arnold wrote:
Hello, I need your insight and help.
I am using this Video card, NVIDIA GeForce GTX 260 (896MB) driver: 311.06 OpenCL: 1.0

In the past we have never had a problem but now we are receiving
Computation error running seti@home v7 7.00 (cuda22)

We are not too familiar with much of the program but we support the efforts to run the data sets. Can you please tell me if we need to change something with our setup or will these errors clear themselves or just continue to build up in the task tab?

The driver is old enough so it doesn't have the issue which started this thread. I don't know why all SETI@home v7 7.00 windows_intelx86 (cuda22) and (cuda23) tasks are failing on your host 6648399, but it does very well on (cuda32), (cuda42), and (cuda50). Perhaps one of the CUDA experts here can figure out why the servers aren't sending tasks for the plan classes which work well.
Joe


Will likely be digging out the scheduler code again on the weekend, if someone doesn't beat me to it. No accumulated data for the app versions, plus a logic hole with respect to systematically issuing to all app versions, ignoring the error count & quota, seems to be along the lines of what's happening. [I'll need to start by looking if that server code's been changed since a couple of months ago]

For the host side, FWIW the application (2.2 & 2.3 planclasses) appears to not even be making it to device initialisation. That would seem to me the DLLs are somehow damaged, or the driver install has gone awry. I'd imagine a clean driver install (of a suitable known good older version for this GPU) and a project reset may be in order.
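
To make the suspected logic hole concrete, here is a rough Python sketch of the kind of filter that appears to be missing. Everything in it is hypothetical: `AppVersion`, `eligible_versions`, the `max_errors` threshold and all the counts are made up for illustration, and it is not the actual BOINC scheduler code.

```python
from dataclasses import dataclass


@dataclass
class AppVersion:
    plan_class: str
    consecutive_errors: int  # hypothetical per-host error counter
    completed: int           # hypothetical count of validated results


def eligible_versions(versions, max_errors=3):
    """Return the app versions a scheduler could still send to this host."""
    # Drop versions that keep failing on this host.
    healthy = [v for v in versions if v.consecutive_errors < max_errors]
    # Prefer versions with a track record; fall back to untested ones.
    proven = [v for v in healthy if v.completed > 0]
    return proven or healthy


# Illustrative numbers only, loosely modelled on the host discussed above.
host = [
    AppVersion("cuda22", consecutive_errors=12, completed=0),
    AppVersion("cuda23", consecutive_errors=9, completed=0),
    AppVersion("cuda32", consecutive_errors=0, completed=40),
    AppVersion("cuda42", consecutive_errors=0, completed=55),
    AppVersion("cuda50", consecutive_errors=0, completed=61),
]
print([v.plan_class for v in eligible_versions(host)])
```

With a filter like this, the failing cuda22 and cuda23 plan classes would stop receiving work once their error count passed the threshold, instead of being issued systematically.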
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1597374
Claggy
Volunteer tester
Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1597376 - Posted: 5 Nov 2014, 21:41:56 UTC - in response to Message 1597374.  
Last modified: 5 Nov 2014, 21:44:36 UTC

For the host side, FWIW the application (2.2 & 2.3 planclasses) appears to not even be making it to device initialisation. That would seem to me the DLLs are somehow damaged, or the driver install has gone awry. I'd imagine a clean driver install (of a suitable known good older version for this GPU) and a project reset may be in order.

I was thinking a project detach/reattach might be in order. A project reset doesn't clear out files that aren't mentioned in client_state.xml, so the setienhanced cuda22 and cuda23 DLLs may still be hanging around. (Eric renamed them for the SETI v7 release.)

Claggy
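
If someone wants to look for such leftovers before detaching, something along these lines might do. It is only a sketch: it assumes that any file whose name does not appear anywhere in the text of client_state.xml is a candidate leftover, and the function name and both path arguments are placeholders, not anything BOINC itself provides.

```python
from pathlib import Path


def unreferenced_files(project_dir, client_state_path):
    """List files in project_dir whose names never appear in client_state.xml."""
    # Read the state file as plain text so no XML layout has to be assumed.
    state_text = Path(client_state_path).read_text(errors="replace")
    return sorted(
        p.name
        for p in Path(project_dir).iterdir()
        if p.is_file() and p.name not in state_text
    )
```

Calling `unreferenced_files(<project folder>, <path to client_state.xml>)` returns a sorted list of file names to inspect by hand. A plain substring match can miss a leftover whose name happens to be part of another referenced name, so treat the result as a hint rather than a delete list.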
ID: 1597376
Jacob Klein
Volunteer tester
Joined: 15 Apr 11
Posts: 149
Credit: 9,783,406
RAC: 9
United States
Message 1597385 - Posted: 5 Nov 2014, 22:00:39 UTC
Last modified: 5 Nov 2014, 22:03:00 UTC

For what it's worth, I have filed an official bug report for it. The full details are below.

Basically, I can reproduce the SDK example problems and the Astropulse problem, on any R340 driver for my GPU. And when I use R337, it works fine. Hopefully they fix it so that the OpenCL SDK examples work correctly, and then hopefully that makes your app work correctly too :)

Regards,
Jacob

==================================================================
Filed Bug #1574543
https://developer.nvidia.com/nvbugs/cuda/edit/1574543

Summary:
R340 drivers cause OpenCL data errors on pre-Fermi GPUs (see OpenCL SDK Code Samples)

Relevant Area:
CUDA C/C++ Runtime

Description:
R340 drivers cause OpenCL data errors on pre-Fermi GPUs (see OpenCL SDK Code Samples in Duplication Steps). R337 drivers were working correctly.

Duplication Steps:
-------------------------------
I have a laptop with a Quadro FX 3800M GPU, on Windows 8.1 x64. Recently, I installed the latest drivers (R340 341.05 WHQL), and have noticed data errors on some OpenCL applications. I then ran the full suite of OpenCL SDK Code Samples (from https://developer.nvidia.com/opencl). The full results of my testing, including an Excel summary file and a folder called "NVIDIA OpenCL SDK Code Samples - Testing Results", can be found on my OneDrive folder, here: http://1drv.ms/1zwM8k7

I've found that the following samples are presently failing on all R340 drivers for my GPU (340.43, 340.52, 340.66, 340.84, 341.05). The samples are NOT failing on the older R337 drivers (337.88)
- oclFDTD3d (FAILED): CompareData (tolerance 0.000100)… Data error at point (0,0,0) 3.678468 instead of 10.912090
- oclDXTCompression (FAILED): RMS(reference, result) = 5606.724609
- oclQuasirandomGenerator (FAILED): ckQuasirandomGenerator deviations ABOVE Allowable Tolerance
- oclConvolutionSeparable (FAILED): Relative L2 norm: 1.204e-001

Additionally, the following 2 samples appear to possibly have some other regression using R340:
- oclVolumeRender: Passed, but it seemed to look different than other times that I tested this.
- oclParticles: Failed - TDR - Out of Memory? - Error # -5 (CL_OUT_OF_RESOURCES) at line 99 , in file .\src\oclManager.cpp

One colleague has confirmed the exact same behavior (those first 4 tests fail on R340, but succeed on R337), on a GT 9800.

My questions are:
- Is this a known bug/limitation with using R340 WHQL on this GPU?
- Is it at all related to the errata on the Cuda6.5 toolkit, regarding csr2csc() and bsr2bsc()?
- Would you please consider fixing it in a future R340 driver release?
- Do you plan on supporting this GPU (and its OpenCL execution) until April 2016, per http://nvidia.custhelp.com/app/answers/detail/a_id/3473?

Note: Whatever is causing these silent data errors, may also be the cause of bug 1554016, reported by an acquaintance of mine.

If there's anything else you need to get this resolved promptly, please don't hesitate to contact me.
-------------------------------

Product:
FX 3800M, GT 9800

CUDA Toolkit Version:
CUDA Toolkit 6.5

Bug Priority:
High

CUDA Toolkit Details:
OpenCL is failing data integrity on R340 on pre-Fermi GPUs

Operating System(s):
Windows8-x64

Operating System Details:
Confirmed on Windows 8.1 x64
==================================================================
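
For readers unfamiliar with the numbers quoted in the report, here is a small Python sketch of how such comparisons against a reference are commonly computed. The function names are made up, and the formulas are the textbook definitions; the SDK samples may compute their metrics differently.

```python
import math


def relative_l2_norm(reference, result):
    """sqrt(sum((ref - res)^2) / sum(ref^2)); assumes reference is not all zeros."""
    num = sum((r - x) ** 2 for r, x in zip(reference, result))
    den = sum(r ** 2 for r in reference)
    return math.sqrt(num / den)


def rms_error(reference, result):
    """Root-mean-square difference between two equally long sequences."""
    total = sum((r - x) ** 2 for r, x in zip(reference, result))
    return math.sqrt(total / len(reference))


def first_mismatch(reference, result, tolerance=1e-4):
    """Return (index, got, expected) for the first value outside tolerance."""
    for i, (r, x) in enumerate(zip(reference, result)):
        if abs(r - x) > tolerance:  # absolute tolerance is an assumption here
            return i, x, r
    return None


# The single data point quoted for oclFDTD3d in the report above.
print(first_mismatch([10.912090], [3.678468]))
```

A correct result gives values close to zero for the first two metrics, which is why a relative L2 norm around 1.2e-001 or an RMS in the thousands points to wrong data rather than rounding noise.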
ID: 1597385
shizaru
Volunteer tester
Joined: 14 Jun 04
Posts: 1130
Credit: 1,967,904
RAC: 0
Greece
Message 1597388 - Posted: 5 Nov 2014, 22:05:20 UTC - in response to Message 1597376.  

For the host side, FWIW the application (2.2 & 2.3 planclasses) appears to not even be making it to device initialisation. That would seem to me the DLLs are somehow damaged, or the driver install has gone awry. I'd imagine a clean driver install (of a suitable known good older version for this GPU) and a project reset may be in order.

I was thinking of a project detach/reattach might be in order, a project reset doesn't clear out files not mentioned in the client_state.xml, in case the setienhanced cuda22 and cuda23 dll's are still hanging around. (Eric renamed them for Seti v7 release)

Claggy


And maybe uncheck the old v6 apps (SETI@home Enhanced) in SETI@home preferences and/or "If no work for selected applications is available, accept work from other applications"?

I honestly can't remember anymore but I'm pretty sure I had to forgo v6 to get started on v7. Is this ringing any bells? (I could probably manage to dig up the thread if need be)
ID: 1597388
DanHansen@Denmark
Volunteer tester
Joined: 14 Nov 12
Posts: 194
Credit: 5,881,465
RAC: 0
Denmark
Message 1597397 - Posted: 5 Nov 2014, 22:14:24 UTC
Last modified: 5 Nov 2014, 22:17:25 UTC

Hi,


I haven't read all the posts, only as far as TBar's post:

Attention, if you have a nVidia card that's 4 years old or older, and have updated to Driver 340.xx, you are now Flooding SETI with Bad AstroPulse Science. This includes just about any nVidia card that's not at least a 400 series or around 4 years old or newer.

Except many people don't have a clue about 'FERMI' and the title doesn't mention a thing about Flooding SETI with BAD SCIENCE. To make matters worse, there is a thread about this New Driver, with over 1000 views, without a mention about it causing the older cards to Flood SETI with BAD SCIENCE.


Hi Tbar,

You couldn't be more right ;)

Maybe you can help me. Are these cards affected?
OS: Win7 32bit/64bit - Asus GeForce GTX770
OS: Linux 64 bit - Asus GeForce GT640

Currently I'm using these drivers so I'm not affected yet, apparently:
OS: Win - Driver Version 327.23
OS: Linux - Driver Version 340.29

Thanks in advance ;)

.
Project Headless CLI Linux Multiple GPU Boinc Servers
Ubuntu Server 14.04.1 64bit
Kernel 3.13.0-32-generic
CPU's i5-4690K
GPU's GT640/GTX750TI
Nvidia v.340.29
BOINC v.7.2.42

ID: 1597397
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1597408 - Posted: 5 Nov 2014, 22:25:33 UTC - in response to Message 1597385.  

For what it's worth, I have filed my official bug report of it. The full details are below.

Basically, I can reproduce the SDK example problems and the Astropulse problem, on any R340 driver for my GPU. And when I use R337, it works fine. Hopefully they fix it so that the OpenCL SDK examples work correctly, and then hopefully that makes your app work correctly too :)

Regards,
Jacob

Let's hope... hope dies last, as they say... :)
wbr
ID: 1597408
Claggy
Volunteer tester
Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1597410 - Posted: 5 Nov 2014, 22:27:43 UTC - in response to Message 1597397.  

Maybe you can help me. Are these cards affected?
OS: Win7 32bit/64bit - Asus GeForce GTX770
OS: Linux 64 bit - Asus GeForce GT640

No, neither of those GPUs is pre-Fermi.

Claggy
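
The rule of thumb from this thread can be written down as a small check. This is a sketch under stated assumptions: pre-Fermi is taken to mean CUDA compute capability 1.x, the version lists are only the ones named in the bug report earlier in the thread (Windows drivers, tested on a pre-Fermi GPU), and `astropulse_risk` is a made-up name.

```python
# Driver versions the bug report in this thread lists as failing or passing
# on a pre-Fermi GPU. Any other version is simply unknown to this check.
REPORTED_FAILING = {"340.43", "340.52", "340.66", "340.84", "341.05"}
REPORTED_PASSING = {"337.88"}


def astropulse_risk(compute_capability_major, driver_version):
    """Classify a GPU/driver pair using only what this thread reports."""
    if compute_capability_major >= 2:
        return "not affected (Fermi or newer)"
    if driver_version in REPORTED_FAILING:
        return "affected"
    if driver_version in REPORTED_PASSING:
        return "reported OK"
    return "unknown"


print(astropulse_risk(1, "340.52"))  # e.g. a 9800 GT on a failing driver
print(astropulse_risk(3, "340.52"))  # e.g. a Kepler card such as the GTX 770
```

Anything that comes back as "unknown" would need its own test run, since the thread only covers the driver versions listed.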
ID: 1597410
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1597423 - Posted: 5 Nov 2014, 22:58:58 UTC
Last modified: 5 Nov 2014, 23:05:44 UTC

Another problem with the pre-Fermi cards is that they apparently produce the same type of problem, invalid APv7 results, when using the APv7 command-line option -oclFFT_plan xxx xx xxx. I noticed the problem with my NV 8800 GT running driver 266.58 in Windows XP. I ran the AP benchmark suite and it confirmed the issue was with the -oclFFT_plan option; the driver works fine when not using that option. I don't have any results for any of the other pre-Fermi cards or drivers.
ID: 1597423
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1597599 - Posted: 6 Nov 2014, 10:06:25 UTC - in response to Message 1597423.  

Another problem with the pre-Fermi cards is that they apparently produce the same type of problem, invalid APv7 results, when using the APv7 command-line option -oclFFT_plan xxx xx xxx. I noticed the problem with my NV 8800 GT running driver 266.58 in Windows XP. I ran the AP benchmark suite and it confirmed the issue was with the -oclFFT_plan option; the driver works fine when not using that option. I don't have any results for any of the other pre-Fermi cards or drivers.


What particular values did you use?
It's known that not all theoretically possible combos are OK even for ATi cards, and little testing has been done for NV so far.
ID: 1597599
Mike
Volunteer tester
Joined: 17 Feb 01
Posts: 34255
Credit: 79,922,639
RAC: 80
Germany
Message 1597621 - Posted: 6 Nov 2014, 10:39:01 UTC - in response to Message 1597599.  

Another problem with the pre-Fermi cards is that they apparently produce the same type of problem, invalid APv7 results, when using the APv7 command-line option -oclFFT_plan xxx xx xxx. I noticed the problem with my NV 8800 GT running driver 266.58 in Windows XP. I ran the AP benchmark suite and it confirmed the issue was with the -oclFFT_plan option; the driver works fine when not using that option. I don't have any results for any of the other pre-Fermi cards or drivers.


What particular values did you use?
It's known that not all theoretically possible combos are OK even for ATi cards, and little testing has been done for NV so far.


I've only had the chance to test -oclFFT_plan on some Fermi and later cards.


With each crime and every kindness we birth our future.
ID: 1597621
Jacob Klein
Volunteer tester
Joined: 15 Apr 11
Posts: 149
Credit: 9,783,406
RAC: 9
United States
Message 1597670 - Posted: 6 Nov 2014, 12:58:37 UTC
Last modified: 6 Nov 2014, 13:44:43 UTC

I have received the following updates to my NVIDIA bug 1574543:
https://developer.nvidia.com/nvbugs/cuda/edit/1574543
Status changed from "Open - pending review" to "Open - in progress"

5 November 2014 9:32 pm Kevin Kang
Hi Jacob, thanks for the reporting. We have reproduced this issue and have assigned it to the appropriate developer team for reviewing. Thanks!
ID: 1597670
Richard Haselgrove
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1597684 - Posted: 6 Nov 2014, 14:39:42 UTC - in response to Message 1597670.  

I have received the following updates to my NVIDIA bug 1574543:
https://developer.nvidia.com/nvbugs/cuda/edit/1574543
Status changed from "Open - pending review" to "Open - in progress"

5 November 2014 9:32 pm Kevin Kang
Hi Jacob, thanks for the reporting. We have reproduced this issue and have assigned it to the appropriate developer team for reviewing. Thanks!

Well, at least you got a named contact out of it - that's more than Raistmer's rather less specific version of the same report got. I do think that the identified failure of NVidia's own sample code on professional hardware with enterprise drivers stands a better chance of being fixed than a third-party application on the consumer platform. Best of luck.
ID: 1597684
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1597686 - Posted: 6 Nov 2014, 14:47:52 UTC - in response to Message 1597599.  

Another problem with the pre-Fermi cards is that they apparently produce the same type of problem, invalid APv7 results, when using the APv7 command-line option -oclFFT_plan xxx xx xxx. I noticed the problem with my NV 8800 GT running driver 266.58 in Windows XP. I ran the AP benchmark suite and it confirmed the issue was with the -oclFFT_plan option; the driver works fine when not using that option. I don't have any results for any of the other pre-Fermi cards or drivers.


What particular values did you use?
It's known that not all theoretically possible combos are OK even for ATi cards, and little testing has been done for NV so far.

Besides the ones in the link, the test ran:
4 science app(s) found
(AP7_win_x86_SSE2_OpenCL_NV_r2721.exe)
(AP7_win_x86_SSE2_OpenCL_NV_r2721.exe -oclFFT_plan 128 8 64)
(AP7_win_x86_SSE2_OpenCL_NV_r2721.exe -oclFFT_plan 128 8 128)
(AP7_win_x86_SSE2_OpenCL_NV_r2721.exe -oclFFT_plan 128 8 256)
I still have the full results from the test.
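
For anyone wanting to repeat this kind of comparison on another card, the run list can be generated rather than typed by hand. A minimal sketch, assuming only what is shown above: the executable name and the three plan triples are copied from the list, while `benchmark_commands` is a made-up helper. What the three numbers mean is not stated in this thread, so they are treated as opaque values.

```python
APP = "AP7_win_x86_SSE2_OpenCL_NV_r2721.exe"
PLANS = [(128, 8, 64), (128, 8, 128), (128, 8, 256)]


def benchmark_commands(app, plans):
    """Build one baseline command plus one command per -oclFFT_plan triple."""
    commands = [[app]]  # baseline run without the option
    for plan in plans:
        commands.append([app, "-oclFFT_plan", *map(str, plan)])
    return commands


for cmd in benchmark_commands(APP, PLANS):
    print(" ".join(cmd))
```

Keeping the baseline run in the same list makes it easy to see whether invalid results appear only when the option is present.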
ID: 1597686
Jeff Buck
Volunteer tester
Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1597713 - Posted: 6 Nov 2014, 16:43:52 UTC - in response to Message 1597388.  

For the host side, FWIW the application (2.2 & 2.3 planclasses) appears to not even be making it to device initialisation. That would seem to me the DLLs are somehow damaged, or the driver install has gone awry. I'd imagine a clean driver install (of a suitable known good older version for this GPU) and a project reset may be in order.

I was thinking of a project detach/reattach might be in order, a project reset doesn't clear out files not mentioned in the client_state.xml, in case the setienhanced cuda22 and cuda23 dll's are still hanging around. (Eric renamed them for Seti v7 release)

Claggy


And maybe uncheck the old v6 apps (SETI@home Enhanced) in SETI@home preferences and/or "If no work for selected applications is available, accept work from other applications"?

I honestly can't remember anymore but I'm pretty sure I had to forgo v6 to get started on v7. Is this ringing any bells? (I could probably manage to dig up the thread if need be)

Bells are definitely ringing, very old ones. Take a look at thread "v7 cuda23 WUs getting ERR_TOO_MANY_EXITS" from June, 2013. The problem was never actually addressed directly, the hope apparently being that it would just eventually go away all by itself. That was 16+ months ago.
ID: 1597713
 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.