Safe despite errors? (nvlddmkm)
Author | Message |
---|---|
[ue] Alex Joined: 3 Apr 99 Posts: 9 Credit: 1,026,736 RAC: 0 |
I am getting constant (nvlddmkm) errors. My graphics card, a GTX280, stops functioning because of that driver, starts again, and the process repeats until my whole system reboots. All I want to know is whether this is caused by the data in the WU itself, a driver issue, a graphics card issue, or my optimized client, and whether it will damage my card. Basically, is the GPU client safe to run? |
Joined: 31 Jul 01 Posts: 2467 Credit: 86,146,931 RAC: 0 |
Many here are of the opinion that, since CUDA is returning invalid work, it should not be used. The final answer to your question is, of course, up to you. Boinc....Boinc....Boinc....Boinc.... |
[ue] Alex Joined: 3 Apr 99 Posts: 9 Credit: 1,026,736 RAC: 0 |
Yes, I looked through these forums. I was just wondering if anyone had an educated guess. I know running this kind of work 24 hours a day is taxing on any system. I think I'll go back to CPUs for a while. Thanks |
Joined: 30 Sep 04 Posts: 70 Credit: 11,323,275 RAC: 0 |
Quote: "...Basically is the GPU client safe to run?" Not safe for others who have CPUs, in my opinion. ...as I am setting No New Tasks on some boxes. (I have a 10-day cache, so if it gets fixed before then I can reload.) |
Golden_Frog Joined: 28 Oct 99 Posts: 27 Credit: 1,650,057 RAC: 0 |
My best guess is that the video driver is corrupted. I had the same issue with the driver crashing on my 8800GS box, and even after a driver sweep I couldn't get it fixed. I ended up reformatting and downgrading from 64-bit Vista to 32-bit Vista. That seems to have fixed the issue, as I have been crunching error-free for 2 days now. |
Joined: 4 Oct 00 Posts: 9541 Credit: 50,759,529 RAC: 60 |
Safe for your PC? Your SETI work? Any time your PC restarts suddenly, it's bad. Possible solution: http://www.eggheadcafe.com/software/aspnet/29415832/display-driver-nvlddmkm-s.aspx? As always, Google is your friend. In a rich man's house there is no place to spit but his face. Diogenes Of Sinope |
1mp0£173 Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 |
Quote: "I am getting constant (nvlddmkm) errors. GFX card, GTX280, stops to function because of that driver, starts again and the process repeats until my whole system reboots." "Safe" is, as always, relative. If in your particular environment it causes you to lose your video card, and you do any other work on that machine that could be impacted (lost) by a crash, then I would recommend against it until you find the root cause and fix it. I'd start with the drivers. The other question, which some have raised, is "safety" with regard to the work being done. Matt reports that 3% of the results being returned are from CUDA, with the other 97% logically being returned by CPU apps. I don't know what percentage of the CUDA work is valid and what percentage isn't, and I don't think anyone else does either. But for a "bad" CUDA result to make it into the science database, it has to match up with another, identically bad CUDA result. In other words, the results must be not only bad, but bad in the same way. The odds of a work unit being assigned to two CUDA hosts are 3% * 3%, or about 0.09%. If we assume that every CUDA result is bad (which probably isn't true) and that bad CUDA results are consistent, then the "threat" is 0.09%; the rest get caught and filtered out by the validator. Since that is the most pessimistic number, and we probably are getting valid CUDA results, I'd say that CUDA is fairly safe from a science standpoint ...and that the "bad" results are very helpful to the project as a whole right now. -- Ned |
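Ned's worst-case arithmetic can be checked with a few lines of Python. This is a sketch under the assumptions stated in the post: each work unit is validated by a pair of results assigned independently, and 3% of results come from CUDA. The function name and the `bad_fraction` and `bad_alike` parameters are hypothetical knobs, not anything the project publishes; leaving them at 1.0 reproduces the post's most pessimistic case.

```python
def worst_case_threat(cuda_share, bad_fraction=1.0, bad_alike=1.0):
    """Fraction of work units where a bad CUDA result could pass validation.

    Assumes (as in the post) two results per work unit, each coming from a
    CUDA host independently with probability cuda_share. bad_fraction (chance
    a CUDA result is bad) and bad_alike (chance two bad results agree) are
    hypothetical; 1.0 for both is the most pessimistic case.
    """
    both_cuda = cuda_share * cuda_share     # 3% * 3% = 0.09%
    both_bad = bad_fraction * bad_fraction  # each CUDA result bad independently
    return both_cuda * both_bad * bad_alike


print(f"{worst_case_threat(0.03):.2%}")                    # 0.09%
print(f"{worst_case_threat(0.03, bad_fraction=0.5):.4%}")  # 0.0225%
```

Because the result scales with the square of each share, halving either the CUDA share or the fraction of bad CUDA results cuts the worst case by a factor of four.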
Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Quote: "Safe for your PC? Your SETI work? Any time your PC restarts suddenly, it's bad." I checked this solution, and it doesn't apply to my host, at least with the 181.20 driver: it updates nvlddmkm.sys in system32/drivers properly. Maybe nVidia has already fixed this driver installer bug. |
David Joined: 8 Aug 00 Posts: 20 Credit: 301,705 RAC: 0 |
I have also been having this issue for quite some time, and it only BSODs with the nvlddmkm.sys error when BOINC is running. The error still occurs even with the latest Nvidia drivers, ForceWare 182.06 WHQL (I've tried many versions). Any resolution for this, by chance? Thanks! |
Imannotu Joined: 9 Aug 08 Posts: 1 Credit: 205,956 RAC: 0 |
OK, I know this sounds crazy, but I actually had this same driver crash happen in video games, so I searched around game makers' websites and found out it was an overheating problem. I use EVGA Precision (because EVGA made my card) and found that the crash only happened when my card reached 80-90 °C; then either the driver or the whole computer would crash. I found out this was a defense mechanism put in place by nVidia so that, in the case of overheating, your computer would stop its GPU-intensive activities and give the fan time to cool your card. So for a while I turned up the fan and kept the temperature low manually. But just when I thought I was going out of my mind, nVidia and subsequently EVGA released a new driver that solved the fan regulation issue. Now my card stays at 50-63 °C under full load and has no problems. I also had this problem with BOINC, so I turned the fan up and that solved the issue. However, if you are not able to control the fan, you could also try lowering the load on your card by turning down the percentage of the GPU that BOINC can use. And as always, update to the newest drivers. Hope this helps. |
Joined: 21 Apr 04 Posts: 3252 Credit: 31,903,643 RAC: 0 |
Hi, IMHO most graphics cards run too hot, too easily. When they are also used at 100%, and maybe overclocked as well, they certainly get too hot unless they are actively cooled and the case has sufficient airflow. Otherwise, it's doubtful that they will have a long life. (I've seen cards with burnt memory chips, for example.) I'm sure there are backplanes for PCI-Express too; they are the only way to run more than 3 cards at a time. I'm pretty sure of the future of parallel computing using graphics cards; everything is difficult in the beginning, with a lot of failures too. But look at the number 1 host by RAC: a powerful Q9770 CPU @ 4 GHz and 6 nVIDIA GTX 295 cards. The biggest part of the computation comes from these graphics cards: probably 7-8K for the CPU and ~16K for the 6 GTX 295s. A beautiful piece of hardware, IMHO. :) |
David Joined: 8 Aug 00 Posts: 20 Credit: 301,705 RAC: 0 |
I hate to say it, but in this case it IS NOT that the cards are running too hot. The crash happens when BOINC has only just started to crunch numbers (I do not have the screensaver enabled), and the cards' temperatures stay near nominal levels. Again, it only happens with BOINC. I can run massively graphics-intensive games with NO crashes...ever. So I've narrowed it down to BOINC, not heat. |
Joined: 9 Jun 99 Posts: 15184 Credit: 4,362,181 RAC: 3 |
The GTX 200 series cards run very hot by themselves. The added problem is that they blow the hot air out through the backplate, heating up half the memory that's there, and that happens pretty quickly with CUDA. If you do not have sufficient airflow behind your computer, you're prone to heat-related crashes. Run GPU-Z to check your actual temperatures; if need be, run it with logging on. |
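If you do turn on GPU-Z's sensor logging, a short script can pull the peak temperature out of the log instead of scrolling through it by eye. This is a minimal sketch, not a GPU-Z tool: it assumes the log is comma-separated with a single header row and a temperature column whose header contains "GPU Temperature". Check your own log file, since the exact layout may differ between GPU-Z versions. The function `summarize_gpu_temps`, the sample rows, and the 80 °C limit are made up for illustration.

```python
import csv
import io


def summarize_gpu_temps(log_text, limit_c):
    """Return (peak temperature, number of samples above limit_c).

    ASSUMPTION: comma-separated text, one header row, and a temperature
    column whose header contains "GPU Temperature". Verify against your log.
    """
    rows = csv.reader(io.StringIO(log_text))
    header = [name.strip() for name in next(rows)]
    col = next((i for i, name in enumerate(header) if "GPU Temperature" in name), None)
    if col is None:
        raise ValueError("no 'GPU Temperature' column in header")

    temps = []
    for row in rows:
        if len(row) <= col:
            continue  # blank or truncated line
        try:
            temps.append(float(row[col]))  # float() ignores surrounding spaces
        except ValueError:
            continue  # non-numeric cell
    if not temps:
        raise ValueError("no temperature samples found")

    peak = max(temps)
    above = sum(1 for t in temps if t > limit_c)
    return peak, above


# Made-up sample in the assumed layout (not real GPU-Z output).
sample = (
    "Date , GPU Core Clock [MHz] , GPU Temperature [°C] ,\n"
    "2009-02-20 10:00:00 , 602.0 , 69.0 ,\n"
    "2009-02-20 10:00:01 , 602.0 , 81.0 ,\n"
)
print(summarize_gpu_temps(sample, 80.0))  # (81.0, 1)
```

Matching the column by name rather than by position keeps the sketch working if the logger adds or reorders sensor columns, as long as the assumed header text is present.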
David Joined: 8 Aug 00 Posts: 20 Credit: 301,705 RAC: 0 |
Again, it's not the GPUs' fault, and the GPU temperatures are nothing that wouldn't be expected under a heavy load. For the GTX 280 at idle you can expect a temperature of 50-55 °C. Pretty normal. At 100%, temperatures normally settle at 85 °C, which is nowhere near the 105 °C threshold for the GPU to drop into safe mode. With BOINC running, my GPUs usually stay between 69 °C and 81 °C under full load. The fans don't even ramp up to 100%; they settle at around 51%. This is WELL within norms. Again, I play extremely graphics-intensive games that frequently push the GPUs to the limit and run the fans at 100%, but it NEVER crashes. EVER. Let me repeat: the crash occurs ONLY when BOINC is running tasks. It never crashes at any other time....NEVER. The crash is directly related to BOINC. |
Joined: 21 Jun 01 Posts: 21804 Credit: 2,815,091 RAC: 0 |
http://www.nvlddmkm.com/ me@rescam.org |
David Joined: 8 Aug 00 Posts: 20 Credit: 301,705 RAC: 0 |
Again, none of these apply in this case, as this issue only occurs when BOINC is running tasks. Also, I long ago installed the latest drivers, and patches. After a lot of trial and error, testing, etc., I've tracked this down to BOINC running tasks ONLY; no other app or game on my system EVER causes this issue. Why do I have to keep repeating this? |
Joined: 21 Jun 01 Posts: 21804 Credit: 2,815,091 RAC: 0 |
Quote: "Why do I have to keep repeating this?" BOINC message board <-- try there. Otherwise, your only solution is to stop using BOINC. No more need for repetition. Edit: The latest driver versions were released recently; since you updated long ago, you'll need to update again. me@rescam.org |
David Joined: 8 Aug 00 Posts: 20 Credit: 301,705 RAC: 0 |
Brilliant. The suggestion is to stop using BOINC. Wow..... I am running the latest drivers; I've said that over and over again as well. Geez...sad. |
Joined: 4 Oct 00 Posts: 35 Credit: 2,051,424 RAC: 0 |
The only combo that has worked for me on 64-bit without any issues: 6.5.0/181.22 Fish |
Joined: 21 Jun 01 Posts: 21804 Credit: 2,815,091 RAC: 0 |
Quote: "Brilliant. The suggestion is to stop using BOINC. Wow....." To quote you: "Also, I long ago installed the latest drivers, and patches." So in your mind, any date after Feb 18, when the latest drivers were made available, is LONG AGO? Brilliant that you finally posted on the BOINC board. Sad that you didn't appreciate the help people were trying to provide for you here. :( me@rescam.org |
©2025 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.