NVidia 436.xx and later drivers can cause very long compute times especially on Arecibo VHAR work units
Author | Message |
---|---|
Jesse Viviano Joined: 27 Feb 00 Posts: 100 Credit: 3,949,583 RAC: 0 |
Thank you everyone for the feedback. Fortunately, Nvidia is quick to fix bugs that people find, as long as they report them through the GeForce driver feedback form at https://docs.google.com/forms/d/e/1FAIpQLSewHJk1xP-C5elLBRCDLTLpNQZ9eiefrdZmUGP9hMCN6gKssA/viewform, which I found at https://www.nvidia.com/en-us/geforce/forums/game-ready-drivers/13/320827/official-geforce-43648-game-ready-driver-feedback-/. Since you are a volunteer tester, maybe you can fill out the form in more detail than I could. I remember that Nvidia fixed a bug that caused its drivers to miscompute some math affecting PrimeGrid. It also once found a bug in the Folding@home code rather than in the driver: a valid driver optimization triggered the bug, and the math would have been computed correctly had the bug not been present in Folding@home. |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
I believe the bug was reported to Nvidia feedback/support over a month ago. It still hasn't been resolved/fixed. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Mr. Kevvy Joined: 15 May 99 Posts: 3800 Credit: 1,114,826,392 RAC: 3,319 |
I've discovered many excessive runtime tasks with Arecibo VLAR, not VHAR, work units on this computer that resulted in 197 (0x000000C5) EXIT_TIME_LIMIT_EXCEEDED errors but on driver 430.50, not a 436.xx driver, and I'm wondering if they may be related. They all have runtimes of 12,000+ seconds on GPU though the usual completion time is about 35-50 seconds. The computer was recently rebooted and is processing all other work properly. As well, these work units errored out on other computers they were assigned to, even on CPU, though not from excessive runtimes. Tasks are: 8100604949, 8100604968, 8100605056, 8100605039, 8100605044, 8100605056, 8100605074, 8100605099, 8097708300, 8097708079, 8097708336, 8097708355, 8097708139, 8097708255 and 8097708262. Hope this proves helpful... |
juan BFP Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799 |
"I've discovered many excessive runtime tasks with Arecibo VLAR, not VHAR, work units on this computer that resulted in 197 (0x000000C5) EXIT_TIME_LIMIT_EXCEEDED errors but on driver 430.50, not a 436.xx driver, and I'm wondering if they may be related. They all have runtimes of 12,000+ seconds on GPU though the usual completion time is about 35-50 seconds. The computer was recently rebooted and is processing all other work properly. As well, these work units errored out on other computers they were assigned to, even on CPU, though not from excessive runtimes." I did not check all the WUs, but the ones I looked at are GPU WUs that ran on the CPU, not the GPU, and that will cause the EXIT_TIME_LIMIT_EXCEEDED error. Now you need to find out why that is happening. Did you reschedule them? <core_client_version>7.16.1</core_client_version> <![CDATA[ <message> exceeded elapsed time limit 12958.64 (41454.44G/3.20G)</message> <stderr_txt> Not using mb_cmdline.txt-file, using commandline options. Build features: SETI8 Non-graphics FFTW FFTOUT JSPF SSE4.1 64bit System: Linux x86_64 Kernel: 4.15.0-65-generic CPU : AMD FX(tm)-8350 Eight-Core Processor 8 core(s), Speed : 1404.693 MHz L1 : 64 KB, Cache : 2048 KB Features : FPU TSC PAE APIC MTRR MMX SSE SSE2 HT PNI SSSE3 SSE4A SSE4_1 SSE4_2 AVX FMA4 ar=0.012314 NumCfft=145989 NumGauss=0 NumPulse=50164588672 NumTriplet=67958470816 In v_BaseLineSmooth: NumDataPoints=1048576, BoxCarLength=8192, NumPointsInChunk=32768 Linux optimized setiathome_v8 application Version info: SSE4.1xjf (Intel, Core 2-optimized v8-nographics) V5.13 by Alex Kan SSE4.1xjf Linux64 Build 3711 , Ported by : Raistmer, JDWhale, Urs Echternacht |
rob smith Joined: 7 Mar 03 Posts: 22493 Credit: 416,307,556 RAC: 380 |
There is something very strange about that computer. From one of the error tasks: Build features: SETI8 Non-graphics FFTW FFTOUT JSPF SSE4.1 64bit. The CPU is clocked at about a third of the normal speed for an AMD FX-8350 (the log reports 1404.693 MHz). Indeed, the majority of the task description suggests this task was actually run on a CPU, not a GPU, despite the headline claim of running on a GPU. There is no mention of either OpenCL or CUDA, only the various CPU options which would (should?) be there. Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? |
Mr. Kevvy Joined: 15 May 99 Posts: 3800 Credit: 1,114,826,392 RAC: 3,319 |
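Thank you both... that was it. The error tasks listing indicates "SETI@home v8 Anonymous platform (NVIDIA GPU)" so I failed to notice that no GPU was involved. They must have been rescheduled and I forgot about it. Apologies and please disregard. |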
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
"Thank you both... that was it. The error tasks listing indicates 'SETI@home v8 Anonymous platform (NVIDIA GPU)' so I failed to notice that no GPU was involved. They must have been rescheduled and I forgot about it. Apologies and please disregard." Or this condition can be caused by an unstable GPU that produces too many pulses or invalid power readings. When that happens, stderr.txt will state that the PoT (Power over Time) was exceeded and that the GPU task was moved to the CPU for completion. That always leads to a -197 time exceeded error. I have not seen any errors on VLAR tasks, only on VHAR tasks. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Jesse Viviano Joined: 27 Feb 00 Posts: 100 Credit: 3,949,583 RAC: 0 |
I just filed a bug report with Nvidia in case nobody else has done so. |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13841 Credit: 208,696,464 RAC: 304 |
"I just filed a bug report with Nvidia in case nobody else has done so." Thanks. Hopefully people from the other projects that are having similar issues will have done the same thing, and with luck the multiple reports from different sources will add some weight to the issue. Grant Darwin NT |
Wiggo Joined: 24 Jan 00 Posts: 36584 Credit: 261,360,520 RAC: 489 |
Thankfully I am a very long way from being in the "Latest is Greatest Belief Club". ;-) Cheers. |
daysteppr Joined: 22 Mar 05 Posts: 80 Credit: 19,575,419 RAC: 53 |
https://setiathome.berkeley.edu/workunit.php?wuid=3678863986 Name: 19au10ac.25800.16848.10.37.129_1; Workunit: 3679081064; Created: 4 Oct 2019, 9:43:51 UTC; Sent: 4 Oct 2019, 13:18:47 UTC; Report deadline: 25 Oct 2019, 0:28:29 UTC; Received: 6 Oct 2019, 10:25:28 UTC; Server state: Over; Outcome: Computation error; Client state: Compute error; Exit status: 197 (0x000000C5) EXIT_TIME_LIMIT_EXCEEDED; Computer ID: 8591279; Run time: 45 min 3 sec; CPU time: 13 sec; Validate state: Invalid; Credit: 0.00; Device peak FLOPS: 18,124.80 GFLOPS; Application version: SETI@home v8 Anonymous platform (NVIDIA GPU); Peak working set size: 112.63 MB; Peak swap size: 137.32 MB; Peak disk usage: 0.02 MB. That's what it gives me when I get a WU that goes to 0.006 and essentially stops for an hour or so. So, is it flaky power to the card or the drivers? |
Wiggo Joined: 24 Jan 00 Posts: 36584 Credit: 261,360,520 RAC: 489 |
That workunit has a true angle range of 2.715739, so it's the driver that you are using. Roll back to the 431.60 driver to stop that. Cheers. |
Bill Joined: 30 Nov 05 Posts: 282 Credit: 6,916,194 RAC: 60 |
I'm getting the same problem on the 436.48 DCH driver. I have rolled back to the 431.60 DCH driver and everything works well on my 1660 Ti. I posted on the Nvidia message board as Jesse suggested, so we'll see what happens. Seti@home classic: 1,456 results, 1.613 years CPU time |
Whirling Steel Joined: 21 Sep 19 Posts: 8 Credit: 84,779 RAC: 0 |
If you are experiencing difficulties with this driver: I found that when I updated my system to this driver, NVidia sleazed in a program named GeForce Experience along with it. This program was a huge problem for me. If you installed this driver and see that GeForce Experience is loaded, get it the heck out of your system. I had no more issues after I removed that program. |
Bernie Vine Joined: 26 May 99 Posts: 9958 Credit: 103,452,613 RAC: 328 |
"This program was a huge problem for me. If you installed this driver and see that GeForce Experience is loaded, get it the heck out of your system. I had no more issues after I removed that program." Use the custom install option. That allows you to not install it. That said, I have had it installed for the past year with no problems; GeForce Experience is really only for gaming. I use it because I game, and it lets me take in-game screenshots and video and optimise games for the best performance. Wouldn't want to be without it now. I made the decision to stop crunching on GPUs on my Windows machines and just run one of my Linux rigs for a couple of days a week. |
Whirling Steel Joined: 21 Sep 19 Posts: 8 Credit: 84,779 RAC: 0 |
Agreed, Bernie. In my experience, end users never use the "custom" or "advanced" options when installing software :) |
Whirling Steel Joined: 21 Sep 19 Posts: 8 Credit: 84,779 RAC: 0 |
I have a semi off-topic... ok completely off topic question, as this is the only thread I have posted to :/ |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13841 Credit: 208,696,464 RAC: 304 |
"#2 Can someone PLEASE tell me what the 'credits' are for?" For keeping track of how much work you have done. Grant Darwin NT |
Whirling Steel Joined: 21 Sep 19 Posts: 8 Credit: 84,779 RAC: 0 |
thanks man I appreciate it |
Patrick Meyer Joined: 18 Jun 11 Posts: 5 Credit: 23,418,285 RAC: 104 |
Do you think that NVIDIA will ever fix the driver? |
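
The triage steps used in this thread (reading the time-limit message, checking whether the stderr mentions CUDA or OpenCL, and comparing the driver version against 436.xx) can be sketched in Python as below. This is only a rough sketch under stated assumptions: the function names are hypothetical, the reading of the two "G" figures as a work bound divided by an estimated speed is an assumption rather than something stated in the thread, and the CUDA/OpenCL check is the heuristic rob smith describes, not an official test.

```python
import re
from typing import Optional

# The thread title names 436.xx and later as the affected drivers.
FIRST_AFFECTED_DRIVER = (436, 0)

# Matches the client message quoted in the thread, for example:
# "exceeded elapsed time limit 12958.64 (41454.44G/3.20G)"
_LIMIT_RE = re.compile(
    r"exceeded elapsed time limit\s+([\d.]+)\s+\(([\d.]+)G/([\d.]+)G\)"
)


def parse_time_limit(message: str) -> Optional[dict]:
    """Extract the three numbers from the time-limit message, if present."""
    match = _LIMIT_RE.search(message)
    if match is None:
        return None
    limit_s, bound_g, speed_g = (float(x) for x in match.groups())
    return {
        "limit_seconds": limit_s,
        "bound_g": bound_g,
        "speed_g": speed_g,
        # Assumption: the limit is roughly the first G figure divided
        # by the second one.
        "ratio_seconds": bound_g / speed_g,
    }


def mentions_gpu(stderr_text: str) -> bool:
    """Heuristic from the thread: a GPU run mentions CUDA or OpenCL."""
    lowered = stderr_text.lower()
    return "cuda" in lowered or "opencl" in lowered


def driver_is_suspect(version: str) -> bool:
    """True for driver versions at or above 436.xx, e.g. '436.48'."""
    parts = version.split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    return (major, minor) >= FIRST_AFFECTED_DRIVER


if __name__ == "__main__":
    msg = "exceeded elapsed time limit 12958.64 (41454.44G/3.20G)"
    print(parse_time_limit(msg))
    print(mentions_gpu("Build features: SETI8 Non-graphics FFTW SSE4.1"))
    print(driver_is_suspect("436.48"), driver_is_suspect("431.60"))
```

For the message quoted by juan BFP, 41454.44 divided by 3.20 is about 12,954.5 seconds, close to the 12958.64 limit reported; the small gap is presumably due to the rounded figures. A task whose stderr fails the `mentions_gpu` check, like the one quoted above, most likely ran on the CPU, in which case the driver version is probably not the cause.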
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.