NVIDIA Driver 388.13 Crashing/Recovering...

Message boards : Number crunching : NVIDIA Driver 388.13 Crashing/Recovering...
Profile TimeLord04
Volunteer tester
Joined: 9 Mar 06
Posts: 21140
Credit: 33,933,039
RAC: 23
United States
Message 1955363 - Posted: 14 Sep 2018, 18:52:35 UTC - in response to Message 1955360.  

The most obvious thing to me would be to go back to crunching Seti and see if the problem shows up.
Different computing applications can be very different in how they use the computer's resources.
And since you indicated that moving to Collatz was a recent change, that could change a lot.

Thank you for that. And, yes, Collatz crunching IS quite different than Einstein or SETI! MUCH more aggressive than ANY Project I've run in the past.

The fact that the Driver Crash ONLY SEEMS to happen during the daytime hours, (Non-crunching Hours), leads me to believe that just switching Projects back to SETI won't reveal anything at this time.
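One way to check the "daytime only" impression is to bucket the crash times by hour of day. The following is a minimal sketch in Python, assuming the timestamps of the "display driver stopped responding and has recovered" entries have been copied by hand from the Windows Event Viewer. The function names and the sample timestamps are hypothetical placeholders, not data from this system.

```python
from collections import Counter
from datetime import datetime

def crashes_by_hour(timestamps):
    """Count crash events per local hour of day (0-23)."""
    hours = (datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").hour for ts in timestamps)
    return Counter(hours)

def share_in_window(counts, start_hour, end_hour):
    """Fraction of events whose hour falls in [start_hour, end_hour)."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    inside = sum(n for hour, n in counts.items() if start_hour <= hour < end_hour)
    return inside / total

# Hypothetical sample timestamps, for illustration only.
sample = [
    "2018-09-12 10:15:02",
    "2018-09-12 14:40:51",
    "2018-09-13 11:05:33",
    "2018-09-13 23:30:00",
]
counts = crashes_by_hour(sample)
print(sorted(counts.items()))
print(share_in_window(counts, 8, 20))
```

If nearly all events fall inside the non-crunching window, that supports the idea that the crashes are not tied to the crunching load itself. A handful of events is too few to draw a firm conclusion either way.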

I'm beginning to believe that 388.13 itself may be a faulty Driver for my 1050s under Win7 Pro x64... Or, that the 1050 listed as Dev 0 in BOINC, (attached to my monitor), has somehow been damaged by Collatz AFTER Optimization with the Original Default of "sieve_size=30"...

I had good luck with 353.30 on Old Prometheus. (GA-EP45-UD3P MOBO with Intel Quad Core Extreme - QX9650 at 3GHz...)

[New Hardware List for New Prometheus:]

Intel i7 7700K @ 4.2GHz - New
CoolerMaster Hyper212 EVO - 4+ Years Old
Gigabyte GA-Z270-HD3 - New
32GB Corsair Vengeance DDR4 RAM, (4x8GB Sticks), 2400MHz - New
Lite-On ATAPI DVD Burner - 4+ Years Old
SilverStone FS303B Hot Swap Bay - 2+ Years Old, went in when Hackintosh-Andromeda was created.
WD Black SATA 1TB Hard Drive - 4+ Years Old
TWO EVGA GTX-1050 2GB GDDR5 VRAM Cards - New
Corsair HX750i Platinum Rated 750Watt PSU - New
Antec Mid Tower ATX Case - 4+ Years Old


TL
TimeLord04
Have TARDIS, will travel...
Come along K-9!
Join Calm Chaos
ID: 1955363 · Report as offensive
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 1955364 - Posted: 14 Sep 2018, 18:59:03 UTC - in response to Message 1955359.  


2. try less aggressive compute settings - The Collatz Settings listed ARE the Less Aggressive Settings, now.


i mean try even LESS aggressive settings than you're using now.


7. replace physical hardware (GPUs/motherboard/etc) - ALL Hardware is Brand New, (except the Hard Drive - 4 Years Old), and IF you would like to donate US $$$ to the cause, I'd THEN be willing to do this... Otherwise, NOT happening on my limited budget... (It took me 4-5 Months to get the Hardware I have now.)


even brand new hardware can be bad. I'm not saying that this is your issue right now, but if you check everything else off this list... at that point, you don't necessarily have to buy new hardware, but if you truly think a component is defective, your option is usually to:

a) send it back to the manufacturer for warranty repair/replacement (you'll pay shipping)
b) deal with it until it fails completely


in any case, i think you should try at least some of the easier checks first to rule things out and help narrow down the list of possible problems.

and kittyman is absolutely right. different tasks will load up your hardware in different ways, so a different app might expose issues that won't show up with another. you could run SETI all day long no problem, but run into issues trying to run an intense 3D application like FurMark, just for example.
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 1955364 · Report as offensive
Profile TimeLord04
Volunteer tester
Joined: 9 Mar 06
Posts: 21140
Credit: 33,933,039
RAC: 23
United States
Message 1955365 - Posted: 14 Sep 2018, 19:00:32 UTC
Last modified: 14 Sep 2018, 19:02:32 UTC

For those interested in comparing Original Collatz Optimization Specs:

Original Collatz Optimizations:

verbose=1
kernels_per_reduction=48
threads=8
lut_size=16
sieve_size=30
reduce_cpu=0
cache_sieve=1
sleep=1

These were running for about 4-5 Days before I noticed the Sluggishness in trying to use the System while crunching. STILL DIDN'T see Video Driver Crashing at this point.

Started adjusting these settings a couple days ago, (before getting this Thread created), and found the "Sweet Spot" Settings Posted earlier. After getting the "Sweet Spot" Settings, THEN during Non-Crunching Hours started having Driver Crashes.
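When adjusting settings one step at a time, it helps to record exactly which values changed between two runs. Here is a minimal sketch in Python that parses the key=value format shown above and reports the differences. The "Sweet Spot" settings are not reproduced in this part of the thread, so the candidate values below are arbitrary, hypothetical placeholders, and the function names are illustrative only.

```python
def parse_settings(text):
    """Parse 'key=value' lines into a dict of ints, skipping blank lines."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = int(value)
    return settings

def diff_settings(old, new):
    """Return {key: (old_value, new_value)} for every key whose value differs."""
    keys = sorted(set(old) | set(new))
    return {k: (old.get(k), new.get(k)) for k in keys if old.get(k) != new.get(k)}

ORIGINAL = """
verbose=1
kernels_per_reduction=48
threads=8
lut_size=16
sieve_size=30
reduce_cpu=0
cache_sieve=1
sleep=1
"""

# Hypothetical candidate values, NOT the "Sweet Spot" settings from this thread.
CANDIDATE = ORIGINAL.replace("sieve_size=30", "sieve_size=28").replace("sleep=1", "sleep=2")

changes = diff_settings(parse_settings(ORIGINAL), parse_settings(CANDIDATE))
for key, (before, after) in changes.items():
    print(f"{key}: {before} -> {after}")
```

Changing one or two values per run and logging the diff next to any crash times makes it easier to tell which setting, if any, lines up with the problem.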


TL
TimeLord04
Have TARDIS, will travel...
Come along K-9!
Join Calm Chaos
ID: 1955365 · Report as offensive
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 1955366 - Posted: 14 Sep 2018, 19:07:11 UTC - in response to Message 1955363.  


I'm beginning to believe that 388.13 itself may be a faulty Driver for my 1050s under Win7 Pro x64... Or, that the 1050 listed as Dev 0 in BOINC, (attached to my monitor), has somehow been damaged by Collatz AFTER Optimization with the Original Default of "sieve_size=30"...

I had good luck with 353.30 on Old Prometheus. (GA-EP45-UD3P MOBO with Intel Quad Core Extreme - QX9650 at 3GHz...)


i wouldn't immediately think the GPU was DAMAGED by running collatz. that's pretty unlikely unless the GPU was defective to begin with. new GPUs have lots of preventative measures built in that aim to specifically prevent damage to the core by the workload, usually in the form of throttling the card if it exceeds current, power, or thermal limits.
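To see how close the cards actually run to their limits, the nvidia-smi tool that ships with the NVIDIA driver can print temperature, load, and power draw per GPU. Below is a minimal sketch in Python that runs a one-shot query and parses the CSV output. It assumes nvidia-smi is on the PATH (on Windows driver installs of this era it may need to be called by its full path instead), and the sample row at the bottom is a hypothetical placeholder rather than a reading from this system. Some GeForce cards report "[Not Supported]" for certain fields, so the parser tolerates non-numeric values.

```python
import subprocess

QUERY_FIELDS = ["index", "name", "temperature.gpu", "utilization.gpu", "power.draw"]

def build_command():
    """Command line for a one-shot CSV query; assumes nvidia-smi is on PATH."""
    return [
        "nvidia-smi",
        "--query-gpu=" + ",".join(QUERY_FIELDS),
        "--format=csv,noheader,nounits",
    ]

def to_number(text):
    """Convert a CSV field to float, or None for values such as '[Not Supported]'."""
    try:
        return float(text)
    except ValueError:
        return None

def parse_line(line):
    """Parse one CSV row of nvidia-smi output into a dict."""
    index, name, temp, util, power = [field.strip() for field in line.split(",")]
    return {
        "index": int(index),
        "name": name,
        "temp_c": to_number(temp),
        "util_pct": to_number(util),
        "power_w": to_number(power),
    }

def read_gpus():
    """Run nvidia-smi and return one dict per GPU."""
    output = subprocess.run(build_command(), capture_output=True, text=True, check=True).stdout
    return [parse_line(line) for line in output.splitlines() if line.strip()]

# Hypothetical sample row, for illustration only.
print(parse_line("0, GeForce GTX 1050, 68, 98, [Not Supported]"))
```

Calling read_gpus() every few seconds while crunching, and again while idle, gives a simple log to compare against the times the driver resets.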

are you planning to try to install the 353.30 drivers? when searching for older drivers, the oldest one i can find on nvidia's site for the 1050 is the 385 driver from 1 year ago. I'm not sure if they purge drivers older than a year or something.
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 1955366 · Report as offensive
Profile TimeLord04
Volunteer tester
Joined: 9 Mar 06
Posts: 21140
Credit: 33,933,039
RAC: 23
United States
Message 1955368 - Posted: 14 Sep 2018, 19:12:05 UTC - in response to Message 1955364.  


2. try less aggressive compute settings - The Collatz Settings listed ARE the Less Aggressive Settings, now.


i mean try even LESS aggressive settings than you're using now.


7. replace physical hardware (GPUs/motherboard/etc) - ALL Hardware is Brand New, (except the Hard Drive - 4 Years Old), and IF you would like to donate US $$$ to the cause, I'd THEN be willing to do this... Otherwise, NOT happening on my limited budget... (It took me 4-5 Months to get the Hardware I have now.)


even brand new hardware can be bad. I'm not saying that this is your issue right now, but if you check everything else off this list... at that point, you don't necessarily have to buy new hardware, but if you truly think a component is defective, your option is usually to:

a) send it back to the manufacturer for warranty repair/replacement (you'll pay shipping)
b) deal with it until it fails completely


in any case, i think you should try at least some of the easier checks first to rule things out and help narrow down the list of possible problems.

and kittyman is absolutely right. different tasks will load up your hardware in different ways, so a different app might expose issues that won't show up with another. you could run SETI all day long no problem, but run into issues trying to run an intense 3D application like FurMark, just for example.


I do plan on implementing some of the Fixes you've Posted... Just need to take them one step at a time.

As I just recently stated, YES, I agree that different Projects and Tasks run VERY differently than SETI. It seems that Collatz is VERY aggressive in what they do and how they do things...

If a different Video Driver doesn't yield positive results, I will switch the GPU Device Positions and see what happens there.

Temps do NOT seem to be an issue... Non-Crunching Temps seem to hover Less Than 40 C, and Crunching Temps Less Than 60 C for CPU... Non-Crunching GPU Temps are Less Than 30 C, and Crunching Temps are Less Than 70 C.


TL
TimeLord04
Have TARDIS, will travel...
Come along K-9!
Join Calm Chaos
ID: 1955368 · Report as offensive
Profile Tom M
Volunteer tester

Joined: 28 Nov 02
Posts: 5124
Credit: 276,046,078
RAC: 462
Message 1955434 - Posted: 15 Sep 2018, 0:41:38 UTC - in response to Message 1955348.  

I am not a GPU expert. I have a GTX 1050 Ti installed in my Windows 10 PC with its A10-6700 AMD CPU, not overclocked. I have the following results: Einstein@home GPU tasks run peacefully, with a low error percentage. SETI@home GPU tasks also run but cause frequent reboots; the tasks still complete and validate, with no errors. GPUGRID GPU tasks run, but the GPU reaches 80 C and then the task crashes, without causing a reboot. So I am mostly running Einstein@home tasks on it, leaving SETI@home and GPUGRID GPU tasks to my Linux box with its GTX 750 Ti, which never goes above 70 C.
Tullio


Hi,
Did you ever install TThrottle on your Windows box? I remember you posting the exact same info above but no report on getting your temperature under control using robust temp control software.

Respectfully,
Tom
A proud member of the OFA (Old Farts Association).
ID: 1955434 · Report as offensive
Profile TimeLord04
Volunteer tester
Joined: 9 Mar 06
Posts: 21140
Credit: 33,933,039
RAC: 23
United States
Message 1955440 - Posted: 15 Sep 2018, 1:10:47 UTC - in response to Message 1955434.  
Last modified: 15 Sep 2018, 1:13:12 UTC

I am not a GPU expert. I have a GTX 1050 Ti installed in my Windows 10 PC with its A10-6700 AMD CPU, not overclocked. I have the following results: Einstein@home GPU tasks run peacefully, with a low error percentage. SETI@home GPU tasks also run but cause frequent reboots; the tasks still complete and validate, with no errors. GPUGRID GPU tasks run, but the GPU reaches 80 C and then the task crashes, without causing a reboot. So I am mostly running Einstein@home tasks on it, leaving SETI@home and GPUGRID GPU tasks to my Linux box with its GTX 750 Ti, which never goes above 70 C.
Tullio


Hi,
Did you ever install TThrottle on your Windows box? I remember you posting the exact same info above but no report on getting your temperature under control using robust temp control software.

Respectfully,
Tom

The Temps, with Hyper212, Non-Crunching are UNDER 40 C, and Crunching Under 60 C... NO NEED for TThrottle...

I ROUTINELY monitor ALL System Temps and Specs through CPUID Hardware Monitor.


TL
TimeLord04
Have TARDIS, will travel...
Come along K-9!
Join Calm Chaos
ID: 1955440 · Report as offensive
Profile tullio
Volunteer tester

Joined: 9 Apr 04
Posts: 8797
Credit: 2,930,782
RAC: 1
Italy
Message 1955523 - Posted: 15 Sep 2018, 15:04:01 UTC - in response to Message 1955434.  

On the Linux box with the GTX 750 Ti, the SETI@home GPU tasks reach 54 C and the GPUGRID tasks 61 C. Einstein@home GPU tasks on the Windows 10 PC reach 64 C. The problem on that PC is the Windows 10 reboots: I check them in the Maintenance/Reliability Monitor, and all reboots happen only when SETI@home GPU tasks are running.
Tullio
ID: 1955523 · Report as offensive
Profile Cliff Harding
Volunteer tester
Joined: 18 Aug 99
Posts: 1432
Credit: 110,967,840
RAC: 67
United States
Message 1955556 - Posted: 15 Sep 2018, 18:54:00 UTC

I have one very important question -- why are you running GTX 1050s under 388.13, which is almost a year old (10 Oct, 2017)? I venture that if you upgrade your driver to at least 391.01 (26 Feb, 2018) you will have better success.


I don't buy computers, I build them!!
ID: 1955556 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13732
Credit: 208,696,464
RAC: 304
Australia
Message 1955591 - Posted: 15 Sep 2018, 22:12:04 UTC - in response to Message 1955556.  

I have one very important question -- why are you running GTX 1050s under 388.13, which is almost a year old (10 Oct, 2017)? I venture that if you upgrade your driver to at least 391.01 (26 Feb, 2018) you will have better success.

And unless he's running games, or a particular programme that the new driver addresses an issue for, I doubt it would make any difference.
My GTX 750Tis were running quite happily on the original supporting release drivers on my old Vista system. They had to use a more recent driver when they moved to new hardware & Win10. No change in processing times, or system reliability.
Grant
Darwin NT
ID: 1955591 · Report as offensive
Profile TimeLord04
Volunteer tester
Joined: 9 Mar 06
Posts: 21140
Credit: 33,933,039
RAC: 23
United States
Message 1955604 - Posted: 15 Sep 2018, 22:54:50 UTC - in response to Message 1955556.  
Last modified: 15 Sep 2018, 22:58:54 UTC

I have one very important question -- why are you running GTX 1050s under 388.13, which is almost a year old (10 Oct, 2017)? I venture that if you upgrade your driver to at least 391.01 (26 Feb, 2018) you will have better success.

Well, 388.13 is exactly what was given on the Install DVD contained in the EVGA Packaging. Keep in mind that I bought the system and GPUs piece by piece, and that I started this endeavor 5 Months ago!

Also, as Grant said: "And unless he's running games, or a particular programme that the new driver addresses an issue for, I doubt it would make any difference."

For games, I run Blizzard's Battle.net App and run StarCraft:Remastered, StarCraft II, HoTS, and Hearthstone... I also run GOG.com's Galaxy Launcher and therein run 7th Guest, 11th Hour, and Wing Commander III.

NONE of the above games requires the latest Driver Set, and since BOINC DOES NOT require the Latest Cutting Edge Drivers to work... :p


TL
TimeLord04
Have TARDIS, will travel...
Come along K-9!
Join Calm Chaos
ID: 1955604 · Report as offensive
Profile tullio
Volunteer tester

Joined: 9 Apr 04
Posts: 8797
Credit: 2,930,782
RAC: 1
Italy
Message 1956801 - Posted: 22 Sep 2018, 15:13:24 UTC

I am not a gamer. What I suspect is that the BOINC screen saver is causing some crashes on my Windows 10 PC. While running SETI@home GPU tasks, GPU-Z gives 67 C temp, 45% fan, 98% GPU load.
Tullio
ID: 1956801 · Report as offensive
Profile Tom M
Volunteer tester

Joined: 28 Nov 02
Posts: 5124
Credit: 276,046,078
RAC: 462
Message 1956804 - Posted: 22 Sep 2018, 15:27:01 UTC - in response to Message 1956801.  

@tullio

I am not a gamer. What I suspect is that the BOINC screen saver is causing some crashes on my Windows 10 PC. While running SETI@home GPU tasks, GPU-Z gives 67 C temp, 45% fan, 98% GPU load.
Tullio


I remember that running the screen saver on Windows 10 used to screw up the processing, but I don't remember the people having that problem saying it was crashing things. This was the creator edition, a year ago last fall, I think.

Does the problem go away if you disable the screen saver in Windows?

Are you running any command-line files or app_config.xml files in the Seti dir? If yes, could you post the contents?
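For anyone unsure what an app_config.xml contains, here is a minimal sketch in Python that reads one and lists the per-app GPU and CPU usage values. The sample file contents are hypothetical and for illustration only, and falling back to 1.0 when a value is missing is an assumption of this sketch, not documented BOINC behaviour.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample contents, for illustration only.
SAMPLE = """
<app_config>
  <app>
    <name>setiathome_v8</name>
    <gpu_versions>
      <gpu_usage>1.0</gpu_usage>
      <cpu_usage>0.5</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
"""

def summarize(xml_text):
    """Return (app name, gpu_usage, cpu_usage) for each <app> with <gpu_versions>."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    for app in root.findall("app"):
        gpu = app.find("gpu_versions")
        if gpu is None:
            continue
        rows.append((
            app.findtext("name"),
            # Assumed fallback of 1.0 when the element is absent.
            float(gpu.findtext("gpu_usage", default="1.0")),
            float(gpu.findtext("cpu_usage", default="1.0")),
        ))
    return rows

print(summarize(SAMPLE))
```

If no such file exists in the project directory, the project is running with its default settings and there is nothing to post.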

Thank you.

Tom
A proud member of the OFA (Old Farts Association).
ID: 1956804 · Report as offensive
Profile tullio
Volunteer tester

Joined: 9 Apr 04
Posts: 8797
Credit: 2,930,782
RAC: 1
Italy
Message 1956809 - Posted: 22 Sep 2018, 15:49:41 UTC - in response to Message 1956804.  

I am a Windows novice, since I use mostly Linux after having used UNIX in my professional life. So all my Windows parameters are default. I cannot but accept the Windows upgrades, and get my nVidia drivers via Geforce.
Tullio
ID: 1956809 · Report as offensive


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.