Tons of errors - Solved? - Possible cause: Asus Disk Unlocker incompatibility with cuda50

juan BFP
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1397488 - Posted: 1 Aug 2013, 9:26:00 UTC
Last modified: 1 Aug 2013, 9:44:15 UTC

One of my hosts started producing tons of errored WUs yesterday.

The host is: http://setiathome.berkeley.edu/show_host_detail.php?hostid=5280419

Each WU starts and then stops after a few seconds with this error:

<core_client_version>7.0.64</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code -1073741819 (0xc0000005)
</message>
]]>
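
For anyone decoding that exit code: the negative number and the hex value in parentheses are the same 32-bit pattern read as signed and unsigned. A minimal Python sketch of the conversion follows. The function names and the small lookup table are illustrative only and are not part of BOINC; the table lists a few well-known Windows NTSTATUS codes and is far from complete.

```python
# Reinterpret the signed exit code BOINC reports as the unsigned
# 32-bit NTSTATUS value that Windows documentation uses.

# A few well-known NTSTATUS codes (illustrative, not exhaustive).
KNOWN_NTSTATUS = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC000001D: "STATUS_ILLEGAL_INSTRUCTION",
    0xC00000FD: "STATUS_STACK_OVERFLOW",
    0xC0000135: "STATUS_DLL_NOT_FOUND",
}


def to_ntstatus(exit_code: int) -> int:
    """Return the unsigned 32-bit view of a signed exit code."""
    return exit_code & 0xFFFFFFFF


def describe(exit_code: int) -> str:
    """Format the exit code as hex plus a name, if the code is in the table."""
    status = to_ntstatus(exit_code)
    name = KNOWN_NTSTATUS.get(status, "unknown NTSTATUS")
    return f"0x{status:08x} ({name})"


if __name__ == "__main__":
    print(describe(-1073741819))  # 0xc0000005 (STATUS_ACCESS_VIOLATION)
```

An access violation only says that the process touched memory it was not allowed to; it does not say which component caused it.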

I have already reinstalled everything, from the Nvidia driver (clean installation) to BOINC and Lunatics (before installing I cleaned out all the old directories to be sure I left nothing behind).

It runs cuda50 and has a 670 GPU; all temps are OK.

Any clues?

Vicki
Joined: 30 Nov 01
Posts: 65
Credit: 1,640,576
RAC: 46
New Zealand
Message 1397498 - Posted: 1 Aug 2013, 10:41:42 UTC - in response to Message 1397488.  

Hmm, that's a tricky one. How many CPU tasks are running at the same time as the GPU unit? Since that host has 4 cores, I'd be inclined to try setting max CPU tasks to 3 or fewer to see if that makes a difference. May I also suggest opening up your system unit and dusting it (especially the cooling fans and power supply, with a pastry brush or similar), and physically removing and reinserting your graphics card in case something has crept in there. Error checking and defragmenting your hard drive may also be worth a try.
A city destroyed by an earthquake is an opportunity to rebuild, redesign & make it a better place to be. Better, stronger, faster like the 6 Million Dollar Man

juan BFP
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1397502 - Posted: 1 Aug 2013, 11:04:23 UTC - in response to Message 1397498.  
Last modified: 1 Aug 2013, 11:12:48 UTC

> Hmm, that's a tricky one. How many CPU tasks are running at the same time as the GPU unit? Since that host has 4 cores, I'd be inclined to try setting max CPU tasks to 3 or fewer to see if that makes a difference. May I also suggest opening up your system unit and dusting it (especially the cooling fans and power supply, with a pastry brush or similar), and physically removing and reinserting your graphics card in case something has crept in there. Error checking and defragmenting your hard drive may also be worth a try.

- 2 cores are free to feed the GPU. The host is configured to crunch GPU WUs as 1 AP + 1 MB, or 2 MB max, with a maximum of 2 CPU WUs.
- Fans are all clear, and I have already physically removed and reinserted the card.
- Disk checked and defragged.
- GPU AP WUs are running OK; the host completed 2 at almost the same time the MB errors appeared.
- Just in case, I am reinstalling Lunatics with only cuda42 enabled, even though the host has a 670. I don't know why, but I remember seeing something similar a few weeks ago: for some reason, when I ran cuda50 on the host (even with only Kepler GPUs) I got an error message, while with cuda42 everything worked OK. I just don't remember whether that happened on this specific host or why the error occurred. At that time, even with Jason's help, I couldn't discover why the error appeared. All the tests we did showed everything was OK.
- If you look at the validated WUs, the host crunched a lot of WUs without any problem before.
- Due to the large number of errors, it will take some time for the host to start receiving MB again so I can see whether the error is gone.

William
Volunteer tester
Joined: 14 Feb 13
Posts: 2037
Credit: 17,689,662
RAC: 0
Message 1397514 - Posted: 1 Aug 2013, 11:56:35 UTC

That's a Windows error.
It might be that strange bug with CUDA 5 on Kepler and CUDA 5.5 drivers, or it might just be a broken file or permission somewhere.
Anyway, you are producing valid results again.
A person who won't read has no advantage over one who can't read. (Mark Twain)

juan BFP
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1397538 - Posted: 1 Aug 2013, 13:33:33 UTC - in response to Message 1397514.  
Last modified: 1 Aug 2013, 13:40:16 UTC

> That's a Windows error.
> It might be that strange bug with CUDA 5 on Kepler and CUDA 5.5 drivers, or it might just be a broken file or permission somewhere.
> Anyway, you are producing valid results again.

Sorry, I was sleeping...

Yes, but it's running with cuda42 now.

If that is a Windows error, I remember something else: this is my only host that uses a 3 TB HDD. Could it be because it uses the Asus program that lets you use a large HDD by splitting it in 2, and BOINC is running on the split partition? But the error appears right at the beginning of the crunching process.

What really bugs me is why the error appears from nowhere after the host crunched a lot of WUs without a problem... A broken file? (I reinstalled everything from a clean installation.) A permission? (I wish I knew more about that, but I believe those are too dangerous waters for me.)

The last time this happened, the only way to stop the error was to swap the GPU for a 580, but I only have one 580 now; I'm waiting for the new 770 to arrive.

Jason, any clue? I don't want to leave a 670 host crunching with cuda42... I already checked latency and memory, and even swapped the 670 and the PSU, and all is working fine.

jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1397553 - Posted: 1 Aug 2013, 14:10:18 UTC - in response to Message 1397538.  
Last modified: 1 Aug 2013, 14:27:58 UTC

> Jason, any clue? I don't want to leave a 670 host crunching with cuda42... I already checked latency and memory, and even swapped the 670 and the PSU, and all is working fine.



Yes this:
(unknown error) - exit code -1073741819 (0xc0000005)


That looks driver related all right, BUT not necessarily directly nVidia driver related, since Cuda 5 relies on so much else and the crash comes well before any Cuda device startup.

- Check your motherboard manufacturer for a BIOS update.
- Check drive integrity.
- Check the date/version on your chipset (Intel?) drivers in Device Manager... sometimes these are tricky to force to update after OS install.
- Start -> Run -> cmd and press Enter, then run sfc /scannow, which checks all system files for corruption.

[Edit:] example sfc run:
C:\Users\Jason>sfc /scannow

Beginning system scan. This process will take some time.

Beginning verification phase of system scan.
Verification 100% complete.

Windows Resource Protection did not find any integrity violations.

"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.

juan BFP
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1397574 - Posted: 1 Aug 2013, 15:58:16 UTC - in response to Message 1397553.  
Last modified: 1 Aug 2013, 16:02:52 UTC

Before I do that, a few "stupid" questions...

- What else do you mean by checking drive integrity? I already ran chkdsk and defrag on that HDD. Another question: could the problem be caused by the ASUS driver that splits the 3 TB HDD in 2 so it can be used in Win 7/64, due to its 2 TB limitation? This is my only host that uses it, and I already use this MB/BIOS configuration on several other hosts without any problem.

- When you say "sometimes these are tricky to force to update after OS install"... what or where do I need to look?

sfc has already been run; no integrity violations found.

One last thing: the problem started when I swapped the old 580 (which was running cuda42) for a new 670 (which started running cuda50) last weekend. Yes, I redid a totally clean installation (nvidia/boinc/lunatics) after I saw the first errors...

Thanks for your usual help.

jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1397622 - Posted: 1 Aug 2013, 16:58:50 UTC - in response to Message 1397574.  
Last modified: 1 Aug 2013, 17:01:57 UTC

> Before I do that, a few "stupid" questions...
>
> - What else do you mean by checking drive integrity? I already ran chkdsk and defrag on that HDD. Another question: could the problem be caused by the ASUS driver that splits the 3 TB HDD in 2 so it can be used in Win 7/64, due to its 2 TB limitation? This is my only host that uses it, and I already use this MB/BIOS configuration on several other hosts without any problem.

chkdsk OK? Good.

> - When you say "sometimes these are tricky to force to update after OS install"... what or where do I need to look?

I may need to make a special post with pictures. Need more beer :) . The short version: if most of the complex Intel(R)-named chipset devices under System devices in Device Manager don't have matching 2013 dates and versions, chances are there can be instability. For example, my Intel(R) G33 Chipset entry says 9/07/2013, version 9.1.9.1004, and so do the processor I/O controller and all the PCI Express ports connected to it. A mismatched (or old, or generic non-Intel) version can mean changes and compatibility problems.
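
The matching check described above can be sketched in Python. This is only an illustration: the `Driver` record and both helper functions are hypothetical, the entries are assumed to be typed in by hand from Device Manager, and only the G33 values come from this post. The other two entries in the usage example are invented to show a mismatch.

```python
from collections import defaultdict
from typing import NamedTuple


class Driver(NamedTuple):
    device: str   # name as shown in Device Manager
    version: str  # driver version string
    date: str     # driver date as displayed


def group_by_version(drivers):
    """Map (version, date) to the Intel(R) device names that use it."""
    groups = defaultdict(list)
    for d in drivers:
        if d.device.startswith("Intel(R)"):
            groups[(d.version, d.date)].append(d.device)
    return dict(groups)


def is_consistent(drivers):
    """True when all Intel(R) devices share a single version and date."""
    return len(group_by_version(drivers)) <= 1


if __name__ == "__main__":
    drivers = [
        Driver("Intel(R) G33 Chipset", "9.1.9.1004", "9/07/2013"),
        # Hypothetical entry, invented to show a mismatch.
        Driver("Intel(R) example PCI Express port", "0.0.0.0", "1/01/2000"),
        # Non-Intel devices are ignored by the check.
        Driver("Generic example device", "1.0", "1/01/2000"),
    ]
    for key, names in group_by_version(drivers).items():
        print(key, names)
    print(is_consistent(drivers))  # False
```

A mismatch found this way is only a hint about where to look, not proof of a problem.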

> sfc has already been run; no integrity violations found.

Great, one less suspect (Windows kernel files).

> One last thing: the problem started when I swapped the old 580 (which was running cuda42) for a new 670 (which started running cuda50) last weekend. Yes, I redid a totally clean installation (nvidia/boinc/lunatics) after I saw the first errors...

Totally clean is a big word these days; there is a hidden compute cache maintained by the drivers :D.
- Stop Boinc
- Paste %AppData%\Roaming\NVIDIA\ComputeCache into the address bar of a file browser window.
- If there are folders in there like 0,1,2... & index, delete them all & reboot
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.

juan BFP
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1397624 - Posted: 1 Aug 2013, 17:08:07 UTC - in response to Message 1397622.  
Last modified: 1 Aug 2013, 17:17:22 UTC


> Totally clean is a big word these days; there is a hidden compute cache maintained by the drivers :D.

Then why do they call it a "clean installation"? Someone at Nvidia is smoking bad "c-words"...

I already did that and allowed cuda50 back to work; let's see if that clears the problem.

Thanks, my guru. Go for some beer, I can wait; I'm at work anyway...

jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1397629 - Posted: 1 Aug 2013, 17:18:56 UTC - in response to Message 1397624.  


>> Totally clean is a big word these days; there is a hidden compute cache maintained by the drivers :D.
>
> Why do they call it a "clean installation"? Someone at Nvidia is smoking bad "c-words"...
>
> I already did that and allowed cuda50 back to work; let's see if that clears the problem.
>
> Thanks, my guru. Go for some beer, I can wait; I'm at work anyway...

No problem. I suggest checking the Intel drivers anyway. Here's a pic I took of where to look in Device Manager (right-click 'Computer' --> Manage --> Device Manager)


"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.

juan BFP
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1397637 - Posted: 1 Aug 2013, 17:31:06 UTC - in response to Message 1397629.  

The Hdd imanager was the ASUS Disk Unlocker, which allows the use of the 3 TB HDD.

I ran the checks; I just don't know how to post the screenshot on this forum, but it says:

Intel(R) B75 Express Chipset LPC Controller - 1E49

Driver version: 9.3.0.1011
Date: 26/08/2011

I already tried the driver update button and it says that is the latest available driver.

I can't check the BIOS version of the host now but will do that ASAP.

I cleaned the Nvidia driver and rebooted the host. It's working with cuda50 now; let's wait a few hours to see what happens.

jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1397654 - Posted: 1 Aug 2013, 18:10:33 UTC - in response to Message 1397637.  
Last modified: 1 Aug 2013, 18:12:16 UTC

> The Hdd imanager was the ASUS Disk Unlocker, which allows the use of the 3 TB HDD.
>
> I ran the checks; I just don't know how to post the screenshot on this forum, but it says:
>
> Intel(R) B75 Express Chipset LPC Controller - 1E49
>
> Driver version: 9.3.0.1011
> Date: 26/08/2011
>
> I already tried the driver update button and it says that is the latest available driver.
>
> I can't check the BIOS version of the host now but will do that ASAP.
>
> I cleaned the Nvidia driver and rebooted the host. It's working with cuda50 now; let's wait a few hours to see what happens.


Great. Yep, best to know 'which fix' does it ;)

In the long run, yeah, I'd call those Intel drivers pretty old, but if they are stable and all match then there's no need to update. If on the other hand you get occasional 'strange' issues, then it'll be worth looking at.

With the Cuda ComputeCache, if there ever was a bad or boxed card driver on the system then really that should be cleared. If we see it more than once a year I could mention it to nVidia, make a tool, AND nicely ask whether the Lunatics installer devs might clear that one for us when Cuda apps are (re)installed.


[Edit:]
> I already tried the driver update button and it says that is the latest available driver.

It lies :D there are July 2013 ones ;)
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.

kittyman
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1397656 - Posted: 1 Aug 2013, 18:14:09 UTC - in response to Message 1397654.  

If it's a possible issue, then clearing that cache when doing a new opti install would be a worthy addition.
"Freedom is just Chaos, with better lighting." Alan Dean Foster


arkayn
Volunteer tester
Joined: 14 May 99
Posts: 4438
Credit: 55,006,323
RAC: 0
United States
Message 1397662 - Posted: 1 Aug 2013, 18:25:53 UTC - in response to Message 1397622.  



> Totally clean is a big word these days; there is a hidden compute cache maintained by the drivers :D.
> - Stop Boinc
> - Paste %AppData%\Roaming\NVIDIA\ComputeCache into the address bar of a file browser window.
> - If there are folders in there like 0,1,2... & index, delete them all & reboot

I did that and it kept telling me that it could not locate C:\Users\arkayn\AppData\Roaming\Roaming\NVIDIA\ComputeCache

Eliminating the \Roaming\ part does take me to the correct folder.
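
The cache-clearing steps quoted above, with the corrected path, could be scripted roughly like this. It is a sketch only: the function names are made up, it defaults to a dry run that deletes nothing, and it assumes that BOINC has been stopped first and that the cache lives under %AppData%\NVIDIA\ComputeCache as found here.

```python
import os
import shutil
from pathlib import Path


def compute_cache_dir() -> Path:
    """Build the cache path; %AppData% already ends in AppData\\Roaming."""
    return Path(os.environ.get("APPDATA", "")) / "NVIDIA" / "ComputeCache"


def clear_cache(cache_dir: Path, dry_run: bool = True) -> list:
    """Delete every entry inside cache_dir and return the entry names.

    With dry_run=True nothing is deleted; the names are only listed.
    """
    handled = []
    if not cache_dir.is_dir():
        return handled
    for entry in sorted(cache_dir.iterdir()):
        handled.append(entry.name)
        if dry_run:
            continue
        if entry.is_dir():
            shutil.rmtree(entry)
        else:
            entry.unlink()
    return handled


if __name__ == "__main__":
    # Stop BOINC before running this, and reboot afterwards.
    print(clear_cache(compute_cache_dir(), dry_run=True))
```

Calling `clear_cache(compute_cache_dir(), dry_run=False)` performs the actual deletion once the dry-run listing looks right.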


jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1397663 - Posted: 1 Aug 2013, 18:29:12 UTC - in response to Message 1397662.  
Last modified: 1 Aug 2013, 18:41:28 UTC



>> Totally clean is a big word these days; there is a hidden compute cache maintained by the drivers :D.
>> - Stop Boinc
>> - Paste %AppData%\Roaming\NVIDIA\ComputeCache into the address bar of a file browser window.
>> - If there are folders in there like 0,1,2... & index, delete them all & reboot
>
> I did that and it kept telling me that it could not locate C:\Users\arkayn\AppData\Roaming\Roaming\NVIDIA\ComputeCache
>
> Eliminating the \Roaming\ part does take me to the correct folder.


I'm guessing the difference there might be something to do with Home Vs Pro Editions.

[Edit:] or more likely a copy/paste error on my part ::S, well at least Juan seems to have found it!
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.

juan BFP
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1397666 - Posted: 1 Aug 2013, 18:46:30 UTC - in response to Message 1397654.  


> [Edit:]
>> I already tried the driver update button and it says that is the latest available driver.
> It lies :D there are July 2013 ones ;)

Where can I find them? If we can't trust the update button, it's complicated...


jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1397672 - Posted: 1 Aug 2013, 19:04:23 UTC - in response to Message 1397666.  
Last modified: 1 Aug 2013, 19:06:15 UTC


>> [Edit:]
>>> I already tried the driver update button and it says that is the latest available driver.
>> It lies :D there are July 2013 ones ;)
>
> Where can I find them? If we can't trust the update button, it's complicated...


Yep, sure is a bit of a pain.

If they're suspect, to force an update on an existing system you need to:
- Download the Intel INF update utility in ZIP form (not the installer) for your platform [it's listed in the drivers section].
- Unzip the contents to something like c:\inteldrv
- Run the setup.exe (as the installer would have)

Then (key part):
- Manually update each device's driver, such as 'Chipset', 'PCI Express...', 'SMBus Controller...', via 'Update driver': tell it not to search automatically, choose 'Browse my computer for driver software', point it at 'C:\inteldrv\All', then click 'Next'.


There is a similar (but different) process detailed in the extracted readme, but it's oriented to doing the process immediately after OS install, so the device names are different.

Whether you really need to do this is up to you... but I do, because on systems I've seen prior to chipset driver updates, the symptoms included some very strange things like unexplained (total) system freezes.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.

juan BFP
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1397720 - Posted: 1 Aug 2013, 20:56:27 UTC
Last modified: 1 Aug 2013, 20:56:50 UTC

Everything crashed again after a few hours of cuda50 working normally, so it's back to cuda42 to see what happens. Since AP crunching is still OK, I don't believe something is wrong with the GPU itself. I checked the temps and the max was 71 C, which is normal here. No one used the host at any time during the test, so in theory no other program was running that could crash the driver, just the normal Win 7/64 stuff.

For now I have moved the data directory to the first partition to see if the problem is some conflict with the Asus Disk Unlocker.

I will need to wait a long time for the server to start sending MB WUs to this host again, due to the errors.

My idea is to leave it crunching cuda42 for about a day and, if that is OK, then try cuda50 again to check this theory.

If nothing works then I will try to download and install the chipset driver, but I'm not so confident I know how to do that.

Any other suggestions? Besides going to drink some beer?

juan BFP
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1397901 - Posted: 2 Aug 2013, 13:14:27 UTC
Last modified: 2 Aug 2013, 13:15:19 UTC

Update:

- Cuda42 worked all night without problems.

- I checked my other hosts that have a similar MB/chipset. The driver is the same on all of them, and they run the same 670 EVGA FTW (they are actually twins, with the same OS, memory, PSU, etc.) and work OK, so I believe the B75 driver is unlikely to be the source of the problem.

- I believe the source of the problem must be something that is installed on the problematic host and not on the others.

- I will start by checking the Asus Disk Unlocker, to see whether cuda50 works. Let's see what happens.

juan BFP
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1398278 - Posted: 3 Aug 2013, 9:08:10 UTC
Last modified: 3 Aug 2013, 9:26:56 UTC

Update 2

- After about 1/2 day working with cuda50...

- I think the problem really is an incompatibility between cuda50 and the ASUS Disk Unlocker program: after I changed the partition where BOINC is running, the host is crunching cuda50 with no problem.

Let me try to explain how the problem appeared, so somebody else can confirm it.

To be usable by Win 7/64, the 3 TB HDD on the host is partitioned by the Asus program into 3 different drives, due to the 2 TB partition limit of Windows itself. So the drive works as: C: with 250 GB, D: with 1.75 TB and E: with 750 GB.
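
The 2 TB figure can be checked with a little arithmetic. Here is a sketch in Python, assuming the limit in question is the one from MBR partition tables, which store sector counts in 32 bits, and assuming 512-byte logical sectors and a drive of exactly 3 x 10^12 bytes. None of those details are stated in the post, and the function names are illustrative.

```python
SECTOR_BYTES = 512     # classic logical sector size (assumed)
MAX_SECTORS = 2 ** 32  # MBR stores sector counts in 32 bits


def mbr_limit_bytes(sector_bytes: int = SECTOR_BYTES) -> int:
    """Largest capacity a single MBR layout can address."""
    return MAX_SECTORS * sector_bytes


def beyond_limit_bytes(disk_bytes: int) -> int:
    """Capacity left over past the MBR limit (0 if the disk fits)."""
    return max(0, disk_bytes - mbr_limit_bytes())


if __name__ == "__main__":
    disk = 3 * 10 ** 12                # a "3 TB" drive in decimal units
    print(mbr_limit_bytes())           # 2199023255552 bytes = 2 TiB
    print(beyond_limit_bytes(disk))    # 800976744448 bytes
    print(beyond_limit_bytes(disk) / 2 ** 30)  # leftover in GiB
```

The leftover comes to roughly 746 GiB, which is in the same range as the 750 GB e: drive if the sizes above are the binary units Windows reports. That would fit e: being the part of the disk past the limit, but that is an inference and not something the post confirms.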

Originally I installed the BOINC data directory on drive E: as e:\boinc\, and everything worked fine for months with a 580 installed and running cuda42.

Then I put a new 670 in place of the 580 and, by running the Lunatics installer, allowed it to crunch with cuda50, as expected because it's a Kepler-class GPU (I use a similar configuration/installation on some of my other hosts). Of course I did a new clean installation of the driver and BOINC/Lunatics.

Suddenly, after some time of normal crunching and with no warning, all the CUDA work on the host began crashing a few seconds (3-4 s) after starting.

I tried all the procedures listed in this thread on the host (except the manual chipset update; I'm not confident in my ability to do that), including swapping the GPU itself (I have a few 670s of the same model).

Nothing worked: the host continued to refuse to run stably with cuda50, but ran without problems with cuda42.

As I have a few other "twins" of this host running the same MB/GPU configuration, I tried to imagine what differs from one to the other, and I remembered that the problematic host is the only one that uses a 3 TB HDD, which needs to be partitioned differently due to the 2 TB limit.

Chasing a "ghost", I ran a simple test: I copied the entire BOINC data directory from the e: drive to the c: drive (just a simple copy and paste, nothing special) and reinstalled BOINC itself, this time telling it to use c:\boinc\ as the data directory.
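
For anyone who wants to repeat the test, the copy step can be sketched in Python as below. This is an illustration, not what was actually done here (that was a plain copy and paste): the helper names are made up, BOINC is assumed to be stopped before copying, and BOINC still has to be reinstalled or reconfigured afterwards to point at the new data directory.

```python
import shutil
from pathlib import Path


def count_files(root: Path) -> int:
    """Count regular files anywhere under root."""
    return sum(1 for p in root.rglob("*") if p.is_file())


def copy_data_dir(src: Path, dst: Path) -> int:
    """Copy src to dst (dst must not exist yet) and verify the file count."""
    shutil.copytree(src, dst)
    copied = count_files(dst)
    if copied != count_files(src):
        raise RuntimeError("file count differs after copy")
    return copied


if __name__ == "__main__":
    # Paths taken from the post; stop BOINC before copying.
    src, dst = Path(r"e:\boinc"), Path(r"c:\boinc")
    if src.is_dir() and not dst.exists():
        print(copy_data_dir(src, dst), "files copied")
```

The file-count comparison is only a basic sanity check; it does not compare file contents.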

I restarted BOINC and cuda42 worked as expected. Then after a few hours I enabled cuda50 again (by reinstalling Lunatics) and simply ALL STARTED TO WORK FINE; the host has been crunching with cuda50 for about 1/2 day with no errors.

It could be a hell of a coincidence, but mysteriously the "bug" disappeared.

That raises my questions: Is that possible? Could the Asus Disk Unlocker crash the cuda50 work? Has anyone else experienced a similar issue? Or is it all no more than a hell of a coincidence, and the bug will reappear in the future?

Jason, our CUDA guru, any possible explanation or suggestion?