Another WU victim to the CUDA error's

Message boards : Number crunching : Another WU victim to the CUDA error's

Fred J. Verster
Volunteer tester
Joined: 21 Apr 04
Posts: 3252
Credit: 31,903,643
RAC: 0
Netherlands
Message 1224084 - Posted: 27 Apr 2012, 21:07:30 UTC - in response to Message 1224001.  
Last modified: 27 Apr 2012, 21:16:27 UTC

Device enumeration seems to be bugged. Something for Raistmer to look at.

Actually, more likely to be a BOINC problem. v6.10.60 is very old to be trying this sort of tricksery with - it doesn't even schedule ATI very well. That came in with v6.12, and was extended to proper OpenCL scheduling with v7.0

I see you're running v7.0.25 on one of your other machines - perhaps you should try it on your mixed-card machine too.

And follow the advice about the driver installation order, too. I *believe* (but I'm not an authoritative source on this) that it has to be 'NVidia first, ATI second' - apparently ATI do something non-standard to support both OpenCL 1.0 and OpenCL 1.1 at the same time, so it has to be their version loaded last for their cards to work.


Yep, it is Nvidia first and then ATI.

I currently have OpenCL 1.1 on the GTX460 and OpenCL 1.2 on my HD7750.



Well, I do have trouble getting enough GPU load. The Catalyst 11.4 driver crashes, but recovers without rebooting.
These are two ATI 5870 GPUs; the core clock and memory clock drop down (core: 157 MHz!).

When I killed BOINC.exe and BOINCmgr.exe, the 4 GPU processes and, of course, the 2 CPU-core processes kept running...? But that trashes the WU!
After 3 minutes they stopped (crashed).
(Win 7 64-bit; BOINC 7.0.25 64-bit; CPU: i7-2600 at stock, DDR3 @ 1600 MHz; GPUs: 2x EAH 5870.)
I also tried the MB version for ATI 48xx GPUs and lower; it also crashes.
The AVX SSE3 version for Core i7/i5/i3 CPUs works great: >40% versus stock on the i7-2600 with VLAR (0.01xx AR).

This host and this workunit.

ID: 1224084
arkayn
Volunteer tester
Joined: 14 May 99
Posts: 4436
Credit: 55,006,323
RAC: 0
United States
Message 1224001 - Posted: 27 Apr 2012, 17:15:04 UTC - in response to Message 1223948.  

Device enumeration seems to be bugged. Something for Raistmer to look at.

Actually, more likely to be a BOINC problem. v6.10.60 is very old to be trying this sort of tricksery with - it doesn't even schedule ATI very well. That came in with v6.12, and was extended to proper OpenCL scheduling with v7.0

I see you're running v7.0.25 on one of your other machines - perhaps you should try it on your mixed-card machine too.

And follow the advice about the driver installation order, too. I *believe* (but I'm not an authoritative source on this) that it has to be 'NVidia first, ATI second' - apparently ATI do something non-standard to support both OpenCL 1.0 and OpenCL 1.1 at the same time, so it has to be their version loaded last for their cards to work.


Yep, it is Nvidia first and then ATI.

I currently have OpenCL 1.1 on the GTX460 and OpenCL 1.2 on my HD7750.

ID: 1224001
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14114
Credit: 200,643,578
RAC: 1,983
United Kingdom
Message 1223948 - Posted: 27 Apr 2012, 14:09:28 UTC - in response to Message 1223921.  

Device enumeration seems to be bugged. Something for Raistmer to look at.

Actually, more likely to be a BOINC problem. v6.10.60 is very old to be trying this sort of tricksery with - it doesn't even schedule ATI very well. That came in with v6.12, and was extended to proper OpenCL scheduling with v7.0

I see you're running v7.0.25 on one of your other machines - perhaps you should try it on your mixed-card machine too.

And follow the advice about the driver installation order, too. I *believe* (but I'm not an authoritative source on this) that it has to be 'NVidia first, ATI second' - apparently ATI do something non-standard to support both OpenCL 1.0 and OpenCL 1.1 at the same time, so it has to be their version loaded last for their cards to work.
ID: 1223948
LadyL
Volunteer tester
Joined: 14 Sep 11
Posts: 1679
Credit: 5,230,097
RAC: 0
Message 1223921 - Posted: 27 Apr 2012, 11:30:22 UTC - in response to Message 1223912.  

http://setiathome.berkeley.edu/workunit.php?wuid=966603166

-12 errors on all of them sheesh.

It would be nice if this error were considered a technical glitch of CUDA processing and not an actual WU error.


I've got 131 errors piled-up, and every one is a nVidia CUDA error on guess what kind of card?

The thing that *really* has me worried and unhappy about this is that if you will look at any of them, you'll see that they all "ran normally" as far as anyone could tell. Then, of course, they piled-up in the "Pending" bin.

And I have a TON of "Waiting" and "Inconclusives" BUT several that validated.

All of this is happening on one computer where I have a mixed nVidia / ATI setup, and prior to the ATI card installation that particular computer had been stable and productive for months.

I'm going to "start-over" and reinstall all the drivers and try to get rid of the Catalyst Control Center and anything else I can find to dump. I might even do a "system restore" to a week ago and start with the ATI installation from scratch.

BUT... what the heck does the ATI card have to do with these normal-looking but flaky Work Units? I'm guessing "nothing at all."

This EVGA card has thrown trash in the past, but it did it quickly, obviously, and loudly. No signs of heat or despair... no way to recognize it as having gone on the fritz.

I wish I loved a mystery.


Relax - nothing to do with the 560Ti as such.

Ah, mixed card setup. Something about the sequence of driver install, IIRC. And your ATI app is trying to run on your NVidia card :D That probably explains both the ATI errors - wrong card - and the NVidia errors - another app trying to grab it.

Device enumeration seems to be bugged. Something for Raistmer to look at.
Jason is going to just love that

Try a clean reinstall of the NVidia driver.

Which ATI driver did you try?

FWIW mixed card setups are tricky. Can you shift cards around so you don't have mixed hosts?
I'm not the Pope. I don't speak Ex Cathedra!
ID: 1223921
tbret
Volunteer tester
Joined: 28 May 99
Posts: 3380
Credit: 296,162,071
RAC: 90
United States
Message 1223912 - Posted: 27 Apr 2012, 10:25:23 UTC - in response to Message 1223207.  

http://setiathome.berkeley.edu/workunit.php?wuid=966603166

-12 errors on all of them sheesh.

It would be nice if this error were considered a technical glitch of CUDA processing and not an actual WU error.


I've got 131 errors piled-up, and every one is a nVidia CUDA error on guess what kind of card?

The thing that *really* has me worried and unhappy about this is that if you will look at any of them, you'll see that they all "ran normally" as far as anyone could tell. Then, of course, they piled-up in the "Pending" bin.

And I have a TON of "Waiting" and "Inconclusives" BUT several that validated.

All of this is happening on one computer where I have a mixed nVidia / ATI setup, and prior to the ATI card installation that particular computer had been stable and productive for months.

I'm going to "start-over" and reinstall all the drivers and try to get rid of the Catalyst Control Center and anything else I can find to dump. I might even do a "system restore" to a week ago and start with the ATI installation from scratch.

BUT... what the heck does the ATI card have to do with these normal-looking but flaky Work Units? I'm guessing "nothing at all."

This EVGA card has thrown trash in the past, but it did it quickly, obviously, and loudly. No signs of heat or despair... no way to recognize it as having gone on the fritz.

I wish I loved a mystery.
ID: 1223912
zoom3+1=4
Volunteer tester
Joined: 30 Nov 03
Posts: 63168
Credit: 55,293,173
RAC: 111
United States
Message 1223808 - Posted: 27 Apr 2012, 2:09:08 UTC

Yeah, I get -12 errors, sometimes 1 or 2 or as many as 6, and the thing is I never know when another will show up, so I ignore them; they're not worth my time. And yes, I do use an optimized app.
My Amazon Wishlist
The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's
ID: 1223808
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14114
Credit: 200,643,578
RAC: 1,983
United Kingdom
Message 1223752 - Posted: 26 Apr 2012, 23:17:16 UTC - in response to Message 1223716.  

Can't help feeling that we need some code server side to ensure tasks that are in limbo due to CUDA issues are sent to CPUs for resolution....

Or ATI hosts.

I was just thinking, does this kind of multiple error also affect tasks processed by ATI cards?
Or is it that Lunatics has sorted the problem?

It doesn't affect ATI cards, never did.
That's a CUDA issue only.

And yes, the Lunatics CUDA app substantially reduces (but does not completely eliminate) the risk of errors on this class of task, compared to the stock apps.
ID: 1223752
Wiggo "Democratic Socialist"
Joined: 24 Jan 00
Posts: 18404
Credit: 261,360,520
RAC: 1,109
Australia
Message 1223728 - Posted: 26 Apr 2012, 22:17:48 UTC - in response to Message 1223716.  

I've also seen this sort of thing happen with CPU w/u's several times as well, where a completed optimised-run w/u has been tipped out that way when all the other w/u's were errored out by stock CPU apps.

Cheers.
ID: 1223728
Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 32172
Credit: 79,922,639
RAC: 181
Germany
Message 1223716 - Posted: 26 Apr 2012, 22:10:19 UTC - in response to Message 1223709.  

Can't help feeling that we need some code server side to ensure tasks that are in limbo due to CUDA issues are sent to CPUs for resolution....


Or ATI hosts.


I was just thinking, does this kind of multiple error also affect tasks processed by ATI cards?
Or is it that Lunatics has sorted the problem?


It doesn't affect ATI cards, never did.
That's a CUDA issue only.

With each crime and every kindness we birth our future.
ID: 1223716
.clair.
Joined: 4 Nov 04
Posts: 1300
Credit: 55,390,408
RAC: 156
United Kingdom
Message 1223709 - Posted: 26 Apr 2012, 22:01:53 UTC - in response to Message 1223703.  
Last modified: 26 Apr 2012, 22:03:00 UTC

Can't help feeling that we need some code server side to ensure tasks that are in limbo due to CUDA issues are sent to CPUs for resolution....


Or ATI hosts.


I was just thinking, does this kind of multiple error also affect tasks processed by ATI cards?
Or is it that Lunatics has sorted the problem?
ID: 1223709
Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 32172
Credit: 79,922,639
RAC: 181
Germany
Message 1223703 - Posted: 26 Apr 2012, 21:42:23 UTC - in response to Message 1223695.  

Can't help feeling that we need some code server side to ensure tasks that are in limbo due to CUDA issues are sent to CPUs for resolution....


Or ATI hosts.

With each crime and every kindness we birth our future.
ID: 1223703
Area 51
Joined: 31 Jan 04
Posts: 965
Credit: 42,193,520
RAC: 0
United Kingdom
Message 1223695 - Posted: 26 Apr 2012, 21:08:27 UTC

Can't help feeling that we need some code server side to ensure tasks that are in limbo due to CUDA issues are sent to CPUs for resolution....
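
For illustration only, that idea could be sketched roughly as below. Everything in it is hypothetical (the `Result` record, the `device` labels, the `next_target` helper); it is not BOINC's scheduler code or API, just the rule as stated: once every result returned for a workunit is an error from a CUDA host, the next copy goes to a CPU (or, as suggested in a reply, ATI) host.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Result:
    """One returned copy of a workunit (hypothetical record, not a BOINC type)."""
    host_id: int
    device: str      # assumed labels: "cuda", "ati" or "cpu"
    errored: bool


def next_target(results: List[Result]) -> str:
    """Decide where the next copy of a workunit should go.

    Returns "non_cuda" when every result so far is an error from a CUDA
    host, otherwise "any".
    """
    if results and all(r.errored and r.device == "cuda" for r in results):
        return "non_cuda"   # i.e. a CPU or ATI host
    return "any"


# Two CUDA hosts have both errored out, so the next copy should avoid CUDA.
history = [Result(1, "cuda", True), Result(2, "cuda", True)]
print(next_target(history))
```

A workunit with no results yet, or with at least one non-CUDA or successful result, would still be sent to any host under this sketch.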
ID: 1223695
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14114
Credit: 200,643,578
RAC: 1,983
United Kingdom
Message 1223693 - Posted: 26 Apr 2012, 21:04:04 UTC - in response to Message 1223669.  

Yes, it's becoming more and more normalish, as more and more people get hold of cuda-class GPUs.
ID: 1223693
Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 32172
Credit: 79,922,639
RAC: 181
Germany
Message 1223692 - Posted: 26 Apr 2012, 21:03:48 UTC

I wouldn't call it normal, but it happens when work is only sent to CUDA hosts.
But I would call it rare.

With each crime and every kindness we birth our future.
ID: 1223692
Dave
Joined: 29 Mar 02
Posts: 777
Credit: 25,001,396
RAC: 0
United Kingdom
Message 1223669 - Posted: 26 Apr 2012, 20:09:07 UTC

Found an interesting invalid workunit here - is this normalish?

http://setiathome.berkeley.edu/workunit.php?wuid=971749771
ID: 1223669
skildude
Joined: 4 Oct 00
Posts: 9541
Credit: 50,759,529
RAC: 137
Yemen
Message 1223207 - Posted: 25 Apr 2012, 17:11:52 UTC

http://setiathome.berkeley.edu/workunit.php?wuid=966603166

-12 errors on all of them sheesh.

It would be nice if this error were considered a technical glitch of CUDA processing and not an actual WU error.


In a rich man's house there is no place to spit but his face.
Diogenes Of Sinope
ID: 1223207


