Access Violation with AstroPulse v6.04

Message boards : Number crunching : Access Violation with AstroPulse v6.04

Brian Priebe

Joined: 26 Dec 11
Posts: 19
Credit: 43,663,786
RAC: 0
Canada
Message 1382973 - Posted: 20 Jun 2013, 9:23:51 UTC
Last modified: 20 Jun 2013, 9:34:15 UTC

Lately I have had about 100 errors in ATI OpenCL work units, all failing with exactly the same error (e.g. work unit 1265051358):

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x0040A1FA read attempt to address 0x009EC744

These units always fail after more than an hour crunching away on the GPU. Ideas?
ID: 1382973
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1382975 - Posted: 20 Jun 2013, 9:39:22 UTC - in response to Message 1382973.  

Lately I have had about 100 errors in ATI OpenCL work units, all failing with exactly the same error (e.g. work unit 1265051358):

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x0040A1FA read attempt to address 0x009EC744

These units always fail after more than an hour crunching away on the GPU. Ideas?

Try this;
-unroll 10 -ffa_block 6144 -ffa_block_fetch 1536
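
For anyone unsure how that line breaks down, here is a minimal Python sketch that splits it into name/value pairs. The function names are made up for illustration, and the check that -ffa_block divides evenly by -ffa_block_fetch is an assumption about how the two values relate, not something confirmed in this thread.

```python
import shlex


def parse_ap_params(line):
    """Split a line such as '-unroll 10 -ffa_block 6144' into a dict of ints."""
    tokens = shlex.split(line)
    if len(tokens) % 2 != 0:
        raise ValueError("expected '-name value' pairs")
    params = {}
    for name, value in zip(tokens[0::2], tokens[1::2]):
        if not name.startswith("-"):
            raise ValueError(f"unexpected token: {name}")
        params[name.lstrip("-")] = int(value)
    return params


def block_is_multiple_of_fetch(params):
    # Assumption (not from the thread): ffa_block should be a whole
    # multiple of ffa_block_fetch.
    return params["ffa_block"] % params["ffa_block_fetch"] == 0


params = parse_ap_params("-unroll 10 -ffa_block 6144 -ffa_block_fetch 1536")
print(params)
print(block_is_multiple_of_fetch(params))
```

Both parameter sets suggested in this thread pass that check, which is the only reason it is included here.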

There's another thread here, OpenCL AstroPulse crash after processing completion - write here
ID: 1382975
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1383265 - Posted: 21 Jun 2013, 3:55:47 UTC

It looks like things are improving, however, this Host has an out of date Display Driver as well;
Computer 6611359
Driver version: CAL 1.4.1607
Version: OpenCL 1.1 AMD-APP-SDK-v2.5 (793.1)

I'm not familiar with Windows Server 2003 "R2", but if you can, you need to update the driver to Catalyst™ Display Driver 11.12. That should help that host even more.
ID: 1383265
Alinator
Volunteer tester

Joined: 19 Apr 05
Posts: 4178
Credit: 4,647,982
RAC: 0
United States
Message 1383379 - Posted: 21 Jun 2013, 14:25:08 UTC - in response to Message 1383265.  
Last modified: 21 Jun 2013, 14:26:32 UTC

Probably want to go to a 12.x version. None of my AMD cards (4850,6950,7970) liked 11.12 particularly.

12.4 would be my choice since Server 2003 is on the XP codebase.
ID: 1383379
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1383443 - Posted: 21 Jun 2013, 18:26:56 UTC - in response to Message 1383379.  
Last modified: 21 Jun 2013, 18:30:38 UTC

Probably want to go to a 12.x version. None of my AMD cards (4850,6950,7970) liked 11.12 particularly.

12.4 would be my choice since Server 2003 is on the XP codebase.

Yes, Server 2003 is on the XP codebase and you can't update XP past 12.1 and still have working OpenCL in XP. Can you give an example of 12.4 in XP with working OpenCL? Also, AstroPulse 1812 and above require SDK 2.6. In XP, SDK 2.6 begins with Catalyst 11.12 and ends with 12.1. You really don't have much of a choice. I'm having the same Error as the OP, and I've had better success with 11.12 with my Barts 6850, but, you're welcome to try 12.1 if you wish. 12.2 and above will not work for OpenCL in XP.
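
To make the XP driver window explicit, here is a rough Python sketch of the range check described above. The function names are hypothetical, and the 11.12 to 12.1 bounds come only from this post, not from AMD documentation. Versions are compared as (major, minor) pairs because a plain decimal comparison would wrongly rank 11.12 below 11.9.

```python
def parse_catalyst(version):
    """Turn a Catalyst version string such as '11.12' into (11, 12)."""
    major, minor = version.split(".")
    return int(major), int(minor)


# Bounds as stated in this post for SDK 2.6 with working OpenCL on XP.
XP_OPENCL_MIN = (11, 12)
XP_OPENCL_MAX = (12, 1)


def xp_opencl_sdk26_ok(version):
    """True if the version falls inside the window quoted in the post."""
    return XP_OPENCL_MIN <= parse_catalyst(version) <= XP_OPENCL_MAX


for v in ("11.9", "11.12", "12.1", "12.2", "12.4"):
    print(v, xp_opencl_sdk26_ok(v))
```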
ID: 1383443
Alinator
Volunteer tester

Joined: 19 Apr 05
Posts: 4178
Credit: 4,647,982
RAC: 0
United States
Message 1383451 - Posted: 21 Jun 2013, 18:47:28 UTC - in response to Message 1383443.  
Last modified: 21 Jun 2013, 18:49:20 UTC

OOPS.... My bad.

You are correct, 12.1 is as high as you can go without installing the 2.6 SDK separately.

At 12.4 it's game over for XP.
ID: 1383451
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1383455 - Posted: 21 Jun 2013, 18:56:56 UTC - in response to Message 1383451.  
Last modified: 21 Jun 2013, 19:00:17 UTC

OOPS.... My bad.

You are correct, 12.1 is as high as you can go without installing the 2.6 SDK separately.

At 12.4 it's game over for XP.

I haven't had any version above 12.1 work with OpenCL even with manually installing the SDK. I have tried with SDK 2.6, 2.7 & 2.8. 12.1 is as high as it goes, manual SDK or not.
ID: 1383455
Alinator
Volunteer tester

Joined: 19 Apr 05
Posts: 4178
Credit: 4,647,982
RAC: 0
United States
Message 1383457 - Posted: 21 Jun 2013, 19:04:52 UTC - in response to Message 1383455.  

Hmmmm...

I'm pretty sure back in the day I had it work after installing the 2.6 SDK manually on 12.3, but went back to an 11.x version because the driver had other problems not related to crunching on the host I was working on.

But maybe that's just time and memory working to forget an unpleasant experience. ;-)
ID: 1383457
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1383459 - Posted: 21 Jun 2013, 19:14:23 UTC - in response to Message 1383457.  

Here's what the AstroPulse Developer had to say about it;
"I know you can't go later Cat with XP but maybe it's worth to try lower ones like 11.12 ?"

I tried 11.12, and it did seem to give a few less Errors than 12.1. It offers no other advantages that I'm aware of.
ID: 1383459
skildude

Joined: 4 Oct 00
Posts: 9541
Credit: 50,759,529
RAC: 60
Yemen
Message 1383467 - Posted: 21 Jun 2013, 19:59:55 UTC
Last modified: 21 Jun 2013, 20:01:55 UTC

-1073741819 (0xffffffffc0000005) Unknown error number


Saw this error on my own machine a while back. IIRC you'll need to free up CPU cores to feed the GPU for that one.
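
The two numbers in that message are the same code as the 0xc0000005 in the original post, just printed as a signed value. A small Python sketch of the conversion (the helper name is made up for illustration):

```python
def to_signed32(code):
    """Interpret the low 32 bits of an integer as a signed 32-bit value."""
    code &= 0xFFFFFFFF  # drop any sign-extension above bit 31
    if code & 0x80000000:
        return code - 0x100000000
    return code


print(to_signed32(0xC0000005))
print(to_signed32(0xFFFFFFFFC0000005))
print(hex(-1073741819 & 0xFFFFFFFF))
```

Both calls give -1073741819, and masking that value back to 32 bits gives 0xc0000005, the Access Violation code.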

Was the BOINC manager set to run as a service?


In a rich man's house there is no place to spit but his face.
Diogenes Of Sinope
ID: 1383467
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1383471 - Posted: 21 Jun 2013, 20:10:51 UTC - in response to Message 1383467.  
Last modified: 21 Jun 2013, 20:13:24 UTC

Freeing a CPU core has Absolutely NO effect for those having a large number of those Errors, including Myself. What does work is what I've been suggesting for the last couple of months. We now have two recent cases where Hosts having a large number of these Errors were 'fixed' by adding Parameters to the Text Commands. I suggest people pay attention to what actually works.

-unroll 10 -ffa_block 6144 -ffa_block_fetch 1536

All tasks for computer 6611359
ID: 1383471
kittyman
Volunteer tester

Joined: 9 Jul 00
Posts: 51542
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1383473 - Posted: 21 Jun 2013, 20:13:11 UTC - in response to Message 1383471.  

Freeing a CPU has Absolutely NO effect for those having a large number of those Errors, including Myself. What does work is what I've been suggesting for the last couple of months. We now have two recent cases where Hosts having a large number of these Errors were 'fixed' by adding Parameters to the Text Commands. I suggest people pay attention to what actually works.

-unroll 10 -ffa_block 6144 -ffa_block_fetch 1536

I can testify that it worked on 3 of my rigs that were throwing errors after completion of the WU. For reasons still unknown, my other 6 rigs did not require it. But for anybody scratching their head about what to try next....do this first!
"Time is simply the mechanism that keeps everything from happening all at once."

ID: 1383473
juan BFP
Volunteer tester

Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1383508 - Posted: 21 Jun 2013, 22:24:39 UTC

Does that work on NV?
ID: 1383508
Brian Priebe

Joined: 26 Dec 11
Posts: 19
Credit: 43,663,786
RAC: 0
Canada
Message 1383512 - Posted: 21 Jun 2013, 22:30:49 UTC

Thanks for the help. With that machine, my general approach is "if it ain't broke, don't fix it". If it didn't have 16 mostly-idle cores and a GPU, I would never have attempted to install BOINC on it. It has multiple issues: it's a DC (domain controller), it runs BOINC 6.x on MS/Server which isn't officially supported, AMD doesn't officially have a driver release for MS/Server, etc.

The parameter list TBar supplied appears to be working fine. I expect it will last long enough until the OS is upgraded in a few months or until all these projects start requiring BOINC 7.x which won't install on a DC at all.
ID: 1383512
arkayn
Volunteer tester

Joined: 14 May 99
Posts: 4438
Credit: 55,006,323
RAC: 0
United States
Message 1383517 - Posted: 21 Jun 2013, 22:48:59 UTC - in response to Message 1383508.  

Does that work on NV?


Yes, Mark is using it on all his Nvidia rigs.

ID: 1383517
kittyman
Volunteer tester

Joined: 9 Jul 00
Posts: 51542
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1383594 - Posted: 22 Jun 2013, 6:03:14 UTC - in response to Message 1383517.  
Last modified: 22 Jun 2013, 6:04:16 UTC

Does that work on NV?


Yes, Mark is using it on all his Nvidia rigs.

Actually, not ALL of them...
For reasons unknown, 5 rigs run OK with the Lunatics app right out of the box, with the default settings, adjusted to 2/per.
The other 4 were throwing errors at the completion of the WU, and on all of those, simply adding the parameter line straightened them right out, also running 2/per.
All of my rigs are purely nVidia....
"Time is simply the mechanism that keeps everything from happening all at once."

ID: 1383594
Brian Priebe

Joined: 26 Dec 11
Posts: 19
Credit: 43,663,786
RAC: 0
Canada
Message 1383697 - Posted: 22 Jun 2013, 17:29:20 UTC - in response to Message 1382975.  

Alas, even with the script and the video driver upgraded to 11.12 (OpenCL as well), it bombed again with an access violation in a different place: http://setiathome.berkeley.edu/result.php?resultid=3047244746

Several other GPU AstroPulse workunits did complete successfully though.
ID: 1383697
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1383729 - Posted: 22 Jun 2013, 20:40:19 UTC - in response to Message 1383697.  

Alas, even with the script and the video driver upgraded to 11.12 (OpenCL as well), it bombed again with an access violation in a different place: http://setiathome.berkeley.edu/result.php?resultid=3047244746

Several other GPU AstroPulse workunits did complete successfully though.

Yes, it's not a complete fix, just much better than before. When I first started using the parameters I was receiving about one error a day, which was much better than 4 errors per 1 success. The card can complete over 30 a day, so 1 out of 30+ isn't bad. Now I usually receive about 1 a week although I just received the 2nd in three days. You could bump the setting up to the next level, it might help, but probably won't. The next level is;
-unroll 10 -ffa_block 8192 -ffa_block_fetch 4096
If you try to go above those settings you will be approaching the maximum memory allotted by the driver and risk receiving an Out_Of_Resources error.
Good Luck.
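
To put those numbers side by side, here is a quick Python sketch of the failure rates. It assumes exactly 30 tasks a day, which is a lower bound taken from "over 30 a day", so the real rates with the parameters would be slightly lower.

```python
def failure_rate(errors, total):
    """Fraction of tasks that ended in an error."""
    return errors / total


TASKS_PER_DAY = 30  # assumption: the post says "over 30 a day"

before = failure_rate(4, 4 + 1)                # 4 errors per 1 success
with_params = failure_rate(1, TASKS_PER_DAY)   # about one error a day
later = failure_rate(1, 7 * TASKS_PER_DAY)     # about one error a week

for label, rate in (("before", before),
                    ("with parameters", with_params),
                    ("later", later)):
    print(f"{label}: {rate:.1%}")
```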
ID: 1383729
Brian Priebe

Joined: 26 Dec 11
Posts: 19
Credit: 43,663,786
RAC: 0
Canada
Message 1383735 - Posted: 22 Jun 2013, 21:08:18 UTC - in response to Message 1383729.  
Last modified: 22 Jun 2013, 21:11:47 UTC

4 out of 5 completed today was certainly an improvement on 0 out of about 100...
ID: 1383735
kittyman
Volunteer tester

Joined: 9 Jul 00
Posts: 51542
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1383737 - Posted: 22 Jun 2013, 21:11:58 UTC - in response to Message 1383735.  

4 out of 5 completed today was certainly an improvement on 0 out of about 90...

LOL.....indeed!
"Time is simply the mechanism that keeps everything from happening all at once."

ID: 1383737



 
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.