OpenCL AstroPulse crash after processing completion - write here.

Message boards : Number crunching : OpenCL AstroPulse crash after processing completion - write here.

Profile trader
Volunteer tester

Joined: 25 Jun 00
Posts: 126
Credit: 4,968,173
RAC: 0
United States
Message 1346119 - Posted: 13 Mar 2013, 11:49:04 UTC - in response to Message 1346106.  

OK, will try the Intel GPU now. Just have to figure out where to turn it on.
I RTFM and it was WYSIWYG, then I found out it was a PEBKAC error
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1346169 - Posted: 13 Mar 2013, 14:15:58 UTC - in response to Message 1346119.  

OK, will try the Intel GPU now. Just have to figure out where to turn it on.

Be prepared for the Intel GPU to be much slower than the GPU part of AMD's APUs. Independent tests of the graphics parts already showed this, though.
SETI apps news
We're not gonna fight them. We're gonna transcend them.
Profile trader
Volunteer tester

Joined: 25 Jun 00
Posts: 126
Credit: 4,968,173
RAC: 0
United States
Message 1346308 - Posted: 13 Mar 2013, 21:02:02 UTC - in response to Message 1346169.  

OK, I'm not finding where to turn on Intel GPU computing.
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1346322 - Posted: 13 Mar 2013, 21:50:15 UTC - in response to Message 1346308.  

OK, I'm not finding where to turn on Intel GPU computing.

See this thread: http://setiathome.berkeley.edu/forum_thread.php?id=70717
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1347796 - Posted: 17 Mar 2013, 16:47:18 UTC - in response to Message 1344222.  
Last modified: 17 Mar 2013, 16:51:06 UTC

New record today. It's been 10 days since installing driver 11.12 with AP r1316 on my 6850, and still no Exit Code -1073741819 (0xc0000005) error. If you add the time since the last error, it's been three weeks (Error AstroPulse v6 tasks for computer 6797524). It averages around 30-35 tasks a day. I'm still running one task at a time with the settings -unroll 12 -ffa_block 8192 -ffa_block_fetch 2048 -sbs 156 -hp and BOINC 7.0.52. Sometimes it hits 95% GPU load; sometimes it struggles along around 60% with the heavily blanked tasks. It averages around 85% on the rest.
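The task pages show this error both as a signed decimal exit code and in hex. They are the same 32-bit value, the Windows access-violation status, just read as signed or unsigned. A minimal Python sketch of the conversion, assuming plain two's-complement reinterpretation (the helper names are illustrative, not part of BOINC or the AP app):

```python
# Windows NTSTATUS code for an access violation (STATUS_ACCESS_VIOLATION).
ACCESS_VIOLATION = 0xC0000005


def to_signed32(code: int) -> int:
    """Reinterpret a value as a signed 32-bit integer (two's complement)."""
    code &= 0xFFFFFFFF
    # If the top bit is set, the signed value is negative.
    return code - 0x1_0000_0000 if code & 0x8000_0000 else code


def to_unsigned32(code: int) -> int:
    """Reinterpret a (possibly negative) exit code as an unsigned 32-bit value."""
    return code & 0xFFFFFFFF


print(to_signed32(ACCESS_VIOLATION))    # -1073741819
print(hex(to_unsigned32(-1073741819)))  # 0xc0000005
```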
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1348316 - Posted: 19 Mar 2013, 0:34:54 UTC - in response to Message 1347796.  

I knew it... The streak ends at 10 days. You might want to check this one out; I ran it twice. Most of them don't give the error twice in a row. This is the first one that has errored out twice in a row: Workunit 1189905156.
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1372249 - Posted: 26 May 2013, 20:20:17 UTC

Here's another task that caused my 6850 to give the (0xc0000005) error twice in a row. As said previously, getting this error twice in a row on the same task is very rare; whatever is causing it is pronounced with this particular task. After the second error, I assigned the task to the CPUs to see how it likes that...
Workunit 1250900999
Task	Computer	Sent	Time reported or deadline	Status	Run time (sec)	CPU time (sec)	Credit	Application
3009864672	6955881	22 May 2013, 21:22:13 UTC	22 May 2013, 22:46:22 UTC	Aborted by user	0.00	0.00	---	AstroPulse v6 v6.04 (ati_opencl_100)
3009864673	4895485	22 May 2013, 21:22:10 UTC	23 May 2013, 0:49:54 UTC	Completed, validation inconclusive	349.43	349.43	pending	AstroPulse v6 v6.04 (opencl_nvidia_100)
3010063313	6699905	23 May 2013, 0:56:27 UTC	26 May 2013, 17:15:40 UTC	Completed, validation inconclusive	28,974.69	28,971.89	pending	AstroPulse v6 Anonymous platform (CPU)
3015902578	6797524	26 May 2013, 17:15:48 UTC	20 Jun 2013, 17:15:48 UTC	In progress...
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1378876 - Posted: 9 Jun 2013, 9:56:12 UTC

I've already posted what works for me. Run one AP task at a time and set the parameters as high as you can to achieve 90-95% GPU usage. For my AMD 6850 the settings are:
-unroll 12 -ffa_block 10240 -ffa_block_fetch 5120 -sbs 256 -hp

I receive very few of those errors using those settings. If you receive the 'out of resources' error, lower the settings. It also helps to run a display driver close to the version the app was built for. Of the 4 cards I have, only the 6850 has a long history of those errors.
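The tuning rule above (raise the values until GPU load reaches 90-95%, lower them if the app reports 'out of resources') can be sketched as a small helper that parses the AP command line and steps the size parameters down. This is a hypothetical sketch: the helper names are made up, and halving `-ffa_block`, `-ffa_block_fetch` and `-sbs` is an assumed step size, not something the app or the post prescribes.

```python
def parse_cmdline(cmdline: str) -> dict:
    """Parse '-name value' pairs; a bare switch such as -hp maps to True."""
    tokens = cmdline.split()
    opts = {}
    i = 0
    while i < len(tokens):
        name = tokens[i].lstrip("-")
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        if nxt is not None and not nxt.startswith("-"):
            opts[name] = int(nxt)  # numeric value follows the switch
            i += 2
        else:
            opts[name] = True      # switch with no value
            i += 1
    return opts


def format_cmdline(opts: dict) -> str:
    """Turn the parsed options back into a command-line string."""
    return " ".join(
        f"-{name}" if value is True else f"-{name} {value}"
        for name, value in opts.items()
    )


# Hypothetical: which values to shrink, and by how much, is an assumption.
SIZE_PARAMS = ("ffa_block", "ffa_block_fetch", "sbs")


def lower_settings(opts: dict) -> dict:
    """Return a copy with each size parameter halved (never below 1)."""
    lowered = dict(opts)
    for name in SIZE_PARAMS:
        if name in lowered and lowered[name] is not True:
            lowered[name] = max(1, lowered[name] // 2)
    return lowered


current = parse_cmdline("-unroll 12 -ffa_block 10240 -ffa_block_fetch 5120 -sbs 256 -hp")
print(format_cmdline(lower_settings(current)))
# -unroll 12 -ffa_block 5120 -ffa_block_fetch 2560 -sbs 128 -hp
```

Smaller steps than halving may well work better; the only guidance in the thread is to lower the settings and re-check the GPU load.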

I decided to move this post back to a thread about AstroPulse Errors. This was my solution after all the time & energy I spent on this Error. Nothing else in this thread worked for this problem. The above quote is what worked, and continues to work for me.
kittyman
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1379028 - Posted: 9 Jun 2013, 17:44:15 UTC
Last modified: 9 Jun 2013, 17:48:30 UTC

OK, I have been having a discussion about error problems with AP GPU tasks in the Lunatics installer release thread. It was suggested that I bring the discussion here instead.
I am using the r1843 app on NV GPUs as distributed by Lunatics. I have made no changes to any parameter supplied with it. The app is running just fine on most of my rigs, but several of them are not doing so well with it.

I have updated NV drivers, tried freeing extra cores, setting the app priority to high with Fred's priority tool, setting count to 1 instead of .5, etc. You could peruse my posts in the installer release thread for more details about what I have tried.

I am still getting errors. Here are some recent examples.
Host 3480243
Host 2645052
Host 2353446

Is there a newer version I should try? Most of the discussion in this thread seems to be aimed at ATI cards. I want to be able to set my count back to .5 to properly load these NV GPUs.
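For reference, in an anonymous-platform app_info.xml the `<count>` value inside `<coproc>` is the fraction of one GPU each task claims, so .5 lets BOINC run two tasks per card and 1 runs one. A tiny Python sketch of that arithmetic, assuming BOINC simply packs as many whole tasks onto each GPU as fit (the function name is illustrative):

```python
import math
from fractions import Fraction


def tasks_per_gpu(count: str) -> int:
    """Whole tasks that fit on one GPU if each task claims `count` of it."""
    share = Fraction(count)  # exact arithmetic, avoids float rounding
    if not 0 < share <= 1:
        raise ValueError("count must be greater than 0 and at most 1")
    return math.floor(1 / share)


print(tasks_per_gpu("0.5"))  # 2
print(tasks_per_gpu("1"))    # 1
```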
"Freedom is just Chaos, with better lighting." Alan Dean Foster

Josef W. Segur
Volunteer developer
Volunteer tester

Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1379087 - Posted: 9 Jun 2013, 19:02:03 UTC - in response to Message 1379028.  

Mark, I suggest you try BOINC 7.0.64 on a host which is getting those errors using 6.10.58. Dr. Anderson tries to keep the API between applications and BOINC backward compatible, but nearly 3 years of accumulated changes may be contributing to the issue.
                                                                   Joe
kittyman
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1379088 - Posted: 9 Jun 2013, 19:06:01 UTC - in response to Message 1379087.  

Mark, I suggest you try BOINC 7.0.64 on a host which is getting those errors using 6.10.58. Dr. Anderson tries to keep the API between applications and BOINC backward compatible, but nearly 3 years of accumulated changes may be contributing to the issue.
                                                                   Joe

Ouch...
That would be a big nut for me to swallow, Joe. I run the same Boinc on all 9 rigs, know how it works and how it reacts. And it still confuses me why the app would run fine on 5 or 6 of my rigs and not the others.

I shall have to ponder that one.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

bill

Joined: 16 Jun 99
Posts: 861
Credit: 29,352,955
RAC: 0
United States
Message 1379100 - Posted: 9 Jun 2013, 19:27:20 UTC - in response to Message 1379088.  

One thing to consider is that none of your computers are exactly alike. There must be some item that's common to the working ones that isn't shared by the non-working ones.

What that might be is going to be devilishly hard to track down, IMO.

If I were doing the troubleshooting, I would only work on one machine at a time. Making changes to multiple machines at the same time is too confusing, not to mention it might take different, individual fixes per machine to get them all working satisfactorily.
kittyman
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1379103 - Posted: 9 Jun 2013, 19:44:29 UTC - in response to Message 1379100.  
Last modified: 9 Jun 2013, 19:50:45 UTC

I know.
I have been at this a while.
And I don't recall ever having this much grief with a Seti application, optimized or otherwise.
Never something that would work on one rig and not on another with very similar capabilities.

It's not like I am mixing different OSs, different versions of Boinc, different drivers, or even in some cases different GPUs. I have always tried to be as consistent on all rigs as possible. Otherwise the farm can get a bit hard to manage.

I'm gonna have to ponder what to do next.
I could regroup the rigs that don't like the AP app into one venue and opt out of AP work on them.
But to me, that is a bandaid, not a solution to the problem with the app.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

bill

Joined: 16 Jun 99
Posts: 861
Credit: 29,352,955
RAC: 0
United States
Message 1379108 - Posted: 9 Jun 2013, 19:50:55 UTC - in response to Message 1379103.  

I know.
I have been at this a while.
And I don't recall ever having this much grief with a Seti application, optimized or otherwise.


First time for everything.

http://www.youtube.com/watch?v=Ylv1_FEXvLM
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1379265 - Posted: 10 Jun 2013, 8:29:38 UTC

The bug that causes a crash after the scientific computations have already completed is a transient one (which makes it especially hard to catch). Currently I think something is going wrong inside BOINC's own API, because after completion the app has very little left to do besides calling boinc_finish(). The crash dumps I have seen so far also support this view: the call stack points somewhere inside the boinc_finish() call.
Because of another BOINC API bugfix, there will be updated builds of the GPU apps (the BOINC API code is linked into the app itself, not just called from some BOINC DLL).
Maybe those updated builds will have this issue fixed too.

Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1379266 - Posted: 10 Jun 2013, 8:47:55 UTC - in response to Message 1379028.  
Last modified: 10 Jun 2013, 8:48:06 UTC


Is there a newer version I should try? Most of the discussion in this thread seems to be aimed at ATI cards. I want to be able to set my count back to .5 to properly load these NV GPUs.

Wait for the debug build.
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1379267 - Posted: 10 Jun 2013, 9:01:34 UTC - in response to Message 1378876.  


I decided to move this post back to a thread about AstroPulse Errors. This was my solution after all the time & energy I spent on this Error. Nothing else in this thread worked for this problem. The above quote is what worked, and continues to work for me.


It's good that some workaround could be found. Of course, it's not a "proper solution".
Let's try another round of debugging for this issue in the next few days.

Ibo

Joined: 22 Mar 00
Posts: 6
Credit: 6,075,931
RAC: 0
Austria
Message 1379270 - Posted: 10 Jun 2013, 9:35:33 UTC

Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1379280 - Posted: 10 Jun 2013, 10:23:49 UTC - in response to Message 1379270.  

I think I might have got 2 errors:

http://setiathome.berkeley.edu/result.php?resultid=3032148687
http://setiathome.berkeley.edu/result.php?resultid=3031511108

Anything else to do other than wait?


If the error happens too often, suspend and opt out of NV AP tasks.
It may be worth trying the latest BOINC build if you're not on it yet.

Ibo

Joined: 22 Mar 00
Posts: 6
Credit: 6,075,931
RAC: 0
Austria
Message 1379290 - Posted: 10 Jun 2013, 10:52:48 UTC - in response to Message 1379280.  

BOINC is the latest version. I even have 2 cores kept "free" of work for a single GPU.


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.