Inconclusive Work Units Running AP Ver 6

Profile perryjay
Volunteer tester
Joined: 20 Aug 02
Posts: 3377
Credit: 20,676,751
RAC: 0
United States
Message 1224535 - Posted: 28 Apr 2012, 19:18:25 UTC

My inconclusive 30/30 came in and validated http://setiathome.berkeley.edu/workunit.php?wuid=976828693


PROUD MEMBER OF Team Starfire World BOINC
ID: 1224535 · Report as offensive
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 1224913 - Posted: 29 Apr 2012, 15:02:35 UTC - in response to Message 1221680.  

Got another one. Just reported it and was doing my spreadsheet and noticed CBNC on the task page for it whilst plugging in numbers.

http://setiathome.berkeley.edu/result.php?resultid=2392793864

This one of mine is still being decided. It has been sent out to _5 now. Three computing errors so far.
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)
ID: 1224913 · Report as offensive
Profile Fred J. Verster
Volunteer tester
Joined: 21 Apr 04
Posts: 3252
Credit: 31,903,643
RAC: 0
Netherlands
Message 1224927 - Posted: 29 Apr 2012, 16:07:43 UTC - in response to Message 1224913.  
Last modified: 29 Apr 2012, 16:51:04 UTC

Got another one. Just reported it and was doing my spreadsheet and noticed CBNC on the task page for it whilst plugging in numbers.

http://setiathome.berkeley.edu/result.php?resultid=2392793864

This one of mine is still being decided. It has been sent out to _5 now. Three computing errors so far.


Just finished 5 AstroPulse v6 WUs on ATI 5870 GPUs; 2 more are still being computed. I am also "playing with UNROLL and DATA_CHUNK" settings. They aren't uploaded yet.

AstroPulse WU's.
This host: i7-2600 + 2 EAH5870 GPUs.

I just upped ffa_block and _fetch: block to 10240 and fetch to 5120. Unroll is 16.
ID: 1224927 · Report as offensive
Profile petri33
Volunteer tester

Joined: 6 Jun 02
Posts: 1668
Credit: 623,086,772
RAC: 156
Finland
Message 1225884 - Posted: 1 May 2012, 22:35:50 UTC

Hi,

I seem to have got an inconclusive result (not astropulse) using opti apps.

http://setiathome.berkeley.edu/workunit.php?wuid=981712304

Do not know if you need these.
To overcome Heisenbergs:
"You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones
ID: 1225884 · Report as offensive
Profile Gatekeeper
Joined: 14 Jul 04
Posts: 887
Credit: 176,479,616
RAC: 0
United States
Message 1226966 - Posted: 4 May 2012, 1:57:57 UTC
Last modified: 4 May 2012, 1:58:27 UTC

This isn't inconclusive, but these five tasks (all CPU AP's) all failed with the same -226 error code (Too many exits). These are on a VERY reliable machine, and other tasks running simultaneously were running normally, as have subsequent AP's on the same machine. Thought I'd mention it for, if nothing else, curiosity as to what the heck happened.

http://setiathome.berkeley.edu/results.php?hostid=5457097&offset=0&show_names=0&state=5&appid=12
ID: 1226966 · Report as offensive
Josef W. Segur
Volunteer developer
Volunteer tester

Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1226997 - Posted: 4 May 2012, 3:44:10 UTC - in response to Message 1226966.  

This isn't inconclusive, but these five tasks (all CPU AP's) all failed with the same -226 error code (Too many exits). These are on a VERY reliable machine, and other tasks running simultaneously were running normally, as have subsequent AP's on the same machine. Thought I'd mention it for, if nothing else, curiosity as to what the heck happened.

http://setiathome.berkeley.edu/results.php?hostid=5457097&offset=0&show_names=0&state=5&appid=12

Each one has 227 "No heartbeat from core client for 30 sec - exiting" lines; the 6.10.58 core client logic kills the task if there are over 100 between checkpoints. Unless something went wrong with the computer's time, 100 times 30 seconds means there was no checkpoint written in 50 minutes. And 227 times 30 seconds says the problem went on for nearly 2 hours. Each of the tasks progressed over 9 percent from the first occurrence to the last, and should have checkpointed at whatever interval you have specified. The r557 CPU build calls the checkpoint logic very frequently, though files are only updated as needed.
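
As a quick check of that arithmetic, here is a minimal Python sketch. The constant and function names are hypothetical, the numbers are the ones quoted in this post, and it does not reproduce the BOINC client's actual logic; it only multiplies the timeout by the number of exits.

```python
# Values taken from the post above; all names are illustrative only.
HEARTBEAT_TIMEOUT_S = 30              # "No heartbeat from core client for 30 sec"
MAX_EXITS_BETWEEN_CHECKPOINTS = 100   # limit described for the 6.10.58 core client
OBSERVED_EXITS = 227                  # "No heartbeat" lines counted per failed task


def minutes_without_heartbeat(exit_count, timeout_s=HEARTBEAT_TIMEOUT_S):
    """Minimum elapsed time, in minutes, implied by exit_count heartbeat timeouts."""
    return exit_count * timeout_s / 60


limit_minutes = minutes_without_heartbeat(MAX_EXITS_BETWEEN_CHECKPOINTS)
observed_minutes = minutes_without_heartbeat(OBSERVED_EXITS)

print(f"limit implies {limit_minutes:.0f} min without a checkpoint")
print(f"observed exits span about {observed_minutes:.1f} min")
```

The second figure comes out at just under two hours, which matches the estimate above.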

Looking in your pending list for other tasks reported 1 May 2012 21:01:33 UTC I found many x41g MB tasks which also had no heartbeat exits, though of course they make so much progress in 30 seconds that the 100 limit couldn't be reached before the task finishes.

None of that defines what caused the problem, of course. The BOINC time-tagged messages would give a better picture of the time period it lasted, and perhaps Windows' event log might indicate something too.
                                                                  Joe


ID: 1226997 · Report as offensive
Stick Project Donor
Volunteer tester

Joined: 26 Feb 00
Posts: 100
Credit: 5,283,449
RAC: 5
United States
Message 1227340 - Posted: 4 May 2012, 19:54:27 UTC

Here's another one: wuid=983484154. Mine is the Anonymous platform (CPU) task.

Also had a "-226 (0xffffffffffffff1e) ERR_TOO_MANY_EXITS" task. I only mention it because of an earlier post here and because it happened on the same host which was having the problems I described in the Catalyst 12.4 and Lunatics 6.10 (ati13ati) thread - but after I reverted back to Catalyst 12.1.
ID: 1227340 · Report as offensive
Stick Project Donor
Volunteer tester

Joined: 26 Feb 00
Posts: 100
Credit: 5,283,449
RAC: 5
United States
Message 1228416 - Posted: 6 May 2012, 20:00:46 UTC - in response to Message 1227340.  

Here's another one: wuid=983484154. Mine is the Anonymous platform (CPU) task.

UPDATE: The tie-breaker has come in and all three validated.
ID: 1228416 · Report as offensive
Profile Fred J. Verster
Volunteer tester
Joined: 21 Apr 04
Posts: 3252
Credit: 31,903,643
RAC: 0
Netherlands
Message 1228430 - Posted: 6 May 2012, 20:35:01 UTC - in response to Message 1228416.  
Last modified: 6 May 2012, 21:19:07 UTC

Here's another one: wuid=983484154. Mine is the Anonymous platform (CPU) task.

UPDATE: The tie-breaker has come in and all three validated.


I tried a lot of different values for UNROLL (Data Chunk) and FFA_Data_Chunk & _Fetch for the AstroPulse rev.555 optimized build.
I also used values in a 3:1 ratio, and multiples of 3 and 5 rather than of 2.
System: CPU = i7-2600 + 2 EAH5870 GPUs; BOINC 7.0.25 and Win 7, both x64; 8 GByte DDR3 1600MHz; dedicated BOINC data drive; Lunatics v0.40 installer; AMD/ATI Cat. 12.4 driver. It was the earlier 11.4/11.5/11.6 drivers (OpenCL 1.1) that caused crashes, so I tried Cat. 12.4 {and AMD-APP (SDK) 1.2(?)} with OpenCL 1.2, which works OK.

<core_client_version>7.0.25</core_client_version>
<![CDATA[
<stderr_txt>
Number of app instances per device setted to:1
DATA_CHUNK_UNROLL setted to:15
FFA thread block override value:15360
FFA thread fetchblock override value:5120
Maximum single buffer size setted to:256MB
Running on device number: 0
DATA_CHUNK_UNROLL at default:15
Priority of worker thread raised successfully
Priority of process adjusted successfully, below normal priority class used
OpenCL platform detected: Advanced Micro Devices, Inc.
BOINC assigns 0 device, slots 0 to 0 (including) will be checked
Used slot is 0;	Used GPU device parameters are:
	Number of compute units: 20
	Single buffer allocation size: 256MB
	max WG size: 256 


These figures depend on the (ATI) GPU used, i.e. work group size, memory per compute unit, single precision (SP) or double precision (DP), number of compute units, etc.

I still have more to learn about AMD/ATI GPUs and OpenCL ;-) I found these figures by trial and error.
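
For anyone comparing runs, the override values in the stderr output above can be pulled out with a few lines of Python. This is only a sketch: the short keys (`unroll`, `ffa_block`, `ffa_fetch`) and the function name are made up for illustration, and the patterns assume the log lines are worded exactly as in the excerpt quoted in this post.

```python
import re

# Excerpt copied from the stderr text quoted in this post.
STDERR_EXCERPT = """\
Number of app instances per device setted to:1
DATA_CHUNK_UNROLL setted to:15
FFA thread block override value:15360
FFA thread fetchblock override value:5120
Maximum single buffer size setted to:256MB
"""

# Hypothetical short names mapped to the log wording shown above.
PATTERNS = {
    "unroll": r"DATA_CHUNK_UNROLL setted to:(\d+)",
    "ffa_block": r"FFA thread block override value:(\d+)",
    "ffa_fetch": r"FFA thread fetchblock override value:(\d+)",
}


def parse_tuning(stderr_text):
    """Return the integer tuning values found in stderr_text, keyed by short name."""
    settings = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, stderr_text)
        if match:
            settings[name] = int(match.group(1))
    return settings


settings = parse_tuning(STDERR_EXCERPT)
print(settings)
# Ratio of block size to fetch size for this run.
print(settings["ffa_block"] / settings["ffa_fetch"])
```

Collecting these values per task makes it easier to line them up against run times in a spreadsheet when tuning by trial and error.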
ID: 1228430 · Report as offensive
Profile Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 34256
Credit: 79,922,639
RAC: 80
Germany
Message 1228446 - Posted: 6 May 2012, 21:18:44 UTC

You are mixing things up here.

Number of CU´s yes.
Memory not really.
No DP needed atm.



With each crime and every kindness we birth our future.
ID: 1228446 · Report as offensive
Profile Fred J. Verster
Volunteer tester
Joined: 21 Apr 04
Posts: 3252
Credit: 31,903,643
RAC: 0
Netherlands
Message 1228448 - Posted: 6 May 2012, 21:25:53 UTC - in response to Message 1228446.  
Last modified: 6 May 2012, 21:42:27 UTC

You are mixing things up here.

Number of CU´s yes.
Memory not really.
No DP needed atm.


It's about rev.557 and CPU fall-back, while I'm using rev.555.
I know only Single Precision is needed and a 256MByte Work Group Size is preferred.
Memory (per Compute Unit) is a different case; I do not know its impact.
But you're right, this has nothing to do with the thread subject, my bad.
ID: 1228448 · Report as offensive
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 1228477 - Posted: 6 May 2012, 22:38:41 UTC - in response to Message 1224913.  

Got another one. Just reported it and was doing my spreadsheet and noticed CBNC on the task page for it whilst plugging in numbers.

http://setiathome.berkeley.edu/result.php?resultid=2392793864

This one of mine is still being decided. It has been sent out to _5 now. Three computing errors so far.

This one finally validated. Three-way granted credit.

ap_25ja12ad_B1_P1_00393_20120410_27293 worked out fine against stock this time.


Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)
ID: 1228477 · Report as offensive
OTS
Volunteer tester

Joined: 6 Jan 08
Posts: 369
Credit: 20,533,537
RAC: 0
United States
Message 1228494 - Posted: 6 May 2012, 23:19:12 UTC

I trust that when there are enough reports someone will so indicate or lock the thread. I haven't seen any indication of the former, so with that in mind, I have another inconclusive going with a Linux app. It is Task 2394877085 and I am running ap_6.01r546_sse3_linux32. It looks like both of the inconclusive results reported 30/30.
ID: 1228494 · Report as offensive
Profile James Sotherden
Joined: 16 May 99
Posts: 10436
Credit: 110,373,059
RAC: 54
United States
Message 1228634 - Posted: 7 May 2012, 11:36:30 UTC - in response to Message 1228494.  

I trust that when there are enough reports someone will so indicate or lock the thread. I haven't seen any indication of the former, so with that in mind, I have another inconclusive going with a Linux app. It is Task 2394877085 and I am running ap_6.01r546_sse3_linux32. It looks like both of the inconclusive results reported 30/30.


I'm sure Richard Haselgrove or one of the testers will notify a mod to lock the thread when they have enough data, or when they find the problem and/or a solution.

Old James
ID: 1228634 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1228644 - Posted: 7 May 2012, 12:15:04 UTC - in response to Message 1228634.  

I trust that when there are enough reports someone will so indicate or lock the thread. I haven't seen any indication of the former, so with that in mind, I have another inconclusive going with a Linux app. It is Task 2394877085 and I am running ap_6.01r546_sse3_linux32. It looks like both of the inconclusive results reported 30/30.

I'm sure Richard Haselgrove or one of the testers will notify a mod to lock the thread when they have enough data, or when they find the problem and/or a solution.

We have a possible handle on one possible cause of the problem, but weekends (and, in some parts of the world, holidays) get in the way of testing and deploying solutions. Development work - especially when a full test against the stock CPU application can take well over 24 hours - is a slow business.

I think we've probably got enough reports from this application now, thank you. Once a few more reports have come in on the bugfix version that's being tested now, it'll move to full online testing at Beta - whether under anonymous platform, or as a new 'stock' test, we'll have to wait for Eric to decide. He won't be in the lab yet.....
ID: 1228644 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1228786 - Posted: 7 May 2012, 20:33:44 UTC - in response to Message 1228644.  

He won't be in the lab yet.....

He is now. A new Beta test for AP on OpenCL cards (both ATI and NVidia) has just started at Beta.

Testers - especially those who encountered the inconclusive validations last time - please report for duty :-)
ID: 1228786 · Report as offensive
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 1230059 - Posted: 10 May 2012, 16:45:54 UTC
Last modified: 10 May 2012, 16:47:29 UTC

I don't know how it went this long without me noticing this one.. and I don't know/remember if I mentioned it, but I've got this one, too..

http://setiathome.berkeley.edu/workunit.php?wuid=967360052

Still waiting for _3 to return it, so you should have at least 24 hours to capture it. edit: unless the files get deleted immediately after it validates. Didn't think about that part.
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)
ID: 1230059 · Report as offensive
Profile X-Files 27
Joined: 17 May 99
Posts: 104
Credit: 111,191,433
RAC: 0
Canada
Message 1231707 - Posted: 13 May 2012, 20:24:10 UTC

Don't know what's wrong with this:
2428927751

This rig has been crunching for a while now without issues before this.
ID: 1231707 · Report as offensive
Claggy
Volunteer tester

Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1231713 - Posted: 13 May 2012, 20:31:56 UTC - in response to Message 1231707.  
Last modified: 13 May 2012, 20:39:11 UTC

Don't know what's wrong with this:
2428927751

This rig has been crunching for a while now without issues before this.

Try using the r1305 app (aka 6.03) and AstroPulse_Kernels_r1305.cl from SETI Beta, which have the fix.

http://boinc2.ssl.berkeley.edu/beta/download/astropulse_6.03_windows_intelx86__opencl_nvidia_100.exe

http://boinc2.ssl.berkeley.edu/beta/download/astropulse_6.03_windows_intelx86__opencl_nvidia_100.pdb

http://boinc2.ssl.berkeley.edu/beta/download/AstroPulse_Kernels_r1305.cl

Claggy
ID: 1231713 · Report as offensive
Profile Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 34256
Credit: 79,922,639
RAC: 80
Germany
Message 1231737 - Posted: 13 May 2012, 21:05:19 UTC
Last modified: 13 May 2012, 21:05:31 UTC

That's in case of the NVIDIA bug.
r1305 has a fix.

ATIs don't have this issue.


With each crime and every kindness we birth our future.
ID: 1231737 · Report as offensive


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.