Inconclusive Work Units Running AP Ver 6
Author | Message |
---|---|
perryjay Joined: 20 Aug 02 Posts: 3377 Credit: 20,676,751 RAC: 0 |
My inconclusive 30/30 came in and validated http://setiathome.berkeley.edu/workunit.php?wuid=976828693 PROUD MEMBER OF Team Starfire World BOINC |
Cosmic_Ocean Joined: 23 Dec 00 Posts: 3027 Credit: 13,516,867 RAC: 13 |
Got another one. Just reported it and was doing my spreadsheet and noticed CBNC on the task page for it whilst plugging in numbers. This one of mine is still being decided. It has been sent out to _5 now. Three computing errors so far. Linux laptop: record uptime: 1511d 20h 19m (ended due to the power brick giving-up) |
Fred J. Verster Joined: 21 Apr 04 Posts: 3252 Credit: 31,903,643 RAC: 0 |
Got another one. Just reported it and was doing my spreadsheet and noticed CBNC on the task page for it whilst plugging in numbers. Just finished 5 AstroPulse v6 WUs on ATI 5870 GPUs; 2 are still being computed while I am "playing with" the UNROLL and DATA_CHUNK settings. They aren't uploaded yet. The AstroPulse WUs ran on this host: i7-2600 + 2 EAH5870 GPUs. I just raised ffa_block to 10240 and ffa_block_fetch to 5120; unroll is 16. |
petri33 Joined: 6 Jun 02 Posts: 1668 Credit: 623,086,772 RAC: 156 |
Hi, I seem to have got an inconclusive result (not astropulse) using opti apps. http://setiathome.berkeley.edu/workunit.php?wuid=981712304 Do not know if you need these. To overcome Heisenbergs: "You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones |
Gatekeeper Joined: 14 Jul 04 Posts: 887 Credit: 176,479,616 RAC: 0 |
This isn't inconclusive, but these five tasks (all CPU AP's) all failed with the same -226 error code (Too many exits). These are on a VERY reliable machine, and other tasks running simultaneously were running normally, as have subsequent AP's on the same machine. Thought I'd mention it for, if nothing else, curiosity as to what the heck happened. http://setiathome.berkeley.edu/results.php?hostid=5457097&offset=0&show_names=0&state=5&appid=12 |
Josef W. Segur Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0 |
This isn't inconclusive, but these five tasks (all CPU AP's) all failed with the same -226 error code (Too many exits). These are on a VERY reliable machine, and other tasks running simultaneously were running normally, as have subsequent AP's on the same machine. Thought I'd mention it for, if nothing else, curiosity as to what the heck happened. Each one has 227 "No heartbeat from core client for 30 sec - exiting" lines; the 6.10.58 core client logic kills the task if there are over 100 between checkpoints. Unless something went wrong with the computer's time, 100 times 30 seconds means there was no checkpoint written in 50 minutes. And 227 times 30 seconds (about 113.5 minutes) says the problem went on for nearly 2 hours. Each of the tasks progressed over 9 percent from the first occurrence to the last, and should have checkpointed at whatever interval you have specified. The r557 CPU build calls the checkpoint logic very frequently, though files are only updated as needed. Looking in your pending list for other tasks reported 1 May 2012 21:01:33 UTC, I found many x41g MB tasks which also had no-heartbeat exits, though of course they make so much progress in 30 seconds that the limit of 100 couldn't be reached before the task finishes. None of that defines what caused the problem, of course. The BOINC time-tagged messages would give a better picture of how long it lasted, and perhaps Windows' event log might indicate something too. Joe |
Stick Joined: 26 Feb 00 Posts: 100 Credit: 5,283,449 RAC: 5 |
Here's another one: wuid=983484154. Mine is the Anonymous platform (CPU) task. Also had a "-226 (0xffffffffffffff1e) ERR_TOO_MANY_EXITS" task. I only mention it because of an earlier post here and because it happened on the same host which was having the problems I described in the Catalyst 12.4 and Lunatics 6.10 (ati13ati) thread - but after I reverted back to Catalyst 12.1. |
Stick Joined: 26 Feb 00 Posts: 100 Credit: 5,283,449 RAC: 5 |
Here's another one: wuid=983484154. Mine is the Anonymous platform (CPU) task. UPDATE: The tie-breaker has come in and all three validated. |
Fred J. Verster Joined: 21 Apr 04 Posts: 3252 Credit: 31,903,643 RAC: 0 |
Here's another one: wuid=983484154. Mine is the Anonymous platform (CPU) task. I tried a lot of different values for DATA_CHUNK_UNROLL and the FFA thread block and fetchblock overrides with the AstroPulse rev.555 optimized build. I also used a 3:1 ratio and values that are multiples of 3 and 5 rather than of 2. Host: i7-2600 + 2 EAH5870 GPUs; BOINC 7.0.25 and Windows 7, both x64; 8 GByte DDR3 1600MHz; dedicated BOINC data drive; Lunatics v0.40 installer; AMD/ATI Catalyst 12.4 driver. The earlier 11.4/11.5/11.6 drivers (OpenCL 1.1) were causing crashes, so I tried Cat. 12.4 (and AMD APP SDK 1.2(?)) with OpenCL 1.2, which works OK. <core_client_version>7.0.25</core_client_version> <![CDATA[ <stderr_txt> Number of app instances per device setted to:1 DATA_CHUNK_UNROLL setted to:15 FFA thread block override value:15360 FFA thread fetchblock override value:5120 Maximum single buffer size setted to:256MB Running on device number: 0 DATA_CHUNK_UNROLL at default:15 Priority of worker thread raised successfully Priority of process adjusted successfully, below normal priority class used OpenCL platform detected: Advanced Micro Devices, Inc. BOINC assigns 0 device, slots 0 to 0 (including) will be checked Used slot is 0; Used GPU device parameters are: Number of compute units: 20 Single buffer allocation size: 256MB max WG size: 256 These figures depend on the (ATI) GPU used, i.e. work group size, memory per Compute Unit, single precision (SP) or double precision (DP), number of Compute Units, etc. I still have more to learn about AMD/ATI GPUs and OpenCL ;-) I found these figures by trial and error. |
Mike Joined: 17 Feb 01 Posts: 34258 Credit: 79,922,639 RAC: 80 |
You are mixing things up here. Number of CUs, yes. Memory, not really. No DP needed at the moment. With each crime and every kindness we birth our future. |
Fred J. Verster Joined: 21 Apr 04 Posts: 3252 Credit: 31,903,643 RAC: 0 |
You are mixing things up here. It's about rev.557 and CPU fall-back, while I'm using rev.555. I know only Single Precision is needed and a 256MByte Work Group Size is preferred. Memory per Compute Unit is a different case, and I do not know its impact. But you're right, this has nothing to do with the thread subject, my bad. |
Cosmic_Ocean Joined: 23 Dec 00 Posts: 3027 Credit: 13,516,867 RAC: 13 |
Got another one. Just reported it and was doing my spreadsheet and noticed CBNC on the task page for it whilst plugging in numbers. This one finally validated. Three-way granted credit. ap_25ja12ad_B1_P1_00393_20120410_27293 worked out fine against stock this time. Linux laptop: record uptime: 1511d 20h 19m (ended due to the power brick giving-up) |
OTS Joined: 6 Jan 08 Posts: 369 Credit: 20,533,537 RAC: 0 |
I trust that when there are enough reports that someone will so indicate or lock the thread. I haven't seen any indication of the former so with that in mind, I have another inconclusive going with a Linux app. It is Task 2394877085 and I am running ap_6.01r546_sse3_linux32. It looks like both of the inconclusive results reported 30/30. |
James Sotherden Joined: 16 May 99 Posts: 10436 Credit: 110,373,059 RAC: 54 |
I trust that when there are enough reports that someone will so indicate or lock the thread. I haven't seen any indication of the former so with that in mind, I have another inconclusive going with a Linux app. It is Task 2394877085 and I am running ap_6.01r546_sse3_linux32. It looks like both of the inconclusive results reported 30/30. I'm sure Richard Haselgrove or one of the testers will notify a mod to lock the thread when they have enough data, or find the problem and/or solution. Old James |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
I trust that when there are enough reports that someone will so indicate or lock the thread. I haven't seen any indication of the former so with that in mind, I have another inconclusive going with a Linux app. It is Task 2394877085 and I am running ap_6.01r546_sse3_linux32. It looks like both of the inconclusive results reported 30/30. We have a possible handle on one possible cause of the problem, but weekends (and, in some parts of the world, holidays) get in the way of testing and deploying solutions. Development work - especially when a full test against the stock CPU application can take well over 24 hours - is a slow business. I think we've probably got enough reports from this application now, thank you. Once a few more reports have come in on the bugfix version that's being tested now, it'll move to full online testing at Beta - whether under anonymous platform, or as a new 'stock' test, we'll have to wait for Eric to decide. He won't be in the lab yet..... |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
He won't be in the lab yet..... He is now. A new Beta test for AP on OpenCL cards (both ATI and NVidia) has just started at Beta. Testers - especially those who encountered the inconclusive validations last time - please report for duty :-) |
Cosmic_Ocean Joined: 23 Dec 00 Posts: 3027 Credit: 13,516,867 RAC: 13 |
I don't know how it went this long without me noticing this one.. and I don't know/remember if I mentioned it, but I've got this one, too.. http://setiathome.berkeley.edu/workunit.php?wuid=967360052 Still waiting for _3 to return it, so you should have at least 24 hours to capture it. edit: unless the files get deleted immediately after it validates. Didn't think about that part. Linux laptop: record uptime: 1511d 20h 19m (ended due to the power brick giving-up) |
X-Files 27 Joined: 17 May 99 Posts: 104 Credit: 111,191,433 RAC: 0 |
Don't know what's wrong with this: 2428927751 This rig has been crunching for a while now without issues before this. |
Claggy Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4 |
Don't know whats wrong with this: Try using the r1305 app (aka 6.03) and AstroPulse_Kernels_r1305.cl from Seti Beta, that has the fix. http://boinc2.ssl.berkeley.edu/beta/download/astropulse_6.03_windows_intelx86__opencl_nvidia_100.exe http://boinc2.ssl.berkeley.edu/beta/download/astropulse_6.03_windows_intelx86__opencl_nvidia_100.pdb http://boinc2.ssl.berkeley.edu/beta/download/AstroPulse_Kernels_r1305.cl Claggy |
Mike Joined: 17 Feb 01 Posts: 34258 Credit: 79,922,639 RAC: 80 |
That's in case of the NVIDIA bug. 1305 has a fix. ATIs don't have this issue. With each crime and every kindness we birth our future. |
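
For anyone wanting to check the heartbeat arithmetic in Josef W. Segur's reply about the -226 (Too many exits) failures, here is a minimal sketch. The 30-second timeout, the limit of 100 exits between checkpoints, and the count of 227 log lines are taken from that post; the function and constant names are made up for illustration, and this only restates the rule as described there. It is not the BOINC client's actual code.

```python
# Figures quoted in the thread (not read from BOINC itself).
HEARTBEAT_TIMEOUT_S = 30   # "No heartbeat from core client for 30 sec"
EXIT_LIMIT = 100           # exits allowed between checkpoints, per the post
OBSERVED_EXITS = 227       # no-heartbeat lines counted in each failed task


def minutes_without_heartbeat(exits, timeout_s=HEARTBEAT_TIMEOUT_S):
    """Minimum elapsed time, in minutes, implied by a number of no-heartbeat exits."""
    return exits * timeout_s / 60


def exceeds_exit_limit(exits_since_checkpoint, limit=EXIT_LIMIT):
    """Hypothetical restatement of the rule: the task is killed once exits go over the limit."""
    return exits_since_checkpoint > limit


if __name__ == "__main__":
    print(minutes_without_heartbeat(EXIT_LIMIT))      # 50.0 minutes with no checkpoint
    print(minutes_without_heartbeat(OBSERVED_EXITS))  # 113.5 minutes, i.e. nearly 2 hours
    print(exceeds_exit_limit(OBSERVED_EXITS))         # True
```

This assumes each log line corresponds to one full 30-second timeout and that the host clock was sane, which is the same caveat given in the post.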
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.