OpenCL AstroPulse crash after processing completion - write here. |
Message boards : Number crunching : OpenCL AstroPulse crash after processing completion - write here.
| Author | Message |
|---|---|
|
If you see computation errors with the OpenCL AstroPulse (AP) application, and the task's stderr shows that computations had finished (that is, the number of found pulses and the counters are printed in stderr, and only after that a debug dump occurred), please report in this thread with the relevant peculiarities of your setup. | |
| ID: 1333906 · | |
|
If you experience this issue, please upgrade to these apps: ERROR: some exception inside XXXXXXX, doing hard termination... ____________ News about SETI opt app releases: https://twitter.com/Raistmer | |
| ID: 1333931 · | |
|
I don't think I've had any crashes with AP before - or if I have, then only very occasionally. What's in the new revisions? Extra debugging info? | |
| ID: 1333981 · | |
"I don't think I've had any crashes with AP before - or if I have, then only very occasionally. What's in the new revisions? Extra debugging info?" Only safer app termination. If you don't experience an app crash on exit, you don't need to upgrade. | |
| ID: 1333983 · | |
"If you experience this issue please upgrade to these apps:" Now that these tasks are being rerun and listed as a Success, they will be much more difficult to find. Checking each one is much more time-consuming than just looking for an Error listing. Do you know if this rerun 'feature' is just in 7.0.45 or are all the newer versions going to be this way? No more wasted time, the days of a wasted task appear to be over. I just installed the new app, should be finished in about 20 minutes. It would be nice if uploads were working... :-) | |
| ID: 1334046 · | |
|
I definitely had two of these, but I reset them and recrunched them with no problems on second run. Validated too. | |
| ID: 1334053 · | |
"If you experience this issue please upgrade to these apps:" I don't know what rerun you are speaking about. What feature? Where was it described? | |
| ID: 1334064 · | |
"I definitely had two of these, but I reset them and recrunched them with no problems on second run. Validated too." Your description of the error is very similar to what I see when all CPU cores are busy. Do you run with an idle core? Try to free more cores. If enough CPU is free I see no stuck tasks; if not, I get such stuck tasks too. But that is a completely different issue, better discussed in a separate thread or the common release thread, not here. | |
| ID: 1334067 · | |
"If you experience this issue please upgrade to these apps:" I don't see the feature listed either. It began when I installed BOINC 7.0.45 late on the 30th. I have probably had around 4 Restarts/Reruns since the 31st. Those are the ones I witnessed, there could be more. Like I said, they will be more difficult to find now, you will have to look at the details of each task. This was the last one, ap_03ja13ai_B2_P0_00302_20130130_18386.wu_1. Apparently, BOINC 7.0.45 does an Auto Restart and reruns the last minute of the failed task. If you weren't there to see it, you wouldn't know it happened. Here's another one I just found, ap_02ja13ae_B1_P1_00345_20130129_30906.wu_1. There should be a couple more... | |
| ID: 1334070 · | |
|
Thanks for the info. A very good feature indeed. Losing a complete task when it had actually finished already was not very good. | |
| ID: 1334074 · | |
I didn't use the CPU then; all 6 logical cores at 4.6 GHz (3930K with HT off) were free.
Sure, just a clarification of your last suggestion. | |
| ID: 1334209 · | |
|
I just had another Restart/Rerun, the first one since installing r1764. Everything went pretty smooth, no problems with the computer not responding during the restart. I found an easy way to locally search for the restarts. Open the stdoutdae.txt file and search for 'Libraries', that word is only used during startup. The Auto Restarts are not preceded by the line "Exit requested by user". | |
| ID: 1334466 · | |
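The search described in the post above can be scripted. The following is only a minimal sketch, not part of BOINC: it assumes, as the post states, that the word 'Libraries' appears in stdoutdae.txt only during client startup and that a normal shutdown logs "Exit requested by user". The function name `find_auto_restarts` is hypothetical, and the first startup in the file is never flagged because nothing precedes it.

```python
from typing import Iterable, List

STARTUP_MARKER = "Libraries"               # per the post: only logged at client startup
USER_EXIT_MARKER = "Exit requested by user"  # logged when the user shuts the client down


def find_auto_restarts(lines: Iterable[str]) -> List[int]:
    """Return 1-based line numbers of startups that no user exit preceded.

    A startup counts as an automatic restart when the session before it
    (everything since the previous startup line) never logged a user exit.
    """
    restarts: List[int] = []
    seen_startup = False     # True once the first startup line has been passed
    user_exit_seen = False   # True if the current session logged a user exit
    for number, line in enumerate(lines, start=1):
        if USER_EXIT_MARKER in line:
            user_exit_seen = True
        elif STARTUP_MARKER in line:
            if seen_startup and not user_exit_seen:
                restarts.append(number)
            seen_startup = True
            user_exit_seen = False   # a new session begins here
    return restarts


# Usage (the path is an assumption; use your own BOINC data directory):
#   with open("stdoutdae.txt", errors="replace") as log:
#       print(find_auto_restarts(log))
```

A restart caused by a reboot or power loss would be flagged as well, so each hit still needs a look at the surrounding log lines.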
|
Hm.... no exception interception occurred. | |
| ID: 1334506 · | |
|
Another 'Restart'. This time the program started having problems with a different active task afterwards. That's the first time a 'Restart' has caused any lingering effects. I had to nuke the nvidia task and have it resent, only way to be sure.
Initialization completed
04-Feb-2013 09:19:54 [SETI@home] Restarting task ap_02ja13ae_B0_P0_00146_20130129_23122.wu_1 using astropulse_v6 version 601 in slot 3
04-Feb-2013 09:19:54 [SETI@home] Restarting task ap_14dc12ac_B5_P0_00190_20130127_05925.wu_1 using astropulse_v6 version 601 in slot 1
04-Feb-2013 09:19:54 [SETI@home] Restarting task ap_16dc12aa_B1_P0_00065_20130127_17705.wu_0 using astropulse_v6 version 604 (opencl_nvidia_100) in slot 2
04-Feb-2013 09:19:54 [SETI@home] Restarting task ap_27dc12ad_B2_P1_00015_20130202_21288.wu_0 using astropulse_v6 version 604 (ati_opencl_100) in slot 0
04-Feb-2013 09:19:54 [SETI@home] Sending scheduler request: To fetch work.
04-Feb-2013 09:19:54 [SETI@home] Requesting new tasks for NVIDIA and ATI
04-Feb-2013 09:19:58 [SETI@home] Scheduler request completed: got 0 new tasks
04-Feb-2013 09:19:58 [SETI@home] Project has no tasks available
04-Feb-2013 09:20:30 [SETI@home] Task ap_16dc12aa_B1_P0_00065_20130127_17705.wu_0 exited with zero status but no 'finished' file
04-Feb-2013 09:20:30 [SETI@home] If this happens repeatedly you may need to reset the project.
04-Feb-2013 09:21:17 [SETI@home] Computation for task ap_27dc12ad_B2_P1_00015_20130202_21288.wu_0 finished
04-Feb-2013 09:21:17 [SETI@home] Starting task ap_27dc12ad_B2_P1_00030_20130202_21288.wu_1 using astropulse_v6 version 604 (ati_opencl_100) in slot 0
04-Feb-2013 09:21:20 [SETI@home] Started upload of ap_27dc12ad_B2_P1_00015_20130202_21288.wu_0_0
04-Feb-2013 09:21:25 [SETI@home] Finished upload of ap_27dc12ad_B2_P1_00015_20130202_21288.wu_0_0
04-Feb-2013 09:25:04 [SETI@home] Sending scheduler request: To fetch work.
04-Feb-2013 09:25:04 [SETI@home] Reporting 1 completed tasks
04-Feb-2013 09:25:04 [SETI@home] Requesting new tasks for ATI
04-Feb-2013 09:25:07 [SETI@home] Scheduler request completed: got 0 new tasks
04-Feb-2013 09:25:07 [SETI@home] Project has no tasks available
04-Feb-2013 09:31:12 [SETI@home] Sending scheduler request: To fetch work.
04-Feb-2013 09:31:12 [SETI@home] Not requesting tasks
04-Feb-2013 09:31:15 [SETI@home] Scheduler request completed
04-Feb-2013 09:31:18 [SETI@home] Restarting task ap_16dc12aa_B1_P0_00065_20130127_17705.wu_0 using astropulse_v6 version 604 (opencl_nvidia_100) in slot 2
04-Feb-2013 09:31:53 [SETI@home] Task ap_16dc12aa_B1_P0_00065_20130127_17705.wu_0 exited with zero status but no 'finished' file
04-Feb-2013 09:31:53 [SETI@home] If this happens repeatedly you may need to reset the project.
04-Feb-2013 09:36:20 [SETI@home] Sending scheduler request: To fetch work.
04-Feb-2013 09:36:20 [SETI@home] Not requesting tasks
04-Feb-2013 09:36:23 [SETI@home] Scheduler request completed
04-Feb-2013 09:41:29 [SETI@home] Sending scheduler request: To fetch work.
04-Feb-2013 09:41:29 [SETI@home] Requesting new tasks for ATI
04-Feb-2013 09:41:32 [SETI@home] Scheduler request completed: got 0 new tasks
04-Feb-2013 09:41:32 [SETI@home] Project has no tasks available
04-Feb-2013 09:41:54 [SETI@home] Restarting task ap_16dc12aa_B1_P0_00065_20130127_17705.wu_0 using astropulse_v6 version 604 (opencl_nvidia_100) in slot 2
04-Feb-2013 09:42:29 [SETI@home] Task ap_16dc12aa_B1_P0_00065_20130127_17705.wu_0 exited with zero status but no 'finished' file...
Latest Debug, ap_27dc12ad_B2_P1_00015_20130202_21288.wu_0 | |
| ID: 1334627 · | |
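To spot a task that keeps exiting without its 'finished' file, as in the log above, the event log can be tallied per task. This is a rough sketch that assumes the message wording is exactly as it appears in the quoted log; `count_unfinished_exits` is a hypothetical helper name, not a BOINC function.

```python
import re
from collections import Counter
from typing import Iterable

# Message text copied from the log quoted in this thread; other BOINC
# versions may word it differently (assumption).
NO_FINISHED = re.compile(
    r"Task (\S+) exited with zero status but no 'finished' file"
)


def count_unfinished_exits(log_lines: Iterable[str]) -> Counter:
    """Count, per task name, how often it exited without a 'finished' file."""
    counts: Counter = Counter()
    for line in log_lines:
        match = NO_FINISHED.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Usage (the path is an assumption):
#   with open("stdoutdae.txt", errors="replace") as log:
#       for task, n in count_unfinished_exits(log).most_common():
#           print(n, task)
```

A task that shows up repeatedly is a candidate for aborting and having it resent, as was done with the nvidia task here.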
09:21:14 (2264): called boinc_finish
This line is absent in the crashed version. It looks like a crash in the boinc_finish() call, but for some reason it does not get intercepted by the try/catch block... | |
| ID: 1334660 · | |
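The check described above (is the "called boinc_finish" line present in stderr or not) can be run over several saved task outputs at once. A minimal sketch, assuming each task's stderr has been saved as text; the helper name `tasks_missing_boinc_finish` and the dictionary layout are illustrative only.

```python
from typing import Dict, List

FINISH_MARKER = "called boinc_finish"  # line seen in stderr of the successful run


def tasks_missing_boinc_finish(stderr_by_task: Dict[str, str]) -> List[str]:
    """Return the names of tasks whose stderr lacks the boinc_finish line."""
    return sorted(
        task
        for task, stderr_text in stderr_by_task.items()
        if FINISH_MARKER not in stderr_text
    )
```

A missing line only shows that the log never recorded the call; it does not by itself prove where the crash happened.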
|
The 7.0.45 change log notes a few changes to OpenCL. I didn't have 'Restarts' in 7.0.44, just Computation errors and Invalid results. I kinda like the Valid results, as long as it doesn't cause other problems... | |
| ID: 1334665 · | |
|
BTW, what do you think this person has found to cause the results listed? Those results are being validated, quite remarkable. | |
| ID: 1334704 · | |
"BTW, what do you think this person has found to cause the results listed? Those results are being validated, quite remarkable." Oh well, it appears all he found was some HTML error that is listing the CPU Time as Run Time. Unimpressive, to say the least. I did log one more restart yesterday: ap_05dc12aa_B5_P1_00070_20130203_17591.wu_0. Another Success! No Invalid results since updating to 7.0.45, impressive... Something else impressive is how well Ubuntu 64-bit crunches CPU AstroPulses. My pieced-together Linux system is crunching better than a faster Xeon in both 64-bit OSX and 32-bit XP. The 2.8GHz Xeon takes just under 9 hours in OSX and over 10 hours in 32-bit XP. The 2.4GHz Xeon is doing it in around the mid-eights in Ubuntu. We need a better CPU AstroPulse app for 32-bit Windows. | |
| ID: 1335272 · | |
|
I have been getting a heap of these lately, is this what we are talking about? | |
| ID: 1335289 · | |
|
No, that's not the Error he is looking for...
<app_name>setiathome_enhanced</app_name>
<version_num>610</version_num>
<platform>windows_intelx86</platform>
<avg_ncpus>0.05</avg_ncpus>
<max_ncpus>0.10</max_ncpus>
<plan_class>ati13ati</plan_class>
<flops>170000000000</flops>
<cmdline>-period_iterations_num 20 -instances_per_device 1</cmdline>
<coproc>
<type>ATI</type>
<count>1</count>
</coproc>
It might work, in your case. | |
| ID: 1335317 · | |
| Copyright © 2013 University of California |