Number crunching: OpenCL AstroPulse crash after processing completion - write here.
Spectrum · Joined: 14 Jun 99 · Posts: 468 · Credit: 53,129,336 · RAC: 0
Hi TBar. Thanks for the advice. I have added the app_info entry and will see how it goes.
Raistmer · Joined: 16 Jun 01 · Posts: 6325 · Credit: 106,370,077 · RAC: 121
> I have been getting a heap of these lately, is this what we are talking about?

No, that's a different issue, one in BOINC itself. The reason is in bold.

SETI apps news
We're not gonna fight them. We're gonna transcend them.
Spectrum · Joined: 14 Jun 99 · Posts: 468 · Credit: 53,129,336 · RAC: 0
Thanks for the reply, Raistmer. Are there any known fixes for this? It's just wasting cycles. TBar's idea didn't work; I'm still getting errors.
Wedge009 · Joined: 3 Apr 99 · Posts: 451 · Credit: 431,396,357 · RAC: 553
I don't think you should be setting the flops count manually: looking at your host's applications, the server has already determined a stable run-time estimate for AstroPulse on your ATI GPU. It may just be that this particular work unit had a high blanking percentage, which means more of the work is done on the CPU rather than the GPU, prolonging the processing time. I have occasionally had an AstroPulse WU with 99+% blanking. Instead of finishing immediately like a 100%-blanked WU, it was processed entirely on the CPU and consequently hit the maximum-elapsed-time-exceeded error after about five hours (I believe the time limit is 10x the normal expected run time; since these WUs normally finish in under half an hour, 10 x 0.5 hours = ~5 hours).

Back to the topic: I had an AstroPulse WU crash with an odd error I haven't seen before. But it's possible things just went badly wrong for this one WU, and it isn't a recurring or reproducible error.

Soli Deo Gloria
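A minimal Python sketch of the time-limit arithmetic above. It assumes, without confirmation from this thread, that the limit equals a per-workunit operation bound divided by the app version's flops estimate, with the bound set at 10x the estimated operation count as Wedge009 recalls. The function names and the work-unit size are hypothetical. Under that assumption, the limit also scales inversely with the flops figure.

```python
# Sketch of the elapsed-time-limit arithmetic described above.
# ASSUMPTIONS (not confirmed in this thread): limit = fpops_bound / flops,
# and fpops_bound = 10 x the estimated operation count (Wedge009's recollection).
# Function names and the work-unit size are hypothetical.

BOUND_FACTOR = 10.0  # assumed ratio of the operation bound to the estimate


def expected_runtime_s(fpops_est: float, flops: float) -> float:
    """Expected run time in seconds: estimated operations / operations per second."""
    return fpops_est / flops


def elapsed_limit_s(fpops_est: float, flops: float, factor: float = BOUND_FACTOR) -> float:
    """Elapsed-time limit in seconds under the assumed factor x estimate bound."""
    return (factor * fpops_est) / flops


if __name__ == "__main__":
    fpops_est = 1.8e15           # hypothetical work-unit size (operations)
    flops = fpops_est / 1800.0   # speed that makes the task take 30 minutes
    print(expected_runtime_s(fpops_est, flops) / 3600)   # 0.5 (hours)
    print(elapsed_limit_s(fpops_est, flops) / 3600)      # 5.0 (hours)
    # Under the assumption, an overstated flops figure shrinks the limit in proportion:
    print(elapsed_limit_s(fpops_est, 2 * flops) / 3600)  # 2.5 (hours)
```

A task that spends most of its time on the CPU because of heavy blanking runs far slower than the GPU-based estimate, so it can reach such a limit even when nothing is wrong with the app.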
Mike · Joined: 17 Feb 01 · Posts: 34258 · Credit: 79,922,639 · RAC: 80
Have you freed up any CPU cores? That's very important for heavily blanked WUs.

With each crime and every kindness we birth our future.
TBar · Joined: 22 May 99 · Posts: 5204 · Credit: 840,779,836 · RAC: 2,768
> Thanks for the reply Raistmer, any known fixes for this as its just wasting cycles?

That's a strange error you're getting with the old Multibeam program. Among your pages of errors, some tasks actually worked. Worse yet, they worked fine in the past. I had a similar experience the last time I tried that program, although with a different error; it had also worked fine for me in the past. You could try updating your ATI driver and then BOINC; you never know. A newer Multibeam app for ATI was released a short while ago, so you might try that as well. I haven't tried the newer Multibeam version myself, as I only use the Multibeam app when I can't get any AstroPulses.

Look here for the new ATI Multibeam app; OpenCL apps are available for download on Lunatics.

Good luck.
TBar · Joined: 22 May 99 · Posts: 5204 · Credit: 840,779,836 · RAC: 2,768
Had another restart this morning, the first one in a couple of days. I had to look in the log to find it: ap_09dc12ab_B6_P1_00217_20130206_30237.wu_0

2/9/2013 8:46:47 AM | | Starting BOINC client version 7.0.45 for windows_intelx86
...
2/9/2013 8:46:49 AM | SETI@home | Restarting task ap_28dc12aa_B0_P1_00333_20130128_15160.wu_0 using astropulse_v6 version 601 in slot 2
2/9/2013 8:46:49 AM | SETI@home | Restarting task ap_17dc12aa_B6_P0_00117_20130128_18735.wu_2 using astropulse_v6 version 601 in slot 4
2/9/2013 8:46:49 AM | SETI@home | Restarting task ap_17dc12ab_B5_P1_00004_20130201_24812.wu_0 using astropulse_v6 version 601 in slot 1
2/9/2013 8:46:49 AM | SETI@home | Restarting task ap_09dc12ab_B6_P1_00217_20130206_30237.wu_0 using astropulse_v6 version 604 (ati_opencl_100) in slot 0
2/9/2013 8:46:49 AM | SETI@home | Restarting task 15dc12ac.22337.228355.8.10.51_1 using setiathome_enhanced version 609 (cuda23) in slot 3
2/9/2013 8:46:49 AM | SETI@home | Sending scheduler request: To fetch work.
2/9/2013 8:46:49 AM | SETI@home | Requesting new tasks for NVIDIA and ATI
2/9/2013 8:46:58 AM | SETI@home | Scheduler request completed: got 0 new tasks
2/9/2013 8:46:58 AM | SETI@home | No tasks sent
2/9/2013 8:46:58 AM | SETI@home | No tasks are available for SETI@home Enhanced
2/9/2013 8:46:58 AM | SETI@home | No tasks are available for AstroPulse v6
2/9/2013 8:46:58 AM | SETI@home | This computer has reached a limit on tasks in progress
2/9/2013 8:46:58 AM | SETI@home | Project has no tasks available
2/9/2013 8:48:19 AM | SETI@home | Computation for task ap_09dc12ab_B6_P1_00217_20130206_30237.wu_0 finished
2/9/2013 8:48:19 AM | SETI@home | Starting task ap_09dc12ac_B1_P1_00165_20130206_01989.wu_1 using astropulse_v6 version 604 (ati_opencl_100) in slot 0
2/9/2013 8:48:21 AM | SETI@home | Started upload of ap_09dc12ab_B6_P1_00217_20130206_30237.wu_0_0
2/9/2013 8:48:25 AM | SETI@home | Finished upload of ap_09dc12ab_B6_P1_00217_20130206_30237.wu_0_0
2/9/2013 8:52:04 AM | SETI@home | Sending scheduler request: To fetch work.
2/9/2013 8:52:04 AM | SETI@home | Reporting 1 completed tasks
2/9/2013 8:52:04 AM | SETI@home | Requesting new tasks for ATI
2/9/2013 8:52:10 AM | SETI@home | Scheduler request completed: got 0 new tasks
2/9/2013 8:52:10 AM | SETI@home | No tasks sent
2/9/2013 8:52:10 AM | SETI@home | No tasks are available for SETI@home Enhanced
2/9/2013 8:52:10 AM | SETI@home | No tasks are available for AstroPulse v6
...

Another success...
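Finding a restart meant reading through the event log by hand. The hypothetical Python helper below sketches one way to automate that. It assumes each entry sits on its own line in the `date time | project | message` layout seen in the log above. `find_restarts` and its regular expressions are illustrative, not part of BOINC.

```python
# Hypothetical helper (not part of BOINC): scan event-log text for client
# restarts and the tasks restarted after each one.
# ASSUMPTION: one entry per line, laid out as "date time | project | message".
import re

ENTRY = re.compile(r"^(?P<when>[^|]+?)\s*\|\s*(?P<project>[^|]*?)\s*\|\s*(?P<msg>.*)$")
RESTARTED = re.compile(r"^Restarting task (\S+) using (\S+) version (\d+)")


def find_restarts(log_text: str) -> list[dict]:
    """Return one record per 'Starting BOINC client' entry, with the tasks restarted after it."""
    restarts: list[dict] = []
    for line in log_text.splitlines():
        entry = ENTRY.match(line.strip())
        if entry is None:
            continue  # not a log entry (e.g. a "..." elision)
        msg = entry.group("msg")
        if msg.startswith("Starting BOINC client"):
            restarts.append({"when": entry.group("when"), "tasks": []})
        elif restarts:
            task = RESTARTED.match(msg)
            if task:
                # (task name, app name, app version number)
                restarts[-1]["tasks"].append((task.group(1), task.group(2), int(task.group(3))))
    return restarts


if __name__ == "__main__":
    sample = """\
2/9/2013 8:46:47 AM | | Starting BOINC client version 7.0.45 for windows_intelx86
...
2/9/2013 8:46:49 AM | SETI@home | Restarting task ap_09dc12ab_B6_P1_00217_20130206_30237.wu_0 using astropulse_v6 version 604 (ati_opencl_100) in slot 0
2/9/2013 8:46:49 AM | SETI@home | Restarting task 15dc12ac.22337.228355.8.10.51_1 using setiathome_enhanced version 609 (cuda23) in slot 3
2/9/2013 8:48:19 AM | SETI@home | Computation for task ap_09dc12ab_B6_P1_00217_20130206_30237.wu_0 finished
"""
    for r in find_restarts(sample):
        print(r["when"], len(r["tasks"]), "task(s) restarted")
        # prints: 2/9/2013 8:46:47 AM 2 task(s) restarted
```

Keeping the app name and version in each record makes it easy to pick out restarts that hit the OpenCL AstroPulse app (version 604 here) rather than the CPU or CUDA apps.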
Raistmer · Joined: 16 Jun 01 · Posts: 6325 · Credit: 106,370,077 · RAC: 121
I will provide a new build for this issue soon.
Raistmer · Joined: 16 Jun 01 · Posts: 6325 · Credit: 106,370,077 · RAC: 121
Here is an updated build that will hopefully catch the exception and shut down gracefully (this matters for BOINC clients that did not re-run the task):
https://dl.dropbox.com/u/60381958/AP6_win_x86_SSE2_OpenCL_ATI_r1766.7z
https://dl.dropbox.com/u/60381958/AP6_win_x86_SSE2_OpenCL_NV_r1766.7z

Please continue to post cases of re-runs/restarts related to this issue.
TBar · Joined: 22 May 99 · Posts: 5204 · Credit: 840,779,836 · RAC: 2,768
> Here is updated build that hopefully will catch exception and shutdown gracefully (it's important for BOINC clients that did not do re-run).

Thanks, I'll give it a go after the servers come back up. I get nervous making edits with so many unreported tasks. I already had to make one edit to switch the nVidia card to APs; I'll switch to this build when I remove the nVidia AP edit.
TBar · Joined: 22 May 99 · Posts: 5204 · Credit: 840,779,836 · RAC: 2,768
Back up and running the debug train to... wherever.

2/10/2013 3:38:36 PM | | Starting BOINC client version 7.0.42 for windows_intelx86
2/10/2013 3:38:36 PM | | OS: Microsoft Windows XP: Professional x86 Edition, Service Pack 3, (05.01.2600.00)
2/10/2013 3:38:36 PM | | CUDA: NVIDIA GPU 0: GeForce 8800 GT (driver version 306.81, CUDA version 5.0, compute capability 1.1, 512MB, 467MB available, 504 GFLOPS peak)
2/10/2013 3:38:36 PM | | CAL: ATI GPU 0: AMD Radeon HD 6800 series (Barts) (CAL version 1.4.1664, 1024MB, 1006MB available, 2976 GFLOPS peak)
2/10/2013 3:38:36 PM | | OpenCL: NVIDIA GPU 0: GeForce 8800 GT (driver version 306.81, device version OpenCL 1.0 CUDA, 512MB, 467MB available, 504 GFLOPS peak)
2/10/2013 3:38:36 PM | | OpenCL: ATI GPU 0: AMD Radeon HD 6800 series (Barts) (driver version CAL 1.4.1664, device version OpenCL 1.1 AMD-APP (851.4), 1024MB, 1006MB available, 2976 GFLOPS peak)
2/10/2013 3:38:36 PM | | Version change (7.0.45 -> 7.0.42)
2/10/2013 3:39:20 PM | SETI@home | Restarting task ap_25jl12ac_B6_P0_00142_20130119_09915.wu_3 using astropulse_v6 version 601 in slot 2
2/10/2013 3:39:20 PM | SETI@home | Restarting task ap_02ja13ac_B5_P0_00380_20130129_23414.wu_1 using astropulse_v6 version 601 in slot 4
2/10/2013 3:39:20 PM | SETI@home | Restarting task 17dc12ac.19577.17220.12.10.101_0 using setiathome_enhanced version 609 (cuda23) in slot 0
2/10/2013 3:41:53 PM | SETI@home | task ap_27dc12ac_B1_P0_00309_20130208_09561.wu_0 resumed by user
2/10/2013 3:41:54 PM | SETI@home | Starting task ap_27dc12ac_B1_P0_00309_20130208_09561.wu_0 using astropulse_v6 version 604 (ati_opencl_100) in slot 1
....

First candidate: ap_27dc12ac_B1_P0_00309_20130208_09561.wu_0
Of course it just has to be one that is heavily blanked...
TBar · Joined: 22 May 99 · Posts: 5204 · Credit: 840,779,836 · RAC: 2,768
This guy just showed up in one of my latest work units. He has a few: Error tasks for computer 6846852. The other platform: Computer 6843287...
Raistmer · Joined: 16 Jun 01 · Posts: 6325 · Credit: 106,370,077 · RAC: 121
This one shows no signs of a restart.
EDIT: blanking is ~11%, not as heavy as it could be ;)
Raistmer · Joined: 16 Jun 01 · Posts: 6325 · Credit: 106,370,077 · RAC: 121
> This Guy just showed up in one of my latest Work-Units.

He's running an old revision, so no additional info can be extracted.
TBar · Joined: 22 May 99 · Posts: 5204 · Credit: 840,779,836 · RAC: 2,768
> This Guy just showed up in one of my latest Work-Units.

None of them have had any errors so far. The one I listed was the first one using your new app; if there were some way to sort them by time/date, that might be useful. All of them using the new app are also using BOINC 7.0.42. Maybe I should go back to BOINC 7.0.28: most people getting the errors are using that version, and I got repeated errors using the unroll 2 setting with 7.0.28.

This one, ap_16dc12ac_B6_P0_00338_20130208_22461.wu_1, was killing my ATI app, so I arranged for the nVidia card to run it; note the blanking. It would probably have timed out on the ATI app. Lots of blanking going on...

More errors: Computer 5736754, Computer 6204844
TBar · Joined: 22 May 99 · Posts: 5204 · Credit: 840,779,836 · RAC: 2,768
> Something else impressive is how well Ubuntu 64-bit crunches CPU AstroPulses. My pieced-together Linux system is crunching better than a faster Xeon in both 64-bit OSX and 32-bit XP. The 2.8GHz Xeon takes just under 9 hours in OSX and over 10 hours in 32-bit XP. The 2.4GHz Xeon is doing it around the mid-eights in Ubuntu. We need a better CPU AstroPulse App for 32-bit Windows.

I think I've sorted this out. The reason the 2.4GHz Intel processor is running better than the 2.8GHz Intel processor is that it's actually running at 3.01GHz. For some reason, when you put an Intel Xeon 3060 on an Intel DP43TF board and set the Intel BIOS to 'Automatic', it overclocks the 3060 to 3.01GHz. Since many people were clocking the 3060 to 3.4GHz, I guess I shouldn't be concerned about Intel clocking their own component to 3.01GHz. It seems to work fine; I just need a full-sized ATX case for it. It will fit in an old Compaq EVO 510 case; I've had it in one before. Once it's in a case, I'll add SETI video cards...
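As a rough check of the clock explanation, here are the clock ratios from the figures in the post. Reading them as run-time ratios assumes that run time scales with core clock. That is only an approximation: it ignores architecture, memory and OS differences between the two Xeons.

$$
\frac{3.01\,\text{GHz}}{2.4\,\text{GHz}} \approx 1.254, \qquad
\frac{3.01\,\text{GHz}}{2.8\,\text{GHz}} = 1.075, \qquad
\frac{3.4\,\text{GHz}}{2.4\,\text{GHz}} \approx 1.417
$$

So the nominal 2.4GHz part, running at 3.01GHz, is about 25% over its rated clock and about 7.5% faster than the 2.8GHz Xeon. The common 3.4GHz overclock is about 42% over nominal.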
TBar · Joined: 22 May 99 · Posts: 5204 · Credit: 840,779,836 · RAC: 2,768
I gave up on 7.0.42; it appears your bug finder has scared all the bugs away. I went back to 7.0.28, where all this started. Back then, using the stock app at the default unroll 2 setting, I was getting mostly errors. This is the first one using your new app: ap_15dc12ac_B4_P1_00152_20130207_30944.wu_2. If you look in the workunit, you will find another one, Workunit 1166143526. No, I didn't plan it that way; it just happened...
TBar · Joined: 22 May 99 · Posts: 5204 · Credit: 840,779,836 · RAC: 2,768
I think we've entered the Twilight Zone. Desperately seeking some evidence of a bug, I moved the original astropulse_6.04_windows_intelx86__opencl_ati_100.exe, astropulse_6.04_windows_intelx86__opencl_ati_100.pdb and AstroPulse_Kernels_r1316.cl from the 'oldApp_backup' folder and began running that app. Not a single bug in over a day. The only difference I can think of since the last time I ran the original stock app is that I'm now running a 2GB RAM disk in the upper 6GB of RAM. I do still have the r1766 debug build in the project folder, even though it's not being used. I'll try removing the r1766 debug build and the RAM disk and see what happens. Since I placed the r1766_debug_build in the project folder, I haven't had any errors.
TBar · Joined: 22 May 99 · Posts: 5204 · Credit: 840,779,836 · RAC: 2,768
Well, I removed the debug build from the project folder and nothing changed. There are a couple of other major differences from when I was getting all the errors with the stock app. Like most other people getting many of these errors, I was running completely stock, without an app_info file. Since using the app_info file, I usually get only one error a day, sometimes one a week. Another observation: the app currently seems to be using much more CPU time, even with zero blanking. It used to use 10-20% CPU on lightly blanked tasks, whereas now it uses 30-50% CPU with light blanking. It appears that apps not using an app_info file use less CPU time, but the evidence is inconclusive.

Here is an interesting task: Workunit 1167407957. Note the CPU times and the number of other errors the one host has. Also note how the host whose results were listed as 'Invalid' due to an error actually had the valid results... shrugs...
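The "10-20% CPU" and "30-50% CPU" figures are CPU time as a share of elapsed (wall-clock) time. The hypothetical Python helpers below sketch how such a comparison could be tallied across tasks. The names are illustrative and the sample numbers are invented, not taken from any task in this thread.

```python
# Hypothetical helpers: CPU share of a GPU task = CPU time / elapsed (wall-clock) time.
# The sample numbers below are illustrative only, not taken from any task in this thread.


def cpu_share(cpu_time_s: float, elapsed_s: float) -> float:
    """CPU time as a fraction of elapsed time (can exceed 1.0 for multithreaded tasks)."""
    if elapsed_s <= 0:
        raise ValueError("elapsed time must be positive")
    return cpu_time_s / elapsed_s


def mean_share(tasks: list[tuple[float, float]]) -> float:
    """Average CPU share over (cpu_time_s, elapsed_s) pairs."""
    if not tasks:
        raise ValueError("need at least one task")
    return sum(cpu_share(cpu, wall) for cpu, wall in tasks) / len(tasks)


if __name__ == "__main__":
    setup_a = [(450.0, 1500.0), (600.0, 1500.0)]  # illustrative: 30% and 40%
    setup_b = [(150.0, 1500.0), (300.0, 1500.0)]  # illustrative: 10% and 20%
    print(f"{mean_share(setup_a):.0%} vs {mean_share(setup_b):.0%}")  # 35% vs 15%
```

With only a handful of tasks per setup, differences in blanking between tasks can easily outweigh any app_info effect, which fits calling the result inconclusive.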
Raistmer · Joined: 16 Jun 01 · Posts: 6325 · Credit: 106,370,077 · RAC: 121
The debug build has no instruction-level CPU optimizations, so it will be slower and consume more CPU. But if you're having trouble finding any crashes with it, could you try the last optimized build I posted in this thread instead? At least we will know whether the workaround works, even without knowing where the crash occurs.
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.