OpenCL apps are available for download on Lunatics

Josef W. Segur
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1333730 - Posted: 2 Feb 2013, 1:04:43 UTC

Further to the transition to r1760: the issue is the clFFTplan file ending with _32768_r1761.bin, so that should be deleted unless you're sure it was created by the Astropulse application. The other _r1761.bin files won't be needed, so if you like things neat they can also be deleted.
                                                                   Joe
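As a rough illustration of the cleanup described above, this Python sketch sorts a list of file names into the one that matters (ending in _32768_r1761.bin) and the other _r1761.bin leftovers. The function name and the sample file names are made up for illustration; only the two suffixes come from the post. It only reports candidates and deletes nothing.

```python
def classify_r1761_bins(filenames):
    """Split file names into (must_check, optional) lists.

    must_check: names ending in _32768_r1761.bin
                (delete unless Astropulse created the file)
    optional:   any other name ending in _r1761.bin
                (not needed; delete if you like things neat)
    """
    must_check, optional = [], []
    for name in filenames:
        if name.endswith("_32768_r1761.bin"):
            must_check.append(name)
        elif name.endswith("_r1761.bin"):
            optional.append(name)
    return must_check, optional


# Hypothetical file names, for illustration only.
sample = [
    "clFFTplan_Example_32768_r1761.bin",
    "clFFTplan_Example_1024_r1761.bin",
    "clFFTplan_Example_32768_r1760.bin",
]
must_check, optional = classify_r1761_bins(sample)
print("check before deleting:", must_check)
print("optional cleanup:", optional)
```

Files from other revisions (such as the r1760 example) are left out of both lists on purpose.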
ID: 1333730
Wedge009
Volunteer tester
Joined: 3 Apr 99
Posts: 451
Credit: 431,396,357
RAC: 553
Australia
Message 1333793 - Posted: 2 Feb 2013, 3:48:43 UTC

All right, here's my updated test report for MB r1760 only (since there haven't been any significant changes for AP between r1363 and r1761 and they're all working just fine, ATI and NV both).


HD 6970

WinXP, Catalyst 11.2/11.7 kludge (only known stable configuration for my Cayman cards).

I never got to try this one with r1761. In fact, I had to pinch a WU from CPU to test r1760. But it seems to be working okay, CPU/GPU usage looks good. Time taken to complete is a little longer than usual (even considering the extra time used for binary compilation), but I've seen some MB WUs take this long even with r426. So I can't comment on relative performance till I get more WUs processed with MB r1760.

I picked this particular WU because it already has a result awaiting validation on the server, but unfortunately uploads (and downloads) seem to be dead for me at time of writing, so I'll have to wait till the server recovers before I can see if my result is valid. At least the spike/pulse/triplet/gaussian counts are consistent.

Edit: Oops, no, strike that. Uploads working again, albeit at a crawl. But the result was marked as valid!


HD 6950

WinXP, Catalyst 11.2/11.7 kludge (only known stable configuration for my Cayman cards).

I kept getting -9 result overflows with MB r1761 previously. I'm quite sure that I had AP r1761 running previously, and given that MB WUs are working again with r1760, I'd say reusing the AP r1761 bin was the problem. Looks like problem solved! The work-unit was marked as valid, in any case.


AMD C-50

Win7 32-bit, Catalyst 12.8.

Unfortunately, the good news doesn't carry over to the APUs. Result appears to be the same as before - after only a few seconds of GPU use, the driver crashed at about 0.792% progress.


AMD E-450

Win7 64-bit, Catalyst 13.1.

Given that the C-50 failed again and that the underlying GPU is very similar, I didn't feel it was worth the effort of rolling back to Catalyst 12.8 to try this one again.
Soli Deo Gloria
ID: 1333793
cov_route
Joined: 13 Sep 12
Posts: 342
Credit: 10,270,618
RAC: 0
Canada
Message 1333836 - Posted: 2 Feb 2013, 5:21:38 UTC

I finally broke down and decided to run a bench test.

Ran r390 and r1761 with Catalyst 13.1 and a short test WU. First thing, r1761 ran without error. The card is a 6670 (Turks).

Second, r1761 was about 30% slower (IOW, run time was about 40% longer).

Quick timetable 
 
WU : 17dc12ad.28185.12907.8.10.253.wu 
MB6_win_x86_SSE3_OpenCL_ATi_HD5_r390.exe :
  Elapsed 314.227 secs
      CPU 45.084 secs
MB7_win_x86_SSE_OpenCL_ATi_r1761.exe :
  Elapsed 444.233 secs, speedup: -41.37%  ratio: 0.71x
      CPU 45.583 secs, speedup: -1.11%  ratio: 0.99x
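
The "speedup" and "ratio" columns in the timetable can be reproduced with the two formulas below. They are inferred from the numbers themselves, not taken from the benchmark tool's source, so treat this as a sketch. It also shows why "about 30% slower" and "about 40% longer" describe the same run: the ratio of 0.71x is the throughput view, the -41% speedup is the run-time view.

```python
def compare(old_secs, new_secs):
    """Return (speedup_percent, ratio) of the new app relative to the old one.

    speedup_percent < 0 means the new app took longer.
    ratio < 1 means the new app gets less work done per unit time.
    """
    speedup = (old_secs - new_secs) / old_secs * 100.0
    ratio = old_secs / new_secs
    return speedup, ratio


# Elapsed and CPU times for r390 (old) and r1761 (new) from the timetable.
elapsed_speedup, elapsed_ratio = compare(314.227, 444.233)
cpu_speedup, cpu_ratio = compare(45.084, 45.583)

print(f"Elapsed: speedup {elapsed_speedup:.2f}%  ratio {elapsed_ratio:.2f}x")
print(f"CPU:     speedup {cpu_speedup:.2f}%  ratio {cpu_ratio:.2f}x")
```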


So unless there's a science reason to switch I'll stay with r390 for the time being. If time allows I'll bench a few more random WUs in the days to come.

Observation: 1761 produced "WARNING: suboptimal workgroup size for PC_find_pulse_kernel_cl pass x" in stderr. From a quick look at the source I think that's just about wasteful memory alignment, shouldn't have much to do with speed.
ID: 1333836
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1333872 - Posted: 2 Feb 2013, 7:54:15 UTC - in response to Message 1333793.  

All right, here's my updated test report for MB r1760 only

Thanks for the detailed report.
And what period_iterations_num value have you already tried with the C-60?
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1333872
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1333873 - Posted: 2 Feb 2013, 7:55:28 UTC - in response to Message 1333836.  


Observation: 1761 produced "WARNING: suboptimal workgroup size for PC_find_pulse_kernel_cl pass x" in stderr. From a quick look at the source I think that's just about wasteful memory alignment, shouldn't have much to do with speed.

Actually it can affect performance. A suboptimal workgroup size means underloading the GPU. Could you give a link to a result with that warning?
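
One way to picture "underloading": GPUs execute work-items in fixed-width hardware groups (wavefronts on AMD hardware, commonly 64 work-items wide). If a workgroup is smaller than that width, or not a multiple of it, some lanes sit idle. The sketch below computes that fraction. The wavefront width of 64 is an assumption about the card, and nothing here is taken from the application's source, so it may not be exactly what triggers this particular warning.

```python
import math


def lane_utilization(workgroup_size, wavefront_width=64):
    """Fraction of hardware lanes doing useful work for one workgroup.

    wavefront_width=64 is an assumed value; check the actual device.
    """
    if workgroup_size <= 0 or wavefront_width <= 0:
        raise ValueError("sizes must be positive")
    # A workgroup occupies a whole number of wavefronts, rounded up.
    wavefronts = math.ceil(workgroup_size / wavefront_width)
    return workgroup_size / (wavefronts * wavefront_width)


for size in (16, 64, 96, 256):
    print(f"workgroup {size:3d}: {lane_utilization(size):.0%} of lanes busy")
```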

SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1333873
BilBg
Volunteer tester
Joined: 27 May 07
Posts: 3720
Credit: 9,385,827
RAC: 0
Bulgaria
Message 1333879 - Posted: 2 Feb 2013, 8:17:57 UTC

My current app_info.xml has (for both MB and AP, as from the Lunatics_Win32_v0.40_setup.exe):
<plan_class>ati13ati</plan_class>

... and my client_state.xml has 95 lines of <plan_class>ati13ati</plan_class>

(I don't have <platform>windows_intelx86</platform> in app_info.xml, but client_state.xml has it for every <result>)


And in the new files:
MB7_win_x86_SSE_OpenCL_ATi.aistub
<plan_class>ati_opencl_sah</plan_class>
<plan_class>opencl_ati_sah</plan_class>

AP6_win_x86_SSE2_OpenCL_ATI.aistub
<plan_class>ati_opencl_100</plan_class>
<plan_class>opencl_ati_100</plan_class>


If I (or anybody) just replace the sections for MB and AP in app_info.xml, I think the tasks will be deleted.

Do the .aistub files need to include another section with <plan_class>ati13ati</plan_class>?
And what about 64-bit users? Do they need another 3 sections with <platform>windows_x86_64</platform>?
Another problem is <version_num>601</version_num> for AP (as from the Lunatics_Win32_v0.40_setup.exe).

So: use the new .aistub files with care (meaning: think before use, duplicate the sections and edit them to fit your current situation).
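
The worry above is that tasks in client_state.xml reference a plan class that no longer appears in app_info.xml. A minimal Python sketch of that check follows. The function names and the sample snippets are illustrative only; it matches <plan_class> tags as plain text and does not model version numbers, platforms, or anything else BOINC may compare.

```python
import re

PLAN_CLASS = re.compile(r"<plan_class>([^<]*)</plan_class>")


def plan_classes(xml_text):
    """Return the set of plan_class values found in a piece of XML text."""
    return set(PLAN_CLASS.findall(xml_text))


def orphaned_plan_classes(client_state_text, app_info_text):
    """Plan classes used in client_state.xml but absent from app_info.xml."""
    return plan_classes(client_state_text) - plan_classes(app_info_text)


# Trimmed-down stand-ins for the real files, for illustration only.
client_state = "<result><plan_class>ati13ati</plan_class></result>"
new_app_info = """
<app_version><plan_class>ati_opencl_sah</plan_class></app_version>
<app_version><plan_class>opencl_ati_sah</plan_class></app_version>
"""

missing = orphaned_plan_classes(client_state, new_app_info)
print(sorted(missing))
```

A non-empty result is a hint that the app_info.xml sections need duplicating or editing before the new file is put in place.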


 


- ALF - "Find out what you don't do well ..... then don't do it!" :)
 
ID: 1333879
Wedge009
Volunteer tester
Joined: 3 Apr 99
Posts: 451
Credit: 431,396,357
RAC: 553
Australia
Message 1333910 - Posted: 2 Feb 2013, 10:33:03 UTC - in response to Message 1333836.  

And what period_iterations_num value have you already tried with the C-60?

Mine is a C-50, but essentially the same as your C-60 except for the turbo clock boost. Anyway, I normally use 24, but at 32 I still get a driver crash. Do you suggest going even higher?

Observation: 1761 produced "WARNING: suboptimal workgroup size for PC_find_pulse_kernel_cl pass x" in stderr.

I haven't seen this warning in my WU outputs, nor a consistent slowdown in performance, but I still only have a small sample of completed WUs to make observations on.

Regarding the plan classes, I just manually switched them over to opencl_ati_sah, etc, in my client_state.xml before restarting BOINC. No problem... if you remember to do it.
Soli Deo Gloria
ID: 1333910
Mike
Volunteer tester
Joined: 17 Feb 01
Posts: 34253
Credit: 79,922,639
RAC: 80
Germany
Message 1333915 - Posted: 2 Feb 2013, 10:46:39 UTC - in response to Message 1333910.  
Last modified: 2 Feb 2013, 10:47:22 UTC

Mine is a C-50, but essentially the same as your C-60 except for the turbo clock boost. Anyway, I normally use 24, but at 32 I still get a driver crash. Do you suggest going even higher?


Definitely

Try 50 or more.
The GPU part is not that powerful.
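
For anyone following along, the reasoning behind "try a higher value" can be sketched as below. It assumes (this is not stated anywhere in the thread) that period_iterations_num splits a long kernel call into that many shorter calls, so a larger value means less work per call and less time for the driver to give up waiting. The iteration count is made up.

```python
def split_iterations(total_iterations, parts):
    """Split total_iterations into `parts` chunks whose sizes differ by at most 1."""
    if parts <= 0:
        raise ValueError("parts must be positive")
    base, extra = divmod(total_iterations, parts)
    # The first `extra` chunks take one additional iteration each.
    return [base + 1 if i < extra else base for i in range(parts)]


total = 1000  # made-up iteration count, for illustration only
for n in (24, 32, 50):
    chunks = split_iterations(total, n)
    print(f"period_iterations_num={n}: longest call covers {max(chunks)} iterations")
```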


With each crime and every kindness we birth our future.
ID: 1333915
Wedge009
Volunteer tester
Joined: 3 Apr 99
Posts: 451
Credit: 431,396,357
RAC: 553
Australia
Message 1333922 - Posted: 2 Feb 2013, 11:36:29 UTC
Last modified: 2 Feb 2013, 11:37:30 UTC

I know very well that it's not very powerful. But I tried values of 64, 128, 256, 512 (I know there's no real reason to use powers of 2, I just did), but all of them resulted in driver crashes within seconds of the GPU being used. If there are any other suggestions, I'll give them a try.

Rolling back to r426 yet again.
Soli Deo Gloria
ID: 1333922
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1333928 - Posted: 2 Feb 2013, 11:55:36 UTC - in response to Message 1333922.  

I know very well that it's not very powerful. But I tried values of 64, 128, 256, 512 (I know there's no real reason to use powers of 2, I just did), but all of them resulted in driver crashes within seconds of the GPU being used. If there are any other suggestions, I'll give them a try.

Rolling back to r426 yet again.


Unfortunately the overheating issue on my C-60 netbook has only got worse; now it shuts down even during AP offline testing with an idle CPU. So I can't check how MB7 runs on the C-60 for now. Maybe next week, after repair...
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1333928
Wedge009
Volunteer tester
Joined: 3 Apr 99
Posts: 451
Credit: 431,396,357
RAC: 553
Australia
Message 1333933 - Posted: 2 Feb 2013, 12:16:00 UTC

That's crazy. Do you know how to open it and try cleaning out the heat-sink and fan? It's a bit fiddly, but I learned how to open the case to replace the memory - the pre-installed 1 GiB was paltry.

Hopefully, it's not like what happened to my most powerful laptop - pipe between GPU and heat-sink seems to be no longer effective. I can feel the heat being transferred on the pipe between the heat-sink and CPU, but the GPU one is relatively cold, which tells me something is broken. >.< Well, it's only a Mobility Radeon 2600, so the best it could do was the old CAL-based hybrid AP, anyway.
Soli Deo Gloria
ID: 1333933
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1333937 - Posted: 2 Feb 2013, 12:34:49 UTC - in response to Message 1333933.  

That's crazy. Do you know how to open it and try cleaning out the heat-sink and fan?

I tried to oil the fan spindle but without any success. Complete fan removal requires not only opening the bottom of the case but almost complete netbook disassembly. I want to try to get it repaired via a service center (if the warranty will work; it was bought in the USA...). If not, I will have no choice but disassembly.
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1333937
Wedge009
Volunteer tester
Joined: 3 Apr 99
Posts: 451
Credit: 431,396,357
RAC: 553
Australia
Message 1333940 - Posted: 2 Feb 2013, 12:40:46 UTC

So sounds like the fan itself is broken or not spinning very well, at best.

Well, never mind, then. Looks like I'll have to be your low-end Fusion tester for the time being. (:
Soli Deo Gloria
ID: 1333940
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1333942 - Posted: 2 Feb 2013, 12:45:29 UTC - in response to Message 1333940.  

So sounds like the fan itself is broken or not spinning very well, at best.

Yes, the fan is broken. Sometimes it spins fast and noiselessly, but soon after it starts vibrating strongly with a loud noise and the rotation speed drops (and the APU temp rises up to the point of emergency shutdown, near 90C).
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1333942
Claggy
Volunteer tester
Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1333945 - Posted: 2 Feb 2013, 12:58:47 UTC - in response to Message 1333942.  
Last modified: 2 Feb 2013, 12:59:03 UTC

So sounds like the fan itself is broken or not spinning very well, at best.

Yes, the fan is broken. Sometimes it spins fast and noiselessly, but soon after it starts vibrating strongly with a loud noise and the rotation speed drops (and the APU temp rises up to the point of emergency shutdown, near 90C).

My C2D T8100 laptop has been doing that: sometimes noisy, sometimes quiet, with temps up to 90°C too. Complete disassembly of the laptop was required to get at the fan assembly. I cleaned the fan assembly and fan, and lubricated the fan shaft with cycle oil. Now temps are ~65°C, still 5°C too high for my liking, but good enough.

Claggy
ID: 1333945
cov_route
Joined: 13 Sep 12
Posts: 342
Credit: 10,270,618
RAC: 0
Canada
Message 1333968 - Posted: 2 Feb 2013, 14:32:09 UTC - in response to Message 1333873.  


Observation: 1761 produced "WARNING: suboptimal workgroup size for PC_find_pulse_kernel_cl pass x" in stderr. From a quick look at the source I think that's just about wasteful memory alignment, shouldn't have much to do with speed.

Actually it can affect performance. A suboptimal workgroup size means underloading the GPU. Could you give a link to a result with that warning?

I got that from a benchmark run. I could send you the result somehow if you like.

In other news, I tried benchmarking another, longer-running WU (r1761 and Catalyst 13.1) and I got a driver restart.
ID: 1333968
Mike
Volunteer tester
Joined: 17 Feb 01
Posts: 34253
Credit: 79,922,639
RAC: 80
Germany
Message 1333970 - Posted: 2 Feb 2013, 14:36:48 UTC - in response to Message 1333873.  


Observation: 1761 produced "WARNING: suboptimal workgroup size for PC_find_pulse_kernel_cl pass x" in stderr. From a quick look at the source I think that's just about wasteful memory alignment, shouldn't have much to do with speed.

Actually it can affect performance. A suboptimal workgroup size means underloading the GPU. Could you give a link to a result with that warning?


It appears on VHARs only.
Times are still very good.



With each crime and every kindness we birth our future.
ID: 1333970
JarrettH
Joined: 14 Nov 02
Posts: 97
Credit: 25,385,250
RAC: 95
Canada
Message 1334086 - Posted: 2 Feb 2013, 22:04:12 UTC

Core 2 Duo E6600 (2.4 GHz)
GeForce 550 Ti

Astropulse on CPU ~12 hours
Astropulse on GPU 90 minutes for the first unit
ID: 1334086
Wedge009
Volunteer tester
Joined: 3 Apr 99
Posts: 451
Credit: 431,396,357
RAC: 553
Australia
Message 1334172 - Posted: 3 Feb 2013, 5:03:31 UTC
Last modified: 3 Feb 2013, 5:05:26 UTC

One more report from me.

HD 4670

WinXP, Catalyst 11.12.

I did this one earlier, but didn't bother reporting it. I didn't expect MB to work since maximum work-group size is only 128, and I was right: r1760 seemed to get stuck on compiling the FFT plans. The bin_V6 file was compiled okay.

This is the same AGP card that had been plagued with several invalid AP results in the past (you might remember helping me with this one, Raistmer). Interesting that in recent months, it seems to have stabilised enough on r1363 and r1761 is also working okay so far. I noticed that Catalyst 12.1 produces a lot more invalid results than Catalyst 11.12.


HD 4850

WinXP, Catalyst 11.12.

This one I have set-up on the computer I made for my brother. No work has been done on the Radeon card for a long, long time, but so far the stability issues I remembered from previously seem to be no longer present. At least from what we've seen so far. AP works great, MB r1760/r1761 also running without apparent issue. But the work-units will take a while to complete, so can't comment on their correctness just yet.
Soli Deo Gloria
ID: 1334172
cov_route
Joined: 13 Sep 12
Posts: 342
Credit: 10,270,618
RAC: 0
Canada
Message 1334291 - Posted: 3 Feb 2013, 15:36:28 UTC

TLDR: Only 2 sample points: r1761 run time is 35-55% longer than r390 on a 6670.

A couple of test MB WUs:

HD6670, Phenom II 945, PCI Express 1.1 x16, Win 7 64, Catalyst 12.8

Quick timetable 
 
WU : 04ja13ag.2565.9474.14.10.95.wu 
MB6_win_x86_SSE3_OpenCL_ATi_HD5_r390.exe :
  Elapsed 2260.388 secs
      CPU 159.979 secs
MB7_win_x86_SSE_OpenCL_ATi_r1761.exe  :
  Elapsed 3503.245 secs, speedup: -54.98%  ratio: 0.65x
      CPU 149.901 secs, speedup: 6.30%  ratio: 1.07x
 
WU : 17dc12ad.28185.12907.8.10.253.wu 
MB6_win_x86_SSE3_OpenCL_ATi_HD5_r390.exe :
  Elapsed 349.442 secs
      CPU 45.037 secs
MB7_win_x86_SSE_OpenCL_ATi_r1761.exe  :
  Elapsed 472.977 secs, speedup: -35.35%  ratio: 0.74x
      CPU 46.301 secs, speedup: -2.81%  ratio: 0.97x


ID: 1334291


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.