ATI OpenCL MultiBeam 6.10 problem..


AuthorMessage
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14679
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1086502 - Posted: 12 Mar 2011, 18:57:38 UTC - in response to Message 1086495.  

Pepi:
Disabled onboard GPU, now crunching on 5770 :)

That was five pages and 18 days before "Waiting for fix".
ID: 1086502
Sunny129
Joined: 7 Nov 00
Posts: 190
Credit: 3,163,755
RAC: 0
United States
Message 1086533 - Posted: 12 Mar 2011, 21:10:54 UTC - in response to Message 1086502.  
Last modified: 12 Mar 2011, 21:19:38 UTC

Pepi:
Disabled onboard GPU, now crunching on 5770 :)

That was five pages and 18 days before "Waiting for fix".

sorry, its a bit difficult for me to follow a thread i can't read...but if i'm understanding you correctly, it sounds like Pepi initially disabled the onboard GPU (whether in BOINC or in Windows altogether - i'm not sure) to fix his problem, but is still on the hunt for a fix that won't force him to disable his onboard GPU?

in the mean time, i've got a few hours before the last of my 6 CPU tasks finishes. all the "ready to start" tasks are manually suspended. as soon as the rest of the running tasks are complete and uploaded, i'll report them to their respective project servers, terminate BOINC, and back up the BOINC data directory. i may get to installing v6.12.18 tonight, i may not...we'll see.


*EDIT* - quick interim question: how much faster does an Astropulse WU run on a GPU than on a CPU? i know it's going to vary quite a bit depending on the GPU and the CPU, but give me a ballpark figure, in terms of orders of magnitude i suppose...
ID: 1086533
Jamie
Volunteer tester
Joined: 5 Apr 06
Posts: 162
Credit: 9,867,955
RAC: 0
United Kingdom
Message 1086536 - Posted: 12 Mar 2011, 21:32:45 UTC - in response to Message 1086533.  
Last modified: 12 Mar 2011, 21:38:40 UTC


*EDIT* - quick interim question: how much faster does an Astropulse WU run on a GPU than on a CPU? i know it's going to vary quite a bit depending on the GPU and the CPU, but give me a ballpark figure, in terms of orders of magnitude i suppose...


Ok, some tasks I've completed on my 5670:
0% blanked - 103 minutes vs 41.7 hours
0% blanked - 102 minutes vs 31.3 hours

As this card is a mid-range ATI card, you *should* see some improvement on yours.
The fastest my 1090T has run a CPU AP task is around 26 hours, but this was a while ago, so I may not be entirely accurate on that.

EDIT: In case you weren't sure, the higher the % blanking, the slower the task runs.
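
To put those two results in ballpark terms, the ratio can be worked out directly. This is only a rough sketch: it assumes the first figure in each pair is the GPU run time and the second is the CPU run time for a comparable task, and the function name `speedup` is made up for illustration.

```python
def speedup(gpu_minutes: float, cpu_hours: float) -> float:
    """Return how many times faster the GPU run was than the CPU run."""
    return (cpu_hours * 60.0) / gpu_minutes


# (GPU minutes, CPU hours) pairs from the two 0%-blanked tasks quoted above.
# Which device each figure belongs to is an assumption, see the note above.
results = [(103, 41.7), (102, 31.3)]

for gpu_minutes, cpu_hours in results:
    ratio = speedup(gpu_minutes, cpu_hours)
    print(f"{gpu_minutes} min vs {cpu_hours} h: about {ratio:.0f}x faster")
```

Under that assumption both tasks come out at roughly 18x to 24x, so a bit over one order of magnitude on this card.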
ID: 1086536
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1086586 - Posted: 13 Mar 2011, 0:41:53 UTC

For now Sunny129 has only 2 solutions that allow crunching for SETI:
1) disable the HD3xxx GPU in BIOS
2) disable BOINC's GPU1 in BOINC.

Anyone with mixed ATI/NV GPUs and a working config: what numbers does BOINC give to the NV and ATI GPUs, and what numbers does the app report?
ID: 1086586
Claggy
Volunteer tester
Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1086604 - Posted: 13 Mar 2011, 1:21:24 UTC - in response to Message 1086586.  

Anyone with mixed ATI/NV GPUs and a working config: what numbers does BOINC give to the NV and ATI GPUs, and what numbers does the app report?


Boinc reports:

13/03/2011 01:11:21 NVIDIA GPU 0: GeForce GTX 460 (driver version 26724, CUDA version 3020, compute capability 2.1, 993MB, 717 GFLOPS peak)
13/03/2011 01:11:21 ATI GPU 0: ATI Radeon HD5700 series (Juniper) (CAL version 1.4.1016, 1024MB, 1360 GFLOPS peak)

r177_HD5 reports:

Number of period iterations for PulseFind setted to:2
Number of app instances per device setted to:1
Running on device number: 0
Priority of worker thread raised successfully
Priority of process adjusted successfully, below normal priority class used
OpenCL platform detected: NVIDIA Corporation
OpenCL platform detected: Advanced Micro Devices, Inc.
BOINC assigns 0 device, slots 0 to 0 (including) will be checked
Used slot is 0; Info : Building Program (clBuildProgram):main kernels: OK code 0

x32f reports:

setiathome_CUDA: Found 1 CUDA device(s):
Device 1: GeForce GTX 460, 993 MiB, regsPerBlock 32768
computeCap 2.1, multiProcs 7
clockRate = 1600000
setiathome_CUDA: CUDA Device 1 specified, checking...
Device 1: GeForce GTX 460 is okay
SETI@home using CUDA accelerated device GeForce GTX 460
Priority of process raised successfully
Priority of worker thread raised successfully

Physical location of the GPUs: the GTX460 is in the 1st slot, the HD5770 in the 2nd slot.
But I could swear that at one point the CUDA GPU was device 0 (sometime around Cat 10.2).

Claggy
ID: 1086604
Jamie
Volunteer tester
Joined: 5 Apr 06
Posts: 162
Credit: 9,867,955
RAC: 0
United Kingdom
Message 1086697 - Posted: 13 Mar 2011, 11:03:10 UTC - in response to Message 1086586.  
Last modified: 13 Mar 2011, 11:12:44 UTC


Anyone with mixed ATI/NV GPUs and a working config: what numbers does BOINC give to the NV and ATI GPUs, and what numbers does the app report?


5670 is in slot 1, 465 in slot 2

Boinc reports (6.10.58):
NVIDIA GPU 0: GeForce GTX 465 (driver version unknown, CUDA version 3020, compute capability 2.0, 994MB, 1056 GFLOPS peak)
ATI GPU 0: ATI Radeon HD5x00 series (Redwood) (CAL version 1.4.1353, 1024MB, 620 GFLOPS peak)


r516 reports (r177_HD5 reports the same):
DATA_CHUNK_UNROLL setted to:12 
CPU affinity ajustment will be skipped 
FFA thread block override value:6144 
FFA thread fetchblock override value:1536 
Running on device number: 0 
Priority of worker thread raised successfully 
Priority of process adjusted successfully, high priority class used 
OpenCL platform detected: Advanced Micro Devices, Inc. 
BOINC assigns 0 device, slots 0 to 0 (including) will be checked 
Used slot is 0;	AstroPulse v. 5.06 
Non-graphics	FFTW	USE_CONVERSION_OPT


x32f reports:
setiathome_CUDA: Found 1 CUDA device(s): 
  Device 1: GeForce GTX 465, 993 MiB, regsPerBlock 32768 
     computeCap 2.0, multiProcs 11  
     clockRate = 1500000  
setiathome_CUDA: CUDA Device 1 specified, checking... 
   Device 1: GeForce GTX 465 is okay 
ID: 1086697
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1086699 - Posted: 13 Mar 2011, 11:25:49 UTC - in response to Message 1086604.  
Last modified: 13 Mar 2011, 11:31:52 UTC

OK, I think this supports my point, it's BOINC's bug, sorry Richard ;)
CAL-only and OpenCL devices should both be reported as Device 0 in different classes, just as NV and ATI GPUs are, because they definitely belong to different types of hardware and should have different plan classes.

I could work out some workaround, but the proper solution is a BOINC bug fix.


BTW, marking devices by brand (especially if brands produce many quite different devices) is rather silly. We don't have an Intel CPU plan class and an AMD CPU plan class (I hope for the right reason, not just because nobody has implemented those yet ;) ). Hardware differs not by manufacturer but by ability.

That is, in the GPU area we currently have 3 big types of hardware:
1) CUDA
2) CAL
3) OpenCL

In turn, CUDA subdivides into a few generations.

As long as BOINC ignores these facts, science apps will ignore its directives in order to be optimal.
ID: 1086699
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1086702 - Posted: 13 Mar 2011, 11:55:04 UTC
Last modified: 13 Mar 2011, 11:57:44 UTC

What is going on now:

BOINC assigns the second device to the app.
The app tries to follow BOINC's suggestion and assigns itself to the second OpenCL device, because it's an OpenCL app and knows nothing about CUDA, CAL and so on. Naturally it fails, because there is no second OpenCL device in the system.

What I can do:
I can provide a workaround that will once more ignore BOINC's suggestion and choose the correct OpenCL device regardless of what BOINC thinks it assigned to the app.

What the proper way to fix the issue is:
List only compatible devices from BOINC to the app. Compatibility should be checked against the app requirements specified in app_info. Maybe an OpenCL plan class is already supported; in that case I suggest changing the ati13ati plan class (it's the generic CAL class) to the OpenCL one.
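
The mismatch described in this post can be illustrated with a small model. This is a sketch under stated assumptions, not the app's or BOINC's actual code: the `Gpu` class, the `HOST` list and all function names are hypothetical, and the host simply mimics a machine whose first device lacks OpenCL support.

```python
from dataclasses import dataclass


@dataclass
class Gpu:
    name: str
    supports_opencl: bool


# Hypothetical host: BOINC hardware device 0 has no OpenCL, device 1 does.
HOST = [Gpu("onboard GPU", False), Gpu("discrete GPU", True)]


def opencl_devices(gpus):
    """The list an OpenCL-only app sees: capable devices, renumbered from 0."""
    return [g for g in gpus if g.supports_opencl]


def pick_naive(gpus, boinc_device):
    """Use BOINC's hardware number directly as an OpenCL index (the failing case)."""
    return opencl_devices(gpus)[boinc_device]


def index_among_compatible(gpus, boinc_device):
    """Number the assigned device within the compatible list only (the proposed fix)."""
    return opencl_devices(gpus).index(gpus[boinc_device])


try:
    pick_naive(HOST, 1)
except IndexError:
    print("naive lookup failed: there is no second OpenCL device")

print("index among compatible devices:", index_among_compatible(HOST, 1))
```

In this model the naive lookup fails exactly as described, while numbering only the compatible devices gives index 0 for the same card.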
ID: 1086702
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14679
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1086709 - Posted: 13 Mar 2011, 12:32:42 UTC - in response to Message 1086699.  

OK, I think this supports my point, it's BOINC's bug, sorry Richard ;)

Up to a point, I agree. But for the moment, we've only got data in for BOINC v6.10.58, which is over 8 months old (1 July 2010). At that point, ATI support was rudimentary, and OpenCL support non-existent. So, I don't think I would classify this as a 'bug': the situation is more that you are trying to shoehorn new capabilities into an obsolete platform. For it to happen at all, workarounds are going to be needed: what we've come across is a hardware platform (well, two instances, including the instance Pepi reported in beta testing) which defeat (are unsupported by) the current set of workarounds.

CAL-only and OpenCL devices should both be reported as Device 0 in different classes, just as NV and ATI GPUs are, because they definitely belong to different types of hardware and should have different plan classes.

I could work out some workaround, but the proper solution is a BOINC bug fix.

It's not quite as easy as that. At this stage in its development, BOINC is reporting hardware devices (0,1). OpenCL is enumerating those same devices as (-,0) - because the first one isn't OpenCL capable. You seem to be taking BOINC's "1" and looking for it in the OpenCL software enumeration, and aborting in a way which is not very kind to the project - because there's nothing to stop additional work being fetched, and aborted, until quota locks the host down to one per day.

Perhaps a (temporary) solution might be "If the BOINC device number passed to the application is beyond the end of the OpenCL enumeration list, decrement the OpenCL slot number before trying again".
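
As a minimal sketch of that rule (the function and parameter names are hypothetical, and nothing here is taken from the actual application source):

```python
def resolve_opencl_slot(boinc_device: int, opencl_device_count: int) -> int:
    """Step BOINC's device number down until it fits the OpenCL list."""
    if opencl_device_count < 1:
        raise RuntimeError("no OpenCL devices enumerated")
    slot = boinc_device
    # "Beyond the end of the list" means slot >= count; decrement and retry.
    while slot >= opencl_device_count:
        slot -= 1
    return slot


# Hardware devices (0,1) with OpenCL enumeration (-,0): BOINC passes 1.
print(resolve_opencl_slot(1, 1))
```

With one OpenCL device, BOINC's "1" is stepped down to OpenCL slot 0; a number already inside the list is left untouched.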

BTW, marking devices by brand (especially if brands produce many quite different devices) is rather silly. We don't have an Intel CPU plan class and an AMD CPU plan class (I hope for the right reason, not just because nobody has implemented those yet ;) ). Hardware differs not by manufacturer but by ability.

Not an exact parallel. There are formal standards, and cross-licencing agreements, so that to a first approximation the same binary machine code will execute on both Intel and AMD processors (doesn't apply to the rarefied optimisations you guys are working with, of course, but they are designed for compatibility).

But for GPUs, there is no such design standard and no cross-licencing agreement. Competition - the law of the jungle - rules. ATI and NVidia have to be treated differently at the binary machine-code level. OpenCL is slowly emerging and evolving as an intermediary layer to paper over those differences, with all the inefficiencies that middle-management introduces.

That is, in the GPU area we currently have 3 big types of hardware:
1) CUDA
2) CAL
3) OpenCL

In turn, CUDA subdivides into a few generations.

As long as BOINC ignores these facts, science apps will ignore its directives in order to be optimal.

Yes, I agree. OpenCL probably should be treated to a plan_class of its own. But that just shifts the problem upstairs by one level. BOINC will also have the problem of looking at Sunny's machine, and finding

ATI(0)
ATI(1)
OpenCL(0)

and will need to work out that ATI(1) is the same bit of silicon as OpenCL(0): should they both be available for scheduling at the same time? It's going to take a while before BOINC can be entrusted to resolve issues like that, and workarounds are going to be needed for the duration.

Even if only a simple disclaimer: "This application is liable to failure on hosts with a mixture of OpenCL capable and non-capable devices installed."

And I'm happy to try and write some of that up for the boinc_dev mailing list, if you like.
ID: 1086709
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14679
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1086716 - Posted: 13 Mar 2011, 12:47:56 UTC - in response to Message 1086702.  

What is going on now:

BOINC assigns the second device to the app.
The app tries to follow BOINC's suggestion and assigns itself to the second OpenCL device, because it's an OpenCL app and knows nothing about CUDA, CAL and so on. Naturally it fails, because there is no second OpenCL device in the system.

What I can do:
I can provide a workaround that will once more ignore BOINC's suggestion and choose the correct OpenCL device regardless of what BOINC thinks it assigned to the app.

What the proper way to fix the issue is:
List only compatible devices from BOINC to the app. Compatibility should be checked against the app requirements specified in app_info. Maybe an OpenCL plan class is already supported; in that case I suggest changing the ati13ati plan class (it's the generic CAL class) to the OpenCL one.

As above, I would suggest only interfering with BOINC's hardware device number if it is greater than the highest-numbered OpenCL device enumerated.

That still wouldn't cope with the case (IMHO)

ATI(0)
CUDA(0)
OpenCL(0)
OpenCL(1)

I don't think fiddling with the <plan_class> name is going to help - that's really to do with server-side work allocation decisions. Since this is going to be running under anonymous platform for the foreseeable future, how the work is fetched hardly matters.

What you need to be able to do is to include a block

    <coproc>
        <type>OpenCL</type>
        <count>1</count>
    </coproc>

but I don't think that's available, even in BOINC v6.12
ID: 1086716
Claggy
Volunteer tester
Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1086763 - Posted: 13 Mar 2011, 16:31:53 UTC - in response to Message 1086586.  
Last modified: 13 Mar 2011, 17:07:08 UTC

For now Sunny129 has only 2 solutions that allow crunching for SETI:
1) disable the HD3xxx GPU in BIOS
2) disable BOINC's GPU1 in BOINC.

Anyone with mixed ATI/NV GPUs and a working config: what numbers does BOINC give to the NV and ATI GPUs, and what numbers does the app report?


I've thought of a workaround for Pepi and Sunny129: if they set -instances_per_device 2 for r177 and enable both GPUs,
BOINC will start two OpenCL app instances, and the app will now allow two instances to run on the same OpenCL GPU,
just like the Milkyway app was doing (Count should be left set at 1), and BOINC will be able to get DP work from Milkyway too.
(There's still a possibility that it won't work; the app will likely report: BOINC assigns 1 device, slots 2 to 3 (including) will be checked)

I don't think this can be done with the publicly available ATI OpenCL AP r456 app; multiple instances weren't introduced until later, around r505.
That app and the latest OpenCL app are only available to beta testers at the moment, so they'd have to disable AP work fetch.

I had a look through the top 2000 hosts at ATI/NV hosts; it looks as if they all report an ATI as device 0 and the Nvidia as device 1.
If there's a 2nd Nvidia, that's device 2, or if there's a 2nd ATI, that's device 2.

Claggy
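
The slot ranges in those log lines look consistent with each device getting a contiguous block of slots, one per instance. That is only an inference from the two messages quoted in this thread ("BOINC assigns 0 device, slots 0 to 0" with one instance per device, and the predicted "BOINC assigns 1 device, slots 2 to 3" with two), not something taken from the app's source; `slot_range` is a made-up name.

```python
def slot_range(boinc_device: int, instances_per_device: int) -> tuple[int, int]:
    """Return the (first, last) slot checked for a device, both inclusive."""
    first = boinc_device * instances_per_device
    last = first + instances_per_device - 1
    return first, last


print(slot_range(0, 1))  # device 0, one instance per device
print(slot_range(1, 2))  # device 1, two instances per device
```

If the inference holds, a host with a single OpenCL device would need slots 2 to 3 to map back onto that one device for the workaround to succeed, which is the uncertainty flagged above.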
ID: 1086763
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1086808 - Posted: 13 Mar 2011, 18:17:32 UTC - in response to Message 1086709.  
Last modified: 13 Mar 2011, 18:20:06 UTC


And I'm happy to try and write some of that up for the boinc_dev mailing list, if you like.


Yes, I would like it if you could handle this.
I will not raise this issue again. I saw something like this particular issue long ago and proposed a new hardware device abstraction/generalization level (to allow new device types to be described through app_info), and it was ignored by the BOINC devs.
They prefer their own workarounds and bandages to a higher level of generalization.
The next problem will be distinguishing Intel, ATI and NV OpenCL, IMHO... And we already have problems with FERMI/non-FERMI in a single host...
Then some new powerful hardware coprocessor will emerge and BOINC will go through a new round of incompatibilities and months of lost work for the new devices....

Actually, each and every available GPU app (at least for the SETI project) does its own hardware device enumeration and doesn't rely on BOINC's. BOINC's main aim is proper scheduling and work fetch. For these tasks it doesn't need to be able to enumerate each and every piece of hardware inside the host. It should be able to get info about the available compute devices. This info could come in the form of a supplied device list, and BOINC is not able to handle such a form now; I see the root of the evil exactly in this fact. BOINC tries to get all the info on its own... And sometimes it gets wrong/insufficient info. [And there is no way to correct it!]
ID: 1086808
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1086809 - Posted: 13 Mar 2011, 18:23:58 UTC - in response to Message 1086763.  
Last modified: 13 Mar 2011, 18:51:22 UTC

For now Sunny129 has only 2 solutions that allow crunching for SETI:
1) disable the HD3xxx GPU in BIOS
2) disable BOINC's GPU1 in BOINC.

Anyone with mixed ATI/NV GPUs and a working config: what numbers does BOINC give to the NV and ATI GPUs, and what numbers does the app report?


I've thought of a workaround for Pepi and Sunny129: if they set -instances_per_device 2 for r177 and enable both GPUs,
BOINC will start two OpenCL app instances, and the app will now allow two instances to run on the same OpenCL GPU,
just like the Milkyway app was doing (Count should be left set at 1), and BOINC will be able to get DP work from Milkyway too.
(There's still a possibility that it won't work; the app will likely report: BOINC assigns 1 device, slots 2 to 3 (including) will be checked)

I don't think this can be done with the publicly available ATI OpenCL AP r456 app; multiple instances weren't introduced until later, around r505.
That app and the latest OpenCL app are only available to beta testers at the moment, so they'd have to disable AP work fetch.

I had a look through the top 2000 hosts at ATI/NV hosts; it looks as if they all report an ATI as device 0 and the Nvidia as device 1.
If there's a 2nd Nvidia, that's device 2, or if there's a 2nd ATI, that's device 2.

Claggy


Worth a try.
ID: 1086809
Sunny129
Joined: 7 Nov 00
Posts: 190
Credit: 3,163,755
RAC: 0
United States
Message 1087534 - Posted: 16 Mar 2011, 21:38:52 UTC

OK,

the last time i posted i was in the process of finishing the rest of my running work units so i could install BOINC v6.12.18. and so i did just that. i then removed the cc_config.xml file from the BOINC data directory and installed BOINC v6.12.18, being sure that it was installing to the existing directories. without placing a cc_config.xml file back in the BOINC data directory, i went ahead and ran BOINC. i checked the event log to see the following:

Starting BOINC client version 6.12.18 for windows_intelx86
log flags: file_xfer, sched_ops, task
Libraries: libcurl/7.19.7 OpenSSL/0.9.8l zlib/1.2.5
Data directory: D:\Documents and Settings\All Users\Application Data\BOINC
Running under account Eric
Processor: 6 AuthenticAMD AMD Phenom(tm) II X6 1090T Processor [Family 16 Model 10 Stepping 0]
Processor: 512.00 KB cache
Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 htt pni cx16 syscall nx lm svm sse4a osvw ibs skinit wdt page1gb rdtscp 3dnowext 3dnow
OS: Microsoft Windows XP: Professional x86 Edition, Service Pack 3, (05.01.2600.00)
Memory: 3.00 GB physical, 4.84 GB virtual
Disk: 109.47 GB total, 95.33 GB free
Local time is UTC -5 hours
ATI GPU 0: (not used) ATI Radeon HD 2300/2400/3200 (RV610) (CAL version 1.4.900, 341MB, 56 GFLOPS peak)
ATI GPU 1: ATI Radeon HD 5800 series (Cypress) (CAL version 1.4.900, 2048MB, 2720 GFLOPS peak)

SETI@home | Found app_info.xml; using anonymous platform
Version change (6.10.58 -> 6.12.18)


...note that v6.12.18 now automatically recognizes the HD 3300 onboard GPU as "not used," unlike v6.10.58 (as seen below):

Starting BOINC client version 6.10.58 for windows_intelx86
log flags: file_xfer, sched_ops, task
Libraries: libcurl/7.19.7 OpenSSL/0.9.8l zlib/1.2.3
Data directory: D:\Documents and Settings\All Users\Application Data\BOINC
Running under account Eric
Processor: 6 AuthenticAMD AMD Phenom(tm) II X6 1090T Processor [Family 16 Model 10 Stepping 0]
Processor: 512.00 KB cache
Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 htt pni cx16 syscall nx lm svm sse4a osvw ibs skinit wdt page1gb rdtscp 3dnowext 3dnow
OS: Microsoft Windows XP: Professional x86 Edition, Service Pack 3, (05.01.2600.00)
Memory: 3.00 GB physical, 4.84 GB virtual
Disk: 109.47 GB total, 95.33 GB free
Local time is UTC -5 hours
ATI GPU 0: ATI Radeon HD 2300/2400/3200 (RV610) (CAL version 1.4.900, 341MB, 56 GFLOPS peak)
ATI GPU 1: ATI Radeon HD5800 series (Cypress) (CAL version 1.4.900, 2048MB, 2720 GFLOPS peak)

SETI@home Found app_info.xml; using anonymous platform



...so what i did was run some S@H MB GPU tasks and some MW@H GPU tasks in the 5 different configurations i experimented with while using BOINC v6.10.58: no cc_config.xml file, <ignore_ati_dev>0</ignore_ati_dev>, <ignore_ati_dev>1</ignore_ati_dev>, <use_all_gpus>0</use_all_gpus>, and <use_all_gpus>1</use_all_gpus>. i documented and interpreted to some extent the behavior of those tasks/causes/implications to the best of my knowledge, and then compared them with their respective configurations using v6.10.58. so without further ado...

using BOINC v6.10.58, i have the following scenarios:

1) with no cc_config.xml file, a single S@H GPU task will crunch normally, but all subsequent S@H GPU tasks will immediately error out one after another if they aren't already "suspended" at the time the first task starts running. attempting to start any subsequent S@H GPU task while the first one is running will error out immediately upon resuming from suspension mode. also, MW@H tasks run normally and without errors. they run 2 at a time. there is also no problem fetching new work.

2) with the "<ignore_ati_dev>0</ignore_ati_dev>" cc_config.xml file, BOINC is forced to "ignore" GPU_0 (the HD 3300 onboard GPU), yet S@H GPU tasks error out immediately upon resuming from suspended mode. also, MW@H tasks run normally and without errors. they run 1 at a time in this configuration. there are no problems with fetching new work either.

3) with the "<ignore_ati_dev>1</ignore_ati_dev>" cc_config.xml file, BOINC is forced to "ignore" GPU_1 (the HD 5870 discrete GPU), yet S@H GPU tasks now run normally, and they do so on the "ignored" GPU. a single task crunches while all the subsequent tasks wait in the queue "ready to start," and do not have to be manually controlled via suspending and resuming them in order to prevent them from erroring out. also, MW@H tasks seem to be running oddly. despite the fact that BOINC still sends requests for new work, the MW@H server won't send new work b/c it sees that GPU_1 (the HD 5870) is "ignored" in BOINC. this is not to say that tasks won't run or that they error out - if they already exist in the queue when i switch to this configuration and restart BOINC, they continue to run 1 at a time...it's just that no new work gets fetched.

4) with the "<use_all_gpus>1</use_all_gpus>" cc_config.xml file, S@H GPU tasks act the same way as they do when no cc_config.xml file is present - a single S@H GPU task will crunch normally, but all subsequent S@H GPU tasks will immediately error out one after another if they aren't already "suspended" at the time the first task starts running. attempting to start any subsequent S@H GPU task while the first one is running will error out immediately upon resuming from suspension mode. also, MW@H tasks run normally and without errors. they run 2 at a time in this configuration. there are no problems with fetching new work either.

5) with the "<use_all_gpus>0</use_all_gpus>" cc_config.xml file, S@H GPU tasks again act the same way as they do when no cc_config.xml file is present - a single S@H GPU task will crunch normally, but all subsequent S@H GPU tasks will immediately error out one after another if they aren't already "suspended" at the time the first task starts running. attempting to start any subsequent S@H GPU task while the first one is running will error out immediately upon resuming from suspension mode. also, MW@H tasks run normally and without errors. they run 2 at a time in this configuration. there are no problems with fetching new work either.

to sum up v6.10.58, configuration 3) seems to be the best for S@H, as it allows S@H GPU tasks to run normally without any errors or manual control. this kinda sucks b/c its also the one configuration that doesn't allow MW@H tasks to run properly, or more specifically, doesn't allow more work to be downloaded from the server. of course, in the mean time, if i need to switch from MW@H to S@H or vice versa at any time, i can just close BOINC and edit a config file or switch to different config file.


and now the BOINC v6.12.18 results that some of you have been waiting for...these scenarios seem to be similar to those i experienced w/ BOINC v6.10.58, but different as well:

1) with no cc_config.xml file, S@H GPU tasks error out immediately upon resuming from suspended mode. Here is an example of one such task: http://setiathome.berkeley.edu/result.php?resultid=1836727839. i suspect that this is b/c v6.12.18 automatically recognizes the HD 3300 GPU as "not used" (unlike v6.10.58, which, without a cc_config.xml file, would allow at least one task to run without error, though subsequent tasks still errored out). also, MW@H tasks run normally and without errors, just as with v6.10.58. they run 1 at a time in this configuration. there are no problems with fetching new work either.

2) with the "<ignore_ati_dev>0</ignore_ati_dev>" cc_config.xml file, BOINC is forced to "ignore" GPU_0 (the HD 3300 onboard GPU) just like it was with v6.10.58. instead of containing this line on start-up: ATI GPU 0: (not used) ATI Radeon HD 2300/2400/3200 (RV610) (CAL version 1.4.900, 341MB, 56 GFLOPS peak), the event log contained this line: ATI GPU 0 (ignored by config): ATI Radeon HD 2300/2400/3200 (RV610) (CAL version 1.4.900, 341MB, 56 GFLOPS peak). obviously "ignoring" GPU_0 has the same results as configuration 1 above in which GPU_0 is "not used", b/c the instant i resumed a S@H GPU task, it errored out. Here is an example of one such task: http://setiathome.berkeley.edu/result.php?resultid=1836667175. also, MW@H tasks run normally and without errors, just as with v6.10.58. they run 1 at a time in this configuration. there are no problems with fetching new work either.

3) with the "<ignore_ati_dev>1</ignore_ati_dev>" cc_config.xml file, BOINC is again forced to "ignore" GPU_1 (the HD 5870 discrete GPU). the event log upon start-up reads as follows:
ATI GPU 0: ATI Radeon HD 2300/2400/3200 (RV610) (CAL version 1.4.900, 341MB, 56 GFLOPS peak)
ATI GPU 1 (ignored by config): ATI Radeon HD 5800 series (Cypress) (CAL version 1.4.900, 2048MB, 2720 GFLOPS peak)

it's interesting to note that GPU_0 (the HD 3300 onboard GPU) is no longer "not used" by BOINC, despite the fact that the only change i made - the <ignore_ati_dev>1</ignore_ati_dev> directive in the cc_config.xml file - should have only affected the HD 5870, not the HD 3300. just like with v6.10.58, S@H GPU tasks now run normally, and they do so on the "ignored" GPU. a single task crunches while all the subsequent tasks wait in the queue "ready to start," and do not have to be manually controlled via suspending and resuming them in order to prevent them from erroring out. i doubt it'll shed any light on the situation, but here's what a successful task looks like: http://setiathome.berkeley.edu/result.php?resultid=1836667137. also, MW@H tasks seem to be running oddly. despite the fact that BOINC still sends requests for new work, the MW@H server won't send new work b/c it sees that GPU_1 (the HD 5870) is "ignored" in BOINC. this is not to say that tasks won't run or that they error out - if they already exist in the queue when i switch to this configuration and restart BOINC, they continue to run 1 at a time...it's just that no new work gets fetched.

4) with the "<use_all_gpus>1</use_all_gpus>" cc_config.xml file, BOINC is again forced to "use" both GPUs. but like the v6.10.58 configuration with no cc_config.xml file at all, a single S@H GPU task will crunch normally while all subsequent S@H GPU tasks immediately error out one after another unless they are already in "suspended" mode at the time the first task starts running. i'd link you to a task that errored out in this fashion, but the ones that did seem to have cleared the server already. also, MW@H tasks run normally and without errors. they also run 2 at a time, unlike all other scenarios, in which they ran only 1 at a time. run time per WU is approx. the same when MW@H is running 2 tasks at once.

5) with the "<use_all_gpus>0</use_all_gpus>" cc_config.xml file, the event log goes back to reading like it did with no cc_config.xml file at all:
ATI GPU 0: (not used) ATI Radeon HD 2300/2400/3200 (RV610) (CAL version 1.4.900, 341MB, 56 GFLOPS peak)
ATI GPU 1: ATI Radeon HD 5800 series (Cypress) (CAL version 1.4.900, 2048MB, 2720 GFLOPS peak)

...and upon resuming any S@H GPU task from suspended mode, it would error out immediately. here is an example of one such task: http://setiathome.berkeley.edu/result.php?resultid=1836667157. also, MW@H tasks run normally and without errors. they run 1 at a time in this configuration. there are no problems with fetching new work either.

to sum up v6.12.18, configuration 3) again seems to be the best for S@H, as it allows S@H GPU tasks to run normally without any errors or manual control. again, this kinda sucks b/c its also the one configuration that doesn't allow MW@H tasks to run properly, or more specifically, doesn't allow more work to be downloaded from the server.

in all honesty, i wouldn't run S@H and MW@H GPU apps at the same time unless i had 2 discrete OpenCL/FP64 capable cards in the same rig. as you all know by now, i only have one such GPU. so until i can afford another GPU cruncher, i just settle for making slight changes to the cc_config.xml file and restarting BOINC whenever i want to switch from S@H to MW@H or vice versa. but it would be nice if i could do that without having to edit/change the cc_config.xml file and restart BOINC. at any rate, there you have it - this is the way v6.12.18 behaves on my host.
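
Switching between the two setups could be scripted rather than edited by hand. The sketch below only builds the file contents; it assumes the directives tested above sit inside an `<options>` element under `<cc_config>`, which is the usual layout of the client config file but is not shown in this thread. The names `build_cc_config` and `PROFILES` are hypothetical, and copying the result into the BOINC data directory and restarting BOINC is left out.

```python
import xml.etree.ElementTree as ET


def build_cc_config(ignore_ati_dev=None):
    """Build cc_config.xml content, optionally ignoring one ATI device number."""
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")  # assumed wrapper element
    if ignore_ati_dev is not None:
        ET.SubElement(options, "ignore_ati_dev").text = str(ignore_ati_dev)
    return ET.tostring(root, encoding="unicode")


# Per the results above: ignoring device 1 was the only setup where S@H GPU
# tasks ran cleanly, while MW@H fetched work in the setups without it.
PROFILES = {
    "seti": build_cc_config(ignore_ati_dev=1),
    "milkyway": build_cc_config(),
}

print(PROFILES["seti"])
```

Each profile is just a string, so a wrapper script could write the chosen one to cc_config.xml before BOINC is started again.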
ID: 1087534
Sunny129
Joined: 7 Nov 00
Posts: 190
Credit: 3,163,755
RAC: 0
United States
Message 1087535 - Posted: 16 Mar 2011, 21:39:24 UTC - in response to Message 1086702.  

What I can do:
I can provide a workaround that will once more ignore BOINC's suggestion and choose the correct OpenCL device regardless of what BOINC thinks it assigned to the app.

i don't want you to go out of your way to write a workaround for just a handful of hosts with this problem...especially if your next statement might be a possible fix:

The proper way to fix the issue:
Have BOINC list only compatible devices to the app. Compatibility should be checked against the app requirements specified in app_info. Maybe an OpenCL plan class is already supported; if so, I suggest changing the ati13ati plan class (it's a generic CAL class) to an OpenCL one.

...though it seems Richard is a bit apprehensive about whether this will fix my problem or not. suppose its worth a shot - i vaguely understand what you're saying, but i'm unsure what specifically to do. am i understanding correctly that changes need to be made to S@H's app_info.xml file? if so, here's what mine looks like currently:
<app_info>
    <app>
        <name>setiathome_enhanced</name>
    </app>
    <file_info>
        <name>AK_v8b_win_SSE3_AMD.exe</name>
        <executable/>
    </file_info>
    <app_version>
        <app_name>setiathome_enhanced</app_name>
        <version_num>603</version_num>
        <file_ref>
            <file_name>AK_v8b_win_SSE3_AMD.exe</file_name>
            <main_program/>
        </file_ref>
    </app_version>
    <app>
        <name>astropulse_v505</name>
    </app>
    <file_info>
        <name>ap_5.05r409_SSE.exe</name>
        <executable/>
    </file_info>
    <app_version>
        <app_name>astropulse_v505</app_name>
        <version_num>505</version_num>
        <file_ref>
            <file_name>ap_5.05r409_SSE.exe</file_name>
            <main_program/>
        </file_ref>
    </app_version>
    <app>
        <name>astropulse_v505</name>
    </app>
    <file_info>
        <name>ap_5.06_win_x86_SSE2_OpenCL_ATI_r456.exe</name>
        <executable/>
    </file_info>
    <file_info>
        <name>AstroPulse_Kernels.cl</name>
        <executable/>
    </file_info>
    <app_version>
        <app_name>astropulse_v505</app_name>
        <version_num>506</version_num>
        <avg_ncpus>0.01</avg_ncpus>
        <max_ncpus>0.01</max_ncpus>
        <plan_class>ati13ati</plan_class>
        <cmdline>-ffa_block 8192 -ffa_block_fetch 2048</cmdline>
        <coproc>
            <type>ATI</type>
            <count>1</count>
        </coproc>
        <flops>15987654321</flops>
        <file_ref>
            <file_name>ap_5.06_win_x86_SSE2_OpenCL_ATI_r456.exe</file_name>
            <main_program/>
        </file_ref>
        <file_ref>
            <file_name>AstroPulse_Kernels.cl</file_name>
            <copy_file/>
        </file_ref>
    </app_version>
    <app>
        <name>setiathome_enhanced</name>
    </app>
    <file_info>
        <name>MB_6.10_win_SSE3_ATI_HD5_r177.exe</name>
        <executable/>
    </file_info>
    <file_info>
        <name>MultiBeam_Kernels.cl</name>
        <executable/>
    </file_info>
    <app_version>
        <app_name>setiathome_enhanced</app_name>
        <version_num>610</version_num>
        <platform>windows_intelx86</platform>
        <avg_ncpus>0.05</avg_ncpus>
        <max_ncpus>0.05</max_ncpus>
        <plan_class>ati13ati</plan_class>
        <cmdline>-period_iterations_num 2 -instances_per_device 1</cmdline>
        <flops>20987654321</flops>
        <file_ref>
            <file_name>MB_6.10_win_SSE3_ATI_HD5_r177.exe</file_name>
            <main_program/>
        </file_ref>
        <file_ref>
            <file_name>MultiBeam_Kernels.cl</file_name>
            <copy_file/>
        </file_ref>
        <coproc>
            <type>ATI</type>
            <count>1</count>
        </coproc>
    </app_version>
</app_info>

is it as simple as just replacing all instances of "<plan_class>ati13ati</plan_class>" with "<plan_class>OpenCL</plan_class>"?


For now Sunny129 has only 2 solutions that allow crunching for SETI:
1) to disable HD3xxx GPU in BIOS
2) to disable BOINC's GPU1 in BOINC.

Anyone with mixed ATI/NV GPUs and a working config: what numbers does BOINC give to the NV and ATI GPUs, and what numbers does the app report?


I've thought of a workaround for Pepi and Sunny129: if they set -instances_per_device 2 for r177 and enable both GPUs,
BOINC will start two OpenCL app instances, and the app will now allow two instances to run on the same OpenCL GPU,
just like the Milkyway app was doing (count should be left set at 1), and BOINC will be able to get DP work from Milkyway too.
(There's still a possibility that it won't work; the app will likely report: BOINC assigns 1 device, slots 2 to 3 (including) will be checked.)

I don't think this can be done with the publicly available ATI OpenCL AP r456 app; multiple instances weren't introduced until later, around r505.
That app and the latest OpenCL app are only available to beta testers at the moment, so they'd have to disable AP work fetch.

I had a look through the top 2000 hosts for mixed ATI/NV hosts. It looks as if they all report the ATI as device 0 and the Nvidia as device 1;
if there's a 2nd Nvidia, that's device 2, or if there's a 2nd ATI, that's device 2.

Claggy

which .xml file would i find this info in so i can give this a try?

TIA,
Eric
ID: 1087535
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14679
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1087559 - Posted: 16 Mar 2011, 22:24:54 UTC - in response to Message 1087535.  

...though it seems Richard is a bit apprehensive about whether this will fix my problem or not. suppose its worth a shot - i vaguely understand what you're saying, but i'm unsure what specifically to do. am i understanding correctly that changes need to be made to S@H's app_info.xml file?

I wouldn't say I was 'apprehensive' - I don't think it will make your situation any worse - but I am perhaps 'pessimistic' - I don't think it'll make it any better, either.

Thanks very much for testing all those combinations, and taking the time and trouble to write them up so thoroughly. I'm sure it's helped all of us to understand much more clearly what is going to be involved in getting these OpenCL applications to run on a wider range of host machines.

It's also sparked off quite a lively debate within the developer community. First the bad news:

Q.
> Am I correct in assuming that defining an "OpenCL" plan class in
> sched_customize.cpp should be enough for clients that use anonymous platform and
> declare an application with the same plan class name in an app_info
> specification? Or does the plan class name have to actually occur somewhere in
> the database?

A.
Not relevant. plan class names in app_info.xml are independent
of what the server uses.

-- David

The questioner is Bernd Machenschalk, Administrator and Developer for the Einstein@Home project, and the reply is from David Anderson, Director and developer of the whole BOINC platform.

And the good news:

We'll add support in the near future for

1) getting OpenCL library version info and passing it to the server;
2) passing OpenCL device IDs to applications.

-- David

(the same David)

So your efforts have, very directly, brought forward the date when there is proper support for mixed-GPU computers like yours, even if it isn't quite here yet.
ID: 1087559
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1087566 - Posted: 16 Mar 2011, 22:33:14 UTC - in response to Message 1087535.  
Last modified: 16 Mar 2011, 22:34:56 UTC

which .xml file would i find this info in so i can give this a try?

TIA,
Eric

Claggy proposed editing this line, <cmdline>-period_iterations_num 2 -instances_per_device 1</cmdline>, changing the final 1 (the -instances_per_device value) to 2.
It's in app_info.xml in your SETI project dir.

Then use a BOINC config with both GPUs enabled.
I'm still unsure if this will fix your problem, but it's worth a try, at least to be sure :)

[The 2 ways that should work I listed earlier; unfortunately, both have their disadvantages]
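
For those who would rather script the edit than do it by hand, here is a minimal Python sketch of the change Claggy proposed: it finds the `<app_version>` with a given `<version_num>` and rewrites the value after `-instances_per_device` in its `<cmdline>`. The function name `set_instances_per_device` is hypothetical, and the sketch assumes an app_info.xml shaped like the one posted earlier in the thread. ElementTree does not preserve comments or the original layout, so keep a backup of the real file.

```python
import xml.etree.ElementTree as ET


def set_instances_per_device(app_info_xml, version_num, instances):
    """Return app_info XML text with -instances_per_device changed for one app version."""
    root = ET.fromstring(app_info_xml)
    for app_version in root.iter("app_version"):
        # Only touch the app version we were asked about (e.g. 610 for MB r177).
        if app_version.findtext("version_num") != str(version_num):
            continue
        cmdline = app_version.find("cmdline")
        if cmdline is None or not cmdline.text:
            continue
        args = cmdline.text.split()
        if "-instances_per_device" in args:
            pos = args.index("-instances_per_device")
            if pos + 1 < len(args):
                # Replace only the value that follows the switch.
                args[pos + 1] = str(instances)
                cmdline.text = " ".join(args)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = (
        "<app_info><app_version>"
        "<app_name>setiathome_enhanced</app_name>"
        "<version_num>610</version_num>"
        "<cmdline>-period_iterations_num 2 -instances_per_device 1</cmdline>"
        "</app_version></app_info>"
    )
    print(set_instances_per_device(sample, 610, 2))
```

The `<count>` inside `<coproc>` is deliberately left alone, in line with the advice to keep it at 1.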
ID: 1087566
Sunny129
Joined: 7 Nov 00
Posts: 190
Credit: 3,163,755
RAC: 0
United States
Message 1087581 - Posted: 16 Mar 2011, 23:08:39 UTC - in response to Message 1087559.  

Thanks very much for testing all those combinations, and taking the time and trouble to write them up so thoroughly. I'm sure it's helped all of us to understand much more clearly what is going to be involved in getting these OpenCL applications to run on a wider range of host machines.

not a problem...it's the least i could do considering all the interest people have taken in this thread and all the help i've received so far. while nobody has yet been able to solve my problem straight up, all of you have helped me to understand MUCH better why i have the problem i have. on that note, if there is anything specific that anyone would like me to test on this host (regarding the S@H MB GPU app and BOINC v6.12.18), let me know...


It's also sparked off quite a lively debate within the developer community. First the bad news:

Q.
> Am I correct in assuming that defining an "OpenCL" plan class in
> sched_customize.cpp should be enough for clients that use anonymous platform and
> declare an application with the same plan class name in an app_info
> specification? Or does the plan class name have to actually occur somewhere in
> the database?

A.
Not relevant. plan class names in app_info.xml are independent
of what the server uses.

-- David

The questioner is Bernd Machenschalk, Administrator and Developer for the Einstein@Home project, and the reply is from David Anderson, Director and developer of the whole BOINC platform.

And the good news:

We'll add support in the near future for

1) getting OpenCL library version info and passing it to the server;
2) passing OpenCL device IDs to applications.

-- David

(the same David)

So your efforts have, very directly, brought forward the date when there is proper support for mixed-GPU computers like yours, even if it isn't quite here yet.

excellent news. it feels good to know that my host's problems are being talked about by the developers and are actually helping to advance distributed computing on mixed-GPU hosts. i'm assuming no "time table" was suggested or laid out though? also, is this conversation taking place on the lunatics forums?


Claggy proposed editing this line, <cmdline>-period_iterations_num 2 -instances_per_device 1</cmdline>, changing the final 1 (the -instances_per_device value) to 2.
It's in app_info.xml in your SETI project dir.

Then use a BOINC config with both GPUs enabled.
I'm still unsure if this will fix your problem, but it's worth a try, at least to be sure :)

[The 2 ways that should work I listed earlier; unfortunately, both have their disadvantages]

thanks for the clarification. i'll give this a try and report back...
ID: 1087581
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1087590 - Posted: 16 Mar 2011, 23:41:30 UTC - in response to Message 1087581.  

also, is this conversation taking place on the lunatics forums?

On BOINC dev mail list
ID: 1087590
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14679
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1087591 - Posted: 16 Mar 2011, 23:44:32 UTC - in response to Message 1087581.  

... also, is this conversation taking place on the lunatics forums?

No, the Lunatics group and forums are a loose collection of interested volunteers, with no official standing. The official main-line developers prefer to use email - which I find odd, since they write the message board software for the rest of us to use. The quotes came from an email distribution list called 'boinc_dev', described as

For people developing, debugging or porting the BOINC software (client, server, and Web). Do NOT post questions about how to use the software.
ID: 1087591

©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.