ATI OpenCL MultiBeam 6.10 problem..

Richard Haselgrove Project Donor
Volunteer tester

Send message
Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1085693 - Posted: 10 Mar 2011, 14:31:00 UTC - in response to Message 1085687.  

It's an HD 3300 (integrated video), so he can't remove it. He can either just disable Boinc's use of it and still use it as his main display (as I suggested),
or disable it totally in the BIOS and use the HD5870 for the display instead (as Skildude suggested).

Claggy

But if he doesn't have a <use_all_gpus> - which is a BOINC-only config, nothing to do with any OS usage - shouldn't BOINC only try to run GPU apps on the "best" card automatically, with no further work?
ID: 1085693 · Report as offensive
Claggy
Volunteer tester

Send message
Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1085696 - Posted: 10 Mar 2011, 14:42:38 UTC - in response to Message 1085693.  
Last modified: 10 Mar 2011, 14:52:22 UTC

It's an HD 3300 (integrated video), so he can't remove it. He can either just disable Boinc's use of it and still use it as his main display (as I suggested),
or disable it totally in the BIOS and use the HD5870 for the display instead (as Skildude suggested).

Claggy

But if he doesn't have a <use_all_gpus> - which is a BOINC-only config, nothing to do with any OS usage - shouldn't BOINC only try to run GPU apps on the "best" card automatically, with no further work?


Yes, Boinc should only use the best card, or is it the first card?

I've now looked at his host at Milkyway, where it's producing inconclusive results,

Here the HD5870 is device 1 (in CAL Speak, as Opposed to device 0 in OpenCL Speak):

instructed by BOINC client to use device 1
CPU: AMD Phenom(tm) II X6 1090T Processor (6 cores/threads) 3.71266 GHz (189ms)

CAL Runtime: 1.4.900
Found 2 CAL devices

Device 0: ATI Radeon HD2350/2400/3200/4200 (RV610/RV620) 341 MB local RAM (remote 64 MB cached + 288 MB uncached)
GPU core clock: 700 MHz, memory clock: 667 MHz
40 shader units organized in 1 SIMDs with 8 VLIW units (5-issue), wavefront size 32 threads
not supporting double precision

Device 1: ATI Radeon HD5800 series (Cypress) 2048 MB local RAM (remote 64 MB cached + 128 MB uncached)
GPU core clock: 850 MHz, memory clock: 1200 MHz
1600 shader units organized in 20 SIMDs with 16 VLIW units (5-issue), wavefront size 64 threads
supporting double precision

Device 1 not available or not supported.

PLEASE CHECK YOUR HARDWARE AND BOINC CONFIGURATION!

Trying to reassign the WU to another device.
1 WUs already running on GPU 1
Starting WU on GPU 1


So device 0 needs to be disabled not device 1.

Claggy
ID: 1085696 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Send message
Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1085697 - Posted: 10 Mar 2011, 14:57:04 UTC - in response to Message 1085696.  

Yes, Boinc should only use the best card, or is it the first card?

BOINC is designed/written to use the best card only - as many iterations of it as there are - plus anything within (IIRC) 30% or so of 'best'. Sequence doesn't matter.

I remember problems in the early days when, for example, the drivers evaluated the two halves of a NV295 slightly differently (less memory on one side), and BOINC would only use one half - until the 30% 'near miss' was introduced.

For the NV world, I remember David writing explicitly what was compared, and in what order - something like Compute Capability first, then memory, then shader count, lastly raw clock speed. Or something like that - posting unchecked from memory only.

I don't know whether a similar value-judgement is made for ATI cards - I haven't got one, and I haven't followed developments there so closely. Might call for a bit of code-reading to find out if it is (I expect so), and what metrics are considered if so.
ID: 1085697 · Report as offensive
Profile Sunny129
Avatar

Send message
Joined: 7 Nov 00
Posts: 190
Credit: 3,163,755
RAC: 0
United States
Message 1085699 - Posted: 10 Mar 2011, 14:59:15 UTC - in response to Message 1085687.  
Last modified: 10 Mar 2011, 15:04:54 UTC

It's an HD 3300 (integrated video), so he can't remove it. He can either just disable Boinc's use of it and still use it as his main display (as I suggested),
or disable it totally in the BIOS and use the HD5870 for the display instead (as Skildude suggested).

Claggy

i'm gonna give the former a try first. if a simple cc_config.xml file can keep BOINC from trying to use a 2nd GPU (GPU_1), regardless of whether that GPU is the integrated HD 3300 or a duplicate of the 5870, then that would be the ideal solution, as it would give me the most flexibility. if BOINC is recognizing GPU_1 as the HD 3300 integrated video, then the cc_config.xml file will allow me to leave the HD 3300 enabled and continue to use it for my display. however i doubt this is even the case, as i've shown above that the two GPU's recognized server-side are both 5870's. either way, i see no reason for this "cc_config.xml file" fix not to work. so i'll try it first. if all else fails, then i'll consider disabling the integrated video altogether and using the 5870 for crunching and display purposes.




The BOINC server database doesn't store proper records for each and every GPU card in our systems - would that it did. All you get in the host_detail display is the "best" GPU of each type, and a count of the total number. You should see much better information in your local message log at startup - the BOINC client does keep track of the different cards, it just turns into a summary when it gets to the server.

Actually, BOINC shouldn't be using the secondary HD 3300 card at all, unless you've consciously put in a <use_all_gpus> directive. I haven't read the whole thread, but from what Claggy says, you'd be better off taking that back out again.

i most certainly did not go out of my way to add a <use_all_gpus> directive to an .xml file of any sort. i'm unsure about the situation b/c 1) while my host details show the best GPU (the HD 5870) as you suggested it would, it shows not 1 but 2 of them, and 2) my local message log on startup shows recognition of both the 5870 discrete GPU and the 3300 integrated GPU. so i really don't know whether the 2nd recognized GPU (GPU_1) is in fact the 5870 or the 3300. but it shouldn't matter which one it is anyways b/c the cc_config.xml file should disable it in BOINC.



Yes, Boinc should only use the best card, or is it the first card?

I've now looked at his host at Milkyway, where it's also erroring out Wu's

i don't think those MW@H tasks errored out. they just haven't been validated yet. yes, i did get some errored WU's when i first started testing the 5870 w/ MW@H tasks several days ago. but since then all of my MW@H tasks have been crunching to completion without error. at any rate, when i suspend S@H and resume MW@H, the 5870 crunches 2 MW@H tasks at a time and seems to have no problems doing it. i certainly don't encounter errors like i have been with S@H.


i guess the bottom line is that i need to experiment with the cc_config.xml file and figure out if disabling one GPU in BOINC is going to make S@H work correctly but mess things up for MW@H, and if disabling the other GPU in BOINC is going to make MW@H work correctly but mess things up for S@H. i'll also check for device #'s in the startup lines of the BOINC message log.
ID: 1085699 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Send message
Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1085701 - Posted: 10 Mar 2011, 15:00:08 UTC - in response to Message 1085696.  

So device 0 needs to be disabled not device 1.

Claggy

Or hopefully no devices need to be disabled, if he can revert to 'best' only.

If anything needs to be disabled, I'd use the device # from BOINC's own enumeration in the startup lines of the message log.
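
For anyone who wants to pull the device numbers out of a saved copy of the message log, here is a minimal sketch. It assumes the startup lines contain the text "ATI GPU N: description", as in the log excerpts quoted elsewhere in this thread; `parse_ati_gpu_line` is a made-up helper name, not anything that ships with BOINC.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not part of BOINC): extracts the device number and
// description from a startup line that contains "ATI GPU N: <description>".
// Returns false and leaves the outputs untouched if the line does not match.
bool parse_ati_gpu_line(const std::string& line, int& device, std::string& description) {
    static const std::regex pattern(R"(ATI GPU (\d+): (.*))");
    std::smatch match;
    if (!std::regex_search(line, match, pattern)) {
        return false;
    }
    device = std::stoi(match[1].str());
    description = match[2].str();
    return true;
}
```

Run it over each line of the startup section and note which number sits next to the card you want BOINC to leave alone.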
ID: 1085701 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Send message
Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1085703 - Posted: 10 Mar 2011, 15:07:26 UTC - in response to Message 1085699.  

... while my host details show the best GPU (the HD 5870) as you suggested it would, it shows not 1 but 2 of them,

It's actually trying to say "Your best card is a HD 5870, and you have two GPUs in total" - but not making a very good job of it. It doesn't actually tell you anything at all about the second (or subsequent) card - neither that it is the same as the first, nor that it is different.

my local message log on startup shows recognition of both the 5870 discrete GPU and the 3300 integrated GPU. so i really don't know whether the 2nd recognized GPU (GPU_1) is in fact the 5870 or the 3300.

Yes, go by what the local message log is telling you - that's closest to the scene of the action. Doesn't it have device numbers alongside the card descriptions? Why not post a copy of the startup section here for us to pore over.
ID: 1085703 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Send message
Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1085712 - Posted: 10 Mar 2011, 15:28:24 UTC

OK, here's some code from coproc_detect.

For NVidia GPUs:
// return 1/-1/0 if device 1 is more/less/same capable than device 2.
// factors (decreasing priority):
// - compute capability
// - software version
// - memory
// - speed
//
// If "loose", ignore FLOPS and tolerate small memory diff
//
int cuda_compare(COPROC_CUDA& c1, COPROC_CUDA& c2, bool loose) {


For ATI GPUs:
// criteria:
//
// - double precision support
// - local RAM
// - speed
//
int ati_compare(COPROC_ATI& c1, COPROC_ATI& c2, bool loose) {


'loose' applies a tolerance of +1.4x, -0.7x, to video RAM size only.
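
To make the ordering concrete, here is a rough sketch of how those three ATI criteria could be applied in sequence. This is not the BOINC source: `GpuInfo` and `compare_gpus` are hypothetical names, the struct fields are assumptions, and treating speed as ignored in 'loose' mode is an assumption borrowed from the "ignore FLOPS" comment on the NVidia side.

```cpp
#include <string>

// Hypothetical stand-in for BOINC's COPROC_ATI; the field names are assumptions.
struct GpuInfo {
    std::string name;
    bool double_precision;
    double local_ram_mb;
    double peak_gflops;
};

// Returns 1, -1 or 0 if c1 is more, less or equally capable than c2.
// Criteria in decreasing priority: double precision, local RAM, speed.
int compare_gpus(const GpuInfo& c1, const GpuInfo& c2, bool loose) {
    if (c1.double_precision && !c2.double_precision) return 1;
    if (!c1.double_precision && c2.double_precision) return -1;
    if (loose) {
        // Tolerance of +1.4x / -0.7x on video RAM size only.
        if (c1.local_ram_mb > 1.4 * c2.local_ram_mb) return 1;
        if (c1.local_ram_mb < 0.7 * c2.local_ram_mb) return -1;
        return 0;  // assumption: speed is not considered in loose mode
    }
    if (c1.local_ram_mb > c2.local_ram_mb) return 1;
    if (c1.local_ram_mb < c2.local_ram_mb) return -1;
    if (c1.peak_gflops > c2.peak_gflops) return 1;
    if (c1.peak_gflops < c2.peak_gflops) return -1;
    return 0;
}
```

With a card that supports double precision against one that doesn't, the very first test decides the result, so the RAM tolerance never comes into play.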
ID: 1085712 · Report as offensive
Profile Sunny129
Avatar

Send message
Joined: 7 Nov 00
Posts: 190
Credit: 3,163,755
RAC: 0
United States
Message 1085717 - Posted: 10 Mar 2011, 15:40:23 UTC - in response to Message 1085703.  

Yes, go by what the local message log is telling you - that's closest to the scene of the action. Doesn't it have device numbers alongside the card descriptions? Why not post a copy of the startup section here for us to pore over.

b/c i'm currently at work, away from the host we're trying to troubleshoot right now. so i won't be able to check for device #'s in the BOINC message log until i get home this evening...but when i do, i'll copy it and post it up for scrutiny.
ID: 1085717 · Report as offensive
Profile dnolan
Avatar

Send message
Joined: 30 Aug 01
Posts: 1228
Credit: 47,779,411
RAC: 32
United States
Message 1085723 - Posted: 10 Mar 2011, 15:56:41 UTC

FYI, in my dual HD 5870 system, the lines look like this:
[---] ATI GPU 0: ATI Radeon HD 5800 series (Cypress) ...
[---] ATI GPU 1: ATI Radeon HD 5800 series (Cypress) ...

-Dave
ID: 1085723 · Report as offensive
Profile Sunny129
Avatar

Send message
Joined: 7 Nov 00
Posts: 190
Credit: 3,163,755
RAC: 0
United States
Message 1085870 - Posted: 10 Mar 2011, 23:29:20 UTC - in response to Message 1085723.  

well i just got home from work and started to do some investigating. like i said previously, i never went out of my way to give BOINC a <use_all_gpus> directive. however, upon looking at the default cc_config.xml file, i noticed it does contain a <use_all_gpus> directive...here's what it looks like:
<cc_config>
<options>
<use_all_gpus>1</use_all_gpus>
</options>
</cc_config>



...and here's the whole start-up section for scrutiny:
Starting BOINC client version 6.10.58 for windows_intelx86
Config: use all coprocessors
log flags: file_xfer, sched_ops, task
Libraries: libcurl/7.19.7 OpenSSL/0.9.8l zlib/1.2.3
Data directory: D:\Documents and Settings\All Users\Application Data\BOINC
Running under account Eric
Processor: 6 AuthenticAMD AMD Phenom(tm) II X6 1090T Processor [Family 16 Model 10 Stepping 0]
Processor: 512.00 KB cache
Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 htt pni cx16 syscall nx lm svm sse4a osvw ibs skinit wdt page1gb rdtscp 3dnowext 3dnow
OS: Microsoft Windows XP: Professional x86 Edition, Service Pack 3, (05.01.2600.00)
Memory: 3.00 GB physical, 4.84 GB virtual
Disk: 109.47 GB total, 96.27 GB free
Local time is UTC -5 hours
ATI GPU 0: ATI Radeon HD 2300/2400/3200 (RV610) (CAL version 1.4.900, 341MB, 56 GFLOPS peak)
ATI GPU 1: ATI Radeon HD5800 series (Cypress) (CAL version 1.4.900, 2048MB, 2720 GFLOPS peak)
SETI@home Found app_info.xml; using anonymous platform
Einstein@Home URL http://einstein.phys.uwm.edu/; Computer ID 3897627; resource share 50
lhcathome URL http://lhcathome.cern.ch/lhcathome/; Computer ID 9906494; resource share 900
Milkyway@home URL http://milkyway.cs.rpi.edu/milkyway/; Computer ID 267861; resource share 50
SETI@home URL http://setiathome.berkeley.edu/; Computer ID 5800349; resource share 50
SETI@home General prefs: from SETI@home (last modified 10-Mar-2011 07:26:58)
SETI@home Computer location: school
General prefs: using separate prefs for school
Reading preferences override file
Preferences:
max memory usage when active: 3070.10MB
max memory usage when idle: 3070.10MB
max disk usage: 10.00GB
(to change preferences, visit the web site of an attached project, or select Preferences in the Manager)
Not using a proxy


ID: 1085870 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Send message
Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1085875 - Posted: 10 Mar 2011, 23:55:04 UTC - in response to Message 1085870.  

well i just got home from work and started to do some investigating. like i said previously, i never went out of my way to give BOINC a <use_all_gpus> directive. however, upon looking at the default cc_config.xml file, i noticed it does contain a <use_all_gpus> directive...here's what it looks like:
<cc_config>
<options>
<use_all_gpus>1</use_all_gpus>
</options>
</cc_config>


There's actually no such thing as a default cc_config.xml file. BOINC doesn't supply one, and 99.9% of users run BOINC without one. Probably, some well-meaning user of this message board suggested that you create one, so long ago that you've forgotten (what's the 'modified' datestamp on that file?).

Just delete the file, restart BOINC (you have to restart, just re-reading the file isn't enough for that particular change), and things should be easier from there on. You'll lose the

Config: use all coprocessors

line from messages, for a start.
ID: 1085875 · Report as offensive
Profile BilBg
Volunteer tester
Avatar

Send message
Joined: 27 May 07
Posts: 3720
Credit: 9,385,827
RAC: 0
Bulgaria
Message 1085881 - Posted: 11 Mar 2011, 0:13:52 UTC - in response to Message 1085870.  

well i just got home from work and started to do some investigating. like i said previously, i never went out of my way to give BOINC a <use_all_gpus> directive. however, upon looking at the default cc_config.xml file, i noticed it does contain a <use_all_gpus> directive...here's what it looks like:
<cc_config>
<options>
<use_all_gpus>1</use_all_gpus>
</options>
</cc_config>


There is no such thing as a default cc_config.xml.
Only you (or another person at the computer) could have put it there manually!

A BOINC installation does not include a cc_config.xml file.


 


- ALF - "Find out what you don't do well ..... then don't do it!" :)
 
ID: 1085881 · Report as offensive
Profile Sunny129
Avatar

Send message
Joined: 7 Nov 00
Posts: 190
Credit: 3,163,755
RAC: 0
United States
Message 1085967 - Posted: 11 Mar 2011, 3:47:39 UTC - in response to Message 1085875.  

There's actually no such thing as a default cc_config.xml file. BOINC doesn't supply one, and 99.9% of users run BOINC without one. Probably, some well-meaning user of this message board suggested that you create one, so long ago that you've forgotten (what's the 'modified' datestamp on that file?).

Just delete the file, restart BOINC (you have to restart, just re-reading the file isn't enough for that particular change), and things should be easier from there on. You'll lose the

Config: use all coprocessors

line from messages, for a start.

thanks for pointing that out. is it possible that the Lunatics v0.37 unified installer placed that cc_config.xml file in the BOINC data directory? i just installed it a few days ago. either way, i removed the file from the directory and restarted BOINC. same thing as before - one task begins crunching without a problem, but as soon as a 2nd tries to start, it errors out. i have to keep the stack of them suspended and resume one at a time, making sure that i don't resume one before another is finished crunching lol.
ID: 1085967 · Report as offensive
Profile BilBg
Volunteer tester
Avatar

Send message
Joined: 27 May 07
Posts: 3720
Credit: 9,385,827
RAC: 0
Bulgaria
Message 1085989 - Posted: 11 Mar 2011, 4:54:34 UTC - in response to Message 1085967.  
Last modified: 11 Mar 2011, 5:25:27 UTC

is it possible that the Lunatics v0.37 unified installer placed that cc_config.xml file in the BOINC data directory?


No

All changes that it does are only in ...\projects\setiathome.berkeley.edu\


i removed the file from the directory and restarted BOINC. same thing as before - one task begins crunching without a problem, but as soon as a 2nd tries to start, it errors out.


"from the directory" - which directory exactly?
Copy/Paste the full path here.


If you don't have a cc_config.xml file in the BOINC data directory and you have two distinctly different GPUs, BOINC is not supposed to start a second GPU task (on the weaker GPU).

And you can check whether the line "Config: use all coprocessors" has (or has not?) disappeared from the messages.


BOINC will start a second GPU task (on the same GPU) if you manually edited app_info.xml and changed the number "1" to "0.5" in the following:
<coproc>
   <type>ATI</type>
   <count>1</count>
</coproc>
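
The arithmetic behind that, as a small sketch: it assumes every task asks for the same fraction of one GPU, and `tasks_per_gpu` is a hypothetical helper for illustration, not how the BOINC scheduler is actually written.

```cpp
#include <cmath>

// Hypothetical helper: how many tasks fit on one GPU when each task
// requests `count` of a GPU (the <count> value in app_info.xml).
// Assumes all tasks request the same fraction.
int tasks_per_gpu(double count) {
    if (count <= 0.0) {
        return 0;  // a non-positive request is not meaningful here
    }
    // Small epsilon guards against floating-point round-off in 1.0 / count.
    return static_cast<int>(std::floor(1.0 / count + 1e-9));
}
```

So a count of 1 gives one task per GPU, and a count of 0.5 gives two tasks on the same GPU.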


 


- ALF - "Find out what you don't do well ..... then don't do it!" :)
 
ID: 1085989 · Report as offensive
Profile BilBg
Volunteer tester
Avatar

Send message
Joined: 27 May 07
Posts: 3720
Credit: 9,385,827
RAC: 0
Bulgaria
Message 1086009 - Posted: 11 Mar 2011, 8:08:10 UTC - in response to Message 1085989.  


I know only one tool (Script) that will create cc_config.xml automatically
but I doubt you know/use it:

Script for disabling/enabling GPUs on the fly
http://setiathome.berkeley.edu/forum_thread.php?id=62127&nowrap=true#1064910


 


- ALF - "Find out what you don't do well ..... then don't do it!" :)
 
ID: 1086009 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Send message
Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1086028 - Posted: 11 Mar 2011, 9:35:16 UTC - in response to Message 1085967.  

is it possible that the Lunatics v0.37 unified installer placed that cc_config.xml file in the BOINC data directory?

No, it isn't.

You can examine the contents of the Lunatics installer with 7-zip or any compression/archive tool that supports LZMA compression - you'll see all the files it contains, and cc_config.xml isn't one of them.
ID: 1086028 · Report as offensive
Claggy
Volunteer tester

Send message
Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1086035 - Posted: 11 Mar 2011, 10:25:34 UTC - in response to Message 1085967.  

Try disabling the HD3300 (device 0) now with the cc_config.xml I posted earlier.
Boinc doesn't know which GPUs are OpenCL capable and which aren't,
so it will continue to try to run an instance on each GPU, and fail.

Claggy
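
For reference, a small sketch that builds a cc_config.xml telling BOINC to skip particular ATI device numbers. It assumes the `<ignore_ati_dev>` option described in BOINC's client configuration documentation; the file posted earlier is not reproduced on this page, so this is not necessarily identical to it. `make_cc_config` is a hypothetical helper name.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: returns the text of a cc_config.xml that asks BOINC
// to ignore the listed ATI device numbers. Assumes the <ignore_ati_dev>
// option; use the device numbers from BOINC's own startup messages.
std::string make_cc_config(const std::vector<int>& ignored_ati_devices) {
    std::ostringstream xml;
    xml << "<cc_config>\n";
    xml << "  <options>\n";
    for (int device : ignored_ati_devices) {
        xml << "    <ignore_ati_dev>" << device << "</ignore_ati_dev>\n";
    }
    xml << "  </options>\n";
    xml << "</cc_config>\n";
    return xml.str();
}
```

Calling it with the single device number 0 would produce a file that ignores the card BOINC lists as "ATI GPU 0". The resulting text goes into cc_config.xml in the BOINC data directory, and it is probably safest to restart BOINC afterwards, as was needed for the <use_all_gpus> change.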
ID: 1086035 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Send message
Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1086040 - Posted: 11 Mar 2011, 10:47:41 UTC - in response to Message 1086035.  

I'd like to see him take one step at a time, and verify explicitly whether or not "Config: use all coprocessors" has been properly eliminated from his startup messages.

The two cards that BOINC detected:

ATI GPU 0: ATI Radeon HD 2300/2400/3200 (RV610) (CAL version 1.4.900, 341MB, 56 GFLOPS peak)
ATI GPU 1: ATI Radeon HD5800 series (Cypress) (CAL version 1.4.900, 2048MB, 2720 GFLOPS peak)

are so dissimilar that they should always fail ati_compare, no matter whether 'loose' is specified.

If <use_all_gpus> has been properly de-activated, yet two tasks are still being attempted on different gpus (which isn't completely clear from his description), then we have a BOINC bug on our hands.
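
The expected behaviour can be sketched roughly as follows: find the best device with a strict comparison, then keep only the devices that a loose comparison rates as equal to it, unless <use_all_gpus> is set. This is an illustration under those assumptions, not the client's code; `Device`, `Comparator` and `devices_to_use` are hypothetical names, and the comparison function is passed in rather than reimplemented.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical device record; not BOINC's actual type.
struct Device {
    int number;
    bool double_precision;
    double local_ram_mb;
};

// Comparator contract: returns 1 / -1 / 0 if a is more / less / equally
// capable than b; the bool selects the 'loose' comparison.
using Comparator = std::function<int(const Device&, const Device&, bool)>;

// Returns the device numbers that would be given work.
std::vector<int> devices_to_use(const std::vector<Device>& found,
                                bool use_all_gpus,
                                const Comparator& compare) {
    std::vector<int> chosen;
    if (found.empty()) {
        return chosen;
    }
    // Strict pass: locate the most capable device.
    std::size_t best = 0;
    for (std::size_t i = 1; i < found.size(); ++i) {
        if (compare(found[i], found[best], false) > 0) {
            best = i;
        }
    }
    // Loose pass: keep the best device and any near misses.
    for (const Device& d : found) {
        if (use_all_gpus || compare(d, found[best], true) == 0) {
            chosen.push_back(d.number);
        }
    }
    return chosen;
}
```

Under this sketch, a device without double precision can never tie with one that has it, so only the stronger card would be chosen once <use_all_gpus> is gone.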
ID: 1086040 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Send message
Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1086041 - Posted: 11 Mar 2011, 11:01:06 UTC

Aaaaaarrgghh! He's using v6.10.58, and that doesn't do ATI properly yet - no ati_compare.

Upgrade to v6.12.18, or use Claggy's workround - I'm outa here.
ID: 1086041 · Report as offensive
Profile Sunny129
Avatar

Send message
Joined: 7 Nov 00
Posts: 190
Credit: 3,163,755
RAC: 0
United States
Message 1086048 - Posted: 11 Mar 2011, 12:32:29 UTC - in response to Message 1085989.  
Last modified: 11 Mar 2011, 12:39:23 UTC

i removed the file from the directory and restarted BOINC. same thing as before - one task begins crunching without a problem, but as soon as a 2nd tries to start, it errors out.


"from the directory" - which directory exactly?
Copy/Paste the full path here.

D:\Documents and Settings\All Users\Application Data\BOINC

If you don't have a cc_config.xml file in the BOINC data directory and you have two distinctly different GPUs, BOINC is not supposed to start a second GPU task (on the weaker GPU).

And you can check whether the line "Config: use all coprocessors" has (or has not?) disappeared from the messages.

yes, the "config: use all coprocessors" did not reappear in the start-up dialog after removing the cc_config.xml file from the above directory and restarting BOINC.

Aaaaaarrgghh! He's using v6.10.58, and that doesn't do ATI properly yet - no ati_compare.

Upgrade to v6.12.18, or use Claggy's workround - I'm outa here.

hmm...i didn't even know there was a BOINC v6.12.18. even the BOINC website still only has v6.10.58 for download. where would i even get that? a google search doesn't even bring anything up about v6.12.18?

*EDIT* - i found BOINC v6.12.18 at the BOINC website...it wasn't easy to find. at any rate, i won't have time to install and test it this morning before work, so it'll have to wait until this evening. thanks again for the tips everyone, and i'll keep you posted on my progress.


...btw Claggy, i did put the alternative cc_config.xml file you suggested above into the BOINC data directory after removing the one with the <use_all_gpus> directive in it. and that actually caused all tasks to error out, including the first one, which i wasn't having a problem with before.
ID: 1086048 · Report as offensive


 