ATI OpenCL MultiBeam 6.10 problem..
Author | Message |
---|---|
Sunny129 Joined: 7 Nov 00 Posts: 190 Credit: 3,163,755 RAC: 0 |
"May I, yet again, plead that you - everyone - try to understand what is going on, and propose changes that are appropriate both to the hardware being used, and the skill level of the person using it? We all might learn something then." i appreciate you looking out for those of us who aren't well versed in the workings of BOINC and S@H itself. but at the same time i understand that some of the brighter and more knowledgeable folks on the forums will have a more difficult time "dumbing it down" to layman's terms than they will just explaining things as best they know how. sure, a lot of it may seem like Greek to me at first, but even if i have to work a little harder to understand all the various explanations thrown at me, believe me i'm still learning lots from all of you...and for that, i thank you all. "If setting <ignore_ati_dev>1</ignore_ati_dev> results in the device which BOINC itself describes as device 0 being ignored, and device 1 being used without error, then that's good to know. I'm not quite sure how we're going to write it up in the FAQ, though." sorry, i should have specified that i didn't upgrade to BOINC v6.12.18 yet - i'm still on v6.10.58. the only thing i changed was the device # in the <ignore_ati_dev>n</ignore_ati_dev> directive of the cc_config.xml file from 0 to 1. with regard to the start-up log, it oddly reads like this now: Starting BOINC client version 6.10.58 for windows_intelx86 |
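For readers following along, here is a minimal sketch of the cc_config.xml being discussed. It assumes the usual layout, in which client options sit inside an <options> element within <cc_config>, and that the file lives in the BOINC data directory; the value 1 simply mirrors the change described in the post above.

```xml
<!-- cc_config.xml: minimal sketch, not a copy of the poster's actual file -->
<cc_config>
  <options>
    <!-- Tell BOINC to ignore the ATI device it lists as "ATI GPU 1" in the start-up log.
         Change the number to 0 to ignore "ATI GPU 0" instead. -->
    <ignore_ati_dev>1</ignore_ati_dev>
  </options>
</cc_config>
```

GPU detection happens when the client starts, so the client most likely needs to be restarted before a changed device number shows up in the start-up log.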
Richard Haselgrove Joined: 4 Jul 99 Posts: 14679 Credit: 200,643,578 RAC: 874 |
Positively my last for tonight - but somebody with access to two mis-matched ATI cards, and also access to the boinc_alpha bug-reporting list, needs to check that out under v6.12.18 and act accordingly. |
Sunny129 Joined: 7 Nov 00 Posts: 190 Credit: 3,163,755 RAC: 0 |
rats! well i thought all my problems had gone away, save for the minor change in the way MW@H tasks are crunched (but that really wasn't a problem anyways). my new problem is that, after my 5870 finished all the MW@H tasks in the queue (i accidentally had MW@H set to "no new tasks"), BOINC would no longer recognize my 5870 as a double precision GPU, and consequently it would not download new MW@H tasks (even after i removed the "no new tasks" limitation). the log shows the following: Milkyway@home resumed by user why MW@H was working fine just a few minutes ago is beyond me. perhaps MW@H recognizes the 5870 as GPU_1 (the GPU i disabled in the cc_config.xml file) even though S@H sees it the other way around?..in other words, maybe the "ATI GPU 1 (ignored by config): ATI Radeon HD5800 series (Cypress) (CAL version 1.4.900, 2048MB, 2720 GFLOPS peak)" line in the start-up log has something to do with it? or perhaps it's time to try BOINC v6.12.18? i have no idea if it'll fix MW@H's failure to recognize my 5870 as a double precision GPU. i know this isn't the MW@H forums, but before i start posting about it there, does anyone have any suggestions? *EDIT* - i should also note that i've given up for the evening. i've been messing w/ this stuff since i got home from work, so i'm taking the rest of the night off and will get back on this issue tomorrow afternoon or something. fortunately, the S@H GPU app and the E@H CPU app are working fine together, and that's all that matters for now... |
BilBg Joined: 27 May 07 Posts: 3720 Credit: 9,385,827 RAC: 0 |
Funny, Very funny :)) Or is it "Impressive! Much impressive!" (Darth Vader) - ALF - "Find out what you don't do well ..... then don't do it!" :) |
Claggy Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4 |
So disable the right GPU, post your startup messages then, and tell us why those Wu's error out, they shouldn't. Claggy |
Claggy Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4 |
I don't know which card Boinc should have picked up as the primary GPU, probably the HD3300 since that's the one with the monitor plugged in, so why call them both HD5800's? A lot of the problems here and at Milkyway (I've seen his thread there now) come from the fact that Boinc can detect multiple Cuda or ATI devices that have different capabilities, but Boinc just knows them as Cuda or CAL devices; as far as it cares they are the same. So here Boinc tries to run one instance of a CAL app on each GPU. It can't run them on both GPUs because the app is an OpenCL app and only one GPU is OpenCL capable: one app runs O.K., the other waits for a device to become available, then errors out when it isn't. Boinc at Milkyway, on the other hand, again tries to run an app on each GPU but can't, because the first GPU isn't DP capable. This app, instead of erroring out, switches to the other GPU, which is already running one app. That's not a problem, they'll just take longer. Since you've disabled Boinc's use of the HD5870, Boinc will only use the HD3300. Here at Seti, Boinc will tell the app to run on CAL device 0 (the HD3300); the app starts, looks for OpenCL devices, finds OpenCL device 0 (the HD5870) and starts on it no problem (Boinc doesn't know any different). Meanwhile at Milkyway their server won't send you any fresh work because the HD3300 isn't DP capable. Boinc will still try to run that DP app on your HD3300 (it can't tell the difference); the app says it can't run there and switches to device 1 instead. The only way round this is to disable the HD3300, then deal with any problems that brings up. A question to Richard and others: what happens if you have a normal Cuda card as device 0 and a Fermi as device 1? Will they both show up as normal Cuda GPUs or as Fermis? And will they get stock 6.08/6.09 work or stock 6.10 work? Claggy |
skildude Joined: 4 Oct 00 Posts: 9541 Credit: 50,759,529 RAC: 60 |
I think Victor had brought this up before. He couldn't find a way to run multiple Fermi WUs while running 6.09 on the 295 card. I would assume you'd have to create special portions of the app_info to declare which app runs with which card, or to identify which card is which, but we are off topic a bit now. In a rich man's house there is no place to spit but his face. Diogenes Of Sinope |
Sunny129 Joined: 7 Nov 00 Posts: 190 Credit: 3,163,755 RAC: 0 |
So disable the right GPU, post your startup messages then, and tell us why those Wu's error out, they shouldn't. here's the start-up log: Starting BOINC client version 6.10.58 for windows_intelx86 ...but i think you probably hit the nail on the head in your subsequent response. i was seriously hoping to avoid having to disable the HD 3300 integrated video b/c it made for zero GUI lag, even w/ the CPU & GPU crunching at full tilt simultaneously. i suppose i'll just have to look up tips on minimizing GUI lag then... |
BilBg Joined: 27 May 07 Posts: 3720 Credit: 9,385,827 RAC: 0 |
"i was seriously hoping to avoid having to disable the HD 3300 integrated video b/c it made for zero GUI lag, even w/ the CPU & GPU crunching at full tilt simultaneously. i suppose i'll just have to look up tips on minimizing GUI lag then..." There is still no need to disable the HD 3300 integrated video in the BIOS (disabling it for BOINC will not disable it for Windows). What is the behavior of BOINC now that you are using <ignore_ati_dev>0</ignore_ati_dev> again? The next thing to try is: 1) Rename cc_config.xml (e.g. to ---cc_config.xml) to disable its usage (BOINC will not see it if it has any different name). 2) Install the newest (beta) BOINC 6.12.18 (32 bit) and see how it handles your two ATI GPUs: http://boinc.berkeley.edu/download_all.php Direct link: http://boinc.berkeley.edu/dl/boinc_6.12.18_windows_intelx86.exe - ALF - "Find out what you don't do well ..... then don't do it!" :) |
Claggy Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4 |
"...but i think you probably hit the nail on the head in your subsequent response. i was seriously hoping to avoid having to disable the HD 3300 integrated video b/c it made for zero GUI lag, even w/ the CPU & GPU crunching at full tilt simultaneously. i suppose i'll just have to look up tips on minimizing GUI lag then..." You shouldn't need to disable the HD3300 now that Boinc ignores it; you can still use it as your display card. If you get GUI lag or driver restarts, have a look at the release notes for the r177 ATI MB apps: "Command line parameter that application supports: -period_iterations_num <N> splits single longest PulseFind kernel call on N calls. -period_iterations_num 1 (default value). If you see lags in GUI or even driver restarts - add this parameter with value >1 (integer numbers)." Claggy |
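As a rough illustration of where such a parameter could go: with an anonymous-platform install, BOINC reads per-application command-line options from a <cmdline> element inside the matching <app_version> block of app_info.xml. The fragment below is a sketch under that assumption; the value 2 is only an example, and every other element of the block should stay exactly as the installer wrote it.

```xml
<!-- Fragment of app_info.xml: only the added <cmdline> line is shown.
     Assumption: the ATI MultiBeam app was installed via app_info.xml. -->
<app_version>
  <!-- existing app_name, version_num, coproc and file_ref elements stay unchanged -->
  <!-- Split the longest PulseFind kernel call into 2 calls to reduce GUI lag. -->
  <cmdline>-period_iterations_num 2</cmdline>
</app_version>
```

Larger values should shorten each kernel call further, presumably at some cost in throughput, so it seems sensible to raise the number only until the lag or driver restarts stop.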
Richard Haselgrove Joined: 4 Jul 99 Posts: 14679 Credit: 200,643,578 RAC: 874 |
I'm still thinking about trying to get BOINC to run at all, with a mixture of cards and a mixture of projects. That's the main issue we can learn from, to the benefit of future users. Sunny can work on GUI lag himself, using the tips here and the controls that the app developers have provided. As I see it, we're still testing v6.10.58: any of what follows may need to be reviewed and re-tested under v6.12.18 (or later). I don't know which card Boinc should have picked up as the primary GPU, probably the HD3300 since that's the one with the Monitor plugged in, so why call them both HD5800's? I don't think BOINC has (or needs) any concept of 'primary' or 'secondary' GPU: it just has a numbered list. And the client knows all about the differences: "both HD5800" is a server reporting limitation, and cosmetic only. We've got the device numbering sorted out. ATI GPU 0: ATI Radeon HD 2300/2400/3200 (RV610) (CAL version 1.4.900, 341MB, 56 GFLOPS peak) and <ignore_ati_dev>n</ignore_ati_dev> works as specified: n=0 ignores the 3300 for BOINC's purposes n=1 ignores the 5800 likewise. So far, so good. a lot of the problems here and at Milkyway (i've seen his thread there now) is that Boinc can detect Multiple Cuda or ATI devices that have different capabilities, So, for two different reasons, the only card which can actually crunch is the 5800: the 3300 is ruled out because it has (MW) no DP support, and (SETI) no OpenCL support. So, disabling device 0 should crack it, right? Apparently not. Comparing Sunny's posts of 11 Mar 2011 | 12:39:23 UTC and 11 Mar 2011 | 23:27:02 UTC, I think I'm getting: With Device 1 ignored - so as far as BOINC is concerned, only the 3300 available, both projects crunch OK, but MW refuses to supply new work because the device is not DP capable. SETI does supply new work. With Device 0 ignored - so BOINC confirms that the 5800 is available - both projects supply work, MW crunches it but SETI errors out. 
These presumably are the tasks like 1834414694 (reported 11 Mar 2011 | 23:12:15 UTC) with Number of period iterations for PulseFind setted to:2 I'm not sure in what sense Raistmer is using the word 'slot' here. In BOINC terminology a 'slot' is a scratchpad workspace allocated (by BOINC) for a task to use for temporary files. "Wait for free slot failed" makes no sense with the BOINC meaning of slot. He must mean some sort of OpenCL 'slot'. Compare with Sunny's successful SETI tasks: Running on device number: 0 I suspect that Sunny's computer (even with a GPU ignored) is reporting two BOINC device numbers, but only one OpenCL slot. Raistmer is (mis-)equating devices to slots, and trying to use the non-existent second OpenCL device. That sounds like an application bug to me. Overall, I'm uncomfortable with this idea that BOINC is managing the devices, but the applications feel it's OK to ignore BOINC's assignment: in effect, the applications are saying "no, I don't like the device you've given me, I'll steal somebody else's instead". For smooth inter-project collaboration, that sounds dangerous: let's hope BOINC v6.12 is better when we start testing it. A question to Richard and others, what happens if you have a normal Cuda card as device 0, and a Fermi as device 1 ?, They will be individually identified by the client, as above - which is what matters. In the server summary report, outsiders will only see the 'best' card, presumably the Fermi. When the client requests work, it will just ask for 'NVidia GPU' work, and the server should allocate work with the 'best' (i.e. fastest) app_version record. We hope that will be the v6.10 Fermi application (assuming stock), because v6.10 Fermi is backwards compatible and will run just fine on the older cards you're describing as "normal Cuda". The case that would worry me would be a stock cruncher who's been running a fast 2xx card for a while, and later adds a slow Fermi. 
Is there any danger that the server's "fastest" enumeration would clash with the client's "best" enumeration, and continue to allocate v6.09 cuda23 (for the sake of argument) app_versions, even with a Fermi available? I don't know. |
Sunny129 Joined: 7 Nov 00 Posts: 190 Credit: 3,163,755 RAC: 0 |
"You shouldn't need to disable the HD3300 now that Boinc ignores it, you can still use it as your display card" "There is still no need to disable in BIOS the HD 3300 integrated video" sorry for the misunderstanding guys. i just assumed that you meant the HD 3300 had to be completely disabled in windows (via the BIOS) b/c even with it disabled in BOINC only, it is still recognized by BOINC (just not used), and this seems to be causing my S@H tasks to error out. Bill, as you know, going from <ignore_ati_dev>0</ignore_ati_dev> to <ignore_ati_dev>1</ignore_ati_dev> allowed the S@H GPU app to run flawlessly, but caused problems for MW@H. going back to <ignore_ati_dev>0</ignore_ati_dev>, i'm still getting the same problem - MW@H works fine, but the instant i resume a S@H GPU task, it errors out. "So, disabling device 0 should crack it, right? Apparently not." that is correct - if MW@H GPU tasks already exist in the queue, then they will continue to crunch one at a time despite the fact that the GPU doing the crunching (the 5870) is in fact the one being ignored by BOINC. and the MW@H server does not replenish my queue with more MW@H GPU tasks for the reason you mentioned above - it sees that BOINC has disabled the double precision GPU, and that the enabled one (the HD 3300) is not DP capable. given you're of the opinion that this may be an application bug, is this something that needs to be addressed in the coding of the S@H GPU app, either Multibeam or Astropulse (or both)? or would it be worth my time to try out BOINC v6.12.18? BilBg thinks that this should be my next move. i could try it without a cc_config.xml file at first, and if i have device management and application runtime problems, i could then add a cc_config.xml file to see how it affects things... what's the consensus? |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14679 Credit: 200,643,578 RAC: 874 |
what's the consensus? If you can face the hassle, yes, I'd like to see the results of a controlled test with v6.12.18 please. Take it a step at a time, keep notes, and expect the unexpected. I would start by suspending any/all tasks 'ready to run'. Allow the tasks which are running to complete, and once they've uploaded, report them - so you're starting with a clean slate, just suspended tasks ready to release one at a time. Delete any cc_config.xml file and install BOINC v6.12.18 - let's see how it works "raw" first. Note the GPU detection lines from what will now be the 'Event log' - Ctrl+Shift+E, or from the Advanced menu. See how MW and SETI run. If you encounter any errors, make a note of the task name and reporting time, so we can match the task's internal error messages later. Try again, with as many of <ignore_ati_dev>0, <ignore_ati_dev>1, and <use_all_gpus> (one at a time, obviously), and let us know how you get on. That's my vote, anyway. Best of luck. |
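The configurations suggested above could be laid out as a small test matrix. This is only a sketch: it assumes the standard <cc_config>/<options> layout, the run numbering is an illustrative ordering rather than anything prescribed in the thread, and only one option should be active per run.

```xml
<!-- cc_config.xml test matrix: enable exactly one option per run,
     restart BOINC, and note the GPU detection lines from the Event log. -->
<cc_config>
  <options>
    <!-- Run 1: no cc_config.xml at all (delete or rename the file). -->
    <!-- Run 2: <ignore_ati_dev>0</ignore_ati_dev> -->
    <!-- Run 3: <ignore_ati_dev>1</ignore_ati_dev> -->
    <!-- Run 4: ask BOINC to use every GPU it detects, not only the most capable one. -->
    <use_all_gpus>1</use_all_gpus>
  </options>
</cc_config>
```

Keeping the untested variants as comments in the same file makes it easy to record which run produced which Event log lines and task errors.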
Sunny129 Joined: 7 Nov 00 Posts: 190 Credit: 3,163,755 RAC: 0 |
"what's the consensus?" thank you for the detailed advice. it sounds like a bit of a daunting task for a beginner/non-tester, but it's probably nothing i can't handle. just to make it clear ahead of time, i hope to be able to get through this testing by the end of the weekend, but i'm not making any promises b/c i do have some other obligations. but i WILL take it one step at a time, and make detailed notes of everything from the event log to task names, report times, and errors, should they occur. and with that, i'm off to uncharted waters (for me anyways =P)...wish me luck everyone :) |
Claggy Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4 |
Richard, this problem has been reported before by Pepi in the Lunatics ATI OpenCL MultiBeam beta testing thread. He disabled onboard GPU in the end, Claggy |
Sunny129 Joined: 7 Nov 00 Posts: 190 Credit: 3,163,755 RAC: 0 |
"Richard, this problem has been reported before by Pepi in the Lunatics ATI OpenCL MultiBeam beta testing thread." i can't see that thread, probably b/c i'm not a registered user/forum member yet. unfortunately, registration is currently disabled for whatever reasons, so it may be some time before i can get registered and read through that thread. regardless, you say that this "Pepi" member experienced the same issue as i'm having now, and in the end his solution was to disable the onboard GPU. did he disable it only in BOINC, or did he completely disable it in Windows (via the BIOS)? if he only disabled it in BOINC, did it prevent further S@H GPU tasks from erroring out? i ask b/c, as you already know, when i disable my onboard GPU in BOINC, it actually causes S@H GPU tasks to error out (as opposed to actually fixing things). so if it actually fixed things for him, i'm not sure he had the same problem as i have. then there's also the fact that i'm trying to get both S@H and MW@H GPU tasks working properly..for all i know, Pepi was just addressing a S@H-only issue. at any rate, it's quite a time-consuming setback not to be able to access that thread @ lunatics anytime soon. |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14679 Credit: 200,643,578 RAC: 874 |
"Richard, this problem has been reported before by Pepi in the Lunatics ATI OpenCL MultiBeam beta testing thread." And I posted immediately afterwards. Pepi had (and posted) the same error message as Sunny is receiving: BOINC assigns 1 device, slots 1 to 1 (including) will be checked. There were a few Q & A exchanges, but it never reached the "known issues" in the opening post. I'm still looking for Pepi's eventual workaround. Edit - did he really? I see "waiting for fix" at the bottom of page 21, and "Beta closed" on page 24. Nothing related to this problem in between. |
Claggy Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4 |
I believe he completely disabled his onboard GPU via the Bios: Pepi: Claggy |
Sunny129 Joined: 7 Nov 00 Posts: 190 Credit: 3,163,755 RAC: 0 |
hey thanks for having a look guys...i'd do it myself, but again, i cannot currently view that thread over there, nor can i register at this time... |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14679 Credit: 200,643,578 RAC: 874 |
hey thanks for having a look guys...i'd do it myself, but again, i cannot currently view that thread over there, nor can i register at this time... Lunatics registration is by invitation only at the moment - and be careful what you wish for, you might get an invite if you carry on like this.... ;-) See edit to my last post..... |