Modified SETI MB CUDA + opt AP package for full GPU utilization
Author | Message |
---|---|
Joined: 16 Jan 06 Posts: 1145 Credit: 3,936,993 RAC: 0 |
Hm.. I suppose that method works as well. It's just a little extra work to have to go and modify preferences and all of that. As I said, I found that setting a larger cache when your list is relatively low on work typically does the trick, but usually only if there are AP tasks available for assignment. Yes, your method works well too; I used to do that at first, but found that sometimes all I got was a whole bunch of MB. Setting it to just AP in the preferences eliminated that possibility for me. |
Joined: 7 Jun 99 Posts: 512 Credit: 148,746,305 RAC: 0 |
Don't try to understand why. I got too many AP. I deleted some and now have 12 waiting. And of course SETI goes into high priority mode..... |
Joined: 7 Oct 99 Posts: 25 Credit: 6,533,108 RAC: 0 |
I just got 5 AP work units, too. I am now crunching 2 AP work units on my CPU (AMD dual core) and 1 setiathome enhanced 6.05 (cuda) on the GPU (SLI configuration). I haven't made any cc_config.xml file. The 1 CUDA work unit uses at most 10% of one of the cores; usually it is about 4%. As for the 6.05 CUDA computation errors, they have nothing to do with SLI being enabled or disabled, nor with the driver version being used. I tried all of the above combinations with 181.00. It's a no go :-( |
Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
To avoid further pollution of the science database with wrong results that nevertheless pass validation, I am suspending distribution of my mod. SETI CUDA should produce valid results or give a computation error, but not invalid "overflows", when it goes into large-scale use. So it should be repaired. This affects the stock version too. See these threads for more info: http://setiweb.ssl.berkeley.edu/beta/forum_thread.php?id=1488 http://lunatics.kwsn.net/gpu-crunching/modified-seti-mb-cuda-opt-ap-package-for-full-gpu-utilize.0.html |
MarkJ Joined: 17 Feb 08 Posts: 1139 Credit: 80,854,192 RAC: 5 |
"To avoid further pollution of the science database with wrong results that nevertheless pass validation, I am suspending distribution of my mod." So you are suggesting we suspend SETI CUDA (stock or your version) until SETI can fix their app? Do you know if it's fixed in 6.06? |
Joined: 16 Jan 06 Posts: 1145 Credit: 3,936,993 RAC: 0 |
"So you are suggesting we suspend SETI CUDA (stock or your version) until SETI can fix their app?" I don't have an answer to your first question; it's probably still being decided. The answer to your second question is no, it's not fixed in 6.06. I can say that it was suggested to me to suspend CUDA for now, though. |
Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Certainly I've seen too many overflows with 6.06 too, so apparently it's not fixed. It needs to be checked more thoroughly, though. But my mod will be down for sure. It's based on the 6.05 or even 6.04 codebase. If an app gives invalid results, then using such an app is just data falsification. I don't wanna fabricate data when I already know about this possibility. |
MarkJ Joined: 17 Feb 08 Posts: 1139 Credit: 80,854,192 RAC: 5 |
"Certainly I've seen too many overflows with 6.06 too, so apparently it's not fixed. It needs to be checked more thoroughly, though." I have aborted my remaining CUDA tasks and set my preferences to no-CUDA. I had a bunch that I had to abort anyway because they just hang: they get to 20 seconds of CPU time with 0% progress and just sit there. I did about 10 yesterday and another 6 this morning. I am now back to crunching the old way. And just to keep the GPU busy I joined GPUGRID :-) Hopefully Eric & Jeff will jump on it and sort it out fairly quickly so we can progress. |
Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Hope so. Will look toward GPUGRID to keep production hosts busy too :) |
MarkJ Joined: 17 Feb 08 Posts: 1139 Credit: 80,854,192 RAC: 5 |
"I just got 5 AP work units, too." I don't think SLI mode would have anything to do with your issues. As far as I know, all SLI does is make the card appear to have twice the number of cores by linking them together. Whether SETI can use these extra cores is anyone's guess. I suspect not, and you would get better throughput in non-SLI mode, as it would appear as 2 CUDA devices and therefore be able to process 2 CUDA WUs at a time. Given the overflow issues, I have suspended CUDA processing for SETI until Eric or Jeff can fix it. |
Joined: 16 Jan 06 Posts: 1145 Credit: 3,936,993 RAC: 0 |
|
Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
"After reading a few threads I just wanted to confirm that I should discontinue using CUDA and go back to CPU tasks." It would be nice if you could check how often CUDA faults occur on a completely underclocked GPU and do some GPU hardware tests too. Hardware vs. software faults should still be separated, IMHO. I have been running a completely underclocked GPU (450/1500/1600 instead of 600/1700/1800) for almost 24 hours already, with no driver crashes so far at least. Of course it became much slower, so the set of processed tasks is still relatively small, but it's worth knowing whether it is a hardware or a software problem... ADDON: GPU runs @55C now (instead of 57C at stock speeds). Idle temp at stock frequencies is 44C. |
Joined: 16 Jan 06 Posts: 1145 Credit: 3,936,993 RAC: 0 |
I specifically didn't OC my GPU, so all my results are stock. The only "errors" I remember having were VLAR at main. I didn't have an unusual amount of overflows either. I'll see what's in my completed tasks, though a lot of it is already gone from the server. I'd have to run more to do a real test. Edit: If I did run more tests, would it be with the modified app or the stock app? Should add: other than the one crash I had yesterday after doing a day or so of Beta without a restart, I didn't have any crashes, BSoDs or driver issues other than with VLAR, and most of those would just get stuck and I'd have to abort them. |
Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
And what is the validation ratio? Did all except the crashed WUs validate OK? If so, maybe it's worth it for you just to continue with CUDA as is, and watch the validation of processed WUs. |
Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
And what is the validation ratio? Did all except the crashed WUs validate OK? If so, maybe it's worth it for you just to continue with CUDA as is, and watch the validation of processed WUs. ADDN: Oh, right now I got a driver crash and restart :/ On a completely underclocked GPU... So it seems it's a software problem, though. Will see if any overflows follow and switch to GPU checking with the tools proposed in another thread. |
Joined: 16 Jan 06 Posts: 1145 Credit: 3,936,993 RAC: 0 |
A majority have validated, and look to match what the wingman claimed. I even have overflows that match stock and AKv8 results. I do have this overflow where I'm not matching the others, though it's the only one I've found that has this problem. I only had a couple of VLAR that I remember having to abort. I do have a few 0-credit ones I'm waiting on for a result, where a third wingman was sent out. Like I say, so far I've only found the one odd result mentioned. I will be out for a while and will look further into it when I get back. If that seems to be the case, I'll do as you said and run some more and keep an eye on the validation. |
Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Yes, "Spike count: 29" seems like excess spikes. |
Joined: 16 Jan 06 Posts: 1145 Credit: 3,936,993 RAC: 0 |
|
Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
"Yes, that task also seems to be around the time of the crash I mentioned to you yesterday on Beta, so it is possible that might have something to do with the result." Yes, usually I see a bunch of overflows just after a crashed task. That's why I think it's software, not hardware... But on the other hand, if the crashed task somehow overheated the GPU, then it needs some time to cool down, and while it is in an overheated state it could generate overflows... But that's pure speculation of course :) |
Joined: 16 Jan 06 Posts: 1145 Credit: 3,936,993 RAC: 0 |
Ok, I can't say this is really all there was in the tasks I've done, since results of tasks I did are already off the server. The tasks I found with problems all seem to be in the time frame when I was using the stock app: 381977907 383431920 383431970 383432014. The GPU was not OC'd, but I was still using Process Lasso, keeping the tasks at real-time priority. (I know, I was warned about system instability.) Once I started using your modified app, I no longer used Process Lasso or real-time priority, and I don't see the same problems (except for the one instance mentioned), but I do have pending results that could show this at a later time. I'm going to run some more on your app and see what happens. I'll keep the cache low so that if I start seeing a bad trend I can stop, hopefully without doing too much harm. BTW, it could just be my connection, but the web site seems very slow when trying to post. The rest of the site seems to be normal. |
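
For anyone wanting to put a number on the "validation ratio" asked about in the thread, here is a rough sketch of one way to tally it by hand. It assumes you have copied each task's outcome from your results page into a list; the outcome labels and the sample data are made up for illustration and are not the BOINC server's actual status strings. Crashed and still-pending tasks are left out of the ratio, in line with the "all except crashed WUs" question.

```python
from collections import Counter

# Hypothetical outcome labels (assumption): not the real BOINC status strings.
VALID, INVALID, PENDING, CRASHED = "valid", "invalid", "pending", "crashed"


def validation_ratio(outcomes):
    """Return valid / (valid + invalid), ignoring pending and crashed tasks.

    Returns None when no task has been decided by the validator yet.
    """
    counts = Counter(outcomes)
    decided = counts[VALID] + counts[INVALID]
    if decided == 0:
        return None
    return counts[VALID] / decided


# Made-up example data, for illustration only.
sample = [VALID] * 18 + [INVALID] * 2 + [PENDING] * 5 + [CRASHED] * 3
ratio = validation_ratio(sample)
print(f"validation ratio: {ratio:.2f}")
```

A ratio well below what the same host gets on CPU tasks would be a hint to stop and look closer, but with only a handful of decided results the number does not mean much.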
©2025 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.