Modified SETI MB CUDA + opt AP package for full GPU utilization

Message boards : Number crunching : Modified SETI MB CUDA + opt AP package for full GPU utilization

Byron S Goodgame
Volunteer tester
Joined: 16 Jan 06
Posts: 1145
Credit: 3,936,993
RAC: 0
United States
Message 844541 - Posted: 24 Dec 2008, 12:48:20 UTC - in response to Message 844498.  

Hm.. I suppose that method works as well. Just a little extra work to have to go and modify preferences and all of that. As I said, I found that setting a larger cache when your list is relatively low on work typically does the trick, but usually only if there are AP tasks available for assignment.

Yes, your method works well too. I used to do that at first, but found that sometimes all I got was a whole bunch of MB. Setting it to just AP in the preferences eliminated that possibility for me.

S@NL - eFMer - efmer.com/boinc
Volunteer tester
Joined: 7 Jun 99
Posts: 512
Credit: 148,746,305
RAC: 0
United States
Message 844556 - Posted: 24 Dec 2008, 14:01:04 UTC - in response to Message 844541.  

Don't try to understand why. I got too many AP tasks. I deleted some and now have 12 waiting. And of course SETI goes into high-priority mode...

Timi
Volunteer tester
Joined: 7 Oct 99
Posts: 25
Credit: 6,533,108
RAC: 0
Greece
Message 844632 - Posted: 24 Dec 2008, 17:29:20 UTC - in response to Message 844556.  

I just got 5 AP work units, too.
I am -now- crunching 2 AP work units on my CPU (AMD dual core) and 1 setiathome enhanced 6.05 (cuda) on the GPU (SLI configuration).
I haven't made any cc_config.xml file.
The 1 CUDA work unit uses at most 10% of one of the cores, usually around 4%.

As for the 6.05 CUDA computation errors, they have nothing to do with SLI being enabled or disabled, nor with the driver version used. I tried all of the above combinations with 181.00. It's a no-go :-(

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 844708 - Posted: 24 Dec 2008, 20:20:29 UTC - in response to Message 844632.  
Last modified: 24 Dec 2008, 20:24:07 UTC

To avoid further pollution of the science database with wrong results that still pass validation, I am suspending distribution of my mod.
SETI CUDA should produce valid results or give a computation error, not invalid "overflows", once it goes into large-scale use. So it should be repaired.

This affects the stock version too.

See these threads for more info.
http://setiweb.ssl.berkeley.edu/beta/forum_thread.php?id=1488
http://lunatics.kwsn.net/gpu-crunching/modified-seti-mb-cuda-opt-ap-package-for-full-gpu-utilize.0.html

MarkJ (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 17 Feb 08
Posts: 1139
Credit: 80,854,192
RAC: 5
Australia
Message 844744 - Posted: 24 Dec 2008, 21:23:42 UTC - in response to Message 844708.  
Last modified: 24 Dec 2008, 21:28:15 UTC

So you are suggesting we suspend Seti cuda (stock or your version) until Seti can fix their app.

Do you know if it's fixed in 6.06?
BOINC blog

Byron S Goodgame
Volunteer tester
Joined: 16 Jan 06
Posts: 1145
Credit: 3,936,993
RAC: 0
United States
Message 844746 - Posted: 24 Dec 2008, 21:37:42 UTC - in response to Message 844744.  
Last modified: 24 Dec 2008, 21:40:49 UTC

I don't have an answer to your first question; it's probably still being decided. But the answer to your second question is no, it's not fixed in 6.06.

I can say that it was suggested to me to suspend CUDA for now though.

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 844748 - Posted: 24 Dec 2008, 21:42:33 UTC - in response to Message 844744.  

Certainly I've seen too many overflows with 6.06 too, so apparently it's not fixed. It needs to be checked more thoroughly, though.
But my mod will be down for sure. It's based on the 6.05 or even the 6.04 codebase.

If an app gives invalid results, then using such an app is just data falsification. I don't want to fabricate data now that I know about this possibility.

MarkJ (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 17 Feb 08
Posts: 1139
Credit: 80,854,192
RAC: 5
Australia
Message 844752 - Posted: 24 Dec 2008, 22:19:00 UTC - in response to Message 844748.  

I have aborted my remaining cuda tasks and set my preferences to no-cuda. I had a bunch that I had to abort anyway because they just hang. They get to 20 seconds cpu time with 0% progress and just sit there. I did about 10 yesterday and another 6 this morning.
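The stall described here (about 20 seconds of CPU time, still at 0% progress) can be spotted mechanically. A minimal sketch, assuming you already have a snapshot of each task's accumulated CPU time and fraction done; the `TaskSnapshot` record and `find_stuck_tasks` helper are hypothetical and not part of BOINC, and the 20-second threshold is simply the figure from this post:

```python
from dataclasses import dataclass


@dataclass
class TaskSnapshot:
    """Hypothetical record of one task as shown in the task list."""
    name: str
    cpu_seconds: float  # CPU time used so far
    progress: float     # fraction done, 0.0 to 1.0


def find_stuck_tasks(tasks, cpu_threshold=20.0):
    """Names of tasks that used at least cpu_threshold seconds of CPU time
    yet still report zero progress."""
    return [t.name for t in tasks
            if t.cpu_seconds >= cpu_threshold and t.progress == 0.0]


if __name__ == "__main__":
    snapshot = [
        TaskSnapshot("wu_a", 21.5, 0.0),    # stalled
        TaskSnapshot("wu_b", 350.0, 0.42),  # making progress
        TaskSnapshot("wu_c", 5.0, 0.0),     # just started
    ]
    print(find_stuck_tasks(snapshot))  # ['wu_a']
```

A single snapshot can mistake a slow start for a hang; confirming that progress is still zero on a later snapshot would be safer before aborting anything.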

I am now back to crunching the old way. And just to keep the GPU busy I joined GPUGRID :-)

Hopefully Eric & Jeff will jump on it and sort it out fairly quickly and we can progress.


BOINC blog

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 844754 - Posted: 24 Dec 2008, 22:22:32 UTC - in response to Message 844752.  

Hope so.
I will look toward GPUGRID to keep my production hosts busy too :)

MarkJ (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 17 Feb 08
Posts: 1139
Credit: 80,854,192
RAC: 5
Australia
Message 844768 - Posted: 24 Dec 2008, 23:42:14 UTC - in response to Message 844632.  

I don't think SLI mode would have anything to do with your issues. As far as I know, all SLI does is link the cards together so they appear to have twice the number of cores. Whether SETI can use these extra cores is anyone's guess. I suspect not, and you would get better throughput in non-SLI mode, as the cards would appear as 2 CUDA devices and therefore be able to process 2 CUDA WUs at a time.

Given the overflow issues I have suspended cuda processing for Seti until Eric or Jeff can fix it.
BOINC blog

Byron S Goodgame
Volunteer tester
Joined: 16 Jan 06
Posts: 1145
Credit: 3,936,993
RAC: 0
United States
Message 844996 - Posted: 25 Dec 2008, 15:34:07 UTC

After reading a few threads, I just wanted to confirm that I should discontinue using CUDA and go back to CPU tasks.

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 845002 - Posted: 25 Dec 2008, 15:52:39 UTC - in response to Message 844996.  
Last modified: 25 Dec 2008, 15:56:38 UTC

It would be nice if you could check how often CUDA faults occur on a completely underclocked GPU, and do some GPU hardware tests too.
Hardware and software faults should still be told apart, IMHO.

I've been running a completely underclocked GPU (450/1500/1600 instead of 600/1700/1800) for almost 24 hours already, and still no driver crashes, at least. Of course it has become much slower, so the set of processed tasks is still relatively small, but it's worth knowing whether it's a hardware or a software problem...

ADDON: The GPU runs at 55°C now (instead of 57°C at stock speeds). Idle temp at stock frequencies is 44°C.
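The underclocking test boils down to comparing how often faults occur at stock and at reduced clocks, with the caveat above about a small sample. A rough sketch of that comparison, assuming you log fault and task counts per clock setting yourself; `fault_rate`, `compare_clocks` and the `min_tasks=50` cutoff are hypothetical choices, not a BOINC or SETI tool:

```python
def fault_rate(faults, tasks):
    """Faults per task run; 0.0 when no tasks have run yet."""
    return faults / tasks if tasks else 0.0


def compare_clocks(stock, underclocked, min_tasks=50):
    """Compare fault rates at stock and reduced clocks.

    stock, underclocked: (faults, tasks) pairs logged for each clock setting.
    min_tasks: hypothetical minimum number of tasks at reduced clocks before
    a fault-free run is taken to mean anything.
    Returns (stock_rate, underclocked_rate, verdict).
    """
    stock_rate = fault_rate(*stock)
    low_rate = fault_rate(*underclocked)
    if low_rate > 0.0:
        # Faults despite plenty of clock headroom point away from the clocks.
        verdict = "faults persist at reduced clocks: software suspected"
    elif underclocked[1] < min_tasks:
        verdict = "too few tasks at reduced clocks to tell"
    elif stock_rate > 0.0:
        verdict = "faults vanish at reduced clocks: hardware suspected"
    else:
        verdict = "no faults at either setting"
    return stock_rate, low_rate, verdict
```

The verdict is only a hint: a fault at low clocks is informative on its own, but a fault-free run says little until enough tasks have completed.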

Byron S Goodgame
Volunteer tester
Joined: 16 Jan 06
Posts: 1145
Credit: 3,936,993
RAC: 0
United States
Message 845007 - Posted: 25 Dec 2008, 16:10:51 UTC - in response to Message 845002.  
Last modified: 25 Dec 2008, 16:36:05 UTC

I specifically didn't OC my GPU, so all my results are at stock. The only "errors" I remember having were VLARs on Main. I didn't have an unusual number of overflows either. I'll see what's in my completed tasks, though a lot of them are already gone from the server. I'd have to run more to do a real test.

Edit: If I did run more tests, should it be with the modified app or the stock app?
I should add: other than the one crash I had yesterday after doing a day or so of Beta without a restart, I didn't have any crashes, BSoDs or driver issues except with VLARs, and most of those would just get stuck and I'd have to abort them.

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 845016 - Posted: 25 Dec 2008, 16:53:12 UTC - in response to Message 845007.  
Last modified: 25 Dec 2008, 16:57:19 UTC

And what is the validation ratio?
Did all except the crashed WUs validate OK?

If so, maybe it's worth just continuing with CUDA as is, and watching the validation of processed WUs.

ADDN: Oh, I just got a driver crash and restart :/ On a completely underclocked GPU... So it seems it's a software problem after all.

I'll watch whether any overflows follow, and switch to checking the GPU with the tools proposed in another thread.
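One way to compute the validation ratio asked about here, leaving out crashed tasks and tasks still pending. A minimal sketch; the simplified status labels ("valid", "invalid", "pending", "error") are hypothetical stand-ins for the states shown on the task pages, and `validation_ratio` is not an existing tool:

```python
from collections import Counter


def validation_ratio(outcomes):
    """Fraction of decided tasks that validated.

    outcomes: one status label per task, using the hypothetical labels
    "valid", "invalid", "pending" and "error". Pending tasks are not decided
    yet and crashed ("error") tasks never reach validation, so both are left
    out. Returns valid / (valid + invalid), or None if nothing is decided.
    """
    counts = Counter(outcomes)
    decided = counts["valid"] + counts["invalid"]
    if decided == 0:
        return None
    return counts["valid"] / decided


if __name__ == "__main__":
    history = ["valid", "valid", "valid", "invalid", "error", "pending"]
    print(validation_ratio(history))  # 0.75
```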

Byron S Goodgame
Volunteer tester
Joined: 16 Jan 06
Posts: 1145
Credit: 3,936,993
RAC: 0
United States
Message 845023 - Posted: 25 Dec 2008, 17:14:08 UTC
Last modified: 25 Dec 2008, 17:17:59 UTC

A majority have validated, and look to match what the wingman claimed. I even have overflows that match stock and AKv8 results.

I do have this overflow where I'm not matching the others, though it's the only one I've found with this problem.

Only had a couple of VLAR I had to abort that I remember.

I do have a few 0 credit ones I'm waiting on for a result, where a third wingman was sent out.

Like I said, so far I've only found the one odd result mentioned. I'll be out for a while and will look further into it when I get back. If it seems to be the case, I'll do as you said: run some more and keep an eye on the validation.

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 845032 - Posted: 25 Dec 2008, 17:47:08 UTC - in response to Message 845023.  

Yes, "Spike count: 29" seems like excess spikes.
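Spotting a result like this one means comparing its reported signal counts against the wingmen's for the same workunit. A minimal sketch, assuming each result's counts (spikes, pulses, triplets, Gaussians) have already been parsed out of its output into a dict; `mismatched_counts` and the consensus rule (most common wingman value) are hypothetical choices:

```python
from collections import Counter

# Signal types whose counts are compared; assumed to be parsed already.
SIGNALS = ("spike", "pulse", "triplet", "gaussian")


def mismatched_counts(mine, wingmen):
    """Compare one result's signal counts against its wingmen.

    mine: dict mapping each name in SIGNALS to a count.
    wingmen: list of dicts of the same shape for the same workunit.
    Returns {signal: (my_count, consensus)} for every signal type where this
    result differs from the most common wingman value.
    """
    diffs = {}
    for sig in SIGNALS:
        values = [w[sig] for w in wingmen if sig in w]
        if not values or sig not in mine:
            continue  # nothing to compare against
        consensus, _ = Counter(values).most_common(1)[0]
        if mine[sig] != consensus:
            diffs[sig] = (mine[sig], consensus)
    return diffs
```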

Byron S Goodgame
Volunteer tester
Joined: 16 Jan 06
Posts: 1145
Credit: 3,936,993
RAC: 0
United States
Message 845041 - Posted: 25 Dec 2008, 18:06:47 UTC - in response to Message 845032.  
Last modified: 25 Dec 2008, 18:10:35 UTC

Yes, that task also seems to be from around the time of the crash I mentioned to you yesterday on Beta (and may even be the one I was working on), so it's possible that had something to do with the result. I thought it reset to 0.00% after the crash, but I could be wrong.

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 845043 - Posted: 25 Dec 2008, 18:10:34 UTC - in response to Message 845041.  

Yes, I usually see a bunch of overflows just after a crashed task. That's why I think it's software, not hardware... But on the other hand, if the crashed task somehow overheated the GPU, then it needs some time to cool down, and while it's in an overheated state it could generate overflows... But that's pure speculation, of course :)
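Whether overflows really bunch up right after crashed tasks can be checked from a chronological list of task outcomes. A minimal sketch, assuming each task has already been labeled "crash", "overflow" or "ok"; the labels, the default `window` of 3 tasks and `overflows_after_crash` are hypothetical:

```python
def overflows_after_crash(outcomes, window=3):
    """Count overflows that follow a crash closely versus all others.

    outcomes: chronological list of per-task labels, each "crash",
    "overflow" or "ok" (hypothetical labels).
    Returns (near, far): overflows within `window` tasks after the most
    recent crash, and overflows anywhere else.
    """
    near = far = 0
    since_crash = None  # tasks seen since the most recent crash, if any
    for label in outcomes:
        if label == "crash":
            since_crash = 0
            continue
        if since_crash is not None:
            since_crash += 1
        if label == "overflow":
            if since_crash is not None and since_crash <= window:
                near += 1
            else:
                far += 1
    return near, far
```

A cluster by itself cannot separate the two explanations above (leftover software state versus a GPU that is still cooling down); logging GPU temperature per task alongside these labels would be needed for that.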

Byron S Goodgame
Volunteer tester
Joined: 16 Jan 06
Posts: 1145
Credit: 3,936,993
RAC: 0
United States
Message 845071 - Posted: 25 Dec 2008, 19:47:20 UTC - in response to Message 845043.  
Last modified: 25 Dec 2008, 20:23:16 UTC

OK, I can't say this is really everything in the tasks I've done, since the results of some of them are already off the server. Of the tasks where I found problems, they all seem to be in the time frame when I was using the stock app:

381977907

383431920

383431970

383432014

The GPU was not OC'd, but I was still using Process Lasso, keeping the tasks at real-time priority. (I know, I was warned about system instability.) Once I started using your modified app, I no longer used Process Lasso or real-time priority, and I don't see the same problems (except for the one instance mentioned), but I do have pending results that could show this at a later time.

I'm going to run some more on your app and see what happens, I'll keep the cache low so if I start seeing a bad trend I can stop, hopefully without doing too much harm.

BTW, it could just be my connection, but the website seems very slow when trying to post. The rest of the site seems normal.
 
©2025 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.