Modified SETI MB CUDA + opt AP package for full GPU utilization

Message boards : Number crunching : Modified SETI MB CUDA + opt AP package for full GPU utilization
Profile Byron S Goodgame
Volunteer tester
Joined: 16 Jan 06
Posts: 1145
Credit: 3,936,993
RAC: 0
United States
Message 845137 - Posted: 25 Dec 2008, 22:27:28 UTC
Last modified: 25 Dec 2008, 23:16:02 UTC

Even though these are still pending, I'm pretty sure that they're going to be the same as the earlier mentioned validation problem with me showing an overflow and the other two wingmen having a similar result.

They all happened in the same time period as the crash I mentioned. One of them is right at the beginning of that period: 1101014721

The next result was the one mentioned in the earlier post. Then there were what I'm guessing are 3 more results that will probably have the same validation problem once a result is gotten.

1101015020
1101015257
1101015259

I'm now guessing the last one is the crash or could be the one just prior to the crash, and the others are an effect leading up to it.

Like I mentioned I had been crunching tasks at Beta for about a day before without any restarts, freezing, and only had to abort a few VLAR. I then came right over to main and started the tasks so maybe a restart at that point would have been a good idea on my part. I will after I finish the couple of 6.06 I'm doing at Beta, before I start any here.
ID: 845137 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 845152 - Posted: 25 Dec 2008, 23:19:06 UTC - in response to Message 845137.  
Last modified: 25 Dec 2008, 23:21:00 UTC

OK, let's try a new version :) It's based on the latest revision of CUDA MB available to me.

http://lunatics.kwsn.net/gpu-crunching/modified-seti-mb-cuda-opt-ap-package-for-full-gpu-utilize.0.html

0) Please watch carefully what results your CUDA-enabled host returns, and if you see an excessive rate of invalid results, stop using CUDA SETI MB until a new version is available.

1) This package (Raistmer's_opt_package_V2.rar) can be downloaded from http://files.mail.ru/5MI3EM (and from the post on the Lunatics forums, see link above).
Target hosts: Windows x86, SSE3 support for AP, CUDA support for MB.

2) It consists of the modified SETI MB CUDA binary and the current SSE3 opt SETI AP binaries, with a corresponding app_info.xml file.
3) My modification increases the CUDA worker thread priority in SETI MB CUDA, which allows fuller GPU usage while keeping all CPU cores busy too. That is, using this build can increase your host's total performance on BOINC tasks.
4) The MB binaries are based on the CUDA MB sources received from Eric (with a small modification); the opt AP is just a repack of the current Lunatics opt AP release (SSE3 build).
5) It's not an "official" Lunatics release, so you can blame only me (or yourself, or BOINC bugs, and so on and so forth) for any issues you encounter.
6) I still can't check AP+MB work (no AP tasks here), but it works just fine with the CUDA MB + einstein@home combination.
7) For best CPU and GPU usage I recommend setting the number of processors available to BOINC to real_number_of_cores+1. This will mitigate the current BOINC bug with CPU+CUDA scheduling and will allow the CPU and GPU to be fully loaded.
8) Installation instructions are the same as for any opt app: stop BOINC, decompress all files in the archive into the SETI project directory, restart BOINC.

Please, report issues here too.
ID: 845152 · Report as offensive
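Editor's sketch for point 0 above: one way to decide when the invalid rate has become "excessive" is to track the outcomes of recently reported tasks. This is only an illustration in Python; the outcome labels, the function names and the 10% threshold are assumptions, not anything BOINC or the package provides.

```python
def invalid_rate(outcomes):
    """Fraction of already-decided results that were marked invalid.

    `outcomes` is a list of strings; anything other than "valid" or
    "invalid" (for example "pending") is ignored.
    """
    decided = [o for o in outcomes if o in ("valid", "invalid")]
    if not decided:
        return 0.0
    return decided.count("invalid") / len(decided)


def should_pause_cuda(outcomes, threshold=0.10):
    """True if the invalid rate exceeds the (hypothetical) threshold."""
    return invalid_rate(outcomes) > threshold


recent = ["valid", "valid", "pending", "invalid", "valid"]
print(invalid_rate(recent), should_pause_cuda(recent))
```

Pending results are left out of the denominator so that a long validation queue does not hide a high error rate.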
MarkJ
Volunteer tester
Joined: 17 Feb 08
Posts: 1139
Credit: 80,854,192
RAC: 5
Australia
Message 845188 - Posted: 26 Dec 2008, 2:41:32 UTC - in response to Message 845152.  

OK, let's try a new version :) It's based on the latest revision of CUDA MB available to me.


When you say "latest available" is that the 6.05 or 6.06 code base?

If it looks to be more stable I might fire up Seti CUDA again. I've done a couple of GPUGRID wu that take from 8 to 10 hours a pop.

Another question I had was if the CUDA app uses more shader units on the graphics card if they have them, or does it use a fixed number regardless of the card having more? Or maybe it uses the processors (which also vary in number)?

My graphics card is a 9800GT running at "stock" speed and temps seem to be around 50 (idle) and 60 (under load). Room temp is currently 29. It's summer in Australia at the moment. I will post some pics on my blog soon.

I have 3 more graphics cards on order for the other machines so hopefully we can iron out the bugs soon and get into full production.
BOINC blog
ID: 845188 · Report as offensive
Profile Byron S Goodgame
Volunteer tester
Joined: 16 Jan 06
Posts: 1145
Credit: 3,936,993
RAC: 0
United States
Message 845219 - Posted: 26 Dec 2008, 6:05:30 UTC

I like the extra added info on the card. Cool 1102600941
ID: 845219 · Report as offensive
MarkJ
Volunteer tester
Joined: 17 Feb 08
Posts: 1139
Credit: 80,854,192
RAC: 5
Australia
Message 845233 - Posted: 26 Dec 2008, 8:25:27 UTC - in response to Message 845219.  

I like the extra added info on the card. Cool 1102600941


Yeah it does have a lot more info about the video card doesn't it? All the GPUGRID wu report about the card is:

<stderr_txt>
# Using CUDA device 0
# Device 0: "GeForce 9800 GT"
# Clock rate: 1512000 kilohertz
# Number of multiprocessors: 14
# Number of cores: 112

Which is why I was asking about the app using the extra cores (sometimes called shaders) or the processors.
BOINC blog
ID: 845233 · Report as offensive
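Editor's sketch: the two counts in the GPUGRID report above are related, since the 112 cores are spread over the 14 multiprocessors. A minimal Python illustration that parses the quoted lines and derives the cores-per-multiprocessor figure; the function and field names are made up for this example.

```python
import re

REPORT = """\
# Using CUDA device 0
# Device 0: "GeForce 9800 GT"
# Clock rate: 1512000 kilohertz
# Number of multiprocessors: 14
# Number of cores: 112
"""


def parse_device_report(text):
    """Extract the numeric fields from a GPUGRID-style device report."""
    patterns = (
        ("clock_khz", r"Clock rate:\s*(\d+)"),
        ("multiprocessors", r"Number of multiprocessors:\s*(\d+)"),
        ("cores", r"Number of cores:\s*(\d+)"),
    )
    fields = {}
    for key, pattern in patterns:
        match = re.search(pattern, text)
        if match:
            fields[key] = int(match.group(1))
    return fields


info = parse_device_report(REPORT)
# Cores are grouped into multiprocessors, so the ratio is a whole number.
cores_per_mp = info["cores"] // info["multiprocessors"]
print(info, cores_per_mp)
```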
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 845237 - Posted: 26 Dec 2008, 8:48:33 UTC - in response to Message 845233.  

I like the extra added info on the card. Cool 1102600941


Yeah it does have a lot more info about the video card doesn't it? All the GPUGRID wu report about the card is:

<stderr_txt>
# Using CUDA device 0
# Device 0: "GeForce 9800 GT"
# Clock rate: 1512000 kilohertz
# Number of multiprocessors: 14
# Number of cores: 112

Which is why I was asking about the app using the extra cores (sometimes called shaders) or the processors.

The shaders, or stream processors, are, from what I understand, basically dynamic RISC processors. Once they get told to do something, that's all they can do, and they do it very well, and very efficiently, until they are told to do something else.
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)
ID: 845237 · Report as offensive
Profile Byron S Goodgame
Volunteer tester
Joined: 16 Jan 06
Posts: 1145
Credit: 3,936,993
RAC: 0
United States
Message 845240 - Posted: 26 Dec 2008, 8:57:26 UTC
Last modified: 26 Dec 2008, 9:13:20 UTC

That didn't take long. Had a restart while doing this task. The lack of information in it seems very odd. Will have to wait on the wingman to even know what AR it is.

The next task afterwards was an overflow. Will have to wait on my wingman to see the outcome of that one too.

Will also have to wait on wingmen for the tasks before these.

I'm now doing another task, which appears to be doing ok.

Edit: One of the pending tasks has been completed by my wingman and appears to be strongly similar, but has not been validated yet.

No error message in event viewer for the reboot.
ID: 845240 · Report as offensive
Profile Byron S Goodgame
Volunteer tester
Joined: 16 Jan 06
Posts: 1145
Credit: 3,936,993
RAC: 0
United States
Message 845252 - Posted: 26 Dec 2008, 10:19:31 UTC
Last modified: 26 Dec 2008, 10:21:46 UTC

Turns out the wingman returned an overflow on the task I had the crash on. AR is 2.718469. Still waiting to see if a third wingman will be sent out, although I can't imagine one won't.

The task completed a couple of tasks after the crash had a similar AR of 2.714647, so it appears I can do the AR, but it remains to be seen whether it can be done on this app with a valid result.

Kind of like a similar task I ran at Beta with this AR, where it blew my drivers out and I had to reinstall them, but later I was able to do the AR there too.
ID: 845252 · Report as offensive
Profile Bernie Vine
Volunteer moderator
Volunteer tester
Joined: 26 May 99
Posts: 9958
Credit: 103,452,613
RAC: 328
United Kingdom
Message 845295 - Posted: 26 Dec 2008, 14:44:09 UTC

I am running your V2 app and soon after I started I got these 2:

1099608329
1099608327

Bernie
ID: 845295 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 845314 - Posted: 26 Dec 2008, 15:59:11 UTC - in response to Message 845188.  

When you say "latest available" is that the 6.05 or 6.06 code base?

I used rev380 (the head at that moment) of Berkeley's CUDA branch in the repository. Whether that is 6.05 or 6.06, ask Eric & Co. The version file states 6.02, which is apparently a false value. Anyway, there is no more recent publicly accessible code.


Another question I had was if the CUDA app uses more shader units on the graphics card if they have them, or does it use a fixed number regardless of the card having more? Or maybe it uses the processors (which also vary in number)?

CUDA uses huge numbers of threads. They are much more lightweight than CPU threads. Current cards differ in how many threads they can execute simultaneously, but it's recommended that an app have more threads than the GPU can run at once. The GPU handles thread swapping in so-called warps.
So the answer is yes. (Not sure whether CUDA threads can be called shaders, though.)

ID: 845314 · Report as offensive
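Editor's sketch of the thread/warp arithmetic described above. It only shows how a thread count maps onto warps and blocks; the default values (warp size 32, 512 threads per block) are taken from the GeForce 8500 GT device log posted later in this thread and will differ on other cards. The function names are made up for this example.

```python
import math


def warps_needed(threads, warp_size=32):
    """Number of warps required to hold `threads` threads."""
    return math.ceil(threads / warp_size)


def blocks_needed(total_threads, threads_per_block=512):
    """Number of thread blocks required to cover `total_threads`."""
    return math.ceil(total_threads / threads_per_block)


# A full 512-thread block is scheduled as 16 warps of 32 threads.
print(warps_needed(512))
# One extra thread past a warp boundary costs a whole additional warp.
print(warps_needed(33))
# Covering a million data points needs far more threads than any card
# can execute at once, which is the oversubscription the post recommends.
print(blocks_needed(1_000_000))
```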
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 845317 - Posted: 26 Dec 2008, 16:05:17 UTC - in response to Message 845240.  

That didn't take long. Had a restart while doing this task. The lack of information in it seems very odd. Will have to wait on the wingman to even know what AR it is.

Strange indeed. No comments...


The next task afterwards was an overflow. Will have to wait on my wingman to see the outcome of that one too.

It's a VHAR unit. VHAR has already been correlated with overflows. Your result just supports that correlation (for the CUDA app, for sure).

ID: 845317 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 845319 - Posted: 26 Dec 2008, 16:08:07 UTC - in response to Message 845252.  
Last modified: 26 Dec 2008, 16:08:26 UTC

The task completed a couple of tasks after the crash had a similar AR of 2.714647, so it appears I can do the AR, but it remains to be seen whether it can be done on this app with a valid result.


Hm, AR=2.7 could be called VHAR too... It will be interesting to see whether you get a valid result here or not.
ID: 845319 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 845320 - Posted: 26 Dec 2008, 16:10:26 UTC - in response to Message 845295.  
Last modified: 26 Dec 2008, 16:11:23 UTC

I am running your V2 app and soon after I started I got these 2:

1099608329
1099608327

Bernie

WU true angle range is : 0.012528....

They are VLARs. Look here:
http://setiweb.ssl.berkeley.edu/beta/forum_thread.php?id=1443
ID: 845320 · Report as offensive
Profile S@NL - eFMer - efmer.com/boinc
Volunteer tester
Joined: 7 Jun 99
Posts: 512
Credit: 148,746,305
RAC: 0
United States
Message 845369 - Posted: 26 Dec 2008, 19:38:40 UTC - in response to Message 845320.  

Got some results, but as with the previous version there is sometimes a frequent (at least every minute) video driver crash. It doesn't hang the system in Vista, but it's hardly a stable situation...
http://setiathome.berkeley.edu/result.php?resultid=1103499305
Almost all tasks that do run are valid.
And a CUDA task sometimes takes up 50% of the CPU time .....
Sometimes it eats away 20 at a time, but I have my doubts about the validation system of SETI. I've seen results that are 100% in error and got 40 points for them. I see tasks done by 3 users where 1 is in error and all get the points of the 2 valid tasks.
I'll wait for a more stable solution.
ID: 845369 · Report as offensive
Profile popandbob
Volunteer tester

Joined: 19 Mar 05
Posts: 551
Credit: 4,673,015
RAC: 0
Canada
Message 845373 - Posted: 26 Dec 2008, 19:50:32 UTC

I just had a task hang at 0:00 CPU time and zero GPU usage...
Task details
Tried to suspend it...Didn't suspend/start new task
Tried to abort it... Didn't Abort...
So I shut it down in task manager...

Also just found out that the CUDA app doesn't trigger performance 3d clocks on my gtx 260...Only low power 3d clocks if I'm lucky... Used ATI tools to set the clocks at performance 3d clocks right across the board.. Now I'm seeing some CRAZY speed :)


Do you Good Search for Seti@Home? http://www.goodsearch.com/?charityid=888957
Or Good Shop? http://www.goodshop.com/?charityid=888957
ID: 845373 · Report as offensive
Profile ohiomike
Joined: 14 Mar 04
Posts: 357
Credit: 650,069
RAC: 0
United States
Message 845374 - Posted: 26 Dec 2008, 19:52:38 UTC

I have had two tasks with too many results (waiting on wingmen to verify or not), and this one:

setiathome_CUDA: Found 1 CUDA device(s):
Device 1 : GeForce 8500 GT
totalGlobalMem = 536543232
sharedMemPerBlock = 16384
regsPerBlock = 8192
warpSize = 32
memPitch = 262144
maxThreadsPerBlock = 512
clockRate = 918000
totalConstMem = 65536
major = 1
minor = 1
textureAlignment = 256
deviceOverlap = 1
multiProcessorCount = 2
setiathome_CUDA: CUDA Device 1 specified, checking...
Device 1: GeForce 8500 GT is okay
SETI@home using CUDA accelerated device GeForce 8500 GT
Rise priority modification by Raistmer based on rev380 of SETI@home sources
Priority of worker thread rised successfully
Cuda error 'cufftPlan1d(&fft_analysis_plans[FftNum], FftLen, CUFFT_C2C, NumDataPoints / FftLen)' in file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_fft.cu' in line 49 : out of memory.
setiathome_CUDA: CUDA runtime ERROR in plan FFT. Falling back to HOST CPU processing...
setiathome_enhanced 6.02 Visual Studio/Microsoft C++
libboinc: 6.3.22

Work Unit Info:
...............
WU true angle range is : 2.715856
Optimal function choices:
-----------------------------------------------------
name
-----------------------------------------------------
v_BaseLineSmooth (no other)
v_GetPowerSpectrum 0.00021 0.00000
v_ChirpData 0.01462 0.00000
v_Transpose4 0.00563 0.00000
FPU opt folding 0.00172 0.00000

Flopcounter: 5215429468138.598600

Spike count: 3
Pulse count: 0
Triplet count: 1
Gaussian count: 0
called boinc_finish



Boinc Button Abuser In Training >My Shrubbers<
ID: 845374 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 845378 - Posted: 26 Dec 2008, 20:05:08 UTC - in response to Message 845373.  

I just had a task hang at 0:00 CPU time and zero GPU usage...
Task details
Tried to suspend it...Didn't suspend/start new task
Tried to abort it... Didn't Abort...
So I shut it down in task manager...

Also just found out that the CUDA app doesn't trigger performance 3d clocks on my gtx 260...Only low power 3d clocks if I'm lucky... Used ATI tools to set the clocks at performance 3d clocks right across the board.. Now I'm seeing some CRAZY speed :)


OMG... look at this result of yours. It's an absolute record for the number of errors in a single result %)

It seems you should check your GPU stability before doing any OCing...
ID: 845378 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 845380 - Posted: 26 Dec 2008, 20:11:04 UTC - in response to Message 845374.  

I have had two tasks with too many results (waiting on wingmen to verify or not), and this one:

file 'd:/BTR/seticuda/Berkeley_rep/client/cuda/cudaAcc_fft.cu' in line 49 : out of memory.
setiathome_CUDA: CUDA runtime ERROR in plan FFT. Falling back to HOST CPU processing...

Well, it seems this build can fall back to CPU processing if it encounters a CUDA error... nice ability :)
Did this result validate?
ID: 845380 · Report as offensive
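Editor's sketch of the fallback behaviour seen in the log above (a failed FFT plan followed by "Falling back to HOST CPU processing"). This is a generic Python illustration of the pattern, not the SETI@home code: the exception class, the function names and the `free_slots` stand-in for available GPU memory are all hypothetical.

```python
class GpuOutOfMemory(RuntimeError):
    """Hypothetical stand-in for a GPU allocation failure."""


def analyze_on_gpu(samples, free_slots):
    """Pretend GPU path: fails when the data does not fit in 'memory'."""
    if len(samples) > free_slots:
        raise GpuOutOfMemory("out of memory while planning FFT")
    return ("gpu", sum(samples))


def analyze_on_cpu(samples):
    """Pretend CPU path: slower in reality, but always available."""
    return ("cpu", sum(samples))


def analyze(samples, free_slots):
    """Try the GPU first and fall back to the CPU on a GPU error."""
    try:
        return analyze_on_gpu(samples, free_slots)
    except GpuOutOfMemory as err:
        print(f"GPU error: {err}. Falling back to CPU processing...")
        return analyze_on_cpu(samples)


print(analyze([1, 2, 3], free_slots=8))
print(analyze([1, 2, 3], free_slots=2))
```

Both paths return the same answer, so a task that falls back should still produce a result that can be compared against a wingman's.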
Josef W. Segur
Volunteer developer
Volunteer tester

Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 845386 - Posted: 26 Dec 2008, 20:26:00 UTC - in response to Message 845320.  

I am running your V2 app and soon after I started I got these 2:

1099608329
1099608327

Bernie

WU true angle range is : 0.012528....

They are VLARS. Look here:
http://setiweb.ssl.berkeley.edu/beta/forum_thread.php?id=1443

Ah, but the second one has "WU true angle range is : 0.083363", which is a very helpful indication that the problem extends beyond the 0.05 true VLAR range. Anything with angle range 0.03 to 0.35 is quite rare, and there are variations in array sizes and other details of the computations for anything above 0.05. It's quite possible that an 0.079 might be OK even though the 0.083 is bad, for instance. I did spot some 0.147 range work which seemed OK a couple of days ago.
                                                                  Joe
ID: 845386 · Report as offensive
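Editor's sketch: pulling the angle range out of a task's stderr and bucketing it makes the comparison above easier to repeat. The 0.05 VLAR boundary comes from the post above. The VHAR cutoff of 2.0 is purely a placeholder assumption (the thread only says that AR around 2.7 could be called VHAR), and all names are made up for this Python example.

```python
import re

AR_PATTERN = re.compile(r"WU true angle range is\s*:\s*([0-9]*\.?[0-9]+)")


def parse_angle_range(stderr_text):
    """Return the angle range found in stderr text, or None if absent."""
    match = AR_PATTERN.search(stderr_text)
    return float(match.group(1)) if match else None


def classify_angle_range(ar, vlar_max=0.05, vhar_min=2.0):
    """Bucket an angle range.

    vlar_max follows the 0.05 figure quoted in the thread; treating
    exactly 0.05 as non-VLAR is an assumption. vhar_min is a
    hypothetical placeholder, not an official boundary.
    """
    if ar < vlar_max:
        return "VLAR"
    if ar >= vhar_min:
        return "VHAR"
    return "mid"


for line in (
    "WU true angle range is : 0.012528....",
    "WU true angle range is : 0.083363",
    "WU true angle range is : 2.715856",
):
    ar = parse_angle_range(line)
    print(ar, classify_angle_range(ar))
```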
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 845388 - Posted: 26 Dec 2008, 20:35:29 UTC - in response to Message 845386.  

I will collect statistics about VLAR crashes on my own GPU (it's still underclocked, so I'm pretty sure of the hardware's stability).
I want to build a debug version that writes the AR of each overflowed WU to a text file - that way we will get VHAR <-> overflow statistics much more easily.
ID: 845388 · Report as offensive
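Editor's sketch of the debug idea above: append the angle range of every overflowed work unit to a text file, then read the file back for statistics. This is a Python illustration only, not the planned debug build; the function names, the file name and the overflow limit of 30 signals passed in the demo are assumptions.

```python
import os
import tempfile


def record_overflow(log_path, angle_range, signal_count, overflow_limit):
    """Append the AR to log_path if the result reached the overflow limit.

    Returns True when a line was written, False otherwise.
    """
    if signal_count < overflow_limit:
        return False
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(f"{angle_range:.6f}\n")
    return True


def read_overflow_ars(log_path):
    """Read back every logged angle range as a float."""
    with open(log_path, encoding="utf-8") as log:
        return [float(line) for line in log if line.strip()]


# Demo in a temporary directory; the limit of 30 is an assumed value.
log_path = os.path.join(tempfile.mkdtemp(), "overflow_ar.txt")
record_overflow(log_path, 2.718469, signal_count=30, overflow_limit=30)
record_overflow(log_path, 0.420000, signal_count=4, overflow_limit=30)
print(read_overflow_ars(log_path))
```

Opening the file in append mode means each task adds one line without disturbing earlier entries, so the log can accumulate across many runs.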
©2025 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.