CUDA cards: SETI crunching speeds

FiveHamlet
Joined: 5 Oct 99
Posts: 783
Credit: 32,638,578
RAC: 0
United Kingdom
Message 884274 - Posted: 11 Apr 2009, 15:01:56 UTC

If 1 task is going slow with CUDA, does that mean that all the tasks with
the same start sequence, i.e. 09ef09ad, will all be VLARs and thus be slow?

ID: 884274
skildude
Joined: 4 Oct 00
Posts: 9541
Credit: 50,759,529
RAC: 60
Yemen
Message 884280 - Posted: 11 Apr 2009, 15:16:38 UTC - in response to Message 884274.  

I'd say it's up to which tape it came off of. There are aa, ab, ac, etc. The multibeam points in multiple directions. I'd assume that if it's off the same tape then it will probably have a similar angle range.


In a rich man's house there is no place to spit but his face.
Diogenes Of Sinope
ID: 884280
Matthew Love
Volunteer tester
Joined: 26 Sep 99
Posts: 7763
Credit: 879,151
RAC: 0
United States
Message 884281 - Posted: 11 Apr 2009, 15:17:44 UTC

I recently installed a GeForce 8800 GT card that BOINC version 6.4.7 recognizes as having CUDA. It sure does make a world of difference in crunching time by using CUDA.

LETS BEGIN IN 2010
ID: 884281
Josef W. Segur
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 884374 - Posted: 11 Apr 2009, 20:33:36 UTC - in response to Message 884274.  

If 1 task is going slow with CUDA, does that mean that all the tasks with
the same start sequence, i.e. 09ef09ad, will all be VLARs and thus be slow?

Not certainly, but it's fairly likely. A 'tape' like 09ef09ad represents about 1.5 hours of telescope time, and observations are usually scheduled in 1 hour chunks. So when the observation is for a specific location in the sky the VLAR set may last for a full hour and that hour may be all within one 'tape'.

What is definite is that for the 107.37 seconds of a specific task, there are 256 subbands * 14 channels = 3584 WUs split. If one of those is VLAR so are all the others, and there will be at least 7168 VLAR tasks sent.
                                                               Joe
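
The arithmetic in the post above can be checked with a short sketch. The factor of 2 is an assumption: the post says only "at least 7168", which is exactly twice 3584 and would be consistent with each workunit going out to at least two hosts. The constant names are illustrative.

```python
# Workunits split from one 107.37-second slice of data (figures from the post above).
SUBBANDS = 256
CHANNELS = 14

# Assumption: each workunit is sent to at least two hosts. This is inferred
# from 7168 being twice 3584; the post does not state it directly.
MIN_COPIES_PER_WU = 2

workunits = SUBBANDS * CHANNELS
min_tasks = workunits * MIN_COPIES_PER_WU

print(workunits)  # 3584
print(min_tasks)  # 7168
```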
ID: 884374
FiveHamlet
Joined: 5 Oct 99
Posts: 783
Credit: 32,638,578
RAC: 0
United Kingdom
Message 884390 - Posted: 11 Apr 2009, 21:49:52 UTC - in response to Message 884374.  

I must have got the whole tape.
I have a pile of 09fe, 10fe and 12fe; all are taking over
1 hr using CUDA and 6.03.
Dave
ID: 884390
Josef W. Segur
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 884419 - Posted: 11 Apr 2009, 23:27:29 UTC - in response to Message 884390.  

I must have got the whole tape.
I have a pile of 09fe, 10fe and 12fe; all are taking over
1 hr using CUDA and 6.03.
Dave

Probably observations from A2133 "The Alfa Ultra-Deep Survey: Deep HI Observations at 0<z<0.16"

Quoting a very small piece of the abstract in the descriptive PDF file:
...it is possible to achieve noise of less than 50micro-Jy with integration times of about 40 hours per pointing, ...

"Integration time" is how much total observing time they want on each target, not a continuous observation. They requested 980 hours of observing time, and there's about an hour most days on the Arecibo schedules through May 7 (as far as they go now).
                                                                Joe
ID: 884419
FiveHamlet
Joined: 5 Oct 99
Posts: 783
Credit: 32,638,578
RAC: 0
United Kingdom
Message 884497 - Posted: 12 Apr 2009, 7:43:50 UTC - in response to Message 884419.  

That's interesting info. I will read up and try to remember some of it.
I have 6 GB on my mobo but I have trouble remembering what day it is. LOL
ID: 884497
FiveHamlet
Joined: 5 Oct 99
Posts: 783
Credit: 32,638,578
RAC: 0
United Kingdom
Message 884498 - Posted: 12 Apr 2009, 7:50:35 UTC - in response to Message 884451.  

I have the GA-EX58-UD5 with a 920 on it. The SATA connections go out sideways
and don't impede the video cards. I built this rig about 3 weeks ago,
and it is already level with my other rig, an AMD 9650 quad.
I did have 2 9600 GT cards in the quad but have split them so there is 1 in each rig now.
ID: 884498
Josef W. Segur
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 884715 - Posted: 12 Apr 2009, 20:38:18 UTC

The VLAR effect for CUDA is related to long arrays in the pulse folding code. Just for general background, here's the maximum array size plotted against Angle Range (AR).



And here's the data set:

AR     MaxPoT
0.0000 32768
0.0500 32768
0.0600 27307
0.0700 23406
0.0799 20506
0.0800 40960
0.0900 36409
0.1000 32768
0.1100 29789
0.1200 27307
0.1300 25206
0.1400 23406
0.1500 21845
0.1599 20493
0.1600 40960
0.1800 36409
0.2000 32768
0.2300 28494
0.2600 25206
0.3000 21845
0.3500 18725
0.4000 16384
0.4500 14564
0.5000 13107
0.6000 10923
0.7500 8738
1.0000 6554
1.4000 4681
2.0000 3277
2.8000 2341
4.0000 1638
5.6000 1170
8.0000 819
10.000 655

Those numbers were calculated for plotting rather than absolute accuracy. If you want a really accurate maximum pulsePoTlen for a particular angle range:

1. Divide 6553.6000917504 by the angle range
2. If the result rounded to nearest integer is greater than 40960, divide the unrounded value by 2 until it is small enough.
3. The rounded integer is the array length.
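
The three steps above can be sketched in a few lines of Python. The function name is hypothetical, and rejecting non-positive angle ranges is an assumption: the 0.0000 row in the table cannot come from these steps alone, since step 1 would divide by zero.

```python
import math


def max_pulse_pot_len(angle_range: float) -> int:
    """Maximum pulse PoT array length for a given angle range (sketch)."""
    if angle_range <= 0:
        # Assumption: the steps do not cover AR = 0, so refuse it here.
        raise ValueError("angle_range must be positive")

    def nearest(x: float) -> int:
        # Round half up to the nearest integer.
        return math.floor(x + 0.5)

    # Step 1: divide the constant by the angle range.
    length = 6553.6000917504 / angle_range
    # Step 2: halve the unrounded value until the rounded one fits in 40960.
    while nearest(length) > 40960:
        length /= 2
    # Step 3: the rounded integer is the array length.
    return nearest(length)


for ar in (0.05, 0.0799, 0.08, 1.0, 10.0):
    print(ar, max_pulse_pot_len(ar))
```

For the angle ranges in the loop this gives 32768, 20506, 40960, 6554 and 655, matching the corresponding rows of the table.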

If the project chose to, they could change some parameters in the WU header and make that calculation meaningless. I do not believe that's likely; most factors have been in place since the year 2000. The ALFA receiver beam width is the most recent change.
                                                                Joe



ID: 884715
ML1
Volunteer moderator
Volunteer tester
Joined: 25 Nov 01
Posts: 21745
Credit: 7,508,002
RAC: 20
United Kingdom
Message 884927 - Posted: 13 Apr 2009, 14:00:19 UTC - in response to Message 884715.  

The VLAR effect for CUDA is related to long arrays in the pulse folding code. ...

Thanks for the plot, interesting.

Is the problem in that routine the amount of GPU VRAM required exceeding that physically available, or is it the way the pulse find is implemented?

i.e. RAM limits or slow code?

Or is the CUDA architecture just simply unsuited for the pulse find?

Cheers,
Martin


See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
ID: 884927
Josef W. Segur
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 885199 - Posted: 14 Apr 2009, 1:47:03 UTC - in response to Message 884927.  

The VLAR effect for CUDA is related to long arrays in the pulse folding code. ...

Thanks for the plot, interesting.

Is the problem in that routine the amount of GPU VRAM required exceeding that physically available, or is it the way the pulse find is implemented?

i.e. RAM limits or slow code?

Or is the CUDA architecture just simply unsuited for the pulse find?

Cheers,
Martin

It shouldn't be RAM limits; the array being analyzed consists of single floats, so 40960*4 bytes = 160 KiB max. The pattern of accessing the data is considerably different than you'd expect for graphics/video processing, though, and it's an open question whether the Nvidia guys gained a clear enough understanding of how fast folding works to produce the most efficient code.
                                                               Joe
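
The memory estimate in the post above works out as follows. This is only a check of the arithmetic, with illustrative constant names; it says nothing about how the CUDA application actually allocates memory.

```python
MAX_ARRAY_LEN = 40960   # longest pulse array, from the post above
BYTES_PER_FLOAT = 4     # single-precision float

max_bytes = MAX_ARRAY_LEN * BYTES_PER_FLOAT
max_kib = max_bytes / 1024

print(max_bytes)  # 163840
print(max_kib)    # 160.0
```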
ID: 885199
Carl Johnson[SETI.USA]
Volunteer tester
Joined: 18 Feb 05
Posts: 33
Credit: 5,269,022
RAC: 0
United States
Message 885291 - Posted: 14 Apr 2009, 12:31:53 UTC

Is this VLAR thing what makes the tasks take so long?
When I went to 6.6.20 I reset the project and all CUDA tasks were taking ~9 minutes. 3 days later they were taking ~2 hours. I reset the project and voilà, back down to ~9 minutes. Problem is, they're creeping back into the half-hour range. I have 1000 tasks in queue (don't ask) and I don't want to reset and take the hit on all the tasks, but what other solution is there?

ID: 885291
W-K 666 Project Donor
Volunteer tester
Joined: 18 May 99
Posts: 19706
Credit: 40,757,560
RAC: 67
United Kingdom
Message 885294 - Posted: 14 Apr 2009, 12:40:59 UTC - in response to Message 885291.  

Is this VLAR thing what makes the tasks take so long?
When I went to 6.6.20 I reset the project and all CUDA tasks were taking ~9 minutes. 3 days later they were taking ~2 hours. I reset the project and voilà, back down to ~9 minutes. Problem is, they're creeping back into the half-hour range. I have 1000 tasks in queue (don't ask) and I don't want to reset and take the hit on all the tasks, but what other solution is there?

Do a search for "kill VLAR"
ID: 885294
Carl Johnson[SETI.USA]
Volunteer tester
Joined: 18 Feb 05
Posts: 33
Credit: 5,269,022
RAC: 0
United States
Message 885302 - Posted: 14 Apr 2009, 13:42:05 UTC - in response to Message 885294.  
Last modified: 14 Apr 2009, 13:46:44 UTC

Oh my...I don't have any tasks left!
Thanks knight.

Here's the link for everyone else.
kill vlar
There are 2 versions, one for single GPU and the other for multi GPU.
It's a .rar, so if you don't have WinRAR:
download.com

ID: 885302


 
©2025 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.