CUDA cards: SETI crunching speeds

Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 859439 - Posted: 29 Jan 2009, 21:50:20 UTC

Last night, Martin asked if we were ready to do any speed assessments yet.

Here's my first attempt for 9800GT and 9800GTX+ cards:



The numbers on the graph legend show the number of tasks plotted for each card (excluding -9 overflows). That's almost 150 WUs already for the three cards together - see how tightly clustered the points are.

In real-world terms, the shorties (VHAR) are taking 7m30s on the 9800GT, and 6m00s on the 9800GTX+. All are running stock v6.08 (note the VLAR penalty), on quads with all four cores (sometimes five cores!) loaded with AP or other projects (Einstein, LHC, CPDN). All are on Windows XP, BOINC v6.4.5, NVidia driver v181.20.

One serious point I'll be looking out for - where (what AR) does the VLAR penalty drop out?
ID: 859439
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 859442 - Posted: 29 Jan 2009, 21:58:33 UTC

For those who are interested in checking their own card's performance, here's the script I used to get data out of the local BOINC files:

findstr /p "<true_angle_range>" %1\projects\setiathome.berkeley.edu\*.* >> all_ar.txt
findstr /C:"SETI@home\] Starting" %1\stdoutdae.txt > Start_times.tmp
findstr /V /C:"task" Start_times.tmp > Start_times.txt
del Start_times.tmp
findstr /C:"SETI@home\] Computation" %1\stdoutdae.txt > End_times.txt
start Wordpad all_ar.txt

This is designed to be used from a shortcut with a parameter (%1) passing the location of the BOINC root directory (not the project directory, as in my previous attempt).

The next stage really needs a database, to link the start_time, end_time and AR for each WU name.
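For anyone without findstr to hand, the same extraction can be sketched in Python. This is only a rough port under stated assumptions: the function name scrape_boinc is made up for illustration, the search strings are the ones used in the script above, and unlike findstr /p it does not skip files containing non-printable characters.

```python
from pathlib import Path


def scrape_boinc(boinc_root):
    """Return (angle_range_lines, start_lines, end_lines) from a BOINC root directory."""
    root = Path(boinc_root)
    project_dir = root / "projects" / "setiathome.berkeley.edu"

    # Rough equivalent of the first findstr: every line mentioning
    # <true_angle_range> in any project file, prefixed with the file name.
    # Note: findstr /p would skip binary files entirely; this port reads them all.
    ar_lines = []
    for path in sorted(project_dir.iterdir()):
        if not path.is_file():
            continue
        text = path.read_text(encoding="latin-1", errors="replace")
        for line in text.splitlines():
            if "<true_angle_range>" in line:
                ar_lines.append(f"{path.name}:{line.strip()}")

    # Rough equivalent of the stdoutdae.txt searches (case-sensitive, like findstr).
    log = (root / "stdoutdae.txt").read_text(encoding="latin-1").splitlines()
    starts = [line for line in log
              if "SETI@home] Starting" in line and "task" not in line]
    ends = [line for line in log if "SETI@home] Computation" in line]
    return ar_lines, starts, ends
```

Calling scrape_boinc with the BOINC root directory returns the three lists in memory rather than writing all_ar.txt, Start_times.txt and End_times.txt, which makes it easier to feed them straight into whatever does the linking.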
ID: 859442
Fred W
Volunteer tester
Joined: 13 Jun 99
Posts: 2524
Credit: 11,954,210
RAC: 0
United Kingdom
Message 859464 - Posted: 29 Jan 2009, 23:00:22 UTC - in response to Message 859442.  

For those who are interested in checking their own card's performance, here's the script I used to get data out of the local BOINC files:

findstr /p "<true_angle_range>" %1\projects\setiathome.berkeley.edu\*.* >> all_ar.txt
findstr /C:"SETI@home\] Starting" %1\stdoutdae.txt > Start_times.tmp
findstr /V /C:"task" Start_times.tmp > Start_times.txt
del Start_times.tmp
findstr /C:"SETI@home\] Computation" %1\stdoutdae.txt > End_times.txt
start Wordpad all_ar.txt

This is designed to be used from a shortcut with a parameter (%1) passing the location of the BOINC root directory (not the project directory, as in my previous attempt).

The next stage really needs a database, to link the start_time, end_time and AR for each WU name.

Nice one, Richard. I'll try to find the time this weekend to start on "son-of-Vac".

Whilst it may be minimal, is there a risk that the file locking whilst the script is going through the local files could cause any problem for normal crunching?

F.
ID: 859464
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 859485 - Posted: 29 Jan 2009, 23:52:48 UTC - in response to Message 859464.  

Whilst it may be minimal, is there a risk that the file locking whilst the script is going through the local files could cause any problem for normal crunching?

F.

To be honest, I don't know. I suppose there must be a risk, but I didn't break anything while developing and testing. Perhaps if your rig is seriously out of balance (humongous graphics card on a Z80 processor) it might be wise to copy stdoutdae.txt to another directory for searching, but it didn't seem necessary on the Qs.

One problem I'll have to watch for is when BOINC rotates the logs and starts a new stdoutdae.txt - not a good idea to run the script at that moment, or while BOINC is actively downloading or starting/finishing tasks.
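The "copy it elsewhere first" precaution could look something like the sketch below. It is an illustration only: snapshot_log is a made-up name, and copying still opens the live file once, so it shortens the contact with BOINC's log rather than removing it.

```python
import shutil
import tempfile
from pathlib import Path


def snapshot_log(boinc_root, log_name="stdoutdae.txt"):
    """Copy the BOINC log to a fresh temporary directory and return the copy's path."""
    source = Path(boinc_root) / log_name
    target_dir = Path(tempfile.mkdtemp(prefix="boinc_log_"))
    # copy2 reads the live file once and preserves its timestamps, so all
    # later searching is done on the copy, not on the file BOINC writes to.
    return Path(shutil.copy2(source, target_dir / log_name))
```

Any searching is then done on the returned path, and the temporary directory can be deleted afterwards.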
ID: 859485
ML1
Volunteer moderator
Volunteer tester
Joined: 25 Nov 01
Posts: 20289
Credit: 7,508,002
RAC: 20
United Kingdom
Message 859489 - Posted: 29 Jan 2009, 23:58:08 UTC - in response to Message 859439.  

Good work and that's a good 'teaser'. Keep us posted.

Meanwhile, my GeForce 8600 GT (and Crunch3r Linux build CUDA) is rattling through what seems like a thousand or so VHARs. What was Arecibo doing for that lot to be generated?! Never had so many WUs whizz through so quickly!

The system is also chewing on a few APs... Shame they have no CUDA speedup ;-(

Also, the RAC is being very slow to pick up. Is that just a hang-over from Berkeley headaches?...

Happy fast crunchin',
Martin

See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
ID: 859489
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 859497 - Posted: 30 Jan 2009, 0:18:46 UTC - in response to Message 859489.  

ID: 859497
Fred W
Volunteer tester
Joined: 13 Jun 99
Posts: 2524
Credit: 11,954,210
RAC: 0
United Kingdom
Message 859639 - Posted: 30 Jan 2009, 6:46:44 UTC - in response to Message 859489.  

Good work and that's a good 'teaser'. Keep us posted.

Meanwhile, my GeForce 8600 GT (and Crunch3r Linux build CUDA) is rattling through what seems like a thousand or so VHARs. What was Arecibo doing for that lot to be generated?! Never had so many WUs whizz through so quickly!

The system is also chewing on a few APs... Shame they have no CUDA speedup ;-(

Also, the RAC is being very slow to pick up. Is that just a hang-over from Berkeley headaches?...

Happy fast crunchin',
Martin

If you are chewing on AP's then a combination of the AP validator problems and your increased turnround rate for the VHARs vs your wingmen will be affecting the pick up in RAC. My pendings are now 10x my RAC (largely because of the number of AP's I have returned in the past few days).

F.
ID: 859639
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 859805 - Posted: 30 Jan 2009, 18:27:46 UTC

Enough data to justify an expanded view, I think.


(Direct link)

I'm pleased to say that the single point 'below the curve' (GTX+, AR=0.434021426) was a late-onset -9 overflow: 1139032288. Always nice when an exceptional point confirms the methodology!
ID: 859805
Adrian Taylor
Volunteer tester
Joined: 22 Apr 01
Posts: 95
Credit: 10,933,449
RAC: 0
United Kingdom
Message 859845 - Posted: 30 Jan 2009, 21:38:25 UTC - in response to Message 859805.  

ok Richard less of this shirking, where are all the other nvidia cards like my 8800 GT ?

nothing worse than a job half done lol :-)

seriously though your work with these graphs over the last years has been invaluable , cheers
63. (1) (b) "music" includes sounds wholly or predominantly characterised by the emission of a succession of repetitive beats
ID: 859845
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 859847 - Posted: 30 Jan 2009, 21:49:23 UTC - in response to Message 859845.  
Last modified: 30 Jan 2009, 22:16:16 UTC

If anyone else would like to start scraping data and send it to me, I'll happily plot it on the same graph as my own cards.

I would need:

Specification/model name of the CUDA card
Link to the host ID for the results on this board (for checking outliers)

Data in two identifiable columns:
Angle_Range, Crunch_time (ideally in seconds)

(any column separator will do, so long as it's consistent)

Either by PM to this board, or if the data's too long for that, by email to initial dot surname at btinternet dot com. Zip or 7-zip preferred, any reasonable archive will do.

[Edit - or at a pinch, send me all three output files from the script I posted earlier. But make sure you run it several times over a period of two or three days - the Angle Ranges are collected from future work in your cache, and the timings - d'oh - from completed work in your logs. They need to overlap.]
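As a rough illustration of the two-column format asked for above, a file could be produced along these lines. The function name write_results and the sample numbers are invented for the example; a comma is used as the separator, though any consistent one would do.

```python
import csv
import io


def write_results(rows, out):
    """Write (angle_range, crunch_seconds) pairs as two comma-separated columns."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Angle_Range", "Crunch_time"])
    for angle_range, seconds in rows:
        # Crunch time is written as whole seconds.
        writer.writerow([angle_range, int(seconds)])


# Made-up sample values, for illustration only.
buffer = io.StringIO()
write_results([(0.4, 1500), (2.7, 450)], buffer)
print(buffer.getvalue())
```

Passing an open file instead of the StringIO buffer gives something that can be zipped and sent as is.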
ID: 859847
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 860866 - Posted: 1 Feb 2009, 22:20:25 UTC

Two for the price of one tonight. Additional reportage from BMaytum's E8400 + 8800GT (Thanks Bruce - the second email came through fine too).

General overview:


(Direct link)

Detail:


(Direct link)

NOTE from Bruce: "I'm using Raistmer's optimized Version 8a "Teamwork" applications with ncpus=3 on my dual-core system, so there are CPU-crunched WUs (via AK_V8 app) mixed in with the Cuda WUs" - and you can see that clearly in the graphs. His 8800GT and my 9800GT have the same specifications (112 shaders / 600 MHz graphics clock / 1500 MHz processor clock), so it's good to see that Raistmer's CUDA version matches my stock v6.08 so closely. I've cleaned out some -9 overflows from my own lists, but left Bruce's exactly as they arrived.

I can only speculate about those top-right outliers of Bruce's (~5,000 sec for a shorty) - I think they've been purged from the task list already. But my guess is that they are CPU tasks which spent some time pre-empted, waiting to run: my methodology was really aimed at the CUDA tasks which don't do this sort of thing, so I'm only monitoring start times and end times, not what happens in between.

We're narrowing the gap where the VLAR penalty might kick in, but still nothing to plot between AR=0.014563 and AR=0.259335
ID: 860866
perryjay
Volunteer tester
Joined: 20 Aug 02
Posts: 3377
Credit: 20,676,751
RAC: 0
United States
Message 860902 - Posted: 1 Feb 2009, 23:33:19 UTC - in response to Message 860866.  


NOTE from Bruce: "I'm using Raistmer's optimized Version 8a "Teamwork"

I can only speculate about those top-right outliers of Bruce's (~5,000 sec for a shorty) - I think they've been purged from the task list already. But my guess is that they are CPU tasks which spent some time pre-empted, waiting to run: my methodology was really aimed at the CUDA tasks which don't do this sort of thing, so I'm only monitoring start times and end times, not what happens in between.




I may have an explanation about those outliers. I'm also running Raistmer's V8b. I have noticed that if it is running one GPU and one or more CPUs when the first GPU WU is finished it will grab one of the WUs off the CPU and finish it instead of drawing a new WU. The CPU then gets another fresh WU for itself.



PROUD MEMBER OF Team Starfire World BOINC
ID: 860902
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 861029 - Posted: 2 Feb 2009, 8:40:55 UTC - in response to Message 860902.  
Last modified: 2 Feb 2009, 8:44:14 UTC


I may have an explanation about those outliers. I'm also running Raistmer's V8b. I have noticed that if it is running one GPU and one or more CPUs when the first GPU WU is finished it will grab one of the WUs off the CPU and finish it instead of drawing a new WU. The CPU then gets another fresh WU for itself.


Hm... It shouldn't do that unless BOINC goes crazy in "High priority" mode...

Higher wall clock times almost certainly can be attributed to CPU-based tasks.
Anyway it's very easy to confirm or reject - one just needs to look at the result stderr. If a task was passed from CPU to GPU there will be messages clearly indicating a task restart.

BTW I've finished clearing the beta task queue (got more than 1000 tasks because of a server glitch) and want to participate in runtime data collection here.
9600GSO card onboard
ID: 861029
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 861032 - Posted: 2 Feb 2009, 8:46:23 UTC - in response to Message 861029.  


I may have an explanation about those outliers. I'm also running Raistmer's V8b. I have noticed that if it is running one GPU and one or more CPUs when the first GPU WU is finished it will grab one of the WUs off the CPU and finish it instead of drawing a new WU. The CPU then gets another fresh WU for itself.

Hm... It shouldn't do that unless BOINC goes crazy in "High priority" mode...

Higher wall clock times almost certainly can be attributed to CPU-based tasks.
Anyway it's very easy to confirm or reject - one just needs to look at the result stderr. If a task was passed from CPU to GPU there will be messages clearly indicating a task restart.

BTW I've finished clearing the beta task queue (got more than 1000 tasks because of a server glitch) and want to participate in runtime data collection here.
9600GSO card onboard

Welcome aboard. I know I promised to send you a copy of my database, but real life intervened - I'll finish tidying it up this morning and email it over.
ID: 861032
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 861038 - Posted: 2 Feb 2009, 9:55:28 UTC - in response to Message 861032.  

Welcome aboard. I know I promised to send you a copy of my database, but real life intervened - I'll finish tidying it up this morning and email it over.

No prob, will run your script meantime.
ID: 861038
perryjay
Volunteer tester
Joined: 20 Aug 02
Posts: 3377
Credit: 20,676,751
RAC: 0
United States
Message 861064 - Posted: 2 Feb 2009, 12:36:05 UTC - in response to Message 861029.  



Hm... It shouldn't do that unless BOINC goes crazy in "High priority" mode...





That's exactly what it has been doing Raistmer. The new WUs they are sending me mostly have a completion date of no more than a week so it is running them in high priority for the most part. It's really kind of fun to watch and isn't hurting anything except Richard's chart. :)



ID: 861064
ML1
Volunteer moderator
Volunteer tester
Joined: 25 Nov 01
Posts: 20289
Credit: 7,508,002
RAC: 20
United Kingdom
Message 861159 - Posted: 2 Feb 2009, 17:33:26 UTC - in response to Message 859847.  

If anyone else would like to start scraping data and send it to me, I'll happily plot it on the same graph as my own cards.

I would need:

Specification/model name of the CUDA card
Link to the host ID for the results on this board (for checking outliers)

Data in two identifiable columns:
Angle_Range, Crunch_time (ideally in seconds)

(any column separator will do, so long as it's consistent)

Either by PM to this board, or if the data's too long for that, by email to initial dot surname at btinternet dot com. Zip or 7-zip preferred, any reasonable archive will do.

[Edit - or at a pinch, send me all three output files from the script I posted earlier. But make sure you run it several times over a period of two or three days - the Angle Ranges are collected from future work in your cache, and the timings - d'oh - from completed work in your logs. They need to overlap.]

OK, I've put together a grep version of your instructions to generate the three files. Astropulse WUs are filtered out also.

So... Before I dive into any thoughtful typing... How are you pulling the three files together?

Cheers,
Martin

ID: 861159
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 861175 - Posted: 2 Feb 2009, 17:52:37 UTC - in response to Message 861159.  

OK, I've put together a grep version of your instructions to generate the three files. Astropulse WUs are filtered out also.

So... Before I dive into any thoughtful typing... How are you pulling the three files together?

Cheers,
Martin

I think you really need database technology for that. I do a lightly-filtered text import from the three files, to keep the data and lose some of the excess wordage: then a query with text manipulation to finish knocking them into shape (e.g. you have to change task names into WU names in the 'start' and 'end' tables, by stripping off the _0 or _1).

Once you have three tables with primary keys on the WU name (easiest way of rejecting duplicates), it's a doddle to do a 'join' query and a datediff to get the answers we're looking for.

Sorry to say this is a M$ house, so the current version is in Access. I can import it into SQL Server if you want, and let you have the SQL form of the table and view structure, if that would help.
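For readers who want to try the same idea outside Access, here is a minimal sketch using Python's built-in sqlite3 module instead. It is not the actual Access database: the table names, the helper names wu_name and crunch_times, and the assumption that timestamps have already been converted to 'YYYY-MM-DD HH:MM:SS' text are all made up for illustration.

```python
import re
import sqlite3


def wu_name(task_name):
    """Strip the trailing _0, _1, ... suffix to turn a task name into a WU name."""
    return re.sub(r"_\d+$", "", task_name)


def crunch_times(angle_ranges, starts, ends):
    """Join the three data sets on WU name; return (wu, angle_range, seconds) rows."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE angle_ranges (wu TEXT PRIMARY KEY, angle_range REAL);
        CREATE TABLE starts       (wu TEXT PRIMARY KEY, started TEXT);
        CREATE TABLE ends         (wu TEXT PRIMARY KEY, finished TEXT);
    """)
    # INSERT OR IGNORE lets the primary key reject duplicate WU names quietly.
    db.executemany("INSERT OR IGNORE INTO angle_ranges VALUES (?, ?)", angle_ranges)
    db.executemany("INSERT OR IGNORE INTO starts VALUES (?, ?)",
                   [(wu_name(task), stamp) for task, stamp in starts])
    db.executemany("INSERT OR IGNORE INTO ends VALUES (?, ?)",
                   [(wu_name(task), stamp) for task, stamp in ends])
    # strftime('%s', ...) turns a timestamp into Unix seconds; the difference
    # plays the role of a datediff in seconds.
    return db.execute("""
        SELECT a.wu, a.angle_range,
               CAST(strftime('%s', e.finished) AS INTEGER)
             - CAST(strftime('%s', s.started)  AS INTEGER)
        FROM angle_ranges AS a
        JOIN starts AS s ON s.wu = a.wu
        JOIN ends   AS e ON e.wu = a.wu
        ORDER BY a.angle_range
    """).fetchall()
```

Only WUs present in all three tables survive the inner joins, which matches the point made earlier in the thread that the Angle Ranges and the timings need to overlap.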
ID: 861175
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 861354 - Posted: 3 Feb 2009, 0:19:39 UTC

This time, we have additional reportage from Raistmer's 9600GSO. As with BMaytum, Raistmer is running his own mod (surprise!), so some times come from the CPU, and some from the GPU.


(Direct link)
ID: 861354
ML1
Volunteer moderator
Volunteer tester
Joined: 25 Nov 01
Posts: 20289
Credit: 7,508,002
RAC: 20
United Kingdom
Message 861661 - Posted: 3 Feb 2009, 16:54:54 UTC - in response to Message 861175.  

... Once you have three tables with primary keys on the WU name (easiest way of rejecting duplicates), it's a doddle to do a 'join' query and a datediff to get the answers we're looking for.

Sorry to say this is a M$ house, so the current version is in Access...

Also a doddle to script it unix-style. (You can have a mighty powerful database with sort, uniq, bash (or awk) and a text file or three... :-) )

I've scraped 67 results so far and they are all a very boring AR=2.7 and they each take about 16mins to run on my nVidia 8600GT. Meanwhile, the CPU (AMD AthlonXP X2) is also running two AP WUs.

Hopefully it will run through something more varied soon-ish...

The script is available for anyone interested.

Happy fast crunchin',
Martin

ID: 861661
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.