CUDA victim #1

Message boards : Number crunching : CUDA victim #1
Profile Instytut Dziennikarstwa
Volunteer tester

Joined: 27 Mar 03
Posts: 19
Credit: 20,629,934
RAC: 0
Poland
Message 841556 - Posted: 18 Dec 2008, 19:20:19 UTC
Last modified: 18 Dec 2008, 19:20:36 UTC

http://setiathome.berkeley.edu/workunit.php?wuid=381019093
What the hell?

ID: 841556 · Report as offensive
Profile Dorsilfin
Volunteer tester

Joined: 28 Jul 08
Posts: 69
Credit: 4,484,890
RAC: 0
United States
Message 841560 - Posted: 18 Dec 2008, 19:27:27 UTC

I'm popping out plenty of those super-small WUs in literally 20 seconds of CPU time: 17 seconds to feed the GPU and then 3 seconds of crunching. Done.


I saw a few, and I'm like, wow... sucks for the computer that took that long to do it.
My City
ID: 841560 · Report as offensive
Profile Instytut Dziennikarstwa
Volunteer tester

Joined: 27 Mar 03
Posts: 19
Credit: 20,629,934
RAC: 0
Poland
Message 841563 - Posted: 18 Dec 2008, 19:28:52 UTC - in response to Message 841560.  

well, I don't mind the speed, I mind the completely different result for CUDA vs. non-CUDA crunch
ID: 841563 · Report as offensive
Profile Dorsilfin
Volunteer tester

Joined: 28 Jul 08
Posts: 69
Credit: 4,484,890
RAC: 0
United States
Message 841564 - Posted: 18 Dec 2008, 19:29:18 UTC

http://setiathome.berkeley.edu/workunit.php?wuid=381210455

http://setiathome.berkeley.edu/workunit.php?wuid=381210434

My City
ID: 841564 · Report as offensive
Profile Dorsilfin
Volunteer tester

Joined: 28 Jul 08
Posts: 69
Credit: 4,484,890
RAC: 0
United States
Message 841566 - Posted: 18 Dec 2008, 19:29:55 UTC - in response to Message 841563.  

well, I don't mind the speed, I mind the completely different result for CUDA vs. non-CUDA crunch



I haven't gotten any in a while; maybe it was mismatching them with CUDA users. *Shrug*


My City
ID: 841566 · Report as offensive
Profile Euan Holton
Avatar

Joined: 4 Sep 99
Posts: 65
Credit: 17,441,343
RAC: 0
United Kingdom
Message 841582 - Posted: 18 Dec 2008, 19:45:52 UTC - in response to Message 841563.  

well, I don't mind the speed, I mind the completely different result for CUDA vs. non-CUDA crunch

I was lurking on the boards when the first SSE optimised applications came out, and there was considerable outcry then that they gave an 'unfair' advantage, but it died down once optimised versions became available on more platforms and more people made use of them.

I can imagine that owners of CUDA-enabled machines with median CPU performance are actually quite glad of this, as their box may come to outperform machines that have a high-end CPU and memory but either ATI graphics or only basic graphics capabilities, e.g. servers and dedicated crunch boxes.
ID: 841582 · Report as offensive
Profile popandbob
Volunteer tester

Joined: 19 Mar 05
Posts: 551
Credit: 4,673,015
RAC: 0
Canada
Message 841585 - Posted: 18 Dec 2008, 19:49:21 UTC

These are the result of a complete lack of testing!
In the Beta test they were NOT validated against stock apps.
There is still a problem with high angle ranges.
This app really should not have been released so quickly.


Do you Good Search for Seti@Home? http://www.goodsearch.com/?charityid=888957
Or Good Shop? http://www.goodshop.com/?charityid=888957
ID: 841585 · Report as offensive
Profile Byron S Goodgame
Volunteer tester
Avatar

Joined: 16 Jan 06
Posts: 1145
Credit: 3,936,993
RAC: 0
United States
Message 841602 - Posted: 18 Dec 2008, 20:16:02 UTC
Last modified: 18 Dec 2008, 20:25:51 UTC

Excuse me if I'm mistaken, but most of us spent all of 4-5 days testing it in Beta, and many still had results to upload that never got in because the upload server went out. Maybe 1/3 of the tasks I did were validated against. The app couldn't possibly be much more than it was a few days ago when we started testing. Why does the public have it?
ID: 841602 · Report as offensive
1mp0£173
Volunteer tester

Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 841612 - Posted: 18 Dec 2008, 20:29:49 UTC - in response to Message 841556.  

http://setiathome.berkeley.edu/workunit.php?wuid=381019093
What the hell?

It's pretty obvious, actually.

Your result, and the other result did not match. The work unit is being sent out to a third cruncher.

You really need to be much more patient: credit has never been granted unless the work validates, and it has not validated yet. The key word is "yet."

This has been true since SETI@Home moved to BOINC.
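
For anyone unsure what "did not match" means here, the following is a minimal sketch of quorum-style validation, not the actual BOINC or SETI@home validator. The names `Result`, `results_match` and `validate_workunit`, the quorum of 2, and the "same set of signals" comparison are all assumptions made for illustration.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Result:
    host: str
    signals: Tuple[str, ...]  # simplified stand-in for the reported signals


def results_match(a: Result, b: Result) -> bool:
    # Assumption: two results agree when they report the same set of signals.
    return set(a.signals) == set(b.signals)


def validate_workunit(results: List[Result], quorum: int = 2) -> Optional[List[Result]]:
    """Return the first group of `quorum` agreeing results, or None if there is no consensus."""
    for group in combinations(results, quorum):
        if all(results_match(group[0], other) for other in group[1:]):
            return list(group)
    return None  # no consensus yet: another copy of the work unit is needed


first = Result("host_a", ("signal_1",))
second = Result("host_b", ("signal_1", "signal_2"))
third = Result("host_c", ("signal_1",))

print(validate_workunit([first, second]))         # no agreeing pair yet
print(validate_workunit([first, second, third]))  # host_a and host_c agree
```

With only the two mismatched results there is nothing to grant credit against; once a third result arrives and agrees with one of them, the work unit can validate.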


ID: 841612 · Report as offensive
Profile Wayne Frazee
Volunteer tester
Avatar

Joined: 18 Jul 00
Posts: 26
Credit: 1,939,306
RAC: 0
United States
Message 841616 - Posted: 18 Dec 2008, 20:39:16 UTC - in response to Message 841602.  

Excuse me if I'm mistaken, but most of us spent all of 4-5 days testing it in Beta, and many still had results to upload that never got in because the upload server went out. Maybe 1/3 of the tasks I did were validated against. The app couldn't possibly be much more than it was a few days ago when we started testing. Why does the public have it?


Agreed. Many of the same issues coming up on the forums were already being worked through on the beta boards, including a number of possible bugs in getting the CUDA extensions to work.

Additionally, when you release something like this, there is a host of CUDA driver documentation and Q&A that you would like to have available to the community at the same time.

May I suggest grabbing a couple of long-term, active technical members from the community to contribute to a Q&A wiki or similar that provides more comprehensive documentation for the community for these kinds of releases?
-W
"Any sufficiently developed bug is indistinguishable from a feature."
ID: 841616 · Report as offensive
Profile SATAN
Avatar

Joined: 27 Aug 06
Posts: 835
Credit: 2,129,006
RAC: 0
United Kingdom
Message 841620 - Posted: 18 Dec 2008, 20:42:18 UTC

Whilst I'm not able to benefit from the new version yet, it does appear rushed and untested. If even 10% of the users have a CUDA-enabled video card, the project will not be able to supply enough work.

I am more curious as to where the extra work seems to have gone. I thought that when the project switched to multibeam there was going to be 14 times the amount of work to do. Has the crunching power available to this project increased fourteenfold in 2 years? No. The project can't cope as it is; asking it to cope with this extra strain is just asking for trouble.
ID: 841620 · Report as offensive
Profile Euan Holton
Avatar

Joined: 4 Sep 99
Posts: 65
Credit: 17,441,343
RAC: 0
United Kingdom
Message 841654 - Posted: 18 Dec 2008, 21:37:43 UTC - in response to Message 841620.  

Well, to be honest, I do wonder if the almost unseemly haste to get the CUDA software out is connected to nVidia wanting to get some positive PR out at a given moment and being willing to drop the project some much-needed cash (and presumably some technical expertise) in return.

What I'd really like to see in future versions of BOINC is an ability to farm out WUs to any spare compute resource a machine has, selecting the relevant application according to what's available, but whether the current science application support framework is flexible enough to allow that I do not know.
ID: 841654 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14661
Credit: 200,643,578
RAC: 874
United Kingdom
Message 841662 - Posted: 18 Dec 2008, 21:54:19 UTC - in response to Message 841612.  
Last modified: 18 Dec 2008, 21:54:43 UTC

http://setiathome.berkeley.edu/workunit.php?wuid=381019093
What the hell?

It's pretty obvious, actually.

Your result, and the other result did not match. The work unit is being sent out to a third cruncher.

You really need to be much more patient: credit has never been granted unless the work validates, and it has not validated yet. The key word is "yet."

This has been true since SETI@Home moved to BOINC.

And two more interesting observations:

1) It's the old -9 overflow question again: Linux derivative of Lunatics AK_V8 ran to completion, finding one triplet: stock CUDA found an extra 29 spikes and bailed out early. Which one has the bug? Watch this space.

2) The latest SAH validator seems to have inherited the Astropulse validator bug - "Checked, but no consensus" has been replaced by "Valid", and "pending" has been replaced by "0.00".

We progress - backwards.
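
To illustrate the overflow behaviour in point 1, here is a rough sketch, assuming the application stops as soon as a fixed number of signals has been reported. The limit of 30 is an assumption chosen to line up with the "one triplet plus 29 spikes" above, and `MAX_SIGNALS` and `process_workunit` are hypothetical names, not taken from the SETI@home source.

```python
from typing import Iterable, List, Tuple

MAX_SIGNALS = 30  # assumed limit for illustration only


def process_workunit(candidates: Iterable[str],
                     max_signals: int = MAX_SIGNALS) -> Tuple[List[str], bool]:
    """Collect signals in order; return (signals, overflowed)."""
    found: List[str] = []
    for signal in candidates:
        found.append(signal)
        if len(found) >= max_signals:
            return found, True   # limit reached: bail out early (overflow)
    return found, False          # ran to completion


# One application sees a single triplet and finishes normally.
clean = ["triplet"]
# The other also reports 29 spurious spikes, hits the limit and stops early.
noisy = [f"spike_{i}" for i in range(29)] + ["triplet", "later_signal"]

print(process_workunit(clean))
signals, overflowed = process_workunit(noisy)
print(len(signals), overflowed)
```

Under these assumptions the two runs can never validate against each other: one returns a short, complete result and the other a full buffer cut off before the end of the work unit.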
ID: 841662 · Report as offensive
Profile SATAN
Avatar

Joined: 27 Aug 06
Posts: 835
Credit: 2,129,006
RAC: 0
United Kingdom
Message 841682 - Posted: 18 Dec 2008, 22:12:05 UTC
Last modified: 18 Dec 2008, 22:32:57 UTC

Richard, totally agree with you. After the massive steps forward made by the optimizers and the switch to MB, things do appear to go backwards.
ID: 841682 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Avatar

Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 841685 - Posted: 18 Dec 2008, 22:28:13 UTC - in response to Message 841662.  


1) It's the old -9 overflow question again: Linux derivative of Lunatics AK_V8 ran to completion, finding one triplet: stock CUDA found an extra 29 spikes and bailed out early. Which one has the bug? Watch this space.


In beta I saw the following situation on my PC:
After a few VLARs crashed the video driver on Vista, the screen was distorted (see the beta forum for an example; there is a picture there) and _every_ task after that finished in ~15 seconds with a -9 error.
I had to reboot the OS to fix it.

ID: 841685 · Report as offensive
1mp0£173
Volunteer tester

Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 841801 - Posted: 19 Dec 2008, 4:11:21 UTC - in response to Message 841662.  

http://setiathome.berkeley.edu/workunit.php?wuid=381019093
What the hell?

It's pretty obvious, actually.

Your result, and the other result did not match. The work unit is being sent out to a third cruncher.

You really need to be much more patient: credit has never been granted unless the work validates, and it has not validated yet. The key word is "yet."

This has been true since SETI@Home moved to BOINC.

And two more interesting observations:

1) It's the old -9 overflow question again: Linux derivative of Lunatics AK_V8 ran to completion, finding one triplet: stock CUDA found an extra 29 spikes and bailed out early. Which one has the bug? Watch this space.

2) The latest SAH validator seems to have inherited the Astropulse validator bug - "Checked, but no consensus" has been replaced by "Valid", and "pending" has been replaced by "0.00".

We progress - backwards.

If, as a rule of thumb, one out of every ten bug fixes introduces a new bug, then intuitively the one way to not introduce new bugs is to stop fixing existing ones.

I know this isn't a popular position, but given the size of the task to be done, and the size of the staff available to do it, it seems to me that, in general, things are going pretty well.

Sure, it'd be better if some of these had been caught in Beta, and maybe they would have if the Beta had a longer cycle, but if we weren't complaining about the spurious -9 overflows and other issues, we'd be complaining about all the great stuff that's stuck in Beta.
ID: 841801 · Report as offensive
Profile Paul D. Buck
Volunteer tester

Joined: 19 Jul 00
Posts: 3898
Credit: 1,158,042
RAC: 0
United States
Message 841802 - Posted: 19 Dec 2008, 4:13:07 UTC - in response to Message 841800.  

I am running the CUDA version and the last 20 workunits have finished in an average time of only 3 minutes. My system has two NVidia 260 OCs installed in SLI mode. Does CUDA take advantage of both GPUs? Also, since the NVidia 260 is one of the first GPUs to perform double precision floating point instructions, will this provide a significant edge? Does the SETI client use double precision or single precision floating point in its Fourier transform module?


No ...

See:

cudaAcc_initializeDevice: Found 1 CUDA device(s):
Device 1 : GeForce GTX 260
cudaAcc_initializeDevice is determiming what CUDA device to use...
user specified SETI to use CUDA device 1: GeForce GTX 260
SETI@home using CUDA accelerated device GeForce GTX 260

BOINC is only seeing one CUDA device and is using it ... if SLI mode links the two GPU cards ... that is why. You should be able to see if BOINC sees one or two cards on start up of BOINC where it tells you how many devices it found.
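
If you want to check this without reading the output by eye, here is a small sketch that pulls the device count and device names out of the log lines quoted above. The function names are made up for this example, and the regular expressions assume the log format stays exactly as in this excerpt.

```python
import re
from typing import List, Optional, Tuple

# Log excerpt copied verbatim from the task output quoted above.
LOG = """\
cudaAcc_initializeDevice: Found 1 CUDA device(s):
Device 1 : GeForce GTX 260
cudaAcc_initializeDevice is determiming what CUDA device to use...
user specified SETI to use CUDA device 1: GeForce GTX 260
SETI@home using CUDA accelerated device GeForce GTX 260
"""

FOUND_RE = re.compile(r"Found (\d+) CUDA device\(s\)")
DEVICE_RE = re.compile(r"^Device (\d+) : (.+)$", re.MULTILINE)


def cuda_device_count(log_text: str) -> Optional[int]:
    """Return the number from the 'Found N CUDA device(s)' line, or None if absent."""
    match = FOUND_RE.search(log_text)
    return int(match.group(1)) if match else None


def cuda_devices(log_text: str) -> List[Tuple[int, str]]:
    """Return (index, name) pairs from the 'Device N : name' lines."""
    return [(int(num), name.strip()) for num, name in DEVICE_RE.findall(log_text)]


print(cuda_device_count(LOG))
print(cuda_devices(LOG))
```

A count of 1 on a two-card system is the thing to look for: it means the application was only offered a single device.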
ID: 841802 · Report as offensive
Cosmic_Ocean
Avatar

Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 841803 - Posted: 19 Dec 2008, 4:17:51 UTC - in response to Message 841802.  

I am running the CUDA version and the last 20 workunits have finished in an average time of only 3 minutes. My system has two NVidia 260 OCs installed in SLI mode. Does CUDA take advantage of both GPUs? Also, since the NVidia 260 is one of the first GPUs to perform double precision floating point instructions, will this provide a significant edge? Does the SETI client use double precision or single precision floating point in its Fourier transform module?


No ...

See:

cudaAcc_initializeDevice: Found 1 CUDA device(s):
Device 1 : GeForce GTX 260
cudaAcc_initializeDevice is determiming what CUDA device to use...
user specified SETI to use CUDA device 1: GeForce GTX 260
SETI@home using CUDA accelerated device GeForce GTX 260

BOINC is only seeing one CUDA device and is using it ... if SLI mode links the two GPU cards ... that is why. You should be able to see if BOINC sees one or two cards on start up of BOINC where it tells you how many devices it found.

I would have to imagine that if SLI was in place, CUDA would only see one GPU, because that's the way SLI appears to any games that go to use it, because of the hardware drivers for the cards and the configuration of them. It's like a RAID array. Multiple physical discs, one logical disc. With SLI, multiple physical cards, one logical card.
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)
ID: 841803 · Report as offensive
Profile Paul D. Buck
Volunteer tester

Joined: 19 Jul 00
Posts: 3898
Credit: 1,158,042
RAC: 0
United States
Message 841804 - Posted: 19 Dec 2008, 4:18:52 UTC

Ned,

Technically it is worse than that. One study I read in my college days indicated that the "real" number of bugs actually stays constant. Each bug removed installs a new one that is more subtle and/or less likely to cause problems.

The one-in-ten number is actually that, for each 10 lines of code, the AVERAGE programmer makes one error. The error rate goes down with skill level, with very skilled programmers making a mistake only about once in 200 LOC.

That is why higher level languages are "better" for writing code: the error rate per feature decreases. 5th gen languages like PowerBuilder and the like greatly reduce the code required. We were making 100-window applications with about 15,000 lines of code, and most of those lines were actually in the framework we were using (in which I had to override code to fix bugs in the FW)...
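
Taking the quoted rates at face value (one error per 10 LOC for an average programmer, one per 200 LOC for a very skilled one), here is a back-of-the-envelope sketch for the 15,000-line application mentioned above. The function name is hypothetical and the rates are the figures from this post, not measured data.

```python
def expected_errors(lines_of_code: int, loc_per_error: int) -> float:
    """Expected number of errors if one error is made every `loc_per_error` lines."""
    return lines_of_code / loc_per_error


APP_LOC = 15_000  # size of the application mentioned above

average = expected_errors(APP_LOC, 10)    # average programmer: 1 error per 10 LOC
skilled = expected_errors(APP_LOC, 200)   # very skilled programmer: 1 error per 200 LOC

print(f"average programmer: {average:.0f} errors")
print(f"very skilled programmer: {skilled:.0f} errors")
print(f"ratio: {average / skilled:.0f}x")
```

The same arithmetic is the argument for writing fewer lines: at a fixed error rate per line, halving the code needed for a feature halves the expected errors.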
ID: 841804 · Report as offensive
1mp0£173
Volunteer tester

Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 841809 - Posted: 19 Dec 2008, 4:36:07 UTC - in response to Message 841804.  

Technically it is worse than that. One study I read in my college days indicated that the "real" number of bugs actually stays constant. Each bug removed installs a new one that is more subtle and/or less likely to cause problems.

The one-in-ten number is actually that, for each 10 lines of code, the AVERAGE programmer makes one error. The error rate goes down with skill level, with very skilled programmers making a mistake only about once in 200 LOC.

It probably depends whose statistics you use, and how you measure.

Besides, 87.4% of all statistics are made up.
ID: 841809 · Report as offensive


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.