PCIe speed and CUDA performance

-BeNt-
Joined: 17 Oct 99
Posts: 1234
Credit: 10,116,112
RAC: 0
United States
Message 1066894 - Posted: 15 Jan 2011, 6:47:56 UTC - in response to Message 1066789.  

I hope this is not too far off topic, but has anybody been able to verify a performance difference by changing the PCI bus clock in the BIOS?
I have always locked mine at the standard 100 MHz.
Is there a performance gain from clocking the bus to 105 or 110, assuming the system can handle it?


Try it and let us know! There would be a difference in the rate of transmission across the bus, but I believe you would get more benefit from overclocking your actual video card.

Basically, the MHz speed of the bus sets the speed limit on the lanes, whereas the lanes are the physical connections. This goes back to the earlier point about 100 million clock cycles, with each lane transferring 1 bit per clock cycle, multiplied by the number of lanes, and so on - of course not taking into account bus overhead, resends, and other quirks of the system. There were guys in the flight simulation community talking about smoother frame rates on their rigs when messing with it, however they are loading multiple gigabytes of tiles and models all the time while flying.
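(For anyone who wants to play with the raw numbers, here is a rough back-of-the-envelope sketch in Python. It assumes the usual published link rates - 2.5 GT/s per lane for PCIe 1.x and 5 GT/s for 2.0, with 8b/10b encoding - so these are theoretical peaks that ignore the bus overhead, resends and other quirks mentioned above.)

def lane_mb_s(gt_per_s, encoding=8 / 10):
    """Peak one-direction payload bandwidth of a single PCIe lane, in MB/s."""
    return gt_per_s * 1e9 * encoding / 8 / 1e6   # bits -> bytes -> MB

def link_mb_s(gt_per_s, lanes):
    return lane_mb_s(gt_per_s) * lanes

for gen, rate in (("PCIe 1.x", 2.5), ("PCIe 2.0", 5.0)):
    for lanes in (1, 4, 8, 16):
        print(f"{gen} x{lanes}: ~{link_mb_s(rate, lanes):.0f} MB/s peak")

# Raising the 100 MHz reference clock to 105 scales the link rate by the same
# 5%, which is why a bus overclock buys so little next to a GPU core overclock.
print(f"PCIe 1.x x16 at +5% ref clock: ~{link_mb_s(2.5 * 1.05, 16):.0f} MB/s peak")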
Traveling through space at ~67,000mph!
Profile ML1
Volunteer moderator
Volunteer tester

Joined: 25 Nov 01
Posts: 20291
Credit: 7,508,002
RAC: 20
United Kingdom
Message 1066943 - Posted: 15 Jan 2011, 13:22:16 UTC - in response to Message 1066789.  

I hope this is not too far off topic, but has anybody been able to verify a performance difference by changing the PCI bus clock in the BIOS?

Sorry, but I would expect any difference in performance for s@h to be negligible.


Happy fast crunchin',
Martin


See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
Profile Fred J. Verster
Volunteer tester
Joined: 21 Apr 04
Posts: 3252
Credit: 31,903,643
RAC: 0
Netherlands
Message 1066952 - Posted: 15 Jan 2011, 14:12:55 UTC - in response to Message 1066943.  
Last modified: 15 Jan 2011, 14:19:50 UTC

I hope this is not too far off topic, but has anybody been able to verify a performance difference by changing the PCI bus clock in the BIOS?

Sorry, but I would expect any difference in performance for s@h to be negligible.


Happy fast crunchin',
Martin



I tried this by accident when overclocking the CPU and didn't lock the PCI-E bus; it only gave instability. I probably should have upped the northbridge voltage.

And I didn't notice any speed increase for CUDA, only for video. Using an 'older' 9800GTX+, I saw a little performance increase, but even though I had upped the northbridge voltage, the card got too hot and became unstable.

IMO, you can't compare (streaming) video and CUDA!?
-BeNt-
Joined: 17 Oct 99
Posts: 1234
Credit: 10,116,112
RAC: 0
United States
Message 1067041 - Posted: 15 Jan 2011, 18:56:17 UTC - in response to Message 1066952.  


Sorry, but I would expect any difference in performance for s@h to be negligible.


Happy fast crunchin',
Martin



Yeah, extremely negligible. In an application that pushes graphics-intensive output, like gaming or CAD, you can see between a 1 and 5 fps difference with a 25% overclock on the PCIe bus. Not exactly what I would call astounding.


I tried this by accident when overclocking the CPU and didn't lock the PCI-E bus; it only gave instability. I probably should have upped the northbridge voltage.

And I didn't notice any speed increase for CUDA, only for video. Using an 'older' 9800GTX+, I saw a little performance increase, but even though I had upped the northbridge voltage, the card got too hot and became unstable.

IMO, you can't compare (streaming) video and CUDA!?


Yeah, most people report that. On my system, when I run it at 4 GHz I have to run the PCIe bus at 101 instead of 100. Don't ask me why, as it's a long story, but it makes my machine stable and that's all that matters. ;)

Traveling through space at ~67,000mph!
Josef W. Segur
Volunteer developer
Volunteer tester

Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1067111 - Posted: 15 Jan 2011, 22:28:00 UTC
Last modified: 15 Jan 2011, 22:31:02 UTC

The earlier discussion of speed and bandwidth never went on to discuss latency and other effects which are also important to actual throughput. In the thread http://www.xtremesystems.org/forums/showthread.php?t=225823 there are quite a few screenshots of PCIe peak throughput vs. size of transfer for ATI cards; I'd presume nVidia cards would show similar effects.

For MB CUDA work the largest transfer from CPU to GPU is 8 MiB (the baseline smoothed data), followed by a threshold array for pulse finding at nearly 2 MiB, IIRC. Both of those are done just at initialization; later there are mainly only parameters passed with kernel calls, involving maybe 16 or 32 bytes each. Transfers from GPU to CPU are more numerous. The largest is probably a Power array for spike finding at 128K FFT length; the array is 512 KiB and there might be a few hundred transferred. Basically, most of what comes back from the GPU is data which may be a candidate for best_spike, best_pulse, or best_gaussian.
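(If anyone wants to put rough numbers on that, here is a small Python sketch tallying the per-workunit traffic from the sizes above; the transfer counts are illustrative guesses based on the wording, not values measured from the application.)

# Rough per-workunit PCIe traffic tally, using the sizes quoted above.
# The counts are illustrative guesses, not measured values.
KiB, MiB = 1024, 1024 * 1024

host_to_gpu = [
    ("baseline smoothed data, once at init", 8 * MiB, 1),
    ("pulse-finding threshold array, once at init", 2 * MiB, 1),
    ("kernel-call parameters, ~32 bytes each", 32, 100_000),  # guessed call count
]
gpu_to_host = [
    ("512 KiB Power arrays for spike finding", 512 * KiB, 300),  # "a few hundred"
]

up = sum(size * count for _, size, count in host_to_gpu)
down = sum(size * count for _, size, count in gpu_to_host)
print(f"host -> GPU: ~{up / MiB:.0f} MiB per workunit")
print(f"GPU -> host: ~{down / MiB:.0f} MiB per workunit")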

The PCIe Speed Test v0.1 discussed in that thread was replaced by v0.2, available from http://developer.amd.com/GPU/ATISTREAMPOWERTOY/Pages/default.aspx. Perhaps that cures the tendency to crash at the largest transfer sizes. The test is of course only for ATI GPUs, but obviously something similar could be done in OpenCL or CUDA and may already be available someplace which my brief search didn't find.
                                                               Joe
Profile Sutaru Tsureku
Volunteer tester

Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 1067158 - Posted: 15 Jan 2011, 23:57:10 UTC - in response to Message 1067111.  
Last modified: 15 Jan 2011, 23:58:25 UTC

Thanks to all!


As a non-native English speaker, it's not easy for me to follow this whole thread.

Joe, would you install a GTX 4xx/5xx graphics card in a PCIe 1.0 x16 slot running at only x8 speed?
Currently 3 CUDA apps would need to communicate over this one slot simultaneously.
Would I see a performance loss?
If you had to guess, how much RAC loss [%] would be possible?
Profile Wiggo
Joined: 24 Jan 00
Posts: 34773
Credit: 261,360,520
RAC: 489
Australia
Message 1067162 - Posted: 16 Jan 2011, 0:04:29 UTC - in response to Message 1067158.  

Sutaru, you should be able to use a 16x slot that is only 4x-capable without seeing any decrease in SETI performance.

Cheers.
Josef W. Segur
Volunteer developer
Volunteer tester

Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1067233 - Posted: 16 Jan 2011, 4:08:31 UTC - in response to Message 1067158.  

...
Joe, would you install a GTX 4xx/5xx graphics card in a PCIe 1.0 x16 slot running at only x8 speed?
Currently 3 CUDA apps would need to communicate over this one slot simultaneously.
Would I see a performance loss?
If you had to guess, how much RAC loss [%] would be possible?

I know too little to guess a %.

8 lanes at 250 MBytes per second each would clearly be enough, but in practice that rate isn't reached for actual data transfers. Also, a board with PCIe 1.0 is not using the latest and greatest chipset, and that may matter more than raw theoretical speed. Information on the web about how PCIe actually performs is often stated in terms of how many frames per second a system achieves in some game; that doesn't translate back into hard data on how it might perform for S@H, but better at games is probably also better for S@H.
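(A toy follow-on calculation, assuming roughly 160 MiB of per-workunit traffic as guessed earlier and the theoretical 250 MBytes per second per PCIe 1.0 lane; real links deliver noticeably less, but the margin is still enormous.)

# Seconds of bus time per workunit at theoretical PCIe 1.0 rates.
# Assumes ~160 MiB of traffic per workunit, which is only a rough guess.
traffic = 160 * 1024 * 1024   # bytes
lane_rate = 250e6             # bytes per second, theoretical PCIe 1.0 per lane

for lanes in (1, 4, 8, 16):
    seconds = traffic / (lanes * lane_rate)
    print(f"x{lanes}: ~{seconds:.2f} s of transfers spread over a ~20 minute workunit")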
                                                               Joe
-BeNt-
Joined: 17 Oct 99
Posts: 1234
Credit: 10,116,112
RAC: 0
United States
Message 1067261 - Posted: 16 Jan 2011, 7:07:44 UTC - in response to Message 1067233.  


I know too little to guess a %.

8 lanes at 250 MBytes per second each would clearly be enough, but in practice that rate isn't reached for actual data transfers. Also, a board with PCIe 1.0 is not using the latest and greatest chipset, and that may matter more than raw theoretical speed. Information on the web about how PCIe actually performs is often stated in terms of how many frames per second a system achieves in some game; that doesn't translate back into hard data on how it might perform for S@H, but better at games is probably also better for S@H.
                                                               Joe


Yeah, theoretical versus actual are two different things, but given how short the lanes are and the speed of current chipsets, I wouldn't imagine it would be too far off. I always assume I'm only going to achieve 80% of what's promised, so you would still be getting ~200 MB/s per lane. Even if you only achieved 10 MB/s per lane, I think it would still be more than enough for SETI@home, especially considering some of the larger transfers are 512 KiB, sent possibly a couple hundred times (per Josef; I'm not sure what or when it's transferred), which would still only come out to somewhere around 100-150 MiB over a whole workunit. Interesting thought experiment for sure; I would love to see what someone in the know could pass along about the amount of data being moved.
Traveling through space at ~67,000mph!
Terror Australis
Volunteer tester

Joined: 14 Feb 04
Posts: 1817
Credit: 262,693,308
RAC: 44
Australia
Message 1067293 - Posted: 16 Jan 2011, 11:43:00 UTC - in response to Message 1067158.  

Thanks to all!


As a non-native English speaker, it's not easy for me to follow this whole thread.

Joe, would you install a GTX 4xx/5xx graphics card in a PCIe 1.0 x16 slot running at only x8 speed?
Currently 3 CUDA apps would need to communicate over this one slot simultaneously.
Would I see a performance loss?
If you had to guess, how much RAC loss [%] would be possible?


Sutaru, this may help
I have 3 x GTX470s running in a DFI motherboard that has 3 PCIe x16 sockets which run at x8 (v2) + x8 (v2) + x1 (v1). I have not conducted any scientific tests on this arrangement, but a check back through my completed units indicates that the 3rd card, in the x1 v1 socket, takes no longer on average to complete a unit than the other 2 in the x8 v2 sockets.

I have another machine with 3 GTS250s in an identical board. When the cards' clock speeds are synced, crunching times are identical.

I think you are worrying over nothing.

T.A.
hbomber
Volunteer tester

Joined: 2 May 01
Posts: 437
Credit: 50,852,854
RAC: 0
Bulgaria
Message 1067313 - Posted: 16 Jan 2011, 15:10:37 UTC
Last modified: 16 Jan 2011, 15:24:02 UTC

PCIe 1.0 x1 gives me 166 MB/s. PCIe 1.0 x4 gives me 626 MB/s. Measured with CUDA-Z. Slowing down the CPU (3.3 -> 2.5 GHz, E6420) chops off some 2-3% more. I was going to investigate how slowing down the FSB and system memory would affect crunching, but that will take too long. Perhaps some other day.
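(As a quick sanity check, the same arithmetic in Python against the 250 MB/s per-lane theoretical peak of PCIe 1.0; the measured values are simply the two CUDA-Z numbers above.)

# Measured CUDA-Z throughput vs. theoretical PCIe 1.0 peak (250 MB/s per lane).
measured = {1: 166.0, 4: 626.0}   # lanes -> MB/s, from the figures above
for lanes, mb_s in measured.items():
    peak = lanes * 250.0
    print(f"x{lanes}: {mb_s:.0f} / {peak:.0f} MB/s = {mb_s / peak:.0%} of theoretical peak")
# Both come in at roughly 60-66% of the theoretical figure.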

I ran some tests with two GT240s: there is a difference when running x1 (1.0) and almost no difference when running x4 (1.0). One of these days I'll run the same tests with the 460s, and will also test a GT240 at x4 again while the 460s are present in the system.
I also have an x4 2.0 slot on another system (one of the X58s), but unfortunately the case layout only allows thin, single-slot cards, and mine aren't.

My guess is that three 460s, running two units each, would see reduced performance in slots slower than x4. I observed noticeably decreased performance with my third GT240, but it was running at x1 1.0. I'm especially interested in what will happen with a third card while two 460s are present, because they each run two WUs simultaneously, which roughly doubles the load on the PCIe bus (rough guess). Later there will be a 2500K system with two x8 2.0 slots and one x4 2.0 slot (usable this time), so I'll have room for further investigation.
The test motherboard is a DFI UT X48-T2R: two x16 2.0 slots running at x16, and one x16 1.0 slot running at x1 or x4, switchable.
Profile Tim Norton
Volunteer tester
Joined: 2 Jun 99
Posts: 835
Credit: 33,540,164
RAC: 0
United Kingdom
Message 1067323 - Posted: 16 Jan 2011, 16:16:35 UTC - in response to Message 1067313.  

Hbomber

Cool, that's good info.

So basically, if you have a PCIe 1.0 x4 slot or better, you should not see any noticeable reduction in CUDA performance.

Actually, as a slight aside, I thought GPUs were not supported in x1 slots, but obviously they do work - I assume your x1 slot is full length, or is it just one of the short ones?
Tim

Profile ML1
Volunteer moderator
Volunteer tester

Joined: 25 Nov 01
Posts: 20291
Credit: 7,508,002
RAC: 20
United Kingdom
Message 1067379 - Posted: 16 Jan 2011, 18:35:23 UTC - in response to Message 1067293.  

... I have 3 x GTX470s running in a DFI motherboard that has 3 PCIe x16 sockets which run at x8 (v2) + x8 (v2) + x1 (v1). I have not conducted any scientific tests on this arrangement, but a check back through my completed units indicates that the 3rd card, in the x1 v1 socket, takes no longer on average to complete a unit than the other 2 in the x8 v2 sockets. ...

An interesting confirmation there, thanks.

Which brings up the next question...

Are there any real world applications/instances where an x16 PCIe link to the GPU is actually needed?

Are there any games that 'hammer' the video card PCIe?


Happy fast crunchin',
Martin

See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
hbomber
Volunteer tester

Joined: 2 May 01
Posts: 437
Credit: 50,852,854
RAC: 0
Bulgaria
Message 1067384 - Posted: 16 Jan 2011, 18:44:46 UTC
Last modified: 16 Jan 2011, 18:54:39 UTC

Yes, my x1/x4 1.0 slot is full size x16.

I used two mid-AR WUs, and I'll list the results per WU - WU1 on top, WU2 below - matching which card, by its slot position, grabbed which WU. Where it says one card, a single card calculated both WUs; where it says two cards, the fast card ran WU1 and the slow (second) card ran WU2.
The WUs are practically identical, as you'll see from the first three pairs of results.
The system is: CPU - C2D E6420 @ 3320 MHz (FSB 415 x 8 multiplier), memory 2x1 GB @ 1040 MHz 5-5-5-16, Windows 2003 Server Enterprise R2 x64 SP2. There were no other active programs on the machine during the test except a Firebird database, and it was idle at the time - no connections or automatic jobs running.
The cards are Galaxy GT240 512 MB GDDR5, both clocked the same at 600/1200/2025.
Here are the numbers (I have screenshots, but I'll just write the numbers down here):
One card, alone, running desktop, x16 2.0:
WU1 - 21:44
WU2 - 21:44
One card, running desktop , second present, but not crunching, both x16 2.0:
WU1 - 21:48
WU2 - 21:48
One card, second x16 2.0 slot, first running desktop:
WU1 - 21:46
WU2 - 21:46
Two cards x16 2.0:
WU1 - 21:49
WU2 - 21:50
One card x16 2.0(desktop), one card x1 1.0:
WU1 - 21:51
WU2 - 22:25 - significant!
One card x16 2.0(desktop), one card x4 1.0:
WU1 - 21:55
WU2 - 22:12
And last one, with CPU slowed down(but not FSB and memory) from 3.3 GHz to 2.5 GHz.
One card x16 2.0(desktop), one card x4 1.0:
WU1 - 22:05
WU2 - 22:21 - still significant
The interesting part: slowing the slot down from x4 to x1 increased the CPU time for WU1 (running on the fast slot) by 3 seconds, from 46 to 49, while the CPU time for WU2, running on the slow slot (x4 1.0), stayed almost the same (in fact it slightly decreased, 50 -> 49 seconds).
Also, slowing down the CPU affected the fast slot more than the slow slot: for WU1 on the fast slot it went from 46 to 54 seconds, and for WU2 on the slow slot it went from 50 to 55 seconds.

Let me just mention that driving the desktop does matter for the time achieved by the card running WU1, and I didn't even touch the desktop - it was just sitting idle. Roughly 2 seconds, based on the numbers above. So you can subtract those two seconds from any result run on card 1 to get the exact difference relative to card 2.
Btw, there are also visible speed differences between slot 0 and slot 1, even though both slots run at the same speed (x16 2.0). Those differences are very noticeable with the 263.00 and 263.09 drivers and are minor with 266.35 and 266.44. I ran all tests with 266.35, because it's the official beta. 266.44 is slightly faster for me everywhere I've tried it, but I don't trust it much yet :)

So my conclusion is: with these relatively slow cards, you need an x8 slot (probably 2.0, judging by the fact that a 4x speed increase cut the difference between the two cards roughly in half) for any card other than the one producing the reference result, in order to get approximately the same times. But! When the reference card (in my case, the one running x16 2.0 in every test, WU1) is slowed to x8 too, they'll perhaps both suffer from sharing the same bandwidth (as in the case of the P55 and P67 chipsets).

Maybe someone would say it doesn't matter. But it does matter to me, and maybe to someone else, so it's better not to make such blanket statements.

Well, this isn't the end of the tests yet. I'll run the same tests, as described above, with the 460s, and I expect them to show more "contrast" in the results. At the same time, I'll rerun the same GT240s while the 460s are present - that will give us a more detailed view of the case where we expect the bus to start getting saturated.

I'm running several other tests with the 460s, regarding GPU utilization and the number of units run simultaneously, depending on the OS and client software (stock vs Lunatics). When I publish those results, I'll also finish and publish the PCIe speed tests.
-BeNt-
Joined: 17 Oct 99
Posts: 1234
Credit: 10,116,112
RAC: 0
United States
Message 1067406 - Posted: 16 Jan 2011, 19:50:01 UTC - in response to Message 1067384.  
Last modified: 16 Jan 2011, 19:51:17 UTC

Interesting tests, but I hardly consider the times astounding, especially considering no two work units are identical. There's no difference in the first ones, so I'm just going to look at the ones with differences and try to make some sense of this.

Two cards x16 2.0:
WU1 - 21:49
WU2 - 21:50


Less than 1% difference. 0.08%

One card x16 2.0(desktop), one card x1 1.0:
WU1 - 21:51
WU2 - 22:25 - significant!


Less than a 3% difference. 2.53%

One card x16 2.0(desktop), one card x4 1.0:
WU1 - 21:55
WU2 - 22:12


Less than 2%. 1.28%

And last one, with CPU slowed down(but not FSB and memory) from 3.3 GHz to 2.5 GHz.

One card x16 2.0(desktop), one card x4 1.0:
WU1 - 22:05
WU2 - 22:21 - still significant


Less than 2% difference. 1.19%

So on average, between your different settings, you could see about +/- 1.27% on each WU calculation. I'm showing about the same differences on my system, without changing the CPU speed, bus speed, etc., on my CUDA units. I just had two GPU units complete on my system, one in 26:07 and the other in 26:31. That's a 1.50% difference between WUs run on the same system in an identical setup.
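(For anyone checking the arithmetic, here is a small Python sketch that converts the quoted mm:ss times to seconds and reproduces those percentages; the labels are just shorthand for the configurations above.)

# Reproduce the percentage differences from the quoted mm:ss run times.
def secs(t):
    m, s = t.split(":")
    return int(m) * 60 + int(s)

runs = [
    ("two cards, both x16 2.0",          "21:49", "21:50"),
    ("x16 2.0 + x1 1.0",                 "21:51", "22:25"),
    ("x16 2.0 + x4 1.0",                 "21:55", "22:12"),
    ("x16 2.0 + x4 1.0, CPU at 2.5 GHz", "22:05", "22:21"),
]
diffs = []
for label, wu1, wu2 in runs:
    pct = 100.0 * (secs(wu2) - secs(wu1)) / secs(wu2)
    diffs.append(pct)
    print(f"{label}: {pct:.2f}% slower on the second card")
print(f"average: {sum(diffs) / len(diffs):.2f}%")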

So I'm not quite sure what to make of all of it, but I do know what the link posted before tells me.

Here's something I think some might like to read in the link below this line...

A quick look at chipset PCI Express performance
The P67 brings PCIe 2.0 up to full speed @ the Tech Report


Two SATA 6Gb/s SSDs in RAID can only push the x16 bus out to 278 MB/s. And that's really pushing the limits of a PCIe 2.0 x1 lane - well, a little over half of it anyway. Bandwidth is not the limiting factor in these calculations; I think that has been shown if you take the time to read through the posts here. It comes down to how heavily loaded your CPU is - I see about an 8% increase in speed on my GPU units simply by freeing up CPU time to feed the GPU. Interesting nonetheless.
Traveling through space at ~67,000mph!
hbomber
Volunteer tester

Joined: 2 May 01
Posts: 437
Credit: 50,852,854
RAC: 0
Bulgaria
Message 1067411 - Posted: 16 Jan 2011, 20:16:17 UTC
Last modified: 16 Jan 2011, 20:28:58 UTC

The lack of difference in the first three results shows that the two units are identical, at least for testing purposes.
The second result shows that even the presence of another PCIe device affects crunching times at the fastest possible speed (whether that is related to PCIe speed or not, it must be noted). The third one shows that driving the desktop does affect performance, and by how much.

And, as I said, these two cards are far from saturating the bus, but there is already a difference, and it doesn't act alone. If you neglect several factors that each give you a small performance increase, you end up with a significant combined performance loss. If you consider these percentages insignificant - and I'm writing this for the second time - don't assume others think the same way. My tests are for those interested in having every percent under their belt, as Sutaru is.

You keep giving irrelevant examples (the way an SSD works, and how the driver stack, cache managers, I/O queues etc. handle it, is very different from GPUs) and talking about perfect-world performance - what I've measured with CUDA-Z shows that only about 60% of the bandwidth is actually available. If I were you, I wouldn't play with numbers that loosely.
We haven't seen any real numbers from you, IIRC.
-BeNt-
Joined: 17 Oct 99
Posts: 1234
Credit: 10,116,112
RAC: 0
United States
Message 1067429 - Posted: 16 Jan 2011, 20:56:05 UTC - in response to Message 1067411.  
Last modified: 16 Jan 2011, 21:10:35 UTC

The lack of difference in the first three results shows that the two units are identical, at least for testing purposes.
The second result shows that even the presence of another PCIe device affects crunching times at the fastest possible speed (whether that is related to PCIe speed or not, it must be noted). The third one shows that driving the desktop does affect performance, and by how much.

And, as I said, these two cards are far from saturating the bus, but there is already a difference, and it doesn't act alone. If you neglect several factors that each give you a small performance increase, you end up with a significant combined performance loss. If you consider these percentages insignificant - and I'm writing this for the second time - don't assume others think the same way. My tests are for those interested in having every percent under their belt, as Sutaru is.

You keep giving irrelevant examples (the way an SSD works, and how the driver stack, cache managers, I/O queues etc. handle it, is very different from GPUs)


Ah yeah, I see what you are saying about the workunits; I failed to recognize that until you mentioned it. However, a 1-3% difference can be made up by other components, namely the CPU, or maybe the bus from the CPU, etc.

Irrelevant examples? WTF dude, those are SSDs that connect INTO the PCIe bus - transferring data over the same bus is irrelevant? You do realize that the same stuff that operates an SSD's memory is the same technology that operates all the memory on a video card. The background particulars of how it works and why it works don't matter; what matters was showing saturation on the bus, and two of the fastest transfer devices around can't do it. That's the point that was being made.

and talking about perfect-world performance - what I've measured with CUDA-Z shows that only about 60% of the bandwidth is actually available. If I were you, I wouldn't play with numbers that loosely.
We haven't seen any real numbers from you, IIRC.


Really... insults? I have yet to see any real numbers from you either, for all I know. CUDA-Z shows 1073.64 MB/s just sitting on the desktop. So oh no, I guess I'm only getting about 1/8th of the speed of my x16! Of course this is while running 2 GPU WUs plus 4 CPU tasks, watching an HD movie on two monitors, with the second one running BoincTasks, and I have 10 fingers that typed this... I show the exact same differences from work unit to work unit with no change in bus speed or CPU speed. You want numbers? As always, check my stats; they are open. So if I were you, I would make sure your numbers show something more significant than the 1-3% loss your CPU and chipset cause when feeding your GPU.

I hate getting personal, but damn dude, I quote and link actual facts - so have other people - and I've used practical examples and logical figures, even using YOUR numbers. And you still fail to see what is being talked about. I'm done, because obviously you don't get it. But then again, I suppose everyone who has posted their numbers and experience is simply dumb and ignorant, and you just think you know it all. Either way, if this discussion can't stay a discussion instead of turning into an attack, I'm done with it, as what I've been talking about has already been backed up by links and by the experience of others. You are the only one reporting what you are reporting... but anyways.

Even better, to quote Todd Hebert:

I stand corrected - I didn't think that you could install a 295 in a x4 connected slot. However our application is compact and would run within the gpu/frame buffer and would not be the same as say a game with a high transfer of textures - you would see a performance hit there for sure.

Todd


From this thread, but I guess the guy who holds the record doesn't know what he's talking about either. And Todd does know his stuff. He gets it.
Traveling through space at ~67,000mph!
hbomber
Volunteer tester

Joined: 2 May 01
Posts: 437
Credit: 50,852,854
RAC: 0
Bulgaria
Message 1067436 - Posted: 16 Jan 2011, 21:25:20 UTC - in response to Message 1067429.  
Last modified: 16 Jan 2011, 21:41:24 UTC

Yeah, anyway. I provide the numbers, you provide the theory.
All the readers can pick whatever they think is relevant.
The weak part is that GENERAL theory explains the world in GENERAL categories and approximate quantities and probabilities.

An example:
If I hadn't tested whether running 3 units on a 470 is faster than running one, I would have been misled into thinking it's true in every case, just from reading the GENERAL statements widely spread around the forum. And it's not true on XP. That wasn't mentioned anywhere, so far.

It's a simple test, I know, but measuring PCIe performance on actual hardware is not 10 minutes of work. So quit using GENERAL words to describe measurable values. "How much does this car cost?" "Well, around $50K, not sure..." "Here you are, $55K..." You wouldn't do that, right?
So much for general statements. Get down to numbers, put them on the table, and let the readers decide whether they are good enough for them. Offering your opinion without giving them a basis for comparison and their own assessment is misleading.

I can show at least 4 cases from last week where you wrote absolutely wrong statements, which can easily be proven as such. That excludes the current discussion. Wanna list them here?

Leave Todd alone; if he has something to say, he can say it himself, providing some other information that proves me wrong. Don't drag him into a potentially unwanted discussion.

And, "amazingly", I'm more impressed by the work done by Reimster, Jason or Fred than by anyone holding any record in calculations. (Sorry to say it, but it was brought into the subject.)
-BeNt-
Joined: 17 Oct 99
Posts: 1234
Credit: 10,116,112
RAC: 0
United States
Message 1067441 - Posted: 16 Jan 2011, 21:47:05 UTC - in response to Message 1067436.  
Last modified: 16 Jan 2011, 21:58:28 UTC


.......let the readers decide whether they are good enough for them. Offering your opinion without giving them a basis for comparison and their own assessment is misleading.


Posting links to sites that have done the testing, and to the people who wrote the standards, is far from personal assessment. I think you should read the thread, as most people posting links and comments have not seen eye to eye with you.


I can show at least 4 cases from last week where you wrote absolutely wrong statements, which can easily be proven as such. That excludes the current discussion. Wanna list them here?


Oh really? You can post 4 cases where I was wrong, was corrected, and didn't acknowledge that I was wrong? Wow dude, sinking lower and lower. YAR MATIES! Shiver me timbers, we got a real man ready to fight!

I've never said I was 100% right; everyone makes mistakes. One example would be a thread where I said something to the effect of "1155 CPUs fit in 1156 motherboards, if I'm not mistaken", and I was wrong, and somebody linked me. I then acknowledged it and realized it was the heatsinks that carried over; I had simply lost track of the facts. But the difference between those conversations and these is that those people have a thing called tact - obviously something you are missing. However, I do find it funny that you are keeping track of my posts, and it's nice to know you're learning. Hard to break a bad habit, huh?

Your numbers were interesting and I applaud you for your efforts. But the simple matter of fact is that a 1-3% difference in work unit times could come from anything, because guess what, computers don't run 100% identical to one another all the time *GASP*. The bottom line is you can skew these numbers to match your agenda. I could say the 1-3% here amounts to only a 9-34 second difference in completion times - oh my, how earth-shattering! Or you could say that over the course of a year you'd save... you get the point. Yes, there appears to be a difference somewhere on your machine in a x1 vs x4 slot comparison, using a video card on an undoubtedly outdated, bottlenecked setup.

From the number 1 man in the show.........


I stand corrected - I didn't think that you could install a 295 in a x4 connected slot. However our application is compact and would run within the gpu/frame buffer and would not be the same as say a game with a high transfer of textures - you would see a performance hit there for sure.

Todd


I'm not bringing Todd into this discussion, as you put it; I'm just quoting his opinion, which I agree with, because obviously I don't know what I'm talking about. Just because you don't agree with someone doesn't automatically make them wrong. I don't agree with you at all, but this all started again because I showed, using YOUR numbers, that there were no real differences. What's the point of providing more numbers when we already have numbers, that you provided, that agree with me, and you don't like or understand your own results? Sad state of affairs, gentlemen.

As I said, this is my last post; conversations that turn into arguments, especially without facts (i.e. you), are no fun - it's like breaking glass bare-handed. Give me a sample bigger than your outdated, bottlenecked equipment showing some real results and we can talk. I know what my machines do, and they are showing the same numbers as yours. My dual-card machine crunches just as fast as it always has, with the video cards going from x16 to x8/x8. Sorry, it's just fact. Besides, how would you explain being able to finish work faster if you overclock the cards... hmm...
Traveling through space at ~67,000mph!