Bulldozer vs Vishera vs Sandy Bridge (small comparison)


Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 3291
Credit: 40,949,708
RAC: 59,887
Russia
Message 1331043 - Posted: 25 Jan 2013, 4:53:08 UTC - in response to Message 1330270.


Raistmer, if your Intel OpenCL application can run on Sandy Bridge too, I'm ready to test it. You may contact me by PM with details any time.

Unfortunately, no. Ivy Bridge is the first Intel "APU" with OpenCL support, AFAIK.
Sandy Bridge has a non-OpenCL GPU in it.

____________
News about SETI opt app releases: https://twitter.com/Raistmer

Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 3291
Credit: 40,949,708
RAC: 59,887
Russia
Message 1331044 - Posted: 25 Jan 2013, 4:55:32 UTC - in response to Message 1330270.
Last modified: 25 Jan 2013, 4:56:16 UTC

For example, take the run from the first post with 12 simultaneous tasks, where HT efficiency is 16.5%: when I ran it a second time, it gave me an even worse result than 2 sets of 6 tasks, something like "negative HT efficiency".

AstroPulse processing is very processor cache hungry. I think that kills HT advantages because of cache contention between processes.
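
A minimal sketch of how an "HT efficiency" figure like the one quoted above could be computed. The thread does not state the exact formula, so the definition used here (throughput of the 12-task HT run relative to the 6-task non-HT run, minus 1) is an assumption, and the timings are hypothetical placeholders rather than measurements from this thread.

```python
def ht_gain(tasks_no_ht, batch_hours_no_ht, tasks_ht, batch_hours_ht):
    """Relative throughput gain of the HT run over the non-HT run.

    Assumed definition (not taken from the thread):
    (tasks per hour with HT) / (tasks per hour without HT) - 1.
    A negative result means the HT run was slower overall.
    """
    throughput_no_ht = tasks_no_ht / batch_hours_no_ht
    throughput_ht = tasks_ht / batch_hours_ht
    return throughput_ht / throughput_no_ht - 1.0


# Hypothetical batch times in hours, for illustration only.
print(f"{ht_gain(6, 10.0, 12, 18.0):+.1%}")  # HT batch finishes in under 2x the time
print(f"{ht_gain(6, 10.0, 12, 21.0):+.1%}")  # HT batch takes more than 2x: negative gain
```

With this definition, 12 simultaneous tasks only pay off if the batch takes less than twice as long as a batch of 6, which is what cache contention can prevent.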
____________
News about SETI opt app releases: https://twitter.com/Raistmer

Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 3291
Credit: 40,949,708
RAC: 59,887
Russia
Message 1331045 - Posted: 25 Jan 2013, 4:58:38 UTC - in response to Message 1330203.
Last modified: 25 Jan 2013, 4:59:39 UTC

In that spirit, any chance of having Intel built-in GPUs to work for SETI? You are our only hope :)


Yes, already. But I need testers. So far it works only on my own Ivy Bridge :)


:) Nice! When the new Haswell processors come, I think the GPU inside the CPU can crunch something additional, like a bonus :D


Unfortunately, the Intel OpenCL driver exhibits the same behavior as the latest ATi and NV ones: GPU load drops considerably if the CPU is fully busy.
Still evaluating whether it is worth freeing 1 CPU core to run GPU AP or not.

P.S. I hope AP for Intel GPU will be available for beta testing soon; the beta test package is being prepared now.
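
A rough sketch of the "is it worth freeing a core" question as a throughput comparison. It assumes, for simplicity, that CPU and GPU tasks count equally and that CPU task time does not change when one core is freed; the function names and all numbers are hypothetical, not measurements from this thread.

```python
def tasks_per_day(runtime_hours, concurrent):
    """Tasks finished per day when `concurrent` tasks run in parallel."""
    return concurrent * 24.0 / runtime_hours


def free_core_pays_off(cpu_cores, cpu_task_hours, gpu_hours_busy_cpu, gpu_hours_free_core):
    """True if reserving one CPU core for the GPU app raises total tasks/day."""
    all_busy = (tasks_per_day(cpu_task_hours, cpu_cores)
                + tasks_per_day(gpu_hours_busy_cpu, 1))
    one_free = (tasks_per_day(cpu_task_hours, cpu_cores - 1)
                + tasks_per_day(gpu_hours_free_core, 1))
    return one_free > all_busy


# Hypothetical host: 4 cores, 10 h per CPU task,
# GPU task takes 2 h with a busy CPU and 1 h with one core kept free.
print(free_core_pays_off(4, 10.0, 2.0, 1.0))
```

If the GPU runtime barely changes when a core is freed, the comparison flips and keeping all cores busy wins.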
____________
News about SETI opt app releases: https://twitter.com/Raistmer

Profile Ex
Volunteer moderator
Volunteer tester
Joined: 12 Mar 12
Posts: 2892
Credit: 1,564,913
RAC: 1,265
United States
Message 1331046 - Posted: 25 Jan 2013, 5:00:42 UTC - in response to Message 1328357.
Last modified: 25 Jan 2013, 5:04:19 UTC

You should run it again using an AVX-optimized app on the Sandy Bridge. AVX is new with Sandy Bridge and appears to be killer: Intel® AVX is a new 256-bit instruction set extension to Intel® SSE, designed for applications that are floating point (FP) intensive. Look at the AstroPulse times from this client: All AstroPulse v6 tasks for computer 5643864. He appears to be running the Linux AVX app from here: AstroPulse for Linux. That's about 3.5 hours for a CPU AP. That is fast, about 3x as fast as the last Windows SSE2 CPU AP app from Lunatics, r557. I had never run r557, so I gave it a try after seeing those numbers; I'm disappointed it's so much slower than the AVX app. It almost makes me want to build a new machine that can use AVX. Or at least buy another AMD 6850 that my present machines can use.

Anyone know of a Windows SSE4 CPU AstroPulse App? I can at least run that on a Xeon. I'm envious of those AstroPulse CPU times.


AVX is killer. I'm running AVX Linux apps on my server's Xeon E3-1230 (Sandy Bridge). My RAC of 1700 doesn't seem like much, but it's only 25% use of only one of 4 cores. :-) (Note: setting the app to 25% usually results in 33% use.)

(Note my RAC is rising now because I bumped up my crunch time/cores earlier)
____________
-Dave #2


3.2.0-33

Mark Lybeck
Joined: 9 Aug 99
Posts: 209
Credit: 84,297,625
RAC: 112,770
Finland
Message 1333291 - Posted: 31 Jan 2013, 21:08:31 UTC - in response to Message 1331045.


Unfortunately, the Intel OpenCL driver exhibits the same behavior as the latest ATi and NV ones: GPU load drops considerably if the CPU is fully busy.
Still evaluating whether it is worth freeing 1 CPU core to run GPU AP or not.

P.S. I hope AP for Intel GPU will be available for beta testing soon; the beta test package is being prepared now.


Does not process priority take care of giving the GPU loader process enough CPU cycles to feed it?

____________

Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 3291
Credit: 40,949,708
RAC: 59,887
Russia
Message 1333354 - Posted: 31 Jan 2013, 23:26:31 UTC - in response to Message 1333291.


Unfortunately, the Intel OpenCL driver exhibits the same behavior as the latest ATi and NV ones: GPU load drops considerably if the CPU is fully busy.
Still evaluating whether it is worth freeing 1 CPU core to run GPU AP or not.

P.S. I hope AP for Intel GPU will be available for beta testing soon; the beta test package is being prepared now.


Does not process priority take care of giving the GPU loader process enough CPU cycles to feed it?


For discrete ATi GPUs the answer was "no". It looks like Windows is still not enough of an RTOS to strictly obey process priority settings.
Still to be tried with the Intel GPU.

BTW, the app is already available for testing. But take care, it looks like it is quite imprecise (at least on my host).

____________
News about SETI opt app releases: https://twitter.com/Raistmer

Tom
Joined: 12 Aug 11
Posts: 114
Credit: 4,566,097
RAC: 0
United States
Message 1333399 - Posted: 1 Feb 2013, 3:12:18 UTC
Last modified: 1 Feb 2013, 3:19:26 UTC

Does not process priority take care of giving the GPU loader process enough CPU cycles to feed it?


NO!

I have, IMHO, the BOBW (best of both worlds) right now, using Jason's CUDA 5 MBs with the stock 6.04 APs.

The stock 6.04 APs take 40 minutes on my GTX660 when I allocate 1 CPU per AP task, and 80 minutes when I have something else using that CPU.

I also have all AstroPulse and MB tasks running at elevated priority all the time. In all cases, using EfMer's Priority program makes no difference for the OpenCL jobs. All OpenCL jobs on my GTX660 need a full core to run the quickest.

Bill

PS Thanks, Claggy, for your app_config that allows a full core on my i5 whenever an AstroPulse runs, as I can use that core for Einstein or Test4Theory when I am not running an AstroPulse.

In other words, Jason's non-OpenCL MBs only need a priority boost to function optimally.
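
For readers wondering what an app_config that reserves a full core looks like: the sketch below generates one with Python. Claggy's actual file is not shown in this thread, so this is only an illustration under assumptions. The element names follow the BOINC app_config.xml format; the app name astropulse_v6 and the usage values are assumptions that should be checked against your own client's project files.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name, gpu_usage=1.0, cpu_usage=1.0):
    """Build an app_config.xml string that budgets `cpu_usage` CPUs per GPU task."""
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    gpu_versions = ET.SubElement(app, "gpu_versions")
    # gpu_usage: fraction of a GPU each task uses (1.0 = one task per GPU).
    ET.SubElement(gpu_versions, "gpu_usage").text = str(gpu_usage)
    # cpu_usage: CPUs budgeted per GPU task (1.0 = a full core per task).
    ET.SubElement(gpu_versions, "cpu_usage").text = str(cpu_usage)
    return ET.tostring(root, encoding="unicode")


# "astropulse_v6" is an assumed app name, used here for illustration.
print(build_app_config("astropulse_v6"))
```

The output would be saved as app_config.xml in the project's directory and picked up after the client re-reads its config files. With cpu_usage at 1.0, the client budgets a whole core only while such a GPU task is running, which matches the behavior described above.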

Mark Lybeck
Joined: 9 Aug 99
Posts: 209
Credit: 84,297,625
RAC: 112,770
Finland
Message 1333850 - Posted: 2 Feb 2013, 5:58:30 UTC - in response to Message 1333354.

BTW, the app is already available for testing. But take care, it looks like it is quite imprecise (at least on my host).


Where did you upload the app? on Lunatics?

____________

Mark Lybeck
Joined: 9 Aug 99
Posts: 209
Credit: 84,297,625
RAC: 112,770
Finland
Message 1333853 - Posted: 2 Feb 2013, 6:04:12 UTC - in response to Message 1333354.


Unfortunately, the Intel OpenCL driver exhibits the same behavior as the latest ATi and NV ones: GPU load drops considerably if the CPU is fully busy.
Still evaluating whether it is worth freeing 1 CPU core to run GPU AP or not.


For discrete ATi GPUs the answer was "no". It looks like Windows is still not enough of an RTOS to strictly obey process priority settings.
Still to be tried with the Intel GPU.


So if you have an Nvidia card, the performance should not be so different then? What I experienced is that the CPU tasks run at Low priority and the processes feeding the GPU run at Below Normal. The CPU tasks will always give way to the GPU apps. I have not found a significant speed difference on Intel CPU + Nvidia GPUs depending on whether the CPUs are maxed out.
____________

Profile Ex
Volunteer moderator
Volunteer tester
Joined: 12 Mar 12
Posts: 2892
Credit: 1,564,913
RAC: 1,265
United States
Message 1333868 - Posted: 2 Feb 2013, 6:32:15 UTC - in response to Message 1333853.


Unfortunately, the Intel OpenCL driver exhibits the same behavior as the latest ATi and NV ones: GPU load drops considerably if the CPU is fully busy.
Still evaluating whether it is worth freeing 1 CPU core to run GPU AP or not.


For discrete ATi GPUs the answer was "no". It looks like Windows is still not enough of an RTOS to strictly obey process priority settings.
Still to be tried with the Intel GPU.


So if you have an Nvidia card, the performance should not be so different then? What I experienced is that the CPU tasks run at Low priority and the processes feeding the GPU run at Below Normal. The CPU tasks will always give way to the GPU apps. I have not found a significant speed difference on Intel CPU + Nvidia GPUs depending on whether the CPUs are maxed out.

I am curious about this as well. If I could be convinced that I don't need a core free, I would fill all my cores and let BOINC fight itself to feed the GPU.
As it stands, my Nvidia card is not anything special, but I leave a core free anyway, even though it's more than likely unnecessary.
____________
-Dave #2


3.2.0-33

Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 3291
Credit: 40,949,708
RAC: 59,887
Russia
Message 1333875 - Posted: 2 Feb 2013, 7:57:23 UTC - in response to Message 1333850.

BTW, the app is already available for testing. But take care, it looks like it is quite imprecise (at least on my host).


Where did you upload the app? on Lunatics?

http://lunatics.kwsn.net/12-gpu-crunching/open-beta-for-intel-opencl-ap-application.msg51109.html;topicseen#msg51109
____________
News about SETI opt app releases: https://twitter.com/Raistmer

Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 3291
Credit: 40,949,708
RAC: 59,887
Russia
Message 1333877 - Posted: 2 Feb 2013, 8:10:23 UTC - in response to Message 1333853.


So if you have an Nvidia card, the performance should not be so different then? What I experienced is that the CPU tasks run at Low priority and the processes feeding the GPU run at Below Normal. The CPU tasks will always give way to the GPU apps. I have not found a significant speed difference on Intel CPU + Nvidia GPUs depending on whether the CPUs are maxed out.


The situation is quite complex.
1) CUDA and OpenCL differ. CUDA has native control over the synching mode; OpenCL doesn't. The few tricks proposed to make an OpenCL app switch its synching mode did not have any effect on performance.
2) Synching behavior differs between driver versions. Old enough drivers (OpenCL 1.0 support), up to 267.xx, don't need a free core. Performance is good with a fully busy CPU.
But after that, it looks like something important (allegedly the default synching mode) was changed. A fully busy CPU now can't provide adequate GPU load. Moreover (this I observed mostly on ATi GPUs), from time to time a whole task can complete almost as fast as with an idle CPU, but at other times task execution time increases SEVERAL-FOLD. It looks like the task occasionally gets stuck in its processing (with a fully loaded CPU). This never happens with a free CPU. On my ATi GPU (HD6950) I have statistics for a few hundred tasks (see the corresponding threads with the performance pictures I posted), so I am quite convinced that freeing a core is necessary. On NV GPUs I just use old drivers that don't need free cores (the "If It Ain't Broken, Don't Fix It" approach), so any NV observations about how many free cores are needed come from third parties. But yes, I once updated to more recent drivers and saw unstable processing on my own GPU (GTX260), so there is a reason for staying with the old drivers.
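
One simple way to spot the occasional "stuck" tasks described above in a list of task runtimes is to flag anything that took several times the median. This is only a sketch: the threshold factor and the sample runtimes are hypothetical, and it is not the method used to produce the statistics mentioned in the post.

```python
from statistics import median


def stuck_tasks(runtimes_minutes, factor=2.0):
    """Return the runtimes that exceed `factor` times the median runtime."""
    typical = median(runtimes_minutes)
    return [t for t in runtimes_minutes if t > factor * typical]


# Hypothetical runtimes in minutes with a fully loaded CPU.
busy_cpu_runs = [40, 42, 41, 130, 39]
print(stuck_tasks(busy_cpu_runs))
```

Running the same check on runtimes collected with one core kept free should return an empty list if the stalls really only occur with a fully loaded CPU.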
____________
News about SETI opt app releases: https://twitter.com/Raistmer

Copyright © 2014 University of California