i7 Hyperthreading + GPU + Most Efficient Seti Workload

Message boards : Number crunching : i7 Hyperthreading + GPU + Most Efficient Seti Workload

Profile ausymark

Send message
Joined: 9 Aug 99
Posts: 95
Credit: 10,175,128
RAC: 0
Australia
Message 1148117 - Posted: 2 Sep 2011, 4:24:34 UTC
Last modified: 2 Sep 2011, 5:08:11 UTC

I'm starting an interesting experiment with Seti on my overclocked, hyperthreading Intel Sandy Bridge i7 2600K with an nVidia 580 GPU under Mandriva 2011 Linux.

Firstly, I will say that this processor leaves my older Core 2 Duo in the dust!

My goal is to have maximum throughput on the CPU and GPU without dramatically affecting everyday use of the PC.

So the Intel i7 2600K 3.4GHz is an overclockable CPU - I have mine mildly overclocked to 4.6GHz using an upgraded air cooler/heatsink. So this should make for decent performance.

The i7 chip has two special features, namely Turbo Boost and Hyperthreading.
Essentially, Turbo Boost means that for short bursts the CPU can run at a higher clock speed, i.e. from 3.4GHz to 3.8GHz in standard form, thus improving short-term performance. How much it will boost a long-term load like Seti will, I think, depend in part on how much cooling the CPU receives.

Hyperthreading means that to the operating system each CPU core looks like two CPU cores! So the question here is can a core have better throughput if it is 'overloaded' with up to two tasks compared to running a single seti task? Keep in mind that there is the possibility that parts of the Seti processing task may not use the CPU as much, allowing the full use of it by a second seti task on an overloaded core.

Then there is the third part of the equation - the nVidia 580 GPU - it still requires some CPU processing to send work to/from it. If I overload the CPU (For example by running 8 instances of Seti on it) will it 'starve' the Seti GPU/CPU tasks so the video card doesn't process as much as it can?

From previous tests, I won't be running more than 2 instances of Seti on my GPU; this ensures I have enough video RAM for working, playing games etc.

When I originally set up this PC I ran just one Astropulse task as I built the system, and noticed that the work unit completed in just 4 hours (that gave me a shock compared to the usual 14 to 16 hours!). I watched as the task 'bounced' between hyperthreads and cores, each one Turbo Boosting as it did.

So the experiment becomes do I run 3, 4, 6 or 8 Seti tasks to get maximum combined Seti throughput?

I plan to run the experiment like so:

Run 3 Seti CPU instances alongside 2 GPU instances and wait until I have a solid performance level established (it's still climbing at the moment - currently around 2600 per day).

Then, once it's held at the new level for 2 weeks, I increase the Seti CPU tasks to 4 and see what level that holds steady at for 2 weeks.

Then I will increase it to 6 Seti CPU tasks and see where that holds steady.

Then I will increase it to 8 Seti CPU tasks and see where that holds steady.
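
For anyone wanting to follow the same steps, one way to cap the number of CPU tasks is BOINC's "use at most N% of the processors" computing preference. Below is a minimal sketch of the arithmetic, assuming the 8 logical CPUs an i7 2600K presents with hyperthreading on. The helper name is made up for illustration, and how the client rounds fractional results is not covered here.

```python
LOGICAL_CPUS = 8  # assumption: 4 cores x 2 hyperthreads on the i7 2600K


def cpu_percent_for(tasks, logical_cpus=LOGICAL_CPUS):
    """Percentage of logical CPUs needed to allow `tasks` concurrent CPU tasks."""
    if not 1 <= tasks <= logical_cpus:
        raise ValueError("tasks must be between 1 and the number of logical CPUs")
    return 100.0 * tasks / logical_cpus


# The four steps of the plan above.
for tasks in (3, 4, 6, 8):
    print(f"{tasks} CPU tasks -> {cpu_percent_for(tasks):.1f}% of processors")
```

The GPU tasks still need some CPU time on top of this, which is exactly the interaction the experiment is trying to measure.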

I am guessing this process will take between 2 and 4 months. Luckily my 'on the computer' time is much the same from day to day, so that will reduce the margin for error on that side.

So without further ado, let the experiment begin :)

Cheers

Mark
ID: 1148117 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1148118 - Posted: 2 Sep 2011, 4:37:53 UTC
Last modified: 2 Sep 2011, 4:38:31 UTC

In the listed form this experiment can easily take longer.
Even if your own uptime is steady, we can't say the same about the project servers.
Moreover, there are task fluctuations and weird credit-awarding behavior per se.
In general, RAC currently gives a _very_ rough estimate of host performance.
And you want to catch the quite fine effects of hyperthreading.

That is, you can obtain better results if you log all non-overflowed tasks, separate them by AR (for MB) and by blanking % (for AP), compute the mean and standard deviation for each fairly small AR bin, and then compare these "performance vs AR" results across different host configs.

For good statistics it can still take a few months, but afterwards you will be able to present much more confident results than RAC-based ones.
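
A rough sketch of the per-bin statistics described above, for illustration only. The function name, the 0.05 bin width and the sample log entries are all assumptions, and the log is assumed to already exclude overflowed tasks.

```python
from collections import defaultdict
from statistics import mean, stdev


def summarize_by_ar(tasks, bin_width=0.05):
    """Group (angle_range, run_seconds) pairs into AR bins and summarize each bin.

    Returns {bin_index: (count, mean_seconds, stdev_seconds)}; stdev is None
    for bins holding a single task. Bin i covers [i*bin_width, (i+1)*bin_width).
    """
    bins = defaultdict(list)
    for angle_range, run_seconds in tasks:
        bins[int(angle_range // bin_width)].append(run_seconds)
    return {
        index: (len(times), mean(times), stdev(times) if len(times) > 1 else None)
        for index, times in sorted(bins.items())
    }


# Hypothetical log entries (AR, run time in seconds), not real measurements.
config_a = [(0.41, 12000), (0.42, 12400), (0.44, 11800), (1.11, 9000), (1.13, 9200)]

for index, (count, avg, spread) in summarize_by_ar(config_a).items():
    print(f"bin {index}: n={count}, mean={avg:.0f}s, stdev={spread}")
```

Running the same summary on the log from each host config and comparing the matching bins gives the "performance vs AR" comparison.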
ID: 1148118 · Report as offensive
kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1148124 - Posted: 2 Sep 2011, 5:21:23 UTC

Good luck with your experiments, and look forward to any insights you post as you go.

Meow!
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1148124 · Report as offensive
Lee Gresham
Avatar

Send message
Joined: 12 Aug 03
Posts: 159
Credit: 130,116,228
RAC: 0
United States
Message 1148227 - Posted: 2 Sep 2011, 14:47:04 UTC - in response to Message 1148117.  

I built an i7 960 system in July 2010 with a Tesla card and plenty of RAM. I run all 8 processor threads at 100% along with the Tesla. It's running Lunatics Seti enhanced. I've seen no conflicts or resource problems, and other programs load quickly when needed. CPU tasks take around 3 hours & 20 minutes and CUDA tasks take about 15 minutes each. I've tried disabling HT and running the 4 cores, saw a decrease in RAC, and went back to HT & all 8. If you have enough memory, turn it loose!
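
To put a number on the hyperthreading comparison, here is a rough sketch using the figures above (8 CPU tasks at once, about 3 hours 20 minutes each). The break-even figure is derived arithmetic, not a measurement from this machine, and the function name is just for illustration.

```python
def tasks_per_hour(concurrent_tasks, minutes_per_task):
    """Steady-state throughput when `concurrent_tasks` run side by side."""
    return concurrent_tasks * 60.0 / minutes_per_task


HT_ON_MINUTES = 3 * 60 + 20  # about 3 h 20 min per CPU task with 8 threads busy

ht_on = tasks_per_hour(8, HT_ON_MINUTES)

# With HT off only 4 tasks run at once, so each one would have to finish in
# half the time just to match the 8-thread throughput.
break_even_minutes = 4 * 60.0 / ht_on

print(f"HT on : {ht_on:.2f} CPU tasks/hour")
print(f"HT off: break-even at {break_even_minutes:.0f} minutes per task")
```

A drop in RAC with HT disabled would be consistent with the 4-core run times staying above that break-even point.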
Delta-V
ID: 1148227 · Report as offensive
Profile HAL9000
Volunteer tester
Avatar

Send message
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1148231 - Posted: 2 Sep 2011, 14:52:21 UTC - in response to Message 1148118.  

In the listed form this experiment can easily take longer.
Even if your own uptime is steady, we can't say the same about the project servers.
Moreover, there are task fluctuations and weird credit-awarding behavior per se.
In general, RAC currently gives a _very_ rough estimate of host performance.
And you want to catch the quite fine effects of hyperthreading.

That is, you can obtain better results if you log all non-overflowed tasks, separate them by AR (for MB) and by blanking % (for AP), compute the mean and standard deviation for each fairly small AR bin, and then compare these "performance vs AR" results across different host configs.

For good statistics it can still take a few months, but afterwards you will be able to present much more confident results than RAC-based ones.

Do you use a tool to log results on machines in some way? Often I wish I had something so I could log the results over time and see how different configurations work out on different machines. I think it was SetiSpy that I used to use in the classic days that would do this for me. It would make a neat little graph of AR & processing time as well, but if I had something logging the data I could easily make excel or something generate a graph for me.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group: http://tinyurl.com/8y46zvu
ID: 1148231 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1148247 - Posted: 2 Sep 2011, 15:35:11 UTC - in response to Message 1148231.  

I wrote a Perl script for that a long time ago...
ID: 1148247 · Report as offensive
Profile ausymark

Send message
Joined: 9 Aug 99
Posts: 95
Credit: 10,175,128
RAC: 0
Australia
Message 1148866 - Posted: 4 Sep 2011, 12:39:03 UTC - in response to Message 1148227.  

Hi Lee

Thanks for your feedback. I too haven't had any problems running 8 Seti tasks and a couple via CUDA on the 580. However, I am not certain that overloading the cores like that results in optimal performance. Perhaps it does, but maybe not. That's why I am doing the experiment.

It's also why I am waiting for the RAC result to be fairly stable for 2 weeks, so that I can reduce the effect of server, work unit, availability etc. fluctuations. I'm working on the theory that the CUDA card is processing enough work units of various sizes that any large variances will be swallowed into the whole. (It chops through around 8 CUDA work units an hour, around 100 per day, so 1,400 over a 2 week period.)
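
As a back-of-the-envelope check on the "swallowed into the whole" idea, the sketch below estimates how much random per-task variation is left in an average over 1,400 tasks. The 0.5 spread is a made-up illustrative value, and the calculation assumes the tasks vary independently, so it says nothing about server outages or credit-awarding quirks that hit many tasks at once.

```python
import math


def relative_standard_error(coefficient_of_variation, sample_count):
    """Relative standard error of a mean over `sample_count` independent samples."""
    return coefficient_of_variation / math.sqrt(sample_count)


TASKS_IN_TWO_WEEKS = 1400  # figure quoted above
ASSUMED_CV = 0.5           # hypothetical per-task spread (stdev / mean)

rse = relative_standard_error(ASSUMED_CV, TASKS_IN_TWO_WEEKS)
print(f"relative standard error: {rse:.1%}")
```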

I am using a similar rationale with the CPU Seti/Astropulse tasks. It's not totally scientific, as I am not breaking it down like Raistmer suggests (I don't have the time to analyse to that degree).

I also noticed that freeing up one physical core did seem to speed up CUDA processing. That may have been related to smaller task sizes as well, but I don't know - hence another reason for the experiment.

Who knows, I may find that there is little difference between running 5 and 8 CPU-bound Seti tasks. Or perhaps the increase is marginal - say 10%. But I don't know. So I am starting with 3 CPU tasks and ramping up when that 'flat lines'. Currently I am about to tickle the 6000 RAC point (six times better than my old PC *laughs*), and it's still accelerating. My guess is that it will flat line at around the 14,000 RAC point at least. (Well I hope so anyway lol)

Anyway, pop back here from time to time and hopefully I will have some actual measurements. Then once we find the best two settings (assuming they aren't close together setup-wise) I can do a battle between those and see which test result holds true over a longer period of time.

Anyway this should be fun :-)

Thanks all for the interest :)

Cheers

Mark
ID: 1148866 · Report as offensive
Brkovip
Avatar

Send message
Joined: 18 May 99
Posts: 274
Credit: 144,414,367
RAC: 0
United States
Message 1148876 - Posted: 4 Sep 2011, 13:32:08 UTC

My i7 system, running from 3.8 to 4.2GHz with 2x GTX480 (overclocked) running 3 tasks each, would hit 60K RAC before the 280 series of drivers. I would think your system, once optimized, should hit at least 25K RAC.

I am not sure what mine will do with the 280 drivers (running 280.26 now), since it hasn't been cool enough for long enough to run the system without fighting the AC in the house. There have also been a lot of issues getting enough tasks to run a good test.
ID: 1148876 · Report as offensive
Profile ausymark

Send message
Joined: 9 Aug 99
Posts: 95
Credit: 10,175,128
RAC: 0
Australia
Message 1150236 - Posted: 8 Sep 2011, 22:37:15 UTC

Progress Update

So far the average RAC is ramping upward and about to hit the 10,000 point. It still may take another week to level off as it appears that the PC is processing around 20K work units per day, give or take 2K.

It really feels weird to be processing so much - that's the sort of figure I would normally generate on the old PC over 2 weeks, or more!

Cheers

Mark
ID: 1150236 · Report as offensive
Josef W. Segur
Volunteer developer
Volunteer tester

Send message
Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1150255 - Posted: 9 Sep 2011, 0:02:08 UTC - in response to Message 1150236.  

Progress Update

So far the average RAC is ramping upward and about to hit the 10,000 point. It still may take another week to level off as it appears that the PC is processing around 20K work units per day, give or take 2K.

It really feels weird to be processing so much - that's the sort of figure I would normally generate on the old PC over 2 weeks, or more!

Cheers

Mark

Hmm, you started the experiment about September 2 so six days so far? The averaging mechanism for RAC is exponential with a design "half-life" of exactly 7 days, meaning whatever move it's making gets half its change in the first week, half the remaining change in the second week, etc. Without even considering the effect of waiting for validation, the levelling off will take about 3 more weeks. At that point it should be within the perhaps +/- 10% band caused by the simple fact that the AR related crunch time variance is not precisely reflected by the estimates which the splitters produce.

Given that your starting post indicated RAC 2600, the change so far is about 7400. Certainly that's not out of line for an eventual 20000 RAC, but the kind of precision you're going to need to draw a reliable conclusion about the best mix suggests your original idea of two weeks of observation after it's more or less stabilized will be necessary. So that's about 6 weeks in total for each test...
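
The half-life reasoning above can be sketched as a small model. It is a simplification under stated assumptions: credit is granted at a constant rate from day zero, there is no validation delay, and the function names are only for illustration.

```python
import math

HALF_LIFE_DAYS = 7.0  # design half-life of RAC, as described above


def remaining_fraction(days):
    """Fraction of the gap to the steady-state RAC still left after `days`."""
    return 0.5 ** (days / HALF_LIFE_DAYS)


def implied_steady_state(start_rac, current_rac, days):
    """Steady-state RAC implied by two readings `days` apart (no validation lag)."""
    left = remaining_fraction(days)
    return (current_rac - start_rac * left) / (1.0 - left)


def days_until_within(fraction):
    """Days until the remaining gap shrinks to `fraction` of its starting size."""
    return HALF_LIFE_DAYS * math.log2(1.0 / fraction)


target = implied_steady_state(start_rac=2600, current_rac=10000, days=6)
print(f"implied steady-state RAC: {target:.0f}")
print(f"days until 1/16 of the gap remains: {days_until_within(1 / 16):.0f}")
```

With the readings from this thread (2600 at the start, about 10,000 six days later) the model lands a little above 19,000, which fits the "not out of line for an eventual 20000" estimate, and four half-lives leave about 6% of the original gap.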

Just to be clear, the machine isn't processing "20K work units per day, give or take 2K". Those are credits, not WUs. I'm sure you know the difference, but some readers of the thread might not.
                                                                 Joe
ID: 1150255 · Report as offensive
Profile ausymark

Send message
Joined: 9 Aug 99
Posts: 95
Credit: 10,175,128
RAC: 0
Australia
Message 1150293 - Posted: 9 Sep 2011, 2:32:45 UTC - in response to Message 1150255.  

Hi Josef,

Yes, you are right, and I did mistype work units instead of credits - that's what doing a report while still waking up and eating breakfast will do lol.

I expect the RAC to start flattening out after the next 7 to 14 days. I also now expect any changes that appear as I change configurations to take between 2 and 4 weeks to be reflected in the RAC and then at least 2 weeks to stabilise after that point.

Ain't experiments fun? :)

Cheers

Mark
ID: 1150293 · Report as offensive
Profile ausymark

Send message
Joined: 9 Aug 99
Posts: 95
Credit: 10,175,128
RAC: 0
Australia
Message 1152714 - Posted: 16 Sep 2011, 11:37:30 UTC - in response to Message 1150293.  

Update 16/9/11

Current RAC now up to 12,500 and still climbing. It may take a bit longer than another week (probably two) to level off.

The experiment continues :-)

Cheers

Mark
ID: 1152714 · Report as offensive
Profile ausymark

Send message
Joined: 9 Aug 99
Posts: 95
Credit: 10,175,128
RAC: 0
Australia
Message 1154373 - Posted: 21 Sep 2011, 6:10:35 UTC - in response to Message 1152714.  
Last modified: 21 Sep 2011, 6:11:20 UTC

Update - Mysterious PC crashes

OK, this is an odd one that I didn't expect. My 5 week old PC is starting to crash (full reset), but it only appears to happen if the Seti CUDA client is running, though nothing has changed on that front.

I decided to install the latest nvidia drivers - that didn't help.

The nvidia 580 is not hot (hovering around 58C to 61C).

So at this point I am not sure what's going on... can work units cause CUDA to freak out and cause a crash, I wonder?

hmmmm
ID: 1154373 · Report as offensive
LadyL
Volunteer tester
Avatar

Send message
Joined: 14 Sep 11
Posts: 1679
Credit: 5,230,097
RAC: 0
Message 1154444 - Posted: 21 Sep 2011, 11:18:51 UTC - in response to Message 1154373.  

Is there a purpose to running (outdated) x34?
If not, may I suggest updating to x38g (via installer)? Not much use trying to hunt errors in something that was in alpha half a year ago.

Is the GPU (factory) overclocked? I don't know when the code started to push too hard. (i.e. beyond what is used to determine safe overclock). In that case it may help to go down a notch or two.
ID: 1154444 · Report as offensive
Wembley
Volunteer tester
Avatar

Send message
Joined: 16 Sep 09
Posts: 429
Credit: 1,844,293
RAC: 0
United States
Message 1154459 - Posted: 21 Sep 2011, 12:25:43 UTC - in response to Message 1154373.  

Make sure your PSU can handle the requirements of a GPU running full load.

ID: 1154459 · Report as offensive
Claggy
Volunteer tester

Send message
Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1154468 - Posted: 21 Sep 2011, 12:50:47 UTC - in response to Message 1154444.  
Last modified: 21 Sep 2011, 12:57:27 UTC

Is there a purpose to running (outdated) x34?
If not, may I suggest updating to x38g (via installer)? Not much use trying to hunt errors in something that was in alpha half a year ago.

Is the GPU (factory) overclocked? I don't know when the code started to push too hard. (i.e. beyond what is used to determine safe overclock). In that case it may help to go down a notch or two.


That host is running Linux 2.6.38.7-desktop-1mnb2, so saying run the Installer and install x38g is nonsense. Maybe he can get a more up-to-date Linux Alpha app from the originator of the app.

Claggy
ID: 1154468 · Report as offensive
LadyL
Volunteer tester
Avatar

Send message
Joined: 14 Sep 11
Posts: 1679
Credit: 5,230,097
RAC: 0
Message 1154471 - Posted: 21 Sep 2011, 13:04:54 UTC - in response to Message 1154468.  
Last modified: 21 Sep 2011, 13:05:44 UTC

Is there a purpose to running (outdated) x34?
If not, may I suggest updating to x38g (via installer)? Not much use trying to hunt errors in something that was in alpha half a year ago.

Is the GPU (factory) overclocked? I don't know when the code started to push too hard. (i.e. beyond what is used to determine safe overclock). In that case it may help to go down a notch or two.


That host is running Linux 2.6.38.7-desktop-1mnb2, so saying run the Installer and install x38g is nonsense. Maybe he can get a more up-to-date Linux Alpha app from the originator.

Claggy


Oops. That is indeed a reason. Thanks Claggy, I failed to spot that when I checked what was running - stderr looks almost the same as in the Windows version, being essentially from the same source.

Point about the overclocking remains though - I don't think that would be a matter of Linux/Windows OS, even if drivers behave slightly differently.

Would only be a slightly newer version atm, but might help.
ID: 1154471 · Report as offensive
Cruncher-American Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor

Send message
Joined: 25 Mar 02
Posts: 1513
Credit: 370,893,186
RAC: 340
United States
Message 1154487 - Posted: 21 Sep 2011, 14:10:47 UTC - in response to Message 1150293.  

Don't know if this helps, as I run dual quadcore AMDs and quad GT 240s on my two crunchers, but check the CPU time vs. elapsed time on your CPU WUs. I found that running on only 7 cores raised the percent of CPU time/elapsed time from 70-75% to 85-90%, so the net work done was about the same.

Perhaps running the 4 GT 240s interferes with CPU execution when running CPU on all cores?
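
A quick sketch of why the net work comes out about the same, using the percentage ranges quoted above. "Effective cores" here is just cores in use multiplied by the CPU time / elapsed time ratio; the name is only for illustration.

```python
def effective_cores(cores_in_use, cpu_to_elapsed_ratio):
    """CPU-seconds actually delivered per wall-clock second."""
    return cores_in_use * cpu_to_elapsed_ratio


# Ranges quoted above: 70-75% with 8 cores busy, 85-90% with 7.
eight = (effective_cores(8, 0.70), effective_cores(8, 0.75))
seven = (effective_cores(7, 0.85), effective_cores(7, 0.90))

print(f"8 cores: {eight[0]:.2f} to {eight[1]:.2f} effective cores")
print(f"7 cores: {seven[0]:.2f} to {seven[1]:.2f} effective cores")
```

The two ranges overlap, so on these numbers neither setting is a clear winner for CPU work alone.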
ID: 1154487 · Report as offensive
Profile ausymark

Send message
Joined: 9 Aug 99
Posts: 95
Credit: 10,175,128
RAC: 0
Australia
Message 1154713 - Posted: 21 Sep 2011, 22:54:17 UTC - in response to Message 1154471.  

Update:

The graphics card is a stock Gigabyte GTX 580. It is running at its defaults, no overclocking. It's never gone beyond 63C, and that's with the fan just ticking over at 55%.

The power supply is a Seasonic Gold + 650W. It's claimed to be 91% efficient at 50% load. I have an external power meter measuring the system's power use, which under full SETI load on CPU & GPU is just 350W. So let's assume the power supply is only 80% efficient; that means it's got 520W to play with. So unless the power supply is faulty, the system's demands on it shouldn't be taxing it in any way.
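
Another way to run the same numbers, as a hedged sketch: it assumes the 650W label refers to DC output (the usual convention for ATX supplies), in which case the efficiency figure applies to the 350W measured at the wall rather than to the rating. The names and the two efficiency values are for illustration; the conclusion about headroom comes out the same.

```python
RATED_OUTPUT_W = 650.0  # PSU label rating, assumed here to mean DC output
WALL_DRAW_W = 350.0     # reading from the external power meter under full load


def dc_load(wall_watts, efficiency):
    """DC power delivered to the components for a given wall draw and efficiency."""
    return wall_watts * efficiency


for efficiency in (0.80, 0.91):
    load = dc_load(WALL_DRAW_W, efficiency)
    share = load / RATED_OUTPUT_W
    print(f"efficiency {efficiency:.0%}: DC load {load:.0f} W, {share:.0%} of rating")
```

Either way the average load sits at roughly half the rating or less. This does not cover how the load is split across the 12V rails or short spikes that a wall meter would not show.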

I have run Memtest86 on the PC for a few hours - no problems found. (Still makes me wonder if video card RAM may be an issue.)

I have just let the PC run overnight with 8 Seti tasks/threads running - CPU temperature didn't go above 52C and, just as importantly, the PC didn't reboot. So it's definitely a GPU issue.

I was running a more recent seti/CUDA beta for the last 3 weeks, but dropped back to a known stable release to see if that made a difference - but it didn't.

I may try on the weekend to reduce the factory clock on the GPU. But for now I will have to turn the GPU processing off as this is also a work machine and having it reboot every 40 min is both annoying, and could lead to loss of the data I am working on.

It is frustrating however as the system has run fine for 3 weeks with the same combination that is now seeing it crash .... odd odd odd.

Cheers

Mark
ID: 1154713 · Report as offensive
Profile Fred J. Verster
Volunteer tester
Avatar

Send message
Joined: 21 Apr 04
Posts: 3252
Credit: 31,903,643
RAC: 0
Netherlands
Message 1154725 - Posted: 21 Sep 2011, 23:33:15 UTC - in response to Message 1154713.  

With such a system, the PSU can quickly become a problem. Instability is often a sign that the +12V rail(s), with 2 GTX580s and an i7 xxx, are overloaded or that the 12V voltage is dropping just too much.
Heat can also be an issue.

On my i7-2600 with 2 ATI 5870 GPUs, a 650 Watt PSU just stopped when 2 Milkyway WUs were started. (The system needs 480-550 Watt, according to a Kill-a-Watt.)
After changing it for a 1000 Watt unit with four 12V rails (2 at 18A and 2 at 15A), plus of course 5V, 3.3V, + and -, it works fine.
And numerous other faults.


ID: 1154725 · Report as offensive


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.