GeForce 260 question

zpm
Volunteer tester
Joined: 25 Apr 08
Posts: 284
Credit: 1,659,024
RAC: 0
United States
Message 921811 - Posted: 27 Jul 2009, 23:49:13 UTC - in response to Message 921806.  

The only other thing that would top an i7 would be a dual i7 or a 5500-series processor setup, with a lot of GPUs..
ID: 921811
Westsail and *Pyxey*
Volunteer tester
Joined: 26 Jul 99
Posts: 338
Credit: 20,544,999
RAC: 0
United States
Message 921812 - Posted: 28 Jul 2009, 0:10:20 UTC

The problem I had was that boinc.exe takes nearly a whole core when a couple-day cache is run on the multi-295 etc. rigs.
While getting the flops and cache settings etc. dialed in, I once had nearly 1000 tasks in cache. boinc.exe used so much CPU that the manager was unable to function and the computer lagged hard. (What is the default process priority for boinc.exe?) 'Responsiveness' is at 95% now; it would sit at 5-20% with boinc.exe pegged at more than ~800 tasks.
I have since got it to fetch only about 200-400 tasks at a time, and this makes everything work... well... like a computer lol
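For anyone curious about that priority question, below is a minimal Python sketch (purely illustrative, not part of BOINC) that reads the priority class Windows has assigned to boinc.exe. It assumes the third-party psutil package, and the priority-class values it reports are Windows-only.
[code]
import psutil  # third-party package: pip install psutil

# Find the BOINC core client and print the priority class Windows gave it
# (a default install normally reports psutil.NORMAL_PRIORITY_CLASS).
for proc in psutil.process_iter(["name"]):
    if proc.info["name"] and proc.info["name"].lower() == "boinc.exe":
        try:
            print(proc.pid, proc.nice())  # on Windows, nice() is the priority class
        except psutil.Error:
            print(proc.pid, "access denied - try an elevated prompt")
[/code]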


"The most exciting phrase to hear in science, the one that heralds new discoveries, is not Eureka! (I found it!) but rather, 'hmm... that's funny...'" -- Isaac Asimov
ID: 921812
zpm
Volunteer tester
Joined: 25 Apr 08
Posts: 284
Credit: 1,659,024
RAC: 0
United States
Message 921821 - Posted: 28 Jul 2009, 0:31:18 UTC - in response to Message 921812.  

The best cache I've found on my quad is no more than 600 lines in the BOINC Manager.
ID: 921821
Questor
Volunteer tester
Joined: 3 Sep 04
Posts: 471
Credit: 230,506,401
RAC: 157
United Kingdom
Message 921925 - Posted: 28 Jul 2009, 11:34:30 UTC - in response to Message 921821.  
Last modified: 28 Jul 2009, 11:38:06 UTC

Does the "Write to disk at most every '60' seconds" setting have any impact on this? Does BOINC.exe hit a magic number of tasks whereby it spends all it's time making updates and comes back round to the beginning again with no free time? (Where # tasks > ~600)

What happens if it can't complete all updates before the default 60 seconds is up - some sort of gridlock?
GPU Users Group



ID: 921925
MarkJ
Volunteer tester
Joined: 17 Feb 08
Posts: 1139
Credit: 80,854,192
RAC: 5
Australia
Message 921930 - Posted: 28 Jul 2009, 11:46:34 UTC

Some versions back there were a couple of changes to help reduce the BOINC overhead.

1. It no longer writes everything to client_state (there are now some separate files flying around).
2. The update frequency was reduced to something like checkpoint interval x number of cores.

Were your i7 observations done using a fairly recent BOINC client or was it some versions ago?
BOINC blog
ID: 921930
OzzFan
Volunteer tester
Joined: 9 Apr 02
Posts: 15691
Credit: 84,761,841
RAC: 28
United States
Message 921954 - Posted: 28 Jul 2009, 14:56:25 UTC - in response to Message 921925.  

Does the "Write to disk at most every '60' seconds" setting have any impact on this? Does BOINC.exe hit a magic number of tasks whereby it spends all it's time making updates and comes back round to the beginning again with no free time? (Where # tasks > ~600)

What happens if it can't complete all updates before the default 60 seconds is up - some sort of gridlock?


As I understand it, the "write to disk" setting has been modified in BOINC v6.6.x so that it spreads the writes out based upon the number of cores. For example, a quad-core system with the default "write to disk" setting will take a total of 4 minutes for all four applications to save to disk. An 8-core system with the default setting will take a total of 8 minutes for all eight applications to write to disk.
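Purely as an illustration of that arithmetic (this is not the actual client code, just the staggering described above):
[code]
# With "write to disk at most every 60 seconds" and the writes staggered
# one science app after another, a full sweep over all cores takes
# roughly interval x number_of_cores.
def full_sweep_minutes(disk_interval_s=60, n_cores=4):
    return n_cores * disk_interval_s / 60.0

for cores in (4, 8):
    print(cores, "cores ->", full_sweep_minutes(60, cores), "minutes per sweep")
# 4 cores -> 4.0 minutes, 8 cores -> 8.0 minutes, matching the figures above
[/code]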
ID: 921954
Questor
Volunteer tester
Joined: 3 Sep 04
Posts: 471
Credit: 230,506,401
RAC: 157
United Kingdom
Message 922012 - Posted: 29 Jul 2009, 0:09:08 UTC - in response to Message 921954.  

All very interesting. I just checked on two of my machines and hadn't twigged that they both have about 1000 WUs each, and both are set to "write to disk = 60 secs".

The i7 has 8 active threads, all threads processing MB, plus 1 GTX 260.
The Q6600 quad has 4 active cores, all threads processing MB, plus 1 GTX 275.

Q6600 under normal running: each CPU core/thread is running MB at 23-35%, boinc.exe 0-2%, boincmgr.exe 0-2%, CUDA task 0-2%.

The i7 is pretty much the same, but each CPU is running MB at 12-13%.

All other processes pretty well 0%

If I suspend a CUDA task, it takes ~20 seconds from cold to start processing and grabs a whole core/thread while loading, but then drops back to 0-2%.

I have just reduced the Q6600 to three cores and it now has ~25% system idle so the GPU isn't trying to use the total capacity of one core.

I reduced the i7 to 6 out of 8 threads and it now has ~25% idle time, so the GTX 260 also isn't trying to use the total capacity of the 2 threads.

All of this is not optimal as task switching will occur between tasks when the CPU core/threads are shared.

I'm not seeing any odd behaviour / sluggishness etc (I only ever got that when the GPU was running VLAR tasks) or peaks of boinc.exe.

I don't have any multi GPU machines so don't know what happens in that case but it sounds from what I'm reading here as though things don't scale well?

This is all with BOINC 6.6.37, AK SSSE3X and SSE41 and stock CUDA app.

The quad has a RAC of ~8000 and the i7 ~8600, at present usually only running about 12 hrs per day, so in theory ~16000 and ~17000 per day if run 24/7.

On the Q6600, say the new MB tasks take ~2 hrs on CPU and ~30 mins on GPU: in 2 hrs I complete 4 GPU and 4 CPU tasks, so the GPU is responsible for 50% of the RAC.

If I add another 3 GPUs this gives 3 * 8000 + 16000 = 40000 RAC (no overclocking of GPU / CPU).

It's late, this is all very approximate and I've probably made some silly error, but I'm interested in trying to move a couple of GPUs to one machine (if I can find the crowbar) to see what happens to these figures.
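The same back-of-envelope projection as a quick Python sketch, using the assumptions above (GPU and CPU each worth ~50% of the quad's 24/7 output, no overclocking), so treat the numbers as illustrative only:
[code]
quad_rac_24_7 = 16000               # Q6600 + one GPU running 24/7
rac_per_gpu = quad_rac_24_7 // 2    # GPU responsible for ~50% -> ~8000

extra_gpus = 3
projected = quad_rac_24_7 + extra_gpus * rac_per_gpu
print(projected)                    # 16000 + 3 * 8000 = 40000
[/code]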

GPU Users Group



ID: 922012
Questor
Volunteer tester
Joined: 3 Sep 04
Posts: 471
Credit: 230,506,401
RAC: 157
United Kingdom
Message 922147 - Posted: 29 Jul 2009, 15:22:49 UTC - in response to Message 922012.  

Back to my original point about VLARkill.

If GPU tasks are taking 15 mins each then you can process 96 in a 24 hour period.
If you get a bunch of shorties or if you use VLARkill then you are quickly going to exceed your daily quota of 100 per CPU and the GPUs will go idle.
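Spelled out as a quick illustrative sketch (using the 100-per-CPU daily quota figure from above):
[code]
gpu_task_minutes = 15
tasks_per_day = (24 * 60) // gpu_task_minutes   # 96 tasks/day from one GPU
daily_quota_per_cpu = 100

print(tasks_per_day, "tasks/day vs a quota of", daily_quota_per_cpu)
# Only 4 tasks of headroom - and killed VLARs and shorties still count as
# issued tasks, so the quota runs out quickly and the GPU sits idle.
[/code]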

My suggestion was based around keeping some CPU processing going so you can swap tasks around between CPU and GPU using "Reschedule", to avoid having to kill VLARs. On one machine with 1000 WUs, over 200 of those are currently VLARs which have been rescheduled to the CPU, i.e. a whole processor's worth on a quad machine with one GPU.

With no CPU processing, your only option with VLARs, if you don't want to crunch them, is to throw them away.

So it 'may' be worth sacrificing a small fraction of GPU processing to keep an extra buffer of CPU tasks which you can swap with the GPU when you have a lot of VLAR tasks, to help avoid running dry and so keep your RAC up.

These calcs will all change of course with the longer tasks but the principle remains the same.
GPU Users Group



ID: 922147
Sutaru Tsureku
Volunteer tester
Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 922163 - Posted: 29 Jul 2009, 16:21:16 UTC


I made some tests with only 2 or 1 CPU tasks running, but the GPU performance also went down.

Up to now I haven't used the 'rebranding tool'. AFAIK I need to do it manually.. so that's nothing for me..
I like it when everything runs automatically. ;-D


If you have a GPU installed, the 'Maximum daily WU quota per CPU 100/day' is disabled and only the 'Maximum daily WU quota per GPU 500/day' is used.

ID: 922163
Questor
Volunteer tester
Joined: 3 Sep 04
Posts: 471
Credit: 230,506,401
RAC: 157
United Kingdom
Message 922251 - Posted: 29 Jul 2009, 22:42:46 UTC - in response to Message 922163.  
Last modified: 29 Jul 2009, 22:43:50 UTC


I made some tests with only 2 or 1 CPU tasks running, but the GPU performance also went down.

Up to now I haven't used the 'rebranding tool'. AFAIK I need to do it manually.. so that's nothing for me..
I like it when everything runs automatically. ;-D


If you have a GPU installed, the 'Maximum daily WU quota per CPU 100/day' is disabled and only the 'Maximum daily WU quota per GPU 500/day' is used.


Hi Sutaru,

In what way does the GPU performance go down? I would like to understand this more, to avoid causing myself headaches if I try to run multiple GPUs. If the drop is only small anyway, I wondered whether it would be worth it to avoid running out of WUs and therefore the GPU being idle?

The Reschedule 1.9 has an automatic mode so it will run when you want it to. I understand you wanting to do things automatically with the amount of work you are crunching!

Thanks, I didn't realise about the change in quota for GPUs - this explains some things. So are you saying that on my quad I will get 500 for the GPU and 400 for the 4 CPUs, i.e. 900 per day, or just a total of 500 - I assume just 500? So it would still be possible, with a fast GPU and lots of VLARs, to run out of work? I think I have seen some of your messages saying sometimes you have trouble getting enough work to keep your monster cruncher active :-)

John.
GPU Users Group



ID: 922251
Josef W. Segur
Volunteer developer
Volunteer tester

Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 922298 - Posted: 30 Jul 2009, 0:36:08 UTC - in response to Message 922251.  

...
Thanks, I didn't realise about the change in quota for GPUs - this explains some things. So are you saying that on my quad I will get 500 for the GPU and 400 for the 4 CPUs, i.e. 900 per day, or just a total of 500 - I assume just 500? So it would still be possible, with a fast GPU and lots of VLARs, to run out of work? I think I have seen some of your messages saying sometimes you have trouble getting enough work to keep your monster cruncher active :-)

John.

Sutaru may have been relying on some of my postings, and I'm not totally sure they still apply. There was a post about an 8-core system with one GPU which was being limited by quota to 500 tasks/day; at that time the source code was written so that the total was the quota times the number of GPUs, multiplied by the <gpu_multiplier> value set in the project's config.xml file, except that the old per-CPU formula was used if there were no GPUs.

The source code has since been rewritten so it ought to be 100 * nCPUs + 500 * nGPUs, but it's hard to tell whether that change is in use here or not. Any GPU which averages less than 172.8 seconds (86400/500) elapsed time per task might run into the quota, though, so perhaps Sutaru or Vyper can clarify by observation.
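As a quick sketch of that formula (whether the servers here actually run this version is exactly the open question):
[code]
def daily_quota(n_cpus, n_gpus, cpu_quota=100, gpu_quota=500):
    # "100 * nCPUs + 500 * nGPUs" as described above
    return cpu_quota * n_cpus + gpu_quota * n_gpus

print(daily_quota(4, 1))   # quad + 1 GPU  -> 900/day, if the CPUs still count
print(daily_quota(4, 4))   # quad + 4 GPUs -> 2400/day
print(86400 / 500)         # 172.8 s: a GPU averaging less than this per task
                           # could bump into a 500/day per-GPU quota
[/code]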

I thought the '16 core system... now running out of work' thread indicated the CPUs were still not being fed if a GPU was present, but that may have been a bad guess.
                                                               Joe
ID: 922298
Questor
Volunteer tester
Joined: 3 Sep 04
Posts: 471
Credit: 230,506,401
RAC: 157
United Kingdom
Message 922369 - Posted: 30 Jul 2009, 8:04:52 UTC - in response to Message 922298.  
Last modified: 30 Jul 2009, 8:16:30 UTC

Thanks for clarifying the quotas Joe.

Sutaru,
I'm still not clear from your earlier post.

If you have a big WU cache.. you have very big [25 % CPU - 100 % Core] and long boinc.exe peaks in TaskManager.
Is this when you are running WUs on your CPU or none? If none how long for? When my WUs are running on CPU each core is running flat out at about 25% each (i.e. 100% for each core but windows divides by the number of cores) but this is expected.

I do not see big peaks in boinc or System activity but only have a cache 1/3 of yours.

If this CPU activity is when you are not running Seti CPU tasks, what processes are peaking all the cores to maximum?

Every ~ 5 sec. for ~ 5 sec.
This increase with much ULs/DLs.

I can understand all activity peaking while lots of ULs/DLs are happening, but isn't that an unusual case? When things are running smoothly you would have infrequent uploads/downloads, which should not have a big system impact.
One exception to this could of course be the VLARkill - you download a task, it is aborted almost immediately, and more downloading happens.

Also the 'System' in TaskManager have high, up to ~ 13 % CPU.
If the OS is busy this can briefly increase but should only be transient?

If I would crunch also on the CPU, this boinc.exe/System peaks [normal priority] disturb all which have lower priority. CPU and GPU tasks.
The Windows task switching will mean some tasks get a smaller slice of CPU. By the GPU do you mean the "Windows CUDA app"? This is only showing the amount of CPU being used to feed the GPU so is only very small anyway - or is on my system (0-2% of a core). Task switching has an overhead but your "Windows CUDA app" will probably task switch even if not running SETI tasks as it has no CPU affinity.

Because of this, the GPUs would idle/stop from time to time.
Do you mean that you see all your GPU tasks show as waiting and none of them running for a period of time - if so how long.

Old MB AR=0.44x WU on CPU = ~ 60 min.
Same WU on GPU was 6m:45s , now longer WU [same AR].. ~ 10 min.

On the CPU it should be now ~ 120 min.

This mean the GPU is 12 x faster than one CPU Core.
And this are 4 x OCed GTX260-216 with AMD Phenom II X4 940 BE @ 4 x 3.0 GHz.

Are these new figures for GPU tasks (10 mins) for the newer tasks or are they resent older short WUs or do you just have some shorties?

I am seeing GPU tasks completing in 6-10 mins but that is faster than they ran with the shorter WUs. Even with a 30% lift from CUDA2.3 this seems too fast for the longer WUs but I need to monitor for a few days to see some averages.
On another machine I am seeing 17-25 mins from a GTS250 (I am also running CUDA2.3 on all but one machine)

Sorry - so many questions ............

P.S. As another aside, do you run virus checker software and does this include the BOINC data directories?

[Edit]On the machine where I was seeing 10min CUDAs I just saw one at 21mins - I assume this is a new one.[/edit]
GPU Users Group



ID: 922369
Sutaru Tsureku
Volunteer tester
Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 922778 - Posted: 31 Jul 2009, 21:30:29 UTC
Last modified: 31 Jul 2009, 21:32:49 UTC


The 500 WUs/GPU quota works like this:

I had only 2 GPUs -> max. 1,000 WUs/day.
Now with 4 GPUs -> max. 2,000 WUs/day.




If you have a big WU cache.. you have very big [25 % CPU - 100 % Core] and long boinc.exe peaks in TaskManager.
Is this when you are running WUs on your CPU or none? If none how long for? When my WUs are running on CPU each core is running flat out at about 25% each (i.e. 100% for each core but windows divides by the number of cores) but this is expected.

I do not see big peaks in boinc or System activity but only have a cache 1/3 of yours.

If this CPU activity is when you are not running Seti CPU tasks, what processes are peaking all the cores to maximum?

Both - with CPU+GPU and with GPU only; it depends on how big the WU cache is.


Every ~ 5 sec. for ~ 5 sec.
This increase with much ULs/DLs.

I can understand all activity peaking while lots of ULs/DLs are happening, but isn't that an unusual case? When things are running smoothly you would have infrequent uploads/downloads, which should not have a big system impact.
One exception to this could of course be the VLARkill - you download a task, it is aborted almost immediately, and more downloading happens.


Yes, but with a high WU cache and the normal unplanned server outages at Berkeley.. the boinc.exe peaks are long and frequent.


Also the 'System' in TaskManager have high, up to ~ 13 % CPU.
If the OS is busy this can briefly increase but should only be transient?


At the same time, when boinc.exe has big (25% CPU = 100% of one core) activity, System shows about half that activity.

A little strange.. if I reduce boinc.exe's priority to 'below normal', System shows no peaks/activity at all.
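For reference, a minimal sketch of that 'below normal' experiment (illustrative only: it assumes the third-party psutil package, the constant is Windows-only, and the change is lost when boinc.exe restarts - Task Manager does the same job by hand):
[code]
import psutil  # third-party package: pip install psutil

for proc in psutil.process_iter(["name"]):
    if proc.info["name"] and proc.info["name"].lower() == "boinc.exe":
        try:
            proc.nice(psutil.BELOW_NORMAL_PRIORITY_CLASS)  # no longer outranks the science apps
            print("boinc.exe", proc.pid, "now at", proc.nice())
        except psutil.Error:
            print("boinc.exe", proc.pid, "access denied - try an elevated prompt")
[/code]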


If I would crunch also on the CPU, this boinc.exe/System peaks [normal priority] disturb all which have lower priority. CPU and GPU tasks.
The Windows task switching will mean some tasks get a smaller slice of CPU. By the GPU do you mean the "Windows CUDA app"? This is only showing the amount of CPU being used to feed the GPU so is only very small anyway - or is on my system (0-2% of a core). Task switching has an overhead but your "Windows CUDA app" will probably task switch even if not running SETI tasks as it has no CPU affinity.


The OS, Windows (XP), isn't intelligent enough to avoid disturbing tasks lower in the 'priority hierarchy'.
Task Manager shows this well.
For example with BOINC:
CPU tasks have 'low' priority.
GPU tasks have 'below normal' priority.
boinc.exe has 'normal' priority.


So - if boinc.exe has activity, the CPU and GPU tasks are disturbed.
Yes, the GPU only needs CPU support, but if that support gets 0% CPU, the GPU stops/idles.
And if you have high-performance GPUs (GTX2xx series) this is very bad.


Because of this, the GPUs would idle/stop from time to time.
Do you mean that you see all your GPU tasks show as waiting and none of them running for a period of time - if so how long.


Sometimes the GPU wall-clock calculation time was ~3x longer for a WU of the same AR.


Old MB AR=0.44x WU on CPU = ~ 60 min.
Same WU on GPU was 6m:45s , now longer WU [same AR].. ~ 10 min.

On the CPU it should be now ~ 120 min.

This mean the GPU is 12 x faster than one CPU Core.
And this are 4 x OCed GTX260-216 with AMD Phenom II X4 940 BE @ 4 x 3.0 GHz.


Are these new figures for GPU tasks (10 mins) for the newer tasks or are they resent older short WUs or do you just have some shorties?


These are the new longer MB WUs.

A GTX260-216 should have double the performance of a GTS2xx.
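Those numbers spelled out (illustrative arithmetic only):
[code]
cpu_minutes = 120    # new longer MB WU, same AR, on one CPU core
gpu_minutes = 10     # same WU on an overclocked GTX260-216
print(cpu_minutes / gpu_minutes)                  # ~12x one CPU core

tasks_per_gpu_per_day = (24 * 60) // gpu_minutes  # ~144 per GPU
print(4 * tasks_per_gpu_per_day)                  # ~576/day across 4 GPUs,
                                                  # well inside 2,000/day quota
[/code]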


My GPU cruncher is a 100% pure crunching machine,
so no virus protection or other extras.
Of course there's the firewall (Windows XP).. ;-)


I hope I answered all your questions well.


If not.. we could continue in the new thread here?
'Best GPU performance'


ID: 922778