GPU Processing Speeds

Tazarak_Ordinateur

Joined: 17 Oct 18
Posts: 4
Credit: 313,898
RAC: 0
United Kingdom
Message 1966660 - Posted: 22 Nov 2018, 21:32:54 UTC

Good evening, everyone. This is my first post here and I'm completely new to BOINC and SETI.

I wanted to ask a question, which might have an obvious answer. I've been trying to familiarise myself with how best to optimise my GPUs for this project, and tonight I updated my app_info and app_config per recommendations in several threads here. These attempts were possibly ham-fisted and not fully understood.

The outcome is that I decreased the CPU time but increased the overall run time:

https://i.imgur.com/uiNt1aq.png

Even though the computation was vastly quicker, having an overall longer task is counterproductive, isn't it?

Does CPU time mean CPU, or does it really mean GPU when running OpenCL work units?

Also, I have three Xeon E5-2660 v4s, which I've been using solely in the background for LHC@Home and Cosmology@Home. Is there any point running SETI@home CPU tasks, given it's a GPU project?

Thanks in advance. Criticism also welcomed if I completely got this thing wrong.
ID: 1966660
Profile Keith Myers Special Project $250 donor
Volunteer tester

Joined: 29 Apr 01
Posts: 13161
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1966664 - Posted: 22 Nov 2018, 22:12:48 UTC

Ideally, cpu_time should equal run_time. This means the cpu didn't spend time on anything other than supporting the gpu task, so the task ran as efficiently as possible. The SoG OpenCL gpu application is written to use a full cpu core to support the task. The easiest way to accomplish this is to set it up in an app_config.xml file. You would need to place this:
<avg_ncpus>1</avg_ncpus>
<ngpus>1</ngpus>
into the section that identifies the SoG application. This tries to provide a full cpu core to support the gpu task. But from your comment about also running LHC@Home and Cosmology@Home, which are cpu task projects, I suspect you are overcommitted on cpu cores with those projects and not providing all the cpu support for the Seti gpu task that it would like. That is the reason your run_time is so much longer than your cpu_time. Run_time is the elapsed wall-clock time the task took to complete. Cpu_time is the actual amount of time a cpu spent supporting the gpu task.
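
For illustration, a complete minimal app_config.xml might look like the sketch below. The app name and plan class shown are assumptions for the stock SETI@home v8 SoG build - check your own app_info.xml for the exact values. The file lives in the project directory under the BOINC data directory (projects/setiathome.berkeley.edu) and is picked up at client startup or when you re-read config files from the Manager.

<app_config>
  <app_version>
    <app_name>setiathome_v8</app_name>          <!-- assumed stock MultiBeam v8 app name -->
    <plan_class>opencl_nvidia_SoG</plan_class>  <!-- assumed SoG plan class; copy yours from app_info.xml -->
    <avg_ncpus>1</avg_ncpus>                    <!-- reserve a full cpu core per gpu task -->
    <ngpus>1</ngpus>                            <!-- one gpu per task -->
  </app_version>
</app_config>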

Also, Seti is not just a gpu project. It is both a cpu and gpu project. You can select whether you want to run cpu or gpu tasks by selecting them in your Project Preferences >> Seti@home Preferences section. With your use of the cpu mainly for your other projects, I would suggest not trying to run Seti cpu tasks. But you will still need to reduce the amount of cpu resources for the other projects to free up more cpu resources for the Seti gpu tasks you want to run. Or, just accept the fact that the Seti gpu tasks aren't running as efficiently as they could.
Seti@Home classic workunits: 20,676 - CPU time: 74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1966664
rob smith Crowdfunding Project Donor * Special Project $75 donor * Special Project $250 donor
Volunteer moderator
Volunteer tester

Joined: 7 Mar 03
Posts: 22158
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1966669 - Posted: 22 Nov 2018, 22:34:07 UTC

SETI is NOT a "GPU project"; it has applications for a very wide range of processors and operating systems.
So - yes, it would be worth adding SETI to the list of projects running on your Xeon E5-2660 v4 based computers.

Your questions about "optimised applications".

First, at the outset don't worry about mods to the various configuration files - make sure everything is working before that.
For starters, I see you are using Windows 10 - make sure you have turned "auto driver update" OFF and that you are using the correct drivers directly from nVidia. Do a "clean" installation (this is under the "advanced" button).
Use the "Lunatics Installer" to correctly install the applications - this can be found here: http://mikesworld.eu/download.html - make sure you get the right "bit" version: for a 64-bit operating system use the 64-bit version.
Choose the right CPU application; that will depend on what features the CPU has (I use CPU-Z to tell me what SSE version is available on a given CPU).

From what you are running it looks as if you've done most of that "OK".

What do you mean by "increased overall run time"? Since the data we get varies quite considerably, you might be comparing an "Arecibo VLAR" task with a "BLC normal" task, and on an nVidia GPU the former will run much slower.
Another thing - I see you are using the "optimised CUDA" application and not the "optimised SoG" one; the latter is MUCH faster (under Windows). I would re-run the installer and this time choose the "SoG" application - from memory it is buried near the bottom of the "use nVidia GPU" screen.

Your question about "run times" - this can be a bit confusing! There are three times: one is the "clock" time - that's how long the task was actually running for; then there is CPU time - how much CPU time was used; finally there is the time spent by the GPU doing work. For the SoG application it is fairly common to see all three times within a few percent of each other; clock time will always be the largest.

It is worth noting that if you have different GPUs in one computer (say a GTX1070 and a GTX650) it can be "rather tricky" to get the optimal combination of settings.
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1966669
Profile Tom M
Volunteer tester

Joined: 28 Nov 02
Posts: 5124
Credit: 276,046,078
RAC: 462
Message 1966691 - Posted: 23 Nov 2018, 1:22:10 UTC - in response to Message 1966669.  


It is worth noting that if you have different GPUs in one computer (say a GTX1070 and a GTX650) it can be "rather tricky" to get the optimal combination of settings.


AMEN!!!
A proud member of the OFA (Old Farts Association).
ID: 1966691
Profile Tom M
Volunteer tester

Joined: 28 Nov 02
Posts: 5124
Credit: 276,046,078
RAC: 462
Message 1966693 - Posted: 23 Nov 2018, 1:33:17 UTC

Hi,
I was looking at your machines listing and ran across this: https://setiathome.berkeley.edu/show_host_detail.php?hostid=8608043

Is this a dual-CPU or single-CPU machine? With (56/2)/2 it looks like it could be a dual-CPU machine with 14 physical cores each, running hyper-threading?
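
Spelling out that arithmetic:

56 threads / 2 (hyper-threading) = 28 physical cores
28 physical cores / 2 (sockets)  = 14 cores per CPU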

Are you able to turbo-boost the CPU past 2GHz at all?

High core count machines are a hobby of mine :)

Thank you.

Tom
A proud member of the OFA (Old Farts Association).
ID: 1966693
Tazarak_Ordinateur

Joined: 17 Oct 18
Posts: 4
Credit: 313,898
RAC: 0
United Kingdom
Message 1966732 - Posted: 23 Nov 2018, 8:05:39 UTC - in response to Message 1966664.  

Ideally, cpu_time should equal run_time. This means the cpu didn't spend time on anything other than supporting the gpu task, so the task ran as efficiently as possible. The SoG OpenCL gpu application is written to use a full cpu core to support the task. The easiest way to accomplish this is to set it up in an app_config.xml file. You would need to place this:
<avg_ncpus>1</avg_ncpus>
<ngpus>1</ngpus>
into the section that identifies the SoG application.


Thank you for pointing this out, as it was set to 0.04 for all applications. By updating this, my run time / cpu time have stabilized at 325 seconds / 315 seconds, so they are well aligned.

I made a mistake with the Lunatics installer and downloaded a bunch of CUDA50 tasks instead of SoG, so I have to get through that backlog before I can see how the SoG versions compare.
ID: 1966732
Tazarak_Ordinateur

Joined: 17 Oct 18
Posts: 4
Credit: 313,898
RAC: 0
United Kingdom
Message 1966733 - Posted: 23 Nov 2018, 8:09:57 UTC - in response to Message 1966669.  


Another thing - I see you are using the "optimised CUDA" application and not the "optimised SoG" one; the latter is MUCH faster (under Windows). I would re-run the installer and this time choose the "SoG" application - from memory it is buried near the bottom of the "use nVidia GPU" screen.


Thanks for this explanation and tip, which I of course botched.

I used Display Driver Uninstaller, reinstalled from nVidia, and then went and picked CUDA50 instead of SoG. So I'll be able to share the results once I get through the backlog.

I avoided this blunder on my other computer with GTX 980s, and it seems to be yielding 425 / 415 seconds per WU.

Thanks again, what a nice community.
ID: 1966733
Tazarak_Ordinateur

Joined: 17 Oct 18
Posts: 4
Credit: 313,898
RAC: 0
United Kingdom
Message 1966734 - Posted: 23 Nov 2018, 8:16:51 UTC - in response to Message 1966693.  

Hi,
I was looking at your machines listing and ran across this: https://setiathome.berkeley.edu/show_host_detail.php?hostid=8608043

Is this a dual-CPU or single-CPU machine? With (56/2)/2 it looks like it could be a dual-CPU machine with 14 physical cores each, running hyper-threading?

Are you able to turbo-boost the CPU past 2GHz at all?

High core count machines are a hobby of mine :)

Thank you.

Tom


Good morning, Tom. You might well know more about the hardware, but I did put this machine together as a hobby project.

The computer has two Xeon E5-2660 v4s (ES's from Beijing). So that is 14 cores per processor, hyper-threaded, yielding 56 total threads.

With all 56 threads working at 100%, they run at a 25x multiplier (so 2.5 GHz). If not fully loaded, one core will boost to 3.1 GHz, but this is really only the case during a reboot.

I have had no luck getting the multipliers to lock at higher settings despite endless BIOS tweaks, and BCLK overclocking yields very little for the hassle. Both processors have AIO coolers and run at 45°C, so they could tolerate higher speeds if it were possible (the motherboard is an Asus Z10PE-D16).

I have a third one of those processors here waiting to be assembled along with other odds, ends & spares - but I am waiting to try to get some RAM off eBay, as the ECC registered stuff is a lot of money right now. If I had wanted to be most cost-effective, I would have used an E5 v1 or v2 with the C600-series chipset for DDR3 RAM, which is vastly cheaper.

It is really gratifying to watch 56 work units being processed in BOINC tasks. Feels good, man.
ID: 1966734
Profile Tom M
Volunteer tester

Joined: 28 Nov 02
Posts: 5124
Credit: 276,046,078
RAC: 462
Message 1966741 - Posted: 23 Nov 2018, 9:00:45 UTC - in response to Message 1966734.  


Good morning, Tom. You might well know more about the hardware, but I did put this machine together as a hobby project.

I am basically a long time computer hobbyist so I seriously doubt I know any more about the hardware than you do :)


The computer has two Xeon E5-2660 v4s (ES's from Beijing). So that is 14 cores per processor, hyper-threaded, yielding 56 total threads.

With all 56 threads working at 100%, they run at a 25x multiplier (so 2.5 GHz). If not fully loaded, one core will boost to 3.1 GHz, but this is really only the case during a reboot.

I have had no luck getting the multipliers to lock at higher settings despite endless BIOS tweaks, and BCLK overclocking yields very little for the hassle. Both processors have AIO coolers and run at 45°C, so they could tolerate higher speeds if it were possible (the motherboard is an Asus Z10PE-D16).

I have a third one of those processors here waiting to be assembled along with other odds, ends & spares - but I am waiting to try to get some RAM off eBay, as the ECC registered stuff is a lot of money right now. If I had wanted to be most cost-effective, I would have used an E5 v1 or v2 with the C600-series chipset for DDR3 RAM, which is vastly cheaper.

It is really gratifying to watch 56 work units being processed in BOINC tasks. Feels good, man.


I understand the feelings about watching the cores crunch. :)

My understanding about turbo boosting is that it is more difficult to go higher the more cores you have involved. On a "locked" CPU that only has "turbo boost", you are not going to get much higher than Intel's stated multi-core turbo boost.

Some of my BIOSes will let you turn on turbo boost while leaving SpeedStep off. Others require SpeedStep to be turned on in order to run turbo boost. It turned out that if you disable and/or turn down every mention of "C" states, the turbo boost stays on in a loaded system. I found those tricks in the CPU "power plan" and had to use "custom" to enable them. I also had to tell it that I wanted the "performance" plan :)

The results were 2.999 GHz on a 2.6 GHz E5-2670 v1 (8c/16t) and 3.2999 GHz on a 3 GHz E5-2690 v2 (10c/20t).
A proud member of the OFA (Old Farts Association).
ID: 1966741
rob smith Crowdfunding Project Donor * Special Project $75 donor * Special Project $250 donor
Volunteer moderator
Volunteer tester

Joined: 7 Mar 03
Posts: 22158
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1966743 - Posted: 23 Nov 2018, 9:07:12 UTC

Thank you for pointing this out, as it was set to 0.04 for all applications. By updating this, my run time / cpu time have stabilized at 325 seconds / 315 seconds, so they are well aligned.


The value of 0.04 is only a GUIDE to BOINC as to how much CPU will be required to run a task - it is NOT a real value, as the GPU will attempt to use as much CPU support as it needs. There is one advantage of having the value close to reality: there is less potential for the CPU being over-committed and thus going into paging. BOINC looks at those figures and tells the o/s how much CPU support it thinks it needs - if you have 4 GPU processes each with a "guess" of 0.04, then BOINC will reserve 4 x 0.04 CPU cores, but if in reality they each need a full core, then the CPU will be over-committed by (4 - 0.16) cores - which can be quite significant.
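
Spelling that out, on the assumption (per Keith's post above) that each SoG task really does want a full core:

Reserved by BOINC: 4 tasks x 0.04 CPU = 0.16 cores
Actually needed:   4 tasks x 1 CPU    = 4.00 cores
Shortfall:         4.00 - 0.16        = 3.84 cores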
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1966743
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1966746 - Posted: 23 Nov 2018, 10:36:37 UTC - in response to Message 1966742.  

The easiest way to accomplish this is to set it up in an app_config.xml file. You would need to place this:
<avg_ncpus>1</avg_ncpus>
<ngpus>1</ngpus>
into the section that identifies the SoG application. This tries to provide a full cpu core to support the gpu task
Are you sure that you didn't mean app_info.xml?
Simpler and safer to set up an app_config.xml file. That takes precedence over app_info, but without the risk of losing your entire configuration with a clumsy edit.
ID: 1966746
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1966748 - Posted: 23 Nov 2018, 11:10:21 UTC - in response to Message 1966747.  

Your app_config as posted is missing all the tag closures for the second group, as well as the closure for the whole file.
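
For illustration, a correctly closed file with two groups might look like the sketch below. The app names and plan classes are examples only (setiathome_v8 and astropulse_v7 are the stock names) - substitute whatever your own app_info.xml uses.

<app_config>
  <app_version>
    <app_name>setiathome_v8</app_name>
    <plan_class>opencl_nvidia_SoG</plan_class>
    <avg_ncpus>1</avg_ncpus>
    <ngpus>1</ngpus>
  </app_version>                                <!-- each group needs its own closure -->
  <app_version>
    <app_name>astropulse_v7</app_name>
    <plan_class>opencl_nvidia_100</plan_class>  <!-- example plan class; check yours -->
    <avg_ncpus>1</avg_ncpus>
    <ngpus>1</ngpus>
  </app_version>
</app_config>                                   <!-- closure for the whole file -->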

64-bit was tested, but SETI is a very small application. The major difference with doubling everything from 32 to 64 bits is that it allows programmers to address more than 4 GB of GPU memory. We don't need that, and everything shifts around much more slowly when it's twice as big. So we kept it lean and mean, and it works faster.
ID: 1966748
