Advice on system optimization needed.

Message boards : Number crunching : Advice on system optimization needed.


Profile Eric Claussen

Joined: 31 Jan 00
Posts: 22
Credit: 2,318,750
RAC: 949
United States
Message 2009213 - Posted: 25 Aug 2019, 15:50:11 UTC

I'm going to cut back until I can get more solar on the roof. With the current rates in California this is costing about $6 a day. I think I will set up the timers so it runs during off peak hours for now.

Eric
ID: 2009213
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 9904
Credit: 936,094,568
RAC: 1,506,537
United States
Message 2009228 - Posted: 25 Aug 2019, 16:53:44 UTC - in response to Message 2009076.  

No, I don't have a problem with overcommitting. That is easy to fix: just reduce the number of CPU cores used until the gap decreases to around 1-2 minutes. I'm OK with that. Since I can't get affinity to work correctly on the problem hosts, I comment out those lines in the script. Problem solved. I can set affinity on the GPU tasks with no issues, on any host. I just want to figure out why the script works differently on cloned, identical systems, and works correctly on my one Intel system. Not enough data.
Seti@Home classic workunits: 20,676 · CPU time: 74,226 hours
ID: 2009228
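The affinity script itself never appears in the thread, but the kind of per-task pinning it presumably performs can be sketched with Linux's scheduler API. The PID and core number below are illustrative stand-ins, not values from the actual script:

```python
import os

# Hypothetical sketch of CPU pinning, roughly what an affinity script does
# with `taskset -cp 0 <pid>` on Linux. Core 0 and the current process are
# stand-ins for a real BOINC task's PID and its assigned core.
if hasattr(os, "sched_setaffinity"):      # Linux-only API
    pid = os.getpid()
    os.sched_setaffinity(pid, {0})        # lock the process to core 0
    print(os.sched_getaffinity(pid))      # the allowed-core set, now just {0}
```

A real script would loop over the running tasks' PIDs and assign each one its own core.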
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 9904
Credit: 936,094,568
RAC: 1,506,537
United States
Message 2009229 - Posted: 25 Aug 2019, 16:57:18 UTC - in response to Message 2009213.  

I'm going to cut back until I can get more solar on the roof. With the current rates in California this is costing about $6 a day. I think I will set up the timers so it runs during off peak hours for now.

Eric

I need more solar too. About triple what I now have. The crunching is costing me $33 a day.
Seti@Home classic workunits: 20,676 · CPU time: 74,226 hours
ID: 2009229
TBar
Volunteer tester

Joined: 22 May 99
Posts: 4869
Credit: 595,191,701
RAC: 1,398,582
United States
Message 2009262 - Posted: 25 Aug 2019, 22:12:07 UTC - in response to Message 2009228.  

No, I don't have a problem with overcommitting. That is easy to fix: just reduce the number of CPU cores used until the gap decreases to around 1-2 minutes. I'm OK with that. Since I can't get affinity to work correctly on the problem hosts, I comment out those lines in the script. Problem solved. I can set affinity on the GPU tasks with no issues, on any host. I just want to figure out why the script works differently on cloned, identical systems, and works correctly on my one Intel system. Not enough data.
Hmmmm, so we can ignore what you said here?
...So the only option is to use cpu% to reduce the number of cpu cores used. But the thread scheduler can't keep the task on the same thread and constantly moves it around. And you end up with both an overcommitted cpu and poor cpu_time/run_time tracking to boot.
As far as I know, using cpu% works as it should on all CPUs.
ID: 2009262
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 9904
Credit: 936,094,568
RAC: 1,506,537
United States
Message 2009265 - Posted: 25 Aug 2019, 22:29:50 UTC - in response to Message 2009262.  

No, I don't have a problem with overcommitting. That is easy to fix: just reduce the number of CPU cores used until the gap decreases to around 1-2 minutes. I'm OK with that. Since I can't get affinity to work correctly on the problem hosts, I comment out those lines in the script. Problem solved. I can set affinity on the GPU tasks with no issues, on any host. I just want to figure out why the script works differently on cloned, identical systems, and works correctly on my one Intel system. Not enough data.
Hmmmm, so we can ignore what you said here?
...So the only option is to use cpu% to reduce the number of cpu cores used. But the thread scheduler can't keep the task on the same thread and constantly moves it around. And you end up with both an overcommitted cpu and poor cpu_time/run_time tracking to boot.
As far as I know, using cpu% works as it should on all CPUs.

As usual, you post something out of context just for the sake of being argumentative. The Intel CPU does not move work around the cores. The AMD CPUs do, on the hosts where I can't get the script to set affinity correctly. When the load moves around, the separation between cpu_time and run_time increases, forcing you to drop cores via CPU % to get them back into balance. The one AMD host that does have affinity working correctly can run a higher CPU % than the others simply because it doesn't move the load around, so it does more work than the others on a like-for-like basis.
Seti@Home classic workunits: 20,676 · CPU time: 74,226 hours
ID: 2009265
TBar
Volunteer tester

Joined: 22 May 99
Posts: 4869
Credit: 595,191,701
RAC: 1,398,582
United States
Message 2009266 - Posted: 25 Aug 2019, 22:36:51 UTC - in response to Message 2009265.  
Last modified: 25 Aug 2019, 22:53:19 UTC

Not out of context at all. Your post claimed the %cpu setting wasn't keeping the CPU from being over-committed. Read it yourself.
Intel CPUs do move work around cores; all modern CPUs do. It's very easy to see. I first noticed it years ago when still running BOINC on Windows.
My problem is that, in a thread about system optimization, you made a post saying %cpu doesn't work.
That is a problem, and you can expect a response to that post.

I have been playing around with both methods of providing CPU support to GPUs today on the 7.16.1 client. All I can say is that if you have an Intel processor, either method works and everything runs fine. If, on the other hand, you have an AMD processor, you will still be cussing the brain-dead Linux AMD CPU thread scheduler and looking for compromises.

Neither way works the way it should. It would be best to set cpu usage to 100% but you will end up with overcommitted cpu threads. And trying to use a max concurrent breaks things entirely. So the only option is to use cpu% to reduce the number of cpu cores used. But the thread scheduler can't keep the task on the same thread and constantly moves it around. And you end up with both an overcommitted cpu and poor cpu_time/run_time tracking to boot.

Maybe in five years of maturity, they will have figured out the scheduler for AMD CPUs and have the stability, performance and reliability of the Intel thread scheduler.
BTW, I don't see CPU affinity mentioned once in that post.
ID: 2009266
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 9904
Credit: 936,094,568
RAC: 1,506,537
United States
Message 2009271 - Posted: 25 Aug 2019, 23:10:22 UTC - in response to Message 2009266.  

Well, excuse me for not including all the nitty-gritty of my configuration in my first post. I have used my affinity script since moving to Linux, and I assigned affinity even back in Windows with Process Lasso, because of the unique nature of AMD CPUs.

I don't know how you are determining that a CPU task moves around on Intel or AMD cores. I have never seen my CPU tasks move around when using affinity. The purpose is to lock the PID of the task to the core it starts on. Process Lasso does it, and so does the thread scheduler in Linux, when I can get it to work. My post simply stated that I can't get what should work to work on some of my hosts, and that is what I find frustrating.
Seti@Home classic workunits: 20,676 · CPU time: 74,226 hours
ID: 2009271
TBar
Volunteer tester

Joined: 22 May 99
Posts: 4869
Credit: 595,191,701
RAC: 1,398,582
United States
Message 2009289 - Posted: 26 Aug 2019, 1:04:43 UTC - in response to Message 2009271.  

Your post made it sound as though you couldn't stop CPU over-commitment; you didn't mention affinity. Most people don't bother with affinity, and they also don't bother with overclocking.
Intel has been moving work around the cores pretty much since they developed multi-core CPUs, to balance loads and temps. It's common knowledge; that's why they have affinity. Next time you have a chance, fire up Windows, use the options in Raistmer's SOG app to lock the CPU core on just a couple of tasks, and watch what happens in the CPU monitor. Just run a couple of SOG tasks, and don't use the OS's affinity settings. You should see the tasks move from core to core as the core temps rise, just the way the developers intended. I decided to stop fighting the developers some time back; if they think it's important to balance loads and temps, that's good enough for me.
ID: 2009289
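The migration claim above is easy to check on Linux without Windows or a CPU monitor: field 39 ("processor") of /proc/&lt;pid&gt;/stat records the core a process last ran on. A small sketch, reading the current process; pointing it at a task's PID and polling would show a real workunit hopping cores:

```python
import os

def last_cpu(pid: int) -> int:
    """Core the process last ran on: field 39 ('processor') of /proc/<pid>/stat.
    Linux-only; lets you watch the scheduler migrate a task between cores."""
    with open(f"/proc/{pid}/stat") as f:
        stat = f.read()
    # the comm field sits in parentheses and may contain spaces, so split
    # after the closing ')' before counting fields
    after_comm = stat.rsplit(")", 1)[1].split()
    # after_comm[0] is field 3 ('state'), so field 39 lands at index 36
    return int(after_comm[36])

if os.path.exists("/proc/self/stat"):
    print(last_cpu(os.getpid()))   # re-run (or poll) and watch the value change
```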
Profile Tom M
Volunteer tester

Joined: 28 Nov 02
Posts: 3576
Credit: 213,233,488
RAC: 506,622
United States
Message 2009583 - Posted: 28 Aug 2019, 12:52:37 UTC

On this system: https://setiathome.berkeley.edu/show_host_detail.php?hostid=8696615

The RAC is still increasing with great enthusiasm, to the point where it has cracked the top 20 (at least for the moment).

The GPUs are set to use 0.5 CPU per GPU task, using the Linux/TBar/petro All-in-One combo.

The "CPU over-committed" issue with the time differences continues. I am guessing that the difference between the wall-clock time and the CPU time used is about 1/3 (tasks taking maybe a third again more wall-clock time).

As I said before, I am waiting for the RAC climb to peter out.....

Tom
A proud member of the OFA (Old Farts Association)
"Over the hill? WHAT Hill? I don't REMEMBER any hill...." (from a bumper sticker I bought at a truck stop).
"If its Tourist Season why can't we shoot them?" (another bumper sticker)
ID: 2009583
rob smith · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer moderator
Volunteer tester

Joined: 7 Mar 03
Posts: 17809
Credit: 407,300,268
RAC: 112,672
United Kingdom
Message 2009626 - Posted: 28 Aug 2019, 17:39:36 UTC

The one sure-fire way of stopping CPU over-commitment is to use the "use at most x% of CPUs" setting - set this to leave you one free core per GPU task running. If the CPU has fewer cores than the number of GPU tasks being run, then you are going to be stuck with over-commitment even if you stop CPU crunching.
Remember the "use 0.5 CPU" setting is a weak target, not an absolute figure - if the GPU tasks need more than 0.5 CPU then they will attempt to grab the extra they need, most likely fail to get it, and in so doing slow down both the CPU and GPU tasks....
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 2009626
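The rule of thumb above reduces to simple arithmetic: pick the "use at most x%" value that leaves one CPU thread free per running GPU task. The host figures below (16 threads, 4 GPU tasks) are examples, not anyone's actual machine:

```python
# Sketch of the arithmetic behind "use at most x% of CPUs": choose a
# percentage that leaves one CPU thread free for each running GPU task.
def cpu_percent(total_threads: int, gpu_tasks: int) -> float:
    """Percent of CPUs to let BOINC's CPU tasks use."""
    free = max(total_threads - gpu_tasks, 0)
    return 100.0 * free / total_threads

# e.g. a 16-thread CPU feeding 4 GPU tasks:
print(cpu_percent(16, 4))  # 75.0 -- 12 threads crunch CPU work, 4 feed the GPUs
```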
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 11664
Credit: 174,492,068
RAC: 119,865
Australia
Message 2009754 - Posted: 29 Aug 2019, 6:02:39 UTC

or just reserving 1, or a half, or a third of a CPU thread to support each GPU WU being processed.
Grant
Darwin NT
ID: 2009754
TBar
Volunteer tester

Joined: 22 May 99
Posts: 4869
Credit: 595,191,701
RAC: 1,398,582
United States
Message 2009757 - Posted: 29 Aug 2019, 6:10:32 UTC - in response to Message 2009754.  

All you have to do is lower the "Use at most ___ % of the CPUs" setting until you reach the point of not being over-committed.
Anything else is a red herring. That one setting should be all you need. That's what the CUDA developers decided 12 years ago, and it holds to this day.
All the other stuff just confuses people. One setting is all it takes.
ID: 2009757
Profile Tom M
Volunteer tester

Joined: 28 Nov 02
Posts: 3576
Credit: 213,233,488
RAC: 506,622
United States
Message 2009791 - Posted: 29 Aug 2019, 13:28:16 UTC - in response to Message 2009757.  

All you have to do is lower the "Use at most ___ % of the CPUs" setting until you reach the point of not being over-committed.
Anything else is a red herring. That one setting should be all you need. That's what the CUDA developers decided 12 years ago, and it holds to this day.
All the other stuff just confuses people. One setting is all it takes.


I think all three of you are referring to the setting that controls the number of CPUs/threads BOINC will use. Right?

Tom
A proud member of the OFA (Old Farts Association)
"Over the hill? WHAT Hill? I don't REMEMBER any hill...." (from a bumper sticker I bought at a truck stop).
"If its Tourist Season why can't we shoot them?" (another bumper sticker)
ID: 2009791
rob smith · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer moderator
Volunteer tester

Joined: 7 Mar 03
Posts: 17809
Credit: 407,300,268
RAC: 112,672
United Kingdom
Message 2009813 - Posted: 29 Aug 2019, 14:48:14 UTC

Yes.
The "use at most" setting restricts the number of cores available to CPU tasks, but does not affect the number of cores GPU tasks can access. The cores GPUs require are drawn preferentially from those not committed elsewhere (BOINC, OS, etc.).
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 2009813
Ian&Steve C.
Joined: 28 Sep 99
Posts: 1838
Credit: 780,983,594
RAC: 2,616,248
United States
Message 2009816 - Posted: 29 Aug 2019, 15:06:13 UTC - in response to Message 2009813.  

Yes.
The "use at most" setting restricts the number of cores available to CPU tasks, but does not affect the number of cores GPU tasks can access. The cores GPUs require are drawn preferentially from those not committed elsewhere (BOINC, OS, etc.).


which is why, if you are using the -nobs argument, it's a good idea to tell BOINC that you are reserving one CPU core per GPU: when you then set the CPU %, you actually get that number (or very close to it) and don't have to play trial and error with the CPU % values to get your desired outcome.
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 2009816
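BOINC can also declare the one-core-per-GPU reservation described above in an app_config.xml, rather than relying on the CPU % slider alone. A sketch, assuming the stock SETI@home v8 app name; check the name your own client reports before copying it:

```xml
<app_config>
  <app>
    <name>setiathome_v8</name>   <!-- example app name; must match your client's -->
    <gpu_versions>
      <gpu_usage>1.0</gpu_usage> <!-- one task per GPU -->
      <cpu_usage>1.0</cpu_usage> <!-- reserve a full CPU core per GPU task -->
    </gpu_versions>
  </app>
</app_config>
```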



 
©2019 University of California

SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.