Error While Computing (and a weird question, too)

Message boards : Number crunching : Error While Computing (and a weird question, too)

Bubbajms

Send message
Joined: 2 Dec 03
Posts: 29
Credit: 64,248,758
RAC: 0
United States
Message 1449113 - Posted: 2 Dec 2013, 1:26:17 UTC

First machine in question.. http://setiathome.berkeley.edu/results.php?hostid=6784960

In the space of an afternoon I added a GPU to this computer, updated to 7.2.28 - at one point (can't recall before or after the program update) I made sure to install the ATI drivers rather than the Windows Update version.

Looks like I'm getting errors on all GPU work - I've got similar systems with ATI GPUs that are running fine, so I'm not sure what to be looking for as far as an error is concerned. Ideas?

Weird Question - My top two performers are my daily driver and my daily video editing machine. Both get used 40 hours a week - sometimes one or the other will have a little more than that, but they're both basically workstations that do their thing in the off time.

Top performer, with an i7-2600 (3.4) and a GTX670 - http://setiathome.berkeley.edu/show_host_detail.php?hostid=6818458

#2, with an i7-4770 (3.5) and dual GTX 660s - http://setiathome.berkeley.edu/show_host_detail.php?hostid=6786867

Looking at the credit per day, the top performer runs better - but why? If they're getting equal amounts of use, shouldn't 3.5 and dual 660s trump the 3.4 and single GTX670? Does the extra on the GPU really score that much? Am I missing something somewhere?
ID: 1449113 · Report as offensive
Claggy
Volunteer tester

Send message
Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1449116 - Posted: 2 Dec 2013, 1:52:57 UTC - in response to Message 1449113.  
Last modified: 2 Dec 2013, 2:03:30 UTC

First machine in question.. http://setiathome.berkeley.edu/results.php?hostid=6784960

In the space of an afternoon I added a GPU to this computer, updated to 7.2.28 - at one point (can't recall before or after the program update) I made sure to install the ATI drivers rather than the Windows Update version.

Looks like I'm getting errors on all GPU work - I've got similar systems with ATI GPUs that are running fine, so I'm not sure what to be looking for as far as an error is concerned. Ideas?

It's running an ancient driver with only SDK 2.3 support; I think the OpenCL MB app needs APP SDK 2.6 support:

ATI Driver Version Cheat Sheet

Upgrade to a recent driver, then delete the compiled files that the app made under the ancient driver. They are named similar to these (the exact names depend on your CPU & GPU); a rough cleanup sketch follows the list:

MultiBeam_Kernels_r1843.clHD5_Sumo.bin_V7
r1843_IntelRXeonRCPUE5462280GHz.wisdom
MB_clFFTplan_Sumo_8_r1843.bin -> MB_clFFTplan_Sumo_524288_r1843.bin
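
If there are a lot of them, a quick Python sketch like this will clear them out - the directory is the default Windows BOINC data directory and the patterns just follow the names above, so treat both as assumptions and adjust for your own setup (and stop BOINC before deleting):

# Rough cleanup sketch - assumes the default Windows BOINC data directory;
# stop BOINC first, check what it matches, then let it delete.
import glob
import os

PROJECT_DIR = r"C:\ProgramData\BOINC\projects\setiathome.berkeley.edu"  # adjust if yours differs

# Patterns modelled on the file names above (the exact names depend on CPU & GPU).
patterns = [
    "MultiBeam_Kernels_*.bin*",  # compiled OpenCL kernels
    "*.wisdom",                  # FFT "wisdom" built under the old driver
    "MB_clFFTplan_*.bin",        # cached FFT plans
]

for pattern in patterns:
    for path in glob.glob(os.path.join(PROJECT_DIR, pattern)):
        print("deleting", path)
        os.remove(path)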

Claggy
ID: 1449116 · Report as offensive
Bubbajms

Send message
Joined: 2 Dec 03
Posts: 29
Credit: 64,248,758
RAC: 0
United States
Message 1449131 - Posted: 2 Dec 2013, 3:07:26 UTC

I was sure that I'd updated the driver to the current version - but it looks like I forgot to restart the BOINC client afterwards, so it was referencing a driver that wasn't in use at all. When I restarted the client it downloaded some fresh exe files and now appears to be working. I'll follow up.

Any ideas on the race between my other two machines?
ID: 1449131 · Report as offensive
tbret
Volunteer tester
Avatar

Send message
Joined: 28 May 99
Posts: 3380
Credit: 296,162,071
RAC: 40
United States
Message 1449140 - Posted: 2 Dec 2013, 4:37:20 UTC - in response to Message 1449131.  



"Any ideas on the race between my other two machines?"

You are running 4-core processors. I know they hyper-thread, but they are 4-core processors.

Judging by your run-times, it looks like you are trying to do at least three, maybe four, "work units" per video card.

It looks like you are also trying to process CPU "work units" at the same time.

The short version is that you are "starving" the GPU work units by not giving them enough CPU. Also, since GPU work requires *some* CPU time, if the CPUs are kept busy the GPU work units will show "waiting to run" in BOINC Manager.

This is bad.

Since a GPU is 10-20 times as fast as the CPU, you want to be sure the GPUs have all the resources they need before running any CPU work.

I suspect you will be better off running no CPU work on the dual 660 machine and maybe using one, or none, of the CPU cores on the 670 machine. I'd run no more than 3 GPU work units at a time on any given GPU (because of CPU resources).

Running two GPU work units at a time will give you a big boost in output. Running three at a time might give you a little boost in output. Running four at a time will give you a tiny boost if any, or might even cause a small decrease, in output. (once the cards are showing 90-97% GPU utilization in Precision X or Afterburner, you're doing all you need to try to do)
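
(If you don't want to install Precision X or Afterburner just to check that, and your driver's nvidia-smi tool reports utilization - some older GeForce drivers just say N/A - a quick poll from a Python prompt does the same job. A rough sketch, nothing more:)

# Rough sketch: poll GPU utilization every few seconds via nvidia-smi.
# Assumes nvidia-smi is on the PATH and the driver exposes utilization
# (older GeForce drivers may report "N/A" - use Precision X / Afterburner then).
import subprocess
import time

for _ in range(10):
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=index,name,utilization.gpu",
         "--format=csv,noheader"],
        universal_newlines=True,
    )
    print(out.strip())  # one line per GPU, e.g. "0, GeForce GTX 660, 93 %"
    time.sleep(5)       # if this sits in the 90s, more tasks per GPU won't help much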

I hope all of that made sense. The moral of the story is that you shouldn't run all of your cores with CPU work and try to do GPU work at the same time.


ID: 1449140 · Report as offensive
Bubbajms

Send message
Joined: 2 Dec 03
Posts: 29
Credit: 64,248,758
RAC: 0
United States
Message 1449293 - Posted: 2 Dec 2013, 19:39:25 UTC

Interesting.. all this GPU business is news to me for sure. It would seem like running as much as possible would give you the most work, but I'll have to do some tinkering and see what I can do. Thanks for the input!
ID: 1449293 · Report as offensive
Bubbajms

Send message
Joined: 2 Dec 03
Posts: 29
Credit: 64,248,758
RAC: 0
United States
Message 1449590 - Posted: 3 Dec 2013, 15:44:54 UTC

Alright, in the office looking at both of these computers this morning.

Top running computer was only running one WU on the GPU - I copied app_config.xml info to the project directory and updated, so now I'm running two GPU WU's at a time.
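
In case it helps anyone else who finds this thread: the file itself is tiny. Here's a throwaway Python helper that writes the same sort of thing - the app name "setiathome_v7", the path, and the numbers are examples rather than gospel, so check the app name in your own client_state.xml and adjust before using it:

# Throwaway helper: writes an app_config.xml that asks BOINC to run two
# MultiBeam tasks per GPU.  Assumptions: the v7 app is named "setiathome_v7"
# (check client_state.xml) and the default Windows project directory below.
import os

PROJECT_DIR = r"C:\ProgramData\BOINC\projects\setiathome.berkeley.edu"  # adjust

APP_CONFIG = """<app_config>
  <app>
    <name>setiathome_v7</name>
    <gpu_versions>
      <gpu_usage>0.5</gpu_usage>
      <cpu_usage>0.04</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
"""
# gpu_usage 0.5  -> two tasks share each GPU
# cpu_usage 0.04 -> the "0.04 CPUs" figure BOINC Manager displays

with open(os.path.join(PROJECT_DIR, "app_config.xml"), "w") as f:
    f.write(APP_CONFIG)
print("Wrote app_config.xml - re-read config files in BOINC Manager or restart the client.")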

The way I understand it (and feel free to correct me!) I'm set up to use 50% of the processors on this machine. When I look at the running tasks I see 4 CPU tasks and 2 GPU tasks, with nothing "Waiting to run".

The #2 computer is running 8 CPU tasks, 4 GPU tasks (2 per card with two cards total) and also has nothing "Waiting to run".

There are a couple of things I'm not clear on, which might be issues..

1 - Top computer appears to run whenever CPU usage is under 25%, and does fine with that - it's running right now as I type on it, no problems evident. Computer 2 shows that it should run when CPU usage is under 50%, but it's not running - even though I would think there's basically no usage at this time. If I set it to "Always Run" it fires right up..

2 - Top computer shows 0.134 CPUs and 1 GPU for both of the GPU WUs that are running. Computer 2 shows 0.04 CPUs and 0.5 GPUs.

More pieces to my puzzle?
ID: 1449590 · Report as offensive
tbret
Volunteer tester
Avatar

Send message
Joined: 28 May 99
Posts: 3380
Credit: 296,162,071
RAC: 40
United States
Message 1450194 - Posted: 5 Dec 2013, 6:15:22 UTC - in response to Message 1449590.  



"More pieces to my puzzle?"

"The way I understand it (and feel free to correct me!) I'm set up to use 50% of the processors on this machine. When I look at the running tasks I see 4 CPU tasks and 2 GPU tasks, with nothing "Waiting to run". "

That is correct. You have "eight cores, sort-of" and you are committing five. Four CPU tasks and two GPU tasks which are reserving less than 0.5 CPUs each. Total: "five cores." Nothing should be "waiting."

However... you might find the whole thing faster if you go to 34% of your "cores" (it'll use three) and leave a REAL core open for the GPU's processing. You might find it even faster if you went to 25% and left two REAL cores open for the GPUs.
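
(If you want the arithmetic behind those percentages: the preference is just a percentage of your logical CPUs, and the client does its own rounding, so check how many CPU tasks actually start after you change it. A rough sketch, assuming 8 logical CPUs on a hyper-threaded i7:)

# Quick arithmetic for the "use at most ___ % of the processors" preference.
# Assumption: BOINC treats each logical CPU (hyper-threaded or not) as one
# "processor"; the client's exact rounding varies a little between versions,
# so confirm how many CPU tasks actually start after changing the setting.

def percent_to_run(cpu_tasks, logical_cpus):
    """Rough percentage to enter so BOINC runs about this many CPU tasks."""
    return 100.0 * cpu_tasks / logical_cpus

for n in (2, 3, 4, 6):
    print("%d CPU tasks on 8 threads -> set roughly %.1f%%" % (n, percent_to_run(n, 8)))
# 2 CPU tasks on 8 threads -> set roughly 25.0%
# 3 CPU tasks on 8 threads -> set roughly 37.5%
# 4 CPU tasks on 8 threads -> set roughly 50.0%
# 6 CPU tasks on 8 threads -> set roughly 75.0%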

I'm running AMD CPUs and your situation is different. However, I eventually came to the conclusion that for fastest through-put I needed to leave a full, real, core open per GPU *work unit*.

Is that the core's fault? Is it the RAM I/O? Is it the way the CPU feeds the PCIe bus?

I don't know.

That's why I eventually quit doing CPU work. The CPU's output, maybe 1/20th of the GPU's, was not worth any possibility of slowing the GPU.

"The #2 computer is running 8 CPU tasks, 4 GPU tasks "

I don't understand this. In theory the GPU tasks should be "requiring" a core (in combination). Unless there's more to the story, you're reporting that you have nine cores busy simultaneously.





ID: 1450194 · Report as offensive
James Sotherden
Avatar

Send message
Joined: 16 May 99
Posts: 10436
Credit: 110,373,059
RAC: 54
United States
Message 1450197 - Posted: 5 Dec 2013, 6:26:35 UTC
Last modified: 5 Dec 2013, 7:01:04 UTC

In your task list, does it say "Ready to run", or does it say "Waiting to run"?

I'm going to try something; I'll edit after I do.

Edit - OK, I went to local preferences and set my CPU usage to 85%, and I did see that one of my work units now says "Waiting to run". I went from running eight work units to seven. Plus I'm running my GPU with one work unit.
So I'm guessing that it will say "Waiting to run" until a core frees up, and then it will start running.

Edit to my edit - at 85% I'm showing two work units waiting to run.

Last edit - I watched a core free up, and one of my waiting-to-run tasks went to running.

I hope this helps in diagnosing your problem.

Old James
ID: 1450197 · Report as offensive
Bubbajms

Send message
Joined: 2 Dec 03
Posts: 29
Credit: 64,248,758
RAC: 0
United States
Message 1450805 - Posted: 6 Dec 2013, 21:26:12 UTC



This is a current (as of 4pm on the 6th of December 2013) screenshot of the 2nd place machine. As you can see, I've got completed tasks, running tasks, and "ready to start" tasks.
ID: 1450805 · Report as offensive
tbret
Volunteer tester
Avatar

Send message
Joined: 28 May 99
Posts: 3380
Credit: 296,162,071
RAC: 40
United States
Message 1452006 - Posted: 9 Dec 2013, 20:43:23 UTC - in response to Message 1450805.  

I really didn't want to just leave this sitting out there as though you were being ignored, nor leave a casual reader believing that your machines have a problem.

The post that I am replying to makes very little sense to me *unless* you have made adjustments to the way the stock application runs. You have a total of eight CPU work units running at the same time as four GPU work units.

I am *guessing* that you have made changes in the app_config.xml file to accomplish that.

Sure, that can be done. You can do all sorts of things.

The question is: By making these changes have I helped or hurt myself?

My challenge is trying to infer your limitations by comparing them to the results I've gotten (over time) running "optimized apps" and with a lot of different hardware that is not exactly like yours, and in some cases is nothing like yours. So, I'm not in a position to "just tell you" what the best settings for that machine are, like cookbook instructions.

But I don't have to be able to do that to send you down the right path, nor do I have to have your machine to "tweak" to know that the machine is not optimized. All I have to do is look at your run-times.

If the differences between your settings and optimized settings were even close, the differences in our hardware would make it impossible for me to see a problem. But your machine is nowhere close to turning-out as much work as it can.

I can't get ultra-technical about this, because only the guys who wrote the application could tell you exactly what your settings are causing down at the "reading x bytes and writing y to cache and transferring..." level.

Maybe the best way I can explain what's happening in a broad sense would be to relate it to something OLD. Back when RAM cost $100/MB, it was commonplace to run a system out of memory. Applications would have to use a part of the hard drive "as though it were" RAM. It was fairly easy to bog a machine down to the point that the HDD light stayed on and you had to wait and wait and wait for the machine to catch-up to you while you typed or moved the mouse. No program you were running could get anything done because the computer was constantly having to read and write to the HDD.

What you have happening to you is a mild version of that. There is so much swapping back and forth to memory, so much reading and writing on the "lanes" to the PCIe bus to the video card, and so much "swapping" one task for another that you are losing valuable computing time while the parts that crunch, wait.

Instead of cleaning the attic, then cleaning the basement, then mowing the lawn, then grocery shopping, you are asking the computer to pick up one box in the basement, walk it half-way across the basement, put it down, go to the attic, turn on the light, go back to the basement and pick up the box, move it another three feet, then go start the lawn mower, stop it, get in the car, get part of the grocery shopping done, leave the buggy on an aisle, drive home, go to the attic, find the mouse-infested chair cushion, pick it up, put it down, go back to the lawn mower, etc.

You will eventually get all of your tasks completed, and you could add "write a poem" and "watch a movie on DVD" to the list of things you are doing, but obviously, even though you would **eventually** get it all done, you've wasted an incredible amount of time by shifting from one task to another.

That's what seems to be going-on with your dual 660 machine.

How do I know?

I'm looking at your run-times on valid completed tasks. They are all over the map, including some CUDA-42 work that's taking entirely too long. The GPU is "waiting" for the CPU to give it something to do. When the CPU gets around to doing what the GPU needs, then something else is having to stop and wait.

So, while it might be counter-intuitive that doing fewer things accomplishes more, it's still true.

IF you want to maximize the throughput / RAC / "credits" of that computer, you should ask it to do fewer things simultaneously. The obvious place to start is to reduce the least productive tasks, first. In this case, that means you want to reduce the number of CPU tasks being worked-on at one time.

Again, my suggestion is for you to reduce the number of CPU tasks running on that machine to *one or two* at a time by adjusting the "on multiprocessor systems use at most _____ percent of the processors" setting in BOINC Manager, Tools, Computing Preferences, until only one CPU task is running. You might be able to get two to run at a time without slowing you down, but I seriously doubt you can run three at a time without slowing the GPUs down. I could be wrong about the highest possible number of CPU tasks you should run (because of our hardware differences). Maybe. But I'm in the right ballpark.

Give it a try. It's non-destructive. You can always put it back the way you have it if you don't find what I'm saying to be true.
ID: 1452006 · Report as offensive
Cruncher-American

Send message
Joined: 25 Mar 02
Posts: 1513
Credit: 370,893,186
RAC: 340
United States
Message 1452042 - Posted: 9 Dec 2013, 22:31:48 UTC

The problem appears to me to be over-commitment of CPU resources, which causes extra swapping of the running threads - it's called "thrashing" when it happens to memory rather than CPU. (I used to be a systems programmer for IBM in the bad old days of System/360 and their first virtual memory OS, CP-67, and over-commitment of memory was a real bugaboo at the time, and caused much agony.)

A good deal of CPU is being consumed in system overhead, which I believe is why the OP is not getting as much work done as expected. Look at the ratio of CPU time used to wall-clock time to get an idea of how bad it may be. On my i7-4771 with dual GTX 680s, I allow 7 CPU threads and 3 tasks per GPU, but I also allocate 1 CPU per GPU AP (via app_info) to prevent the system from using more than 7 CPU threads (the MB GPU app is not a major factor), and I am getting close to 50K RAC from it. The WUs on this machine rarely show more than a 105% wall-clock to CPU time ratio.
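
If you want to check that ratio without a stopwatch, the client keeps a per-project job log with both numbers in it. Here's a rough Python sketch - the file name, location, and field tags (ct = CPU seconds, et = elapsed seconds, nm = task name) are assumptions about the usual layout, so adjust if your client writes something different:

# Rough sketch: CPU time vs wall-clock time for recently completed tasks,
# taken from BOINC's per-project job log.  Assumes the default Windows data
# directory and the usual field tags (ct = CPU s, et = elapsed s, nm = name).
LOG = r"C:\ProgramData\BOINC\job_log_setiathome.berkeley.edu.txt"

def fields(line):
    """Turn '<time> ue x ct x fe x nm <name> et x ...' into a tag -> value dict."""
    toks = line.split()
    return dict(zip(toks[1::2], toks[2::2]))

with open(LOG) as f:
    recent = f.readlines()[-20:]   # last 20 completed tasks

for line in recent:
    d = fields(line)
    cpu, wall = float(d["ct"]), float(d["et"])
    if wall > 0:
        # CPU tasks: cpu/wall near 1.0 means little time spent waiting.
        # GPU tasks: a high ratio means the "GPU" task leans hard on the CPU.
        print("%-40s cpu %8.0fs  wall %8.0fs  cpu/wall %.2f"
              % (d.get("nm", "?"), cpu, wall, cpu / wall))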

The suggestion of reserving one virtual core for graphics will help.

If he is doing AP on the GPUs it will be even worse because a GPU thread doing AP really needs a full CPU thread to support it.
ID: 1452042 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Send message
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1452069 - Posted: 10 Dec 2013, 0:06:21 UTC - in response to Message 1452038.  

... each of those WUs require .04% cpu time ...

Wrong terminology.

Think "has been assigned" 0.04 CPUs - a value you'll find in app_info.xml, updated from a rather arbitrary decision I made three or four years ago. It works well enough not to have been challenged in all that time, at least for one modest GPU and a Q6600. But if you really want to squeeze it until the pips squeak - not something usually requested by a scientific research project - you have to re-research every assumption from the ground up, including that one.
ID: 1452069 · Report as offensive
juan BFP
Volunteer tester
Avatar

Send message
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1452094 - Posted: 10 Dec 2013, 1:09:43 UTC
Last modified: 10 Dec 2013, 1:09:53 UTC

Just a tip: try the new version of the AP application for Nvidia GPUs, available from Beta - you will notice a big decrease in CPU usage. That changes everything. Take a look at the CPU times on my already-crunched AP WUs.
ID: 1452094 · Report as offensive
tbret
Volunteer tester
Avatar

Send message
Joined: 28 May 99
Posts: 3380
Credit: 296,162,071
RAC: 40
United States
Message 1452176 - Posted: 10 Dec 2013, 6:10:29 UTC - in response to Message 1452090.  

"Well, I free up one core (i7 for two GTX 760s) and my overall CPU runs at less than 100% (according to Task Manager or perfmon), so I was assuming freeing up one core was enough."


I haven't tried the beta AP, which, as Juan (and others) say, frees up some of the CPU. That's almost certainly the best solution.

BUT... it isn't really a matter of your CPU running less than 100%. It doesn't run during "wait-states" and it waits to issue or receive instructions. So that 20% of "headroom" you see may be the CPU "waiting."

I can't swear anything about an i7. I don't have one.

I tried that whole "try to get the resources to show me almost 100% usage" thing, and on my multi-core AMD FX and Phenoms, if I use more than a peak 60-70%, work units start slowing down (and it is different for the Phenom IIs and the FX processors, and differs among FX processors as well). There could be all kinds of reasons --- and the ten or twenty extra MB tasks I might be able to get if I were "clever" weren't worth the bother.

Yeah, we used to call it "thrashing the hard drive" when it was reading and writing to the swap file continuously. That's one reason I really liked OS/2 2.1 and did not like OS/2 Warp. Even SCSI drives on Adaptec caching controllers were too slow to make a "swap file" a practical solution to the high cost of RAM.

I'm not sure how much the "assignment" has to do with how much CPU is really consumed. If you go in and "assign" .75 CPUs, I'm not sure it uses any more or less than if you assign it .001 CPUs. What assigning .75 CPUs will do is cause two GPU work units, between them, to force CPU work units not to run.

You guys can just ignore me if you want to, but the way to tweak this is to run no CPU work, then start adding CPU cores one at a time until you see your GPU work slow down, then back off one core. I think you'll be surprised by how quickly that happens with these OpenCL applications.

It didn't used to be like this and you could run more stuff simultaneously.

As always, I'm just trying to be helpful based solely on my own experiences keeping multiple triple, double, and single GPU rigs with anywhere from one P4 core with hyper-threading, an old dual core, and several 4, 6, and one 8 core machine running everything from XP to Windows 8...

If you know I'm wrong, just know I'm wrong with the best of intentions and ignore me.
ID: 1452176 · Report as offensive
tbret
Volunteer tester
Avatar

Send message
Joined: 28 May 99
Posts: 3380
Credit: 296,162,071
RAC: 40
United States
Message 1452199 - Posted: 10 Dec 2013, 6:40:46 UTC - in response to Message 1452038.  

"I was looking at this because I, too, am trying to figure out the most efficient setup.

Looks like to me in this case that each of those GPUs are running TWO WUs, each of those WUs require .04% cpu time. So it looks like to me that you need a total of (.04 * 4) = .16 cpu to feed all that's going on with the GPUs. So it looks like freeing up ONE of your EIGHT (hyperthreaded) cores will make it most efficient."


Ok... no. That's not the way it works. .04 "cores" "reserved" does not really mean .04 cores used. You can't make it use more of a CPU core by reserving .5 "cores" per work unit. You can "reserve" .5 cores so that any three GPU work units will sum to 1.5 cores "reserved" and, therefore, BOINC (not the crunching app) will tell your machine not to use two of your CPU cores to run CPU work. They are "reserved" for GPU work.

THE most efficient setup is to run as many work units on any given GPU as that GPU will run without slowing down. For most of my cards that's three, although some run best with two and the lesser cards run no better or worse (to speak of) whether I run one or two at a time.



"I don't see consistent results between my computers and am currently trying to figure out why some need more cpu time and others need less cpu time."



If you get all of those answers figured-out, I'd love to know. I've suspected everything from L2 and L3 cache sizes to multipliers to raw clocking speed to RAM speeds.


"(and as a side note, I don't see any real explanations on the lunatics site why I should download what ever they have to download and run it.)"



In their defense, I think they are like the lab these days: too busy to stop and tell us what they are doing.

Yeah, I can't straighten this out with authority, but I think you'll find the crunching application (the worker) is not bound by the number of BOINC's "assigned" cores. BOINC (the management) just uses that to figure out how many "workers" it can employ at any given time.

That is to say, the "assigned" cores are "reserving" CPU only in the sense that BOINC (the manager, not the crunching program) won't simultaneously "allow" CPU tasks to run on two cores if all of the "assignments" for GPUs add up to more than one.

And now I am given to believe that different steps in processing every work unit are requiring differing amounts of CPU resources as that work unit runs, not to mention the variations in the work done on individual work units. How would you (royal "you") know when the crunching applications have accidentally reached their maximum load on the CPU at the same time? You'd have to get lucky and be watching when it happened.
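
For what it's worth, here is the simple-minded model I carry around in my head for that bookkeeping. It's my reading of the client's behavior, not gospel, but it does reproduce what we're seeing in this thread (eight CPU tasks still running next to four GPU tasks "assigned" .04 each):

# Simple-minded model (my reading, not the client's actual code): BOINC keeps
# starting CPU tasks while the committed total - the GPU tasks' fractional
# "assignments" plus one whole core per running CPU task - is still below the
# number of logical CPUs it is allowed to use.
def cpu_tasks_that_run(logical_cpus, gpu_assignments):
    committed = sum(gpu_assignments)
    cpu_tasks = 0
    while committed < logical_cpus:
        cpu_tasks += 1       # one more CPU task takes a whole core
        committed += 1.0
    return cpu_tasks

# The #2 machine: 8 threads, four GPU tasks "assigned" 0.04 CPUs each.
print(cpu_tasks_that_run(8, [0.04] * 4))   # -> 8, so 8 CPU + 4 GPU tasks run together

# "Assign" a full core per GPU work unit and one CPU slot disappears per core.
print(cpu_tasks_that_run(8, [1.0] * 4))    # -> 4
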
ID: 1452199 · Report as offensive
James Sotherden
Avatar

Send message
Joined: 16 May 99
Posts: 10436
Credit: 110,373,059
RAC: 54
United States
Message 1452209 - Posted: 10 Dec 2013, 7:28:21 UTC
Last modified: 10 Dec 2013, 7:34:46 UTC

As I am just learning how to modify settings and files, I will ask that you bear with me.

The other night when the OP posted his screenshot I was at work, and saw that he was running 8 cores and two work units on each of his two GPUs. The IT guys where I work don't mind us browsing the web during breaks or lunch, but would be very P.O.'d if I logged on to the site.

The OP initially complained of work units waiting to run. I did a test, which you probably read: the only way I could make that happen was to drop another core from crunching. When a running work unit was done, the waiting-to-run one then started.

Now my guess is the OP is doing what tbret suggested he is: running work units on 8 cores and two work units on each card. So to me that would explain why he is seeing "Waiting to run" - he has too much going on at the same time to crunch.

Now I just had a look at my one machine that is crunching an AP and an MB on the GPU. What I'm seeing, according to BOINC, is that the MB is using 0.04 CPUs and 0.47 GPUs. For the AP it says 0.67 CPUs and 0.051 GPUs. I am running 7 cores.

So for my i7-3770 and my GTX 550 Ti, freeing one core seems to work for the CPU, but I don't know what a GPU core is (if anyone can explain that). Plus, the app_info file that Richard posted at our team website lets me run only one AP and one MB, or two MBs, at a time on my card.


I'd say the OP needs to free up two cores to feed his GPUs - maybe even three.
I hope I haven't confused matters more. I'm still very confused learning this stuff.

Old James
ID: 1452209 · Report as offensive
TBar
Volunteer tester

Send message
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1452217 - Posted: 10 Dec 2013, 8:14:22 UTC - in response to Message 1452209.  

As Richard said, those arbitrary CPU numbers displayed by BOINC actually don't mean much. If you want to see the actual process usage, download and run SIV. From the main window, right-click on the 'Windows' button and select 'BOINC Status'. There will be a CPU graph for each BOINC process. Mouse over the graph and the breakdown of usage will appear. You will see that actual usage varies from moment to moment. Of course, things will change dramatically when the AP app begins one of those mislabeled APs that should be labeled as a CPU task rather than a GPU task. IMHO, any task that uses over 50% CPU should be labeled as a CPU task. Actually, I think any AP blanked over about 40% should be labeled a CPU task and sent to the CPUs, but that's just me ;-)
ID: 1452217 · Report as offensive