Message boards :
Number crunching :
Error While Computing (and a weird question, too)
Bubbajms | Joined: 2 Dec 03 | Posts: 29 | Credit: 64,248,758 | RAC: 0
First machine in question: http://setiathome.berkeley.edu/results.php?hostid=6784960

In the space of an afternoon I added a GPU to this computer and updated to 7.2.28 - at one point (can't recall whether before or after the program update) I made sure to install the ATI drivers rather than the Windows Update version. It looks like I'm getting errors on all GPU work. I've got similar systems with ATI GPUs that are running fine, so I'm not sure what to look for as far as the error is concerned. Ideas?

Weird question: my top two performers are my daily driver and my daily video-editing machine. Both get used 40 hours a week - sometimes one or the other gets a little more than that, but they're both basically workstations that do their crunching in the off time.

Top performer, with an i7-2600 (3.4 GHz) and a GTX 670: http://setiathome.berkeley.edu/show_host_detail.php?hostid=6818458
#2, with an i7-4770 (3.5 GHz) and dual GTX 660s: http://setiathome.berkeley.edu/show_host_detail.php?hostid=6786867

Looking at the credit per day, the top performer runs better - but why? If they're getting equal amounts of use, shouldn't 3.5 GHz and dual 660s trump 3.4 GHz and a single GTX 670? Does the extra on the GPU really score that much? Am I missing something somewhere?
Claggy | Joined: 5 Jul 99 | Posts: 4654 | Credit: 47,537,079 | RAC: 4
"First machine in question.. http://setiathome.berkeley.edu/results.php?hostid=6784960"

It's running an ancient driver with only SDK 2.3 support; I think the OpenCL MB app needs APP SDK 2.6 support - see the ATI Driver Version Cheat Sheet. Upgrade to a recent driver, then delete the compilations the app made under the ancient driver. They are named similar to these (depends on your CPU & GPU):

MultiBeam_Kernels_r1843.clHD5_Sumo.bin_V7
r1843_IntelRXeonRCPUE5462280GHz.wisdom
MB_clFFTplan_Sumo_8_r1843.bin -> MB_clFFTplan_Sumo_524288_r1843.bin

Claggy
Bubbajms | Joined: 2 Dec 03 | Posts: 29 | Credit: 64,248,758 | RAC: 0
I was sure that I'd updated the driver to the current version - but it looks like I forgot to restart the BOINC client afterwards, so it was referencing a driver that wasn't in use at all. When I restarted the client, it downloaded some fresh exe files and now shows as working. I'll follow up. Any ideas on the race between my other two machines?
tbret | Joined: 28 May 99 | Posts: 3380 | Credit: 296,162,071 | RAC: 40
You are running 4-core processors. I know they hyper-thread, but they are 4-core processors. Judging by your run-times, it looks like you are trying to do at least three, maybe four work units per video card, and it looks like you are also trying to process CPU work units at the same time.

The short version is that you are "starving" the GPU work units by not giving them enough CPU. Since GPU work requires *some* CPU time, if the CPUs are kept busy, the GPU work units will show "Waiting to run" in BOINC Manager. This is bad. Since a GPU is 10-20 times as fast as the CPU, you want to be sure the GPUs have all the resources they need before running any CPU work. I suspect you will be better off running no CPU work on the dual-660 machine, and maybe using one, or none, of the CPU cores on the 670 machine.

I'd run no more than three GPU work units at a time on any given GPU (because of CPU resources). Running two GPU work units at a time will give you a big boost in output. Running three at a time might give you a little boost. Running four at a time will give you a tiny boost if any, or might even cause a small decrease in output. (Once the cards are showing 90-97% GPU utilization in Precision X or Afterburner, you're doing all you need to do.)

I hope all of that made sense. The moral of the story is that you shouldn't load all of your cores with CPU work and try to do GPU work at the same time.
Bubbajms | Joined: 2 Dec 03 | Posts: 29 | Credit: 64,248,758 | RAC: 0
Interesting.. all this GPU business is news to me for sure. It would seem like running as much as possible would give you the most work, but I'll have to do some tinkering and see what I can do. Thanks for the input! |
Bubbajms | Joined: 2 Dec 03 | Posts: 29 | Credit: 64,248,758 | RAC: 0
Alright, I'm in the office looking at both of these computers this morning.

The top computer was only running one WU on the GPU - I copied app_config.xml info to the project directory and updated, so now I'm running two GPU WUs at a time. The way I understand it (and feel free to correct me!), I'm set up to use 50% of the processors on this machine. When I look at the running tasks I see 4 CPU tasks and 2 GPU tasks, with nothing "Waiting to run".

The #2 computer is running 8 CPU tasks and 4 GPU tasks (2 per card, two cards total), and also has nothing "Waiting to run".

There are a couple of things I'm not clear on, which might be issues:

1 - The top computer is set to run whenever CPU usage is under 25%, and does fine with that - it's running right now as I type on it, no problems evident. Computer 2 is set to run when CPU usage is under 50%, but it's not running - even though I'd think there's basically no usage at this time. If I set it to "Always run" it fires right up.

2 - The top computer shows 0.134 CPUs and 1 GPU for both of the GPU WUs that are running. Computer 2 shows 0.04 CPUs and 0.5 GPU.

More pieces to my puzzle?
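For reference, an app_config.xml that produces this "two GPU WUs at a time" behaviour looks roughly like the sketch below. It assumes the stock v7 MultiBeam application name (setiathome_v7); the file goes in the SETI@home project directory and gets picked up via Advanced -> Read config files, or a client restart:

```xml
<!-- Sketch of an app_config.xml for running two MultiBeam tasks per GPU.
     The app name "setiathome_v7" is assumed here. -->
<app_config>
  <app>
    <name>setiathome_v7</name>
    <gpu_versions>
      <!-- each task claims half a GPU, so two run per card -->
      <gpu_usage>0.5</gpu_usage>
      <!-- CPU the scheduler budgets per GPU task - not actual usage -->
      <cpu_usage>0.5</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
```

Note that gpu_usage and cpu_usage only tell the BOINC scheduler how to budget resources; they don't change how much CPU the science application actually consumes.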
tbret | Joined: 28 May 99 | Posts: 3380 | Credit: 296,162,071 | RAC: 40
"The way I understand it (and feel free to correct me!) I'm set up to use 50% of the processors on this machine. When I look at the running tasks I see 4 CPU tasks and 2 GPU tasks, with nothing "Waiting to run"."

That is correct. You have "eight cores, sort of," and you are committing five: four CPU tasks, plus two GPU tasks which are each reserving less than 0.5 CPUs. Total: "five cores." Nothing should be waiting.

However... you might find the whole thing faster if you go to 34% of your "cores" (it'll use three) and leave a REAL core open for the GPUs' processing. You might find it faster still if you went to 25% and left two REAL cores open for the GPUs.

I'm running AMD CPUs, and your situation is different. However, I eventually came to the conclusion that for the fastest throughput I needed to leave a full, real core open per GPU *work unit*. Is that the core's fault? Is it the RAM I/O? Is it the way the CPU feeds the PCIe bus? I don't know. That's why I eventually quit doing CPU work: the 1/20th output of the CPU versus the GPU was not worth any possibility of slowing the GPU.

"The #2 computer is running 8 CPU tasks, 4 GPU tasks"

I don't understand this. In theory the GPU tasks should be "requiring" a core (in combination). Unless there's more to the story, you're reporting that you have nine cores busy simultaneously.
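The "percent of the processors" knob tbret mentions lives in BOINC Manager's computing preferences; the same setting can also be pinned per host in global_prefs_override.xml in the BOINC data directory. A sketch, assuming an 8-thread i7 where 25% leaves six logical CPUs free for feeding the GPUs:

```xml
<!-- Sketch of global_prefs_override.xml (BOINC data directory).
     Re-read via Advanced -> Read local prefs file. -->
<global_preferences>
  <!-- "On multiprocessor systems, use at most __% of the processors":
       25% of 8 logical CPUs means at most 2 CPU tasks run at once. -->
  <max_ncpus_pct>25.0</max_ncpus_pct>
</global_preferences>
```

Changing the percentage in the Manager's preferences dialog has the same effect; the file is just a way to keep the override per machine.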
James Sotherden | Joined: 16 May 99 | Posts: 10436 | Credit: 110,373,059 | RAC: 54
In your task list, does it say "Ready to run", or does it say "Waiting to run"? I'm going to try something; I will edit after I do.

Edit - OK, I went to local preferences and set my CPU usage to 85%, and I did see that one of my work units said "Waiting to run". I went from running eight work units to seven, plus I'm running my GPU with one work unit. So I'm guessing it will say "Waiting to run" until a core frees up, and then it will start running.

Edit to my edit - At 85% I'm showing two work units waiting to run.

Last edit - I watched a core free up, and one of my waiting-to-run tasks went to running. I hope this helps in diagnosing your problem.

Old James
Bubbajms | Joined: 2 Dec 03 | Posts: 29 | Credit: 64,248,758 | RAC: 0
[screenshot of BOINC Manager's running tasks - image not preserved in this text version]
tbret | Joined: 28 May 99 | Posts: 3380 | Credit: 296,162,071 | RAC: 40
I really didn't want to just leave this sitting out there as though you were being ignored, nor leave a casual reader believing that your machines have a problem.

The post I am replying to makes very little sense to me *unless* you have made adjustments to the way the stock application runs. You have a total of 8 CPU work units working at the same time as four GPU work units. I am *guessing* that you have made changes in the app_config.xml file to accomplish that. Sure, that can be done; you can do all sorts of things. The question is: by making these changes, have I helped or hurt myself?

My challenge is trying to infer your limitations by comparing them to the results I've gotten (over time) running "optimized apps" on a lot of different hardware that is not exactly like yours, and in some cases is nothing like yours. So I'm not in a position to "just tell you" the best settings for that machine, like cookbook instructions. But I don't have to be able to do that to send you down the right path, nor do I have to have your machine to "tweak" to know that the machine is not optimized. All I have to do is look at your run-times. If the difference between your settings and optimized settings were even close, the differences in our hardware would make it impossible for me to see a problem. But your machine is nowhere close to turning out as much work as it can.

I can't get ultra-technical about this, because only the guys who wrote the application could tell you exactly what your settings are causing down at the "reading x bytes and writing y to cache and transferring..." level. Maybe the best way I can explain what's happening, in a broad sense, is to relate it to something OLD. Back when RAM cost $100/MB, it was commonplace to run a system out of memory. Applications would have to use part of the hard drive "as though it were" RAM.
It was fairly easy to bog a machine down to the point that the HDD light stayed on and you had to wait and wait for the machine to catch up while you typed or moved the mouse. No program you were running could get anything done, because the computer was constantly reading and writing to the HDD.

What you have happening is a mild version of that. There is so much swapping back and forth to memory, so much reading and writing on the lanes of the PCIe bus to the video card, and so much swapping of one task for another that you are losing valuable computing time while the parts that crunch sit and wait.

Instead of cleaning the attic, then cleaning the basement, then mowing the lawn, then grocery shopping, you are asking the computer to pick up one box in the basement, walk it half-way across the basement, put it down, go to the attic, turn on the light, go back to the basement and pick up the box, move it another three feet, then go start the lawn mower, stop it, get in the car, get part of the grocery shopping done, leave the buggy on an aisle, drive home, go to the attic, find the mouse-infested chair cushion, pick it up, put it down, go back to the lawn mower, etc. You will eventually get all of your tasks completed, and you could add "write a poem" and "watch a movie on DVD" to the list of things you are doing - but even though you would **eventually** get it all done, you've wasted an incredible amount of time by shifting from one task to another.

That's what seems to be going on with your dual-660 machine. How do I know? I'm looking at your run-times on valid completed tasks. They are all over the map, including some CUDA 4.2 work that's taking entirely too long. The GPU is waiting for the CPU to give it something to do; when the CPU gets around to doing what the GPU needs, something else has to stop and wait. So, while it might be counter-intuitive that doing fewer things accomplishes more, it's still true.
IF you want to maximize the throughput / RAC / "credits" of that computer, you should ask it to do fewer things simultaneously. The obvious place to start is to cut back the least productive tasks first - in this case, the number of CPU tasks being worked on at one time.

Again, my suggestion is to reduce the number of CPU tasks running on that machine to *one or two* at a time by adjusting the "On multiprocessor systems, use at most ___ percent of the processors" setting in BOINC Manager (Tools, Computing preferences) until only one CPU task is running. You might be able to get two to run at a time without slowing things down, but I seriously doubt you can run three at a time without slowing the GPUs down.

I could be wrong about the highest number of CPU tasks you should run (because of our hardware differences). Maybe. But I'm in the right ballpark. Give it a try - it's non-destructive. You can always put it back the way you have it if you don't find what I'm saying to be true.
Cruncher-American | Joined: 25 Mar 02 | Posts: 1513 | Credit: 370,893,186 | RAC: 340
The problem appears to me to be over-commitment of CPU resources, which causes extra swapping of the running threads - it's called "thrashing" when it happens to memory rather than CPU. (I used to be a systems programmer for IBM in the bad old days of System/360 and their first virtual-memory OS, CP-67; over-commitment of memory was a real bugaboo at the time and caused much agony.)

A good deal of CPU is being consumed in system overhead, which I believe is why the OP is not getting as much work done as expected. Look at the ratio of CPU time used to wall-clock time to get an idea of how bad it may be.

On my i7-4771 with dual GTX 680s, I allow 7 CPU threads and 3 tasks per GPU, but I also allocate 1 CPU per GPU AP task (via app_info) to prevent the system from using more than 7 CPU threads (the MB GPU app is not a major factor), and I am getting close to 50K RAC from it. The WUs on this machine rarely run more than 105% wall-clock time relative to CPU time.

The suggestion of reserving one virtual core for the GPU work will help. If he is doing AP on the GPUs it will be even worse, because a GPU thread doing AP really needs a full CPU thread to support it.
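For the curious, "allocate 1 CPU per GPU AP (via app_info)" refers to the <avg_ncpus> value in an anonymous-platform app_info.xml. A sketch of just the relevant app_version stanza - the executable name and version number below are hypothetical placeholders, not real optimized-app file names:

```xml
<!-- Fragment of an anonymous-platform app_info.xml (sketch only). -->
<app_version>
  <app_name>astropulse_v6</app_name>
  <version_num>604</version_num>           <!-- hypothetical -->
  <avg_ncpus>1.0</avg_ncpus>               <!-- budget a full CPU thread per GPU AP task -->
  <max_ncpus>1.0</max_ncpus>
  <coproc>
    <type>CUDA</type>
    <count>1</count>                       <!-- one AP task per GPU -->
  </coproc>
  <file_ref>
    <file_name>ap_gpu_example.exe</file_name>  <!-- hypothetical placeholder -->
    <main_program/>
  </file_ref>
</app_version>
```

With <avg_ncpus> at 1.0, BOINC counts each running GPU AP task as a whole CPU thread when deciding how many CPU tasks to start, which is what keeps the machine from over-committing its 8 threads.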
Richard Haselgrove | Joined: 4 Jul 99 | Posts: 14650 | Credit: 200,643,578 | RAC: 874
"... each of those WUs require .04% cpu time ..."

Wrong terminology. Think "has been assigned" 0.04 CPUs - a value you'll find in app_info.xml, updated from a rather arbitrary decision I made three or four years ago. It has worked well enough not to have been challenged in all that time, at least for one modest GPU and a Q6600. But if you really want to squeeze it until the pips squeak - not something usually requested by a scientific research project - you have to re-research every assumption from the ground up, including that one.
juan BFP | Joined: 16 Mar 07 | Posts: 9786 | Credit: 572,710,851 | RAC: 3,799
Just a tip: try the new version for AP crunching with Nvidia GPUs, available from Beta - you will notice a big decrease in CPU usage. That changes everything. Take a look at the CPU times on my already-crunched AP WUs.
tbret | Joined: 28 May 99 | Posts: 3380 | Credit: 296,162,071 | RAC: 40
Well, I free up one core (an i7 feeding two GTX 760s) and my overall CPU runs at less than 100% (according to Task Manager or perfmon), so I was assuming freeing up one core was enough.

I haven't tried the beta AP, which, as Juan (and others) say, frees up some of the CPU. That's almost certainly the best solution. BUT... it isn't really a matter of your CPU running at less than 100%. It doesn't run during wait-states, and it waits to issue or receive instructions. So that 20% of "headroom" you see may be the CPU waiting.

I can't swear to anything about an i7; I don't have one. I tried that whole "try to get the resources to show me almost 100% usage" thing, and on my multi-core AMD FX and Phenoms, if I use more than a peak of 60-70%, work units start slowing down (and it is different for the Phenom IIs and the FX processors, and differs among FX processors as well). There could be all kinds of reasons - and the ten or twenty extra MB tasks I might be able to get if I were "clever" weren't worth the bother.

Yeah, we used to call it "thrashing the hard drive" when it was reading and writing to the swap file continuously. That's one reason I really liked OS/2 2.1 and did not like OS/2 Warp. Even SCSI drives on Adaptec caching controllers were too slow to make a swap file a practical solution to the high cost of RAM.

I'm not sure how much the "assignment" has to do with how much CPU is really consumed. If you go in and "assign" 0.75 CPUs, I'm not sure it uses any more or less than if you assign 0.001 CPUs. What assigning 0.75 will do is cause two GPU units to force two CPU units not to run.

You guys can just ignore me if you want to, but the way to tweak this is to run no CPU work, then start adding CPU cores one at a time until you see your GPU work slow down, then back off one core. I think you'll be surprised by how quickly that happens with these OpenCL applications. It didn't used to be like this; you used to be able to run more stuff simultaneously.
As always, I'm just trying to be helpful, based solely on my own experience keeping multiple triple-, double-, and single-GPU rigs running - everything from one P4 core with hyper-threading, to an old dual-core, to several 4-core, 6-core, and one 8-core machine, on everything from XP to Windows 8... If you know I'm wrong, just know I'm wrong with the best of intentions, and ignore me.
tbret | Joined: 28 May 99 | Posts: 3380 | Credit: 296,162,071 | RAC: 40
I was looking at this because I, too, am trying to figure out the most efficient setup.

OK... no. That's not the way it works. 0.04 "cores" reserved does not really mean 0.04 cores used. You can't make a task use more of a CPU core by reserving 0.5 "cores" per work unit. You can "reserve" 0.5 cores so that any three GPU work units sum to 1.5 cores reserved, and therefore BOINC (not the crunching app) will tell your machine not to use two of your CPU cores to run CPU work - they are "reserved" for GPU work.

THE most efficient setup is to run as many work units on any given GPU as that GPU will run without slowing down. For most of my cards that's three, although some run best with two, and the lesser cards run no better or worse (to speak of) whether I run one or two at a time.
If you get all of those answers figured out, I'd love to know. I've suspected everything from L2 and L3 cache sizes, to multipliers, to raw clock speed, to RAM speeds.
In their defense, I think they are like the lab these days: too busy to stop and tell us what they are doing.

Yeah, I can't straighten this out with authority, but I think you'll find the crunching application (the worker) is not bound by the number of BOINC's "assigned" cores. BOINC (the management) just uses that to figure out how many "workers" it can employ at any given time. That is to say, the "assigned" cores "reserve" CPU only in the sense that BOINC (the manager, not the crunching program) won't simultaneously allow CPU tasks to run on two cores if all of the assignments for the GPUs add up to more than one.

And now I am given to believe that different steps in processing every work unit require differing amounts of CPU resources as the work unit runs, not to mention the variations in the work done on individual work units. How would you (the royal "you") know when the crunching applications had all hit their maximum load on the CPU at the same time? You'd have to get lucky and be watching when it happened.
James Sotherden | Joined: 16 May 99 | Posts: 10436 | Credit: 110,373,059 | RAC: 54
As I am just learning how to modify settings and files, I will ask that you bear with me.

The other night, when the OP posted his screenshot, I was at work and saw that he was running 8 cores and 2 work units on each of his two GPUs. The IT guys where I work don't mind us looking at the web during breaks or lunch, but they would be very P.O.'d if I logged on to the site.

The OP initially complained of work units waiting to run. I did a test, which you probably read: the only way I could reproduce that was to drop another core from crunching. When the running work unit was done, the waiting-to-run one then started. Now, my guess is the OP is doing what tbret described: running work units on 8 cores and 2 work units on each card. To me, that would explain why he is seeing "Waiting to run" - he has too much going on at the same time to crunch.

Now, I just had a look at my one machine that is crunching an AP and an MB on its GPU. According to BOINC, the MB is using 0.04 CPUs and 0.47 GPUs; the AP says 0.67 CPUs and 0.51 GPUs. I am running 7 cores. So for my i7-3770 and my GTX 550 Ti, freeing one core seems to work for the CPU - but I don't know what a GPU core is (if anyone can explain that). Plus, the app_info file that Richard posted at our team website lets me run only one AP and one MB, or two MBs, at a time on my card.

I'd say the OP needs to free up two cores to feed his GPUs - maybe even three. I hope I haven't confused matters more; I'm still very confused learning this stuff.

Old James
TBar | Joined: 22 May 99 | Posts: 5204 | Credit: 840,779,836 | RAC: 2,768
As Richard said, those arbitrary CPU numbers displayed by BOINC don't actually mean much. If you want to see the actual process usage, download and run SIV. From the main window, right-click on the 'Windows' button and select 'BOINC Status'. There will be a CPU graph for each BOINC process; mouse over a graph and a breakdown of usage will appear. You will see that actual usage varies from moment to moment.

Of course, things change dramatically when the AP app begins one of those mislabeled APs that should be labeled as a CPU task versus a GPU task. IMHO, any task that uses over 50% CPU should be labeled a CPU task. Actually, I think any AP blanked over about 40% should be labeled a CPU task and sent to the CPUs - but that's just me ;-)
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.