Linux CUDA 'Special' App finally available, featuring Low CPU use


Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1888205 - Posted: 6 Sep 2017, 21:19:57 UTC - in response to Message 1888139.  

Stephen,
Sorry if I missed what CPU you are using.
But you have a problem that you need to sort out: you can either stick a band-aid on it by playing with things in client_state.xml, or you can fix it on a more permanent basis by letting a good few CPU tasks run to completion on the CPU and, while doing so, stopping the rescheduling.

Think about it - if rescheduling were really such a good thing, Brent's computer, with its GPUs overclocked quite hard (10% I think) and running in P0, would be far more than 10k ahead of mine, which runs stock settings and P?? (as decided by the thermal load). Yes, I do lose a bit during the weekly outage, but overall I'm about 5% behind him, and mine just sits there without any interference from me.


. . For the short term I am going to work on the theory that d/l'ing CPU tasks and running them on the GPU is the main issue. D/l'ing only GPU tasks and stashing them temporarily in the CPU queue, to supplement the GPU cache during the outage, should not cause the same issues (I think that was the gist of TBar's recommendation). I shall see just what that accomplishes.
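For context, the rescheduling being discussed works by rewriting queued tasks' device assignment inside the client's client_state.xml while BOINC is stopped. The sketch below is purely illustrative: the tag names (result, plan_class) follow BOINC's general file layout, but treat the exact structure as an assumption, not a drop-in tool.

```python
# Illustrative sketch of what a rescheduling tool does: with the
# BOINC client STOPPED, rewrite a queued result's app-version
# binding in client_state.xml so the task runs on the other device.
# Tag names are hypothetical simplifications of the real file.
import xml.etree.ElementTree as ET

def retarget_results(state_file, from_plan_class, to_plan_class):
    """Move every queued result from one plan class to another."""
    tree = ET.parse(state_file)
    moved = 0
    for result in tree.getroot().iter("result"):
        pc = result.find("plan_class")
        if pc is not None and pc.text == from_plan_class:
            pc.text = to_plan_class  # e.g. a CPU class -> a CUDA class
            moved += 1
    tree.write(state_file)
    return moved
```

A tool built only in one direction (CPU-to-GPU, say) is exactly the "unidirectional" situation Stephen describes later in the thread.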

Stephen

??
ID: 1888205 · Report as offensive
TBar
Volunteer tester

Send message
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1888231 - Posted: 6 Sep 2017, 23:40:17 UTC - in response to Message 1888205.  

Actually, there isn't any theory involved, or even magic - just a basic understanding of how BOINC works. It's a fact that BOINC uses the Average Processing Rate to calculate an app's estimated runtimes. It's a fact that the higher the APR, the shorter the estimated runtimes. It's also a fact that if you artificially manipulate an app's APR, you will obtain values the device is not capable of. The rest is just a logical result of those facts. It would help if you could affect the average with more CPU results; otherwise, you're going to have to affect it with fewer GPU results. Obviously, BOINC wasn't designed to work the way you want to use it; it was designed to work the way most people use it, which works for those not artificially manipulating the results.
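TBar's chain of facts can be sketched as a toy model (not BOINC's actual code; the task size and APR figures below are illustrative):

```python
# Simplified model of how BOINC turns APR into an estimated runtime:
# the estimate is the task's size in floating-point operations
# divided by the app's measured rate. Inflating the APR directly
# shrinks the estimate, regardless of what the device can do.

def estimated_runtime_seconds(rsc_fpops_est, apr_gflops):
    """Task size (FLOPs) / measured rate (GFLOPS) -> seconds."""
    return rsc_fpops_est / (apr_gflops * 1e9)

task_flops = 4.5e13  # illustrative work-unit size
print(estimated_runtime_seconds(task_flops, 25.0))   # honest CPU APR -> 1800.0 s
print(estimated_runtime_seconds(task_flops, 250.0))  # inflated APR -> 180.0 s
```

With the APR pushed ten times too high, the client expects every task to finish in a tenth of the real time, which is where the scheduling and deadline trouble starts.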
ID: 1888231 · Report as offensive
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1888234 - Posted: 6 Sep 2017, 23:54:39 UTC - in response to Message 1888231.  

Actually, there isn't any theory involved, or even magic - just a basic understanding of how BOINC works. It's a fact that BOINC uses the Average Processing Rate to calculate an app's estimated runtimes. It's a fact that the higher the APR, the shorter the estimated runtimes. It's also a fact that if you artificially manipulate an app's APR, you will obtain values the device is not capable of. The rest is just a logical result of those facts. It would help if you could affect the average with more CPU results; otherwise, you're going to have to affect it with fewer GPU results. Obviously, BOINC wasn't designed to work the way you want to use it; it was designed to work the way most people use it, which works for those not artificially manipulating the results.


. . So I am getting the impression that you and Rob are kind of anti re-scheduling ... ??

Stephen

:)
ID: 1888234 · Report as offensive
TBar
Volunteer tester

Send message
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1888235 - Posted: 7 Sep 2017, 0:37:29 UTC - in response to Message 1888234.  

Basically we are anti-illogical actions. There isn't any need to run CPU tasks on your GPUs, most of the time. It's entirely possible to run the tasks assigned to the correct devices with a little planning, and it avoids known problems.
ID: 1888235 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1888242 - Posted: 7 Sep 2017, 0:59:34 UTC

I have been hesitant to post to the current conversation, but I can make the case that there are times when a little judicious rescheduling is warranted, mostly in the pursuit of efficiency. I can make the case that less power is used by processing an Arecibo shorty on the GPU versus the CPU. I can also make the case that processing a BLC VLAR is more efficient on the CPU than an Arecibo standard AR. That is even more true on the Ryzen system, since it has an abundance of CPU cores and, compared to my FX systems, is very efficient with BLC CPU tasks because of its more efficient use of AVX pathways. I reschedule on all my systems, and it is almost always a 1-for-1 swap, CPU<>GPU. I haven't run into a case yet of a task timing out because of an artificially elevated APR. As I said, you need to use rescheduling judiciously. You can't just constantly move tasks one way or you will end up in Stephen's condition.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1888242 · Report as offensive
Profile Brent Norman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Send message
Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1888278 - Posted: 7 Sep 2017, 4:17:56 UTC - in response to Message 1888139.  
Last modified: 7 Sep 2017, 4:18:30 UTC

Think about it - if rescheduling were really such a good thing, Brent's computer, with its GPUs overclocked quite hard (10% I think) and running in P0, would be far more than 10k ahead of mine, which runs stock settings and P?? (as decided by the thermal load). Yes, I do lose a bit during the weekly outage, but overall I'm about 5% behind him, and mine just sits there without any interference from me.
Just for the record, my 10xx cards are not overclocked or over-anything'd - bone stock. My 980 is, and my 750s are tweaked a touch since they are already factory OC'd.

The last 2 weeks I have completely missed the pre-maintenance loading because I forgot to; on top of that, both weeks I lost internet for an additional 8 hours and had every single computer running completely dry :(( Ohh well, 'IT' happens.

I mostly reschedule to try and keep my CPUs running BLC VLARs, but I think that makes very little difference to RAC compared to what the GPUs do on their own.
ID: 1888278 · Report as offensive
rob smith Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer moderator
Volunteer tester

Send message
Joined: 7 Mar 03
Posts: 22200
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1888286 - Posted: 7 Sep 2017, 4:58:00 UTC - in response to Message 1888234.  

So I am getting the impression that you and Rob are kind of anti re-scheduling ... ??

As a concept - no.
In moderation - possibly
Over used - yes.

Over used - when you end up causing BOINC to start cutting off your work, or causing your system to error out tasks because BOINC has a very invalid value for one of its main control parameters, or when it could potentially adversely affect the way credit is calculated (two of the feed values for the calculation of credit are APR and run time).
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1888286 · Report as offensive
kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1888299 - Posted: 7 Sep 2017, 5:45:43 UTC

The kitties just runs 'em as they gets 'em.

Meow.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1888299 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1888301 - Posted: 7 Sep 2017, 5:52:00 UTC - in response to Message 1888278.  

I lost track of the days this holiday weekend and Tuesday snuck up on me. I was out of work before 9 AM on the Linux box. All systems were bone dry of GPU work by noon. I also got greedy with an overclock on the GPUs in the Linux box and managed to produce 10 hours of inconclusives and invalids. Started the Windows systems back up with Einstein and MilkyWay. My RACs are plummeting. $@^! happens.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1888301 · Report as offensive
Profile petri33
Volunteer tester

Send message
Joined: 6 Jun 02
Posts: 1668
Credit: 623,086,772
RAC: 156
Finland
Message 1888379 - Posted: 7 Sep 2017, 13:36:58 UTC

I find this interesting. On Linux the GPUs scale quite well, and on Windows they seem to do so too. But when the data is combined you find something else.

Top GPU models
The following lists show the most productive GPU models on different platforms. Relative speeds, measured by average elapsed time of tasks, are shown in parentheses.
NVIDIA
Total

    (1.000) GeForce GTX 1080 Tu
    (0.248) GeForce GTX 1070
    (0.244) GeForce GTX 980 Ti
    (0.228) GeForce GTX 1080 Ti
    (0.207) GeForce GTX 1060 3GB
    (0.190) GeForce GTX 1080
    (0.168) GeForce GTX 970
    (0.153) GeForce GTX 780
    (0.147) GeForce GTX 980
    (0.145) GeForce GTX 760
    (0.130) GeForce GTX 1050
    (0.124) GeForce GTX 1060 6GB
    (0.114) GeForce GTX 1050 Ti
    (0.112) GeForce GTX 770
    (0.111) GeForce GTX 960
    (0.111) GeForce GTX 660 Ti
    (0.102) GeForce GTX 950
    (0.101) GeForce GTX 750 Ti
    (0.099) GeForce GTX 660
    (0.079) GeForce GTX 460
    (0.057) GeForce GTX 650 

Windows

    (1.000) GeForce GTX 980 Ti
    (0.980) GeForce GTX 1070
    (0.941) GeForce GTX 1080 Ti
    (0.758) GeForce GTX 1080
    (0.684) GeForce GTX 970
    (0.628) GeForce GTX 780
    (0.592) GeForce GTX 760
    (0.573) GeForce GTX 980
    (0.508) GeForce GTX 1060 6GB
    (0.457) GeForce GTX 960
    (0.455) GeForce GTX 660 Ti
    (0.452) GeForce GTX 770
    (0.451) GeForce GTX 1050
    (0.422) GeForce GTX 750 Ti
    (0.419) GeForce GTX 950
    (0.414) GeForce GTX 1050 Ti
    (0.366) GeForce GTX 660
    (0.285) GeForce GTX 460 

Linux

    (1.000) GeForce GTX 1080 Tu
    (0.715) GeForce GTX 1070
    (0.537) GeForce GTX 1080
    (0.503) GeForce GTX 980
    (0.270) GeForce GTX 1060 3GB
    (0.253) GeForce GTX 980 Ti
    (0.195) GeForce GTX 970
    (0.180) GeForce GTX 1050
    (0.155) GeForce GTX 660
    (0.140) GeForce GTX 670
    (0.139) GeForce GT 640
    (0.129) GeForce GTX 1050 Ti
    (0.123) GeForce GTX 460
    (0.109) GeForce GTX 960M
    (0.103) GeForce GTX 650
    (0.077) GeForce GTX 750 Ti
    (0.075) GeForce GTX TITAN Black
    (0.044) GeForce GT 730
    (0.039) Quadro K620 

Mac

    (1.000) GeForce GTX 1070
    (0.876) GeForce GTX 1050 Ti
    (0.452) GeForce GTX 750 Ti
    (0.366) GeForce GTX 680
    (0.304) GeForce GTX 775M
    (0.290) GeForce GTX 680MX
    (0.241) GeForce GTX 780M
    (0.139) GeForce GT 755M
    (0.128) GeForce GTX 660M
    (0.110) GeForce GT 640M
    (0.089) GeForce GT 650M
    (0.062) GeForce GT 750M 

To overcome Heisenbergs:
"You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones
ID: 1888379 · Report as offensive
rob smith Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer moderator
Volunteer tester

Send message
Joined: 7 Mar 03
Posts: 22200
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1888387 - Posted: 7 Sep 2017, 14:16:40 UTC

This is possibly due to the Windows boys running multiple tasks and the Linux boys running your magic sauce
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1888387 · Report as offensive
Profile petri33
Volunteer tester

Send message
Joined: 6 Jun 02
Posts: 1668
Credit: 623,086,772
RAC: 156
Finland
Message 1888391 - Posted: 7 Sep 2017, 14:43:07 UTC - in response to Message 1888387.  

This is possibly due to the Windows boys running multiple tasks and the Linux boys running your magic sauce


Yup - 2.5 times the runtime --> 0.4 relative performance,
and two at a time at 2 times the runtime --> 0.5 performance.

0.4 * 0.5 = 0.2, and slightly more since some of those run on Linux.
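That arithmetic checks out: relative performance is the reciprocal of relative runtime, and the two penalties multiply.

```python
# Quick sanity check of the throughput arithmetic above.
# Performance relative to a baseline is 1 / (relative runtime),
# and independent slowdowns compose by multiplication.

single_task_slowdown = 2.5                # 2.5x the baseline runtime
per_task_perf = 1 / single_task_slowdown  # -> 0.4

two_up_slowdown = 2.0              # each of two concurrent tasks takes 2x
two_up_perf = 1 / two_up_slowdown  # -> 0.5 per task

combined = per_task_perf * two_up_perf  # -> 0.2 overall
print(per_task_perf, two_up_perf, combined)
```

So a host that is 2.5x slower per task and also runs two tasks at a time at 2x each lands at roughly a fifth of the baseline throughput, which matches the ~0.2 ratios in the combined table.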
To overcome Heisenbergs:
"You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones
ID: 1888391 · Report as offensive
Profile Jeff Buck Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Send message
Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1888429 - Posted: 7 Sep 2017, 17:53:55 UTC - in response to Message 1888286.  

...when it could potentially adversely affect the way credit is calculated (two of the feed values for the calculation of credit are APR & run time).
When this subject was raised in June of last year, several other experienced users posted to the effect that the theory of adverse credit impact from rescheduling had been pretty well debunked. At that time I had just recently started doing VLAR rescheduling on my own machines and was able to put together some hard data that also seemed to confirm that, as I said at the time, there was "...no more impact to the credits than is caused by the random number generator that assigns them in the first place." See my post in Message 1799300. I've seen nothing in the last 14 months to change that assessment. If you have hard data to support a different conclusion, you need to present it.

In addition, as Keith noted, Guppi VLARs actually run more efficiently on CPUs than do Arecibo tasks with normal ARs, while the reverse is true for GPUs (at least on the NVIDIA cards that I have). The end result, at least in Windows, is about a 5-6% boost in overall throughput and productivity by keeping the CPUs busy with a steady diet of VLARs, while swapping all non-VLARs to the GPUs. The GPUs still end up crunching an awful lot of Guppi VLARs, but the overall improvement is quite noticeable. For me, if I can increase the number of tasks that my machines process in a given period of time, I'm certainly going to want to do that. It provides the project with more results for the same amount of donated electricity.

Since switching my main crunch-only machines to Linux, I haven't been doing that sort of VLAR-specific rescheduling, inasmuch as I haven't had time to do the same sort of analysis. However, I just started to do that last week on one of my boxes and it appears that the conclusions reached in Windows still hold, though perhaps not with as wide a performance differential. So, it's highly likely that I'll resume daily VLAR rescheduling very soon.
ID: 1888429 · Report as offensive
rob smith Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer moderator
Volunteer tester

Send message
Joined: 7 Mar 03
Posts: 22200
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1888454 - Posted: 7 Sep 2017, 19:53:27 UTC

You continue with your assumption that the CreditNew function works as you THINK, not as the code says. If you end up, as Stephen has, with a totally screwed APR, you will have an adverse effect on the calibration of the APR input to the calculation. Pushing it erroneously high, as has happened to Stephen, will cause the calculation to re-calibrate in the wrong direction, and thus reduce the credit estimate for the computer concerned on every work unit. When credit is awarded, the lower of the two credit estimates is converted to the awarded credit for both parties.
In your case, you appear to be achieving a fairly reasonable balance between GPU-to-CPU and CPU-to-GPU reschedules, and thus are having a fairly small, probably insignificant, impact on the CPU APR. However, if you only do "one way traffic" rescheduling, the impact is quite dramatic - just look at the APR for Stephen's i5 and compare it to the APR for your own CPUs, and to his GPUs: it is approaching that of a GPU. This will affect the way the calibrator works and reduce the potential credit estimate for every task he runs.
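The awarding rule described above ("the lower of the two credit estimates") reduces to a min(); here is a toy sketch with illustrative numbers, not the actual CreditNew code:

```python
# Toy sketch of the pairing rule: each wingman's host produces a
# credit estimate (derived in part from APR and runtime); the award
# granted to BOTH hosts is the lower of the two estimates. A host
# whose estimate is depressed by a distorted APR can therefore drag
# its wingman's award down as well.

def granted_credit(estimate_a, estimate_b):
    """Both validated wingmen receive the lower estimate."""
    return min(estimate_a, estimate_b)

healthy_host = 95.0  # illustrative credit estimate
skewed_host = 60.0   # illustrative estimate from a host with a bad APR
print(granted_credit(healthy_host, skewed_host))  # both receive 60.0
```

This is why the argument is not just about one's own host: if the skew genuinely lowers one side's estimate, the wingman shares the loss.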
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1888454 · Report as offensive
Profile Jeff Buck Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Send message
Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1888471 - Posted: 7 Sep 2017, 21:22:54 UTC - in response to Message 1888454.  

You continue with your assumption that the creditNew function works as you THINK not as the code says.
I wasn't making any assumptions. I was reporting actual real-world results. That's why I suggested you should do the same if you wish to actually support your argument.

However if you just do a "one way traffic" reschedule the impact is quite dramatic
Yes, that was the issue with Stephen's rescheduling. It appeared to be almost entirely from CPU to GPU. Obviously it affected the estimated run times for his tasks. Whether or not it affected the credit granted is a subject for someone else to analyze. I only focused on ensuring that my own rescheduling didn't materially impact anyone else's credits. My goal is simply improved productivity.
ID: 1888471 · Report as offensive
W3Perl Project Donor
Volunteer tester

Send message
Joined: 29 Apr 99
Posts: 251
Credit: 3,696,783,867
RAC: 12,606
France
Message 1888473 - Posted: 7 Sep 2017, 21:27:34 UTC - in response to Message 1888379.  


    (1.000) GeForce GTX 1080 Tu
    (0.248) GeForce GTX 1070




Petri number one !
You're the only one to have a 1080 Tu ;)
ID: 1888473 · Report as offensive
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1888479 - Posted: 7 Sep 2017, 22:17:33 UTC - in response to Message 1888471.  


However if you just do a "one way traffic" reschedule the impact is quite dramatic
Yes, that was the issue with Stephen's rescheduling. It appeared to be almost entirely from CPU to GPU. Obviously it affected the estimated run times for his tasks. Whether or not it affected the credit granted is a subject for someone else to analyze. I only focused on ensuring that my own rescheduling didn't materially impact anyone else's credits. My goal is simply improved productivity.


. . As I have explained to them before: despite their description of "constant rescheduling", my situation came about as a result of 2 occasions of rescheduling in one direction, BECAUSE I only had a unidirectional rescheduling tool. Those were over 2 consecutive weeks, to survive the outages. Not to my liking, but you work with what you have. Laurent has kindly improved his (and Petri's) app so it now works in both directions, and that issue is resolved. BUT the problem is the continuing distortion in the numbers for that rig. Again, as I have previously explained, I ran Arecibo VLAR tasks on the CPU for several days, completing some 50 or 60 tasks taking an hour each, and yet it does not seem to have made any improvement in that imbalance.

. . When the weather gets a bit hotter, and I need to remove the 1050 for better cooling, I will be able to begin a programme of crunching tasks on the CPU to try and redress this problem but the indications are that it will take quite a while.

. . In the meantime I am not seeing the catastrophic decrease in credit awards that they are so convinced will occur for the tasks being processed. Maybe later on it will manifest itself, maybe not.

Stephen

:(
ID: 1888479 · Report as offensive
Profile Jeff Buck Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Send message
Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1888493 - Posted: 7 Sep 2017, 23:00:55 UTC - in response to Message 1888479.  

You might be interested in a post by Joe Segur from several years ago in a thread I had started concerning the volatility of the APR values I was seeing when I was trying to find some stable yardstick to assess changes I had made in one of my configurations.

The averaging is performed using "Brown's Simple Exponential Smoothing" with a 0.01 parameter. In effect each new sample is multiplied by 0.01 and added to the previous average multiplied by 1 - 0.01 (though the actual code uses a different form). That kind of exponential average only needs a single saved value. With that 0.01 value, if a user changes something which affects production it takes about 69 validated results from the new configuration for APR to adjust halfway to the productivity change, and over 300 to adjust within 5% of final.
There's some other interesting stuff about the APR in that thread, as well, should you have the inclination to dig deeper into it.
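Joe's description can be checked numerically. A small sketch (plain Python, using the 0.01 parameter from the quote):

```python
# Brown's simple exponential smoothing with alpha = 0.01, as Joe
# describes: each validated result nudges the stored average 1% of
# the way toward the newly measured processing rate.

ALPHA = 0.01

def update_apr(old_avg, new_sample, alpha=ALPHA):
    """One smoothing step: blend the new sample into the average."""
    return alpha * new_sample + (1.0 - alpha) * old_avg

def results_to_converge(remaining_fraction, alpha=ALPHA):
    """Validated results needed for the average to close all but
    `remaining_fraction` of the gap between an old and a new rate
    (rates normalized to 0 and 1)."""
    apr, target, steps = 0.0, 1.0, 0
    while target - apr > remaining_fraction:
        apr = update_apr(apr, target, alpha)
        steps += 1
    return steps

print(results_to_converge(0.5))   # halfway: 69 results
print(results_to_converge(0.05))  # within 5%: 299 results
```

That reproduces the "about 69 results to adjust halfway, around 300 to get within 5%" figures, and explains Stephen's observation that at 10-12 CPU tasks a day the repair takes weeks.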
ID: 1888493 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1888497 - Posted: 7 Sep 2017, 23:39:07 UTC - in response to Message 1888493.  

Good info to know Jeff. Thanks for posting. APR was always "magic" to me as I never understood how it was calculated. I knew it wasn't a simple average or median smoothing.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1888497 · Report as offensive
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1888500 - Posted: 8 Sep 2017, 0:01:54 UTC - in response to Message 1888493.  

You might be interested in a post by Joe Segur from several years ago in a thread I had started concerning the volatility of the APR values I was seeing when I was trying to find some stable yardstick to assess changes I had made in one of my configurations.

The averaging is performed using "Brown's Simple Exponential Smoothing" with a 0.01 parameter. In effect each new sample is multiplied by 0.01 and added to the previous average multiplied by 1 - 0.01 (though the actual code uses a different form). That kind of exponential average only needs a single saved value. With that 0.01 value, if a user changes something which affects production it takes about 69 validated results from the new configuration for APR to adjust halfway to the productivity change, and over 300 to adjust within 5% of final.
There's some other interesting stuff about the APR in that thread, as well, should you have the inclination to dig deeper into it.


. . Well, that confirms that, as it is doing 10 to 12 tasks per day, it will take a month of Sundays to come close to repairing the APR for that rig. So I will stick to the plan and wait until I have to pull the 1050 from the cluster. Then I can give it one whole CPU core and do maybe 30 a day; that way it should be sorted out in a week or so. It will probably be a couple of weeks before I do that, but the final time frame will be about the same either way.

. . Thanks Jeff

.
ID: 1888500 · Report as offensive