Linux CUDA 'Special' App finally available, featuring Low CPU use

Message boards : Number crunching : Linux CUDA 'Special' App finally available, featuring Low CPU use


Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1885988 - Posted: 25 Aug 2017, 14:40:10 UTC - in response to Message 1885984.  

and since users normally have 3 or 4 GPUs you end up downloading Hundreds of tasks at a time. I usually download close to 300 GPU tasks at a time on my 3 GPU machine, which means it doesn't take long to fill a cache. It takes a bit longer on the two GPU machine, but, you don't have to download as many tasks ;-)

Doesn't work for me that way at all. The MAXIMUM tasks I have ever downloaded at one time is the grand total of 22. More typical are bunches of 14 or 16. It takes me easily over an hour to replenish the CPU cache after rescheduling.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1885988 · Report as offensive
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1886000 - Posted: 25 Aug 2017, 15:17:49 UTC - in response to Message 1885988.  

A number of people have stated they also get sent around a hundred tasks in one download. All I can say is try it and see what you get. If you have 300 GPU tasks, suspend around 250, stop BOINC, assign them to your CPU, then wait until the 5-minute backoff is up so that when you start BOINC it will immediately download new tasks, and see how many it sends. I get the full 250...at once. Did you enter that line in your cc_config file?
<max_file_xfers_per_project>8</max_file_xfers_per_project>
It works better 8 at a time instead of 2 at a time. The default is to transfer only 2 files per project at a time; see http://boinc.berkeley.edu/wiki/Client_configuration
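
If it's not there yet, here's a minimal sketch of what the whole file can look like, assuming the usual Linux data directory (e.g. /var/lib/boinc-client for a repository install, or the BOINC folder in your home directory) and that you don't already have a cc_config.xml you need to merge it into:

# write a minimal cc_config.xml raising the per-project transfer limit to 8
cat > cc_config.xml <<'EOF'
<cc_config>
  <options>
    <max_file_xfers_per_project>8</max_file_xfers_per_project>
  </options>
</cc_config>
EOF
# restart BOINC, or tell a running client to re-read it:
boinccmd --read_cc_config
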
When you get enough to last 12 Hours, reassign them back to the GPUs. Hopefully you can remember that the first CPU tasks in the Manager are usually the Real CPU tasks and leave them assigned to the CPUs.
The newer tasks are the ones that go back to the GPUs.
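
The outer steps of that shuffle look roughly like this from a terminal. This is only a sketch, assuming the Ubuntu repository install (service name boinc-client, data in /var/lib/boinc-client), with the actual reassignment left to whatever rescheduler script or manual edit you use:

sudo systemctl stop boinc-client                 # never touch client files while BOINC is running
cd /var/lib/boinc-client
sudo cp client_state.xml client_state.xml.bak    # keep a backup before any rescheduling edit
# ... reassign the suspended tasks here (rescheduler script or manual edit) ...
sleep 300                                        # wait out the roughly 5-minute request backoff
sudo systemctl start boinc-client                # the client should ask for new work almost immediately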
ID: 1886000 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1886007 - Posted: 25 Aug 2017, 15:41:20 UTC - in response to Message 1886000.  

I know that many others receive hundreds, or at least tens, of tasks per download. I never have, and I have always been envious of others who seem to have the golden finger. I'm positive I have tried the cc_config setting you mention. But I have once again set it to 8 and will see if it has any effect. What's the definition of insanity .... doing the same thing exactly the same way and expecting a different result.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1886007 · Report as offensive
JohnDK Crowdfunding Project Donor*Special Project $250 donor
Volunteer tester
Joined: 28 May 00
Posts: 1222
Credit: 451,243,443
RAC: 1,127
Denmark
Message 1886008 - Posted: 25 Aug 2017, 15:41:41 UTC

It seems very difficult, and like it would take quite some time; I have to reboot every time I exit BOINC.

Petri has somehow found a way to fool the servers into thinking you have many more GPUs installed than you really have, a much cleaner method.
ID: 1886008 · Report as offensive
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1886098 - Posted: 25 Aug 2017, 22:45:07 UTC - in response to Message 1885984.  

...and wait for new CPU tasks to download (which may take a while if you exceeded the daily limit :( ...it works well with a slow graphics card (such as a GTX 1050 Ti) but requires restarting the procedure, as downloading only 100 new tasks is not enough for a fast graphics card to be able to compute for the next 12 hours....
Hmmm, this sounds similar to what the Rescheduler does, download mostly CPU tasks, run them on the GPU, which ends up destroying your CPU's APR to the point a CPU task will error out when run on a CPU. Any way you can change it to download GPU tasks so it ends up running GPU tasks on the GPU? That's the way my method works, and since users normally have 3 or 4 GPUs you end up downloading Hundreds of tasks at a time. I usually download close to 300 GPU tasks at a time on my 3 GPU machine, which means it doesn't take long to fill a cache. It takes a bit longer on the two GPU machine, but, you don't have to download as many tasks ;-)


. . Well I haven't had any tasks error out on the CPU because of it (nor at all) so I am not sure that is a thing. But I would prefer to move tasks to the CPU Q and refill the GPU queue because as you say, it will take more tasks in each hit, so it will fill better, but that requires the app to be able to move tasks in both directions. The rescheduler I have for Linux (CPU2GPU) only works in the one direction.

Stephen

<shrugs>
ID: 1886098 · Report as offensive
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1886100 - Posted: 25 Aug 2017, 22:49:02 UTC - in response to Message 1885988.  

and since users normally have 3 or 4 GPUs you end up downloading Hundreds of tasks at a time. I usually download close to 300 GPU tasks at a time on my 3 GPU machine, which means it doesn't take long to fill a cache. It takes a bit longer on the two GPU machine, but, you don't have to download as many tasks ;-)

Doesn't work for me that way at all. The MAXIMUM tasks I have ever downloaded at one time is the grand total of 22. More typical are bunches of 14 or 16. It takes me easily over an hour to replenish the CPU cache after rescheduling.


. . I was finding that on one of my rigs while the other 2 were doing OK. It turned out to be because the free disk space was shrinking and BOINC was limiting the available space for downloads and therefore the download numbers. Check that you have plenty of free space and that enough of it is allocated to BOINC that it can d/l freely.
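
. . A quick way to check both numbers from a terminal, as a sketch (adjust the path if your BOINC data directory lives somewhere else):

df -h /var/lib/boinc-client     # free space on the disk BOINC lives on
boinccmd --get_disk_usage       # what BOINC thinks it is allowed to use, and is using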

Stephen

.
ID: 1886100 · Report as offensive
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1886103 - Posted: 25 Aug 2017, 22:54:48 UTC - in response to Message 1886008.  
Last modified: 25 Aug 2017, 22:59:01 UTC

It seems very difficult, and like it would take quite some time; I have to reboot every time I exit BOINC.

Petri has somehow found a way to fool the servers into thinking you have many more GPUs installed than you really have, a much cleaner method.


. . Yes I would like to be able to do that as well, just have the servers send the numbers the rig needs to keep going instead of fiddling about. But you need to be able to re-write and recompile the BOINC code to do that .... :(

. . Also, are you running Linux and the repository version of BOINC? Because there is a script you can run that restarts the BOINC client as a service without rebooting the system. It has saved me a lot of irritation on my rig that is running the above configuration. It is a small simple script.

. . You could even just issue the commands from the terminal each time.
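
. . On a Debian/Ubuntu repository install the service is usually called boinc-client, so the terminal version is just the following (a sketch; older installs may still use the init script rather than systemctl):

sudo systemctl restart boinc-client     # stop and start the client without rebooting
# or, on pre-systemd setups:
sudo /etc/init.d/boinc-client restart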

Stephen


:(
ID: 1886103 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1886118 - Posted: 26 Aug 2017, 0:12:26 UTC - in response to Message 1886100.  



. . I was finding that on one of my rigs while the other 2 were doing OK. It turned out to be because the free disk space was shrinking and BOINC was limiting the available space for downloads and therefore the download numbers. Check that you have plenty of free space and that enough of it is allocated to BOINC that it can d/l freely.

Stephen


I think 10GB is big enough for BOINC to download into. I only ever ran into that problem when I let Einstein download willy-nilly. Certainly has never been the case for SETI.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1886118 · Report as offensive
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1886120 - Posted: 26 Aug 2017, 0:25:07 UTC - in response to Message 1886098.  
Last modified: 26 Aug 2017, 0:54:23 UTC

. . Well I haven't had any tasks error out on the CPU because of it (nor at all) so I am not sure that is a thing. But I would prefer to move tasks to the CPU Q and refill the GPU queue because as you say, it will take more tasks in each hit, so it will fill better, but that requires the app to be able to move tasks in both directions. The rescheduler I have for Linux (CPU2GPU) only works in the one direction.
This machine is getting Very close to giving errors on CPU tasks, https://setiathome.berkeley.edu/host_app_versions.php?hostid=8280801
SETI@home v8 (anonymous platform, CPU) = Average processing rate 626.74 GFLOPS
From the other CPU Apps the Normal APR would be around, Average processing rate 33.20 GFLOPS
Once the APR gets higher than around 20x normal, the CPU tasks will Time Out before finishing. 20 x 33.2 = 664. It is now at 626, which is getting Close.
That happens when you run CPU tasks on your GPU.
ID: 1886120 · Report as offensive
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1886124 - Posted: 26 Aug 2017, 0:45:31 UTC - in response to Message 1886103.  

It seems very difficult, and like it would take quite some time; I have to reboot every time I exit BOINC.
Petri has somehow found a way to fool the servers into thinking you have many more GPUs installed than you really have, a much cleaner method.


. . Yes I would like to be able to do that as well, just have the servers send the numbers the rig needs to keep going instead of fiddling about. But you need to be able to re-write and recompile the BOINC code to do that ....
The problem is people would set their cache to the Highest value 24/7 instead of just on Tuesday mornings. IF they would just use the higher values for 12 hrs a week, it wouldn't be that bad, but they won't. Because of that, I doubt you will ever see something that will automatically set the cache higher. So, you'll just have to settle for something you have to configure once a week; because it's a pain, people will probably only configure it once a week, which is better for the servers than 24/7.
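
For what it's worth, that once-a-week tweak can at least be scripted: the cache size lives in global_prefs_override.xml, and a running client can be told to re-read it. A sketch only, assuming the repository install path; the values shown are just examples:

# bump the cache before the Tuesday outage, then put it back afterwards
sudo tee /var/lib/boinc-client/global_prefs_override.xml >/dev/null <<'EOF'
<global_preferences>
  <work_buf_min_days>1.5</work_buf_min_days>
  <work_buf_additional_days>0.5</work_buf_additional_days>
</global_preferences>
EOF
boinccmd --read_global_prefs_override    # apply without restarting the client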

I don't have any trouble stopping and starting BOINC on my machines, probably due to running Ubuntu with the BOINC folder in my Home folder. It works for me.
ID: 1886124 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1886131 - Posted: 26 Aug 2017, 1:09:12 UTC - in response to Message 1886120.  

Good information to know. I didn't know of the CPU limit of 20X. I guess I have no worries then since the highest I have ever seen on the CPU APR for any of my machines is under 100. And I have used the Rescheduler on all my Windows machines, even the new Ryzen machine which is very fast on CPU tasks compared to the FX machines.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1886131 · Report as offensive
Bruce
Volunteer tester

Joined: 15 Mar 02
Posts: 123
Credit: 124,955,234
RAC: 11
United States
Message 1886146 - Posted: 26 Aug 2017, 2:47:47 UTC

Hi Guys,

Still seem to be getting a lot of invalids, though not the flood I was getting before. It looks like most of them are late -9 overflows and all appear to be guppies.
Does the CUDA 3t2b app have a problem with guppies? Is there something I can do to decrease the number of invalids?

Switched to the actual Cuda 6.5 libraries hoping to make it more stable. It seemed to help some.
Downloaded the Cuda Toolkit 6.5.19, extracted the two cuda files with DPKG, renamed them and replaced the ones in the SETI directory.
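
For anyone else wanting to try the same thing, the extraction goes roughly like this. This is only a sketch with illustrative file and path names, so check which library names your installation actually expects and keep copies of the originals before overwriting anything:

mkdir /tmp/cuda65
dpkg -x <cuda-6.5-runtime-package>.deb /tmp/cuda65       # unpack the .deb without installing it
find /tmp/cuda65 -name 'libcudart.so*' -o -name 'libcufft.so*'
# copy the two libraries into the SETI project directory next to the app (rename if needed)
find /tmp/cuda65 -name 'libcudart.so*' -exec cp {} ~/BOINC/projects/setiathome.berkeley.edu/ \;
find /tmp/cuda65 -name 'libcufft.so*'  -exec cp {} ~/BOINC/projects/setiathome.berkeley.edu/ \;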

Anything else I can do?

Appreciate all the help.
Bruce
ID: 1886146 · Report as offensive
Profile Jeff Buck Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1886150 - Posted: 26 Aug 2017, 3:45:15 UTC - in response to Message 1886146.  

That looks suspiciously like what I ran into when I first started using the 8.0 Special App and using a GTX 780. See Message 1864874. Same symptoms and also almost all Guppis. For me, switching to 6.5 dramatically reduced the frequency, but did not totally eliminate the problem. It didn't completely go away until I replaced the GTX 780 with a GTX 980.

Petri knows the problem is there. He's mentioned it in a few of his posts, but I don't know if he's still trying to track it down.
ID: 1886150 · Report as offensive
Profile Brent Norman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1886156 - Posted: 26 Aug 2017, 5:34:49 UTC - in response to Message 1886131.  

Good information to know. I didn't know of the CPU limit of 20X. I guess I have no worries then since the highest I have ever seen on the CPU APR for any of my machines is under 100. And I have used the Rescheduler on all my Windows machines, even the new Ryzen machine which is very fast on CPU tasks compared to the FX machines.
There are a couple of ways around that 20x Time Limit for tasks that I have found; I posted 2 messages Here about it, and both work.

I don't reschedule enough to need it now, but I posted the details before in case anyone else runs into that problem.
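
For reference, one workaround that gets mentioned for the "exceeded elapsed time limit" error (not necessarily the same as the two I posted) is to raise the <rsc_fpops_bound> value on the cached workunits in client_state.xml while BOINC is stopped, for example:

# stop BOINC and back up client_state.xml first; this multiplies every bound by 10
awk '/<rsc_fpops_bound>/ { gsub(/<[^>]*>/,""); printf "    <rsc_fpops_bound>%.0f</rsc_fpops_bound>\n", $0*10; next } { print }' client_state.xml > client_state.new
mv client_state.new client_state.xml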
ID: 1886156 · Report as offensive
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1886157 - Posted: 26 Aug 2017, 5:45:36 UTC - in response to Message 1885931.  


Petri sent me another procedure which seems easier to me .... I wrote a small script to do the job. If anyone is interested ....it's a perl script.
It works fine for me but needs to be tested, as your app_info may differ from mine (it produces ghosts for some).
You need to stop BOINC, launch the script, restart, and wait for new CPU tasks to download (which may take a while if you exceeded the daily limit :( ...it works well with a slow graphics card (such as a GTX 1050 Ti) but requires restarting the procedure, as downloading only 100 new tasks is not enough for a fast graphics card to be able to compute for the next 12 hours....


. . Hi Laurent,

. . If that is CPU2GPU then I am using that and it works well enough, but it has two limitations that I find irksome. It only moves tasks in one direction as the name says and it is not selective. It moves all or none. But it does allow me to stow some extra tasks to try and get through the outage without work starvation.

I think even Stephen has figured out how to run Arecibo vlars on his GPU.


I would be amazed if Stephen could figure out what you just described.


. . OK guys could we lay off on the Stephen references? I am developing a complex here ...

Stephen

:)
ID: 1886157 · Report as offensive
Profile Brent Norman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1886159 - Posted: 26 Aug 2017, 5:49:28 UTC - in response to Message 1885988.  

Doesn't work for me that way at all. The MAXIMUM tasks I have ever downloaded at one time is the grand total of 22. More typical are bunches of 14 or 16. It takes me easily over an hour to replenish the CPU cache after rescheduling.
I have come to the conclusion that it has something to do with your resource shares between the other projects that you always seem to have on your computers. I know we have gone through all your cc settings and what we could think of before with no changes for you. When I reschedule CPU ->> GPU, 99.8% of the time I get the full cache restored in one shot, which is 90-94 tasks, depending on the computer. So try removing your other projects ... or set them to 0 resource share and no new tasks - they don't seem to affect me that way.
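
If you want to test that quickly without clicking through the Manager, boinccmd can do it per project. A sketch, with the Einstein URL only as an example:

boinccmd --project http://einstein.phys.uwm.edu/ nomorework      # stop fetching new tasks from it
boinccmd --project http://einstein.phys.uwm.edu/ suspend         # or park the project entirely
boinccmd --project http://einstein.phys.uwm.edu/ allowmorework   # undo it when done testing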
ID: 1886159 · Report as offensive
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1886160 - Posted: 26 Aug 2017, 5:49:33 UTC - in response to Message 1886118.  


I think 10GB is big enough for BOINC to download into. I only ever ran into that problem when I let Einstein download willy-nilly. Certainly has never been the case for SETI.


. . Is that what BOINC is showing when you select the Disk tab? Because that should definitely be more than enough.

Stephen

?
ID: 1886160 · Report as offensive
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1886164 - Posted: 26 Aug 2017, 6:09:22 UTC - in response to Message 1886120.  

. . Well I haven't had any tasks error out on the CPU because of it (nor at all) so I am not sure that is a thing. But I would prefer to move tasks to the CPU Q and refill the GPU queue because as you say, it will take more tasks in each hit, so it will fill better, but that requires the app to be able to move tasks in both directions. The rescheduler I have for Linux (CPU2GPU) only works in the one direction.
This machine is getting Very close to giving errors on CPU tasks, https://setiathome.berkeley.edu/host_app_versions.php?hostid=8280801
SETI@home v8 (anonymous platform, CPU) = Average processing rate 626.74 GFLOPS
From the other CPU Apps the Normal APR would be around, Average processing rate 33.20 GFLOPS
Once the APR gets higher than around 20x normal, the CPU tasks will Time Out before finishing. 20 x 33.2 = 664. It is now at 626, which is getting Close.

. . Hi TBar,

. . I had a look and the average for V8 8.00 stock tasks is 40 (39.26) so by your formula I should be safe up to about 800 for anonymous platform, a little bit of a margin. In the meantime I need to upgrade the CPU app from SSSE3 to AVX and make sure I run some tasks through the CPUs as well as the ones I move to the GPU to bring that average down a bit.

. . I don't know exactly which figures the servers use to estimate the maximum tolerable run time, but the closer I can get the extremes, the safer it should be.

. . I only really crunch any tasks on the CPU when I am using that Q as a gathering point for the outage extra rations.

Stephen

??
ID: 1886164 · Report as offensive
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1886165 - Posted: 26 Aug 2017, 6:20:16 UTC - in response to Message 1886124.  

It seems very difficult, and like it would take quite some time; I have to reboot every time I exit BOINC.
Petri has somehow found a way to fool the servers into thinking you have many more GPUs installed than you really have, a much cleaner method.


. . Yes I would like to be able to do that as well, just have the servers send the numbers the rig needs to keep going instead of fiddling about. But you need to be able to re-write and recompile the BOINC code to do that ....
The problem is people would set their cache to the Highest value 24/7 instead of just on Tuesday mornings. IF they would just use the higher values for 12 hrs a week, it wouldn't be that bad, but they won't. Because of that, I doubt you will ever see something that will automatically set the cache higher. So, you'll just have to settle for something you have to configure once a week; because it's a pain, people will probably only configure it once a week, which is better for the servers than 24/7.

I don't have any trouble stopping and starting BOINC on my machines, probably due to running Ubuntu with the BOINC folder in my Home folder. It works for me.


. . OOooohhh I knnoooww!

. . If someone develops a handy tool to tweak things above the normal guidelines there will always be 'people' who will try and rort it and end up stuffing things up. But for that one day of the week it would be sooo nice. I don't even have a problem with the BOINC 1000 limit, just the 100 limits. But then I don't have a monster crusher like Petri :)

. . I would like to be able to develop a streamlined re-scheduler though, one that can move tasks in both directions and allow selective control over which tasks are moved. But I am not a programmer. My attempts to refine Stubbles' script petered out because the structure of the Linux client_state.xml is different from the Windows version and I was unsure of which lines needed to be changed, and as Laurent mentioned, it probably needs to be done in two (or more) passes, which increases the complexity many-fold. Maybe one day ...
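
. . For whoever does attempt it: as far as I can tell, the tags such a rescheduler has to touch are per-task, since each <result> block in client_state.xml carries the <plan_class> (and version number) that decides which app runs it. Here's a harmless read-only peek at how the cached tasks are currently tagged, as a sketch (stop BOINC and keep a backup before making any real edits):

awk '/<result>/{n="";p=""} /<name>/{n=$0} /<plan_class>/{p=$0} /<\/result>/{print n, p}' client_state.xml | head -n 20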

Stephen

:)
ID: 1886165 · Report as offensive
Bruce
Volunteer tester

Joined: 15 Mar 02
Posts: 123
Credit: 124,955,234
RAC: 11
United States
Message 1886169 - Posted: 26 Aug 2017, 7:00:25 UTC - in response to Message 1886150.  

That looks suspiciously like what I ran into when I first started using the 8.0 Special App and using a GTX 780. See Message 1864874. Same symptoms and also almost all Guppis. For me, switching to 6.5 dramatically reduced the frequency, but did not totally eliminate the problem. It didn't completely go away until I replaced the GTX 780 with a GTX 980.

Petri knows the problem is there. He's mentioned it in a few of his posts, but I don't know if he's still trying to track it down.


Jeff,

Must be the same problem, my Titan Z's are in the same 700 series family as your 780's. Both are still good, strong cards; it's too bad that they have problems running the new CUDA apps because of a bug.

@Petri
It sounds like the bug is carrying through in all newer versions. I don't know what your time constraints are, but I and many others who have 700-series cards would certainly appreciate it if you could hunt down the problem with the app so that we could move up to the newer CUDA 8.0 versions. Your CUDA optimizations are great, but it is going to kinda suck if everybody can't use them.
I'm new to Linux, but if I can help with testing, let me know.

Thanks Guys.
Bruce
ID: 1886169 · Report as offensive