Linux CUDA 'Special' App finally available, featuring Low CPU use

Stephen "Heretic" Volunteer tester Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628	Message 1865473 - Posted: 4 May 2017, 1:29:55 UTC
. . @ TBar
. . OK, I bit the bullet. With only a few files involved, I checked for duplication, and the only ones I found were app_info and the SSSE3 app. So I renamed the old app_info and did not copy the new SSSE3 app; since the file names are identical, I presumed they are the same. The good news: it didn't trash anything ... <breathing again>
. . I always get nervous when ppl tell me something is simple ... but it was actually as simple as you said.
. . I have noted the current numbers of inconclusives/valids, but it will take a while before any changes become evident, and certainly before they become conclusive. I will let you know.
Stephen
ID: 1865473 ·

TBar Volunteer tester Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768	Message 1865475 - Posted: 4 May 2017, 1:43:29 UTC - in response to Message 1865310. Last modified: 4 May 2017, 1:48:05 UTC
You should be able to look at your last download if you need to compare files. I really don't know what is in your folders. Just as before, any changes to your app_info.xml requires a BOINC restart and you should always Stop BOINC before changing Apps.
Just a couple tests with a WU that is known to produce the Bad Best Pulse,

Running on TomsMacPro.local at Thu May 4 00:07:49 2017
---------------------------------------------------
Starting benchmark run...
---------------------------------------------------
Listing wu-file(s) in /testWUs :
16fe08aa.12502.25021.6.33.13.wu
Listing executable(s) in /APPS :
setiathome_x41p_zi3t2b_x86_64-apple-darwin_cuda80
Listing executable in /REF_APPs :
MBv8_8.05r3344_sse41_x86_64-apple-darwin
---------------------------------------------------
Current WU: 16fe08aa.12502.25021.6.33.13.wu
---------------------------------------------------
Skipping default app MBv8_8.05r3344_sse41_x86_64-apple-darwin, displaying saved result(s)
Elapsed Time: ………………………………… 3565 seconds
---------------------------------------------------
Running app with command : setiathome_x41p_zi3t2b_x86_64-apple-darwin_cuda80 -unroll autotune -device 0
unroll limits: min = 1, max = 256. Using unroll autotune.
146.56 real 21.62 user 16.22 sys
Elapsed Time : ……………………………… 146 seconds
Speed compared to default : 2441 %
-----------------
Comparing results
Result : Strongly similar, Q= 99.72%
---------------------------------------------------
Done with 16fe08aa.12502.25021.6.33.13.wu.
Done with Benchmark run! Removing temporary files!

Running on TomsMacPro.local at Thu May 4 00:22:16 2017
---------------------------------------------------
Starting benchmark run...
---------------------------------------------------
Listing wu-file(s) in /testWUs :
16fe08aa.12502.25021.6.33.13.wu
Listing executable(s) in /APPS :
setiathome_x41p_zi3t1f_x86_64-apple-darwin_cuda80
setiathome_x41p_zi3t2b_x86_64-apple-darwin_cuda80
Listing executable in /REF_APPs :
MBv8_8.05r3344_sse41_x86_64-apple-darwin
---------------------------------------------------
Current WU: 16fe08aa.12502.25021.6.33.13.wu
---------------------------------------------------
Skipping default app MBv8_8.05r3344_sse41_x86_64-apple-darwin, displaying saved result(s)
Elapsed Time: ………………………………… 3565 seconds
---------------------------------------------------
Running app with command : setiathome_x41p_zi3t1f_x86_64-apple-darwin_cuda80 -bs -unroll 8 -device 0
148.70 real 22.30 user 16.55 sys
Elapsed Time : ……………………………… 149 seconds
Speed compared to default : 2392 %
-----------------
Comparing results
                ------------- R1:R2 ------------    ------------- R2:R1 ------------
                Exact  Super  Tight  Good  Bad      Exact  Super  Tight  Good  Bad
Spike               0      0      0     0    0          0      0      0     0    0
Autocorr            0      0      0     0    0          0      0      0     0    0
Gaussian            0      0      0     0    0          0      0      0     0    0
Pulse               0      0      0     0    0          0      0      0     0    0
Triplet             0      0      0     0    0          0      0      0     0    0
Best Spike          0      1      1     1    0          0      1      1     1    0
Best Autocorr       0      1      1     1    0          0      1      1     1    0
Best Gaussian       1      1      1     1    0          1      1      1     1    0
Best Pulse          0      0      0     0    1          0      0      0     0    1
Best Triplet        0      0      0     0    0          0      0      0     0    0
                 ----   ----   ----  ----  ----      ----   ----   ----  ----  ----
                    1      3      3     3    1          1      3      3     3    1
Unmatched signal(s) in R1 at line(s) 396
Unmatched signal(s) in R2 at line(s) 396
For R1:R2 matched signals only, Q= 99.98%
Result : Weakly similar.
---------------------------------------------------
Running app with command : setiathome_x41p_zi3t2b_x86_64-apple-darwin_cuda80 -pfb 64 -unroll autotune -device 0
unroll limits: min = 1, max = 256. Using unroll autotune.
145.82 real 21.72 user 16.21 sys
Elapsed Time : ……………………………… 146 seconds
Speed compared to default : 2441 %
-----------------
Comparing results
Result : Strongly similar, Q= 99.72%
---------------------------------------------------
Done with 16fe08aa.12502.25021.6.33.13.wu.
Done with Benchmark run! Removing temporary files!

Running on TomsMacPro.local at Thu May 4 00:29:41 2017
---------------------------------------------------
Starting benchmark run...
---------------------------------------------------
Listing wu-file(s) in /testWUs :
16fe08aa.12502.25021.6.33.13.wu
Listing executable(s) in /APPS :
setiathome_x41p_zi3t1f_x86_64-apple-darwin_cuda80
setiathome_x41p_zi3t2b_x86_64-apple-darwin_cuda80
Listing executable in /REF_APPs :
MBv8_8.05r3344_sse41_x86_64-apple-darwin
---------------------------------------------------
Current WU: 16fe08aa.12502.25021.6.33.13.wu
---------------------------------------------------
Skipping default app MBv8_8.05r3344_sse41_x86_64-apple-darwin, displaying saved result(s)
Elapsed Time: ………………………………… 3565 seconds
---------------------------------------------------
Running app with command : setiathome_x41p_zi3t1f_x86_64-apple-darwin_cuda80 -bs -unroll 8 -device 0
146.30 real 22.08 user 16.26 sys
Elapsed Time : ……………………………… 147 seconds
Speed compared to default : 2425 %
-----------------
Comparing results
                ------------- R1:R2 ------------    ------------- R2:R1 ------------
                Exact  Super  Tight  Good  Bad      Exact  Super  Tight  Good  Bad
Spike               0      0      0     0    0          0      0      0     0    0
Autocorr            0      0      0     0    0          0      0      0     0    0
Gaussian            0      0      0     0    0          0      0      0     0    0
Pulse               0      0      0     0    0          0      0      0     0    0
Triplet             0      0      0     0    0          0      0      0     0    0
Best Spike          0      1      1     1    0          0      1      1     1    0
Best Autocorr       0      1      1     1    0          0      1      1     1    0
Best Gaussian       1      1      1     1    0          1      1      1     1    0
Best Pulse          0      0      0     0    1          0      0      0     0    1
Best Triplet        0      0      0     0    0          0      0      0     0    0
                 ----   ----   ----  ----  ----      ----   ----   ----  ----  ----
                    1      3      3     3    1          1      3      3     3    1
Unmatched signal(s) in R1 at line(s) 396
Unmatched signal(s) in R2 at line(s) 396
For R1:R2 matched signals only, Q= 99.98%
Result : Weakly similar.
---------------------------------------------------
Running app with command : setiathome_x41p_zi3t2b_x86_64-apple-darwin_cuda80 -unroll autotune -device 0
unroll limits: min = 1, max = 256. Using unroll autotune.
145.72 real 21.71 user 16.21 sys
Elapsed Time : ……………………………… 146 seconds
Speed compared to default : 2441 %
-----------------
Comparing results
Result : Strongly similar, Q= 99.72%
---------------------------------------------------
Done with 16fe08aa.12502.25021.6.33.13.wu.

It appears to be the same as before,
Running app with command : setiathome_x41p_zi3t2b_x86_64-apple-darwin_cuda80 -unroll autotune -device 0
unroll limits: min = 1, max = 256. Using unroll autotune.
146.56 real 21.62 user 16.22 sys
Running app with command : setiathome_x41p_zi3t2b_x86_64-apple-darwin_cuda80 -pfb 64 -unroll autotune -device 0
unroll limits: min = 1, max = 256. Using unroll autotune.
145.82 real 21.72 user 16.21 sys
Running app with command : setiathome_x41p_zi3t2b_x86_64-apple-darwin_cuda80 -unroll autotune -device 0
unroll limits: min = 1, max = 256. Using unroll autotune.
145.72 real 21.71 user 16.21 sys

One run is less than a second faster using -pfb 64, the next run is faster without using it. Still Inconclusive, and without adding any benefits, it adds another variable.
So, the WU that Fails with the previous build zi3t1f is successful using zi3t2b. Not only that, all the previous builds, back to zi3k+, have produced False Overflows with certain Low Angle range Arecibo tasks. Just another reason to use zi3t2b...
ID: 1865475 ·
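
The "Speed compared to default" figures in the log can be reproduced from the elapsed times. The benchmark script's own formula is not shown in the thread, so the following is only a sketch: it assumes the percentage is the reference time divided by the test time, truncated to a whole number, which happens to match all three reported values. The function name is hypothetical.

```python
def speed_vs_default(reference_seconds: int, test_seconds: int) -> int:
    """Return the test app's speed as a whole-number percentage of the reference app.

    Assumption: the benchmark truncates rather than rounds; this matches the
    figures posted above but is not confirmed by the script's source.
    """
    if test_seconds <= 0:
        raise ValueError("test_seconds must be positive")
    # Floor division keeps only the whole-number part of the percentage.
    return (reference_seconds * 100) // test_seconds


if __name__ == "__main__":
    # 3565 s is the saved CPU reference time; the others are the CUDA runs.
    for elapsed in (146, 147, 149):
        print(f"{elapsed} s -> {speed_vs_default(3565, elapsed)} %")
```

Because the elapsed times are whole seconds, a one-second difference moves the percentage by roughly 16 to 17 points, so the 2392 % to 2441 % spread says little about which build is faster.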

Stephen "Heretic" Volunteer tester Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628	Message 1865513 - Posted: 4 May 2017, 8:20:05 UTC - in response to Message 1865475. Last modified: 4 May 2017, 8:33:07 UTC
You should be able to look at your last download if you need to compare files. I really don't know what is in your folders. Just as before, any changes to your app_info.xml requires a BOINC restart and you should always Stop BOINC before changing Apps.
. . You mentioned the possibility of setting the -poll option and that it uses more CPU time. Would that be likely to cause problems if used with -bs removed?
. . I have ceased CPU crunching so the CUDA80 app has all the resources of the Core2 Duo at its disposal. Should I wait or should I just suck it and see?
[Edit] - BOINC/CUDA80 was using only 10% of CPU so I shut it down to remove -bs. But -bs is NOT in the command line in app_info.xml. On restart it is still only using about 10 to 12% of CPU :( I am as always confused ...
Stephen
ID: 1865513 ·

TBar Volunteer tester Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768	Message 1865514 - Posted: 4 May 2017, 9:08:36 UTC - in response to Message 1865513. It looks as though -poll doesn't work anymore. So, if you're determined to use 100% CPU per task, use Petri's App from your other thread. I'm currently trying to build an App for Beta, which means the default settings will have to work for what MOST people want. That means LOW CPU use and Default Autotune. Most people would Scream at an App that uses 100% CPU. The problem is making Autotune work without command lines and dealing with the 1 GB vRam GPUs that can't use Autotune; they may have to go. Just look for the other download for zi3t2b, replace the current one with that one, and make sure it is named correctly in the app_info. If you look at My 1050s, they are working Very well using Low CPU, as most people's are. ID: 1865514 ·

Stephen "Heretic" Volunteer tester Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628	Message 1865548 - Posted: 4 May 2017, 15:02:32 UTC - in response to Message 1865514.
It looks as though -poll doesn't work anymore. So, if you're determined to use 100% CPU per task, use Petri's App from your other thread.
. . LOL, you make it sound like screwing every last cycle out of the CPU is my main objective; I can assure you it is not. My main aim is to achieve the shortest processing times I can in a sustainable way. If one configuration uses 90% of CPU time and only reduces runtimes by 2 or 3 seconds over a second one which only uses 10% of CPU time, then I will definitely take door number 2. But if the first reduces runtimes by 20 or 30 seconds (especially at this level) then odds are that is the one I will be running.
I'm currently trying to build an App for Beta, which means the default settings will have to work for what MOST people want. That means LOW CPU use and Default Autotune. Most people would Scream at an App that uses 100% CPU. The problem is making Autotune work without commandlines and dealing with the 1 GB vRam GPUs that can't use Autotune, they may have to go.
. . Absolutely! Off the rack should fit 99% of users comfortably. And for what it is worth, autotune seems to be working well -
zi3k+
. . . Guppis . . . 7.2 to 7.5 mins (39 WUs)
. . . NARAs . . . 4.5 to 4.6 mins (small sample, 11 WUs, in current run due to guppi deluge)
. . . VHARs . . . 2.1 mins (pretty consistent times, vary only by a few seconds, even smaller sample)
zi3t2b
. . . Guppis . . . 6.8 to 7.2 mins (48 WUs)
. . . NARAs . . . 4.8 mins (1 WU), 3.4 to 3.5 mins (4 WUs); too small a sample to be significant, I need more tasks.
. . . VHARs . . . No WUs at the current time.
. . Previously I had only been noting the runtimes in 1/4 minute units, not accurate enough for this comparison. So I captured a sample for reference, noting times in 1/10th minute units.
So far it looks like this version is achieving about a 20 sec reduction in run times for Guppis, not so much for Arecibo tasks. Can I take it that -bs is now built into the app and is not controllable by the command line switch? I think I might download Petri's version just to see the difference. If you want me to run this one for a while for statistical purposes I am happy to do so. It will probably take about a week for the difference in inconclusives to be detectable. So I am happy to carry on as is.
Just look for the other download for zi3t2b, replace the current one with that one, and make sure it is named correctly in the app_info. If you look at My 1050s, there are working Very well with using Low CPU, as most people's are.
. . I will leave this running as is and see how the numbers compare, 1050 against 1050 Ti.
Stephen
ID: 1865548 ·
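
The "about a 20 sec reduction" for Guppis can be checked against the posted ranges. The post gives only minimum and maximum run times, so this sketch assumes the midpoint of each range is a fair stand-in for the typical run time; that is a simplification, and the helper names are hypothetical.

```python
def midpoint(low_minutes: float, high_minutes: float) -> float:
    """Midpoint of a reported run-time range, in minutes."""
    return (low_minutes + high_minutes) / 2


def saving_seconds(old_range, new_range) -> int:
    """Approximate per-task saving in whole seconds between two builds."""
    return round((midpoint(*old_range) - midpoint(*new_range)) * 60)


if __name__ == "__main__":
    # Guppi ranges from the post: zi3k+ 7.2 to 7.5 mins, zi3t2b 6.8 to 7.2 mins.
    print(saving_seconds((7.2, 7.5), (6.8, 7.2)), "seconds saved per Guppi task")
```

With times recorded to a tenth of a minute, each figure carries about 6 seconds of resolution, so an estimate near 20 seconds is consistent with the post but not much more precise than that.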

TheHoosh Send message Joined: 17 Aug 12 Posts: 12 Credit: 11,693,138 RAC: 0	Message 1865675 - Posted: 5 May 2017, 9:42:55 UTC
I've got a problem with this new app: I'm not receiving any CUDA workunits. Apparently something must be wrong with my setup:
- Linux Mint 18.1 (Mate Edition) using the 4.10 kernel
- a GeForce 750 Ti with the 381.09 driver installed
- I've downloaded and extracted the CUDA apps and the corresponding app_info.xml into "/var/lib/boinc-client/<seti-folder>"
- I've downloaded the libcudart.so.8.0 and libcufft.so.8.0 libraries and also placed them in "/var/lib/boinc-client/<seti-folder>"
- I've restarted the machine
But so far, I'm still only receiving those 8.22 OpenCL tasks for my 750 Ti, although this app is not even listed in app_info.xml. Could someone point me towards the mistake I made? ID: 1865675 ·

Michel Makhlouta Volunteer tester Send message Joined: 21 Dec 03 Posts: 169 Credit: 41,799,743 RAC: 0	Message 1865677 - Posted: 5 May 2017, 9:55:49 UTC - in response to Message 1865675. did you give the added files the correct permissions? ID: 1865677 ·

TheHoosh Send message Joined: 17 Aug 12 Posts: 12 Credit: 11,693,138 RAC: 0	Message 1865678 - Posted: 5 May 2017, 10:20:46 UTC - in response to Message 1865677. Last modified: 5 May 2017, 10:21:36 UTC I'll have to check that once I'm back home. But still, why am I receiving those 8.22 OpenCL tasks if they are not listed in my app_info.xml? It looks like BOINC is ignoring this file entirely. ID: 1865678 ·

rob smith Volunteer moderator Volunteer tester Send message Joined: 7 Mar 03 Posts: 22202 Credit: 416,307,556 RAC: 380	Message 1865679 - Posted: 5 May 2017, 10:59:32 UTC Last modified: 5 May 2017, 11:02:20 UTC If there are any syntax errors in the file, BOINC may ignore it. Check very carefully that you have the correct number of "_" characters, have used "-" or "_" where they are in the file name, have used the correct characters ("1" & "I"; "l", "I", "1"; "0" & "O" are easy to confuse), and that the case of the letters is correct. Also, this file is only read when BOINC starts, so it may not have been read yet. (To make sure BOINC has stopped and restarted I do a computer restart, as I did have some trouble getting BOINC to fully stop.) Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? ID: 1865679 ·
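
Part of that checking can be automated. The sketch below assumes the usual anonymous-platform layout, where each `<file_info>` carries a `<name>` and each `<file_ref>` carries a `<file_name>`; the function name `check_app_info` is hypothetical and not part of BOINC. It reports XML that does not parse, declared files missing from the project folder, and references to files that were never declared (a typical symptom of confusing "1", "l" and "I").

```python
import os
import xml.etree.ElementTree as ET


def check_app_info(path):
    """Return a list of human-readable problems found in an app_info.xml file."""
    try:
        root = ET.parse(path).getroot()
    except (ET.ParseError, OSError) as err:
        # A file that cannot be read or parsed cannot be checked any further.
        return [f"cannot parse {path}: {err}"]

    folder = os.path.dirname(os.path.abspath(path))
    # Assumed layout: <file_info><name>...</name></file_info>
    declared = {(fi.findtext("name") or "").strip() for fi in root.iter("file_info")}
    # Assumed layout: <file_ref><file_name>...</file_name></file_ref>
    referenced = {(fr.findtext("file_name") or "").strip() for fr in root.iter("file_ref")}

    problems = []
    for name in sorted(declared):
        if not os.path.isfile(os.path.join(folder, name)):
            problems.append(f"declared but not on disk: {name}")
    for name in sorted(referenced - declared):
        problems.append(f"referenced but never declared: {name}")
    return problems
```

Calling `check_app_info` on the app_info.xml inside the SETI project folder returns an empty list when nothing obvious is wrong. It does not validate the rest of the file, and BOINC still has to be restarted before it rereads it.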

tazzduke Volunteer tester Send message Joined: 15 Sep 07 Posts: 190 Credit: 28,269,068 RAC: 5	Message 1865680 - Posted: 5 May 2017, 11:00:04 UTC - in response to Message 1865678. Greetings TheHoosh, can you have a look in the event log and see if it says anything about seti@home finding app_info.xml, or even show us the first 30 lines of the event log? If it's not there, then it is being ignored by BOINC. Regards ID: 1865680 ·

Raistmer Volunteer developer Volunteer tester Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121	Message 1865738 - Posted: 5 May 2017, 16:50:41 UTC - in response to Message 1859140. Hi, Here is an interesting one: http://setiathome.berkeley.edu/workunit.php?wuid=2488511762 The SoG has the same kind of error that my version has. I'd like to know if R. finds a cure for that - it might help me too. Petri Was task preserved? SETI apps news We're not gonna fight them. We're gonna transcend them. ID: 1865738 ·

TheHoosh Send message Joined: 17 Aug 12 Posts: 12 Credit: 11,693,138 RAC: 0	Message 1865747 - Posted: 5 May 2017, 17:55:37 UTC - in response to Message 1865677. Last modified: 5 May 2017, 17:56:25 UTC did you give the added files the correct permissions? That did the trick! Now the new application is working like a charm! Thank you guys for your help! :) In comparison to the stock application the CUDA-based version runs 2x faster and requires less than 5% of the original app's CPU time (~18s vs. ~580s). Quite impressive! ID: 1865747 ·
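
The thread does not say which permission was wrong. A common cause on Linux is that the extracted app lacks the execute bit, or that the files are not readable by the account the BOINC client runs as. The sketch below covers only the first case and uses hypothetical helper names; it lists files without the owner-execute bit and adds that bit for owner, group and others.

```python
import os
import stat


def missing_exec_bit(paths):
    """Return the existing regular files that lack the owner-execute bit."""
    return [
        p for p in paths
        if os.path.isfile(p) and not (os.stat(p).st_mode & stat.S_IXUSR)
    ]


def make_executable(path):
    """Add execute permission for owner, group and others, keeping other bits."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
```

Changing files under /var/lib/boinc-client normally needs root, and ownership may matter as well as mode bits, so treat this as a starting point for diagnosis only.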

petri33 Volunteer tester Send message Joined: 6 Jun 02 Posts: 1668 Credit: 623,086,772 RAC: 156	Message 1865763 - Posted: 5 May 2017, 19:57:04 UTC - in response to Message 1865738. Hi, Here is an interesting one: http://setiathome.berkeley.edu/workunit.php?wuid=2488511762 The SoG has the same kind of error that my version has. I'd like to know if R. finds a cure for that - it might help me too. Petri Was task preserved? Hi R., I'm sorry. I do not have that WU. Now the community could show its power: Does anyone have that? To overcome Heisenbergs: "You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones ID: 1865763 ·

jason_gee Volunteer developer Volunteer tester Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0	Message 1865845 - Posted: 6 May 2017, 6:13:13 UTC - in response to Message 1865763. Last modified: 6 May 2017, 6:16:17 UTC Have been able to look in some detail while down with a nasty cold for 2 weeks. As the pulses straddling unroll boundaries are relatively rare, my suggestion is to add an atomic (best and/or reportable) update count in mapped memory, either alongside or in place of the existing flags, then check it when a single unrolled launch is complete. If it is >1, shunt unroll to 1 so as to reprocess that unroll's worth (not many) in serial fashion, so that the race across the unroll boundary is muted. Can return to normal unroll for subsequent launches. That should work fine and have very little (if any noticeable) penalty, since we showed unroll 1 produces correct results in all cases examined in detail that I've seen. Care would need to be taken to remember to do the normal unroll's worth as separate unroll 1 launches, then revert to full unroll after. If I recover enough (hopefully with the aid of some Gentleman Jack's later), then I'll give it a go, but figured I'd relay the findings for experimentation in case it takes me longer to get the machines/environments up to scratch. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. ID: 1865845 ·
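
The control flow of that suggestion can be sketched on the host side. This is not the app's code: `launch` is a hypothetical stand-in for one kernel launch that processes `width` consecutive items and returns the atomic update count read back from mapped memory, and the sketch assumes the results of a suspect launch are discarded before the serial redo.

```python
def run_with_serial_fallback(total_items, unroll, launch):
    """Process items in unrolled launches, redoing suspect ones with unroll 1.

    launch(start, width) is a hypothetical callable returning how many
    best/reportable updates happened during that launch.
    Returns the list of (start, width) launches that were issued.
    """
    issued = []
    start = 0
    while start < total_items:
        width = min(unroll, total_items - start)
        updates = launch(start, width)
        issued.append((start, width))
        if width > 1 and updates > 1:
            # More than one update in a single unrolled launch may mean a race
            # across the unroll boundary, so redo the same items one at a time.
            for item in range(start, start + width):
                launch(item, 1)
                issued.append((item, 1))
        # The next launch goes back to the normal unroll width.
        start += width
    return issued
```

Since launches with more than one update are described as rare, the extra serial launches should cost little, which is the point made in the post.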

petri33 Volunteer tester Send message Joined: 6 Jun 02 Posts: 1668 Credit: 623,086,772 RAC: 156	Message 1865977 - Posted: 7 May 2017, 8:08:29 UTC - in response to Message 1865845. Have been able to look in some detail while down with a nasty cold for 2 weeks. As the pulses straddling unroll boundaries are relatively rare, my suggestion is to add an atomic (best and/or reportable) update count in mapped memory, either alongside or in place of the existing flags, then check it when a single unrolled launch is complete. If it is >1, shunt unroll to 1 so as to reprocess that unroll's worth (not many) in serial fashion, so that the race across the unroll boundary is muted. Can return to normal unroll for subsequent launches. That should work fine and have very little (if any noticeable) penalty, since we showed unroll 1 produces correct results in all cases examined in detail that I've seen. Care would need to be taken to remember to do the normal unroll's worth as separate unroll 1 launches, then revert to full unroll after. If I recover enough (hopefully with the aid of some Gentleman Jack's later), then I'll give it a go, but figured I'd relay the findings for experimentation in case it takes me longer to get the machines/environments up to scratch. gee_jason :) I'll try that. P. To overcome Heisenbergs: "You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones ID: 1865977 ·

Sidewinder Volunteer tester Send message Joined: 15 Nov 09 Posts: 100 Credit: 79,432,465 RAC: 0	Message 1866174 - Posted: 8 May 2017, 4:28:42 UTC TBar, your app_info.xml worked out of the box. Thanks for including that; saved me a bit of time. ID: 1866174 ·

tazzduke Volunteer tester Send message Joined: 15 Sep 07 Posts: 190 Credit: 28,269,068 RAC: 5	Message 1867062 - Posted: 13 May 2017, 3:52:38 UTC Greetings All Has anyone been game enough to try and run two instances of the special app on one GPU? Or should I be the experimenter lol. On another note, this latest version really works a treat lol. Regards ID: 1867062 ·

scocam Send message Joined: 28 Feb 17 Posts: 27 Credit: 15,120,999 RAC: 0	Message 1867063 - Posted: 13 May 2017, 4:06:38 UTC - in response to Message 1867062. Last modified: 13 May 2017, 4:09:20 UTC I've been running two WUs per GPU for almost two weeks now and it seems to be doing very well (https://setiathome.berkeley.edu/show_host_detail.php?hostid=8257416). For the first few days, I ran a single WU/GPU but noticed that the GPU usage was hovering around 82-88% so I decided to see if running two per would increase GPU usage and not harm crunch times. GPU usage is now in the high 90s and times seem to be running a few seconds quicker than when running a single WU. I have no math to back that up but I couldn't be happier with the performance of this setup right now. scocam ID: 1867063 ·

tazzduke Volunteer tester Send message Joined: 15 Sep 07 Posts: 190 Credit: 28,269,068 RAC: 5	Message 1867069 - Posted: 13 May 2017, 5:53:07 UTC - in response to Message 1867063. Thank you, went and changed it but got a few computation errors (out of memory). Back on singles now on both cards, which have 2 GB of memory each. Need to read more to understand. Regards ID: 1867069 ·

Stephen "Heretic" Volunteer tester Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628	Message 1867075 - Posted: 13 May 2017, 6:32:26 UTC - in response to Message 1867062. Last modified: 13 May 2017, 6:41:44 UTC
Greetings All Has anyone been game enough to try and run two instances of the special app on one GPU? Or should I be the experimenter lol. On another note, this latest version really works a treat lol. Regards
. . Petri and TBar say NO! And I tend to agree with them. And remember Scocam's cards all have 4 GB of VRAM.
Stephen :)
ID: 1867075 ·
