Linux CUDA 'Special' App finally available, featuring Low CPU use
Author | Message |
---|---|
Jeff Buck Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0 |
[quote]It shouldn't be too difficult to figure out...[/quote]
It's f*%*&%*)ing LINUX! It's designed to be difficult to figure out anything, at least for the average non-geek user. ;^) |
Brent Norman Joined: 1 Dec 99 Posts: 2786 Credit: 685,657,289 RAC: 835 |
Let Google be your friend in Linux. Actually I find Linux much easier to crunch with - there are only about 3 different settings for command line options compared to the never-ending list with SOG. It's just setting up your initial app_info that's a little tedious if you don't read the info provided. |
Jeff Buck Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0 |
[quote]When I can get to it, perhaps this evening, I'll try doing a similar comparison for the run times on the 780, although I don't have one in another box to use as a control. I'm certainly curious to see if the Special App significantly improves the throughput for the 780.[/quote]
Okay, I got to it. Using the same format as my post for the 960, here are the numbers for the 780:
               Host 8253697       | Host 7057115
               Linux "Special"    | Win8.1 "SoG" (2/GPU)
               Avg RT (Tasks/Hr)  | Avg RT (Tasks/Hr)
               -------------------|---------------------
High AR -----  1:44 (34.62)       | 5:29 (21.88)
Normal AR ---  4:54 (12.24)       | 11:42 (10.26)
VLAR --------  7:39 (7.84)        | 20:32 (5.84)
Clearly, the 780 does get better overall throughput with the Special App, about 58% better on High AR tasks, 19% on Normal ARs, and about 34% on the VLARs. Certainly impressive, but likely not good enough to compensate for the loss of use of the 670 and the reduced output of the 960. And, without a smoking gun pointing to whatever issue the 960 may be having, I'll likely switch back to SoG in Windows in a day or so, since I don't really want to devote much more time to this experiment. |
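For anyone who wants to repeat this comparison on their own hosts, the arithmetic behind the table can be sketched in Python. This is only a sketch: it assumes "Avg RT" is the wall-clock time of a single task and that "(2/GPU)" means two tasks run side by side on the card, which is the reading that reproduces the posted figures. The function and variable names are made up for illustration.

```python
def to_seconds(run_time: str) -> int:
    """Convert an 'M:SS' run time such as '5:29' to seconds."""
    minutes, seconds = run_time.split(":")
    return int(minutes) * 60 + int(seconds)


def tasks_per_hour(run_time: str, concurrent: int = 1) -> float:
    """Tasks completed per hour when `concurrent` tasks share one GPU."""
    return 3600 / to_seconds(run_time) * concurrent


# Average run times copied from the post:
# (Linux "Special", one task per GPU; Win8.1 "SoG", two tasks per GPU)
RUN_TIMES = {
    "High AR": ("1:44", "5:29"),
    "Normal AR": ("4:54", "11:42"),
    "VLAR": ("7:39", "20:32"),
}


def compare(run_times=RUN_TIMES):
    """Return {task type: (special tasks/hr, SoG tasks/hr, % gain)}."""
    rows = {}
    for kind, (special, sog) in run_times.items():
        special_rate = tasks_per_hour(special)
        sog_rate = tasks_per_hour(sog, concurrent=2)  # assumption: 2 at once
        gain = (special_rate / sog_rate - 1) * 100
        rows[kind] = (round(special_rate, 2), round(sog_rate, 2), round(gain))
    return rows


if __name__ == "__main__":
    for kind, (special_rate, sog_rate, gain) in compare().items():
        print(f"{kind}: {special_rate} vs {sog_rate} tasks/hr ({gain}% better)")
```

Comparing tasks per hour rather than raw run times is what makes the two columns comparable, since the SoG host spends each run time on two tasks at once.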
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
Meanwhile, other people's 960s running the current App are producing times not much different than the ones you have posted for your 780. Which means their 960s are literally wiping the floor with your 960. It's common practice to compare similar hardware with similar software before making conclusions, you might want to look at a few other 960s. My 960 would be an easy find, it seems to work pretty much the same with a couple different systems, currently it's running in the previous system, http://setiathome.berkeley.edu/results.php?hostid=6796479&offset=320. The slower times are being produced by a 950 running in a x4 PCIe slot, the other two cards are pretty close. Then there is Gianfranco's 960, https://setiathome.berkeley.edu/results.php?hostid=8215300&offset=220. You could probably find a few more running version zi3t2b if you look around. There are a few 750Ti cards running the zi3t2b App, mine http://setiathome.berkeley.edu/results.php?hostid=6906726&offset=220 and another, http://setiathome.berkeley.edu/results.php?hostid=7942417&offset=320. Even the 750s are putting your 960 to shame. sad... ;) |
Jeff Buck Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0 |
[quote]Meanwhile, other people's 960s running the current App are producing times not much different than the ones you have posted for your 780. Which means their 960s are literally wiping the floor with your 960. It's common practice to compare similar hardware with similar software before making conclusions, you might want to look at a few other 960s. My 960 would be an easy find, it seems to work pretty much the same with a couple different systems, currently it's running in the previous system, http://setiathome.berkeley.edu/results.php?hostid=6796479&offset=320. The slower times are being produced by a 950 running in a x4 PCIe slot, the other two cards are pretty close. Then there is Gianfranco's 960, https://setiathome.berkeley.edu/results.php?hostid=8215300&offset=220. You could probably find a few more running version zi3t2b if you look around. There are a few 750Ti cards running the zi3t2b App, mine http://setiathome.berkeley.edu/results.php?hostid=6906726&offset=220 and another, http://setiathome.berkeley.edu/results.php?hostid=7942417&offset=320. Even the 750s are putting your 960 to shame. sad... ;)[/quote]
Actually, what I'm in the process of doing is installing Ubuntu on my box with the four GTX 960s (2 onboard, 2 on risers), which I've already demonstrated perform about the same as this one in Windows. Then, when I do comparisons, I have what should be a solid baseline to start from. The OS is already installed, but it will still take a while to bring it up to date (happening now) and then get all the other bits and pieces in place before I install BOINC and S@h. I doubt I'll actually start running anything this evening, what with the outage tomorrow, but perhaps by tomorrow evening or Wednesday.
And again, I will ask two questions that I posed in earlier posts for which no answers were forthcoming:
1) Does it seem likely that Linux would have a problem with a GPU on a riser, when Windows does not?
2) I see in the output for that last example you provided that he's apparently using a "pfb" parameter, which I assume is the same as the "pfblockspersm" that I used to set in the mbcuda.cfg file when I was running Cuda in the pre-SoG days. Could something like that be causing such an improvement? |
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
[quote]And again, I will ask two questions that I posed in earlier posts for which no answers were forthcoming:[/quote]
I've given my suggestions, Numerous times. Go back and read the posts. I'm not going to keep repeating myself. Did you see any improvement when running pfblockspersm in Windows? I don't remember any, and the other people in Linux who are Not running that setting don't seem to be having any problems...do they? You Might see a second or two difference when trying that setting in the benchmark; my tests were all inconclusive when I tried those settings. So, I don't bother with something that produces negligible results. Clearly it won't produce a 200% difference, which is your problem. |
Jeff Buck Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0 |
Don't know why you're being so snippy when somebody's trying to help test your app. But if that's the way you choose to be, STUFF IT. |
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
[quote]Don't know why you're being so snippy when somebody's trying to help test your app. But if that's the way you choose to be, STUFF IT.[/quote]
Hey, I built an App just for your 780. It works well, doesn't it? I also identified the problem with your 960 within minutes and gave you suggestions multiple times. Yet you keep insisting there isn't a problem with your setup, and suggest Petri's App is the problem instead, while ignoring the results from other people's machines. If that's the way you're going to test things, then perhaps you should stop. It's annoying when people keep ignoring your suggestions and insist the problem is elsewhere. I at least try reasonable suggestions....most of the time. *nods head* |
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
. . I have a GT 730 which I would love to try with CUDA80, it would not be the fastest thing but I am confident it would take to it quite well. When I swing Bertie over to Linux I might stick it in the spare slot and cripple the 970s for a while just to see how it manages. Since it, like the 630, only has two CUs I would have to limit the unroll to 2 instead of 13 :( Stephen :) |
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
[quote]It's f*%*&%*)ing LINUX! It's designed to be difficult to figure out anything, at least for the average non-geek user. ;^)[/quote]
. . I know that feeling ... Stephen :) |
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
[quote]Even the 750s are putting your 960 to shame. sad... ;)[/quote]
. . I would have expected 960's to do close to the times my GTX1050ti is doing.
. . Halflings (VHAR) . . . 2 mins (28 / hour)
. . Normals . . . . . . . . . . . . . 4.5 mins (13 / hour)
. . Guppis . . . . . . . . . . . . . 7.5 mins (8 / hour)
. . But I am using an earlier version.
Stephen |
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
You should upgrade to the current zi3t2b and you might break 7 minutes on the blc2s. I think the difference between zi3k+ and the current build is about 30 seconds on the blc tasks and Fewer Inconclusive results. I can't get better than 7:45 on My GTX 960 with the blc2s but Gianfranco is running about 6.5 minutes on his 960 in Linux. I suppose that's the difference between OSX and Linux. My GTX 1050 is running just above 7 minutes on the blc2 tasks in Linux, Run time: 7 min 17 sec. There is a much greater improvement between zi+ and zi3t2b. Anyone with zi+, zi3k+, or anything else should upgrade to the current version. |
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
[quote]You should upgrade to the current zi3t2b and you might break 7 minutes on the blc2s. I think the difference between zi3k+ and the current build is about 30 seconds on the blc tasks and Fewer Inconclusive results.[/quote]
. . For me the $64,000 question is, does the upgrade run automagically or do I have to edit config files? Stephen ?? |
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
Try being a little more descriptive. Read this post https://setiathome.berkeley.edu/forum_thread.php?id=80636&postid=1863856
The Latest version zi3t2b with the Lowered vRam patch is now available at Crunchers Anonymous, http://www.arkayn.us/forum/index.php?topic=197.msg4499#msg4499
If the new app_info.xml has the same version number and plan class entries then all you have to do is add the new files. It's up to you if you want to remove the old files. |
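One way to check the "same version number and plan class" condition before copying files is to compare the two app_info.xml files directly. The Python sketch below is an illustration, not part of the app package: it assumes each `<app_version>` block carries `<app_name>`, `<version_num>` and `<plan_class>` child elements, which is how BOINC's anonymous-platform files are normally laid out, and the sample XML values are placeholders rather than the real entries shipped with zi3t2b.

```python
import xml.etree.ElementTree as ET


def version_keys(app_info_xml: str) -> set:
    """Collect (app_name, version_num, plan_class) for every <app_version>."""
    root = ET.fromstring(app_info_xml)
    keys = set()
    for app_version in root.iter("app_version"):
        keys.add((
            app_version.findtext("app_name", default="").strip(),
            app_version.findtext("version_num", default="").strip(),
            app_version.findtext("plan_class", default="").strip(),
        ))
    return keys


def same_versions(old_xml: str, new_xml: str) -> bool:
    """True when both files declare exactly the same app versions."""
    return version_keys(old_xml) == version_keys(new_xml)


# Placeholder documents; the names and numbers are hypothetical.
OLD = """<app_info>
  <app_version>
    <app_name>example_app</app_name>
    <version_num>100</version_num>
    <plan_class>example_plan</plan_class>
  </app_version>
</app_info>"""

NEW_SAME = OLD.replace("<app_info>", "<app_info><!-- new build -->")
NEW_CHANGED = OLD.replace("<version_num>100", "<version_num>101")

if __name__ == "__main__":
    print(same_versions(OLD, NEW_SAME))     # entries unchanged
    print(same_versions(OLD, NEW_CHANGED))  # version number differs
```

With real files, the two strings would come from reading the old and new app_info.xml from disk. If the sets differ, simply adding the new files is probably not enough, and it is worth asking in the thread before restarting BOINC.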
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
[quote]Try being a little more descriptive. Read this post https://setiathome.berkeley.edu/forum_thread.php?id=80636&postid=1863856[/quote]
. . I have d/l'd the file and extracted it. I will have a go after lunch. Stephen :) |
Jeff Buck Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0 |
For Petri or Jason, if you're still collecting examples of WUs for offline testing, where the Pulse reporting by the Special App is a bit odd, here's a new Inconclusive I spotted.
Workunit 2525653719 (blc02_2bit_guppi_57835_05245_HIP38647_0023.28046.409.23.46.204.vlar)
Task 5705555225 (S=1, A=0, P=4, T=0, G=0) x41p_zi3t2b, Cuda 6.50 special
Task 5705555226 (S=1, A=0, P=4, T=0, G=0) v8.22 (opencl_ati5_nocal) windows_intelx86
The Pulse counts for both hosts are the same, as are the reported Best Pulse. However, for the Special App, the Best Pulse doesn't match one of the previously reported Pulses, as follows:
Pulse: peak=4.645121, time=45.82, period=9.947, d_freq=2674792128.46, score=1.001, chirp=-14.771, fft_len=256
Spike: peak=24.31458, time=40.09, d_freq=2674791485.17, chirp=-20.315, fft_len=128k
Pulse: peak=0.2709029, time=45.82, period=0.09916, d_freq=2674792451.61, score=1.082, chirp=26.43, fft_len=64
Pulse: peak=3.874821, time=45.84, period=9.009, d_freq=2674796407.05, score=1.011, chirp=61.024, fft_len=512
Pulse: peak=3.920059, time=45.82, period=8.14, d_freq=2674798612.19, score=1.004, chirp=75.017, fft_len=256
...
Best spike: peak=24.31458, time=40.09, d_freq=2674791485.17, chirp=-20.315, fft_len=128k
Best autocorr: peak=16.87247, time=85.9, delay=2.4451, d_freq=2674793425.33, chirp=-24.528, fft_len=128k
Best gaussian: peak=0, mean=0, ChiSq=0, time=-2.124e+11, d_freq=0, score=-12, null_hyp=0, chirp=0, fft_len=0
Best pulse: peak=4.221204, time=45.84, period=8.998, d_freq=2674796407.05, score=1.101, chirp=61.024, fft_len=512
Best triplet: peak=0, time=-2.124e+11, period=0, d_freq=0, chirp=0, fft_len=0
In the corresponding task from an OpenCL app, those Pulses match:
Pulse: peak=4.645125, time=45.82, period=9.947, d_freq=2674792128.46, score=1.001, chirp=-14.771, fft_len=256
D: threshold 0.08758591; unscaled peak power: 0.08762821 exceeds threshold for 0.04829%
Spike: peak=24.31457, time=40.09, d_freq=2674791485.17, chirp=-20.315, fft_len=128k
Pulse: peak=0.270903, time=45.82, period=0.09916, d_freq=2674792451.61, score=1.082, chirp=26.43, fft_len=64
D: threshold 0.004868537; unscaled peak power: 0.004948388 exceeds threshold for 1.64%
Pulse: peak=4.221204, time=45.84, period=8.998, d_freq=2674796407.05, score=1.101, chirp=61.024, fft_len=512
D: threshold 0.1511975; unscaled peak power: 0.1633066 exceeds threshold for 8.009%
Pulse: peak=3.920061, time=45.82, period=8.14, d_freq=2674798612.19, score=1.004, chirp=75.017, fft_len=256
D: threshold 0.07585774; unscaled peak power: 0.07607242 exceeds threshold for 0.283%
Best spike: peak=24.31457, time=40.09, d_freq=2674791485.17, chirp=-20.315, fft_len=128k
Best autocorr: peak=16.87248, time=85.9, delay=2.4451, d_freq=2674793425.33, chirp=-24.528, fft_len=128k
Best gaussian: peak=0, mean=0, ChiSq=0, time=-2.124e+011, d_freq=0, score=-12, null_hyp=0, chirp=0, fft_len=0
Best pulse: peak=4.221204, time=45.84, period=8.998, d_freq=2674796407.05, score=1.101, chirp=61.024, fft_len=512
Best triplet: peak=0, time=-2.124e+011, period=0, d_freq=0, chirp=0, fft_len=0 |
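The check described in the post above (does the Best pulse line match one of the reported Pulse lines?) can be automated for anyone scanning their own Inconclusives. The Python sketch below is only an illustration: it assumes the stderr text keeps the "Label: key=value, ..." layout shown in the post and compares the printed values as strings, so a Best pulse that differs only in rounding would also be flagged. The function names are made up, and the sample text is trimmed to the Pulse and Best pulse lines from the post.

```python
import re


def parse_fields(line: str):
    """Split 'Label: k=v, k=v' into (lowercased label, {k: v}) with string values."""
    label, _, rest = line.partition(":")
    fields = dict(re.findall(r"(\w+)=([^,\s]+)", rest))
    return label.strip().lower(), fields


def best_pulse_is_reported(stderr_text: str) -> bool:
    """True when the 'Best pulse' line equals one of the 'Pulse' lines."""
    pulses, best = [], None
    for line in stderr_text.splitlines():
        if ":" not in line:
            continue
        label, fields = parse_fields(line)
        if label == "pulse":
            pulses.append(fields)
        elif label == "best pulse":
            best = fields
    return best is not None and best in pulses


# Pulse lines copied from the two tasks quoted in the post.
SPECIAL_APP = """
Pulse: peak=4.645121, time=45.82, period=9.947, d_freq=2674792128.46, score=1.001, chirp=-14.771, fft_len=256
Pulse: peak=0.2709029, time=45.82, period=0.09916, d_freq=2674792451.61, score=1.082, chirp=26.43, fft_len=64
Pulse: peak=3.874821, time=45.84, period=9.009, d_freq=2674796407.05, score=1.011, chirp=61.024, fft_len=512
Pulse: peak=3.920059, time=45.82, period=8.14, d_freq=2674798612.19, score=1.004, chirp=75.017, fft_len=256
Best pulse: peak=4.221204, time=45.84, period=8.998, d_freq=2674796407.05, score=1.101, chirp=61.024, fft_len=512
"""

OPENCL_APP = """
Pulse: peak=4.645125, time=45.82, period=9.947, d_freq=2674792128.46, score=1.001, chirp=-14.771, fft_len=256
Pulse: peak=0.270903, time=45.82, period=0.09916, d_freq=2674792451.61, score=1.082, chirp=26.43, fft_len=64
Pulse: peak=4.221204, time=45.84, period=8.998, d_freq=2674796407.05, score=1.101, chirp=61.024, fft_len=512
Pulse: peak=3.920061, time=45.82, period=8.14, d_freq=2674798612.19, score=1.004, chirp=75.017, fft_len=256
Best pulse: peak=4.221204, time=45.84, period=8.998, d_freq=2674796407.05, score=1.101, chirp=61.024, fft_len=512
"""

if __name__ == "__main__":
    print("Special App:", best_pulse_is_reported(SPECIAL_APP))
    print("OpenCL app: ", best_pulse_is_reported(OPENCL_APP))
```

On the two samples, the Special App output fails the check because its third Pulse (peak=3.874821, period=9.009, score=1.011) differs from the Best pulse at the same d_freq, while the OpenCL output passes.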
Gianfranco Lizzio Joined: 5 May 99 Posts: 39 Credit: 28,049,113 RAC: 87 |
@TBar Performance on OSX and Linux is almost the same. In Linux I get better times just because I overclock the card. The memory runs at 7700 MHz against the factory 7010 MHz and the graphics core runs at 1480 MHz. With this overclock the card runs at 50 degrees with the fans at 50%. I don't want to believe, I want to know! |
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
OK, thanks. That helps explain it. I had decided it had to be something about the memory. I have the 2 GB card and it's running out of memory in Sierra if used as the main display. I changed the wires around and use a 950 for the display and now I can run the benchmark test at unroll 8 on the 960 without the App crashing. I'm still afraid to try running it in BOINC at unroll 8. Maybe tomorrow. It doesn't seem to make much difference using 6 or 8 though. One thing I noticed about this version is it gives better results with the Known PulseFind bug. In my benchmark tests some of the tasks known to produce the Bad Best Pulse actually pass. The One task that always produced 2 Bad Pulses now only finds One bad Pulse. So, this version is definitely better than past versions with the Known issue that has existed since the unroll feature was added. That is one reason it produces fewer Inconclusives and why Everyone should Upgrade to this version. One step closer... |
Gianfranco Lizzio Joined: 5 May 99 Posts: 39 Credit: 28,049,113 RAC: 87 |
At base clock with blc02 the run time is 7 min 17 sec. I hope this can help you. I don't want to believe, I want to know! |
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
. . So there is nothing that overwrites apart from the app_info.xml? I will rename the old app_info.xml as a reference before copying the new files across but I want to be sure I am not going to scrap anything else that might come back to bite me. Will I need to restart BOINC or will it just pick up on the change when the next new task starts? Or should I execute a "read config files"? I know you said to "just add the new files" but I have ghosted full caches before ... :( Stephen ? |