Linux CUDA 'Special' App finally available, featuring Low CPU use

Message boards : Number crunching : Linux CUDA 'Special' App finally available, featuring Low CPU use
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1859494 - Posted: 5 Apr 2017, 2:04:49 UTC - in response to Message 1859455.  
Last modified: 5 Apr 2017, 2:06:20 UTC

under W7, bad WU with ATI ?

https://setiathome.berkeley.edu/workunit.php?wuid=2486259533

Hmmm, for this ATI card....Look at these items;
https://setiathome.berkeley.edu/result.php?resultid=5625786710
SSE2xj Win32 Build 3330 = This is an Old version with the Sanity Check. The Sanity check in the Old Apps can trigger with the Newer tasks;
ERROR: Possible wrong computation state on GPU, host needs reboot or maintenance
To fix this Update to a Newer App.
OpenCL 1.2 AMD-APP (923.1) = This is an Old driver. Your Capeverde would work better with a Newer Driver. For Win 7 I'd recommend 13.12, which seemed to work fine with my Capeverde.

Try that.
Kissagogo27
Joined: 6 Nov 99
Posts: 715
Credit: 8,032,827
RAC: 62
France
Message 1859607 - Posted: 5 Apr 2017, 15:32:34 UTC

ok, thanks for the answer ;)
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1863856 - Posted: 27 Apr 2017, 0:54:51 UTC

The Latest version zi3t2b with the Lowered vRam patch is now available at Crunchers Anonymous, http://www.arkayn.us/forum/index.php?topic=197.msg4499#msg4499
This version has the -unroll autotune setting which Automatically sets the Unroll to match your number of Compute Units. If you have a GPU with only One GB of vRam you must change the -unroll setting in the app_info.xml to 1 or 2 before running BOINC. GPUs with 2 GB of vRam Might be ok with using 7 or 8, it would be safe to use -unroll 6 until further tested. See the README_x41p_zi3t2b.txt file in the docs folder for other hints. This version has the Blocking Sync set as default, similar to the Older CUDA Apps. You can increase the CPU use by using the -poll command, similar to the old CUDA Apps.

As usual, if you have Existing Work Units you Must Change the app_info.xml version number and plan class to match the tasks assigned in the client_state.xml file to avoid creating Ghost tasks. If in doubt, finish the current tasks before making the change. This package has the AP and CPU App included with an appropriate app_info.xml. On my machines, the BLC tasks are a little faster and there are fewer Inconclusive results with this version.
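For reference, both settings mentioned above live in the <app_version> section of app_info.xml. A hypothetical fragment (the app name, version number, plan class, and unroll value here are placeholders; match yours to your card and to the entries in client_state.xml):

```xml
<app_version>
    <app_name>setiathome_v8</app_name>
    <!-- Version number and plan class must match the tasks already
         listed in client_state.xml, or they become Ghost tasks -->
    <version_num>800</version_num>
    <plan_class>cuda80</plan_class>
    <!-- 1 GB cards: set -unroll to 1 or 2; 2 GB cards: -unroll 6 is safe -->
    <cmdline>-unroll 2</cmdline>
</app_version>
```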
Stephen "Heretic"
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1864152 - Posted: 28 Apr 2017, 3:20:15 UTC

. . HI Guys,

. . @ TBar/Petri

. . My first invalid ...

http://setiathome.berkeley.edu/workunit.php?wuid=2519489735

. . Not sure if it is significant.

Stephen

??
petri33
Volunteer tester
Joined: 6 Jun 02
Posts: 1668
Credit: 623,086,772
RAC: 156
Finland
Message 1864159 - Posted: 28 Apr 2017, 3:39:08 UTC - in response to Message 1864152.  

. . HI Guys,

. . @ TBar/Petri

. . My first invalid ...

http://setiathome.berkeley.edu/workunit.php?wuid=2519489735

. . Not sure if it is significant.

Stephen

??


Many errors have been fixed since zi3k+.

Petri
To overcome Heisenbergs:
"You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones
Stephen "Heretic"
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1864200 - Posted: 28 Apr 2017, 6:28:08 UTC - in response to Message 1864159.  

. . HI Guys,

. . @ TBar/Petri

. . My first invalid ...

http://setiathome.berkeley.edu/workunit.php?wuid=2519489735

. . Not sure if it is significant.

Stephen

??


Many errors have been fixed since zi3k+.

Petri


. . OK. so it is time to update the app. Sigh, another chance to get things wrong ... :)

Stephen

??
Jeff Buck
Volunteer tester
Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1864468 - Posted: 29 Apr 2017, 4:26:55 UTC - in response to Message 1863856.  

Okay, I finally bit the bullet and decided to dip my toe into these waters. I installed Ubuntu on one of my machines yesterday, along with the latest zi3t2b from your link. After shooting myself in the foot a couple times (resulting in 25 error tasks), I seemed to get everything going well enough to let it run overnight and quite a few hours today.

Initial results looked encouraging but then the Invalids started appearing (3 so far) and, after looking through the Inconclusives, it appears that many more will also end up Invalid once the tiebreaker weighs in.

The host is 8253697. The problem seems to be that some of the tasks running on the GTX 780 are identifying massive numbers of Pulses that the wingmen are not finding. I don't have time this evening to dig further, but it appears that the GTX 960 is not having the same problem. (The GTX 670 has been excluded from processing following one of those initial foot-shooting episodes.) The GTX 780 also has, I think, close to 300 Validated tasks, as well, so it seems to be an intermittent problem. (I don't think temperature is a problem, either, as I used that Coolbits thingy to give me fan control and had the GTX 780 running at about 70C or less. I suppose there could have been temperature spikes that I never saw, but that seems unlikely.)
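For anyone else wanting fan control on Linux: the "Coolbits thingy" is an xorg.conf option read by the NVIDIA driver. A typical Device-section snippet (the Identifier is illustrative; the value 4 is the bit that enables manual fan control through nvidia-settings):

```
Section "Device"
    Identifier "Device0"
    Driver     "nvidia"
    Option     "Coolbits" "4"
EndSection
```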

Here are links to a couple of the Inconclusives that are still outstanding as of the moment.

Workunit 2520980022 (blc02_2bit_guppi_57835_02209_HIP27056_0015.13456.0.23.46.228.vlar)
Task 5695784772 (S=0, A=0, P=30, T=0, G=0) x41p_zi3t2b, Cuda 8.00 special
Task 5695784773 (S=0, A=0, P=4, T=3, G=0) v8.22 (opencl_nvidia_SoG) windows_intelx86

Workunit 2521153441 (blc02_2bit_guppi_57835_03239_HIP28267_0018.31617.818.24.47.178.vlar)
Task 5696150902 (S=4, A=2, P=23, T=1, G=0) x41p_zi3t2b, Cuda 8.00 special
Task 5696150903 (S=4, A=2, P=3, T=1, G=0) v8.22 (opencl_nvidia_SoG) windows_intelx86

Anyway, that's all I have time for this evening. I'll switch that box back over to Windows tonight and hopefully can look at it a little closer from this end tomorrow.
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1864470 - Posted: 29 Apr 2017, 6:02:30 UTC - in response to Message 1864468.  

...The problem seems to be that some of the tasks running on the GTX 780 are identifying massive numbers of Pulses that the wingmen are not finding. I don't have time this evening to dig further, but it appears that the GTX 960 is not having the same problem. (The GTX 670 has been excluded from processing following one of those initial foot-shooting episodes.) ...
It could be the 780 is not happy with CUDA 8. I've noticed a couple 780s that aren't very happy with CUDA 7.5. There aren't many around in the Top couple of thousand Hosts, and the one Host I know of is running the CUDA 6.0 App, https://setiathome.berkeley.edu/results.php?hostid=8177126&offset=220. I've been waiting for that Host to Upgrade to the CUDA 8 App to see what happens. I removed the CUDA 6.0 Special App because it wasn't compatible with CC 3.7, and all the CUDA 6.5 Apps are a bit slower than 7.5 & 8.0 on my machines. I suppose I could build a CUDA 6.5 Special App with the zi3t2b code and see how it works. Anyway you could try the old CUDA 6 Special App and see if it works on the 780? I'll see about building a new 6.5 App this weekend.
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1864583 - Posted: 29 Apr 2017, 17:17:29 UTC - in response to Message 1864468.  

Try the New CUDA 6.5 Special App, it's the same as the CUDA 8 App only compiled with the Older Toolkit. If you don't want to bother downloading the 6.5 Toolkit for the 6.5 Libraries, just download and use the CUDA 6.0 Libraries from SETI,
http://boinc2.ssl.berkeley.edu/beta/download/libcudart.so.6.0
http://boinc2.ssl.berkeley.edu/beta/download/libcufft.so.6.0
It looks as though there isn't much difference using the 6.0 libraries.
The 6.5 App appears to be around 10 to 30 seconds slower than 8.0 on my machine.
http://www.arkayn.us/forum/index.php?topic=197.msg4499#msg4499
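If you do swap in the 6.0 libraries, they also need to be declared in app_info.xml. A sketch of the four lines involved (the names must match the files placed in the setiathome.berkeley.edu folder):

```xml
<!-- Near the top of app_info.xml: declare each library -->
<file_info>
    <name>libcudart.so.6.0</name>
</file_info>
<file_info>
    <name>libcufft.so.6.0</name>
</file_info>
```

and inside the matching <app_version> block:

```xml
<file_ref>
    <file_name>libcudart.so.6.0</file_name>
</file_ref>
<file_ref>
    <file_name>libcufft.so.6.0</file_name>
</file_ref>
```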
Jeff Buck
Volunteer tester
Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1864585 - Posted: 29 Apr 2017, 17:39:03 UTC - in response to Message 1864470.  
Last modified: 29 Apr 2017, 17:40:39 UTC

...The problem seems to be that some of the tasks running on the GTX 780 are identifying massive numbers of Pulses that the wingmen are not finding. I don't have time this evening to dig further, but it appears that the GTX 960 is not having the same problem. (The GTX 670 has been excluded from processing following one of those initial foot-shooting episodes.) ...
It could be the 780 is not happy with CUDA 8. I've noticed a couple 780s that aren't very happy with CUDA 7.5. There aren't many around in the Top couple of thousand Hosts, and the one Host I know of is running the CUDA 6.0 App, https://setiathome.berkeley.edu/results.php?hostid=8177126&offset=220. I've been waiting for that Host to Upgrade to the CUDA 8 App to see what happens. I removed the CUDA 6.0 Special App because it wasn't compatible with CC 3.7, and all the CUDA 6.5 Apps are a bit slower than 7.5 & 8.0 on my machines. I suppose I could build a CUDA 6.5 Special App with the zi3t2b code and see how it works. Anyway you could try the old CUDA 6 Special App and see if it works on the 780? I'll see about building a new 6.5 App this weekend.
The Invalid count has reached 18 as of this writing. All have come from the GTX 780, none from the GTX 960. All but one are VLARs, coming from the blc02_2bit_guppi_57835... series. However, there is the one exception, which comes from a WU with a normal AR (0.404448), 22mr08ad.26086.24612.15.42.187.

There are currently 14 more tasks with similar characteristics in the Inconclusive pile. Of those, all are from the GTX 780 and all are VLARs.

It appears that the "extra" Pulses for each WU always appear in a fairly narrow range of peaks, with identical "time", "period", "chirp", and "fft_len" values, though these may vary slightly from one WU to another. I have no idea what the significance of that is.

Looking at a small sample of the Valid tasks from that host, I do also see several guppi VLARs successfully processed by the GTX 780, so just being a VLAR is not the only factor causing the problem.

Perhaps later today, or this evening, I can try your suggestion of switching to the Cuda 6 special app, assuming it's still available for download somewhere. In the meantime, that box will continue to run Windows.

EDIT: Oops! I see you posted while I was researching and composing. Okay, I'll try the 6.5 later on, when I can get to it. Thanks.
Jeff Buck
Volunteer tester
Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1864647 - Posted: 30 Apr 2017, 1:09:45 UTC - in response to Message 1864583.  

Try the New CUDA 6.5 Special App, it's the same as the CUDA 8 App only compiled with the Older Toolkit. If you don't want to bother downloading the 6.5 Toolkit for the 6.5 Libraries, just download and use the CUDA 6.0 Libraries from SETI,
http://boinc2.ssl.berkeley.edu/beta/download/libcudart.so.6.0
http://boinc2.ssl.berkeley.edu/beta/download/libcufft.so.6.0
It looks as though there isn't much difference using the 6.0 libraries.
The 6.5 App appears to be around 10 to 30 seconds slower than 8.0 on my machine.
http://www.arkayn.us/forum/index.php?topic=197.msg4499#msg4499
Okay, finally up and running again on Linux, this time with your new 6.5 app. Things didn't exactly go smoothly, though only partly my fault this time, I think.

I overlooked the fact that the Plan Class had changed in the new app_info.xml and thus ghosted the 124 Cuda80 tasks that were in the queue. My bad. I should be able to retrieve those later on, though, so not a long-term problem.

On the other hand, trying to use the Cuda 6.0 libraries failed miserably. The app, as compiled, seems to require the 6.5 libraries, as the first 9 new tasks downloaded crapped out faster than I could switch to the Project tab and click "Suspend". The error message was:
../../projects/setiathome.berkeley.edu/setiathome_x41p_zi3t2b_x86_64-pc-linux-gnu_cuda65: error while loading shared libraries: libcudart.so.6.5: cannot open shared object file: No such file or directory

So, I downloaded the 6.5 Toolkit (which takes a while), and then installed the libraries from there. Even that took 2 attempts, as the first one crapped out with "Installation Failed. Using unsupported compiler." (It seems that "--override" has to be added to the command line to overcome that shortcoming, although why the installer couldn't have simply asked if I wanted to override the compiler instead of failing and sending me back to square one is beyond me. Oh, wait..........it's f@#$%#$ing LINUX!)

Anyway, it's running now and has returned a handful of what appear to be satisfactory results, but it's really too early to tell. I'll check back in a couple hours, but it will probably have to wait until tomorrow morning to see if Inconclusives/Invalids start showing up again once the wingmen start reporting. (Again, the host is 8253697.)
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1864698 - Posted: 30 Apr 2017, 5:31:07 UTC - in response to Message 1864647.  
Last modified: 30 Apr 2017, 5:38:27 UTC

Yes, it does look as though the 780 is working nearly normally now; I'm not sure about the 960 though. I have one of those and I know it's kinda slow, comparatively speaking, but that seems a bit slower than expected. Almost as though it were on an x1 extender.... Now if you could just replace that useless 670 ;-) Even a 750Ti should net around 19k with the Special App.

I've been able to change between libraries without any trouble. I just make sure the libraries are in the setiathome.berkeley.edu folder and named in the four lines of the app_info.xml. I run a few path and ld_library lines before compiling, naming a few locations, and I've never had trouble running the Apps in the benchmark folder without setting any more paths first. I suppose it could be different on other machines; I do have libraries scattered all around my machines. It would be nice if the SETI server had half as many on their machine ;)

I've run the CUDA 6.5 App on my machine for almost a day, and it seems to be working normally. Back to the 8.0 App for now.
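The "path and ld_library lines" amount to something like this (a sketch; /usr/local/cuda-6.5 is an assumed install prefix, so point it at wherever your toolkit actually landed):

```shell
# Assumed toolkit location -- adjust to your install prefix.
export PATH=/usr/local/cuda-6.5/bin:$PATH
# Lets the apps resolve libcudart/libcufft without copying
# the libraries next to the binary.
export LD_LIBRARY_PATH=/usr/local/cuda-6.5/lib64:$LD_LIBRARY_PATH
```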
Jeff Buck
Volunteer tester
Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1864835 - Posted: 30 Apr 2017, 18:17:09 UTC - in response to Message 1864698.  

Yes, it does look as though the 780 is working nearly normal now, I'm not sure about the 960 though. I have one of those and I know it's kinda slow, comparatively speaking, but that seems a bit slower than expected. Almost as though it were on a x1 extender.... Now if you could just replace that useless 670 ;-) Even a 750Ti should net around 19k with the Special App. I've been able to change between libraries without any trouble. I just make sure the libraries are in the setiathome.berkeley.edu folder and named in the four lines of the app_info.xml. I run a few path and ld_library lines before compiling, naming a few locations, and I've never even had trouble running the Apps in the benchmark folder without running any more paths first. I suppose it could be different on other machines, I do have libraries scattered all around my machines. It would be nice if the SETI server had half as many on their machine ;)

I've run the CUDA 6.5 App on my machine for almost a day, and it seems to be working normally. Back to the 8.0 App for now.
Agreed. The 6.5 App doesn't seem to have the same clustered Pulse problem that the 8.0 App was showing on the 780.

I haven't gotten to the point yet of comparing the performance of the two cards vs. the SoG app in Windows. Perhaps this evening I'll get to that. The 960 is indeed on a riser cable, but from an x8/x4 slot rather than x1. According to the NVIDIA info display, the Memory Transfer Rate is the same for both cards, and both are running just slightly under their maximum clock rates, with the 780 showing at the P3 level and the 960 at P2 (with the same clock range as P3). I suppose the fact that the monitor is connected to the 960 could cause a slight performance degradation, but I don't really know what, if any, the impact might be.

I don't know if I want to consider replacing the 670 or not. It's certainly a productive workhorse on the Windows side and it seems a shame that the Special App can't accommodate the GTX 6xx series of cards. I've still got a GTX 660 in one of my other crunch-only hosts (along with two 750Ti's and a 960). I'll probably evaluate the whole situation after running with just the 2 cards on the Special App for a few weeks and see if the added throughput on those manage to overcome the loss of the 670. If not, then I'd probably go back to running SoG in Windows, rather than replace the 670.
Brent Norman
Volunteer tester
Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1864840 - Posted: 30 Apr 2017, 19:03:54 UTC - in response to Message 1864835.  

One could exclude the 670 in cc_config and create a second BOINC directory to run a stock app (or Lunatics) for just that card, without the 'sauce'. The card would then not sit idle.

More to manage, but workable.
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1864842 - Posted: 30 Apr 2017, 19:12:40 UTC - in response to Message 1864835.  

I don't know if I want to consider replacing the 670 or not... I've still got a GTX 660 in one of my other crunch-only hosts (along with two 750Ti's and a 960).
Yes, I noticed the other hopeless 600 card with the 750s. I was thinking a swap, perhaps placing the two hopeless cards together and moving a 750 or 960 to the machine running Linux to replace the 670. I'd move the 960 to a slot and place a 750 on the riser since it's the slower card and risers always result in a penalty. Just a comparison with my 960 running in a x16 slot;
Semi-shorty;
mine: AR = 1.145172 - Run time: 2 min 34 sec http://setiathome.berkeley.edu/result.php?resultid=5701742533
yours: AR= 1.145661 - Run time: 7 min 13 sec https://setiathome.berkeley.edu/result.php?resultid=5701906252
BLC02;
mine: Run time: 7 min 46 sec http://setiathome.berkeley.edu/result.php?resultid=5701443799
yours: Run time: 13 min 27 sec https://setiathome.berkeley.edu/result.php?resultid=5701967278
Something seriously wrong there.

With three working cards in that machine it would probably be in the 60-70k range. Mine are in that range with basically three cheap x50 series cards.
The 960 doesn't count, my 950s run almost as fast. Even when running the 960 at unroll 8 it wasn't any better.
petri33
Volunteer tester
Joined: 6 Jun 02
Posts: 1668
Credit: 623,086,772
RAC: 156
Finland
Message 1864860 - Posted: 30 Apr 2017, 21:04:20 UTC

Hi fellows,
The 780 is doing admirably well. It has 12 cores.

All new cards (750+) do good too.

I had a sad feeling with my 980 series. They did not perform that well. The 780 outperformed the 980 on shorties when benchmarking. The wattage of the 780 is justified anyway. It can do. I never had a chance to try with a 780Ti. I guess it would have done well. On lower AR (like vlar) the 980 did well.

a) you need a lot of GPU cores (SM/SMX) -- this is where my optimisation is aimed. ATI/AMD cards have 64 of those. My Ti has 28.
b) any new GPU can do more instructions per clock (Maxwell/Pascal).
c) fast GPU RAM helps. -- Lower GPU clock, higher RAM? Temp vs. speed.

--
Petri
To overcome Heisenbergs:
"You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones
Jeff Buck
Volunteer tester
Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1864869 - Posted: 30 Apr 2017, 21:57:25 UTC - in response to Message 1864842.  

Yes, I noticed the other hopeless 600 card with the 750s. I was thinking a swap, perhaps placing the two hopeless cards together and moving a 750 or 960 to the machine running Linux to replace the 670. I'd move the 960 to a slot and place a 750 on the riser since it's the slower card and risers always result in a penalty. Just a comparison with my 960 running in a x16 slot;
Semi-shorty;
mine: AR = 1.145172 - Run time: 2 min 34 sec http://setiathome.berkeley.edu/result.php?resultid=5701742533
yours: AR= 1.145661 - Run time: 7 min 13 sec https://setiathome.berkeley.edu/result.php?resultid=5701906252
BLC02;
mine: Run time: 7 min 46 sec http://setiathome.berkeley.edu/result.php?resultid=5701443799
yours: Run time: 13 min 27 sec https://setiathome.berkeley.edu/result.php?resultid=5701967278
Something seriously wrong there.

With three working cards in that machine it would probably be in the 60-70k range. Mine are in that range with basically three cheap x50 series cards.
The 960 doesn't count, my 950s run almost as fast. Even when running the 960 at unroll 8 it wasn't any better.
Unless I switch that other box over to 64-bit Linux, I think I'll leave the GPU configuration alone for now. Under 32-bit Win7, I seem to be stretching the RAM limits due to some video RAM mapping that I've never been able to sort out. It's running right on the ragged edge with the current configuration. I also have the two 750Ti's running outside the case, partly because they don't require any extra power connections. Swapping one for the 670 would complicate that arrangement (though the 850W PSU should handle it adequately).

I don't know what might be inhibiting the 960. If I get some time this evening, I'll try to do some performance comparisons, both with the Windows SoG tasks that it's been running, as well as the performance of the 960s in two of my other boxes. Perhaps something will stand out......or perhaps not.

@Brent - I'm already ignoring the 670 w/ cc_config, but whether I want to add complexity by doubling up on BOINC is something I'd have to think long and hard about. Managing a "simple" Linux system is already way more of a PIA than I would normally care to deal with. ;^) Perhaps once the Windows version of the Special App hits prime time I might look into it.
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13727
Credit: 208,696,464
RAC: 304
Australia
Message 1864872 - Posted: 30 Apr 2017, 22:10:29 UTC - in response to Message 1864869.  
Last modified: 30 Apr 2017, 22:15:31 UTC

Under 32-bit Win7, I seem to be stretching the RAM limits due to some video RAM mapping that I've never been able to sort out. It's running right on the ragged edge with the current configuration.

I have a 32bit Win Vista system with a C2D and 2* GTX 750Tis. I am unable to run the SoG application on it due to video driver restarts. The lack of available physical RAM (due to the 32bit OS), slow CPU clock speed & limited number of cores just make it impossible for the system to meet the demands of even those low power cards.

I don't know what might be inhibiting the 960.

CPU clock speed?
The faster the GPU crunches, the more CPU support it needs. Even with a CPU reserved per GPU WU, if the CPU is too slow to keep up, then the GPU output will be significantly impacted.
I can't see PCIe bandwidth having that great an impact on crunching times unless the systems are only PCIe v1 specification. Even then, such a big performance hit seems unlikely... a 50% hit, maybe. But 2-3 times?
Grant
Darwin NT
Jeff Buck
Volunteer tester
Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1864874 - Posted: 30 Apr 2017, 22:13:07 UTC - in response to Message 1864860.  

Hi fellows,
The 780 is doing admirably well. It has 12 cores.
--
Petri
The problem I ran into seemed to be some sort of incompatibility between the 780, TBar's Cuda 8.0 version of the Special App, and a small percentage of processed tasks (all but one being guppi VLARs). It did not occur with my GTX 960 using Cuda 8.0 and it is not occurring now running the Cuda 6.5 version on the 780.

I detailed the problem in my Message 1864468 and my followup in Message 1864585. You can still see examples of the Stderr output in any of the Invalid tasks for host 8253697 and in most of the Inconclusive tasks for that host that were reported on 28 Apr.

You'll see a cluster of Pulses not found by the wingmen which are always similar to these:

Pulse: peak=1.968891, time=45.86, period=2.947, d_freq=2594346105.39, score=1.005, chirp=47.314, fft_len=1024 
Pulse: peak=1.115416, time=45.84, period=0.2801, d_freq=2594341360.16, score=2.843, chirp=-48.632, fft_len=512 
Pulse: peak=0.9445668, time=45.84, period=0.2801, d_freq=2594341382.51, score=2.408, chirp=-48.632, fft_len=512 
Pulse: peak=1.003587, time=45.84, period=0.2801, d_freq=2594341404.87, score=2.558, chirp=-48.632, fft_len=512 
Pulse: peak=1.065494, time=45.84, period=0.2801, d_freq=2594341427.22, score=2.716, chirp=-48.632, fft_len=512 
Pulse: peak=0.9986159, time=45.84, period=0.2801, d_freq=2594341449.57, score=2.545, chirp=-48.632, fft_len=512 
Pulse: peak=0.9415617, time=45.84, period=0.2801, d_freq=2594341471.92, score=2.4, chirp=-48.632, fft_len=512 
Pulse: peak=1.040676, time=45.84, period=0.2801, d_freq=2594341494.27, score=2.653, chirp=-48.632, fft_len=512 
Pulse: peak=1.040215, time=45.84, period=0.2801, d_freq=2594341516.62, score=2.651, chirp=-48.632, fft_len=512 
Pulse: peak=0.9044526, time=45.84, period=0.2801, d_freq=2594341538.98, score=2.305, chirp=-48.632, fft_len=512 
Pulse: peak=0.9381055, time=45.84, period=0.2801, d_freq=2594341561.33, score=2.391, chirp=-48.632, fft_len=512 
Pulse: peak=1.030072, time=45.84, period=0.2801, d_freq=2594341583.68, score=2.626, chirp=-48.632, fft_len=512 
Pulse: peak=0.920841, time=45.84, period=0.2801, d_freq=2594341606.03, score=2.347, chirp=-48.632, fft_len=512 
Pulse: peak=0.9513928, time=45.84, period=0.2801, d_freq=2594341628.38, score=2.425, chirp=-48.632, fft_len=512 
Pulse: peak=1.062582, time=45.84, period=0.2801, d_freq=2594341650.73, score=2.708, chirp=-48.632, fft_len=512 
Pulse: peak=0.9836082, time=45.84, period=0.2801, d_freq=2594341673.09, score=2.507, chirp=-48.632, fft_len=512 
Pulse: peak=0.9944587, time=45.84, period=0.2801, d_freq=2594341695.44, score=2.535, chirp=-48.632, fft_len=512 
Pulse: peak=0.9929603, time=45.84, period=0.2801, d_freq=2594341717.79, score=2.531, chirp=-48.632, fft_len=512 
Pulse: peak=1.162578, time=45.84, period=0.2801, d_freq=2594341740.14, score=2.963, chirp=-48.632, fft_len=512 
Pulse: peak=0.9750736, time=45.84, period=0.2801, d_freq=2594341762.49, score=2.485, chirp=-48.632, fft_len=512

It would be interesting to see if anybody else with a 780 experiences the same issue with the Cuda 8.0 app or if this is somehow unique to my setup.
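Grouping a Stderr dump by (chirp, fft_len) makes clusters like the one above jump out. A quick sketch with standard tools (the three sample lines are copied from the dump above; in practice, save the task page's Stderr output as stderr.txt instead of the here-doc):

```shell
# Sample Stderr lines (stand-in for a real dump saved as stderr.txt).
cat > stderr.txt <<'EOF'
Pulse: peak=1.968891, time=45.86, period=2.947, d_freq=2594346105.39, score=1.005, chirp=47.314, fft_len=1024
Pulse: peak=1.115416, time=45.84, period=0.2801, d_freq=2594341360.16, score=2.843, chirp=-48.632, fft_len=512
Pulse: peak=0.9445668, time=45.84, period=0.2801, d_freq=2594341382.51, score=2.408, chirp=-48.632, fft_len=512
EOF
# Count reported Pulses per (chirp, fft_len) pair; a large count
# concentrated at one pair is the cluster pattern described above.
grep '^Pulse:' stderr.txt \
  | sed -E 's/.*chirp=([-0-9.]+), fft_len=([0-9]+).*/\1 \2/' \
  | sort | uniq -c | sort -rn
```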
Brent Norman
Volunteer tester
Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1864875 - Posted: 30 Apr 2017, 22:22:53 UTC - in response to Message 1864872.  

That core2duo would be a great candidate for Linux. Drop the CPU tasks and get 30-32k out of those 750Ti's :D
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.