Posts by petri33


1) Message boards : Number crunching : CUDA Toolkit 8.0 Available for Developers (Message 1795815)
Posted 13 days ago by petri33 (Project Donor)
You have to define the word 'work'.

You could tell a human being to dig a hole or build a house, and there are enough pre-assimilated cultural reference points that what you get will probably bear some resemblance to what you imagined in the first place. Tell a human being to build an airplane (and nothing else): what are the chances that it will pass CAA certification and that you would be willing to let it take you to your next holiday destination?

We are giving our computers the task of finding 'interesting' signals in a slice from a radio-frequency recording. Maybe some computers are capable of 'understanding' what human beings would deem to be an 'interesting' signal (or, indeed, what another computer would deem to be an interesting signal - which may or may not be the same thing). But I don't think the one on my desk can do that yet.

Forty years ago, I set out to write "A program to detect, tabulate and/or plot the 'interesting' parts of a function". That effort resulted in a deck of punched cards, a typed report, and a Diploma in Computer Science. The first two chapters of the report are devoted to consideration of what might be 'interesting', and how to present a meaningful (and digestible) response back to the provider of the function under investigation. But let's be honest: the output of the resulting program only expressed what was interesting to me, the programmer. Computers forty years ago hadn't yet become able to intuit answers to questions like "Are there any interesting signals in this data?", and I don't think asking Siri via your iPhone would get you a more useful answer today.

So, how are you going to 'give a computer a task'? I presume at the very least a high-level meta-specification of what a task is - in a machine-independent format and language, of course. Internet access to computer-archivists of libraries of known signal-processing algorithms?

Or are we still living in the era of pre-written machine code being used as the meta-specification of the expected result? In other words, "an 'interesting' signal is a signal found by one of these pre-coded (and pre-tested) algorithms" - as they would have been by my dissertation-piece forty years ago.

In that case, behind all the fancy language, all you're suggesting is that we move from "we need a hole here: this is your spade" to "we need a hole here: there's a toolshed over there, help yourself to whatever tool you prefer". It's progress, but it isn't fundamentally different.


An OpenCL (or CUDA) task can have an 'intelligent subcontractor manager' in the computer to distribute the work to the best available resource. It is a deal: this work, you do, within a time limit. Just do. A job can come with a suggestion of how to do it and some negotiation about whom to give it to, but it needs to be done.

A task is just a bunch of bytes.
2) Message boards : Number crunching : CUDA Toolkit 8.0 Available for Developers (Message 1795784)
Posted 13 days ago by petri33 (Project Donor)
A computer is a resource. Give it a task. It will finish processing (hopefully, or be aborted after a time limit).
There is no need to know how the task at hand will be processed. Give a computer a queue of work, and then give it some more so that it does not starve.

With future HW going to have more processors (CPU cores/units) and more GPUs, one work item can be distributed to all available resources. There is no need for N concurrent applications. One serially launched application that processes the one task in the best possible way, taking the user experience into consideration (no lag), will utilize all GPUs and CPUs. (My vision; I have not started sharing FFTs across multiple GPUs yet, but see the sketch below.)
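A minimal sketch of what that FFT sharing might look like, assuming cuFFT; the function and buffer names are invented for illustration and this is not code from any actual SETI@home build:

#include <cuda_runtime.h>
#include <cufft.h>
#include <vector>

// Split one batch of 1D FFTs evenly across every visible GPU.
void fft_all_gpus(const cufftComplex* host_in, cufftComplex* host_out,
                  int n_fft, int batch)
{
    int n_dev = 0;
    cudaGetDeviceCount(&n_dev);
    int per_dev = batch / n_dev;                 // assume it divides evenly

    std::vector<cufftHandle>   plans(n_dev);
    std::vector<cufftComplex*> bufs(n_dev);

    for (int d = 0; d < n_dev; ++d) {            // launch on each device
        cudaSetDevice(d);
        size_t bytes = (size_t)per_dev * n_fft * sizeof(cufftComplex);
        cudaMalloc((void**)&bufs[d], bytes);
        cudaMemcpy(bufs[d], host_in + (size_t)d * per_dev * n_fft,
                   bytes, cudaMemcpyHostToDevice);
        cufftPlan1d(&plans[d], n_fft, CUFFT_C2C, per_dev);
        cufftExecC2C(plans[d], bufs[d], bufs[d], CUFFT_FORWARD);
    }
    for (int d = 0; d < n_dev; ++d) {            // gather the results
        cudaSetDevice(d);
        cudaMemcpy(host_out + (size_t)d * per_dev * n_fft, bufs[d],
                   (size_t)per_dev * n_fft * sizeof(cufftComplex),
                   cudaMemcpyDeviceToHost);
        cufftDestroy(plans[d]);
        cudaFree(bufs[d]);
    }
}

The blocking copies in the second loop implicitly wait for each device's FFT to finish, so this simple form needs no explicit synchronisation.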

Interproject sharing of resources would happen at the (one-)task level. Do N of those, then M of those, and then ....
3) Message boards : Number crunching : CUDA Toolkit 8.0 Available for Developers (Message 1795780)
Posted 13 days ago by petri33 (Project Donor)
Whether or not Boinc clients will know what to do with simplified truly heterogeneous apps is another question.

What would (ideally) BOINC do except 'launch and wait', as now?


In a 'truly' (note the qualification) heterogeneous environment, the client should not care (or need to know) whether the task is processed on a CPU, multiple threads, a GPU, multiple GPUs, multiple hosts via MPI, FPGAs, DSPs, or a room full of monkeys with abacuses, and/or whether there are dynamically changing conditions during the run. The estimate mechanisms (and so scheduler and client app control) in particular are prone to upset (i.e. are unstable) when hardware changes occur (along with other 'used-to-be-weird' situations that are becoming more normal).


Thank You Jason,

That reminds me of the days some 30 years back, when I laughed at someone saying that the IP protocol allows for a taxi-load of tapes to be delivered and still gives a nice, steady bits/second figure. I know, I laughed, and I was wrong. I was young. Everything that was said was true: the IP protocol allows for that. Send a taxi full of hard drives of modern capacity and no optical or any other transfer means can surpass it. The acknowledgement of a successful transfer comes back with the taxi. The responsiveness of that system is not too good, a.k.a. low latency is a dream.
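(To put rough numbers on it; these figures are illustrative, not from the original anecdote: a taxi carrying 100 hard drives of 8 TB each moves 800 TB, i.e. about 6.4 x 10^15 bits. If the trip takes an hour, the sustained throughput is 6.4 x 10^15 bits / 3600 s, roughly 1.8 Tbit/s, beyond any 2016-era network link. The latency, though, is the taxi's two-hour round trip.)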

We're going to a new world.
4) Message boards : Number crunching : OpenCL NV MultiBeam v8 SoG edition for Windows (Message 1795776)
Posted 13 days ago by petri33 (Project Donor)
.

The slowdown at low ar comes from having only one full PoT to process.
That makes all the work go to one SM/SMX unit.

More precisely, as I explained in a post on Lunatics (sadly down at the moment), it is 8 independent arrays to search: 8 PoT arrays. Depending on the workgroup size you choose, all of them can indeed go to only a single CU (in OpenCL terms), which corresponds to an SM/SMX in NV/CUDA terms. One can artificially distribute them to 8 different CUs with appropriate workgroup limits, but of course this will underload each of the CUs.
Another way is to unroll some of the periods to provide more data to process in parallel.


My errors probably come from calculating the average from an artificially shortened PoT. A pre-calculated avg is something I'll try tomorrow or in the next few days,

I do avg pre-calculation in the Triplet search. It looked like a good idea at the time. Currently, if full signal-search decoupling is needed, this adds another dependence to get rid of. But since it's not the only obstacle to a full PoT on the GPU, I haven't touched it yet.
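A minimal sketch of that pre-calculation, assuming Thrust is available; precalc_avg and pulse_find_kernel are placeholder names, not functions from either code base:

#include <thrust/device_ptr.h>
#include <thrust/reduce.h>

// Compute the full-PoT mean once on the GPU and pass it to the search
// kernel as a plain parameter, instead of re-deriving it inside the
// kernel from an artificially shortened PoT.
float precalc_avg(const float* d_pot, int potLen)
{
    thrust::device_ptr<const float> p(d_pot);
    float sum = thrust::reduce(p, p + potLen, 0.0f);
    return sum / potLen;
}

// hypothetical usage:
//   float avg = precalc_avg(d_pot, potLen);
//   pulse_find_kernel<<<grid, block>>>(d_pot, potLen, avg);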


then getting rid of the loop going from LastP to FirstP and replacing it with grid.z and a parameter.

Not sure it's really possible. Such an unroll comes with memory costs for the arrays it has to hold.
Initially I did a fixed-size x32 unroll over periods. Currently it is configurable, but still much less than the total number of periods. The maximum total number of periods can be estimated as 2/3 * (1024*1024/8). One needs a corresponding amount of memory to hold that number of separate (though slightly shortened on the first iteration) arrays.
Maybe doable with 4 GB GPUs? Worth calculating.


Earlier I tried 8 streams and other things to make more of the work run in parallel. It is hard to keep track of the found results and to report them in the same order as the CPU version does. You have figured out a way to do that with SoG!!

A few queues (again, in OpenCL terms; they correspond to CUDA streams) per single PoT search looks like an increase in overhead. One could try a few PoT searches in separate queues (not too big a memory-footprint increase; partially implemented) or even a few icfft iterations simultaneously (that would be quite a big rework of the existing code and, unfortunately, a sharp increase in memory footprint).
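A rough sketch of the "few PoT searches in separate queues" variant, in CUDA stream terms since that is what the special build uses; pot_search_kernel and the buffers are placeholders:

void run_pot_searches(float** d_pot, float** d_result, int numSearches,
                      int len, dim3 grid, dim3 block)
{
    const int N_STREAMS = 4;              // a few searches in flight at once
    cudaStream_t streams[N_STREAMS];
    for (int i = 0; i < N_STREAMS; ++i)
        cudaStreamCreate(&streams[i]);

    for (int s = 0; s < numSearches; ++s) {
        // one whole PoT search per queue, not several queues per search
        cudaStream_t st = streams[s % N_STREAMS];
        pot_search_kernel<<<grid, block, 0, st>>>(d_pot[s], d_result[s], len);
    }
    cudaDeviceSynchronize();              // gather everything before reporting
    for (int i = 0; i < N_STREAMS; ++i)
        cudaStreamDestroy(streams[i]);
}

The memory footprint grows only by the per-search result buffers, which matches the "not too big an increase" above.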
Regarding the particular signal order: for a non-overflowed task it is irrelevant, as long as there are no false positives/negatives. For an overflowed task it constitutes a real issue. Ironically, those are the "noisy" ones that will need separate treatment at the post-processing stage anyway. I decided to sacrifice absolute signal ordering and just try to keep the differences as small as possible, to reduce the number of mismatched overflows. Also, the ordering issue seems to have existed even in the original CUDA code (though to quite a small degree). So I would recommend concentrating on false positives/negatives more than on signal ordering.
EDIT: BTW, this is quite different from the AstroPulse situation, where signals (for example, in the FFA) are updated by a proximity-establishing algorithm. There, if such signal updates come in the wrong order, even a non-overflow task will have wrong final signals. It took a lot of time and some tricks in the code to keep all the signals found in parallel in order. One of the released builds even grew to some huge memory sizes because of this.


Thank You for a detailed explanation. I'll read it again and again, at least three times or until I get everything that is in it. Thank You.
5) Message boards : Number crunching : For those interested in electic bill and power consumption of GTX1080 (Message 1795511)
Posted 14 days ago by petri33 (Project Donor)
GPU at 1825 MHz for the 1080
and
at 1300 MHz for the 980's.

Good to see.

And (roughly) how many WU/s per hour are they processing?


The first number is for the GTX 1080, the latter for a 980.
High ar: 46-67 seconds per task (78 to 53 per hour).
Mid ar: 132-209 seconds per task (27 to 17 per hour).
Very low ar: well under 500 seconds per task (approx. 215-240 sec up to 360 sec; 17-12 guppi per hour).

So the avg depends on the current Arecibo/guppi/whatever ratio.

And an answer in advance to the next question:
APs see a 25% speedup on the GTX 1080 compared to the GTX 980. I have not had time to optimize the code for the GTX 1080 yet.
6) Message boards : Number crunching : For those interested in electic bill and power consumption of GTX1080 (Message 1795509)
Posted 14 days ago by petri33 (Project Donor)
And the software is heavily modified, so the numbers can be compared only against each other, not against the current official software. I'm running my own builds and modifications.
7) Message boards : Number crunching : For those interested in electic bill and power consumption of GTX1080 (Message 1795508)
Posted 14 days ago by petri33 (Project Donor)
For those interested in electic bill and power consumption of GTX1080

Thanks for that.

20 W less for guppies, 10 W less for non-guppies; although I notice the GPU utilisation on the GTX 1080 is slightly lower than on the other cards, the maximum power rating for that card is still less than the maximum for the GTX 980s.

All are running at their boost frequency?
And how many WU/hr for the GTX 1080 v GTX 980?


GPU at 1825 MHz for the 1080
and
at 1300 MHz for the 980's.

Mem clocks 7210 MHz for the 980's and 10010 MHz for the 1080.

Some boost set up here and there but not too much.
8) Message boards : Number crunching : OpenCL NV MultiBeam v8 SoG edition for Windows (Message 1795504)
Posted 14 days ago by petri33 (Project Donor)


or something else, since I'm not running OpenCL.

Since SoG is a modification of the OpenCL build, obviously something else.

Did you consider putting that build on beta?


It is still an alpha, but I have sent code to JasonG and TBar for more serious testing.

The slowdown at low ar comes from having only one full PoT to process. That makes all the work go to one SM/SMX unit. My errors probably come from calculating the average from an artificially shortened PoT. A pre-calculated avg is something I'll try tomorrow or in the next few days, then getting rid of the loop going from LastP to FirstP and replacing it with grid.z and a parameter. Earlier I tried 8 streams and other things to make more of the work run in parallel. It is hard to keep track of the found results and to report them in the same order as the CPU version does. You have figured out a way to do that with SoG!!
9) Message boards : Number crunching : OpenCL NV MultiBeam v8 SoG edition for Windows (Message 1795499)
Posted 14 days ago by petri33 (Project Donor)
What is interesting in this respect is the behavior of SoG/non-SoG on the Linux platform:
Linux/x86_64 8.10 (opencl_nvidia_sah) 18 May 2016, 1:10:51 UTC 1,460 GigaFLOPS
Linux/x86_64 8.10 (opencl_nvidia_SoG) 18 May 2016, 1:10:51 UTC 1,718 GigaFLOPS

SoG is only marginally faster, while on Windows it leads by roughly a factor of two.

Can anyone running SoG on Linux post a typical stderr header?


Something like this?
Name blc2_2bit_guppi_57451_63021_HIP116936_OFF_0004.12280.0.17.26.210.vlar_1
Workunit 2182371096
Created 11 Jun 2016, 11:22:35 UTC
Sent 11 Jun 2016, 17:50:31 UTC
Report deadline 3 Aug 2016, 22:50:13 UTC
Received 11 Jun 2016, 22:47:22 UTC
Server state Over
Outcome Success
Client state Done
Exit status 0 (0x0)
Computer ID 7475713
Run time 4 min 57 sec
CPU time 54 sec
Validate state Valid
Credit 146.01
Device peak FLOPS 13,313.28 GFLOPS
Application version SETI@home v8 Anonymous platform (NVIDIA GPU)


or this (from the same unit)

Stderr output
<core_client_version>7.2.42</core_client_version>
<![CDATA[
<stderr_txt>
setiathome_CUDA: Found 3 CUDA device(s):
  Device 1: Graphics Device, 8113 MiB, regsPerBlock 65536
     computeCap 6.1, multiProcs 20
     pciBusID = 2, pciSlotID = 0
  Device 2: GeForce GTX 980, 4036 MiB, regsPerBlock 65536
     computeCap 5.2, multiProcs 16
     pciBusID = 1, pciSlotID = 0
  Device 3: GeForce GTX 980, 4037 MiB, regsPerBlock 65536
     computeCap 5.2, multiProcs 16
     pciBusID = 3, pciSlotID = 0
In cudaAcc_initializeDevice(): Boinc passed DevPref 2
setiathome_CUDA: CUDA Device 2 specified, checking...
  Device 2: GeForce GTX 980 is okay
SETI@home using CUDA accelerated device GeForce GTX 980
setiathome v8 enhanced x41p_zi, Cuda 7.50 special
Compiled with NVCC 7.5, using 6.5 libraries. Modifications done by petri33.
Detected setiathome_enhanced_v8 task. Autocorrelations enabled, size 128k elements.
Work Unit Info:
...............
WU true angle range is : 0.008659


or something else, since I'm not running OpenCL.
10) Message boards : Number crunching : For those interested in electic bill and power consumption of GTX1080 (Message 1795470)
Posted 14 days ago by petri33 (Project Donor)
Here is a screenshot. It was taken while all 3 GPUs were running guppi vlar on CUDA. The middle one is the GTX 1080; the current NVIDIA Linux driver does not name it correctly.
+-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 980     On   | 0000:01:00.0      On |                  N/A |
| 44%   65C    P0   121W / 230W |   1009MiB /  4036MiB |     98%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Graphics Device     On   | 0000:02:00.0     Off |                  N/A |
| 43%   63C    P0   104W / 215W |    812MiB /  8113MiB |     91%      Default |
+-------------------------------+----------------------+----------------------+
|   2  GeForce GTX 980     On   | 0000:03:00.0     Off |                  N/A |
| 41%   59C    P0   132W / 230W |    763MiB /  4037MiB |     93%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0       855     G  /usr/bin/X                                     186MiB |
|    0      1433     G  compiz                                          59MiB |
|    0      4366     C  ...thome_x41zc_x86_64-pc-linux-gnu_cuda65_v8   757MiB |
|    2      4370     C  ...thome_x41zc_x86_64-pc-linux-gnu_cuda65_v8   757MiB |
+-----------------------------------------------------------------------------+

And the executable name differs from the actual version, since I have a makefile that I update manually and I haven't changed the executable name accordingly.


And now they are running some non-guppi work.
+-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 980     On   | 0000:01:00.0      On |                  N/A |
| 45%   69C    P0   137W / 230W |   1021MiB /  4036MiB |     91%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Graphics Device     On   | 0000:02:00.0     Off |                  N/A |
| 45%   68C    P0   125W / 215W |    812MiB /  8113MiB |     84%      Default |
+-------------------------------+----------------------+----------------------+
|   2  GeForce GTX 980     On   | 0000:03:00.0     Off |                  N/A |
| 43%   62C    P0   138W / 230W |    763MiB /  4037MiB |     90%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0       855     G  /usr/bin/X                                     198MiB |
|    0      1433     G  compiz                                          59MiB |
|    0      5411     C  ...thome_x41zc_x86_64-pc-linux-gnu_cuda65_v8   757MiB |
|    2      5407     C  ...thome_x41zc_x86_64-pc-linux-gnu_cuda65_v8   757MiB |
+-----------------------------------------------------------------------------+
11) Message boards : Number crunching : Nvidia-Ubuntu 14.04 fail with latest kernel (Message 1795462)
Posted 14 days ago by petri33 (Project Donor)
I managed to install the latest official NVIDIA driver on my Ubuntu 15 for the first time ever. I had to apt-get purge/remove all nvidia*, uninstall lightdm, try KDE and uninstall that, and then install lightdm and the NVIDIA drivers. It took me a week and I did not write down everything I did. But it is doable.

Petri (Pee Tree)
12) Message boards : Number crunching : Are some gpu tasks longer now? (Message 1795212)
Posted 15 days ago by petri33 (Project Donor)
Hopefully things will work out.
BTW, I just downloaded another copy of the sah_v7_opt folder and I'm still getting the same error with the PetriR_raw2 files;
Undefined symbols for architecture x86_64:
  "cudaAcc_GetAutoCorrelation(float*, int, int)", referenced from:
      seti_analyze(ANALYSIS_STATE&) in seti_cuda-analyzeFuncs.o
  "cudaAcc_FindAutoCorrelations(int, int)", referenced from:
      seti_analyze(ANALYSIS_STATE&) in seti_cuda-analyzeFuncs.o
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1


Probably one of the first things I'll end up looking at, because the autocorrelation streamlining is one of the safest areas, and should give a near constant improvement at all angle ranges.


@TBar: analyzeFuncs.cpp in the main folder does not see the GetAutoCorrelations function in cuda/cudaAcc_autocorr.cu as correctly defined. The linker expects to find a function taking a pointer to float (or an array of floats), an integer, and an integer. I'd try 'make clean' and then 'make' to build everything from scratch. If that doesn't work, I'd look for a duplicate definition of the function (in a *.h file).

p.s. I've been away from home for a week (and shut my computer down for that time) and haven't had time to look at any of the new (old to me) source that has been published. I've been busy getting my rig to recognize a GTX 1080 properly under Linux and then getting any 'acceptable' results out of my modified code. I had to revert back to pr_zi. That is the one most of You are running if You are experimenting. One good thing: it still works, with the new hardware too.

p.p.s. I'm sorry that I can't give You an acceptably working executable/source for the NVIDIA CUDA platform at the moment. But I'm an optimist. There is a whole summer (in the northern hemisphere) in which to make things work OK.

p.p.p.s. And with JasonG there backing/leading things up, I'm sure a Superb Cuda Build will emerge.
13) Message boards : Number crunching : Are some gpu tasks longer now? (Message 1795193)
Posted 15 days ago by petri33 (Project Donor)
But I have a suspicion that the newer and larger the GPU, the greater the slowdown. I'll try and test that next time I have a gap between GPUGrid tasks on my GTX 970.

You are right. Low ar makes pulse-finding run on one SM/SMX on NVIDIA GPUs. When PoTLen == PulsePoTLen, the work can not (currently) be divided among all the SM units. So the hit is 16x on a 980, 12x on a 780, 5x on a 750, etc., depending on the number of SM units on the GPU.

I have done some experimenting with my 1080, and it runs guppi vlar units in about 200-300 seconds. But it has an issue with not finding all pulses, or finding too many.

Then you might like to look at another 'suspicion' of mine. This would be much harder to demonstrate in numbers.

When two cuda50 tasks are running on the same GPU, fairly obviously, one will have started before the other - by anything between a fraction of a second and several minutes. It seems to me that the first to start consistently runs faster. This property is inheritable: when the first starter finishes, the second task becomes the 'first to start' and runs faster. A third task will start, becoming the 'second starter' for the time being, and accordingly run slowly.

I don't think that's purely the result of non-linear progress reporting (progress %age reporting moves more slowly at the start of the task), but it's easy to confuse it with that and I might have been confused. But you might consider the possibility that 'application launch order' might affect queuing, somewhere down the line.


A nice point, Richard. But I run only one at a time.
I do, though, have an explanation, or an educated guess.

The whole process is an alternating series of CPU and GPU work. The GPU has to finish its work and transfer the data to main memory for the CPU. Then the CPU does some post-processing. Only after finishing the post-processing does it ask for more GPU work. I have a feeling that the SoG version buffers more work and keeps the transfers to a minimum.

Explanation (guess) a) The task that started first yields GPU time to other processes at some point of its processing, does its own CPU processing (or waits for a GPU-to-host [CPU] memory transfer), and is then first in line to begin a new batch of GPU processing. It is (almost) always the first to submit new work to the GPU, and the later-started threads do not get the GPU time slice but have to wait instead. So the first-started process stays in the lead.

Explanation (guess) b) The other explanation is that the processing seems to go faster towards the end. My experience when running multiple instances on a GPU is that the percentage and the time-to-finish appear to move faster the nearer the task is to the end. That may be an effect of BOINC, not SETI. And if I remember correctly there is an option in boincmgr for a 'linear time display'.
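A hedged sketch of the kind of buffering guessed at in (a): with pinned host memory, two streams and double-buffered results, the CPU post-processing of one batch overlaps the GPU's work on the next. All names here (gpu_work_kernel, cpu_post_process, RES_BYTES) are invented for illustration; this is not SoG's actual code.

void run_batches(float** d_in, int nBatches, dim3 grid, dim3 block)
{
    cudaStream_t st[2];
    float *h_res[2], *d_res[2];
    for (int i = 0; i < 2; ++i) {
        cudaStreamCreate(&st[i]);
        cudaMallocHost((void**)&h_res[i], RES_BYTES); // pinned: async copies
        cudaMalloc((void**)&d_res[i], RES_BYTES);
    }
    for (int b = 0; b < nBatches; ++b) {
        int cur = b & 1;
        gpu_work_kernel<<<grid, block, 0, st[cur]>>>(d_in[b], d_res[cur]);
        cudaMemcpyAsync(h_res[cur], d_res[cur], RES_BYTES,
                        cudaMemcpyDeviceToHost, st[cur]);
        if (b > 0) {                      // CPU handles batch b-1 while the
            int prev = (b - 1) & 1;       // GPU is already busy with batch b
            cudaStreamSynchronize(st[prev]);
            cpu_post_process(h_res[prev]);
        }
    }
    int last = (nBatches - 1) & 1;        // drain the final batch
    cudaStreamSynchronize(st[last]);
    cpu_post_process(h_res[last]);
}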

Just My Thoughts. Now I'm going to a Sauna (with beer).
14) Message boards : Number crunching : Are some gpu tasks longer now? (Message 1795121)
Posted 15 days ago by petri33 (Project Donor)

But I have a suspicion that the newer and larger the GPU, the greater the slowdown. I'll try and test that next time I have a gap between GPUGrid tasks on my GTX 970.


You are right. Low ar makes pulse-finding run on one SM/SMX on NVIDIA GPUs. When PoTLen == PulsePoTLen, the work can not (currently) be divided among all the SM units. So the hit is 16x on a 980, 12x on a 780, 5x on a 750, etc., depending on the number of SM units on the GPU.

I have done some experimenting with my 1080, and it runs guppi vlar units in about 200-300 seconds. But it has an issue with not finding all pulses, or finding too many.
15) Message boards : Number crunching : GPU Wars 2016: News & Rumors (Message 1792063)
Posted 26 days ago by petri33 (Project Donor)
799€.


Perrrkele...

;)



My thoughts exactly, but one cannot help oneself when something this interesting is available. If the performance/noise/temps/power don't match the price, I can send it back.
16) Message boards : Number crunching : GPU Wars 2016: News & Rumors (Message 1792023)
Posted 26 days ago by petri33 (Project Donor)
Dayum, so the 1070 is possibly a 970/980 (non-Ti) upgrade at half-ish the initial price?

... and outperforms a GTX 980Ti & the Titan X.


I demand independent reviews, lol

Or better yet, personal inspection?


My computer-parts dealer had 5 GTX 1080s on the shelf. I ordered one. It should take 1-3 days to arrive. The card is an Inno3D Founders Edition (aka slow). 799€.
17) Message boards : Number crunching : Why need 5 different stock AMD OpenCL GPU applications? (Message 1790376)
Posted 25 May 2016 by petri33 (Project Donor)
Well,
There is a haystack of different needles and straws. To attack them best in the long run ....
18) Message boards : Number crunching : GPU Wars 2016: News & Rumors (Message 1788937)
Posted 20 May 2016 by petri33 (Project Donor)
Probably tomorrow I'll start gathering my various test pieces, so that there is some sort of organised pack from which whoever is the lucky first to get one can extract some meaningful numbers.

Good to hear.

+1
19) Message boards : Number crunching : I've Built a Couple OSX CUDA Apps... (Message 1788929)
Posted 20 May 2016 by petri33 (Project Donor)
This is what it shows with just the Two 950s in Yosemite;

Fri May 20 00:54:55 2016 | | Starting BOINC client version 7.6.32 for x86_64-apple-darwin
Fri May 20 00:54:55 2016 | | CUDA: NVIDIA GPU 0: Graphics Device (driver version 7.5.29, CUDA version 7.5, compute capability 5.2, 2048MB, 1874MB available, 2022 GFLOPS peak)
Fri May 20 00:54:55 2016 | | CUDA: NVIDIA GPU 1: Graphics Device (driver version 7.5.29, CUDA version 7.5, compute capability 5.2, 2048MB, 1825MB available, 2022 GFLOPS peak)
Fri May 20 00:54:55 2016 | | OpenCL: NVIDIA GPU 0: Graphics Device (driver version 10.5.2 346.02.03f06, device version OpenCL 1.2, 2048MB, 1874MB available, 2022 GFLOPS peak)
Fri May 20 00:54:55 2016 | | OpenCL: NVIDIA GPU 1: Graphics Device (driver version 10.5.2 346.02.03f06, device version OpenCL 1.2, 2048MB, 1825MB available, 2022 GFLOPS peak)
Fri May 20 00:54:55 2016 | | OpenCL CPU: Intel(R) Xeon(R) CPU E5472 @ 3.00GHz (OpenCL driver vendor: Apple, driver version 1.1, device version OpenCL 1.2)
Fri May 20 00:54:55 2016 | SETI@home | Found app_info.xml; using anonymous platform
Fri May 20 00:54:55 2016 | SETI@home Beta Test | Found app_info.xml; using anonymous platform
Fri May 20 00:54:56 2016 | | Processor: 8 GenuineIntel Intel(R) Xeon(R) CPU E5472 @ 3.00GHz [x86 Family 6 Model 23 Stepping 6]
Fri May 20 00:54:56 2016 | | OS: Mac OS X 10.10.5 (Darwin 14.5.0)

That's great, IF you only want to use Two cards. Now...why would you only want to use Two cards?
It would be nice If BOINC could handle 3 or maybe 4 cards without going Bonkers.
It does handle Three 750TI without any trouble...even Three ATI cards work. Try it with Two 950s & a 750TI though.


How about these environment variables: CUDA_VISIBLE_DEVICES and CUDA_DEVICE_ORDER?

See http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#env-vars

I'd try exporting CUDA_VISIBLE_DEVICES=0,1,2 and all its permutations, launching BOINC after each change. The order of devices may have an effect.

CUDA_VISIBLE_DEVICES=1,0,2 would probably list the GTX 750 Ti first.
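To see what order the CUDA runtime actually ends up with after each change, a tiny checker like this could help (an assumed standalone helper compiled with nvcc, not part of BOINC or the SETI apps):

#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    int n = 0;
    cudaGetDeviceCount(&n);
    for (int i = 0; i < n; ++i) {
        cudaDeviceProp p;
        cudaGetDeviceProperties(&p, i);
        // The index printed here is the one CUDA apps will see.
        printf("Device %d: %s, %d SMs, compute capability %d.%d\n",
               i, p.name, p.multiProcessorCount, p.major, p.minor);
    }
    return 0;
}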

Petri
20) Message boards : Number crunching : GPU Wars 2016: News & Rumors (Message 1788698)
Posted 19 May 2016 by petri33 (Project Donor)
That sounds like a great plan if it can be achieved. Go Man Go! ;-)


I'm doing some testing and trying things out, and I think that Jason will do the whole thing right.

You can get a sneak-peek preview here. It is an ar 0.42 task in 164 seconds. There are some high-ar tasks that take less than 60 seconds. The guppi vlars take about 700 seconds and need some more optimizing.

Oh how I wish I could get one of those GTX1080's ....


