Are some gpu tasks longer now?

Message boards : Number crunching : Are some gpu tasks longer now?
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1795194 - Posted: 10 Jun 2016, 19:11:06 UTC - in response to Message 1795158.  


But I have a suspicion that the newer and larger the GPU, the greater the slowdown. I'll try and test that next time I have a gap between GPUGrid tasks on my GTX 970.


You are right. Low AR makes pulsefinding run on a single SM/SMX on NVIDIA GPUs. When PoTLen == PulsePoTLen, the work cannot currently be divided across all the SM units, so the hit is 16x on a 980, 12x on a 780, 5x on a 750, etc., depending on the number of SMs on the GPU.
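As a purely illustrative sketch (not the real pulsefinding code; the kernel and variable names below are made up), the underlying limit is that a thread block is the unit the hardware schedules onto an SM: a launch with a single block can never occupy more than one SM, while splitting the same PoT over many blocks lets the scheduler spread it across the whole chip.

    // Purely illustrative; not the actual SETI@home pulsefinding kernel.
    #include <cuda_runtime.h>
    #include <cstdio>

    __global__ void sumPoT(const float* pot, int len, float* out)
    {
        // Grid-stride partial sum; a stand-in for real pulse-folding work.
        float acc = 0.0f;
        for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < len;
             i += gridDim.x * blockDim.x)
            acc += pot[i];
        atomicAdd(out, acc);   // combine per-thread (and per-block) partials
    }

    int main()
    {
        const int len = 1 << 22;                    // stand-in for PulsePoTLen
        float *d_pot, *d_out;
        cudaMalloc(&d_pot, len * sizeof(float));
        cudaMalloc(&d_out, sizeof(float));
        cudaMemset(d_pot, 0, len * sizeof(float));
        cudaMemset(d_out, 0, sizeof(float));

        // One block: the whole PoT is processed by a single SM.
        sumPoT<<<1, 256>>>(d_pot, len, d_out);

        // Many blocks: the same work is spread over every SM on the chip.
        sumPoT<<<(len + 255) / 256, 256>>>(d_pot, len, d_out);

        cudaDeviceSynchronize();
        printf("done\n");
        cudaFree(d_pot);
        cudaFree(d_out);
        return 0;
    }

Splitting one PoT over many blocks means the per-block partial results have to be combined afterwards (here with an atomicAdd), which is the kind of restructuring the current code apparently does not yet do for the PoTLen == PulsePoTLen case.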

I have done some experimenting with my 1080 and it runs guppi VLAR units in about 200-300 seconds, but it has an issue with not finding all the pulses, or finding too many.

Would it be possible to make this change to the Baseline App and see if it still has problems finding the correct number of pulses? In my experience the Baseline App is very accurate and might be useful very quickly if all the SMs could be used. Right now it seems the problem with the SIGBUS errors I was having is related to the OS: apps compiled in Mountain Lion with Toolkit 7.5 don't produce any errors. So, for now, it appears the SIGBUS errors can be avoided.


Possible. My plan for this weekend is direct comparisons between Petri's modifications and the Baseline sources, then injecting the least-risky, widest-compatibility, biggest-impact components. Whether the strange pulses are a simple precision change or a logic breakage somewhere, I won't know for a while. Either way, the logic changes Petri and I chatted about seemed headed down the right path to me, so whatever the weirdness is will likely turn up along the way.

Hopefully things will work out.
BTW, I just downloaded another copy of the sah_v7_opt folder and I'm still getting the same error with the PetriR_raw2 files:
Undefined symbols for architecture x86_64:
  "cudaAcc_GetAutoCorrelation(float*, int, int)", referenced from:
      seti_analyze(ANALYSIS_STATE&) in seti_cuda-analyzeFuncs.o
  "cudaAcc_FindAutoCorrelations(int, int)", referenced from:
      seti_analyze(ANALYSIS_STATE&) in seti_cuda-analyzeFuncs.o
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1
jason_gee
Volunteer developer
Volunteer tester

Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1795199 - Posted: 10 Jun 2016, 19:46:03 UTC - in response to Message 1795194.  

Hopefully things will work out.
BTW, I just downloaded another copy of the sah_v7_opt folder and I'm still getting the same error with the PetriR_raw2 files:
Undefined symbols for architecture x86_64:
  "cudaAcc_GetAutoCorrelation(float*, int, int)", referenced from:
      seti_analyze(ANALYSIS_STATE&) in seti_cuda-analyzeFuncs.o
  "cudaAcc_FindAutoCorrelations(int, int)", referenced from:
      seti_analyze(ANALYSIS_STATE&) in seti_cuda-analyzeFuncs.o
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1


Probably one of the first things I'll end up looking at, because the autocorrelation streamlining is one of the safest areas, and should give a near constant improvement at all angle ranges.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
petri33
Volunteer tester

Joined: 6 Jun 02
Posts: 1668
Credit: 623,086,772
RAC: 156
Finland
Message 1795212 - Posted: 10 Jun 2016, 21:23:50 UTC - in response to Message 1795199.  

Hopefully things will work out.
BTW, I just downloaded another copy of the sah_v7_opt folder and I'm still getting the same error with the PetriR_raw2 files:
Undefined symbols for architecture x86_64:
  "cudaAcc_GetAutoCorrelation(float*, int, int)", referenced from:
      seti_analyze(ANALYSIS_STATE&) in seti_cuda-analyzeFuncs.o
  "cudaAcc_FindAutoCorrelations(int, int)", referenced from:
      seti_analyze(ANALYSIS_STATE&) in seti_cuda-analyzeFuncs.o
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1


Probably one of the first things I'll end up looking at, because the autocorrelation streamlining is one of the safest areas, and should give a near constant improvement at all angle ranges.


@TBar: analyzeFuncs.cpp in the main folder does not see the GetAutoCorrelations function from cuda/cudaAcc_autocorr.cu as correctly defined. The linker expects to find a function taking a pointer to float (or an array of floats), an integer, and another integer as parameters. I'd try 'make clean' and then 'make' to rebuild everything from scratch. If that doesn't work, I'd look for a duplicate declaration of the function (in a *.h file).
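For illustration only (the real headers in the sah_v7_opt tree may be organized differently), the undefined symbols mean analyzeFuncs.cpp was compiled against C++ declarations like these, so definitions with exactly matching signatures have to end up in one of the objects being linked, normally the one built from cuda/cudaAcc_autocorr.cu:

    // Hypothetical header excerpt; the parameter names are made up, but the
    // types must match the symbols the linker reports.
    void cudaAcc_FindAutoCorrelations(int ac_fftlen, int offset);
    void cudaAcc_GetAutoCorrelation(float* ac_data, int ac_fftlen, int offset);

If the downloaded cudaAcc_autocorr.cu only defines older versions of these functions, or defines them inside an extern "C" block while the header declares them as C++, the mangled names won't match and the link fails exactly as shown above; 'make clean' followed by 'make' rules out stale objects first.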

p.s. I've been away from home for a week (and shut my computer down for that time) and haven't had time to look at any new (old to me) source that has been published. I've been busy getting my rig to recognize a GTX 1080 properly under Linux and then getting any 'acceptable' results from my modified code. I had to revert back to pr_zi; that is the version most of you are running if you are experimenting. One good thing: it still works, with the new hardware too.

p.p.s. I'm sorry that I can't give you an acceptable working executable/source for the NVIDIA CUDA platform at the moment. But I'm an optimist: there is a whole summer (in the northern hemisphere) to make things work.

p.p.p.s. And with JasonG there backing/leading the effort, I'm sure a superb CUDA build will emerge.
To overcome Heisenbergs:
"You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1795235 - Posted: 10 Jun 2016, 23:01:27 UTC - in response to Message 1795193.  

A nice point, Richard. But I run only one at a time.
I do, though, have an explanation, or at least an educated guess.

The whole process is an alternating series of CPU and GPU work. The GPU has to finish its work and transfer the data to main memory for the CPU. Then the CPU does some post-processing, and only after finishing the post-processing does it ask for more GPU work. I have a feeling that the SoG version buffers more work and keeps the transfers to a minimum.

Explanation (guess) a) The task that started first yields GPU time to other processes at some point in its processing, does its own CPU processing (or waits for a GPU-to-host [CPU] memory transfer), and is then first in line to begin a new batch of GPU processing. It is (almost) always the first to submit new work to the GPU, so the later-started threads do not get the GPU time slice and have to wait instead. So the first-started process is always in the lead.

Explanation (guess) b) The other explanation is that the processing seems to go faster towards the end. My experience when running multiple instances on a GPU is that the percentage done and the time to finish appear to move faster the nearer the end is. That may be an effect of BOINC, not SETI. And if I remember correctly there is an option to set boincmgr to a 'linear time display'.

Just my thoughts. Now I'm going to the sauna (with beer).
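A minimal sketch of the kind of buffering described above, assuming pinned host buffers and two CUDA streams (illustrative only, not the SoG or CUDA app code, and all names are made up): the device-to-host copy and CPU post-processing of one batch can overlap the kernel work of the next, instead of the strict kernel -> blocking copy -> post-process -> next kernel loop.

    // Illustrative double-buffering sketch, not actual SETI@home code.
    #include <cuda_runtime.h>
    #include <cstdio>

    __global__ void process(float* data, int n)        // stand-in for a real kernel
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) data[i] *= 2.0f;
    }

    int main()
    {
        const int N = 1 << 20;
        float* d_buf[2];
        float* h_buf[2];
        cudaStream_t stream[2];

        for (int b = 0; b < 2; ++b) {
            cudaMalloc(&d_buf[b], N * sizeof(float));
            cudaMemset(d_buf[b], 0, N * sizeof(float));
            cudaMallocHost(&h_buf[b], N * sizeof(float));  // pinned, so copies can be async
            cudaStreamCreate(&stream[b]);
        }

        for (int batch = 0; batch < 8; ++batch) {
            int b = batch & 1;                             // alternate between two buffers
            process<<<(N + 255) / 256, 256, 0, stream[b]>>>(d_buf[b], N);
            cudaMemcpyAsync(h_buf[b], d_buf[b], N * sizeof(float),
                            cudaMemcpyDeviceToHost, stream[b]);
            if (batch > 0) {
                // CPU post-processing of the previous batch runs here while the
                // current batch is still being computed and copied on the GPU.
                int prev = (batch - 1) & 1;
                cudaStreamSynchronize(stream[prev]);
                printf("batch %d first value: %f\n", batch - 1, h_buf[prev][0]);
            }
        }
        cudaDeviceSynchronize();
        return 0;
    }

Whether the real applications do exactly this is a guess; the sketch only shows why buffering would hide most of the transfer time.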


May or may not be of interest/use, or just a red herring.


I'm presently running the CUDA50 application on my 2 x GTX 750 Tis, Win10, 353.82 driver, using the -poll option with 1 CPU core reserved for each GPU WU.
I had been running 2 WUs at a time until last night, when I decided to give 3 WUs at a time a go. Even with the loss of output from 2 more CPU cores, the increase in WUs per hour from the GPUs has offset the CPU loss.

Previously (before the Guppies), running 3 WUs at a time gave better output than 2 with mid-range WUs, and much better output with longer-running WUs. However, the effect with shorties was a massive increase in processing time, so severe that it completely offset any gains from the other WUs and resulted in less work per hour being done, even with only the occasional shortie in the mix.


Now, with barely a shortie to be seen and the Guppies' running times similar to the earlier longer-running (but non-VLAR) WUs, 3 WUs at a time gives the most work per hour.


One thing I have noticed with the Guppies, even more noticeable with 3 WUs running: when monitoring the GPUs with GPU-Z, the power used by the GPU varies with the memory controller load, not with the GPU load as it did before the introduction of the Guppies.
The GPU load remains around 98-99% (94% at the lowest, with no WUs finishing or starting in that period), while power consumption (as a % of TDP) varies from 54% (80% memory controller load) down to 31% (30% memory controller load).

My personal wild-arse guess is that the memory controller load isn't itself causing the increased power consumption, but is just an indicator of the work the GPU is doing while the data is being moved.
Grant
Darwin NT