Message boards :
Number crunching :
Are some gpu tasks longer now?
Author | Message |
---|---|
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Just tell us when it's finished, please, and works out of the box. It does. When you learn to separate expectations from observations, constructive interaction can be resumed. |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13720 Credit: 208,696,464 RAC: 304 |
Just tell us when it's finished, please, and works out of the box. It doesn't. Something that works out of the box doesn't require the user to manually edit configuration files in order to use it without it impacting negatively on the rest of the system. By all means, provide the option for even greater performance, but make sure to advise the user that doing so will greatly reduce CPU processing of work, or stop it altogether in the case of 2 & 4 core machines, and make it possible for them to finish that work before its completion is blocked. Grant Darwin NT |
Miklos M. Send message Joined: 5 May 99 Posts: 955 Credit: 136,115,648 RAC: 73 |
Thank you for pointing them out. I see here that you posted: blc4_2bit_guppi_57451_26351_HIP69732_0023.15229.0.18.27.31.vlar_1 But when I look at my pages of tasks I do not see any vlar. |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13720 Credit: 208,696,464 RAC: 304 |
Thank you for pointing them out. I see here that you posted: blc4_2bit_guppi_57451_26351_HIP69732_0023.15229.0.18.27.31.vlar_1 Looking at your in-progress list, about 1/3 of them are Guppie VLARs. Grant Darwin NT |
Miklos M. Send message Joined: 5 May 99 Posts: 955 Credit: 136,115,648 RAC: 73 |
I just looked at them by NAME and found them. Thank you everyone. |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Just tell us when it's finished, please, and works out of the box. And to all with such an attitude: if you want your personal expectations to be met - donate hardware for development, pay for the development of the features you want, hire your own personal programmer. And donate your own time for testing when asked (that is, on beta and alpha). Until then... well, misters "I know better how it should be, so do as I said or I'll not use it": your advice is not actually useful. Want to cooperate - fine. Want to waste my time reading blame and spam - I'll start a "respect the dev's time" campaign via blacklisting. SETI apps news We're not gonna fight them. We're gonna transcend them. |
Zalster Send message Joined: 27 May 99 Posts: 5517 Credit: 528,817,460 RAC: 242 |
I think it's a great app Raistmer. My GPUs are chewing through the data. |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13720 Credit: 208,696,464 RAC: 304 |
My GPUs are chewing through the data. That isn't the problem. The problem is the effect it has on systems in its default stock settings. Grant Darwin NT |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
I think it's a great app Raistmer. Thanks for support. SETI apps news We're not gonna fight them. We're gonna transcend them. |
betreger Send message Joined: 29 Jun 99 Posts: 11360 Credit: 29,581,041 RAC: 66 |
Sten, I thought you were only going to run Android for the summer. |
petri33 Send message Joined: 6 Jun 02 Posts: 1668 Credit: 623,086,772 RAC: 156 |
You are right. Low AR makes pulsefinding run on a single SM/SMX on NVIDIA GPUs. When PoTLen == PulsePoTLen the work cannot (currently) be divided across all SM units. So the hit is 16x on a 980, 12x on a 780, 5x on a 750, etc., depending on the number of SM units on the GPU. I have done some experimenting with my 1080 and it runs guppi VLAR units in about 200-300 seconds. But it has an issue with not finding all pulses, or finding too many pulses. To overcome Heisenbergs: "You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones |
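petri33's SM observation can be illustrated with a minimal CUDA sketch. These are hypothetical kernels (the names, the max-reduction, and the launch shapes are all illustrative assumptions), not the actual SETI@home pulsefinding code: a launch whose grid contains a single block occupies one SM, while splitting the same range across many blocks lets every SM contribute.

```cuda
// Hypothetical sketch, not the real pulsefinding kernel. When
// PulsePoTLen == PoTLen there is effectively one indivisible chunk of
// work, so the natural launch is a single block -> a single SM:
__global__ void scan_pot_single_block(const float *pot, int len, unsigned *peak)
{
    float m = 0.0f;
    for (int i = threadIdx.x; i < len; i += blockDim.x)
        m = fmaxf(m, pot[i]);
    // Per-thread running max; the atomic merges them. Reinterpreting the
    // bits as unsigned preserves ordering for non-negative floats.
    atomicMax(peak, __float_as_uint(m));
}

// Splitting the search range across blocks spreads it over all SMs:
__global__ void scan_pot_multi_block(const float *pot, int len, unsigned *peak)
{
    int chunk = (len + gridDim.x - 1) / gridDim.x;
    int begin = blockIdx.x * chunk;
    int end   = min(begin + chunk, len);
    float m = 0.0f;
    for (int i = begin + threadIdx.x; i < end; i += blockDim.x)
        m = fmaxf(m, pot[i]);
    atomicMax(peak, __float_as_uint(m));
}

// scan_pot_single_block<<<1, 256>>>(...);           // one SM busy
// scan_pot_multi_block<<<num_sms * 2, 256>>>(...);  // all SMs busy
```

This is why the slowdown scales with SM count (16x on a 980, 12x on a 780): the idle hardware grows with the chip while the single-block kernel stays the same size.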
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14649 Credit: 200,643,578 RAC: 874 |
But I have a suspicion that the newer and larger the GPU, the greater the slowdown. I'll try and test that next time I have a gap between GPUGrid tasks on my GTX 970. Then you might like to look at another 'suspicion' of mine. This would be much harder to demonstrate in numbers. When two cuda50 tasks are running on the same GPU, fairly obviously, one will have started before the other - by anything between a fraction of a second and several minutes. It seems to me that the first to start consistently runs faster. This property is inheritable: when the first starter finishes, the second task becomes the 'first to start' and runs faster. A third task will start, becoming the 'second starter' for the time being, and accordingly run slowly. I don't think that's purely the result of non-linear progress reporting (progress %age reporting moves more slowly at the start of the task), but it's easy to confuse it with that and I might have been confused. But you might consider the possibility that 'application launch order' might affect queuing, somewhere down the line. |
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
When two cuda50 tasks are running on the same GPU, fairly obviously, one will have started before the other - by anything between a fraction of a second and several minutes. It seems to me that the first to start consistently runs faster. This property is inheritable: when the first starter finishes, the second task becomes the 'first to start' and runs faster. A third task will start, becoming the 'second starter' for the time being, and accordingly run slowly. The Cuda Handbook explains that there is only one DMA engine, so some software pipelining needs to happen if multiple threads or processes (with their own threads) want to use the device concurrently. In Petri's case he's raising efficiency and hiding latencies with Cuda streams, such that a single instance is optimal. In my experience the latencies of the simpler model on Linux are smaller to start with. Whether or not these aspects change with Pascal & newer Linux+drivers, no idea as yet. [Edit:] correction: Kepler+ have two, but they have different priorities, and are probably saturating with many small requests in baseline code + multiple instances/apps. Upping transfer sizes to over 4 MiB for Fermi+, and doing some pipelining anyway, will probably improve things down the line. "Because the command buffer is shared between engines, applications must 'software-pipeline' their requests in different streams..." So 'Classic' (Baseline) Cuda code is more likely to 'fight' under the demands of the new tasks. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
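The "software pipelining" jason_gee quotes can be sketched as chunked asynchronous copies and kernels issued round-robin into a few CUDA streams, so copy-engine work and SM work overlap. A minimal sketch under stated assumptions (pinned host buffers, an illustrative `process_chunk` kernel, and the >4 MiB transfer size mentioned in the post); it is not the project's actual code:

```cuda
#include <cuda_runtime.h>
#include <algorithm>

__global__ void process_chunk(float *d, size_t n);  // illustrative kernel

// Round-robin chunks over a few streams: while one stream's kernel runs
// on the SMs, another stream's copy can occupy the DMA engine(s).
void run_pipelined(const float *h_in, float *h_out, size_t n)
{
    const int    kStreams = 4;
    const size_t kChunk   = (8 << 20) / sizeof(float);  // 8 MiB per transfer

    cudaStream_t stream[kStreams];
    float       *d_buf[kStreams];
    for (int s = 0; s < kStreams; ++s) {
        cudaStreamCreate(&stream[s]);
        cudaMalloc(&d_buf[s], kChunk * sizeof(float));
    }
    int s = 0;
    for (size_t off = 0; off < n; off += kChunk, s = (s + 1) % kStreams) {
        size_t len = std::min(kChunk, n - off);
        // h_in/h_out must be pinned (cudaHostAlloc) or these copies
        // silently fall back to synchronous behaviour.
        cudaMemcpyAsync(d_buf[s], h_in + off, len * sizeof(float),
                        cudaMemcpyHostToDevice, stream[s]);
        process_chunk<<<256, 256, 0, stream[s]>>>(d_buf[s], len);
        cudaMemcpyAsync(h_out + off, d_buf[s], len * sizeof(float),
                        cudaMemcpyDeviceToHost, stream[s]);
    }
    for (int t = 0; t < kStreams; ++t) {
        cudaStreamSynchronize(stream[t]);
        cudaFree(d_buf[t]);
        cudaStreamDestroy(stream[t]);
    }
}
```

One process issuing work this way keeps both the copy engine(s) and the SMs busy, which is why a single streamed instance can beat several "classic" instances fighting over the same command buffer.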
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
Would it be possible to make this change to the Baseline App and see if it still had problems finding the correct number of pulses? From my experience the Baseline App is very accurate and might be useful very quickly if all the SMs could be used. Right now it seems the problem with the SIGBUS Errors I was having is related to the OS. The Apps compiled in Mountain Lion don't produce any Errors when compiled with Toolkit 7.5. So, for now it appears the problem with SIGBUS Errors can be avoided. |
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
Possible. This weekend for me is to involve direct comparisons between Petri's modifications and Baseline sources, then injecting the least-risky/widest-compatibility/biggest-impact components. Whether or not the strange pulses are a simple precision change, or a logic breakage somewhere, I won't know for a while. Either way the Logic changes Petri and I chatted about seemed headed down the right path to me, so whatever the weirdness is will likely turn up along the way. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
Keith Myers Send message Joined: 29 Apr 01 Posts: 13161 Credit: 1,160,866,277 RAC: 1,873 |
Which version of the app are you using? The original: MB8_win_x86_SSE3_OpenCL_NV_r3430_SoG.exe or the -use_sleep accommodating one: MB8_win_x86_SSE3_OpenCL_NV_r3430.exe Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Keith Myers Send message Joined: 29 Apr 01 Posts: 13161 Credit: 1,160,866,277 RAC: 1,873 |
I guess I am very confused then. So you are saying that MB8_win_x86_SSE3_OpenCL_NV_r3430_SoG.exe IS NOT a SoG app, EVEN THOUGH it ships with the <plan_class>opencl_nvidia_SoG</plan_class> in its aistub file??? Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
HAL9000 Send message Joined: 11 Sep 99 Posts: 6534 Credit: 196,805,888 RAC: 57 |
I think you might have misread their post. SETI@home classic workunits: 93,865 CPU time: 863,447 hours Join the [url=http://tinyurl.com/8y46zvu]BP6/VP6 User Group[/url] |
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
...But you might consider the possibility that 'application launch order' might affect queuing, somewhere down the line. Here is some of the detail from the Cuda handbook, that pertains specifically to Windows WDDM (Vista+ drivers): ...On WDDM, if there are applications competing for time on the same GPU, Windows can and will swap memory objects out in order to enable each application to run. The Windows operating system tries to make this as efficient as possible, but as with all paging, having it never happen is much faster than having it ever happen. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
petri33 Send message Joined: 6 Jun 02 Posts: 1668 Credit: 623,086,772 RAC: 156 |
But I have a suspicion that the newer and larger the GPU, the greater the slowdown. I'll try and test that next time I have a gap between GPUGrid tasks on my GTX 970. A nice point, Richard. But I run only one at a time. I do, though, have an explanation, or an educated guess. The whole process is an alternating series of CPU-GPU work. The GPU has to finish its work and transfer the data to main memory for the CPU. Then the CPU does some post-processing. Only after finishing the post-processing does it ask for more GPU work. I have a feeling that the SoG version buffers more work and the transfers are reduced to a minimum. Explanation (guess) a) The task that started first yields GPU time to other processes at some point of processing and does its own CPU processing (or waits for a GPU-to-host [CPU] memory transfer), and is the first in line to begin a new batch of GPU processing. It is (almost) always the first to submit new work to the GPU, and the later-started threads do not get the GPU time slice but have to wait instead. So the first-started process is always in the lead. Explanation (guess) b) The other explanation is that the processing seems to go faster towards the end. My experience is that when running multiple instances on a GPU, the percentage and the time to finish appear to move faster the nearer the end is. That may be an effect of BOINC, not SETI. And if I remember correctly there is an option to set boincmgr to a 'linear time display'. Just my thoughts. Now I'm going to a sauna (with beer). To overcome Heisenbergs: "You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones |
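The strict compute, transfer, post-process alternation petri33 describes serializes the pipeline. A hedged sketch of the double-buffered alternative (`process_batch`, `postprocess_on_cpu`, and the two-stream/two-buffer setup are all hypothetical names, not the SoG app's code): while the GPU works on batch N+1, the CPU post-processes batch N.

```cuda
// Double-buffering sketch: two streams and two buffer sets assumed to
// already be allocated; kernels and the CPU routine are illustrative.
for (int batch = 0; batch < nbatches; ++batch) {
    int cur = batch % 2;
    // Queue the next GPU batch immediately, without waiting for the CPU.
    process_batch<<<grid, block, 0, stream[cur]>>>(d_in[cur], d_out[cur]);
    cudaMemcpyAsync(h_out[cur], d_out[cur], bytes,
                    cudaMemcpyDeviceToHost, stream[cur]);
    if (batch > 0) {
        int prev = 1 - cur;
        cudaStreamSynchronize(stream[prev]);  // previous batch is on the host
        postprocess_on_cpu(h_out[prev]);      // overlaps GPU work on 'cur'
    }
}
// Drain the final batch once the loop exits.
cudaStreamSynchronize(stream[(nbatches - 1) % 2]);
postprocess_on_cpu(h_out[(nbatches - 1) % 2]);
```

If the SoG build does something like this internally, it would explain both the reduced transfer overhead petri33 suspects and why the first-started task tends to stay in the lead: its work is always already queued when the GPU frees up.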
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.