CUDA Toolkit 8.0 Available for Developers

Message boards : Number crunching : CUDA Toolkit 8.0 Available for Developers
Profile petri33
Volunteer tester

Joined: 6 Jun 02
Posts: 1668
Credit: 623,086,772
RAC: 156
Finland
Message 1795780 - Posted: 12 Jun 2016, 21:05:01 UTC - in response to Message 1794375.  

Whether or not Boinc clients will know what to do with simplified truly heterogeneous apps is another question.

What would (ideally) BOINC do except 'launch and wait', as now?


In a 'truly' (note the qualification) heterogeneous environment, the client should not care (or need to know) whether the task is processed on a CPU, multiple threads, a GPU, multiple GPUs, multiple hosts via MPI, FPGAs, DSPs, or a room full of monkeys with abacuses, and/or whether conditions change dynamically during the run. The estimate mechanisms (and with them scheduler and client app control) in particular are prone to upset (i.e. are unstable) when a hardware change occurs (along with other 'used-to-be-weird' situations that are becoming more normal).


Thank You Jason,

That reminds me of the days, some 30 years back, when I laughed at someone who said that the IP protocol allows for tapes to be delivered by taxi and still give a nice, steady bits-per-second throughput. I know, I laughed, and I was wrong. I was young. Everything that was said was true. The IP protocol allows for that. Send a taxi full of hard drives of modern capacity and no optical fibre or any other transfer means can surpass that bandwidth. The acknowledgement of a successful transfer comes back with the taxi. The responsiveness of that system is not too good, though; in other words, low latency is a dream.
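Just to put rough numbers on that (the figures below are assumptions made up purely for illustration, not from anyone's post), a one-hour taxi ride with a boot full of modern drives still beats a fast network link on raw throughput while losing badly on latency:

#include <cstdio>

int main()
{
    // All figures are assumptions for this illustration.
    const double drives       = 100.0;    // drives that fit in the taxi
    const double tb_per_drive = 8.0;      // capacity per drive, TB
    const double trip_seconds = 3600.0;   // one-hour taxi ride

    const double payload_bits = drives * tb_per_drive * 1e12 * 8.0;
    const double taxi_bps     = payload_bits / trip_seconds;   // effective throughput
    const double link_bps     = 1e9;                           // 1 Gbit/s network link

    std::printf("taxi: %8.1f Gbit/s effective, latency %.0f s\n", taxi_bps / 1e9, trip_seconds);
    std::printf("link: %8.1f Gbit/s, latency ~milliseconds\n", link_bps / 1e9);
    return 0;
}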

We're going to a new world.
To overcome Heisenbergs:
"You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones
ID: 1795780
Profile petri33
Volunteer tester

Joined: 6 Jun 02
Posts: 1668
Credit: 623,086,772
RAC: 156
Finland
Message 1795784 - Posted: 12 Jun 2016, 21:21:36 UTC

A computer is a resource. Give it a task. It will finish processing (hopefully), or be aborted after a time limit.
There is no need to know how the task at hand will be processed. Give a computer a queue of work and then give some more so that it does not starve.

With future hardware having more processors (CPU cores/units) and more GPUs, a single work item can be distributed across all available resources. There is no need for N concurrent applications. One serially launched application that processes the one task in the best possible way, taking the user experience into consideration (no lag), will utilize all GPUs and CPUs. (My vision; I have not started sharing the FFT across multiple GPUs yet.)
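As a rough sketch of what that single application could look like (nothing here is taken from any real SETI@home build; the kernel, the even device split and the launch sizes are invented for illustration, and it assumes at least one CUDA device is present), one process can fan a single work item out over every visible GPU:

#include <cuda_runtime.h>
#include <algorithm>
#include <vector>

// Stand-in for the real per-sample work; the actual kernel is hypothetical.
__global__ void process(const float* in, float* out, size_t n)
{
    size_t i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] * in[i];
}

// One host process splits one work item across all visible CUDA devices.
void run_on_all_gpus(const float* host_in, float* host_out, size_t n)
{
    int ndev = 0;
    cudaGetDeviceCount(&ndev);                       // assumes ndev >= 1
    size_t chunk = (n + ndev - 1) / ndev;

    std::vector<float*> din(ndev), dout(ndev);
    for (int d = 0; d < ndev; ++d) {
        size_t off = (size_t)d * chunk;
        if (off >= n) break;
        size_t len = std::min(chunk, n - off);
        cudaSetDevice(d);                            // subsequent calls target device d
        cudaMalloc(&din[d],  len * sizeof(float));
        cudaMalloc(&dout[d], len * sizeof(float));
        cudaMemcpyAsync(din[d], host_in + off, len * sizeof(float), cudaMemcpyHostToDevice);
        process<<<(unsigned)((len + 255) / 256), 256>>>(din[d], dout[d], len);
        cudaMemcpyAsync(host_out + off, dout[d], len * sizeof(float), cudaMemcpyDeviceToHost);
    }
    for (int d = 0; d < ndev; ++d) {                 // wait for every device to finish
        cudaSetDevice(d);
        cudaDeviceSynchronize();
    }
    // error checking and cudaFree cleanup omitted to keep the sketch short
}

The same pattern would extend to putting separate FFT batches on separate devices, which is the part I have not started on yet.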

Inter-project sharing of resources would happen at the (single) task level: do N of those, then M of those, and then ...
To overcome Heisenbergs:
"You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones
ID: 1795784
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1795795 - Posted: 12 Jun 2016, 22:18:55 UTC - in response to Message 1795784.  

There is no need for N concurrent applications. One serially launched application that processes the one task in the best possible way, taking the user experience into consideration (no lag), will utilize all GPUs and CPUs.


I've been thinking that would be the ideal (or even one launched application that is responsible for the feeding & returning of results from multiple GPUs).
Watching the effect of different types of MB WUs running on the same GPU, and then the effect of 1 or 2 Guppies on the processing of 2 or 1 Arecibo WUs, makes me think it would be a near-impossible task to optimise things to process such different types of work simultaneously and still give maximum throughput.
Running each WU one at a time, with the optimal environment for that type of WU, would result in the highest throughput.
Grant
Darwin NT
ID: 1795795
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1795798 - Posted: 12 Jun 2016, 22:27:45 UTC - in response to Message 1795784.  

You have to define the word 'work'.

You could tell a human being to dig a hole or build a house, and there are enough pre-assimilated cultural reference points that what you get will probably bear some resemblance to what you imagined in the first place. Tell a human being to build an airplane (and nothing else): what are the chances that it will pass CAA certification and that you would be willing to let it take you to your next holiday destination?

We are giving our computers the task of finding 'interesting' signals in a slice from a radio-frequency recording. Maybe some computers are capable of 'understanding' what human beings would deem to be an 'interesting' signal (or, indeed, what another computer would deem to be an interesting signal - which may or may not be the same thing). But I don't think the one on my desk can do that yet.

Forty years ago, I set out to write "A program to detect, tabulate and/or plot the 'interesting' parts of a function". That effort resulted in a deck of punched cards, a typed report, and a Diploma in Computer Science. The first two chapters of the report are devoted to consideration of what might be 'interesting', and how to present a meaningful (and digestible) response back to the provider of the function under investigation. But let's be honest: the output of the resulting program only expressed what was interesting to me, the programmer. Computers forty years ago hadn't yet become able to intuit answers to questions like "Are there any interesting signals in this data?", and I don't think asking Siri via your iPhone would get you a more useful answer today.

So, how are you going to 'give a computer a task'? I presume, at the very least, a high-level meta-specification of what a task is - in a machine-independent format and language, of course. Internet access to computer-archivists of libraries of known signal-processing algorithms?

Or are we still living in the era of pre-written machine code being used as the meta-specification of the expected result? In other words, "an 'interesting' signal is a signal found by one of these pre-coded (and pre-tested) algorithms" - as they would have been by my dissertation-piece forty years ago.

In that case, behind all the fancy language, all you're suggesting is that we move from "we need a hole here: this is your spade" to "we need a hole here: there's a toolshed over there, help yourself to whatever tool you prefer". It's progress, but it isn't fundamentally different.
ID: 1795798
Profile petri33
Volunteer tester

Joined: 6 Jun 02
Posts: 1668
Credit: 623,086,772
RAC: 156
Finland
Message 1795815 - Posted: 12 Jun 2016, 23:04:17 UTC - in response to Message 1795798.  
Last modified: 12 Jun 2016, 23:05:32 UTC

You have to define the word 'work'.

You could tell a human being to dig a hole or build a house, and there are enough pre-assimilated cultural reference points that what you get will probably bear some resemblance to what you imagined in the first place. Tell a human being to build an airplane (and nothing else): what are the chances that it will pass CAA certification and that you would be willing to let it take you to your next holiday destination?

We are giving our computers the task of finding 'interesting' signals in a slice from a radio-frequency recording. Maybe some computers are capable of 'understanding' what human beings would deem to be an 'interesting' signal (or, indeed, what another computer would deem to be an interesting signal - which may or may not be the same thing). But I don't think the one on my desk can do that yet.

Forty years ago, I set out to write "A program to detect, tabulate and/or plot the 'interesting' parts of a function". That effort resulted in a deck of punched cards, a typed report, and a Diploma in Computer Science. The first two chapters of the report are devoted to consideration of what might be 'interesting', and how to present a meaningful (and digestible) response back to the provider of the function under investigation. But let's be honest: the output of the resulting program only expressed what was interesting to me, the programmer. Computers forty years ago hadn't yet become able to intuit answers to questions like "Are there any interesting signals in this data?", and I don't think asking Siri via your iPhone would get you a more useful answer today.

So, how are you going to 'give a computer a task'? I presume, at the very least, a high-level meta-specification of what a task is - in a machine-independent format and language, of course. Internet access to computer-archivists of libraries of known signal-processing algorithms?

Or are we still living in the era of pre-written machine code being used as the meta-specification of the expected result? In other words, "an 'interesting' signal is a signal found by one of these pre-coded (and pre-tested) algorithms" - as they would have been by my dissertation-piece forty years ago.

In that case, behind all the fancy language, all you're suggesting is that we move from "we need a hole here: this is your spade" to "we need a hole here: there's a toolshed over there, help yourself to whatever tool you prefer". It's progress, but it isn't fundamentally different.


An OpenCL (or CUDA) task can have an "intelligent subcontractor manager" in the computer to distribute the work to the best available resource. It is a deal: this work, you do, within a time limit. Just do it. A job can come with a suggestion of how to do it and some negotiation about whom to give it to, but it needs to be done.

A task is just a bunch of bytes.
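For what it's worth, a toy version of that "subcontractor manager" in CUDA terms might look like the sketch below (the scoring heuristic and the CPU fallback are invented for illustration; a real manager would also weigh memory, current load and lag):

#include <cuda_runtime.h>
#include <cstdio>

// Toy "subcontractor manager": pick the strongest-looking CUDA device, or fall
// back to the CPU if none is available. The scoring heuristic is made up here.
int pick_device()
{
    int ndev = 0;
    if (cudaGetDeviceCount(&ndev) != cudaSuccess || ndev == 0)
        return -1;                                             // -1 means: do the work on the CPU
    int best = 0;
    long best_score = -1;
    for (int d = 0; d < ndev; ++d) {
        cudaDeviceProp p;
        cudaGetDeviceProperties(&p, d);
        long score = (long)p.multiProcessorCount * p.clockRate;   // crude "strength" estimate
        if (score > best_score) { best_score = score; best = d; }
    }
    return best;
}

int main()
{
    int dev = pick_device();
    if (dev < 0) std::printf("no GPU found, processing the task on the CPU\n");
    else         std::printf("handing the task to GPU %d\n", dev);
    return 0;
}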
To overcome Heisenbergs:
"You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones
ID: 1795815
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1795819 - Posted: 12 Jun 2016, 23:39:55 UTC - in response to Message 1795815.  

A task is just a bunch of bytes.

On that level, a computer program is just a bunch of bytes, too. But a complete specification of a task comprises both the data bytes and the processing bytes, even if expressed as a choice rather than a prescription.
ID: 1795819
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1795822 - Posted: 12 Jun 2016, 23:46:30 UTC
Last modified: 12 Jun 2016, 23:46:51 UTC

The real issue with a hybrid approach is data dependence.
Even if some part is better done on the CPU and another on the GPU, there is no sense in splitting the parts between devices if those parts are interrelated. The transfers will kill any possible speedup.
I learnt that the hard way when I attempted to do AP's main loop on the GPU with Brook+, without having a Brook+ FFT library available. Doing only the SETI-specific part of the search on the GPU, with the FFT on the CPU, gave a good slowdown instead of any speedup.
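Translated into CUDA/cuFFT terms (the original attempt was Brook+; the kernel, buffer names and launch sizes below are invented), the difference between the hybrid loop and the all-GPU loop is the pair of PCIe copies paid on every iteration:

#include <cuda_runtime.h>
#include <cufft.h>

// Stand-in for the SETI-specific part of the search; the real kernel is hypothetical.
__global__ void search_step(cufftComplex* data, size_t n)
{
    size_t i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i].x = data[i].x * data[i].x;
}

// Hybrid variant: FFT on the CPU, search on the GPU -> two PCIe copies per iteration.
void hybrid_iteration(cufftComplex* host_buf, cufftComplex* dev_buf, size_t n,
                      void (*cpu_fft)(cufftComplex*, size_t))   // hypothetical CPU FFT routine
{
    cpu_fft(host_buf, n);                                       // FFT stays on the CPU...
    cudaMemcpy(dev_buf, host_buf, n * sizeof(cufftComplex),
               cudaMemcpyHostToDevice);                         // ...ship the data to the GPU,
    search_step<<<(unsigned)((n + 255) / 256), 256>>>(dev_buf, n);
    cudaMemcpy(host_buf, dev_buf, n * sizeof(cufftComplex),
               cudaMemcpyDeviceToHost);                         // ...and ship it back again.
}

// All-GPU variant: the data never leaves the device between the FFT and the search.
void all_gpu_iteration(cufftHandle plan, cufftComplex* dev_buf, size_t n)
{
    cufftExecC2C(plan, dev_buf, dev_buf, CUFFT_FORWARD);        // cuFFT, in place on the GPU
    search_step<<<(unsigned)((n + 255) / 256), 256>>>(dev_buf, n);
}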
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1795822
Profile jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1795934 - Posted: 13 Jun 2016, 17:24:15 UTC - in response to Message 1795819.  
Last modified: 13 Jun 2016, 17:28:34 UTC

A task is just a bunch of bytes.

On that level, a computer program is just a bunch of bytes, too. But a complete specification of a task comprises both the data bytes and the processing bytes, even if expressed as a choice rather than a prescription.


Well, not quite. The computational complexity may remain order N no matter what program/hardware, but the communications (i.e. memory) complexity could be reduced to zero by having infinite registers. In that sense, at cobblestone scale we only award computation, since memory transactions are more or less arbitrary. [i.e. the defined and paid work is computational work, rather than communications work]
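A tiny (deliberately single-threaded) CUDA sketch of that distinction, with invented kernels: both versions below perform exactly n additions, so a flops-based credit measure sees them as identical, but the first pays a global-memory read and write on every step while the second keeps its accumulator in a register.

#include <cuda_runtime.h>

// Both kernels do exactly n additions (the computational work that would be
// "paid"). Single-threaded on purpose so the memory-traffic difference is obvious.
__global__ void sum_via_memory(const float* in, float* out, int n)
{
    if (blockIdx.x == 0 && threadIdx.x == 0) {
        *out = 0.0f;
        for (int i = 0; i < n; ++i)
            *out = *out + in[i];      // a global read and write on every step
    }
}

__global__ void sum_via_register(const float* in, float* out, int n)
{
    if (blockIdx.x == 0 && threadIdx.x == 0) {
        float acc = 0.0f;             // accumulator held in a register
        for (int i = 0; i < n; ++i)
            acc += in[i];             // same n additions, no extra memory traffic
        *out = acc;
    }
}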
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1795934
Profile jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1795935 - Posted: 13 Jun 2016, 17:29:51 UTC - in response to Message 1795780.  

We're going to a new world.


Oh glad you see it too. I think I'm proud to live in these times.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1795935