GPUs dethroned: Go DSP!

ML1 Volunteer moderator Volunteer tester Send message Joined: 25 Nov 01 Posts: 20323 Credit: 7,508,002 RAC: 20	Message 1174418 - Posted: 28 Nov 2011, 23:22:57 UTC Last modified: 28 Nov 2011, 23:24:41 UTC We've had GPUs and CUDA (and soon OpenCL) take the number crunching by storm... DSP number-crunching add-on PCI cards next? TI throws DSPs at supercomputers ... so a four-chip PCI-Express card will deliver 2 Tera-FLOPS* of single-precision oomph in under 200 watts of total power...* The big advantage with those looks to be very low power to give very efficient number crunching for a TFLOP at a time or so now, with a fast ramp-up to faster things soon... Kinda makes our present GigaFLOPS look a bit hot and flustered and lame... The big question is when? And will they take off? But what will CUDA and OpenCL be doing on GPUs soon? Happy fast crunchin', Martin See new freedom: Mageia Linux Take a look for yourself: Linux Format The Future is what We all make IT (GPLv3) ID: 1174418 ·

zoom3+1=4 Volunteer tester Send message Joined: 30 Nov 03 Posts: 65759 Credit: 55,293,173 RAC: 49	Message 1174427 - Posted: 29 Nov 2011, 0:06:15 UTC - in response to Message 1174418. Now there's an old concept with a twist (PCI-E): the Atari Falcon had a Motorola 56001 DSP (@ 32 MHz) as stock. The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's ID: 1174427 ·

Claggy Volunteer tester Send message Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4	Message 1174431 - Posted: 29 Nov 2011, 0:20:35 UTC - in response to Message 1174427. Last modified: 29 Nov 2011, 0:22:01 UTC I was thinking that too, ;-) Claggy Atari Enthusiasts Searching for ET ID: 1174431 ·

jason_gee Volunteer developer Volunteer tester Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0	Message 1174438 - Posted: 29 Nov 2011, 0:46:50 UTC In the bigger picture, there's definitely a push to bring flexible DSP, FPGA and many-core technology to wider adoption. These are traditionally used in dedicated embedded designs. It makes sense that, as heterogeneous processing with all sorts of processing nodes becomes feasible, some of these technologies purpose-designed for the sorts of processing we're doing would happily integrate. I think in the long run, as the costs come down, you could see some interesting options to try appearing. More Flops for lower power would be a win for us. Jason "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. ID: 1174438 ·

ML1 Volunteer moderator Volunteer tester Send message Joined: 25 Nov 01 Posts: 20323 Credit: 7,508,002 RAC: 20	Message 1174448 - Posted: 29 Nov 2011, 1:24:45 UTC - in response to Message 1174438. Last modified: 29 Nov 2011, 1:26:30 UTC In the bigger picture, ... More Flops for lower power would be a win for us. Indeed so, and interesting. What is interesting is why TI are only just now jumping on the 'general purpose compute' bandwagon. Their 'new' feature is to bundle multiple copies of their DSPs onto a single chip to add up to higher compute numbers, all still for relatively low power. A new market opened up? Or inspiration from how nVidia have hit the big time in hybrid supercomputers with their Fermi/Tesla GPGPUs, and how AMD are quickly moving into a similar architecture with Bulldozer and their 'APU's?... I'm a little surprised it's taken TI this long to start putting together massive compute planes of DSPs. Now that they are, it's looking good! Meanwhile, I still think FPGAs on their own are far too expensive to waste on floating-point compute. Their advantages are reconfigurability and dedicated logic that can outpace software algorithm/data flows. They can win out well for pipelined 'real-time' fixed-point/integer processing. I keep watching the prices, but they are still far too expensive when compared to CPUs/GPUs :-( Shame the Christmas break isn't longer for experimenting! Happy fast crunchin', Martin See new freedom: Mageia Linux Take a look for yourself: Linux Format The Future is what We all make IT (GPLv3) ID: 1174448 ·

jason_gee Volunteer developer Volunteer tester Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0	Message 1174451 - Posted: 29 Nov 2011, 1:43:33 UTC - in response to Message 1174448. Last modified: 29 Nov 2011, 1:45:09 UTC A new market opened up? Or inspiration from how nVidia have hit the big time in hybrid supercomputers with their Fermi/Tesla GPGPUs, and how AMD are quickly moving into a similar architecture with Bulldozer and their 'APU's?... I'd say all factors contribute in some way, and these are just examples of companies doing what they do best to step up to the plate. One educational thing to look at, and think about what it might mean, is the list of companies that are contributing members to the Khronos Group for OpenCL development. It's no secret that the technology pushes are not 'just' aimed at using GPUs for compute, but eventually all manner of devices. When you start including DSPs and FPGAs on add-in cards & hopefully motherboards, you're starting to move into hardware territory, which is traditionally very expensive with all sorts of barriers to entry. IMO it's likely those barriers will lift. One small example of what using purpose-designed signal processing hardware could achieve is that typical DSPs have special addressing modes that eliminate the need to move data around in memory while doing FFTs. We need that ;) Shuffling stuff around in memory is slow, and trying to work with shuffled data is too error-prone & awkward. So by simply flicking a (hardware) register switch that addresses memory in a bit-reversed sequence, you eliminate a whole bunch of overhead & programming difficulty. That means you can either do the same FFTs with less complexity (energy etc.), or process more in the same power envelope. Jason "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. ID: 1174451 ·
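
For anyone wondering what that bit-reversed addressing saves: a radix-2 FFT either wants its input or leaves its output in bit-reversed index order, so on a CPU or GPU you typically pay for an explicit reordering pass. The sketch below is only an illustration of that software pass, not code from any SETI@home application or TI toolchain; the function names (`bit_reverse`, `bit_reversed_order`, `shuffle_in_software`) are made up for the example.

```python
def bit_reverse(index: int, bits: int) -> int:
    """Reverse the lowest `bits` bits of `index` (e.g. 0b001 -> 0b100 for 3 bits)."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)  # move the lowest bit of index into result
        index >>= 1
    return result


def bit_reversed_order(n: int) -> list:
    """Return the bit-reversed index sequence for a power-of-two length n."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    bits = n.bit_length() - 1  # number of address bits needed for n entries
    return [bit_reverse(i, bits) for i in range(n)]


def shuffle_in_software(samples: list) -> list:
    """The explicit reordering pass that hardware bit-reversed addressing avoids."""
    order = bit_reversed_order(len(samples))
    return [samples[j] for j in order]


if __name__ == "__main__":
    print(bit_reversed_order(8))  # [0, 4, 2, 6, 1, 5, 3, 7]
```

A DSP with a bit-reversed addressing mode generates that index sequence in its address unit while it reads or writes, so the separate `shuffle_in_software` pass (and the memory traffic that goes with it) is simply not needed.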

Wembley Volunteer tester Send message Joined: 16 Sep 09 Posts: 429 Credit: 1,844,293 RAC: 0	Message 1174456 - Posted: 29 Nov 2011, 2:51:09 UTC Does anyone know if these new TI DSPs are going to support OpenCL? ID: 1174456 ·

jason_gee Volunteer developer Volunteer tester Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0	Message 1174457 - Posted: 29 Nov 2011, 3:03:52 UTC - in response to Message 1174456. Last modified: 29 Nov 2011, 3:04:19 UTC Does anyone know if these new TI DSPs are going to support OpenCL? In a lot of cases like this, and there's more going on than TI's offering, I think the answer will end up being that they'll eventually have to. From the article Martin posted: The hardware is the easy part, of course. The software stack would be a little more problematic. If TI is serious about using DSPs and ARMs in HPC, it is going to have to come up with something more than support for OpenMP and more like Nvidia's CUDA environment. For now, '...something more than support for OpenMP and more like Nvidia's CUDA environment' means OpenCL. Jason "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. ID: 1174457 ·

HAL9000 Volunteer tester Send message Joined: 11 Sep 99 Posts: 6534 Credit: 196,805,888 RAC: 57	Message 1174460 - Posted: 29 Nov 2011, 3:44:06 UTC I am glad more companies are getting into the idea of using their products as co-processors. However, I do hope they can provide something a lot better than what they mention in the article. They do mention long-term plans of taking what they have now and making a 512 GFLOPS chip that comes in at 32 W, then putting 4 of them on a card for 2 TFLOPS @ ~130 W. I hope the cost isn't very high, as I could just buy an HD6870 that is rated for 2 TFLOPS @ 151 W. With the upcoming HD7870 being about 2.9 TFLOPS at 120 W, I think TI may have some catching up to do in this field. If nothing else, they will need to come down in cost quite a lot. The $1000 for the current 512 GFLOPS card and $2000 for the upcoming 1 TFLOPS card is a bit high in the cost/FLOPS ratio. One thing TI is normally really good about is support. If they can support it better than the other guys and get people to use it, then it doesn't matter if it isn't really as good. Just like their DLP technology. SETI@home classic workunits: 93,865 CPU time: 863,447 hours Join the BP6/VP6 User Group (http://tinyurl.com/8y46zvu) ID: 1174460 ·
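
To put the numbers in the post above side by side, here is a quick back-of-the-envelope sketch. Every figure is the one quoted in that post (the poster's numbers, not measurements), the dictionary and function names are made up for the example, and treating "1 Tflop" as 1000 GFLOPS is an assumption. No GPU prices are given in the thread, so only the TI cards get a cost figure.

```python
# Figures as quoted in the post above: (GFLOPS, watts). Not measured values.
CARDS_GFLOPS_WATTS = {
    "TI 4-chip DSP card (planned)": (2000.0, 130.0),
    "AMD HD6870": (2000.0, 151.0),
    "AMD HD7870 (upcoming)": (2900.0, 120.0),
}

# TI prices as quoted in the post: (US dollars, GFLOPS).
# Assumption: "1 Tflop" is taken as 1000 GFLOPS.
TI_PRICE_USD_GFLOPS = {
    "TI 512 GFLOPS card (current)": (1000.0, 512.0),
    "TI 1 TFLOPS card (upcoming)": (2000.0, 1000.0),
}


def gflops_per_watt(gflops: float, watts: float) -> float:
    """Power efficiency: higher is better."""
    return gflops / watts


def usd_per_gflops(price_usd: float, gflops: float) -> float:
    """Cost per unit of throughput: lower is better."""
    return price_usd / gflops


if __name__ == "__main__":
    for name, (gflops, watts) in CARDS_GFLOPS_WATTS.items():
        print(f"{name}: {gflops_per_watt(gflops, watts):.1f} GFLOPS/W")
    for name, (price, gflops) in TI_PRICE_USD_GFLOPS.items():
        print(f"{name}: ${usd_per_gflops(price, gflops):.2f} per GFLOPS")
```

On these quoted figures the planned TI card comes out a little ahead of the HD6870 in GFLOPS per watt and behind the HD7870 number, while both TI cards land at roughly $2 per GFLOPS.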

Raistmer Volunteer developer Volunteer tester Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121	Message 1174712 - Posted: 30 Nov 2011, 18:39:38 UTC Last modified: 30 Nov 2011, 18:42:26 UTC Well, we will just write in OpenCL for the other coprocessors. FPGAs will soon have OpenCL support. EDIT: link added http://www.altera.com/corporate/news_room/releases/2011/products/nr-opencl.html ID: 1174712 ·
