Intel Xeon Phi



Message boards : Number crunching : Intel Xeon Phi

John P Baker
Joined: 20 Dec 02
Posts: 1
Credit: 1,417,745
RAC: 6,308
United States
Message 1381374 - Posted: 15 Jun 2013, 2:15:13 UTC

Are there any plans for SETI@HOME to support the Intel Xeon Phi coprocessors?

If so, will it be able to support multiple coprocessors installed in a single workstation?
____________

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5774
Credit: 57,471,301
RAC: 48,393
Australia
Message 1381376 - Posted: 15 Jun 2013, 2:27:58 UTC - in response to Message 1381374.

Are there any plans for SETI@HOME to support the Intel Xeon Phi coprocessors?

For something that specialised it'll be up to someone to volunteer their services to develop the support for the hardware.
____________
Grant
Darwin NT.

ivan
Volunteer tester
Joined: 5 Mar 01
Posts: 597
Credit: 133,976,967
RAC: 116,621
United Kingdom
Message 1381453 - Posted: 15 Jun 2013, 10:21:27 UTC - in response to Message 1381376.

Are there any plans for SETI@HOME to support the Intel Xeon Phi coprocessors?

For something that specialised it'll be up to someone to volunteer their services to develop the support for the hardware.

We've decided to order one. Unfortunately the model on offer is the 60-core passively cooled one, so I had to convince the boss that we needed to spend as much again on a chassis with forced-air cooling. I have a box it would have dropped into if it had been actively cooled; the two-GPU Linux cruncher would have taken it at the expense of one of the GPUs, but that's heavily used for data analysis, so it's not the machine for Phi development.

For sure I'll get BOINC running, if only for the impressive start-up message. S@H is a bit more problematic; the Phi only has 8 GB of memory on-board, and that includes the file-system (with no swap). A quick glance at my Linux boxes shows Astropulse runs about 50 MB and Seti7 runs 40 MB so there's no way to get all 240 threads running a WU each. Intel reckons that two threads/core is the sweet spot for big calculations, though.
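The memory argument can be put into numbers. A minimal sketch using the per-task footprints quoted above; the amount of memory taken by the card's own OS and in-memory file system is not given in the post, so `RESERVED_MB` below is a hypothetical figure for illustration only.

```python
# Rough memory budget for one SETI@home task per hardware thread on a
# 60-core Xeon Phi with 8 GB of on-board memory.
# Assumption: RESERVED_MB (card OS plus in-memory file system) is a
# hypothetical figure for illustration, not a measurement.

CARD_MEMORY_MB = 8 * 1024   # 8 GB on-board, shared with the file system
RESERVED_MB = 2 * 1024      # hypothetical OS + file-system overhead
CORES = 60

# Approximate per-task footprints quoted in the post
TASK_FOOTPRINT_MB = {"Astropulse": 50, "Seti7": 40}


def max_tasks(footprint_mb, memory_mb=CARD_MEMORY_MB, reserved_mb=RESERVED_MB):
    """Largest number of tasks whose combined footprint fits in free memory."""
    return (memory_mb - reserved_mb) // footprint_mb


for threads_per_core in (4, 2):
    threads = CORES * threads_per_core
    for app, mb in TASK_FOOTPRINT_MB.items():
        verdict = "fits" if threads <= max_tasks(mb) else "does not fit"
        print(f"{app}: {threads} tasks x {mb} MB = {threads * mb} MB, "
              f"{verdict} (limit ~{max_tasks(mb)} tasks)")
```

Under that assumed 2 GB reserve, 240 tasks do not fit for either application, while two threads per core (120 tasks) just fits; adjust `RESERVED_MB` to whatever the card actually uses.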

There's a chance for parallelism, though, depending on where the bottlenecks are -- several threads each executing one pass through a loop structure (OpenMP).
Also, their vector units are much wider (2x?) than those on the CPUs, so the compiler should be able to use that to speed up operations.
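The "(2x?)" can be checked with a little arithmetic. A minimal sketch, assuming the 512-bit vector registers of the Knights Corner cores and the 128-bit SSE and 256-bit AVX registers on desktop CPUs of the time:

```python
# Elements per vector register at each width: how much wider the Phi's
# vector unit is than the desktop SIMD extensions of the time.

VECTOR_BITS = {"SSE": 128, "AVX": 256, "Xeon Phi (KNC)": 512}


def lanes(vector_bits, element_bits):
    """Number of element_bits-wide values that fit in one vector register."""
    return vector_bits // element_bits


for name, bits in VECTOR_BITS.items():
    print(f"{name}: {lanes(bits, 32)} floats or {lanes(bits, 64)} doubles per register")

phi_bits = VECTOR_BITS["Xeon Phi (KNC)"]
for name in ("SSE", "AVX"):
    print(f"Xeon Phi vs {name}: {phi_bits // VECTOR_BITS[name]}x wider")
```

On those assumptions the Phi's vectors are 2x as wide as AVX and 4x as wide as SSE.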
____________

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 3386
Credit: 46,230,835
RAC: 7,546
Russia
Message 1381486 - Posted: 15 Jun 2013, 12:50:31 UTC - in response to Message 1381453.
Last modified: 15 Jun 2013, 12:51:42 UTC

A quick glance at my Linux boxes shows Astropulse runs about 50 MB and Seti7 runs 40 MB so there's no way to get all 240 threads running a WU each. Intel reckons that two threads/core is the sweet spot for big calculations, though.

Can it execute OpenCL on CPUs?
If yes, then the OpenCL MB and AP apps could be configured to run a few instances of their own, each using a few dozen CPU cores, for example. This would reduce the memory footprint considerably while still using almost all CPU cores (to the same degree that almost all GPU "cores" are used now).
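A minimal sketch of the memory side of that trade-off, assuming (hypothetically) that each instance costs about one task's footprint however many cores it drives. How an instance would actually be confined to its share of cores (OpenCL 1.2 offers device fission through `clCreateSubDevices`, for example) is not modelled here.

```python
# Memory needed for "one task per hardware thread" versus "a few
# multi-threaded instances, each driving a block of cores".
# Assumption: an instance needs roughly one task's memory no matter how many
# threads it uses; a real OpenCL build would add buffer overhead on top.

HW_THREADS = 240        # 60 cores x 4 hardware threads
TASK_FOOTPRINT_MB = 50  # Astropulse figure quoted earlier in the thread


def plan(threads_per_instance, hw_threads=HW_THREADS, footprint_mb=TASK_FOOTPRINT_MB):
    """Return (instances, idle hardware threads, approximate total MB)."""
    instances = hw_threads // threads_per_instance
    idle = hw_threads - instances * threads_per_instance
    return instances, idle, instances * footprint_mb


for tpi in (1, 24, 36, 60):
    n, idle, mb = plan(tpi)
    print(f"{tpi:>2} threads/instance -> {n:>3} instances, {idle:>2} idle, ~{mb} MB")
```

Choosing a block size that divides 240 evenly avoids leaving hardware threads idle.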
____________

ivan
Volunteer tester
Joined: 5 Mar 01
Posts: 597
Credit: 133,976,967
RAC: 116,621
United Kingdom
Message 1381511 - Posted: 15 Jun 2013, 14:26:11 UTC - in response to Message 1381486.

A quick glance at my Linux boxes shows Astropulse runs about 50 MB and Seti7 runs 40 MB so there's no way to get all 240 threads running a WU each. Intel reckons that two threads/core is the sweet spot for big calculations, though.

Can it execute OpenCL on CPUs?
If yes, then the OpenCL MB and AP apps could be configured to run a few instances of their own, each using a few dozen CPU cores, for example. This would reduce the memory footprint considerably while still using almost all CPU cores (to the same degree that almost all GPU "cores" are used now).

Yes, that's one of the programming models, though we didn't explore it in the workshop I went to at CERN. It's not (yet) as efficient as other models, but it does exist. If things come to fruition I might ask to try your OpenCL code.
____________

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 3386
Credit: 46,230,835
RAC: 7,546
Russia
Message 1381560 - Posted: 15 Jun 2013, 18:20:18 UTC - in response to Message 1381511.

I'm sure a properly written multithreaded app could achieve better performance than OpenCL... but we currently have no manpower to develop one. On the other hand, OpenCL allows many-core CPUs to be used out of the box. I hope only a few, almost cosmetic, modifications will be required: mostly device detection, and letting the app use only part of the available CPUs (using all 240 or so cores could be inefficient, because some places in the algorithm don't expose much parallelism).
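The worry that parts of the algorithm don't expose much parallelism is what Amdahl's law quantifies. A minimal sketch; the parallel fractions below are illustrative assumptions, not measurements of the MB or AP code.

```python
# Amdahl's law: why spreading one task over all ~240 hardware threads can
# waste most of them when part of the algorithm is effectively serial.
# The parallel fractions are illustrative assumptions only.


def amdahl_speedup(parallel_fraction, workers):
    """Ideal speedup when only parallel_fraction of the work scales."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / workers)


def efficiency(parallel_fraction, workers):
    """Speedup per worker: 1.0 means every worker is fully used."""
    return amdahl_speedup(parallel_fraction, workers) / workers


for p in (0.95, 0.99):
    for n in (30, 60, 240):
        print(f"p={p:.2f} n={n:>3}: speedup {amdahl_speedup(p, n):6.1f}, "
              f"efficiency {efficiency(p, n):.0%}")
```

With 95% of the work parallel, 240 threads give under 20x speedup (roughly 8% efficiency), which is one way to see why several instances on a few dozen cores each can use the chip better than one instance on all of them.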

____________

Michael W.F. Miles
Joined: 24 Mar 07
Posts: 237
Credit: 27,809,439
RAC: 20,493
Canada
Message 1381606 - Posted: 15 Jun 2013, 20:58:00 UTC

I'm a bit confused about the GFLOPS rating of these coprocessors. My GTX 460 does 1109 GFLOPS peak, at least that is what BOINC rates it at. These Phi units are 1200 GFLOPS, with one heck of a price tag to go with that power.

What difference does using 60 Xeon CPU cores as a coprocessor make, as far as speed goes?

Sounds like these units will be screamers, but then again so are the latest, much cheaper, Nvidia 700-series products.



Notes on Intel 7120X

The co-processor does not support MMX, SSEx and AVX instructions
The co-processor includes 8 GB of on-board GDDR5 memory with effective speed 5.5 GT/s
Peak performance is 1220 GFLOPS for double-precision operations
The card does not come with a thermal solution


This is a different page with their explanation of what the new units are

For those of you who are unaware of what “Xeon Phi co-processors” are, they are Intel’s attempt at competing with AMD and Nvidia in offering add-in cards that can provide some serious compute performance. AMD and Nvidia already provide these in the form of the AMD FirePro, Nvidia Quadro and Nvidia Tesla enterprise add-in cards. However, about 9 months ago it emerged that Intel’s Xeon Phi could barely compete with AMD and Nvidia’s last-gen graphics architecture designs in terms of compute performance. I have yet to see a current-generation GPU architecture vs Intel Xeon Phi comparison, but you can bet that the current-generation GPU architecture wins comfortably and at a lower price. That said, GPU compute is likely to be the way forward in terms of cost-efficiency, but Intel’s Xeon Phi does perform admirably and Intel is pumping a lot of money into it.


TGDaily gives this story

A leaked Chipzilla roadmap has appeared on the world wide wibble which indicates that Intel is planning to release two new Xeon Phi co-processor cards in May.

In addition to the already announced Xeon Phi 3100-series products due in the first half of the year, it seems that Chipzilla will also unveil new 5100- and 7100-series coprocessors starting this May.

There will be the Xeon Phi 5120D, 3120A, 3120P, 7120P and 7120X. It is not clear what the exact specs of co-processors will be but all of them are based on Knights Corner.

The Xeon Phi 3100-series was supposed to deliver over 1TFLOPS peak double precision performance, which is about as floppy as it gets at the moment. But the Xeon Phi 5120D, 7120P and 7120X coprocessors are even floppier. At present, Intel does not have exact shipment dates for the new products, but expects them to become generally available from May 1, 2013 to July 31, 2013.

The Intel Xeon Phi coprocessor 3100 family is designed for a boffin on a budget as it can run compute-bound workloads such as life science applications and financial simulations. The Intel Xeon Phi 3100 family supports up to 6GB memory at 240GB/s bandwidth. It also has a series of reliability features including memory error correction codes (ECC).

The whole family will run within a thermal design power (TDP) envelope of less than 300W. The Intel Xeon Phi coprocessor 3100 will cost about $2000.

The Intel Xeon Phi coprocessor 5110P features 60 cores with 4-way simultaneous multi-threading technology and 512KB of L2 cache per core, and provides additional performance at a lower power envelope.

It can manage 1.01TFLOPS double-precision performance with the wind behind it, and supports 8GB of GDDR5 memory at a higher 320 GB/sec memory bandwidth.

It has a 225 W TDP and is passively cooled. It is designed for denser computing environments such as the movie industry and energy research. You will be able to pick one up for $2,649.
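The bandwidth figures in the story follow from the memory transfer rate and the width of the memory bus. A minimal sketch, assuming a 512-bit (16 x 32-bit channel) GDDR5 interface at 5.0 GT/s on the 8 GB 5110P, a 384-bit interface at 5.0 GT/s on the 6 GB 3100 family, and the 5.5 GT/s rate from the 7120X notes; none of the bus widths are stated in the thread, so check Intel's spec sheets before relying on them.

```python
# Peak GDDR5 bandwidth = transfer rate (GT/s) x bus width in bytes.
# Assumptions: bus widths and the 5.0 GT/s rates are not from this thread.


def peak_bandwidth_gbs(gigatransfers_per_s, bus_bits):
    """Theoretical peak bandwidth in GB/s (decimal gigabytes)."""
    return gigatransfers_per_s * bus_bits / 8


print(peak_bandwidth_gbs(5.0, 384))  # 3100 family, 6 GB (assumed 384-bit)
print(peak_bandwidth_gbs(5.0, 512))  # 5110P, 8 GB (assumed 512-bit)
print(peak_bandwidth_gbs(5.5, 512))  # 7120 series at 5.5 GT/s (assumed 512-bit)
```

Under these assumptions the results match the 240 GB/s and 320 GB/s quoted above.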


That's a big price tag on that baby

Michael Miles
The Assimilators

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5774
Credit: 57,471,301
RAC: 48,393
Australia
Message 1381620 - Posted: 15 Jun 2013, 21:43:41 UTC - in response to Message 1381606.
Last modified: 15 Jun 2013, 21:44:58 UTC

That's a big price tag on that baby

Check out the pricing on Tesla cards and pro graphics cards (Quadro/FirePro). The Xeon Phi is comparable.
GPUs are good with workloads that can be parallelised; the Xeon Phi will be excellent for workloads that can be parallelised.

Using Seti as an example (v6 optimised application): a shorty would take a GPU just over 3 minutes, running 2 at a time. On an older CPU it was about 30 min. On a Xeon Phi core, it would probably take 2 hours.

The GPU does 2 at a time; the Xeon Phi (with multiple cores & hyperthreading) would be doing around 120 (or more) at a time.

So roughly 40/hr for the GPU, 60/hr (or more) for the Xeon Phi. Keep in mind that the application for the GPU has been developed for some time. The application for the Xeon Phi would just be one that was ported over, no optimisations or tweaking.
A specifically built & tweaked application would be much, much faster. Xeon Phis are built specifically to run in high-density racks: multiple racks of multiple cards, each running hundreds of WUs at a time.
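The 40/hr versus 60/hr comparison is throughput arithmetic: tasks completed per hour equals the number running at once times 60 divided by the minutes per task. A minimal sketch using the rough figures from the post (estimates, not benchmarks):

```python
# Throughput = tasks running at once x (60 / minutes per task).
# Run times and concurrency are the rough estimates from the post above.


def tasks_per_hour(concurrent, minutes_per_task):
    """Completed tasks per hour when `concurrent` tasks run side by side."""
    return concurrent * 60 / minutes_per_task


gpu = tasks_per_hour(concurrent=2, minutes_per_task=3)      # GPU, 2 at a time
phi = tasks_per_hour(concurrent=120, minutes_per_task=120)  # Phi, ~2 h each
print(f"GPU: {gpu:.0f}/hr, Xeon Phi: {phi:.0f}/hr")
```

Since a shorty takes the GPU "just over" 3 minutes, the GPU figure is a slight overestimate.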

And keep in mind this is the first generation of Xeon Phi, GPGPU computing has been around for over 5 generations.
The Xeon Phi has a huge potential in it's field.
____________
Grant
Darwin NT.

ivan
Volunteer tester
Joined: 5 Mar 01
Posts: 597
Credit: 133,976,967
RAC: 116,621
United Kingdom
Message 1381626 - Posted: 15 Jun 2013, 21:55:41 UTC - in response to Message 1381620.

And keep in mind this is the first generation of Xeon Phi, GPGPU computing has been around for over 5 generations.
The Xeon Phi has a huge potential in it's field.

I'd agree with most of that (apart from the greengrocer's apostrophe...) Let's say I've heard "rumours" about future Xeon Phis, such that it's possible the product line may expand...
____________

Michael W.F. Miles
Joined: 24 Mar 07
Posts: 237
Credit: 27,809,439
RAC: 20,493
Canada
Message 1381631 - Posted: 15 Jun 2013, 22:16:54 UTC - in response to Message 1381620.

Grant SSSF

Thank you for your explanation, and yes, I see where doing 120 at one time would be a huge benefit. Like all new computing items it will take a while to implement, and I'm looking forward to seeing such a beast in the field at a reasonable price.
It would be really nice for the GPU to be a part of the Windows platform, doing all intensive tasks, and not just an add-on.

Right now, rendering 1080p with DVDFab using the GPU, I get 500-600 fps encoding speed with just a GTX 460.
I can only imagine what the newer beasts can pull off.

Michael Miles

Josef W. Segur
Project donor
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 4219
Credit: 1,039,842
RAC: 435
United States
Message 1381673 - Posted: 16 Jun 2013, 1:02:06 UTC - in response to Message 1381606.

I'm a bit confused about the GFLOPS rating of these coprocessors. My GTX 460 does 1109 GFLOPS peak, at least that is what BOINC rates it at. These Phi units are 1200 GFLOPS, with one heck of a price tag to go with that power.
...

The Xeon Phi rating is for double precision GFLOPS, the 460 GTX for single precision. In the kind of big scientific applications the Phi is aimed at, that's important, but for SETI single precision is used for almost all calculations in the main inner loops. So in spite of the consumer graphics cards having poor double precision performance they are very suitable here. A Tesla card also has very good double precision performance, and commands a price similar to the Phi.

Each core in the Phi has 512 bit SIMD, so presumably that 1200 GFLOPS reflects 8 double precision (64 bit) operations simultaneously on each core. That would mean single precision GFLOPS of 2400, doing 16 operations simultaneously on each core.
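The estimate can be made explicit: peak throughput is cores x clock x SIMD lanes x floating-point operations per lane per cycle. A minimal sketch, assuming one fused multiply-add (counted as two operations) per lane per cycle and a 1.053 GHz clock for the 60-core 5110P (Intel's published figure, not stated in this thread). On those assumptions it reproduces the 1.01 TFLOPS double-precision figure quoted for the 5110P, and halving the element width doubles the single-precision figure.

```python
# Peak GFLOPS = cores x clock (GHz) x SIMD lanes x FLOPs per lane per cycle.
# Assumptions: 512-bit vectors, one fused multiply-add (2 FLOPs) per lane
# per cycle, and a 1.053 GHz clock for the 5110P.


def peak_gflops(cores, clock_ghz, element_bits, vector_bits=512, flops_per_lane=2):
    """Theoretical peak in GFLOPS for the given precision (element width)."""
    lanes = vector_bits // element_bits
    return cores * clock_ghz * lanes * flops_per_lane


dp = peak_gflops(cores=60, clock_ghz=1.053, element_bits=64)  # double precision
sp = peak_gflops(cores=60, clock_ghz=1.053, element_bits=32)  # single precision
print(f"5110P: {dp:.0f} GFLOPS double, {sp:.0f} GFLOPS single")
```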
Joe

ivan
Volunteer tester
Joined: 5 Mar 01
Posts: 597
Credit: 133,976,967
RAC: 116,621
United Kingdom
Message 1398043 - Posted: 2 Aug 2013, 19:06:39 UTC

<sigh> Someone's screwed up -- twice. First our order for a Xeon Phi system wasn't placed due to a "lost" e-mail. Once we noticed, we re-placed the order and finally got word it will be delivered next week. But then I looked more closely at the e-mail history and realised that the order was for just the co-processor, not the necessary forced-cooling server as well!</sigh>
I could put it in our dual-GPU server, but that's heavily used by our grad students. Need to sort it out next week...
____________

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5774
Credit: 57,471,301
RAC: 48,393
Australia
Message 1398086 - Posted: 2 Aug 2013, 20:26:34 UTC - in response to Message 1398043.

<sigh> Someone's screwed up -- twice. First our order for a Xeon Phi system wasn't placed due to a "lost" e-mail. Once we noticed, we re-placed the order and finally got word it will be delivered next week. But then I looked more closely at the e-mail history and realised that the order was for just the co-processor, not the necessary forced-cooling server as well!</sigh>
I could put it in our dual-GPU server, but that's heavily used by our grad students. Need to sort it out next week...

At least it'll give you a chance to work some code on it to see which is the best way to proceed.
____________
Grant
Darwin NT.

ivan
Volunteer tester
Joined: 5 Mar 01
Posts: 597
Credit: 133,976,967
RAC: 116,621
United Kingdom
Message 1398157 - Posted: 3 Aug 2013, 0:05:38 UTC - in response to Message 1398086.

I could put it in our dual-GPU server, but that's heavily used by our grad students. Need to sort it out next week...

At least it'll give you a chance to work some code on it to see which is the best way to proceed.

Well. Of course we need the Intel compiler suite as well, which *may* have been ordered today. In principle we can compile at CERN on their licence and drag the executables back across La Manche; in practice that's most likely to be unworkable.
____________

ivan
Volunteer tester
Joined: 5 Mar 01
Posts: 597
Credit: 133,976,967
RAC: 116,621
United Kingdom
Message 1403930 - Posted: 16 Aug 2013, 16:26:42 UTC

Progress, of a sort. I got the hardware in my hands today. Surprisingly there was nothing else in the box except a paper slip saying we must read the Health and Safety warnings at an Intel website, and a mysterious metal bracket. A while spent Googling and searching the Intel web-complex eventually turned up information on how to install the required support software and drivers, and a link to download them from. Now we're waiting on the server to do that and, ultimately, the Intel compilers.
____________

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5774
Credit: 57,471,301
RAC: 48,393
Australia
Message 1404047 - Posted: 16 Aug 2013, 22:37:44 UTC - in response to Message 1403930.


*eagerly awaiting the first XPhi results*
____________
Grant
Darwin NT.

ivan
Volunteer tester
Joined: 5 Mar 01
Posts: 597
Credit: 133,976,967
RAC: 116,621
United Kingdom
Message 1404317 - Posted: 17 Aug 2013, 15:58:02 UTC - in response to Message 1404047.


*eagerly awaiting the first XPhi results*

As am I mate, as am I. You realise I'm delaying my return to Godzone to do this for you?
____________

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5774
Credit: 57,471,301
RAC: 48,393
Australia
Message 1404398 - Posted: 17 Aug 2013, 18:45:28 UTC - in response to Message 1404317.


*eagerly awaiting the first XPhi results*

As am I mate, as am I. You realise I'm delaying my return to Godzone to do this for you?

That's OK.
Don't forget, it'll be months yet before it's warm enough down south to be almost comfortable for habitation.
____________
Grant
Darwin NT.

ivan
Volunteer tester
Joined: 5 Mar 01
Posts: 597
Credit: 133,976,967
RAC: 116,621
United Kingdom
Message 1424049 - Posted: 4 Oct 2013, 15:46:16 UTC - in response to Message 1403930.

Progress, of a sort. I got the hardware in my hands today. Surprisingly there was nothing else in the box except a paper slip saying we must read the Health and Safety warnings at an Intel website, and a mysterious metal bracket. A while spent Googling and searching the Intel web-complex eventually turned up information on how to install the required support software and drivers, and a link to download them from. Now we're waiting on the server to do that and, ultimately, the Intel compilers.

Sigh! We thought we had the server coming several weeks ago, but when I noticed a mismatch on the advance copy of the packing slip from the supplier we then realised they'd accidentally re-sent the PS from something we got in June.
"So when's our order coming?"
"Early October."
Monday, another packing slip, for the right box this time. Today, a box arrives. Got it into the lab and unpacked the server. It didn't hit me at first that the front was different to what I'd expected, until I decided to make sure the right number of disks was installed: one disk, two disks, three disks, four disks...
Hang on, four disks, we should only have three! 5, 6, 7, 8, 9, empty slot, 10, 11. Then another penny dropped -- these were 3 TB 3.5" disks, we'd ordered 1 TB 2.5" disks.
A quick look at the packing slip inside the box showed that the machine had been destined for a different college, even though the address slapped on the outside of the box was for us. So, back in the box it goes.
At least we have got the access code to download the compiler.
To be continued...
____________

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5774
Credit: 57,471,301
RAC: 48,393
Australia
Message 1424223 - Posted: 4 Oct 2013, 22:29:20 UTC - in response to Message 1424049.


Doesn't exactly inspire confidence in your chosen supplier.
Mistakes happen, but repeatedly?
____________
Grant
Darwin NT.


Copyright © 2014 University of California