Tesla K40 not recognized as GPU


Message boards : Number crunching : Tesla K40 not recognized as GPU

William de Thomas
Volunteer tester
Joined: 15 May 99
Posts: 17
Credit: 10,918,328
RAC: 9
Puerto Rico
Message 1467482 - Posted: 22 Jan 2014, 23:05:51 UTC

Hello, I just had the privilege of testing out the Tesla K40 on the NVIDIA Tesla Test Drive site. Once I installed the BOINC software on their server and ran SETI@home, only the CPU on that server did tasks. I looked in the log and the Teslas showed up (4 of them, nice), but no GPU tasks were downloaded to the server. I installed BOINC on my own server, which has two GTX Titans, and I was able to download tasks for the GPUs; I also used Lunatics to help out.

I also installed Folding@home and it would not do any GPU tasks either (on the remote server). GPU Grid did not work either.

I am assuming that it might be a Tesla driver problem. Any feedback on this would be appreciated.
____________

HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 4179
Credit: 114,459,489
RAC: 141,307
United States
Message 1467493 - Posted: 22 Jan 2014, 23:20:13 UTC
Last modified: 22 Jan 2014, 23:21:27 UTC

I see the card is being displayed on the details page for that machine. It does look odd that the driver version is not listed.
Checking the top hosts, I see a machine with a K20 showing "NVIDIA Tesla K20c (4095MB) driver: 320.27 OpenCL: 1.01",
while your machine displays "[2] NVIDIA Tesla K40m (4095MB) OpenCL: 1.01".
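
The check above can be scripted: the two description strings differ only in the missing "driver:" field. A minimal Python sketch that splits such strings into fields, assuming the layout seen in these two examples only (the regular expression is inferred from them, not taken from any BOINC format specification):

```python
import re
from typing import Optional

# Layout inferred from the two host strings quoted above (assumption, not a
# BOINC spec): optional "[N]" count, model, "(NNNNMB)", then optional
# "driver:" and "OpenCL:" fields.
GPU_DESC = re.compile(
    r"^(?:\[(?P<count>\d+)\]\s+)?"            # optional device count
    r"(?P<name>.+?)\s+\((?P<mem_mb>\d+)MB\)"  # model name and memory
    r"(?:\s+driver:\s+(?P<driver>[\d.]+))?"   # driver version, if reported
    r"(?:\s+OpenCL:\s+(?P<opencl>[\d.]+))?"   # OpenCL version, if reported
    r"\s*$"
)


def parse_gpu_description(desc: str) -> Optional[dict]:
    """Split a host GPU description into fields, or return None if it doesn't match."""
    m = GPU_DESC.match(desc.strip())
    if m is None:
        return None
    return {
        "count": int(m.group("count") or 1),
        "name": m.group("name"),
        "mem_mb": int(m.group("mem_mb")),
        "driver": m.group("driver"),  # None when no driver version is listed
        "opencl": m.group("opencl"),
    }


k20 = parse_gpu_description("NVIDIA Tesla K20c (4095MB) driver: 320.27 OpenCL: 1.01")
k40 = parse_gpu_description("[2] NVIDIA Tesla K40m (4095MB) OpenCL: 1.01")

for label, info in (("K20 host", k20), ("K40 host", k40)):
    if info["driver"] is None:
        print(f"{label}: {info['name']} x{info['count']}, no driver version reported")
    else:
        print(f"{label}: {info['name']} x{info['count']}, driver {info['driver']}")
```

A missing driver field does not prove the driver is absent, but it is a cheap first flag when comparing hosts.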
____________
SETI@home classic workunits: 93,865 CPU time: 863,447 hours

Join the BP6/VP6 User Group today!

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5818
Credit: 58,953,331
RAC: 47,952
Australia
Message 1467524 - Posted: 23 Jan 2014, 1:55:25 UTC - in response to Message 1467493.

That was my first guess: driver issues.
If the driver doesn't support it, it can't be used.
____________
Grant
Darwin NT.

jason_gee
Project donor
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 4992
Credit: 73,407,879
RAC: 15,809
Australia
Message 1467525 - Posted: 23 Jan 2014, 2:08:59 UTC - in response to Message 1467482.
Last modified: 23 Jan 2014, 2:17:01 UTC

As there's no display on these beasties (?), ordinary Windows display drivers most likely won't make sense for them (guessing; I don't have one ;) ). Even if they can work with regular drivers, since you're on Windows 7, which can have multiple drivers installed without issue, installing the appropriate Tesla Compute Cluster (TCC) driver should help. It would be a good idea to check with the nvidia-smi utility that TCC mode is engaged. There are advantages there with current applications, since TCC doesn't have all the graphics (Windows Display Driver Model, WDDM) machinery in the way.
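
The nvidia-smi check can be wrapped in a small script. A minimal Python sketch, assuming nvidia-smi is on the PATH and that its CSV query output looks like "0, Tesla K40m, TCC" (one line per GPU); the helper names are hypothetical:

```python
import csv
import io
import shutil
import subprocess

# driver_model.current reports the active driver model (WDDM or TCC) on
# Windows; other platforms report N/A.
QUERY = "index,name,driver_model.current"


def parse_driver_models(csv_text: str) -> list[tuple[int, str, str]]:
    """Parse 'index, name, driver model' rows from nvidia-smi CSV output."""
    rows = csv.reader(io.StringIO(csv_text), skipinitialspace=True)
    return [(int(r[0]), r[1], r[2]) for r in rows if len(r) == 3]


def non_tcc_gpus(rows: list[tuple[int, str, str]]) -> list[tuple[int, str, str]]:
    """Return the GPUs whose current driver model is not TCC."""
    return [row for row in rows if row[2] != "TCC"]


def query_driver_models() -> list[tuple[int, str, str]]:
    """Run nvidia-smi if it is on PATH; return one row per GPU (empty if absent)."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    )
    return parse_driver_models(result.stdout)


if __name__ == "__main__":
    rows = query_driver_models()
    for idx, name, model in rows:
        print(f"GPU {idx}: {name}, driver model {model}")
    for idx, name, model in non_tcc_gpus(rows):
        print(f"GPU {idx} ({name}) is not in TCC mode")
```

If a card reports WDDM, nvidia-smi on Windows has a driver-model option (`-dm`) for requesting TCC; it needs administrator rights and a reboot, and options vary by driver version, so check `nvidia-smi -h` on the machine first.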

On the performance side, without all those driver latencies from a graphics driver, it could feasibly crunch significantly faster than a 780 or Titan, with current applications. That's true for the time being, while I am still working on ways to effectively hide all the graphic driver induced latencies for faster GPUs on Windows. In the future that gap may close again due to a combination of app refinement and our limited use of double precision.
____________
"It is not the strongest of the species that survives, nor the most intelligent that survives. It is the one that is the most adaptable to change."
Charles Darwin

William de Thomas
Volunteer tester
Joined: 15 May 99
Posts: 17
Credit: 10,918,328
RAC: 9
Puerto Rico
Message 1467704 - Posted: 23 Jan 2014, 14:30:38 UTC

Yeah, I am almost sure it's a driver issue. Because it is a remote server, I don't know which drivers are installed on that machine. I did look in the Control Panel, and the cards were there and recognized, so I imagine the drivers for those cards were installed.

Thanks for the replies.
____________

Ulrich Metzner
Volunteer tester
Joined: 3 Jul 02
Posts: 984
Credit: 8,481,631
RAC: 2,462
Germany
Message 1467718 - Posted: 23 Jan 2014, 15:02:02 UTC - in response to Message 1467704.

IIRC, wasn't there an issue with BOINC GPU processing on a remote computer?
____________
Aloha, Uli

Richard Haselgrove
Project donor
Volunteer tester
Joined: 4 Jul 99
Posts: 8497
Credit: 49,889,786
RAC: 50,933
United Kingdom
Message 1467726 - Posted: 23 Jan 2014, 15:19:32 UTC - in response to Message 1467718.

Only if Microsoft's proprietary Remote Desktop Protocol (RDP) is actively being used to access it. If that were the case, the OP would be able to look up driver version numbers etc. just as easily as on his local workstation.

If RDP is inactive, or an alternative remote viewing product is being used, there's no problem.

More to the point, a remote server probably runs (and arguably should normally run) without a local console operator logged in. And a server-trained sysadmin would probably notice that BOINC can be installed in 'protected application execution' mode (aka "as a service") and choose that option. Either of those two states is enough to prevent GPU crunching with BOINC.

There are BOINC projects who run their BOINC applications internally on Tesla-class GPUs mounted in headless chassis - Einstein with their Atlas cluster, for one. You could ask them how they do it, but I suspect that the answer would be a workstation-class Linux installation, which doesn't help much.

William de Thomas
Volunteer tester
Joined: 15 May 99
Posts: 17
Credit: 10,918,328
RAC: 9
Puerto Rico
Message 1467793 - Posted: 23 Jan 2014, 17:55:23 UTC

I mentioned to the AMAX contact person the problem but never received an answer back. Don't know if they looked into it.
____________

ignorance is no excuse
Joined: 4 Oct 00
Posts: 9529
Credit: 44,433,274
RAC: 0
Korea, North
Message 1467813 - Posted: 23 Jan 2014, 18:20:42 UTC

You may have a case of the GPU being too NEW for the project to identify it. For instance, look at my computers and you'll see one of them is running a 2048 MB Hawaii GPU. In reality, this GPU is an R9 290X with 4096 MB. Yours may be so different from previous GPUs that BOINC just shrugs its shoulders and moves on.
____________
In a rich man's house there is no place to spit but his face.
Diogenes Of Sinope

End terrorism by building a school

William de Thomas
Volunteer tester
Joined: 15 May 99
Posts: 17
Credit: 10,918,328
RAC: 9
Puerto Rico
Message 1467821 - Posted: 23 Jan 2014, 18:42:46 UTC

I thought about that too, it being too new. Thanks.
____________

Claggy
Project donor
Volunteer tester
Joined: 5 Jul 99
Posts: 4095
Credit: 33,041,241
RAC: 7,870
United Kingdom
Message 1467846 - Posted: 23 Jan 2014, 19:31:18 UTC - in response to Message 1467482.
Last modified: 23 Jan 2014, 19:33:54 UTC

Your host managed to grab Anonymous platform CPU and GPU work at 22 Jan 2014, 17:40:43, which was then abandoned at 18:24:22 UTC (did you do a Remove/Add Project at that time?).
Then it grabbed more CPU work (all VLAR) at 22 Jan 2014, 18:42:26 and 18:47:35 UTC;
then at 18:52:58 UTC all that CPU work was marked as "Timed out - no response" (did you reset the project, or remove the app_info and attempt to get the work resent as GPU tasks?).
VLAR tasks aren't sent to GPUs because of the extra-long computation time, so attempting to get them resent will get them marked as Timed out.
At 18:52:59 UTC you were sent a solitary Stock v7.00 (cuda42) task. A day has gone by and that task hasn't been completed and reported yet (the host hasn't contacted the project since to report the task and ask for more).

I think a bit more patience is required: plainly it can get work (work supply permitting), but whether it can complete it is another matter. At present it looks as if BOINC is not even running on that host.
Surely BOINC would have asked for work in the last day, even if only for more CPU work, and surely it would have completed some CPU work; and if the CUDA task had errored, it would have been reported by now.
The only other possibility is that the host is blocked from downloading work from external sites and won't ask for more until it has downloaded its existing supply. What does the Event log say?

Claggy

William de Thomas
Volunteer tester
Joined: 15 May 99
Posts: 17
Credit: 10,918,328
RAC: 9
Puerto Rico
Message 1468148 - Posted: 24 Jan 2014, 12:17:21 UTC

Thanks for the reply. I did remove SETI at one point and added GPU Grid, and then went back to SETI at another time. Since I only had 5 hours of testing time, I was in a bit of a rush. Folding@home didn't work with the Tesla K40 either.

I will receive two Tesla K40s by the end of this month (hopefully), and I will retest and post my observations. They will be installed at my home, so I will be able to do a lot of testing on them.

As I mentioned earlier, since it was on a remote server, I couldn't do much. I just wanted to know if anybody else had this problem, because the K40s are so new.

Thanks
____________

Batter Up
Project donor
Joined: 5 May 99
Posts: 1946
Credit: 24,858,651
RAC: 0
United States
Message 1468362 - Posted: 24 Jan 2014, 18:18:52 UTC

Tesla K40 $5,299.99.
____________

William de Thomas
Volunteer tester
Joined: 15 May 99
Posts: 17
Credit: 10,918,328
RAC: 9
Puerto Rico
Message 1474077 - Posted: 8 Feb 2014, 4:25:48 UTC

Got the two Tesla K40s running and they are recognized here at home. It seems the remote server was not set up with the correct driver or something. Will post the PPD later.
____________

hancocka
Joined: 19 May 00
Posts: 10
Credit: 2,508,257
RAC: 0
United Kingdom
Message 1476685 - Posted: 13 Feb 2014, 19:56:31 UTC - in response to Message 1474077.
Last modified: 13 Feb 2014, 20:04:11 UTC

I've got my hands on two Tesla K40c cards. What I've noticed is that the memory for CUDA is only shown as 4096 MB (4 GB), and these are 12 GB cards.

OpenCL shows the memory correctly.

Does anyone have any tips on how to quickly configure a single K40 card with an i7 6-core processor (12 threads)?

(I've got two machines of the same spec!)

Machine details here http://setiathome.berkeley.edu/show_host_detail.php?hostid=7212790
____________

jason_gee
Project donor
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 4992
Credit: 73,407,879
RAC: 15,809
Australia
Message 1476722 - Posted: 13 Feb 2014, 21:00:24 UTC - in response to Message 1476685.
Last modified: 13 Feb 2014, 21:02:08 UTC

On the CUDA memory being shown as only 4096 MB: on Windows, SETI@home multibeam (V7) CUDA is a 32-bit program, so it is limited to the ~4 GB 32-bit address space (minus a bit for driver components etc.).
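
The 4096 MB figure follows directly from the pointer width: a 32-bit process can address at most 2^32 bytes, which is 4096 MiB. A minimal Python sketch of that cap, assuming the 12 GB K40c is 12288 MiB and ignoring the small driver reservation mentioned above:

```python
MIB = 2 ** 20  # bytes per MiB


def addressable_mib(pointer_bits: int) -> int:
    """Size of the address space for a given pointer width, in MiB."""
    return (2 ** pointer_bits) // MIB


def visible_memory_mib(physical_mib: int, pointer_bits: int) -> int:
    """Device memory a process can address: the smaller of the card's
    memory and its address space. Ignores the slice reserved for driver
    components, so real figures may come out slightly lower."""
    return min(physical_mib, addressable_mib(pointer_bits))


K40_MIB = 12 * 1024  # assumption: 12 GB card treated as 12288 MiB

print(addressable_mib(32))               # 4096
print(visible_memory_mib(K40_MIB, 32))   # 4096: what a 32-bit build sees
print(visible_memory_mib(K40_MIB, 64))   # 12288: a 64-bit build sees it all
```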

Switching to 64-bit has been found to carry a significant performance penalty on GK110 and earlier (a GPU cost of wider addresses, not a CPU host program cost), so it is unnecessary for this application and is not distributed.

Fortunately, this particular application is never likely to need more than a few hundred megabytes (excepting possible future GBT datasets, which will have revised applications anyway).

As a driver and OS function, each CUDA instance will see its own 4 GB portion of the full video memory, assuming you're on a 64-bit host OS. That means running multiple instances can make use of more of the available video memory, up to the point where the scaling advantages drop off.
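
In BOINC, multiple instances per GPU are usually set up with an app_config.xml file in the project's folder inside the BOINC data directory, where gpu_usage is the fraction of one GPU each task claims (0.5 means two tasks per GPU). A hedged Python sketch that builds such a file; the app name setiathome_v7, the instance count and the cpu_usage value are assumptions for illustration, so check the app names your client actually reports (e.g. in client_state.xml) before using it:

```python
import xml.etree.ElementTree as ET


def app_config_xml(app_name: str, instances_per_gpu: int, cpu_usage: float = 1.0) -> str:
    """Build an app_config.xml body that runs `instances_per_gpu` tasks per GPU.

    BOINC reads gpu_usage as the fraction of one GPU each task uses, so
    0.5 lets two tasks share a GPU.
    """
    if instances_per_gpu < 1:
        raise ValueError("instances_per_gpu must be at least 1")
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    gpu = ET.SubElement(app, "gpu_versions")
    ET.SubElement(gpu, "gpu_usage").text = f"{1 / instances_per_gpu:g}"
    ET.SubElement(gpu, "cpu_usage").text = f"{cpu_usage:g}"
    ET.indent(root)  # pretty-print; requires Python 3.9+
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical example: two tasks per K40, one CPU budgeted per task.
    print(app_config_xml("setiathome_v7", instances_per_gpu=2))
```

Start with two instances and compare throughput before going higher; as noted above, memory is unlikely to be the limit, so the scaling advantage is what decides the count.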

On Linux, the situation is slightly different at the moment: IIRC, there are difficulties making 32-bit apps work on a 64-bit host OS. So there, on 64-bit systems, the executables are 64-bit, use 64-bit addressing, and are somewhat slower at this time.

The performance issues with 64-bit executables may or may not change with future builds on both platforms, as more costly latencies elsewhere get minimised and hidden.

William de Thomas
Volunteer tester
Joined: 15 May 99
Posts: 17
Credit: 10,918,328
RAC: 9
Puerto Rico
Message 1478725 - Posted: 18 Feb 2014, 12:53:03 UTC - in response to Message 1476685.

Same here. I have two GTX Titans and two Tesla K40 set up on the same machine.
____________

William de Thomas
Volunteer tester
Joined: 15 May 99
Posts: 17
Credit: 10,918,328
RAC: 9
Puerto Rico
Message 1480613 - Posted: 22 Feb 2014, 12:58:06 UTC

All working well so far.
____________

Copyright © 2014 University of California