Tesla K40 not recognized as GPU

William de Thomas
Volunteer tester
Joined: 15 May 99
Posts: 17
Credit: 15,501,592
RAC: 0
Puerto Rico
Message 1467482 - Posted: 22 Jan 2014, 23:05:51 UTC

Hello, I just had the privilege of testing out the Tesla K40 on the Nvidia Tesla test drive site. Once I installed the BOINC software on their server and ran SETI@home, only the CPU on that server did tasks. I looked in the log and the Teslas showed up (4 of them, nice), but no GPU tasks were downloaded to the server. I installed BOINC on my own server, which has two GTX Titans, and there I was able to download tasks for the GPU, and I also used Lunatics to help out.

I also installed Folding@home and it would not do any GPU tasks either (on the remote server). GPU Grid did not work either.

I am assuming that it might be a Tesla driver problem. Any feedback on this would be appreciated.

HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1467493 - Posted: 22 Jan 2014, 23:20:13 UTC
Last modified: 22 Jan 2014, 23:21:27 UTC

I see the card is being displayed on the details page for that machine. It does look odd that the driver version is not listed.
Checking the top hosts, I see a machine with a K20 listed as "NVIDIA Tesla K20c (4095MB) driver: 320.27 OpenCL: 1.01",
while your machine displays "[2] NVIDIA Tesla K40m (4095MB) OpenCL: 1.01".
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group: http://tinyurl.com/8y46zvu

Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13731
Credit: 208,696,464
RAC: 304
Australia
Message 1467524 - Posted: 23 Jan 2014, 1:55:25 UTC - in response to Message 1467493.  

> I see the card is being displayed on the details page for that machine. It does look odd that the driver version is not listed.

That was my first guess: driver issues.
If the driver doesn't support it, it can't be used.
Grant
Darwin NT

jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1467525 - Posted: 23 Jan 2014, 2:08:59 UTC - in response to Message 1467482.  
Last modified: 23 Jan 2014, 2:17:01 UTC

> Hello, I just had the privilege of testing out the Tesla K40 on the Nvidia Tesla test drive site. Once I installed the Boinc software on their server and ran seti@home, only the cpu on that server did tasks. I looked in the log and the Teslas showed up (4 of them, nice) but no GPU task were downloaded to the server. I installed Boinc on my server that has Two GTX Titans and I was able to download task for the GPU and I also used Lunatics to help out.
>
> I also installed folding@home and it would not do any GPU task either (on the remote server). GPU Grid did not work either.
>
> I am assuming that it might be a Tesla driver problem. Any feedback on this would be appreciated.


As there's no display on these beasties (?), ordinary Windows drivers most likely won't make sense for them (guessing, I don't have one ;) ). Even if they can work with regular drivers, you're on Windows 7, which can have multiple drivers installed without issue, so installing the appropriate Tesla Compute Cluster (TCC) driver should help. Checking with the nvidia-smi utility that TCC mode is engaged would be a good idea. There are advantages there with current applications, since it won't have all the graphics (Windows Display Driver Model, WDDM) stuff in the way.
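As a rough illustration of that nvidia-smi check, here is a minimal Python sketch. It assumes `nvidia-smi` is on the PATH and that the installed driver supports the `driver_model.current` query field (meaningful on Windows only; other platforms report N/A). The helper names `parse_gpu_rows`, `gpus_not_in_tcc` and `query_nvidia_smi` are made up for this example.

```python
import csv
import io
import subprocess

# Fields requested from nvidia-smi; driver_model.current is WDDM or TCC on Windows.
QUERY = "index,name,driver_version,driver_model.current"
FIELDS = ["index", "name", "driver_version", "driver_model"]


def parse_gpu_rows(csv_text):
    """Turn `nvidia-smi --format=csv,noheader` output into a list of dicts."""
    rows = []
    for raw in csv.reader(io.StringIO(csv_text), skipinitialspace=True):
        if not raw:
            continue  # skip blank lines
        rows.append(dict(zip(FIELDS, (cell.strip() for cell in raw))))
    return rows


def gpus_not_in_tcc(rows):
    """Return the GPUs whose current driver model is anything other than TCC."""
    return [row for row in rows if row["driver_model"].upper() != "TCC"]


def query_nvidia_smi():
    """Run nvidia-smi on this machine and parse its answer."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_gpu_rows(result.stdout)
```

Calling `gpus_not_in_tcc(query_nvidia_smi())` on the host would list any cards not currently in TCC mode. Switching a card is normally done with `nvidia-smi -i <index> -dm 1` from an elevated prompt, and as far as I know it needs a reboot to take effect.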

On the performance side, without all those driver latencies from a graphics driver, it could feasibly crunch significantly faster than a 780 or Titan, with current applications. That's true for the time being, while I am still working on ways to effectively hide all the graphic driver induced latencies for faster GPUs on Windows. In the future that gap may close again due to a combination of app refinement and our limited use of double precision.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.

William de Thomas
Volunteer tester
Joined: 15 May 99
Posts: 17
Credit: 15,501,592
RAC: 0
Puerto Rico
Message 1467704 - Posted: 23 Jan 2014, 14:30:38 UTC

Yeah, I am almost sure it's a driver issue. Because it is a remote server, I don't know which drivers are installed on that machine. I do know that I looked in the control panel and the cards were there and recognized, so I imagine that drivers were installed for them.

Thanks for the replies.

Ulrich Metzner
Volunteer tester
Joined: 3 Jul 02
Posts: 1256
Credit: 13,565,513
RAC: 13
Germany
Message 1467718 - Posted: 23 Jan 2014, 15:02:02 UTC - in response to Message 1467704.  

> (...) Due that it is a remote server I don't know the drivers installed on that machine. (...)

IIRC there was an issue with GPU processing regarding BOINC on a remote computer?
Aloha, Uli


Richard Haselgrove
Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1467726 - Posted: 23 Jan 2014, 15:19:32 UTC - in response to Message 1467718.  

> > (...) Due that it is a remote server I don't know the drivers installed on that machine. (...)
>
> IIRC there was an issue with GPU processing regarding BOINC on a remote computer?

Only if the proprietary Microsoft Remote Desktop Protocol is being actively used to access it. If that was the case, the OP would be able to look up driver version numbers etc. just as easily as on his local workstation.

If RDP is inactive, or an alternative remote viewing product is being used, there's no problem.

More to the point, a remote server probably runs (arguably should run, normally) without a local console operator logged in. And a server-trained sysadmin would probably spot that BOINC can be installed in 'protected application execution' mode (aka "as a service"), and choose that option. Either of those two states is enough to prevent GPU crunching with BOINC.

There are BOINC projects that run their BOINC applications internally on Tesla-class GPUs mounted in headless chassis: Einstein with their Atlas cluster, for one. You could ask them how they do it, but I suspect that the answer would be a workstation-class Linux installation, which doesn't help much.

William de Thomas
Volunteer tester
Joined: 15 May 99
Posts: 17
Credit: 15,501,592
RAC: 0
Puerto Rico
Message 1467793 - Posted: 23 Jan 2014, 17:55:23 UTC

I mentioned the problem to the AMAX contact person but never received an answer. I don't know if they looked into it.

skildude
Joined: 4 Oct 00
Posts: 9541
Credit: 50,759,529
RAC: 60
Yemen
Message 1467813 - Posted: 23 Jan 2014, 18:20:42 UTC

You may have a case of the GPU being too NEW for the project to identify it. For instance, look at my computers and you'll see one of them is running a 2048 MB Hawaii GPU. In reality this GPU is an R9 290X with 4096 MB. Yours may be so different from previous GPUs that BOINC just shrugs its shoulders and moves on.


In a rich man's house there is no place to spit but his face.
Diogenes Of Sinope

William de Thomas
Volunteer tester
Joined: 15 May 99
Posts: 17
Credit: 15,501,592
RAC: 0
Puerto Rico
Message 1467821 - Posted: 23 Jan 2014, 18:42:46 UTC

I thought about that also, it being too new. Thanks.

Claggy
Volunteer tester
Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1467846 - Posted: 23 Jan 2014, 19:31:18 UTC - in response to Message 1467482.  
Last modified: 23 Jan 2014, 19:33:54 UTC

Your host managed to grab Anonymous platform CPU and GPU work at 22 Jan 2014, 17:40:43, which was then abandoned at 18:24:22 UTC. (Did you do a Remove/Add Project at that time?)
It then grabbed more CPU work (all VLAR) at 22 Jan 2014, 18:42:26 and 18:47:35 UTC.
At 18:52:58 UTC all that CPU work was marked as "Timed out - no response". (Did you reset the project, or remove the app_info and attempt to get the work resent as GPU tasks?)
VLAR tasks aren't sent to GPUs because of their extra-long computation time; attempting to get them resent will get them marked as timed out.
At 18:52:59 UTC you were sent a solitary stock v7.00 (cuda42) task. A day has gone by and that task hasn't been completed and reported yet. (The host hasn't contacted the project since then to report the task and ask for more.)

I think a bit more patience is required, as plainly it can get work (work supply permitting); whether it can complete it is another matter. At present it looks as if BOINC is not even running on that host.
Surely BOINC would have asked for work in the last day, even if only for more CPU work, and surely it would have completed some CPU work; and if the CUDA task had errored, it would have been reported by now.
The only other possibility is that the host is blocked from downloading work from external sites, and won't ask for more until it has downloaded its existing supply. What does the Event Log say?

Claggy

William de Thomas
Volunteer tester
Joined: 15 May 99
Posts: 17
Credit: 15,501,592
RAC: 0
Puerto Rico
Message 1468148 - Posted: 24 Jan 2014, 12:17:21 UTC

Thanks for the reply. I did remove SETI at one point and added GPU Grid, and then went back to SETI later. Since I only had 5 hours of testing time, I was in a bit of a rush. Folding@home didn't work with the Tesla K40 either.

I will receive two Tesla K40s by the end of this month (hopefully), and I will retest and post my observations. They will be installed at my home, so I will be able to do a lot of testing on them.

As I mentioned earlier, since it was a remote server, I couldn't do much. I just wanted to know if anybody else had this problem, because the K40s are so new.

Thanks

Batter Up
Joined: 5 May 99
Posts: 1946
Credit: 24,860,347
RAC: 0
United States
Message 1468362 - Posted: 24 Jan 2014, 18:18:52 UTC

Tesla K40 $5,299.99.

William de Thomas
Volunteer tester
Joined: 15 May 99
Posts: 17
Credit: 15,501,592
RAC: 0
Puerto Rico
Message 1474077 - Posted: 8 Feb 2014, 4:25:48 UTC

Got the two Tesla K40s running, and they are recognized here at home. It seems the remote server was not set up with the correct driver or something. I will post the PPD later.

hancocka
Joined: 19 May 00
Posts: 10
Credit: 4,574,614
RAC: 0
United Kingdom
Message 1476685 - Posted: 13 Feb 2014, 19:56:31 UTC - in response to Message 1474077.  
Last modified: 13 Feb 2014, 20:04:11 UTC

> Got the two Tesla K40 running and they are recognized here at home. Seems the remote server was not set up with the correct driver or something. Will post later on the PPD


I've got my hands on two Tesla K40c cards. What I've noticed is that the memory for CUDA is only shown as 4096 MB (4 GB), and these are 12 GB cards.

OpenCL shows the memory correctly.

Does anyone have any tips on how to quickly configure a single K40 card with a 6-core i7 processor (12 threads)?

(got two machines of the same spec!)

Machine details here http://setiathome.berkeley.edu/show_host_detail.php?hostid=7212790

jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1476722 - Posted: 13 Feb 2014, 21:00:24 UTC - in response to Message 1476685.  
Last modified: 13 Feb 2014, 21:02:08 UTC

> I've got my hands on two Tesla K40c, what I've noticed is the memory for CUDA is only shown as 4096 MB (4GB), and these are 12GB cards.


For this part: on Windows, the SETI@home multibeam (V7) CUDA application is a 32-bit program, so it is limited to the ~4 GB 32-bit address space (minus a bit for driver components etc.).
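The arithmetic behind that 4096 MB figure can be sketched as below. This is only a worked example: the 12 GB card size comes from the post being answered (treated here as 12 GiB), and the function name `visible_bytes` is hypothetical.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def visible_bytes(card_bytes, pointer_bits=32):
    """Upper bound on the card memory one process can address."""
    address_space = 2 ** pointer_bits  # 4 GiB for 32-bit pointers
    return min(card_bytes, address_space)


k40_bytes = 12 * GIB  # assumed size of a 12 GB K40

print(visible_bytes(k40_bytes) // MIB)                   # 4096
print(visible_bytes(k40_bytes, pointer_bits=64) // MIB)  # 12288
```

In practice the reported figure sits slightly under this bound, because driver components take their share of the address space.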

Switching to 64-bit has been found to carry a significant performance penalty on GK110 and earlier (a GPU cost of wider addresses, not a CPU host program cost), so a 64-bit build is redundant for this application and is not distributed.

Fortunately, this particular application is never likely to need more than a few hundred megabytes (excepting possible future GBT datasets, which will have revised applications anyway).

As a driver and OS function, each CUDA instance will see its own 4 GB portion of the full video memory, assuming you're on a 64-bit host OS. That means running multiple instances lets you use more of the available video memory, up to the point where the scaling advantages drop off.
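For running several instances per card, BOINC clients that support `app_config.xml` let you set what fraction of a GPU each task claims. Below is a minimal Python sketch that generates such a file. The application name `setiathome_v7` and the figure of 3 tasks per GPU are assumptions for illustration only, and `build_app_config` is a made-up helper; check the real app name in your client's project files and find the best task count by experiment.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name, tasks_per_gpu, cpu_per_task=1.0):
    """Build app_config.xml text asking for `tasks_per_gpu` tasks on each GPU."""
    if tasks_per_gpu < 1:
        raise ValueError("tasks_per_gpu must be at least 1")

    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name

    versions = ET.SubElement(app, "gpu_versions")
    # Each task claims 1/N of a GPU, so N tasks fit on one card.
    ET.SubElement(versions, "gpu_usage").text = f"{1.0 / tasks_per_gpu:.2f}"
    # CPU fraction budgeted per GPU task.
    ET.SubElement(versions, "cpu_usage").text = f"{cpu_per_task:.2f}"

    return ET.tostring(root, encoding="unicode")


# Assumed values: app name and task count are placeholders, not recommendations.
config_text = build_app_config("setiathome_v7", tasks_per_gpu=3)
print(config_text)
```

The resulting text would be saved as `app_config.xml` in the project's folder inside the BOINC data directory, after which the client needs to re-read its config files.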

For Linux, the situation is slightly different at the moment, with (IIRC) difficulties in making 32-bit apps work with a 64-bit host OS. There, on 64-bit systems, the executables are 64-bit, use 64-bit addressing, and so are somewhat slower at this time.

The performance issues using 64 bit executables may or may not change with future builds on both platforms, as more costly latencies elsewhere get minimised and hidden.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.

William de Thomas
Volunteer tester
Joined: 15 May 99
Posts: 17
Credit: 15,501,592
RAC: 0
Puerto Rico
Message 1478725 - Posted: 18 Feb 2014, 12:53:03 UTC - in response to Message 1476685.  

Same here. I have two GTX Titans and two Tesla K40 set up on the same machine.

William de Thomas
Volunteer tester
Joined: 15 May 99
Posts: 17
Credit: 15,501,592
RAC: 0
Puerto Rico
Message 1480613 - Posted: 22 Feb 2014, 12:58:06 UTC

All working well so far.