Tesla K40 not recognized as GPU

William de Thomas, Volunteer tester (Joined: 15 May 99, Posts: 17, Credit: 15,501,592, RAC: 0)
Message 1467482 - Posted: 22 Jan 2014, 23:05:51 UTC

Hello, I just had the privilege of testing out the Tesla K40 on the Nvidia Tesla test drive site. Once I installed the BOINC software on their server and ran SETI@home, only the CPU on that server did tasks. I looked in the log and the Teslas showed up (4 of them, nice), but no GPU tasks were downloaded to the server. I installed BOINC on my own server, which has two GTX Titans, and I was able to download tasks for the GPU, and I also used Lunatics to help out. I also installed Folding@home and it would not do any GPU tasks either (on the remote server). GPU Grid did not work either. I am assuming that it might be a Tesla driver problem. Any feedback on this would be appreciated.

HAL9000, Volunteer tester (Joined: 11 Sep 99, Posts: 6534, Credit: 196,805,888, RAC: 57)
Message 1467493 - Posted: 22 Jan 2014, 23:20:13 UTC (last modified: 22 Jan 2014, 23:21:27 UTC)

I see the card is being displayed on the details page for that machine. It does look odd that the driver version is not listed. Checking the top hosts, I see a machine with a K20 listed as "NVIDIA Tesla K20c (4095MB) driver: 320.27 OpenCL: 1.01", while your machine displays "[2] NVIDIA Tesla K40m (4095MB) OpenCL: 1.01".

SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group (http://tinyurl.com/8y46zvu)
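For anyone who wants to compare host entries the same way, here is a minimal Python sketch that pulls the fields out of the two host detail strings HAL9000 quotes above and flags a missing driver version. The pattern is inferred from those two strings only; it is an assumption about the layout, not a documented BOINC format, and `parse_coproc` is a hypothetical helper name.

```python
import re

# Layout inferred from the two strings quoted in the post above (an assumption,
# not a documented format): optional "[count]", model, "(<n>MB)",
# optional "driver: x.y", optional "OpenCL: x.y".
COPROC_RE = re.compile(
    r"^(?:\[(?P<count>\d+)\]\s+)?"
    r"(?P<model>.+?)\s+\((?P<mem_mb>\d+)MB\)"
    r"(?:\s+driver:\s*(?P<driver>[\d.]+))?"
    r"(?:\s+OpenCL:\s*(?P<opencl>[\d.]+))?\s*$"
)


def parse_coproc(text):
    """Split a coprocessor description into its fields (hypothetical helper)."""
    match = COPROC_RE.match(text.strip())
    if match is None:
        raise ValueError("unrecognised coprocessor string: %r" % text)
    return {
        "count": int(match.group("count") or 1),  # no "[n]" prefix means one card
        "model": match.group("model"),
        "mem_mb": int(match.group("mem_mb")),
        "driver": match.group("driver"),           # None when no driver is listed
        "opencl": match.group("opencl"),
    }


k20 = parse_coproc("NVIDIA Tesla K20c (4095MB) driver: 320.27 OpenCL: 1.01")
k40 = parse_coproc("[2] NVIDIA Tesla K40m (4095MB) OpenCL: 1.01")

for host in (k20, k40):
    status = host["driver"] or "driver version not listed"
    print(host["model"], "-", status)
```

A missing driver field does not prove anything by itself, but it is a quick way to spot hosts worth a closer look.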

Grant (SSSF), Volunteer tester (Joined: 19 Aug 99, Posts: 13731, Credit: 208,696,464, RAC: 304)
Message 1467524 - Posted: 23 Jan 2014, 1:55:25 UTC - in response to Message 1467493.

Quote: I see the card is being displayed on the details page for that machine. It does look odd that the driver version is not listed.

That was my first guess: driver issues. If the driver doesn't support the card, it can't be used.

Grant
Darwin NT

jason_gee, Volunteer developer, Volunteer tester (Joined: 24 Nov 06, Posts: 7489, Credit: 91,093,184, RAC: 0)
Message 1467525 - Posted: 23 Jan 2014, 2:08:59 UTC - in response to Message 1467482. (last modified: 23 Jan 2014, 2:17:01 UTC)

Quote: Hello, I just had the privilege of testing out the Tesla K40 on the Nvidia Tesla test drive site. Once I installed the Boinc software on their server and ran seti@home, only the cpu on that server did tasks. I looked in the log and the Teslas showed up (4 of them, nice) but no GPU task were downloaded to the server. I installed Boinc on my server that has Two GTX Titans and I was able to download task for the GPU and I also used Lunatics to help out. I also installed folding@home and it would not do any GPU task either (on the remote server). GPU Grid did not work either. I am assuming that it might be a Tesla driver problem. Any feedback on this would be appreciated.

As there's no display on these beasties (?), ordinary Windows drivers most likely won't make sense for them (guessing, I don't have one ;) ). Even if they can work with regular drivers, you're on Windows 7, which can have multiple drivers installed without issue, so installing the appropriate Tesla Compute Cluster (TCC) driver should help. Checking with the nvidia-smi utility that TCC mode is engaged would be a good idea.

There are advantages there with current applications, since TCC mode won't have all the graphics (Windows Display Driver Model, WDDM) stuff in the way. On the performance side, without all those latencies from a graphics driver, the card could feasibly crunch significantly faster than a 780 or Titan with current applications. That's true for the time being, while I am still working on ways to effectively hide the graphics-driver-induced latencies for faster GPUs on Windows. In the future that gap may close again, due to a combination of app refinement and our limited use of double precision.

"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.

William de Thomas, Volunteer tester (Joined: 15 May 99, Posts: 17, Credit: 15,501,592, RAC: 0)
Message 1467704 - Posted: 23 Jan 2014, 14:30:38 UTC

Yeah, I am almost sure it's a driver issue. Because it is a remote server, I don't know which drivers are installed on that machine. I do know that I looked in the control panel and the cards were there and recognized, so I imagine that drivers were installed for them. Thanks for the replies.

Ulrich Metzner, Volunteer tester (Joined: 3 Jul 02, Posts: 1256, Credit: 13,565,513, RAC: 13)
Message 1467718 - Posted: 23 Jan 2014, 15:02:02 UTC - in response to Message 1467704.

Quote: (...) Due that it is a remote server I don't know the drivers installed on that machine. (...)

IIRC there was an issue with BOINC GPU processing on a remote computer?

Aloha, Uli

Richard Haselgrove, Volunteer tester (Joined: 4 Jul 99, Posts: 14650, Credit: 200,643,578, RAC: 874)
Message 1467726 - Posted: 23 Jan 2014, 15:19:32 UTC - in response to Message 1467718.

Quote: (...) Due that it is a remote server I don't know the drivers installed on that machine. (...)

Quote: IIRC there was an issue with GPU processing regarding BOINC on a remote computer?

Only if the proprietary Microsoft Remote Desktop Protocol is being actively used to access it. If that was the case, the OP would be able to look up driver version numbers etc. just as easily as on his local workstation. If RDP is inactive, or an alternative remote viewing product is being used, there's no problem.

More to the point, a remote server probably runs (arguably should run, normally) without a local console operator logged in. And a server-trained sysadmin would probably spot that BOINC can be installed in 'protected application execution' mode (aka "as a service"), and choose that option. Either of those two states is enough to prevent GPU crunching with BOINC.

There are BOINC projects that run their BOINC applications internally on Tesla-class GPUs mounted in headless chassis: Einstein with their Atlas cluster, for one. You could ask them how they do it, but I suspect that the answer would be a workstation-class Linux installation, which doesn't help much.

William de Thomas, Volunteer tester (Joined: 15 May 99, Posts: 17, Credit: 15,501,592, RAC: 0)
Message 1467793 - Posted: 23 Jan 2014, 17:55:23 UTC

I mentioned the problem to the AMAX contact person but never received an answer. I don't know if they looked into it.

skildude (Joined: 4 Oct 00, Posts: 9541, Credit: 50,759,529, RAC: 60)
Message 1467813 - Posted: 23 Jan 2014, 18:20:42 UTC

You may have a case of the GPU being too NEW for the project to identify it. For instance, look at my computers and you'll see one of them is listed as running a 2048 MB Hawaii GPU. In reality this GPU is an R9 290X with 4096 MB. Yours may be so different from previous GPUs that BOINC just shrugs its shoulders and moves on.

In a rich man's house there is no place to spit but his face. Diogenes Of Sinope

William de Thomas, Volunteer tester (Joined: 15 May 99, Posts: 17, Credit: 15,501,592, RAC: 0)
Message 1467821 - Posted: 23 Jan 2014, 18:42:46 UTC

I thought about that also, it being too new. Thanks

Claggy, Volunteer tester (Joined: 5 Jul 99, Posts: 4654, Credit: 47,537,079, RAC: 4)
Message 1467846 - Posted: 23 Jan 2014, 19:31:18 UTC - in response to Message 1467482. (last modified: 23 Jan 2014, 19:33:54 UTC)

Your host managed to grab Anonymous platform CPU and GPU work at 22 Jan 2014, 17:40:43, which was then abandoned at 18:24:22 UTC. (Did you do a Remove/Add Project at that time?)

Then it grabbed more CPU work (all VLAR) at 22 Jan 2014, 18:42:26 and 18:47:35 UTC, and at 18:52:58 UTC all that CPU work was marked as "Timed out - no response". (Did you reset the project, or remove the app_info and attempt to get the work resent as GPU tasks?) VLAR tasks aren't sent to GPUs because of the extra-long computation time; attempting to get them resent will get them marked as timed out.

At 18:52:59 UTC you were sent a solitary Stock v7.00 (cuda42) task. A day has gone by and that task hasn't been completed and reported yet. (The host hasn't contacted the project since to report the task and ask for more.)

I think a bit more patience is required, as plainly it can get work (work supply permitting); whether it can complete it is another matter. At present it looks as if BOINC is not even running on that host: surely BOINC would have asked for work in the last day, even if only for more CPU work, and surely it would have completed some CPU work, and if the CUDA task had errored, that would have been reported by now. The only other possibility is that the host is blocked from downloading work from external sites, and won't ask for more until it has downloaded its existing supply. What does the Event log say?

Claggy

William de Thomas, Volunteer tester (Joined: 15 May 99, Posts: 17, Credit: 15,501,592, RAC: 0)
Message 1468148 - Posted: 24 Jan 2014, 12:17:21 UTC

Thanks for the reply. I did remove SETI at one point and added GPU Grid, and then went back to SETI at another time. Since I only had 5 hours of testing time, I was in a little rush. Folding@home didn't work with the Tesla K40 either. I will receive two Tesla K40s by the end of this month (hopefully), and I will retest and post my observations. They will be installed at my home, so I will be able to do a lot of testing on them. As I mentioned earlier, because it was a remote server, I couldn't do much. I just wanted to know if anybody else had this problem, because the K40s are so new. Thanks

Batter Up (Joined: 5 May 99, Posts: 1946, Credit: 24,860,347, RAC: 0)
Message 1468362 - Posted: 24 Jan 2014, 18:18:52 UTC

Tesla K40 $5,299.99.

William de Thomas, Volunteer tester (Joined: 15 May 99, Posts: 17, Credit: 15,501,592, RAC: 0)
Message 1474077 - Posted: 8 Feb 2014, 4:25:48 UTC

Got the two Tesla K40s running, and they are recognized here at home. It seems the remote server was not set up with the correct driver or something. Will post later on the PPD.

hancocka (Joined: 19 May 00, Posts: 10, Credit: 4,574,614, RAC: 0)
Message 1476685 - Posted: 13 Feb 2014, 19:56:31 UTC - in response to Message 1474077. (last modified: 13 Feb 2014, 20:04:11 UTC)

Quote: Got the two Tesla K40 running and they are recognized here at home. Seems the remote server was not set up with the correct driver or something. Will post later on the PPD

I've got my hands on two Tesla K40c cards. What I've noticed is that the memory for CUDA is only shown as 4096 MB (4GB), and these are 12GB cards. OpenCL shows the memory correctly.

Does anyone have any tips on how to quickly configure a single K40 card with a 6-core i7 processor (12 threads)? (I've got two machines of the same spec!)

Machine details here: http://setiathome.berkeley.edu/show_host_detail.php?hostid=7212790

jason_gee, Volunteer developer, Volunteer tester (Joined: 24 Nov 06, Posts: 7489, Credit: 91,093,184, RAC: 0)
Message 1476722 - Posted: 13 Feb 2014, 21:00:24 UTC - in response to Message 1476685. (last modified: 13 Feb 2014, 21:02:08 UTC)

Quote: I've got my hands on two Tesla K40c, what I've noticed is the memory for CUDA is only shown as 4096 MB (4GB), and these are 12GB cards.

For this part: on Windows, Seti@Home multibeam (V7) Cuda is a 32 bit program, so it is limited to the ~4 GB 32 bit address space (minus a bit for driver components etc). Switching to 64 bit has been found to carry a significant performance penalty on GK110 and earlier (a GPU cost of wider addresses, not a CPU host program cost), so a 64 bit build is redundant for this application and is not distributed. Fortunately this particular application is never likely to need more than a few hundred megabytes (excepting possible future GBT datasets, which will have revised applications anyway).

As a driver and OS function, each Cuda instance will see its own 4GB portion of the full video memory, assuming you're on a 64 bit host OS. That means running multiple instances can use as much of the available video memory as they need, up to the point where scaling advantages drop off.

For Linux, the situation is slightly different at the moment, with (IIRC) difficulties getting 32 bit apps to work on a 64 bit host OS. There, on a 64 bit OS, the executables are 64 bit, use 64 bit addressing, and so are somewhat slower at this time. The performance issues with 64 bit executables may or may not change with future builds on both platforms, as more costly latencies elsewhere get minimised and hidden.

"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
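To put numbers on the 32 bit limit described in the post above, here is a small Python sketch of the arithmetic. It is a back-of-the-envelope illustration only: the function names are made up for this example, and the "one full 4 GB window per instance" figure is a worst-case assumption, since the post says the application itself is unlikely to need more than a few hundred megabytes.

```python
MIB = 1024 ** 2  # bytes per mebibyte


def address_space_mib(pointer_bits):
    """Largest address space (in MiB) reachable with pointers of this width."""
    return 2 ** pointer_bits // MIB


def windows_that_fit(card_mib, window_mib):
    """How many whole per-process windows fit in the card's memory."""
    return card_mib // window_mib


limit_32bit = address_space_mib(32)   # what a 32 bit Cuda app can address
k40_memory = 12 * 1024                # 12 GB card, expressed in MiB

print("32 bit address space:", limit_32bit, "MiB")
print("Full 4 GB windows on a 12 GB card:",
      windows_that_fit(k40_memory, limit_32bit))
```

This is why a 12 GB card is reported as roughly 4096 MB to a 32 bit Cuda program: the figure reflects the program's address space, not the memory physically on the card.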

William de Thomas, Volunteer tester (Joined: 15 May 99, Posts: 17, Credit: 15,501,592, RAC: 0)
Message 1478725 - Posted: 18 Feb 2014, 12:53:03 UTC - in response to Message 1476685.

Same here. I have two GTX Titans and two Tesla K40s set up on the same machine.

William de Thomas, Volunteer tester (Joined: 15 May 99, Posts: 17, Credit: 15,501,592, RAC: 0)
Message 1480613 - Posted: 22 Feb 2014, 12:58:06 UTC

All working well so far.
