Concerning the memory reported for my GPUs
Author | Message |
---|---|
Tron Joined: 16 Aug 09 Posts: 180 Credit: 2,250,468 RAC: 0 |
BOINC seems to be reporting an obscene amount of video RAM at startup. What should I do to correct this? And how do I get the NVIDIA driver version to be discoverable by BOINC? (Ubuntu 12.04)
Mon 03 Sep 2012 06:26:15 PM EDT | | Starting BOINC client version 7.0.27 for x86_64-pc-linux-gnu
Mon 03 Sep 2012 06:26:15 PM EDT | | log flags: file_xfer, sched_ops, task
Mon 03 Sep 2012 06:26:15 PM EDT | | Libraries: libcurl/7.22.0 OpenSSL/1.0.1 zlib/1.2.3.4 libidn/1.23 librtmp/2.3
Mon 03 Sep 2012 06:26:15 PM EDT | | Data directory: /var/lib/boinc-client
Mon 03 Sep 2012 06:26:15 PM EDT | | Processor: 8 GenuineIntel Intel(R) Core(TM) i7 CPU 860 @ 2.80GHz [Family 6 Model 30 Stepping 5]
Mon 03 Sep 2012 06:26:15 PM EDT | | Processor: 8.00 MB cache
Mon 03 Sep 2012 06:26:15 PM EDT | | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 sse4_2 popcnt lahf_lm dtherm tpr_shadow vnmi flexpriority ept vpid
Mon 03 Sep 2012 06:26:15 PM EDT | | OS: Linux: 3.2.0-29-generic
Mon 03 Sep 2012 06:26:15 PM EDT | | Memory: 7.80 GB physical, 7.96 GB virtual
Mon 03 Sep 2012 06:26:15 PM EDT | | Disk: 922.57 GB total, 858.81 GB free
Mon 03 Sep 2012 06:26:15 PM EDT | | Local time is UTC -4 hours
Mon 03 Sep 2012 06:26:15 PM EDT | | NVIDIA GPU 0: GeForce GTX 460 (driver version unknown, CUDA version 4.20, compute capability 2.1, 134214656MB, 134214259MB available, 941 GFLOPS peak)
Mon 03 Sep 2012 06:26:15 PM EDT | | NVIDIA GPU 1: GeForce GTX 460 (driver version unknown, CUDA version 4.20, compute capability 2.1, 134214656MB, 134214616MB available, 941 GFLOPS peak)
Mon 03 Sep 2012 06:26:15 PM EDT | | OpenCL: NVIDIA GPU 0: GeForce GTX 460 (driver version 295.49, device version OpenCL 1.1 CUDA, 1024MB, 134214259MB available)
Mon 03 Sep 2012 06:26:15 PM EDT | | OpenCL: NVIDIA GPU 1: GeForce GTX 460 (driver version 295.49, device version OpenCL 1.1 CUDA, 1024MB, 134214616MB available)
Mon 03 Sep 2012 06:26:15 PM EDT | | Config: GUI RPC allowed from:
Mon 03 Sep 2012 06:26:15 PM EDT | | A new version of BOINC is available. <a href=http://boinc.berkeley.edu/download.php>Download it.</a>
Mon 03 Sep 2012 06:26:15 PM EDT | SETI@home | URL http://setiathome.berkeley.edu/; Computer ID 6753818; resource share 100
Mon 03 Sep 2012 06:26:15 PM EDT | SETI@home | General prefs: from SETI@home (last modified 15-Feb-2010 22:13:44)
Mon 03 Sep 2012 06:26:15 PM EDT | SETI@home | Host location: none
Mon 03 Sep 2012 06:26:15 PM EDT | SETI@home | General prefs: using your defaults
Mon 03 Sep 2012 06:26:15 PM EDT | | Reading preferences override file
Mon 03 Sep 2012 06:26:15 PM EDT | | Preferences:
Mon 03 Sep 2012 06:26:15 PM EDT | | max memory usage when active: 7185.52MB
Mon 03 Sep 2012 06:26:15 PM EDT | | max memory usage when idle: 7824.23MB
Mon 03 Sep 2012 06:26:15 PM EDT | | max disk usage: 100.00GB
Mon 03 Sep 2012 06:26:15 PM EDT | | (to change preferences, visit the web site of an attached project, or select Preferences in the Manager)
Mon 03 Sep 2012 06:26:15 PM EDT | | Not using a proxy |
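For anyone wanting to spot this in their own startup log, here is a minimal Python sketch that pulls the memory figures out of a GPU detection line and flags the ones that cannot be right. The 64 GB limit and the function name are assumptions made up for this illustration; nothing here is part of BOINC itself. The two sample lines are copied from the log above.

```python
import re

# Hypothetical sanity limit (an assumption for this sketch): a GTX 460
# carries about 1 GB, so anything above 64 GB is treated as bogus.
MAX_PLAUSIBLE_MB = 64 * 1024

# Matches figures such as "1024MB" or "134214259MB available".
MEM_RE = re.compile(r"(\d+)MB( available)?")


def implausible_memory(log_line):
    """Return (value_mb, kind) pairs from one GPU line that exceed the limit."""
    flagged = []
    for value, available in MEM_RE.findall(log_line):
        mb = int(value)
        if mb > MAX_PLAUSIBLE_MB:
            flagged.append((mb, "available" if available else "total"))
    return flagged


cuda_line = (
    "NVIDIA GPU 0: GeForce GTX 460 (driver version unknown, CUDA version 4.20, "
    "compute capability 2.1, 134214656MB, 134214259MB available, 941 GFLOPS peak)"
)
opencl_line = (
    "OpenCL: NVIDIA GPU 0: GeForce GTX 460 (driver version 295.49, "
    "device version OpenCL 1.1 CUDA, 1024MB, 134214259MB available)"
)

print(implausible_memory(cuda_line))
print(implausible_memory(opencl_line))
```

On these two lines the CUDA entry is flagged for both its total and its available figure, while the OpenCL entry is flagged only for the available figure, since its 1024MB total is sane.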
Jord Joined: 9 Jun 99 Posts: 15184 Credit: 4,362,181 RAC: 3 |
"What should I do to correct this?" You may want to try updating your BOINC to 7.0.33 or above (or self-compile the latest, 7.0.35, from source code), although even in that version the bug may not have been squashed. The developers have been trying to fix it for some time, but it's difficult. |
arkayn Joined: 14 May 99 Posts: 4438 Credit: 55,006,323 RAC: 0 |
"BOINC seems to be reporting an obscene amount of video RAM at startup" What is weird is that the OpenCL part does say what the driver version is (295.49). I also hope that the driver sleep bug that was in the 295.xx and 296.xx drivers on the Windows side is not present in the Linux drivers. |
Jord Joined: 9 Jun 99 Posts: 15184 Credit: 4,362,181 RAC: 3 |
@Tron, the BOINC developers ask if you can test with 7.0.33 and the 301 drivers. Berkeley built 7.0.33 for 32bit. Berkeley built 7.0.33 for 64bit. Mind: if you're now using BOINC from the repositories, that package installs to a different directory than the Berkeley installer does. For more information, see http://boinc.berkeley.edu/wiki/Installing_BOINC#Linux |
Tron Joined: 16 Aug 09 Posts: 180 Credit: 2,250,468 RAC: 0 |
OK, will try that tonight and report my findings. I'm still a total newbie with Linux commands, so it may be a bit of a challenge :P |
Tron Joined: 16 Aug 09 Posts: 180 Credit: 2,250,468 RAC: 0 |
Installed NVIDIA 295.59 and BOINC 7.0.33 on a different machine that had the same issue; I feel more comfortable mucking around with this one. It did not resolve the bug (see below). I will attempt to install the 301 driver next.
Thu 06 Sep 2012 02:32:24 AM EDT | | No config file found - using defaults
Thu 06 Sep 2012 02:32:24 AM EDT | | Starting BOINC client version 7.0.33 for x86_64-pc-linux-gnu
Thu 06 Sep 2012 02:32:24 AM EDT | | log flags: file_xfer, sched_ops, task
Thu 06 Sep 2012 02:32:24 AM EDT | | Libraries: libcurl/7.22.0 OpenSSL/1.0.1 zlib/1.2.3.4 libidn/1.23 librtmp/2.3
Thu 06 Sep 2012 02:32:24 AM EDT | | Data directory: /home/bboinc/BOINC
Thu 06 Sep 2012 02:32:24 AM EDT | | Processor: 4 GenuineIntel Intel(R) Atom(TM) CPU 330 @ 1.60GHz [Family 6 Model 28 Stepping 2]
Thu 06 Sep 2012 02:32:24 AM EDT | | Processor: 512.00 KB cache
Thu 06 Sep 2012 02:32:24 AM EDT | | Processor features: fpu vme de tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts nopl aperfmperf pni dtes64 monitor ds_cpl tm2 ssse3 cx16 xtpr pdcm movbe lahf_lm dtherm
Thu 06 Sep 2012 02:32:24 AM EDT | | OS: Linux: 3.2.0-29-generic
Thu 06 Sep 2012 02:32:24 AM EDT | | Memory: 2.93 GB physical, 9.36 GB virtual
Thu 06 Sep 2012 02:32:24 AM EDT | | Disk: 921.17 GB total, 758.06 GB free
Thu 06 Sep 2012 02:32:24 AM EDT | | Local time is UTC -4 hours
Thu 06 Sep 2012 02:32:24 AM EDT | | NVIDIA GPU 0: ION (driver version unknown, CUDA version 4.20, compute capability 1.1, 510MB, 134213963MB available, 53 GFLOPS peak)
Thu 06 Sep 2012 02:32:24 AM EDT | | OpenCL: NVIDIA GPU 0: ION (driver version 295.59, device version OpenCL 1.0 CUDA, 510MB, 134213963MB available)
Thu 06 Sep 2012 02:32:24 AM EDT | | No general preferences found - using defaults
Thu 06 Sep 2012 02:33:00 AM EDT | | Fetching configuration file from http://setiathome.berkeley.edu/get_project_config.php
Thu 06 Sep 2012 02:33:14 AM EDT | | Running CPU benchmarks
Thu 06 Sep 2012 02:33:14 AM EDT | | Suspending computation - CPU benchmarks in progress
Thu 06 Sep 2012 02:33:45 AM EDT | | Benchmark results:
Thu 06 Sep 2012 02:33:45 AM EDT | | Number of CPUs: 4
Thu 06 Sep 2012 02:33:45 AM EDT | | 628 floating point MIPS (Whetstone) per CPU
Thu 06 Sep 2012 02:33:45 AM EDT | | 2267 integer MIPS (Dhrystone) per CPU
Thu 06 Sep 2012 02:33:47 AM EDT | | Resuming computation
Thu 06 Sep 2012 02:33:52 AM EDT | SETI@home | Master file download succeeded |
Tron Joined: 16 Aug 09 Posts: 180 Credit: 2,250,468 RAC: 0 |
OK, so I updated the NVIDIA driver to 302.xx and installed BOINC 7.0.33 on the newer machine. The memory reporting problem is still present, but now at least the GPU memory being reported to the server is correct (before, it reflected the giant erroneous number that was displayed in the event log quoted above). The new log data still shows the wrong memory and now also states the incorrect NVIDIA driver version (modinfo in a terminal shows that the new 302.xx is in use).
Also, I have yet to see these GPUs get a single job. Before, I suspected the memory reporting issue was to blame, but now that it's "fixed" I still see no work in the pipe... Maybe it's just due to the servers catching up or whatever crazy AI scheduling thing BOINC does. I don't know. It's still only been a short while since the weekly outage. |
OzzFan Joined: 9 Apr 02 Posts: 15691 Credit: 84,761,841 RAC: 28 |
"Also I have yet to see these GPUs get a single job. Before, I suspected the memory reporting issue was to blame, but now that it's 'fixed' I still see no work in the pipe ..." There is no stock CUDA Linux app. If you want to use those GPUs under Linux, you'll have to install a third-party application. |
Tron Joined: 16 Aug 09 Posts: 180 Credit: 2,250,468 RAC: 0 |
Strange that it would ask for NVIDIA work if there is no application for it. |
BilBg Joined: 27 May 07 Posts: 3720 Credit: 9,385,827 RAC: 0 |
You at a bookshop yesterday:
- Do you have the book "Is it wise to not ask at all if you don't know what will be the answer?"?
- Sorry, we don't have it.
You at a bookshop today:
- Do you have the book "Is it wise to not ask at all if you don't know what will be the answer?"?
- Sorry, we don't have it.
You at a bookshop tomorrow:
- Do you have the book "Is it wise to not ask at all if you don't know what will be the answer?"?
- Yes, we have it. It was just released to market...

- ALF - "Find out what you don't do well ..... then don't do it!" :) |
Jord Joined: 9 Jun 99 Posts: 15184 Credit: 4,362,181 RAC: 3 |
"Strange that it would ask for nvidia work if there is no application for it" In none of your logs is BOINC asking for work for CUDA. It is only detecting that you have CUDA-capable GPUs. When you have the "Use Nvidia GPU" option set in your project preferences, it will ask for 1 second's worth of work for the GPU, but always get the answer from the server that this project does not have a CUDA application for x86_64-pc-linux-gnu. That's not asking for work; that's asking for initialization, to get the science application and perhaps 1 task. How else is BOINC to know that a project has CUDA work for the platform you're on? It doesn't need to know that. If it's available, it'll download it. If not, then it won't, but it will continue asking, as maybe, perhaps, the project will release an application tomorrow (which is what BilBg tried to say). The memory detection thing may have been fixed in BOINC 7.0.36, although some reports say otherwise. It won't negatively affect the getting or running of work, as most science applications check the amount of available memory prior to starting work. |
Tron Joined: 16 Aug 09 Posts: 180 Credit: 2,250,468 RAC: 0 |
Hi Ageless, I did not post the entire log entries in either post, only the relevant information. I've been running BOINC on and off for a few years now and generally know how to work it. Work fetch was repeatedly asking for NVIDIA work at a time when there was no application for it. That seems strange to me because I thought BOINC asked for work based on what apps you had, not 'asking for everything' only to be refused. I installed the Lunatics apps last night, so NOW there is an application. Approximately 30% of CUDA tasks are erroring out in the first minute or so; is this normal? Also, a batch of 20 CPU tasks came down and all tasks failed instantly after starting, with the log reporting "no output file". That was with the ak-v8 sse3 app. However, the Lunatics Astropulse sse3 application runs CPU tasks fine. Do you think it's a permissions issue or what? Web searches I've done concerning this suggested that may be the case. Edit: I think I know why the CPU tasks are erroring now. Going to switch back to a stock app for CPU work and see. |
Tron Joined: 16 Aug 09 Posts: 180 Credit: 2,250,468 RAC: 0 |
Figured out the problem with the MB CPU tasks: I had the wrong application for my CPU. Still very curious about the 30-35% error rate I'm getting with the CUDA work. No overclocking is in use, nor are there any over-temperature issues. Randomly chosen error task wrote:
|
Jord Joined: 9 Jun 99 Posts: 15184 Credit: 4,362,181 RAC: 3 |
"work fetch was repeatedly asking for nvidia work at a time when there was no application for it ...that seems strange to me because I thought boinc asked for work based on what apps you had .. not 'asking for everything' only to be refused" So how then will BOINC ask for work when you have just attached to a project? Then it has no applications, thus can't ask for anything either. ;-) Which is why it asks the scheduler for 1 second of work. The scheduler will check if there is an application available for the OS that your BOINC version says it's running on; then the server will either report back that there's no work for your platform, or it will send you the application and at least one task. It's done this way in case the project has an application for your OS tomorrow: BOINC will again ask for 1 second, the request will be checked, and the application will be sent accordingly. Any subsequent requests for work for this application will then receive bucket loads of work. It's done in this automatic way so that you do not have to check daily whether the project already has an application for your OS and then enable some preference somewhere stating that you want work for that piece of hardware. "approximately 30% of cuda tasks are erring out in the first minute or so, is this normal?" See this BOINC FAQ for options on what "process exited with code 1 (0x1)" means. |
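The request/reply exchange described in this post can be pictured with a toy Python sketch. It is not BOINC's scheduler code: the function name, the reply fields, the one-hour task length and the second reply message are all hypothetical, chosen only to mirror the steps above (check for an application for the platform, then either refuse or send the application plus at least one task). The "No tasks sent" wording is the message quoted later in this thread.

```python
# Hypothetical task length, used only to turn requested seconds into a count.
TASK_SECONDS = 3600


def handle_request(platform, resource, seconds_requested, apps):
    """Answer one work request the way the post describes it."""
    if (platform, resource) not in apps:
        # No application for this platform/resource: nothing can be sent.
        return {"app_sent": False, "tasks": 0, "message": "No tasks sent"}
    # An application exists: send it along with at least one task.
    tasks = max(1, seconds_requested // TASK_SECONDS)
    return {"app_sent": True, "tasks": tasks, "message": "Application and tasks sent"}


# Today the project only has a CPU application for 64-bit Linux.
apps = {("x86_64-pc-linux-gnu", "cpu")}
today = handle_request("x86_64-pc-linux-gnu", "nvidia", 1, apps)

# Tomorrow the project releases a GPU application; the same 1-second
# request now gets the application and a first task.
apps.add(("x86_64-pc-linux-gnu", "nvidia"))
tomorrow = handle_request("x86_64-pc-linux-gnu", "nvidia", 1, apps)

print(today["message"], today["tasks"])
print(tomorrow["message"], tomorrow["tasks"])
```

The point of the sketch is that the client never has to know in advance whether an application exists; repeating the small request is what lets it pick up a newly released application without any action from the user.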
Tron Joined: 16 Aug 09 Posts: 180 Credit: 2,250,468 RAC: 0 |
You're misunderstanding: this was not a project initialization. All available stock apps were already local, with CPU tasks running. Work fetch asked for NVIDIA work many times only to be refused. It simply appeared that there was no work to be given, not that any particular application was missing or present. For days I stared at it wondering why it gets no GPU work when it's asking for it vehemently. Realistically, the server should have blipped a message saying "no CUDA application available for x86_64-linux-gnu" instead of "no tasks available". If I go to the bookstore and ask the clerk for a certain book and they say "we don't have any ink", this is not a valid response to my query. Though it might be inferred that once ink has been procured, my book might become available, it still does not answer the real question. You see, it's not the asking that is the issue, it's the reply. As for the actual problem I'm having: it seems the error rate has gone down to about 10%, and a random comparison with other users who had the same tasks shows that in most cases they had errors too. So I wonder if it wasn't just a batch of tainted work. The link provided in the previous post did not contain anything helpful for this particular situation, since I'm running advanced versions of both BOINC and the NVIDIA drivers. I'll keep an eye on it and post back if it persists. Thank you for your time. |
Jord Joined: 9 Jun 99 Posts: 15184 Credit: 4,362,181 RAC: 3 |
"Your misunderstanding that this was not a project initialization ..." There is no stock CUDA application for Linux. There are, at the time of writing this answer, only GPU applications available for Windows. See Seti's application list. So you cannot ever have had a Linux science application for CUDA (or OpenCL for that matter), and your BOINC cannot have been asking for work for it, other than the initial 1 second. It won't ask for hundreds or thousands of seconds of work for an application that it doesn't have. So go on, show us what the log actually said at the time your BOINC was asking for work for the CPU and the non-existent GPU applications. If not overwritten by now, you should be able to get one or more lines from stdoutdae.txt or stdoutdae.old |
Tron Joined: 16 Aug 09 Posts: 180 Credit: 2,250,468 RAC: 0 |
As I wrote above, the machine with the error issue is now running optimized apps. However, this other machine, which I also updated to 7.0.33 but is otherwise stock, has an embedded CUDA-capable graphics card that I do not wish to use for SETI; it was a test bed for updating to this version. It also shows the same type of requests. stdoutdae.txt wrote:
|
OzzFan Joined: 9 Apr 02 Posts: 15691 Credit: 84,761,841 RAC: 28 |
To be fair, the message is "No tasks sent", which is considerably different from "No tasks available" as you stated previously. But yes, it does indeed look like it kept requesting work; as also stated, it will keep asking because the GPU is a resource that could be used if there were a stock app for it. The BOINC framework isn't currently smart enough to know not to ask for work for which there is no application yet. "No tasks available" implies that there is work that could be downloaded were some available, whereas "No tasks sent" suggests that no tasks can be sent at all. |
Tron Joined: 16 Aug 09 Posts: 180 Credit: 2,250,468 RAC: 0 |
The error activity has come back, this time at 50% failure, always on device #2 (running a double 460 in one slot). Disabling the device stopped the error tasks. I do notice that, when starting, all the error tasks have estimated completion times of only about 2 minutes and error out within the first minute; all successful tasks had estimated completion times of 10-15 minutes. I could pick out the tasks that were going to error before they did. Jumped up to NVIDIA 304.51 to see if that helped: no change. Next, going to try sliding back to 295.xx. |
Jord Joined: 9 Jun 99 Posts: 15184 Credit: 4,362,181 RAC: 3 |
Your error is "process exited with code 22 (0x16, -234)", with the sub-message "execv: Permission denied". This may be a problem with missing ia32 libraries. Did you install the 32-bit compatibility libraries on your system? If not, please try that before doing anything else. The GPU application is 32-bit. Either use ldd on the Lunatics GPU application binary that you got, so you can see which libraries are missing, or else immediately try http://boinc.berkeley.edu/wiki/Installing_BOINC#Ubuntu. |
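The ldd check suggested here can be scripted. The Python sketch below is only an illustration: the helper names are made up, and the sample text is invented output in the usual ldd layout, not taken from the poster's machine. It relies on ldd marking unresolved libraries with "=> not found".

```python
import subprocess


def missing_libraries(ldd_output):
    """Return the names of libraries that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        if "=> not found" in line:
            # The library name is everything before the "=>" arrow.
            missing.append(line.split("=>")[0].strip())
    return missing


def check_binary(path):
    """Run ldd on a binary (Linux only) and list its unresolved libraries."""
    result = subprocess.run(["ldd", path], capture_output=True, text=True)
    return missing_libraries(result.stdout)


# Invented sample in the usual ldd layout, for illustration only.
sample = """\
\tlinux-gate.so.1 =>  (0xf77a1000)
\tlibcudart.so.4 => not found
\tlibcufft.so.4 => not found
\tlibc.so.6 => /lib/i386-linux-gnu/libc.so.6 (0xf75c0000)
"""

print(missing_libraries(sample))
```

To try it on a real file, call check_binary with the path to the downloaded GPU application. An empty list only means ldd reported nothing as "not found"; it does not rule out other causes of the "Permission denied" message, such as a missing execute bit on the file.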