vlar running over 21hrs on cpu
Author | Message |
---|---|
Eric B Send message Joined: 9 Mar 00 Posts: 88 Credit: 168,875,085 RAC: 762 |
I have a VLAR task that has been running on a CPU for over 21 hours and it still doesn't show any percentage done: "Progress" is still 0.000% and "Remaining" is "---". I'm running openSUSE 12.3 Linux on a 6-core/HT 3960 with an Nvidia GTX 460/1G (but there isn't an Nvidia GPU program for Linux yet, is there?). BOINC is 7.0.65. Is this normal to be running so long? Here is the pertinent VLAR data: <workunit_header> <name>22oc08ac.9463.11115.3.12.24.vlar</name> <group_info> <tape_info> <name>22oc08ac</name> <start_time>2454762.4252491</start_time> <last_block_time>2454762.4252491</last_block_time> <last_block_done>11115</last_block_done> <missed>0</missed> <tape_quality>0</tape_quality> <beam>0</beam> </tape_info> <name>22oc08ac</name> <data_desc> <start_ra>19.116034178315</start_ra> <start_dec>10.033974364501</start_dec> <end_ra>19.1160411811</end_ra> <end_dec>10.034132742991</end_dec> <true_angle_range>0.0080150689258589</true_angle_range> <time_recorded>Wed Oct 22 22:12:21 2008</time_recorded> <time_recorded_jd>2454762.4252479</time_recorded_jd> <nsamples>1048576</nsamples> |
Juha Send message Joined: 7 Mar 04 Posts: 388 Credit: 1,857,738 RAC: 0 |
"Is this normal to be running so long?" No, it's stuck. Restart BOINC to get the task running normally. |
Eric B Send message Joined: 9 Mar 00 Posts: 88 Credit: 168,875,085 RAC: 762 |
That fixed things. It's now showing Percent 1.279% and climbing, Elapsed 3:49, Remaining 2:58:44. Thanks. What about the GPU? Is there an Nvidia CUDA/OpenCL program for 64-bit Linux? I had a Fermi GPU executable which worked great until the switch to V7, but under V7 it creates way too many errors, so I removed it. |
Juha Send message Joined: 7 Mar 04 Posts: 388 Credit: 1,857,738 RAC: 0 |
"What about the GPU? Is there an Nvidia CUDA/OpenCL program for 64-bit Linux? I had a Fermi GPU executable which worked great until the switch to V7, but under V7 it creates way too many errors, so I removed it." If you want a stock app, a new version is in the works. You have your machines hidden, but your posting history shows you have been using optimized apps. According to "Ubuntu and Nvidia tasks", x41g should work too. |
Eric B Send message Joined: 9 Mar 00 Posts: 88 Credit: 168,875,085 RAC: 762 |
The typical error using "Multibeam x41g Preview, Cuda 3.20" is: Cuda error 'cudaMalloc((void**) &dev_t_funct_cache' in file 'cuda/cudaAcc_pulsefind.cu' in line 851 : out of memory. PulseFind Init failed... setiathome_CUDA: CUDA runtime ERROR in device memory allocation... initiating boinc temporary exit (180 secs)... 08jn09ab.5493.11051.7.12.225; application: SETI@home v7; created: 9 Jul 2013, 21:41:20 UTC; minimum quorum: 2; initial replication: 2; max # of error/total/success tasks: 5, 10, 5. I see that the same task, farmed out to a Windows host using an Nvidia card on cuda22, also failed. I can unhide my computers for a while if you like. |
Juha Send message Joined: 7 Mar 04 Posts: 388 Credit: 1,857,738 RAC: 0 |
GPU computing isn't really my thing, but let's see... "The typical error using 'Multibeam x41g Preview, Cuda 3.20' is:" So your card doesn't have enough free memory. Did you say it has one gigabyte of memory? Do you run some fancy desktop environment, maybe multiple workspaces, or lots of tabs in a web browser with hardware acceleration? Those could eat a good amount of VRAM. "I can unhide my computers for a while if you like." That's usually the requirement when asking for help. |
Eric B Send message Joined: 9 Mar 00 Posts: 88 Credit: 168,875,085 RAC: 762 |
My computers are visible now. I do run KDE 4 on 2 monitors, but that's what I was doing before V7 came out. I just opened nvidia-settings; if I am reading this right, I am only using 226M of the 1024M of RAM on the card: "Used dedicated memory: 226M". I googled around and found 2 reports of similar failures, but they're old. In those cases it looks like the GPU app was not releasing all of its memory, the next task did the same, and so on until the card finally couldn't support the GPU app. Is there any way to tell whether people are already running 64-bit Linux with SETI V7 on Nvidia and successfully processing GPU tasks? If that turns out to be the case, then I should compare my setup with a successful one; maybe I'm not holding my mouth right. |
Eric B Send message Joined: 9 Mar 00 Posts: 88 Credit: 168,875,085 RAC: 762 |
I have 4 more WUs (3 ordinary and 1 VLAR) that were not showing any percent done or remaining time. I restarted BOINC again and they now look proper. I am concerned this is going to be a continuing problem. Is there anything that can be done to avoid it? |
Juha Send message Joined: 7 Mar 04 Posts: 388 Credit: 1,857,738 RAC: 0 |
"I have 4 more WUs (3 ordinary and 1 VLAR) that were not showing any percent done or remaining time. I restarted BOINC again and they now look proper. I am concerned this is going to be a continuing problem. Is there anything that can be done to avoid it?" If a task gets stuck, it usually happens at the start, when the application is benchmarking different functions. Optimized apps don't have the benchmarking code, so they don't get stuck (in there anyway). So switching to optimized is one option. I was hoping that someone who is actually using NVIDIA cards with Linux would step in and give you some advice on the GPU apps. No such luck... You have completed some V7 tasks. Some of them show warnings that there's less than 300MB of free VRAM while others don't. This one is especially interesting. When it starts, the GPU has less than 300MB of free memory; at some point the memory runs out and the app does a three-minute temporary exit. After the three minutes have passed, the GPU suddenly has plenty of free memory. I'm afraid the only idea I have is to keep monitoring the VRAM usage and try to identify which application is hogging the memory. |
Eric B Send message Joined: 9 Mar 00 Posts: 88 Credit: 168,875,085 RAC: 762 |
I have compiled the CUDA 5 samples and they run OK. Maybe I can take the source of one of those as an example and get a free-RAM figure from it. If so, I can set up a monitor to log some info about what was running and what the free GPU RAM was. |
Juha Send message Joined: 7 Mar 04 Posts: 388 Credit: 1,857,738 RAC: 0 |
"I have compiled the CUDA 5 samples and they run OK. Maybe I can take the source of one of those as an example and get a free-RAM figure from it. If so, I can set up a monitor to log some info about what was running and what the free GPU RAM was." Before I posted I did some Googling, and it looked like nvidia-settings has a command-line interface as well. |
juan BFP Send message Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799 |
Not sure if that could be the source of your troubles, but your log says "Multibeam x41g Preview, Cuda 3.20", an old version with a few bugs. Try to update your host to the most recent version. |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
"I have compiled the CUDA 5 samples and they run OK. Maybe I can take the source of one of those as an example and get a free-RAM figure from it. If so, I can set up a monitor to log some info about what was running and what the free GPU RAM was." The command is nvidia-smi -a. I'm running Ubuntu 10.04 with one 19" monitor, BOINC 6.12.33 with one x41g task, a couple of terminal windows, and a couple of Firefox tabs. It says: ==============NVSMI LOG============== Timestamp : Sun Jul 21 19:54:50 2013 Driver Version : 260.19.44 GPU 0: Product Name : GeForce GTS 250 PCI Device/Vendor ID : 61510de PCI Location ID : 0:1:0 Display : Connected Temperature : 73 C Fan Speed : 63% Utilization GPU : 92% Memory : 53% Power State : PSTATE 0 Power Capping : Disabled So, if I had a 512MB card, I would be out of VRAM. It's just a single 19" screen. |
Eric B Send message Joined: 9 Mar 00 Posts: 88 Credit: 168,875,085 RAC: 762 |
juan BFP, I think you are on to something there. But I am not sure how to do this, as ldd shows it needs the older version. I tried creating links libcudart.so.3 and libcufft.so.3 pointing to the respective CUDA 5.0 libs and setting LD_LIBRARY_PATH appropriately, but it fails. It must examine the links themselves and realize I lied. Do you know what other Linux users are running for their Nvidia GPU crunching? # ldd setiathome_x41g_x86_64-pc-linux-gnu_cuda32 linux-vdso.so.1 (0x00007fffa03fb000) libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f2b13d79000) libcudart.so.3 => xxx/libcudart.so.3 (0x00007f2b13b2c000) libcufft.so.3 => xxx/libcufft.so.3 (0x00007f2b11d76000) libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x00007f2b11a70000) libm.so.6 => /lib64/libm.so.6 (0x00007f2b11772000) libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f2b1155c000) libc.so.6 => /lib64/libc.so.6 (0x00007f2b111af000) /lib64/ld-linux-x86-64.so.2 (0x00007f2b13f95000) libdl.so.2 => /lib64/libdl.so.2 (0x00007f2b10fab000) librt.so.1 => /lib64/librt.so.1 (0x00007f2b10da3000) # ldd lib* libcudart.so.3: linux-vdso.so.1 (0x00007fffd2621000) libdl.so.2 => /lib64/libdl.so.2 (0x00007f3043507000) libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f30432ea000) librt.so.1 => /lib64/librt.so.1 (0x00007f30430e2000) libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x00007f3042ddc000) libm.so.6 => /lib64/libm.so.6 (0x00007f3042add000) libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f30428c7000) libc.so.6 => /lib64/libc.so.6 (0x00007f304251a000) /lib64/ld-linux-x86-64.so.2 (0x00007f3043982000) libcufft.so.3: linux-vdso.so.1 (0x00007fffc5517000) libdl.so.2 => /lib64/libdl.so.2 (0x00007f7092866000) libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f7092649000) libcudart.so.3 => xxx/libcudart.so.3 (0x00007f70923fc000) libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x00007f70920f6000) libm.so.6 => /lib64/libm.so.6 (0x00007f7091df7000) libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f7091be1000) libc.so.6 => /lib64/libc.so.6 (0x00007f7091834000) /lib64/ld-linux-x86-64.so.2 (0x00007f709484a000) librt.so.1 => /lib64/librt.so.1 (0x00007f709162b000) |
Eric B Send message Joined: 9 Mar 00 Posts: 88 Credit: 168,875,085 RAC: 762 |
TBar: I opened a bunch of browser windows, ran some graphics and a series of CUDA apps, and watched the remaining free GPU RAM. It never dropped below 600M, so I don't think I am doing anything outside of SETI that would cause GPU RAM to get used up. In fact, most of the time the computer is just crunching, with not much else running. |
Eric B Send message Joined: 9 Mar 00 Posts: 88 Credit: 168,875,085 RAC: 762 |
I also need some instruction on these app_info.xml files. Is it just me, or are these an absolute abomination? I don't see the order there. When I try to run with the app_info.xml I ONLY get Nvidia work, never any CPU work. I tried removing the setiathome_enhanced sections, but there was no change. Basically I have 2 apps: setiathome_7.01_x86_64-pc-linux-gnu and setiathome_x41g_x86_64-pc-linux-gnu_cuda32. How do I get a proper app_info.xml written? <app_info> <app> <name>setiathome_v7</name> </app> <file_info> <name>setiathome_7.01_x86_64-pc-linux-gnu</name> <executable/> </file_info> <app_version> <app_name>setiathome_enhanced</app_name> <version_num>701</version_num> <platform>x86_64-pc-linux-gnu</platform> <avg_ncpus>1.000000</avg_ncpus> <max_ncpus>1.000000</max_ncpus> <file_ref> <file_name>setiathome_7.01_x86_64-pc-linux-gnu</file_name> <main_program/> </file_ref> </app_version> <file_info> <name>setiathome_x41g_x86_64-pc-linux-gnu_cuda32</name> <executable/> </file_info> <file_info> <name>libcudart.so.3</name> <executable/> </file_info> <file_info> <name>libcufft.so.3</name> <executable/> </file_info> <app_version> <app_name>setiathome_v7</app_name> <version_num>700</version_num> <platform>x86_64-pc-linux-gnu</platform> <plan_class>cuda32</plan_class> <avg_ncpus>0.05</avg_ncpus> <max_ncpus>1.0</max_ncpus> <coproc> <type>CUDA</type> <count>1.0</count> </coproc> <file_ref> <file_name>setiathome_x41g_x86_64-pc-linux-gnu_cuda32</file_name> <main_program/> </file_ref> <file_ref> <file_name>libcudart.so.3</file_name> </file_ref> <file_ref> <file_name>libcufft.so.3</file_name> </file_ref> </app_version> </app_info> |
Bernd Noessler Send message Joined: 15 Nov 09 Posts: 99 Credit: 52,635,434 RAC: 0 |
"Do you know what other Linux users are running for their Nvidia GPU crunching?" It doesn't matter what other users are using. Your x41g is compiled against the CUDA 3.2 headers, so you have to use the 3.2 libraries. Otherwise crazy things could happen. If your gcc is 4.4 or newer and you'd like to give x41zc a try (compiled for CUDA 4.1 / sm 2.1), send me a PM. |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
"TBar: I opened a bunch of browser windows, ran some graphics and a series of CUDA apps, and watched the remaining free GPU RAM. It never dropped below 600M, so I don't think I am doing anything outside of SETI that would cause GPU RAM to get used up. In fact, most of the time the computer is just crunching, with not much else running." Well, the app says you're running out of VRAM. Considering the other problems you're having with the CPU app, I'd say you have something wrong with the SETI part of your system. Most likely, something is spiking the VRAM usage, causing the app to abort. I just had a similar bout when trying to install a driver whose installer is incompatible with my system. After about a day of trying other people's suggestions, my system was borked. The CPU tasks were taking over twice as long as they should... sound familiar? I installed a new system, and most of it is working fine again. I have a problem with the GPU AP app for now, but x41g and the CPU AP app are working great... |
Eric B Send message Joined: 9 Mar 00 Posts: 88 Credit: 168,875,085 RAC: 762 |
Even with a completely fresh install of BOINC, I have issues with CPU MB tasks. They seem to get stuck and make no forward progress; this happens about every 4 hours or so. If I restart BOINC they take off like they should until the next few WUs, then I'm back in the same boat: the WU runs but makes no forward progress at all. When I do restart BOINC, I also notice that every WU that was in progress is reset back to 0%. I have another system with the same OS, well, sort of the same: one is openSUSE 12.1 with a 3.1 kernel, the other is openSUSE 12.3 with a 3.8 kernel. I don't have trouble on openSUSE 12.1 for some reason. |
tullio Send message Joined: 9 Apr 04 Posts: 8797 Credit: 2,930,782 RAC: 1 |
"Even with a completely fresh install of BOINC, I have issues with CPU MB tasks. They seem to get stuck and make no forward progress; this happens about every 4 hours or so. If I restart BOINC they take off like they should until the next few WUs, then I'm back in the same boat: the WU runs but makes no forward progress at all. When I do restart BOINC, I also notice that every WU that was in progress is reset back to 0%. I have another system with the same OS, well, sort of the same: one is openSUSE 12.1 with a 3.1 kernel, the other is openSUSE 12.3 with a 3.8 kernel. I don't have trouble on openSUSE 12.1 for some reason." I have SETI@home 7.01 running on openSUSE 12.2 and Astropulse 6.01 by Lunatics on openSUSE 12.1 with no problems. No more 6.01 on my Solaris virtual machine. Tullio |
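
On the suggestion in the thread to keep monitoring VRAM usage: a minimal sketch of a logger, in Python, that polls nvidia-smi and records used and free memory per GPU. It assumes a driver whose nvidia-smi supports the `--query-gpu` option; older drivers, such as the 260.x series quoted above, may only offer `nvidia-smi -a`, in which case the parsing would need to change. The function names and the one-minute interval are made up for illustration; the 300 MiB warning threshold is borrowed from the warning Juha mentions. It does not identify which process holds the memory.

```python
import csv
import io
import subprocess
import time
from datetime import datetime

# Assumption: the installed nvidia-smi supports --query-gpu. Older drivers
# may not, and would need a different command and parser.
QUERY_CMD = [
    "nvidia-smi",
    "--query-gpu=index,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_memory_csv(text):
    """Turn 'index, used, total' rows (values in MiB) into a list of dicts."""
    rows = []
    for fields in csv.reader(io.StringIO(text), skipinitialspace=True):
        try:
            index, used, total = (int(f) for f in fields)
        except ValueError:
            continue  # blank line, wrong field count, or a non-numeric value
        rows.append({"gpu": index, "used_mib": used, "free_mib": total - used})
    return rows


def format_log_line(rows, now, warn_below_mib=300):
    """Build one log line; GPUs with little free memory are flagged LOW."""
    parts = [now.isoformat(timespec="seconds")]
    for row in rows:
        flag = " LOW" if row["free_mib"] < warn_below_mib else ""
        parts.append(
            f"gpu{row['gpu']} used={row['used_mib']}MiB free={row['free_mib']}MiB{flag}"
        )
    return " ".join(parts)


def sample():
    """Run nvidia-smi once and return the parsed rows."""
    result = subprocess.run(QUERY_CMD, capture_output=True, text=True, check=True)
    return parse_memory_csv(result.stdout)


def monitor(interval_s=60):
    """Print one log line per interval until interrupted."""
    while True:
        print(format_log_line(sample(), datetime.now()), flush=True)
        time.sleep(interval_s)
```

Calling `monitor()` and redirecting the output to a file gives a timeline that can be lined up against the times of the "out of memory" errors in the task logs.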
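
On the app_info.xml question: one thing that stands out in the posted file is that the CPU `<app_version>` names `setiathome_enhanced`, while the only `<app>` declared is `setiathome_v7`. Whether that mismatch is what keeps CPU work from arriving is an assumption; nobody in the thread confirms it. The sketch below is a hypothetical consistency check (the function name and messages are made up), not a BOINC tool: it reports any `<app_version>` whose `<app_name>` has no matching `<app>`, and any `<file_ref>` without a matching `<file_info>`.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the structure posted in the thread (GPU section omitted).
SAMPLE = """
<app_info>
  <app>
    <name>setiathome_v7</name>
  </app>
  <file_info>
    <name>setiathome_7.01_x86_64-pc-linux-gnu</name>
    <executable/>
  </file_info>
  <app_version>
    <app_name>setiathome_enhanced</app_name>
    <version_num>701</version_num>
    <file_ref>
      <file_name>setiathome_7.01_x86_64-pc-linux-gnu</file_name>
      <main_program/>
    </file_ref>
  </app_version>
</app_info>
"""


def check_app_info(xml_text):
    """Return a list of human-readable inconsistencies (empty if none found)."""
    root = ET.fromstring(xml_text)
    declared_apps = {app.findtext("name") for app in root.findall("app")}
    declared_files = {info.findtext("name") for info in root.findall("file_info")}

    problems = []
    for version in root.findall("app_version"):
        app_name = version.findtext("app_name")
        number = version.findtext("version_num")
        if app_name not in declared_apps:
            problems.append(
                f"app_version {number}: app '{app_name}' has no matching <app> entry"
            )
        for ref in version.findall("file_ref"):
            file_name = ref.findtext("file_name")
            if file_name not in declared_files:
                problems.append(
                    f"app_version {number}: file '{file_name}' has no matching <file_info> entry"
                )
    return problems
```

Running `check_app_info(SAMPLE)` flags the `setiathome_enhanced` entry; changing that `<app_name>` to `setiathome_v7` makes the check pass, which may be worth trying before anything more drastic.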
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.