vlar running over 21hrs on cpu

Eric B
Joined: 9 Mar 00
Posts: 88
Credit: 168,875,085
RAC: 762
United States
Message 1392973 - Posted: 20 Jul 2013, 19:38:34 UTC
Last modified: 20 Jul 2013, 19:49:48 UTC

I have a VLAR task that has been running on a CPU for over 21 hours and it still doesn't show any percentage done: "Progress" is still 0.000% and "Remaining" is "---".
I'm running openSUSE 12.3 Linux on a 6-core/HT 3960 with an NVIDIA GTX 460 (1 GB), but there isn't an NVIDIA GPU app for Linux yet, is there? BOINC is 7.0.65.
Is this normal to be running so long?

Here is the pertinent vlar data:
 <workunit_header>
    <name>22oc08ac.9463.11115.3.12.24.vlar</name>
    <group_info>
      <tape_info>
        <name>22oc08ac</name>
        <start_time>2454762.4252491</start_time>
        <last_block_time>2454762.4252491</last_block_time>
        <last_block_done>11115</last_block_done>
        <missed>0</missed>
        <tape_quality>0</tape_quality>
        <beam>0</beam>
      </tape_info>
      <name>22oc08ac</name>
      <data_desc>
        <start_ra>19.116034178315</start_ra>
        <start_dec>10.033974364501</start_dec>
        <end_ra>19.1160411811</end_ra>
        <end_dec>10.034132742991</end_dec>
        <true_angle_range>0.0080150689258589</true_angle_range>
        <time_recorded>Wed Oct 22 22:12:21 2008</time_recorded>
        <time_recorded_jd>2454762.4252479</time_recorded_jd>
        <nsamples>1048576</nsamples>
ID: 1392973
Juha
Volunteer tester
Joined: 7 Mar 04
Posts: 388
Credit: 1,857,738
RAC: 0
Finland
Message 1392978 - Posted: 20 Jul 2013, 19:48:33 UTC - in response to Message 1392973.  

Is this normal to be running so long?

No, it's stuck. Restart BOINC to get the task running normally.
ID: 1392978
Eric B
Joined: 9 Mar 00
Posts: 88
Credit: 168,875,085
RAC: 762
United States
Message 1392981 - Posted: 20 Jul 2013, 19:59:34 UTC - in response to Message 1392978.  

That fixed things. It's now showing Percent 1.279% and climbing, Elapsed 3:49, Remaining 2:58:44. Thanks.

What about the GPU? Is there an NVIDIA CUDA/OpenCL app for 64-bit Linux? I had a Fermi GPU executable which worked great until the switch to V7, but under V7 it creates way too many errors, so I removed it.
ID: 1392981
Juha
Volunteer tester
Joined: 7 Mar 04
Posts: 388
Credit: 1,857,738
RAC: 0
Finland
Message 1392986 - Posted: 20 Jul 2013, 20:14:16 UTC - in response to Message 1392981.  

What about the GPU? Is there an NVIDIA CUDA/OpenCL app for 64-bit Linux? I had a Fermi GPU executable which worked great until the switch to V7, but under V7 it creates way too many errors, so I removed it.

If you want a stock app, a new version is in the works. You have your machines hidden, but your posting history shows you have been using optimized apps. According to "Ubuntu and Nvidia tasks", x41g should work too.
ID: 1392986
Eric B
Joined: 9 Mar 00
Posts: 88
Credit: 168,875,085
RAC: 762
United States
Message 1393009 - Posted: 20 Jul 2013, 20:42:54 UTC - in response to Message 1392986.  

The typical error using "Multibeam x41g Preview, Cuda 3.20" is:
Cuda error 'cudaMalloc((void**) &dev_t_funct_cache' in file 'cuda/cudaAcc_pulsefind.cu' in line 851 : out of memory.
PulseFind Init failed...
setiathome_CUDA: CUDA runtime ERROR in device memory allocation... initiating boinc temporary exit (180 secs)...

08jn09ab.5493.11051.7.12.225
application	SETI@home v7
created	9 Jul 2013, 21:41:20 UTC
minimum quorum	2
initial replication	2
max # of error/total/success tasks	5, 10, 5

I see the same task, farmed out to a Windows host using an NVIDIA card on cuda22, also failed.
I can unhide my computers for awhile if you like
ID: 1393009
Juha
Volunteer tester
Joined: 7 Mar 04
Posts: 388
Credit: 1,857,738
RAC: 0
Finland
Message 1393028 - Posted: 20 Jul 2013, 21:34:27 UTC - in response to Message 1393009.  

GPU computing isn't really my thing, but let's see...

The typical error using "Multibeam x41g Preview, Cuda 3.20" is:
Cuda error 'cudaMalloc((void**) &dev_t_funct_cache' in file 'cuda/cudaAcc_pulsefind.cu' in line 851 : out of memory.

So your card doesn't have enough free memory. Did you say it has one gigabyte of memory? Do you run some fancy desktop environment, maybe multiple workspaces, or lots of tabs in a web browser with hardware acceleration? Those could eat a good amount of VRAM.

I can unhide my computers for awhile if you like

That's usually the requirement when asking for help.
ID: 1393028
Eric B
Joined: 9 Mar 00
Posts: 88
Credit: 168,875,085
RAC: 762
United States
Message 1393057 - Posted: 20 Jul 2013, 23:25:54 UTC - in response to Message 1393028.  

My computers are visible now. I do run KDE 4 on 2 monitors, but that's what I was doing before V7 came out. I just opened up nvidia-settings, and if I am reading this right I am only using 226M of the 1024M RAM on the card: "Used dedicated memory: 226M". I googled around and found 2 items where people had similar failures, but they're old. In those cases it looks like the GPU app wasn't releasing all of its memory, the next run did the same, and the next, until there wasn't enough left to run the GPU app at all. Is there any way to tell whether people are already running 64-bit Linux with SETI V7 on NVIDIA and successfully processing GPU tasks? If so, I should compare my setup with a successful one; maybe I'm not holding my mouth right.
ID: 1393057
Eric B
Joined: 9 Mar 00
Posts: 88
Credit: 168,875,085
RAC: 762
United States
Message 1393365 - Posted: 21 Jul 2013, 19:32:33 UTC - in response to Message 1393057.  

I have 4 more WUs (3 ordinary and 1 VLAR) that were not showing any percent done or remaining time. I restarted BOINC again and they now look proper. I am concerned this is going to be a continuing problem. Is there anything that can be done to avoid it?
ID: 1393365
Juha
Volunteer tester
Joined: 7 Mar 04
Posts: 388
Credit: 1,857,738
RAC: 0
Finland
Message 1393395 - Posted: 21 Jul 2013, 21:16:58 UTC - in response to Message 1393365.  

I have 4 more WUs (3 ordinary and 1 VLAR) that were not showing any percent done or remaining time. I restarted BOINC again and they now look proper. I am concerned this is going to be a continuing problem. Is there anything that can be done to avoid it?

If the task gets stuck it usually happens at the start when the application is benchmarking different functions. Optimized apps don't have the benchmarking code so they don't get stuck (in there anyway). So switching to optimized is one option.

I was hoping that someone who is actually using NVIDIA cards with Linux would step in and give you some advice on the GPU apps. No such luck...

You have completed some V7 tasks. Some of them show warnings that there's less than 300MB of free VRAM while others don't. This one is especially interesting. When it starts, the GPU has less than 300MB of free memory; at some point the memory runs out and the app does a three-minute temporary exit. After the three minutes have passed, the GPU suddenly has plenty of free memory.

I'm afraid the only idea I have is to keep monitoring the VRAM usage and try to identify what application is hogging the memory.
ID: 1393395
Eric B
Joined: 9 Mar 00
Posts: 88
Credit: 168,875,085
RAC: 762
United States
Message 1393397 - Posted: 21 Jul 2013, 21:24:35 UTC - in response to Message 1393395.  

I have compiled the CUDA 5 samples and they run OK. Maybe I can take the source of one of those as an example and get a free RAM figure from it. If so, I can set up a monitor to log some info about what was running and what the free GPU RAM was.
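
A monitor along those lines might not need a CUDA program at all. The following is only a minimal sketch, assuming a driver whose nvidia-smi accepts the --query-gpu and --format=csv options (newer drivers do; older ones may only offer nvidia-smi -a). The function names and the 60-second interval are hypothetical choices for illustration, not anything taken from BOINC or the SETI apps.

```python
import subprocess
import time
from datetime import datetime, timezone

# Assumption: this nvidia-smi build supports --query-gpu / --format=csv.
QUERY = ["nvidia-smi",
         "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"]


def parse_memory_csv(text):
    """Parse 'used, total' lines (MiB, one line per GPU) into (used, total) tuples."""
    readings = []
    for line in text.strip().splitlines():
        used, total = (int(field.strip()) for field in line.split(","))
        readings.append((used, total))
    return readings


def format_log_line(readings, now=None):
    """Build one timestamped log line showing used and free memory per GPU."""
    now = now or datetime.now(timezone.utc)
    parts = [f"gpu{i}: used={used}MiB free={total - used}MiB"
             for i, (used, total) in enumerate(readings)]
    return f"{now:%Y-%m-%d %H:%M:%S} UTC  " + "  ".join(parts)


def poll_forever(interval_s=60):
    """Hypothetical helper: print one log line per interval until interrupted."""
    while True:
        out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
        print(format_log_line(parse_memory_csv(out)), flush=True)
        time.sleep(interval_s)
```

Calling poll_forever() and redirecting the output to a file would give a timeline that can be lined up against the times in the task's stderr output when the "out of memory" exits happen.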
ID: 1393397
Juha
Volunteer tester
Joined: 7 Mar 04
Posts: 388
Credit: 1,857,738
RAC: 0
Finland
Message 1393401 - Posted: 21 Jul 2013, 21:50:07 UTC - in response to Message 1393397.  

I have compiled the CUDA 5 samples and they run OK. Maybe I can take the source of one of those as an example and get a free RAM figure from it. If so, I can set up a monitor to log some info about what was running and what the free GPU RAM was.

Before I posted I did some Googling, and it looked like nvidia-settings has a command-line interface as well.
ID: 1393401
juan BFP
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1393408 - Posted: 21 Jul 2013, 22:58:35 UTC
Last modified: 21 Jul 2013, 23:01:16 UTC

Not sure if this could be the source of your troubles, but your log says:
Multibeam x41g Preview, Cuda 3.20. That is an old version with a few bugs; try updating your host to the latest version.
ID: 1393408
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1393415 - Posted: 21 Jul 2013, 23:57:07 UTC - in response to Message 1393401.  
Last modified: 22 Jul 2013, 0:00:07 UTC

I have compiled the CUDA 5 samples and they run OK. Maybe I can take the source of one of those as an example and get a free RAM figure from it. If so, I can set up a monitor to log some info about what was running and what the free GPU RAM was.

Before I posted I did some Googling, and it looked like nvidia-settings has a command-line interface as well.

The command is nvidia-smi -a
I'm running Ubuntu 10.04 with one 19" monitor, BOINC 6.12.33 with one x41g task, a couple of terminal windows, and a couple of Firefox tabs. It says:
==============NVSMI LOG==============
Timestamp			: Sun Jul 21 19:54:50 2013
Driver Version			: 260.19.44
GPU 0:
	Product Name		: GeForce GTS 250
	PCI Device/Vendor ID	: 61510de
	PCI Location ID		: 0:1:0
	Display			: Connected
	Temperature		: 73 C
	Fan Speed		: 63%
	Utilization
	    GPU			: 92%
	    Memory		: 53%
	Power State		: PSTATE 0
	Power Capping		: Disabled

So, if I had a 512 MB card, I would be out of VRAM. It's just a single 19" screen.
ID: 1393415
Eric B
Joined: 9 Mar 00
Posts: 88
Credit: 168,875,085
RAC: 762
United States
Message 1393470 - Posted: 22 Jul 2013, 4:44:22 UTC - in response to Message 1393408.  

juan BFP, I think you are on to something there. But I am not sure how to do this, as ldd shows the app needs the older version. I tried creating links libcudart.so.3 and libcufft.so.3 pointing to the respective CUDA 5.0 libs and setting LD_LIBRARY_PATH appropriately, but it fails. It must examine the links themselves and realize I lied.
Do you know what other linux users are running for their nvidia gpu crunching?

# ldd setiathome_x41g_x86_64-pc-linux-gnu_cuda32
        linux-vdso.so.1 (0x00007fffa03fb000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f2b13d79000)
        libcudart.so.3 => xxx/libcudart.so.3 (0x00007f2b13b2c000)
        libcufft.so.3 => xxx/libcufft.so.3 (0x00007f2b11d76000)
        libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x00007f2b11a70000)
        libm.so.6 => /lib64/libm.so.6 (0x00007f2b11772000)
        libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f2b1155c000)
        libc.so.6 => /lib64/libc.so.6 (0x00007f2b111af000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f2b13f95000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00007f2b10fab000)
        librt.so.1 => /lib64/librt.so.1 (0x00007f2b10da3000)
# ldd lib*
libcudart.so.3:
        linux-vdso.so.1 (0x00007fffd2621000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00007f3043507000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f30432ea000)
        librt.so.1 => /lib64/librt.so.1 (0x00007f30430e2000)
        libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x00007f3042ddc000)
        libm.so.6 => /lib64/libm.so.6 (0x00007f3042add000)
        libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f30428c7000)
        libc.so.6 => /lib64/libc.so.6 (0x00007f304251a000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f3043982000)
libcufft.so.3:
        linux-vdso.so.1 (0x00007fffc5517000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00007f7092866000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f7092649000)
        libcudart.so.3 => xxx/libcudart.so.3 (0x00007f70923fc000)
        libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x00007f70920f6000)
        libm.so.6 => /lib64/libm.so.6 (0x00007f7091df7000)
        libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f7091be1000)
        libc.so.6 => /lib64/libc.so.6 (0x00007f7091834000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f709484a000)
        librt.so.1 => /lib64/librt.so.1 (0x00007f709162b000)


ID: 1393470
Eric B
Joined: 9 Mar 00
Posts: 88
Credit: 168,875,085
RAC: 762
United States
Message 1393471 - Posted: 22 Jul 2013, 4:47:45 UTC - in response to Message 1393415.  

TBar: I ran a bunch of open browser windows, some graphics and a series of CUDA apps and watched the remaining free GPU RAM. It never dropped below 600M, so I don't think I am doing anything outside of SETI that would cause GPU RAM to get used up. In fact, most of the time the computer is just crunching with not much else running.
ID: 1393471
Eric B
Joined: 9 Mar 00
Posts: 88
Credit: 168,875,085
RAC: 762
United States
Message 1393478 - Posted: 22 Jul 2013, 5:15:08 UTC - in response to Message 1393471.  

I also need some instruction on these app_info.xml files. Is it just me, or are these an absolute abomination? I don't see the order there. When I try to run with the app_info.xml I ONLY get NVIDIA work, never any CPU work. I tried removing the setiathome_enhanced sections, but no change.
Basically I have 2 apps:
setiathome_7.01_x86_64-pc-linux-gnu
setiathome_x41g_x86_64-pc-linux-gnu_cuda32
How do I get a proper app_info.xml written?

<app_info>
    <app>
        <name>setiathome_v7</name>
    </app>
        <file_info>
                <name>setiathome_7.01_x86_64-pc-linux-gnu</name>
                <executable/>
        </file_info>
        <app_version>
                <app_name>setiathome_enhanced</app_name>
                <version_num>701</version_num>
                <platform>x86_64-pc-linux-gnu</platform>
                <avg_ncpus>1.000000</avg_ncpus>
                <max_ncpus>1.000000</max_ncpus>
                <file_ref>
                        <file_name>setiathome_7.01_x86_64-pc-linux-gnu</file_name>
                        <main_program/>
                </file_ref>
        </app_version>
        <file_info>
                <name>setiathome_x41g_x86_64-pc-linux-gnu_cuda32</name>
                <executable/>
        </file_info>
        <file_info>
                <name>libcudart.so.3</name>
                <executable/>
        </file_info>
        <file_info>
                <name>libcufft.so.3</name>
                <executable/>
        </file_info>
        <app_version>
                <app_name>setiathome_v7</app_name>
                <version_num>700</version_num>
                <platform>x86_64-pc-linux-gnu</platform>
                <plan_class>cuda32</plan_class>
                <avg_ncpus>0.05</avg_ncpus>
                <max_ncpus>1.0</max_ncpus>
                <coproc>
                        <type>CUDA</type>
                        <count>1.0</count>
                </coproc>
                <file_ref>
                        <file_name>setiathome_x41g_x86_64-pc-linux-gnu_cuda32</file_name>
                        <main_program/>
                </file_ref>
                <file_ref>
                        <file_name>libcudart.so.3</file_name>
                </file_ref>
                <file_ref>
                        <file_name>libcufft.so.3</file_name>
                </file_ref>
        </app_version>
</app_info>
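
One thing a file like this can be checked for mechanically is whether every <app_version> names an app that is actually declared in an <app> block. In the listing above the CPU entry says setiathome_enhanced while the only declared app is setiathome_v7, which might explain getting GPU work only. The assumption here, not confirmed anywhere in this thread, is that the BOINC client skips an app_version whose app_name has no matching <app>. A minimal Python sketch of such a check follows; the function name and the trimmed sample are made up for illustration.

```python
import xml.etree.ElementTree as ET


def undeclared_app_versions(app_info_xml):
    """Return app_name values used by <app_version> blocks but not declared in any <app>."""
    root = ET.fromstring(app_info_xml)
    declared = {app.findtext("name", "").strip() for app in root.findall("app")}
    missing = []
    for version in root.findall("app_version"):
        name = version.findtext("app_name", "").strip()
        if name not in declared:
            missing.append(name)
    return missing


# Trimmed-down stand-in for the file above: one declared app, two app_versions.
SAMPLE = """
<app_info>
    <app>
        <name>setiathome_v7</name>
    </app>
    <app_version>
        <app_name>setiathome_enhanced</app_name>
        <version_num>701</version_num>
    </app_version>
    <app_version>
        <app_name>setiathome_v7</app_name>
        <version_num>700</version_num>
        <plan_class>cuda32</plan_class>
    </app_version>
</app_info>
"""

print(undeclared_app_versions(SAMPLE))  # prints ['setiathome_enhanced']
```

If the check reports a name, either that app_version's app_name needs to match a declared app or a matching <app> block needs to be added.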

ID: 1393478
Bernd Noessler
Joined: 15 Nov 09
Posts: 99
Credit: 52,635,434
RAC: 0
Germany
Message 1393488 - Posted: 22 Jul 2013, 6:06:14 UTC - in response to Message 1393470.  

Do you know what other linux users are running for their nvidia gpu crunching?


It doesn't matter what other users are using.
Your x41g is compiled against the cuda 3.2 headers. So you have to use
the 3.2 libraries. Otherwise crazy things could happen.

If your gcc is 4.4 or newer and you'd like to give x41zc a try
(compiled for CUDA 4.1 / sm 2.1), send me a PM.
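
To see which CUDA library versions a given binary asks for, the sonames in its ldd output can be read off with a few lines of Python. This is just a sketch: the function name is hypothetical, and the sample is an excerpt of the ldd listing posted earlier in the thread. A symlink with the right name only satisfies the file lookup; it does not make a newer library match what the app was built against.

```python
import re

# Matches lines such as "libcudart.so.3 => /path/libcudart.so.3 (0x...)"
LDD_LINE = re.compile(r"^\s*(?P<soname>\S+)\s+=>\s+(?P<path>\S+)")


def cuda_sonames(ldd_output):
    """Map each libcudart/libcufft soname in ldd output to the major version it names."""
    wanted = {}
    for line in ldd_output.splitlines():
        match = LDD_LINE.match(line)
        if not match:
            continue  # lines without "=>" (vdso, the loader itself) are skipped
        soname = match.group("soname")
        version = re.fullmatch(r"lib(cudart|cufft)\.so\.(\d+)", soname)
        if version:
            wanted[soname] = int(version.group(2))
    return wanted


SAMPLE = """\
        linux-vdso.so.1 (0x00007fffa03fb000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f2b13d79000)
        libcudart.so.3 => xxx/libcudart.so.3 (0x00007f2b13b2c000)
        libcufft.so.3 => xxx/libcufft.so.3 (0x00007f2b11d76000)
"""

print(cuda_sonames(SAMPLE))  # prints {'libcudart.so.3': 3, 'libcufft.so.3': 3}
```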

ID: 1393488
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1393509 - Posted: 22 Jul 2013, 7:10:56 UTC - in response to Message 1393471.  
Last modified: 22 Jul 2013, 7:12:12 UTC

TBar: I ran a bunch of open browser windows, some graphics and a series of CUDA apps and watched the remaining free GPU RAM. It never dropped below 600M, so I don't think I am doing anything outside of SETI that would cause GPU RAM to get used up. In fact, most of the time the computer is just crunching with not much else running.

Well, the App says you're running out of vRam. Considering the other problems you're having with the CPU App, I'd say you have something wrong with the SETI part of your system. Most likely, something is spiking the vRam usage causing the App to abort. I just had a similar bout with trying to install a driver that has an installer incompatible with my system. After about a day of trying other people's suggestions, my system was borked. The CPU tasks were taking over twice as long as they should...sound familiar? I installed a new system, most of it is working fine again. I have a problem with the GPU AP App for now, but x41g and the CPU AP App are working great...
ID: 1393509
Eric B
Joined: 9 Mar 00
Posts: 88
Credit: 168,875,085
RAC: 762
United States
Message 1394517 - Posted: 25 Jul 2013, 3:53:03 UTC - in response to Message 1393509.  

Even with a completely fresh install of BOINC, I have issues with CPU MB tasks. They seem to get stuck and make no forward progress; this happens about every 4 hours or so. If I restart BOINC they take off like they should until the next few WUs, then I'm back in the same boat: the WU runs but makes no forward progress at all. When I do restart BOINC, I also notice that every WU that was in progress is reset back to 0%. I have another system with the same OS, well, sorta the same. One is openSUSE 12.1 with a 3.1 kernel, the other is openSUSE 12.3 with a 3.8 kernel. I don't have trouble on openSUSE 12.1 for some reason.
ID: 1394517
tullio
Volunteer tester
Joined: 9 Apr 04
Posts: 8797
Credit: 2,930,782
RAC: 1
Italy
Message 1394533 - Posted: 25 Jul 2013, 4:03:51 UTC - in response to Message 1394517.  

Even with a completely fresh install of BOINC, I have issues with CPU MB tasks. They seem to get stuck and make no forward progress; this happens about every 4 hours or so. If I restart BOINC they take off like they should until the next few WUs, then I'm back in the same boat: the WU runs but makes no forward progress at all. When I do restart BOINC, I also notice that every WU that was in progress is reset back to 0%. I have another system with the same OS, well, sorta the same. One is openSUSE 12.1 with a 3.1 kernel, the other is openSUSE 12.3 with a 3.8 kernel. I don't have trouble on openSUSE 12.1 for some reason.

I have SETI@home 7.01 running on OpenSuSE 12.2 and Astropulse 6.01 by Lunatics on OpenSuSE 12.1 with no problem. No more 6.01 on my Solaris Virtual Machine.
Tullio
ID: 1394533