Concerning the memory reported for my GPUs

Questions and Answers : GPU applications : Concerning the memory reported for my GPUs
Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 1288783 - Posted: 28 Sep 2012, 12:07:39 UTC - in response to Message 1288296.  

07-Sep-2012 16:25:16 [SETI@home] Sending scheduler request: To fetch work.
07-Sep-2012 16:25:16 [SETI@home] Requesting new tasks for NVIDIA
07-Sep-2012 16:25:21 [SETI@home] Scheduler request completed: got 0 new tasks
07-Sep-2012 16:25:21 [SETI@home] No tasks sent

Addendum to TNG's post:
You don't see how much work BOINC is asking for either, as the BOINC developers nowadays consider that confusing information. So you'd have to create a cc_config.xml file containing:

<cc_config>
<log_flags>
<sched_op_debug>1</sched_op_debug>
</log_flags>
</cc_config>


Save it into /etc/boinc-client when using the distribution's packaged version.
Save it into the BOINC/ directory in your home directory when using Berkeley's BOINC.
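If you'd rather create the file from a script, here is a minimal Python sketch. write_cc_config is a hypothetical helper, not something BOINC ships. It writes the same XML as above and refuses to overwrite an existing cc_config.xml, since that file may already hold other options you'd want to merge by hand. The target directory is an assumption you pass in: /etc/boinc-client (usually needs root) for the package, or the BOINC/ folder in your home directory for Berkeley's installer.

```python
from pathlib import Path

# Hypothetical helper. Same content as the cc_config.xml above: it enables
# the sched_op_debug log flag so the log shows how much work is requested.
CC_CONFIG = """<cc_config>
<log_flags>
<sched_op_debug>1</sched_op_debug>
</log_flags>
</cc_config>
"""


def write_cc_config(data_dir):
    """Create cc_config.xml in data_dir and return its path.

    Raises FileExistsError instead of clobbering an existing file.
    """
    target = Path(data_dir) / "cc_config.xml"
    # Mode "x" creates the file and fails if it already exists.
    with open(target, "x") as handle:
        handle.write(CC_CONFIG)
    return target
```

For example, write_cc_config(Path.home() / "BOINC") for Berkeley's BOINC, followed by the "Read config file" step.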

Next, open BOINC Manager->Advanced view->Advanced->Read config file.

Now the log will show how much work is being asked for, with entries like these.
Example from my system, for Einstein work for my ATI GPU:
28/09/2012 02:52:55 | Einstein@Home | [sched_op] Starting scheduler request
28/09/2012 02:52:55 | Einstein@Home | Sending scheduler request: To report completed tasks.
28/09/2012 02:52:55 | Einstein@Home | Reporting 1 completed tasks
28/09/2012 02:52:55 | Einstein@Home | Requesting new tasks for ATI
28/09/2012 02:52:55 | Einstein@Home | [sched_op] CPU work request: 0.00 seconds; 0.00 devices
28/09/2012 02:52:55 | Einstein@Home | [sched_op] ATI work request: 845.22 seconds; 0.00 devices
28/09/2012 02:52:59 | Einstein@Home | Scheduler request completed: got 1 new tasks
28/09/2012 02:52:59 | Einstein@Home | [sched_op] Server version 611
28/09/2012 02:52:59 | Einstein@Home | Project requested delay of 60 seconds
28/09/2012 02:52:59 | Einstein@Home | [sched_op] estimated total CPU task duration: 0 seconds
28/09/2012 02:52:59 | Einstein@Home | [sched_op] estimated total ATI task duration: 4599 seconds


And (hypothetically) if you were to remove the Lunatics application and run BOINC with only the default stock applications, you'd see the work request asking for just 1 second of work for the NVIDIA card.
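To pull those work-request figures out of a longer log, a small filter helps. The sketch below is a hypothetical Python helper, not part of BOINC. It assumes the "[sched_op] <resource> work request: <seconds> seconds; <devices> devices" wording shown above, in either the pipe-separated layout of my Einstein example or the "[project]" bracket layout of the SETI@home excerpt at the top of this post.

```python
import re

# Assumed line layouts, based on the excerpts in this thread:
#   "28/09/2012 02:52:55 | Einstein@Home | [sched_op] ATI work request: 845.22 seconds; 0.00 devices"
#   "07-Sep-2012 16:25:16 [SETI@home] [sched_op] NVIDIA work request: ..."
REQUEST_RE = re.compile(
    r"(?:\|\s*(?P<pipe_project>[^|]+?)\s*\||\[(?P<bracket_project>[^\]]+)\])"
    r"\s*\[sched_op\]\s+(?P<resource>\S+) work request:\s*"
    r"(?P<seconds>[\d.]+) seconds;\s*(?P<devices>[\d.]+) devices"
)


def work_requests(log_lines):
    """Map (project, resource) -> (seconds, devices) for each [sched_op] work request.

    Later requests for the same project/resource overwrite earlier ones,
    so the result reflects the most recent scheduler request.
    """
    requests = {}
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if match:
            project = match["pipe_project"] or match["bracket_project"]
            requests[(project, match["resource"])] = (
                float(match["seconds"]),
                float(match["devices"]),
            )
    return requests
```

Fed the Einstein lines above, this gives an ATI request of 845.22 seconds and a CPU request of 0 seconds; lines without a work request are ignored.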

Tron
Joined: 16 Aug 09
Posts: 180
Credit: 2,250,468
RAC: 0
United States
Message 1288983 - Posted: 28 Sep 2012, 19:11:16 UTC
Last modified: 28 Sep 2012, 19:52:52 UTC

Those first 5 errors (code 22) are CPU work that failed instantly. They came from an attempt to use another optimized CPU application (ak-v8-ssse3) that didn't go so well.

The GPU errors are my main concern at this time.

Ageless wrote:
The GPU application is 32bit.


The application I downloaded and installed "said" 64-bit; is it?

App: Lunatics_x41g_linux64_cuda32


I did have to install some dependencies to get BOINC 7.0.33 to work initially: xscreensaver libs and opensuse (these packages provided the missing libs that BOINC was unable to find).

Random thoughts:
BOINC chose GPU "2" for all of the failed GPU work, but not all GPU work sent to GPU 2 failed; only the tasks with very short estimated run times errored.

Disabling GPU 2 with cc.config seemed to stop the errors;
however, I have 2 GPUs and I'd like to have them both working.

I had a heck of a time with syntax errors in every "provided" app.info.xml that was distributed with the optimized apps. I had to add a few closing tags and change some spacing before BOINC would read it (btw, I tried several app.info files that were supposedly copy/paste-able).

So here's a valid question: is there a more in-depth FAQ or guideline for writing an app.info.xml?
... I have the right idea of how it goes, but I'm not sure where to define the version for the 528 and 603 SAH stock CPU apps.

More specifically, what purpose does the <app> line serve?
Is it there to enclose an app of similar name that may have more than one version?
Should it be used for each app section (SAH, AP, cuda_fermi)?

Output from ldd for the x41g CUDA app:
zotacii@zotacii-desktop:/home/BOINC/projects/setiathome.berkeley.edu$ ldd setiathome_x41g_x86_64-pc-linux-gnu_cuda32
linux-vdso.so.1 => (0x00007fff80af0000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f86669d5000)
libcudart.so.3 => not found
libcufft.so.3 => not found
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f86666d4000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f86663da000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f86661c3000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f8665e06000)
/lib64/ld-linux-x86-64.so.2 (0x00007f8666c0e000)


The two files reported "not found" ARE present in the projects/sah folder. Do they need to be somewhere else?

Does this look right? (I want to add the GPU apps and the optimized CPU AstroPulse app while retaining the stock multibeam CPU apps.)


My app.info.xml:
<app_info>
<app>
<name>setiathome_enhanced</name>
</app>

<file_info>
<name>setiathome-5.28.x86_64-pc-linux-gnu</name>
<executable/>
</file_info>

<app_version>
<app_name>setiathome_enhanced</app_name>
<version_num>528</version_num>
<platform>x86_64-pc-linux-gnu</platform>
<avg_ncpus>1.000000</avg_ncpus>
<max_ncpus>1.000000</max_ncpus>
<file_ref>
<file_name>setiathome-5.28.x86_64-pc-linux-gnu</file_name>
<main_program/>
</file_ref>
</app_version>

<file_info>
<name>setiathome_6.03_i686-pc-linux-gnu</name>
<executable/>
</file_info>

<app_version>
<app_name>setiathome_enhanced</app_name>
<version_num>603</version_num>
<platform>x86_64-pc-linux-gnu</platform>
<avg_ncpus>1.000000</avg_ncpus>
<max_ncpus>1.000000</max_ncpus>
<file_ref>
<file_name>setiathome_6.03_i686-pc-linux-gnu</file_name>
<main_program/>
</file_ref>
</app_version>

<app>
<name>astropulse_v6</name>
</app>
<file_info>
<name>ap_6.01r546_sse3_linux64</name>
<executable/>
</file_info>

<app_version>
<app_name>astropulse_v6</app_name>
<version_num>601</version_num>
<file_ref>
<file_name>ap_6.01r546_sse3_linux64</file_name>
<main_program/>
</file_ref>

</app_version>

<file_info>
<name>setiathome_x41g_x86_64-pc-linux-gnu_cuda32</name>
<executable/>
</file_info>

<file_info>
<name>libcudart.so.3</name>
<executable/>
</file_info>

<file_info>
<name>libcufft.so.3</name>
<executable/>
</file_info>

<app_version>
<app_name>setiathome_enhanced</app_name>
<version_num>611</version_num>
<platform>x86_64-pc-linux-gnu</platform>
<plan_class>cuda_fermi</plan_class>
<avg_ncpus>0.05</avg_ncpus>
<max_ncpus>1.0</max_ncpus>
<coproc>
<type>CUDA</type>
<count>1.0</count>
</coproc>
<file_ref>
<file_name>setiathome_x41g_x86_64-pc-linux-gnu_cuda32</file_name>
<main_program/>
</file_ref>
<file_ref>
<file_name>libcudart.so.3</file_name>
</file_ref>
<file_ref>
<file_name>libcufft.so.3</file_name>
</file_ref>
</app_version>

<app_version>
<app_name>setiathome_enhanced</app_name>
<version_num>609</version_num>
<platform>x86_64-pc-linux-gnu</platform>
<plan_class>cuda_fermi</plan_class>
<avg_ncpus>0.05</avg_ncpus>
<max_ncpus>1.0</max_ncpus>
<coproc>
<type>CUDA</type>
<count>1.0</count>
</coproc>
<file_ref>
<file_name>setiathome_x41g_x86_64-pc-linux-gnu_cuda32</file_name>
<main_program/>
</file_ref>
<file_ref>
<file_name>libcudart.so.3</file_name>
</file_ref>
<file_ref>
<file_name>libcufft.so.3</file_name>
</file_ref>
</app_version>

<app_version>
<app_name>setiathome_enhanced</app_name>
<version_num>608</version_num>
<platform>x86_64-pc-linux-gnu</platform>
<plan_class>cuda_fermi</plan_class>
<avg_ncpus>0.05</avg_ncpus>
<max_ncpus>1.0</max_ncpus>
<coproc>
<type>CUDA</type>
<count>1.0</count>
</coproc>
<file_ref>
<file_name>setiathome_x41g_x86_64-pc-linux-gnu_cuda32</file_name>
<main_program/>
</file_ref>
<file_ref>
<file_name>libcudart.so.3</file_name>
</file_ref>
<file_ref>
<file_name>libcufft.so.3</file_name>
</file_ref>
</app_version>
</app_info>

Gundolf Jahn
Joined: 19 Sep 00
Posts: 3184
Credit: 446,358
RAC: 0
Germany
Message 1289063 - Posted: 28 Sep 2012, 22:01:08 UTC - in response to Message 1288983.  

The two files reported "not found" ARE present in the projects/sah folder. Do they need to be somewhere else?

Do those two files have the executable bit set?

Gruß,
Gundolf
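To check that for several files at once, here is a small Python sketch; exec_bits and report are hypothetical helpers, not part of BOINC. It reads the owner, group and other execute bits with os.stat. What matters in practice is whether the account BOINC runs as may execute the file; os.access(path, os.X_OK) answers that for whichever user runs the check.

```python
import os
import stat
import sys

# Hypothetical helpers. Execute-permission bits for owner, group and others.
EXEC_BITS = {"owner": stat.S_IXUSR, "group": stat.S_IXGRP, "other": stat.S_IXOTH}


def exec_bits(path):
    """Return {"owner": bool, "group": bool, "other": bool} for path's execute bits."""
    mode = os.stat(path).st_mode
    return {who: bool(mode & bit) for who, bit in EXEC_BITS.items()}


def report(paths):
    """Print one line per file, e.g. 'libcudart.so.3: owner=yes group=yes other=no'."""
    for path in paths:
        bits = exec_bits(path)
        flags = " ".join(f"{who}={'yes' if ok else 'no'}" for who, ok in bits.items())
        print(f"{path}: {flags}")


if __name__ == "__main__":
    # File names come from the command line; nothing is printed without them.
    report(sys.argv[1:])
```

Saved under a name of your choice (say check_exec.py) and run from the project folder as python3 check_exec.py libcudart.so.3 libcufft.so.3, it prints one line per library.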

Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 1289067 - Posted: 28 Sep 2012, 22:16:29 UTC - in response to Message 1288983.  
Last modified: 28 Sep 2012, 22:18:38 UTC

Sorry, I missed that those were CPU tasks. Not awake yet. :)
But exit code 1, "unspecified launch failure", can be a segfault-type problem, either in the application's code or in your video card's memory.

Ageless wrote:
The GPU application is 32bit.


The application I downloaded and installed "said" 64-bit; is it?
App: Lunatics_x41g_linux64_cuda32

Hmmm, as far as I know all GPU apps are 32-bit. I think this is because the Lunatics app requires that the CPU has SSE3, which all 64-bit CPUs and only a handful of 32-bit CPUs have. You can read the target from the actual application name, setiathome_x41g_x86_64-pc-linux-gnu_cuda32: setiathome is the project application; x41g is the Lunatics build revision of the CUDA application; x86_64 means a 64-bit CPU (as opposed to i686/x86, which is a 32-bit CPU); pc means PC (as opposed to Mac); linux-gnu means Linux with GNU ("GNU's Not Unix!"); and cuda32 means it was built for CUDA 3.2, so it requires drivers supporting CUDA 3.2 and above.

All GPU applications actually run on the CPU as well, but none of them needs to access more than 4 GB of memory, or to be able to address that much, so it's really not necessary to make a specific 64-bit application.

What about video cards with 4 GB or more of memory on them? The application doesn't need to be 64-bit for those either, as it doesn't address the memory on the video card directly. It translates the task into kernels the GPU can run and transfers them to the GPU, which computes them; the results are transferred back to the PC, translated by the CPU back into something humans can read, and written to disk, and the cycle repeats.

So here's a valid question: is there a more in-depth FAQ or guideline for writing an app.info.xml?

Well, first off, it's an app_info.xml file, just as it is a cc_config.xml file. It's an underscore (_), not a dot (.).

Check out Anonymous platform in the BOINC Wiki User Manual for more information.

More specifically, what purpose does the <app> line serve?
Is it there to enclose an app of similar name that may have more than one version?
Should it be used for each app section (SAH, AP, cuda_fermi)?

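It's there to enclose the name of the main application that you're specifying in the sections that follow. In your file, setiathome_enhanced and astropulse_v6 are the main applications, so you need one <app> block for each of them, not one per version or plan class.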
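Following that description, a rough consistency check can be scripted. The sketch below uses only the Python standard library; check_app_info is a hypothetical helper and not how the BOINC client itself validates the file. It first parses the XML, which catches unclosed tags like the ones Tron had to fix, then reports any <app_version> whose <app_name> has no matching <app> block, and any <file_ref> naming a file with no <file_info>. That second rule is an assumption based on the layout of the files in this thread.

```python
import xml.etree.ElementTree as ET


def check_app_info(xml_text):
    """Return a list of human-readable problems found in app_info.xml text.

    Hypothetical helper. Raises xml.etree.ElementTree.ParseError if the
    XML is not well-formed (e.g. a missing closing tag).
    """
    root = ET.fromstring(xml_text)
    # Main applications declared with <app><name>...</name></app>.
    apps = {app.findtext("name", "").strip() for app in root.findall("app")}
    # Files declared with <file_info><name>...</name></file_info>.
    files = {fi.findtext("name", "").strip() for fi in root.findall("file_info")}

    problems = []
    for version in root.findall("app_version"):
        app_name = version.findtext("app_name", "").strip()
        label = f"app_version {version.findtext('version_num', '?').strip()}"
        if app_name not in apps:
            problems.append(f"{label}: no <app> block declares '{app_name}'")
        for ref in version.findall("file_ref"):
            file_name = ref.findtext("file_name", "").strip()
            if file_name not in files:
                problems.append(f"{label}: no <file_info> for '{file_name}'")
    return problems
```

Running check_app_info(open("app_info.xml").read()) in the project folder returns an empty list when every app_version and file_ref has its declaration.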

libcudart.so.3 => not found
libcufft.so.3 => not found

The two files reported "not found" ARE present in the projects/sah folder. Do they need to be somewhere else?

These two files are part of the Lunatics package that you installed in that directory. They're copied to the correct slot when the task runs.

However, BOINC itself also looks for a CUDA library on startup; that's how it detects whether your GPU is CUDA capable. That library (libcuda.so) is installed by the video card drivers, so it lives elsewhere on your system. libcudart.so.3 and libcufft.so.3 are the CUDA runtime and FFT libraries the Lunatics developers compiled their application against, which is why they include them in their package for complete compatibility.
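A plain ldd run only uses the dynamic loader's usual search path, which doesn't include the project folder, so libraries sitting there show up as "not found" even though they exist. The sketch below (hypothetical Python helpers, not part of BOINC or the Lunatics package) lists what ldd could not resolve, and can rerun ldd with LD_LIBRARY_PATH pointing at the project folder to confirm the two libraries there satisfy the binary. It does not show how BOINC itself locates them at run time.

```python
import os
import subprocess


def missing_libraries(ldd_output):
    """Return the library names ldd reported as 'not found', in order."""
    missing = []
    for line in ldd_output.splitlines():
        name, arrow, target = line.strip().partition("=>")
        if arrow and target.strip() == "not found":
            missing.append(name.strip())
    return missing


def ldd_missing(binary, extra_lib_dir=None):
    """Run ldd on binary and return the libraries it could not resolve.

    If extra_lib_dir is given, it is prepended to LD_LIBRARY_PATH for this
    one ldd call only, so libraries in that folder can be resolved.
    """
    env = dict(os.environ)
    if extra_lib_dir:
        existing = env.get("LD_LIBRARY_PATH")
        env["LD_LIBRARY_PATH"] = extra_lib_dir + (":" + existing if existing else "")
    result = subprocess.run(
        ["ldd", binary], env=env, capture_output=True, text=True, check=True
    )
    return missing_libraries(result.stdout)
```

From the project folder, ldd_missing("setiathome_x41g_x86_64-pc-linux-gnu_cuda32") should list the two CUDA libraries, while ldd_missing("setiathome_x41g_x86_64-pc-linux-gnu_cuda32", ".") should come back empty if the copies there are usable.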

Your app_info.xml file I'll leave to someone else to look over.

Tron
Joined: 16 Aug 09
Posts: 180
Credit: 2,250,468
RAC: 0
United States
Message 1289093 - Posted: 29 Sep 2012, 0:14:36 UTC - in response to Message 1289063.  
Last modified: 29 Sep 2012, 0:17:56 UTC

The two files reported "not found" ARE present in the projects/sah folder. Do they need to be somewhere else?

Do those two files have the executable bit set?

Gruß,
Gundolf



They do.

And the above app_info.xml loaded flawlessly, so I assume that means it's right :P

Tron
Joined: 16 Aug 09
Posts: 180
Credit: 2,250,468
RAC: 0
United States
Message 1289107 - Posted: 29 Sep 2012, 0:47:51 UTC

OK, so up and running again with one GPU ignored... still getting more errors than successes, 60/40.

Work is suspended until a solution is found.

Recent error:

Cuda error '(cudaMemcpy(PowerSpectrumSumMax, dev_PowerSpectrumSumMax, (cudaAcc_NumDataPoints / fftlen) * sizeof(*dev_PowerSpectrumSumMax), cudaMemcpyDeviceToHost))' in file 'cuda/cudaAcc_summax.cu' in line 239 : unspecified launch failure.


What can I use to test video RAM within Linux?

Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 1289109 - Posted: 29 Sep 2012, 0:51:44 UTC - in response to Message 1289107.  

https://simtk.org/home/memtest has MemtestG80 and MemtestCL: Memory Testers for CUDA- and OpenCL-enabled GPUs for Linux and Windows.

Tron
Joined: 16 Aug 09
Posts: 180
Credit: 2,250,468
RAC: 0
United States
Message 1292124 - Posted: 6 Oct 2012, 17:50:56 UTC

Memory was fine.
All systems good; all the exit code 1 errors came from inadequate power due to a bad receptacle.

Running stable at 3.07 GHz (i7 870, 2.8); it could go higher, but it runs out of cooling capacity.
 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.