Posts by petri33 - 3(1MB+1AP) streaming


1) Message boards : Number crunching : Linux, Nvidia 750-TI and Multiple Tasks. (Message 1665968)
Posted 8 days ago by petri33 - 3(1MB+1AP) streaming (Project donor)
I have looked at the GPU usage with the Nvidia profiler and identified a few places where computation can overlap more than it does now. So I hope to make some advances still.

Autocorrelation is one place to start.

--
petri33
2) Message boards : Number crunching : Linux, Nvidia 750-TI and Multiple Tasks. (Message 1665849)
Posted 9 days ago by petri33 - 3(1MB+1AP) streaming (Project donor)
Unfortunately I don't know enough about the Linux driver model to say things with authority, but can point out the reasons it works like that on Windows so you can compare.

First main point is that the reason running multiple Cuda/OpenCL tasks on Windows is effective is that the driver architecture has considerable latency (i.e. it stalls for some period when issued a command). Because of the way the applications have had to scale from very small to very large GPUs, the current (let's call them 'traditional') applications are quite 'chatty', and make limited use of Cuda streams, which are another, finer-grained latency-hiding mechanism. So for now, on Windows at least, running multiple tasks per GPU amounts to a coarse-grained and not-super-efficient latency-hiding mechanism that is pretty hard on cache mechanisms, PCIe and drivers.

Second point is that the Linux drivers are, I believe (again, limited knowledge here), built in as kernel modules (while the Windows ones involve 'user mode drivers'). Leaving out a whole swathe of DirectX-related commands, they are probably substantially leaner and able to use some GPU features more directly (manifesting in lower latency). That's important because if you try to hide latencies that aren't there, you'll get the extra costs imposed by extra tasks (cache abuse and OS scheduling overheads) without gaining much from latency hiding (because there isn't as much there to hide).

Lastly, the 750ti itself doesn't have a huge number of multiprocessors or high bandwidth (even though it is certainly much 'larger' than low power cards in previous generations). Those considerations form a performance ceiling highly dependent on the application design, which is overdue for some major changes.

There are a lot of changes happening simultaneously, and I'll probably miss some, but here they are in bullet points:
- We (Petri33 and myself) are building and testing more use of Cuda streams and 'larger' code into some application areas, as well as reducing 'chattiness'
- Cuda 7, which came out recently, has better support for Maxwell architecture, which combined with Kepler 2 ( GK110, GTX780 etc ) changes involved big shifts in the way parallelism is done on Big K onwards, which we are still coming to grips with. This also mandates 64 bit, which has a performance penalty on the GPUs through larger addresses chewing up registers, but the hope is that improved mechanisms might bury those costs.
- Windows 10 is also moving to a lower latency driver architecture, so things will change with respect to optimal number of tasks there as well (to some extent even on earlier OS where the drivers will change even though DirectX12 won't be available on older gen)

That all amounts to a picture where in the future running fewer tasks will probably be better ( more efficient, higher overall throughput etc), but will still vary by OS, drivers, GPU generation, application(s), and your particular goals. Pretty complex, but ongoing maturation of GPGPU has meant trying to manage these changes, which hasn't been without some pretty hefty bumps in the road.


Here is a snapshot of my desktop just now when going to sleep for an hour and a half before a working day.

(I'm running one at a time per card. When testing run times of my latest compilations I shut down the boincmgr.)

[Image: desktop snapshot]
3) Message boards : Number crunching : Problem with Nvidia driver version 350.12 (Message 1665104)
Posted 11 days ago by petri33 - 3(1MB+1AP) streaming (Project donor)
bool2 was not a reserved type in previous OpenCL versions.

Edit: replace bool2 with bool_2.
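
The rename can be scripted instead of done by hand. What follows is only a sketch: it assumes the kernel source is plain text and that every whole-word `bool2` should become `bool_2`. The function name `rename_identifier` and the sample source line are made up for illustration and are not taken from the SETI kernels.

```python
import re


def rename_identifier(source: str, old: str = "bool2", new: str = "bool_2") -> str:
    """Replace whole-word occurrences of `old` with `new` in source text."""
    # \b on both sides leaves longer identifiers such as bool2_count untouched.
    pattern = re.compile(r"\b" + re.escape(old) + r"\b")
    return pattern.sub(new, source)


if __name__ == "__main__":
    # Hypothetical kernel fragment, for illustration only.
    sample = "typedef struct { int x, y; } bool2;\nbool2 flags; int bool2_count;"
    print(rename_identifier(sample))
```

Matching on word boundaries is the cautious choice here: a plain text replace would also rewrite any identifier that merely contains `bool2`.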
4) Message boards : Number crunching : Panic Mode On (97) Server Problems? (Message 1664026)
Posted 14 days ago by petri33 - 3(1MB+1AP) streaming (Project donor)
I got one :)

It completed in 122 seconds. Inconclusive.
5) Message boards : Number crunching : Lunatics_x41g_linux64_cuda32.7z (Message 1660841)
Posted 22 days ago by petri33 - 3(1MB+1AP) streaming (Project donor)
Looking good. None of the new WUs have validated yet but I am not seeing any overflows which is a good sign.


Seeing some [Cuda 6] valids there now, so Cuda 3.2 no good on Maxwell confirmed for Linux. (for this application anyway, as on Windows)


I have done some research on Linux with a handful of GTX780s (1 to max 4).

What would be the best way to deliver the code to the community, to help us all do more science before we get Jason's x42? (Jason?)

I have done some kernel optimizations (780-specific, not tested anywhere else, just on my computer) and some stream-oriented changes to the already well-optimized code by Jason and his predecessors.


What would be the most neutral way to publish? (I cannot host any piece of code for three to four years.)


I'll drop the source to your mailbox or whatever.

I'm running two MB at a time with 3 GPUs and in between an AP on GPU or CPU if available.

See for yourselves (on top hosts) .. (remember to divide the time by 2 for any MB).
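
The "divide by 2" rule as a tiny calculation: when several tasks share one GPU, each task's listed run time overlaps with the others, so the effective time per task is roughly the elapsed time divided by the number of concurrent tasks. This is a sketch only; the function name and the 600-second figure are invented and do not come from any host.

```python
def effective_seconds_per_task(elapsed_seconds: float, concurrent_tasks: int) -> float:
    """Approximate per-task cost when `concurrent_tasks` run side by side on one GPU."""
    if concurrent_tasks < 1:
        raise ValueError("concurrent_tasks must be at least 1")
    return elapsed_seconds / concurrent_tasks


if __name__ == "__main__":
    # Invented example: an MB task listed at 600 s while two ran at a time.
    print(effective_seconds_per_task(600.0, 2))
```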
6) Message boards : Number crunching : Lunatics_x41g_linux64_cuda32.7z (Message 1660398)
Posted 23 days ago by petri33 - 3(1MB+1AP) streaming (Project donor)
@dsh



I am not sure if it is relevant or not but I am still receiving

boinc: /lib64/libssl.so.1.0.0: no version information available (required by boinc)
boinc: /usr/lib64/libcurl.so.4: no version information available (required by boinc)
boinc: /lib64/libcrypto.so.1.0.0: no version information available (required by boinc)



Have you tried installing libssl, libcurl and libcrypto?

sudo apt-get install XXX (where XXX is the missing part)
or
sudo yum install XXX


google "how to install XXX on linux YYY" (where YYY is your system).

I had to install the "-dev" packages i.e. "libcurl-dev" (+ssl-dev and the same for the crypto library) instead of the regular libraries when compiling my own version.


Or, if you already have them installed, then either
a) the PATH/LIBPATH/... is somehow wrong (highly unlikely)
b) There is a version conflict between Your boinc client and the boinc.

Hard to say.
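
To see exactly which library files the warnings name, the message text can be picked apart with a small script. This is only a sketch: the regular expression is modelled on the three warnings quoted above, and the function name `flagged_libraries` is made up here. It only lists the paths; it does not decide whether a package is missing or a version is mismatched.

```python
import re
from typing import List

# Pattern modelled on: "boinc: /lib64/libssl.so.1.0.0: no version information
# available (required by boinc)". Several warnings may sit on one line.
WARNING = re.compile(
    r"(\S+): (\S+): no version information available \(required by (\S+)\)"
)


def flagged_libraries(log: str) -> List[str]:
    """Return the sorted, de-duplicated library paths named in the warnings."""
    return sorted({match.group(2) for match in WARNING.finditer(log)})


if __name__ == "__main__":
    log = (
        "boinc: /lib64/libssl.so.1.0.0: no version information available (required by boinc) "
        "boinc: /usr/lib64/libcurl.so.4: no version information available (required by boinc) "
        "boinc: /lib64/libcrypto.so.1.0.0: no version information available (required by boinc)"
    )
    for path in flagged_libraries(log):
        print(path)
```

Each printed path can then be compared with what `ldd boinc` reports on the same machine.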
7) Message boards : Number crunching : Lunatics_x41g_linux64_cuda32.7z (Message 1659869)
Posted 24 days ago by petri33 - 3(1MB+1AP) streaming (Project donor)


boinc: /lib64/libssl.so.1.0.0: no version information available (required by boinc)
boinc: /usr/lib64/libcurl.so.4: no version information available (required by boinc)
boinc: /lib64/libcrypto.so.1.0.0: no version information available (required by boinc)



The above could mean that you have to install libssl, libcurl and libcrypto. It says required by boinc, not by the seti app.

Something like
sudo apt-get install libssl
or
sudo yum install libssl

and repeat that for libcurl and libcrypto.


My linux says: for the seti executable
linux-vdso.so.1 => (0x00007fff37da0000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f59255d8000)
libcudart.so.6.5 => /home/petri/Downloads/BOINC/projects/setiathome.berkeley.edu/./libcudart.so.6.5 (0x00007f5925387000)
libcufft.so.6.5 => /home/petri/Downloads/BOINC/projects/setiathome.berkeley.edu/./libcufft.so.6.5 (0x00007f5922963000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f5922654000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f592234d000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f5922136000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f5921d72000)
/lib64/ld-linux-x86-64.so.2 (0x00007f5925818000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f5921b6d000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f5921965000)



and for the boinc (lines 2, 3 and 4)
ldd boinc
linux-vdso.so.1 => (0x00007fff4f199000)
libcurl.so.4 => /usr/lib/x86_64-linux-gnu/libcurl.so.4 (0x00007f275caf5000)
libssl.so.1.0.0 => /lib/x86_64-linux-gnu/libssl.so.1.0.0 (0x00007f275c896000)
libcrypto.so.1.0.0 => /lib/x86_64-linux-gnu/libcrypto.so.1.0.0 (0x00007f275c4b2000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f275c2ae000)
libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f275c095000)
libX11.so.6 => /usr/lib/x86_64-linux-gnu/libX11.so.6 (0x00007f275bd5b000)
libXss.so.1 => /usr/lib/x86_64-linux-gnu/libXss.so.1 (0x00007f275bb57000)
libXext.so.6 => /usr/lib/x86_64-linux-gnu/libXext.so.6 (0x00007f275b945000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f275b726000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f275b417000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f275b111000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f275aef9000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f275ab35000)
libidn.so.11 => /usr/lib/x86_64-linux-gnu/libidn.so.11 (0x00007f275a902000)
librtmp.so.1 => /usr/lib/x86_64-linux-gnu/librtmp.so.1 (0x00007f275a6e5000)
libgssapi_krb5.so.2 => /usr/lib/x86_64-linux-gnu/libgssapi_krb5.so.2 (0x00007f275a49d000)
liblber-2.4.so.2 => /usr/lib/x86_64-linux-gnu/liblber-2.4.so.2 (0x00007f275a28e000)
libldap_r-2.4.so.2 => /usr/lib/x86_64-linux-gnu/libldap_r-2.4.so.2 (0x00007f275a03b000)
/lib64/ld-linux-x86-64.so.2 (0x00007f275cd81000)
libxcb.so.1 => /usr/lib/x86_64-linux-gnu/libxcb.so.1 (0x00007f2759e1c000)
libgnutls-deb0.so.28 => /usr/lib/x86_64-linux-gnu/libgnutls-deb0.so.28 (0x00007f2759b12000)
libhogweed.so.2 => /usr/lib/x86_64-linux-gnu/libhogweed.so.2 (0x00007f27598e4000)
libnettle.so.4 => /usr/lib/x86_64-linux-gnu/libnettle.so.4 (0x00007f27596b3000)
libgmp.so.10 => /usr/lib/x86_64-linux-gnu/libgmp.so.10 (0x00007f2759432000)
libkrb5.so.3 => /usr/lib/x86_64-linux-gnu/libkrb5.so.3 (0x00007f2759163000)
libk5crypto.so.3 => /usr/lib/x86_64-linux-gnu/libk5crypto.so.3 (0x00007f2758f33000)
libcom_err.so.2 => /lib/x86_64-linux-gnu/libcom_err.so.2 (0x00007f2758d2e000)
libkrb5support.so.0 => /usr/lib/x86_64-linux-gnu/libkrb5support.so.0 (0x00007f2758b23000)
libresolv.so.2 => /lib/x86_64-linux-gnu/libresolv.so.2 (0x00007f2758908000)
libsasl2.so.2 => /usr/lib/x86_64-linux-gnu/libsasl2.so.2 (0x00007f27586ec000)
libgssapi.so.3 => /usr/lib/x86_64-linux-gnu/libgssapi.so.3 (0x00007f27584ae000)
libgcrypt.so.20 => /lib/x86_64-linux-gnu/libgcrypt.so.20 (0x00007f27581d0000)
libXau.so.6 => /usr/lib/x86_64-linux-gnu/libXau.so.6 (0x00007f2757fcb000)
libXdmcp.so.6 => /usr/lib/x86_64-linux-gnu/libXdmcp.so.6 (0x00007f2757dc5000)
libp11-kit.so.0 => /usr/lib/x86_64-linux-gnu/libp11-kit.so.0 (0x00007f2757b83000)
libtasn1.so.6 => /usr/lib/x86_64-linux-gnu/libtasn1.so.6 (0x00007f2757970000)
libkeyutils.so.1 => /lib/x86_64-linux-gnu/libkeyutils.so.1 (0x00007f275776c000)
libheimntlm.so.0 => /usr/lib/x86_64-linux-gnu/libheimntlm.so.0 (0x00007f2757562000)
libkrb5.so.26 => /usr/lib/x86_64-linux-gnu/libkrb5.so.26 (0x00007f27572da000)
libasn1.so.8 => /usr/lib/x86_64-linux-gnu/libasn1.so.8 (0x00007f2757039000)
libhcrypto.so.4 => /usr/lib/x86_64-linux-gnu/libhcrypto.so.4 (0x00007f2756e05000)
libroken.so.18 => /usr/lib/x86_64-linux-gnu/libroken.so.18 (0x00007f2756bf0000)
libgpg-error.so.0 => /lib/x86_64-linux-gnu/libgpg-error.so.0 (0x00007f27569eb000)
libffi.so.6 => /usr/lib/x86_64-linux-gnu/libffi.so.6 (0x00007f27567e2000)
libwind.so.0 => /usr/lib/x86_64-linux-gnu/libwind.so.0 (0x00007f27565b9000)
libheimbase.so.1 => /usr/lib/x86_64-linux-gnu/libheimbase.so.1 (0x00007f27563ab000)
libhx509.so.5 => /usr/lib/x86_64-linux-gnu/libhx509.so.5 (0x00007f2756160000)
libsqlite3.so.0 => /usr/lib/x86_64-linux-gnu/libsqlite3.so.0 (0x00007f2755e9b000)
libcrypt.so.1 => /lib/x86_64-linux-gnu/libcrypt.so.1 (0x00007f2755c62000)
8) Message boards : Number crunching : Show and tell your machine. Here's mine. (Message 1655029)
Posted 20 Mar 2015 by petri33 - 3(1MB+1AP) streaming (Project donor)
The computing environment changed drastically in the middle of the day.
No green men visible though.



... a partial solar eclipse!
9) Message boards : Number crunching : Intel GPU (Message 1654068)
Posted 18 Mar 2015 by petri33 - 3(1MB+1AP) streaming (Project donor)
Thanks Josef for the response.

I have 2 of these machines, identical. 1 has been running for months, without GPU use.

This one: http://setiathome.berkeley.edu/show_host_detail.php?hostid=7317240

Interesting thing is the machine I am trying GPU on is taking at least twice as long to complete CPU units as the one that is not running GPU.


It means the GPU task requires a lot of CPU attention or the memory bus is overcommitted and the CPU and GPU are both starving. I'd try freeing up a core.
10) Message boards : Number crunching : Panic Mode On (96) Server Problems? (Message 1653355)
Posted 15 Mar 2015 by petri33 - 3(1MB+1AP) streaming (Project donor)
I (my computer) got AP work. Should I panic?
11) Message boards : Number crunching : CPU Power Usage Varies Significantly With RAM Frequency (Message 1652617)
Posted 13 Mar 2015 by petri33 - 3(1MB+1AP) streaming (Project donor)
Why not change the whole thing?

Could it be that the memory tells the computer what to access/compute next?

To program a GPU efficiently one must do just that right now in one's own mind.

After all - the memory is the slowest link in the chain.

Imagine you were the one running the 100 m, with or without the hurdles -- would it be nice if the marathon runners were already giving you way? (Instead of stepping aside when politely asked or after a CPU/OS predetermined timeout)

HDD? Finally forget it!
12) Message boards : Number crunching : Panic Mode On (95) Server Problems? (Message 1647948)
Posted 28 Feb 2015 by petri33 - 3(1MB+1AP) streaming (Project donor)
Yay! Got 46 GPU tasks for my empty host... but I can't download them... hmph...



Got some downloaded. All GPU!

A new order?
13) Message boards : Number crunching : The Highest Ranked SETI AMD Host is a MAC: Time for a STOCK MAC APP? (Message 1646306)
Posted 25 Feb 2015 by petri33 - 3(1MB+1AP) streaming (Project donor)
It works!

...



Nice work!

If You have problems with high cpu usage try googling libsleep or take a look at https://bitcointalk.org/index.php?topic=180658.0

Google "nvidia sched yield" too and find the nvidia initialization code from your seti app source. Try with busy waiting/sleep/yield or whatever options Mac has.
14) Message boards : Number crunching : The Highest Ranked SETI AMD Host is a MAC: Time for a STOCK MAC APP? (Message 1645795)
Posted 24 Feb 2015 by petri33 - 3(1MB+1AP) streaming (Project donor)
Please try renaming it so that it has numbers.
15) Message boards : Number crunching : The Highest Ranked SETI AMD Host is a MAC: Time for a STOCK MAC APP? (Message 1645762)
Posted 23 Feb 2015 by petri33 - 3(1MB+1AP) streaming (Project donor)
You need to have a file MultiBeam_Kernels_r2120.cl in your seti directory.

Just copy the MultiBeam_Kernels.cl file to that file.
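
The copy step as a short script, in case it has to be repeated on several hosts. This is a sketch under assumptions: `seti_dir` stands for whatever your seti project directory is (not given here), and the helper name `copy_kernel_file` is made up. It refuses to overwrite an existing target.

```python
import shutil
from pathlib import Path


def copy_kernel_file(
    seti_dir,
    src_name: str = "MultiBeam_Kernels.cl",
    dst_name: str = "MultiBeam_Kernels_r2120.cl",
) -> Path:
    """Copy the kernel file to the revision-numbered name inside `seti_dir`."""
    seti_dir = Path(seti_dir)
    src = seti_dir / src_name
    dst = seti_dir / dst_name
    if not src.is_file():
        raise FileNotFoundError(src)
    # Leave an existing revision-numbered file alone.
    if not dst.exists():
        shutil.copyfile(src, dst)
    return dst


# Usage (the path is a placeholder for your own seti project directory):
# copy_kernel_file("/path/to/seti/directory")
```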
16) Message boards : Number crunching : GPU performance of GTX580 and GTX780Ti (Message 1644728)
Posted 21 Feb 2015 by petri33 - 3(1MB+1AP) streaming (Project donor)
Thank you for your advice.
I quickly added "app_config.xml" to run 2 tasks per GPU.

But why are 2 or 3 tasks recommended for performance?
BOINC's default setting is 1 task per GPU.



One at a time works for all (if any).

Sending work, receiving answers, sending new work... They all have a wait time (latency). Both the CPU and GPU can do something valuable when waiting for the other.
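
A back-of-the-envelope model of that waiting, as a sketch only. It assumes each task alternates between a stretch where the GPU is busy and a stretch where the GPU waits (for the CPU, transfers, the driver), and that other tasks can fill the waiting stretch perfectly. Real hosts also pay cache and scheduling costs that this ignores, and the 3 ms / 1 ms figures are invented.

```python
def gpu_utilization(gpu_busy: float, wait: float, tasks: int) -> float:
    """Idealized fraction of time the GPU is busy with `tasks` running concurrently."""
    if tasks < 1 or gpu_busy <= 0 or wait < 0:
        raise ValueError("need tasks >= 1, gpu_busy > 0 and wait >= 0")
    # One task keeps the GPU busy for gpu_busy out of every (gpu_busy + wait).
    # Extra tasks fill the gaps until the GPU is saturated at 1.0.
    return min(1.0, tasks * gpu_busy / (gpu_busy + wait))


if __name__ == "__main__":
    # Invented numbers: 3 ms of GPU work followed by 1 ms of waiting.
    for n in (1, 2, 3):
        print(n, gpu_utilization(3.0, 1.0, n))
```

In this toy model the second task fills the idle gap and a third adds nothing, which is one way to read why 2 (maybe 3) at a time is the usual advice, and why hosts with little latency to hide gain little from extra tasks.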
17) Message boards : Number crunching : Error While Computing.. (Message 1644660)
Posted 20 Feb 2015 by petri33 - 3(1MB+1AP) streaming (Project donor)
... and the stderr is:
<core_client_version>7.2.42</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code -4 (0xfffffffc)
</message>
<stderr_txt>
setiathome_v7 7.00 DevC++/MinGW/g++ 4.5.2
libboinc: 7.1.0

Work Unit Info:
...............
WU true angle range is : 2.714941
Optimal function choices:
--------------------------------------------------------
name timing error
--------------------------------------------------------
v_BaseLineSmooth (no other)
v_vGetPowerSpectrumUnrolled 0.001067 0.00000
Memory allocation failed in ChooseChirp
BH SSE folding 0.001108 0.00000
SETI@home error -4 Can't allocate memory
!binsAboveThreshold
File: ../pulsefind.cpp
Line: 89


</stderr_txt>
]]>
18) Message boards : Number crunching : GPU performance of GTX580 and GTX780Ti (Message 1644657)
Posted 20 Feb 2015 by petri33 - 3(1MB+1AP) streaming (Project donor)
Do you use app_config.xml to run 2 tasks per GPU?
... ...


I'd also advise running 2 (maybe 3) at a time.
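
For reference, a sketch that writes out the text of such an app_config.xml. The element names follow BOINC's app_config.xml layout (`app`, `name`, `gpu_versions`, `gpu_usage`, `cpu_usage`). The application name `setiathome_v7` and the `cpu_usage` value of 1.0 are assumptions, so check the name your own client reports before using the result; the helper name `build_app_config` is made up.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name: str, tasks_per_gpu: int, cpu_usage: float = 1.0) -> str:
    """Return app_config.xml text asking BOINC to run `tasks_per_gpu` tasks per GPU."""
    if tasks_per_gpu < 1:
        raise ValueError("tasks_per_gpu must be at least 1")
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    gpu = ET.SubElement(app, "gpu_versions")
    # gpu_usage is the fraction of a GPU each task claims: 0.5 means two per GPU.
    ET.SubElement(gpu, "gpu_usage").text = str(1.0 / tasks_per_gpu)
    ET.SubElement(gpu, "cpu_usage").text = str(cpu_usage)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # "setiathome_v7" is an assumed application name, used here for illustration.
    print(build_app_config("setiathome_v7", 2))
```

The file goes in the project directory, and the client has to re-read its config files (or be restarted) before the change takes effect.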
19) Message boards : Number crunching : GPU performance of GTX580 and GTX780Ti (Message 1644656)
Posted 20 Feb 2015 by petri33 - 3(1MB+1AP) streaming (Project donor)
I can't comment on the gtx580, but my pair of GTX780 are doing about 45000....

My three GTX780 are doing ~ 70000+ (on MB only for now).
20) Message boards : Number crunching : near vlar? (AR 0.01) super slow on NVIDIA gtx 780 (Message 1642245)
Posted 15 Feb 2015 by petri33 - 3(1MB+1AP) streaming (Project donor)
The vlars can be sent to my nvidia any time from now on. I introduced a bug in the code that checks the AR and assigns a char zero to null whenever ar < z.

setiathome enhanced x41zc, Cuda 6.50 special
Detected setiathome_enhanced_v7 task. Autocorrelations enabled, size 128k elements.
Work Unit Info:
...............
WU true angle range is : 0.008647
SIGSEGV: segmentation violation
Stack trace (8 frames):
../../projects/setiathome.berkeley.edu/setiathome_x41zc_x86_64-pc-linux-gnu_cuda65[0x4ae0a0]
/lib/x86_64-linux-gnu/libpthread.so.0(+0xfc90)[0x7f2cc7e6bc90]
../../projects/setiathome.berkeley.edu/setiathome_x41zc_x86_64-pc-linux-gnu_cuda65[0x41654e]
../../projects/setiathome.berkeley.edu/setiathome_x41zc_x86_64-pc-linux-gnu_cuda65[0x423275]
../../projects/setiathome.berkeley.edu/setiathome_x41zc_x86_64-pc-linux-gnu_cuda65[0x42def4]
../../projects/setiathome.berkeley.edu/setiathome_x41zc_x86_64-pc-linux-gnu_cuda65[0x406a26]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7f2cc4618ec5]
../../projects/setiathome.berkeley.edu/setiathome_x41zc_x86_64-pc-linux-gnu_cuda65[0x40a027]

Exiting...

</stderr_txt>
]]>



Copyright © 2015 University of California