All of a sudden: Errors on APs
Author | Message |
---|---|
Ulrich Metzner Send message Joined: 3 Jul 02 Posts: 1256 Credit: 13,565,513 RAC: 13 |
Hi there, all of a sudden I started getting errors on APs that I don't understand: http://setiathome.berkeley.edu/results.php?hostid=157931&offset=0&show_names=0&state=6&appid= The app is r2399, running on driver 342.50 on this computer: http://setiathome.berkeley.edu/show_host_detail.php?hostid=157931 Any help or suggestions would be highly appreciated... :? Aloha, Uli |
Mike Send message Joined: 17 Feb 01 Posts: 34258 Credit: 79,922,639 RAC: 80 |
Looks like an app crash after finish to me. It happens when your host lacks resources, i.e. is busy with something else during the boinc_finish call. Try increasing the -unroll and ffa_fetch values. You are using -use_sleep, so that's recommended. With each crime and every kindness we birth our future. |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Yes, it's the elusive "crash after finish" bug that plagues mostly NV builds. Unfortunately, it hasn't been eradicated so far and shows up from time to time in AP 7.03 NV too. I experienced it on my own host with a high number of crashes... and then suddenly all the crashes disappeared for a few weeks. Maybe a host reboot helped, maybe some Windows updates... I plan to make another attempt at this bug soon, but for now we just have to live with it. Try rebooting the host. |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
As another possible fix, try adding the -cpu_lock option and see whether any crashes still occur with it active. EDIT: you could also help with debugging by participating in beta: http://setiweb.ssl.berkeley.edu/beta/forum_thread.php?id=2186&postid=52208 |
Ulrich Metzner Send message Joined: 3 Jul 02 Posts: 1256 Credit: 13,565,513 RAC: 13 |
OK, the other APs went through fine. But I have an interesting observation for the developers of OpenCL on NVIDIA: if I run only one AP per GPU, I can only get to exactly 50% GPU load, which is why I let it run 2 per GPU. Now, with the last AP left running alone on the main GPU, I had 47-50% GPU load. And now for the WOW: the moment I started DVB-C streaming, the GPU load went up to nearly 100%. And no, it's not because of the streaming: the calculation with one WU really is crunching faster if there is some "background load" running in parallel on the same GPU. The moment I stop the streaming, the GPU load drops below 50% and the crunching is SLOWER! I'm stunned! :? :? :? Aloha, Uli |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Check the GPU shader/memory clocks. Also, it may not be a hardware effect but a software one: a change in driver priority or something like that. It could even be a CPU effect: a higher CPU (not GPU) power state allows quicker CPU response to the driver's needs. It's all connected... |
Ulrich Metzner Send message Joined: 3 Jul 02 Posts: 1256 Credit: 13,565,513 RAC: 13 |
I just tested r2667 of the AP executables and it totally fails on XP: http://setiathome.berkeley.edu/results.php?hostid=157931&offset=0&show_names=0&state=6&appid= ...back to r2058 for now. :/ Aloha, Uli |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
I hope this is fixed in the latest AP v7 builds. 7.04 will be deployed soon and we will see. |
James Sotherden Send message Joined: 16 May 99 Posts: 10436 Credit: 110,373,059 RAC: 54 |
I'm getting AP errors in droves on my i7 920. I will reboot this host and see what happens. I just recently upgraded this machine to the latest Lunatics app. Is it my machine or something else? Old James |
Mike Send message Joined: 17 Feb 01 Posts: 34258 Credit: 79,922,639 RAC: 80 |
Try r_2667 please. You can get it from my website. With each crime and every kindness we birth our future. |
James Sotherden Send message Joined: 16 May 99 Posts: 10436 Credit: 110,373,059 RAC: 54 |
Update. When I noticed the errors I had 6. When I went to reboot I had 9. After the reboot (which was a power down and then back on) I had 11 errors. Here is a copy and paste. <core_client_version>7.2.42</core_client_version> <![CDATA[ <message> (unknown error) - exit code -42 (0xffffffd6) </message> <stderr_txt> Running on device number: 0 Priority of worker thread raised successfully Priority of process adjusted successfully, below normal priority class used OpenCL platform detected: NVIDIA Corporation BOINC assigns device 0 Info: BOINC provided OpenCL device ID used Used GPU device parameters are: Number of compute units: 16 Single buffer allocation size: 256MB max WG size: 512 FERMI path used: no Build features: Non-graphics OpenCL USE_OPENCL_NV TWIN_FFA OCL_ZERO_COPY COMBINED_DECHIRP_KERNEL FFTW USE_INCREASED_PRECISION USE_SSE2 x86 CPUID: Intel(R) Core(TM) i7 CPU 920 @ 2.67GHz Cache: L1=64K L2=256K CPU features: FPU TSC PAE CMPXCHG8B APIC SYSENTER MTRR CMOV/CCMP MMX FXSAVE/FXRSTOR SSE SSE2 HT SSE3 SSSE3 SSE4.1 SSE4.2 AstroPulse v6 Windows x86 rev 2399, V6 match, by Raistmer with support of Lunatics.kwsn.net team. 
SSE2 OpenCL version by Raistmer ffa threshold mods by Joe Segur SSE3 dechirping by JDWhale Combined dechirp kernel by Frizz Number of OpenCL platforms: 1 OpenCL Platform Name: NVIDIA CUDA Number of devices: 1 Max compute units: 16 Max work group size: 512 Max clock frequency: 1836Mhz Max memory allocation: 268435456 Cache type: None Cache line size: 0 Cache size: 0 Global memory size: 1073741824 Constant buffer size: 65536 Max number of constant args: 9 Local memory type: Scratchpad Local memory size: 16384 Queue properties: Out-of-Order: Yes Name: GeForce GTS 250 Vendor: NVIDIA Corporation Driver version: 337.88 Version: OpenCL 1.0 CUDA Extensions: cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_d3d9_sharing cl_nv_d3d10_sharing cl_khr_d3d10_sharing cl_nv_d3d11_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics state.fold_buf_size_short=65536; state.fold_buf_size_long=262144 INFO: can't open binary kernel file: C:\ProgramData\BOINC/projects/setiathome.berkeley.edu\AstroPulse_Kernels_r2399.cl_GeForceGTS250.bin_V6_TWIN_FFA_33788, continue with recompile... Error : Building Program (source, clBuildProgram):main kernels: not OK code -42 ptxas : error : Entry function 'GPU_fetch_array_kernel_twin_1D_cl' uses too much shared data (0x4034 bytes, 0x4000 max) </stderr_txt> ]]> @Mike- Is what I have going on the same thing affecting others? Edit- I have suspended work until I get some advice. Errors are up to 14 now, so a reboot didn't help. So far my i7-3770s are error free. Old James |
Mike Send message Joined: 17 Feb 01 Posts: 34258 Credit: 79,922,639 RAC: 80 |
It happens on some pre-Fermi cards. It has nothing to do with crash after exit. With each crime and every kindness we birth our future. |
James Sotherden Send message Joined: 16 May 99 Posts: 10436 Credit: 110,373,059 RAC: 54 |
"It happens on some pre-Fermi cards." So I should try r_2667? And should I abort my AP work, seeing as it's erroring out anyway? Old James |
Mike Send message Joined: 17 Feb 01 Posts: 34258 Credit: 79,922,639 RAC: 80 |
"It happens on some pre-Fermi cards." Yes, it should be fixed in 2667. You don't need to abort work. Just change the name of the app in app_info.xml after copying the files. Stop BOINC first. With each crime and every kindness we birth our future. |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
"It happens on some pre-Fermi cards." Your best bet is to stop BOINC and run the old Lunatics installer to install r1843. You don't have to abort any work. |
Mike Send message Joined: 17 Feb 01 Posts: 34258 Credit: 79,922,639 RAC: 80 |
Are you a Lunatic now? With each crime and every kindness we birth our future. |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
Or you could download and run the v0.42a installer from the link in my message 1560695 |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
No, and unless James has progressed quite a bit since the last time I tried to have him do a manual install, he should stick to installers. Ask him about the time I tried to get him to install a CPU app from your site. |
James Sotherden Send message Joined: 16 May 99 Posts: 10436 Credit: 110,373,059 RAC: 54 |
"Or you could download and run the v0.42a installer from the link in my message 1560695" That is the one I downloaded. I made sure to use the app for the 200 series of NVIDIA cards. Well, my APs were self-destructing and I don't have more than 35 APs left on this host, so I will try almost anything. Downloading r_2667 and then changing the name of the app in the app_info.xml file sounds daunting, though. How many times is the app named in that file? Or do I need to scan the whole file changing the app names? I'd like to try whatever works. Old James |
Mike Send message Joined: 17 Feb 01 Posts: 34258 Credit: 79,922,639 RAC: 80 |
"Or you could download and run the v0.42a installer from the link in my message 1560695" I guess 6 times: 3 sections, 2 times each, IIRC. With each crime and every kindness we birth our future. |
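
The build failure in James Sotherden's pasted stderr comes down to a small overshoot: the kernel asks for 0x4034 bytes of shared data where the limit is 0x4000, and the same log reports the GTS 250's local memory size as 16384 bytes. The sketch below pulls the two numbers out of the ptxas line and does the arithmetic, and also shows how the hexadecimal exit code 0xffffffd6 corresponds to -42. The helper names and the regular expression are illustrative only and are not part of any BOINC or NVIDIA tool.

```python
import re

# The ptxas line as it appears in the stderr output quoted in the thread.
PTXAS_LINE = (
    "ptxas : error : Entry function 'GPU_fetch_array_kernel_twin_1D_cl' "
    "uses too much shared data (0x4034 bytes, 0x4000 max)"
)


def shared_data_overshoot(line):
    """Return (used, limit, excess) in bytes, or None if the line does not match.

    Hypothetical helper for illustration; the pattern only covers the
    message format seen in this thread.
    """
    match = re.search(r"\((0x[0-9a-fA-F]+) bytes, (0x[0-9a-fA-F]+) max\)", line)
    if match is None:
        return None
    used = int(match.group(1), 16)
    limit = int(match.group(2), 16)
    return used, limit, used - limit


def to_signed32(value):
    """Interpret an unsigned 32-bit value as a signed two's-complement integer."""
    return value - (1 << 32) if value & 0x80000000 else value


used, limit, excess = shared_data_overshoot(PTXAS_LINE)
print(f"used={used} limit={limit} excess={excess}")
print(f"exit code: {to_signed32(0xFFFFFFD6)}")
```

The kernel is over the limit by only a few dozen bytes, which fits Mike's remark that the failure is specific to some pre-Fermi cards and is unrelated to the crash-after-finish bug.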
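
Mike's advice to change the app name in app_info.xml, and his estimate of about six occurrences, can be checked before editing anything by hand. This is a minimal sketch that assumes the old and new executable names are known and that a plain text replacement is enough. The file names and the abbreviated XML fragment are made up for illustration; the real names should be copied from the files actually installed. As advised in the thread, stop BOINC before touching the file.

```python
def rename_app(xml_text, old_name, new_name):
    """Replace every occurrence of old_name with new_name.

    Returns (new_text, count) so the number of replacements can be
    compared with what is expected before anything is saved.
    """
    count = xml_text.count(old_name)
    return xml_text.replace(old_name, new_name), count


# Hypothetical, heavily shortened fragment. A real app_info.xml has more
# sections, and its executable names will differ from these placeholders.
SAMPLE = """<app_info>
  <file_info>
    <name>AP_r2399_example.exe</name>
    <executable/>
  </file_info>
  <app_version>
    <app_name>astropulse_v6</app_name>
    <file_ref>
      <file_name>AP_r2399_example.exe</file_name>
      <main_program/>
    </file_ref>
  </app_version>
</app_info>
"""

updated, replaced = rename_app(
    SAMPLE, "AP_r2399_example.exe", "AP_r2667_example.exe"
)
print(f"replaced {replaced} occurrence(s)")
```

Applied to a real file, one would read its text, keep a backup copy, and write the result back only if the count matches what is expected (around six, if Mike's recollection holds).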
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.