Public beta for nVidia AstroPulse, rev 521

Author	Message
Raistmer Volunteer developer Volunteer tester Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121	Message 1137408 - Posted: 7 Aug 2011, 20:49:31 UTC - in response to Message 1137405. and also because of the excessive CPU usage. Claggy Let's be more precise. Could you point to reports about high CPU usage on 280.xx drivers? Maybe I missed them? ID: 1137408 ·

Raistmer Volunteer developer Volunteer tester Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121	Message 1137415 - Posted: 7 Aug 2011, 20:53:54 UTC - in response to Message 1137408. And don't forget it's a test! A good tester should try one config, record all possible issues, then try another one. We want to get the full picture here, not just find a good config for the tester himself and stay in the dark about all other possible configs. From this point of view I would recommend that all who experience high CPU usage on low-blanked tasks test this behavior with 280.xx drivers, report it, and, if high CPU usage remains, suspend testing until the next binary/driver update or revert to a proven driver version like 267.xx. ID: 1137415 ·

Claggy Volunteer tester Send message Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4	Message 1137420 - Posted: 7 Aug 2011, 21:12:07 UTC - in response to Message 1137408. Last modified: 7 Aug 2011, 21:23:17 UTC and also because of the excessive CPU usage. Claggy Let's be more precise. Could you point to reports about high CPU usage on 280.xx drivers? Maybe I missed them? I've been doing AP and v7 benches all day with 280.19 and Cat 11.7/11.8, both the NV_r521 apps and the ATI_r521 apps use lots of CPU, as does the v7 ATI__r331 app, bench results in their respective threads. I also did a couple of AP tasks with 280.19 and Cat 11.7, the NV_r521 app even got stalled by a CPU Einstein task starting, i had to suspend the Einstein task to get the GPU usage to go back up, Here are the two NV_AP tasks that took a long time and had heavy CPU usage (normal CPU usage is around 600 secs for 0% Blanked tasks): resultid=2014222887 resultid=2014199218 and here are two ATI_AP tasks that had excessive CPU usage with Cat 11.7: resultid=2000626885 resultid=2000626711 I've since then gone back down to 267.24 and Cat 11.5/SDK2.5 Claggy ID: 1137420 ·

Raistmer Volunteer developer Volunteer tester Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121	Message 1137422 - Posted: 7 Aug 2011, 21:23:41 UTC - in response to Message 1137420. I see, thanks. Interesting that the zero-blanked task has lower CPU time than the low-blanked one... and elapsed time increased twofold... It's definitely not normal behavior. I await further reports, preferably from NV-GPU-only hosts and from different NV GPU generations, non-Fermi too. ID: 1137422 ·

Claggy Volunteer tester Send message Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4	Message 1137425 - Posted: 7 Aug 2011, 21:33:30 UTC - in response to Message 1137422. I see, thanks. Interesting that the zero-blanked task has lower CPU time than the low-blanked one... and elapsed time increased twofold... It's definitely not normal behavior. I await further reports, preferably from NV-GPU-only hosts and from different NV GPU generations, non-Fermi too. The first NV task I ran with 280.19 had about 5% GPU usage. I suspended it after an hour and a half at 50%, let the GTX460 do a couple of shorties, then unsuspended the AP task, and it then showed almost normal GPU usage. Claggy ID: 1137425 ·

Paul D Harris Volunteer tester Send message Joined: 1 Dec 99 Posts: 1122 Credit: 33,600,005 RAC: 0	Message 1137432 - Posted: 7 Aug 2011, 22:32:04 UTC - in response to Message 1137383. Added this to my cmdline: -no_cpu_lock, with a space before the -no_cpu_lock. From Raistmer's post: "-no_cpu_lock - disables affinity setting". <cmdline>-ffa_block 6144 -ffa_block_fetch 1536 -unroll 8 -instances_per_device 2 -hp -no_cpu_lock</cmdline> ID: 1137432 ·
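The point about the leading space can be illustrated with a minimal Python sketch. The helper name `add_flag` is hypothetical and not part of BOINC or the AstroPulse app; it only shows that flags in `<cmdline>` are whitespace-separated tokens, so a new flag must be set off by a space and need not be added twice.

```python
def add_flag(cmdline: str, flag: str) -> str:
    """Append `flag` to a space-separated cmdline unless it is already present.

    Hypothetical helper for illustration only; the flag names themselves
    are taken from the posts in this thread.
    """
    tokens = cmdline.split()
    if flag not in tokens:
        tokens.append(flag)
    # Joining with a single space guarantees the separator before the new flag.
    return " ".join(tokens)


base = "-ffa_block 6144 -ffa_block_fetch 1536 -unroll 8 -instances_per_device 2 -hp"
print(f"<cmdline>{add_flag(base, '-no_cpu_lock')}</cmdline>")
```

Calling `add_flag` again on its own result leaves the string unchanged, so the sketch is safe to re-run against an already edited cmdline.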

Cliff Harding Volunteer tester Send message Joined: 18 Aug 99 Posts: 1432 Credit: 110,967,840 RAC: 67	Message 1137449 - Posted: 8 Aug 2011, 0:46:43 UTC Current System Info: i7/950 64-bit, EVGA GTX460SE, EVGA GTS250, 6 GB RAM, nVIDIA 266.58, BOINC 6.12.33, beta nVIDIA Astropulse r521, BOINC Rescheduler 2.6 History: BOINC 6.13.1, nVIDIA 270.61, beta nVIDIA Astropulse r521, BOINC Rescheduler 2.6 Under the conditions listed under History, I ran 4 AP tasks which ended in -202 errors, with numerous restarts. The AP tasks were rescheduled manually via the rescheduler. http://setiathome.berkeley.edu/results.php?hostid=5501972&offset=0&show_names=0&state=5&appid=5 After further reading of this thread, I decided to revert both the nVIDIA driver and the BOINC application to the versions listed under Current System Info. Questions: 1) The AP tasks currently being d/l (1) or waiting to start (5) appear to be the normal AP tasks that I was getting prior to testing, even though none have started at this point. Shouldn't there be an application description change to note that these tasks are for CUDA processing, as in the change from MB6.03 to 6.10 (fermi)? 2) Do the AP tasks automatically run as CUDA tasks or must they be rescheduled as such? Have this old man's eyes missed something that touched on either of these questions? If so, I apologize. I don't buy computers, I build them!! ID: 1137449 ·

perryjay Volunteer tester Send message Joined: 20 Aug 02 Posts: 3377 Credit: 20,676,751 RAC: 0	Message 1137456 - Posted: 8 Aug 2011, 1:20:03 UTC - in response to Message 1137449. Sorry Cliff but you are right, all the AP work you show are for the CPU. Any that you get for the GPU will be marked as astropulse_v505 5.05 (cuda) in your BM. Also, on your tasks page it will show up as Astropulse v505 Anonymous platform (NVIDIA GPU) PROUD MEMBER OF Team Starfire World BOINC ID: 1137456 ·

Mike Volunteer tester Send message Joined: 17 Feb 01 Posts: 34258 Credit: 79,922,639 RAC: 80	Message 1137539 - Posted: 8 Aug 2011, 6:22:58 UTC Correction. If the units are rescheduled they still show as CPU work. You only see in stderr.txt that they were finished by the GPU. Look at my host, I have hundreds of MB tasks marked as ATI that were finished by the CPU. With each crime and every kindness we birth our future. ID: 1137539 ·

Geek@Play Volunteer tester Send message Joined: 31 Jul 01 Posts: 2467 Credit: 86,146,931 RAC: 0	Message 1137549 - Posted: 8 Aug 2011, 7:10:12 UTC Raistmer.............. On both of my boxes containing a GTX 580 I found very high CPU use while running the AP OpenCL app, using NVIDIA driver versions 280.19 and 275.33. Installed driver version 266.58 and CPU usage went back to normal. I was also seeing driver reset after Boinc stopped. Reduced unroll to 8 and solved this problem. Currently using this cmdline and count. <cmdline>-ffa_block 6144 -ffa_block_fetch 1536 -unroll 8 -instances_per_device 4</cmdline> <count>0.25</count> Boinc....Boinc....Boinc....Boinc.... ID: 1137549 ·
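The pairing of `-instances_per_device 4` with `<count>0.25</count>` above can be sketched in Python. This is a rough illustration under an assumption that the thread itself does not confirm: that `<count>` is meant to be the reciprocal of the number of instances per device, so that BOINC schedules that many tasks on one GPU. The function names `gpu_count` and `render_fragment` are hypothetical, and the output is only the two elements quoted in the post, not a complete app_info.xml.

```python
def gpu_count(instances_per_device: int) -> float:
    """Fraction of one GPU per task, ASSUMING count = 1 / instances."""
    if instances_per_device < 1:
        raise ValueError("instances_per_device must be at least 1")
    return 1.0 / instances_per_device


def render_fragment(flags: str, instances_per_device: int) -> str:
    """Build the <cmdline>/<count> pair so the two values cannot drift apart."""
    cmdline = f"{flags} -instances_per_device {instances_per_device}"
    count = gpu_count(instances_per_device)
    return f"<cmdline>{cmdline}</cmdline>\n<count>{count:g}</count>"


# Flag values copied from the post above.
print(render_fragment("-ffa_block 6144 -ffa_block_fetch 1536 -unroll 8", 4))
```

Deriving both values from one number is just a way to avoid editing the cmdline and forgetting the matching count, or the reverse.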

Raistmer Volunteer developer Volunteer tester Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121	Message 1137560 - Posted: 8 Aug 2011, 8:37:05 UTC - in response to Message 1137549. I was also seeing driver reset after Boinc stopped. Reduced unroll to 8 and solved this problem. Currently using this cmdline and count. <cmdline>-ffa_block 6144 -ffa_block_fetch 1536 -unroll 8 -instances_per_device 4</cmdline> <count>0.25</count> Thanks, that's new info. It means the driver reset at BOINC stop is not necessarily connected with any BOINC API flaws. ID: 1137560 ·

Richard Haselgrove Volunteer tester Send message Joined: 4 Jul 99 Posts: 14653 Credit: 200,643,578 RAC: 874	Message 1137567 - Posted: 8 Aug 2011, 9:43:22 UTC - in response to Message 1137560. I was also seeing driver reset after Boinc stopped. Reduced unroll to 8 and solved this problem. Currently using this cmdline and count. <cmdline>-ffa_block 6144 -ffa_block_fetch 1536 -unroll 8 -instances_per_device 4</cmdline> <count>0.25</count> Thanks, that's new info. It means the driver reset at BOINC stop is not necessarily connected with any BOINC API flaws. Well, there are a lot of reasons for driver resets - including overheating, undervolting, bad VRAM - and probably more, even before we get on to driver bugs and application quirks. But we do need to get rid of all the known application issues, so we can clearly see whatever else is lying underneath, and move on to dealing with that. ID: 1137567 ·

Geek@Play Volunteer tester Send message Joined: 31 Jul 01 Posts: 2467 Credit: 86,146,931 RAC: 0	Message 1137578 - Posted: 8 Aug 2011, 11:37:24 UTC - in response to Message 1137560. I was also seeing driver reset after Boinc stopped. Reduced unroll to 8 and solved this problem. Currently using this cmdline and count. <cmdline>-ffa_block 6144 -ffa_block_fetch 1536 -unroll 8 -instances_per_device 4</cmdline> <count>0.25</count> Thanks, that's new info. It means the driver reset at BOINC stop is not necessarily connected with any BOINC API flaws. My unscientific observation "felt like" it was due to the AP OpenCL app since it only happened when that app had been running. Boinc....Boinc....Boinc....Boinc.... ID: 1137578 ·

Slavac Volunteer tester Send message Joined: 27 Apr 11 Posts: 1932 Credit: 17,952,639 RAC: 0	Message 1137654 - Posted: 8 Aug 2011, 17:32:18 UTC - in response to Message 1137578. This is new: 8/8/2011 12:32:02 PM | SETI@home | Task ap_22ap11ac_B5_P1_00172_20110804_31703.wu_1 exited with zero status but no 'finished' file 8/8/2011 12:32:02 PM | SETI@home | If this happens repeatedly you may need to reset the project. Executive Director GPU Users Group Inc. - brad@gpuug.org ID: 1137654 ·

Gundolf Jahn Send message Joined: 19 Sep 00 Posts: 3184 Credit: 446,358 RAC: 0	Message 1137677 - Posted: 8 Aug 2011, 19:00:17 UTC - in response to Message 1137654. Last modified: 8 Aug 2011, 19:01:47 UTC That is not an error message. And furthermore, it's not application specific but has to do with the heartbeat checks between every application and the BOINC client. Regards, Gundolf [edit]And never reset the project because of that recommendation. It's absolute nonsense![/edit] ID: 1137677 ·

Matthias Lehmkuhl Volunteer tester Send message Joined: 5 Oct 99 Posts: 28 Credit: 10,832,348 RAC: 53	Message 1137685 - Posted: 8 Aug 2011, 19:57:10 UTC Last modified: 8 Aug 2011, 20:02:39 UTC Could finish two more results. http://setiathome.berkeley.edu/result.php?resultid=2029837505 runtime 4,042.37 CPU time 558.98 single pulses: 4 repetitive pulses: 0 percent blanked: 0.00 waiting for validation to Astropulse v505 v5.05 finished 22 Jun 2011 | 13:01:24 UTC Will hopefully validate when ap_validate will run again. edit: sorry this result runs on CPU not GPU http://setiathome.berkeley.edu/result.php?resultid=2031637483 runtime 12.46 CPU time 10.44 In ap_remove_radar.cpp: get_indices_to_randomize: num_ffts_forecast < 100. Blanking too much RFI? waiting for validation to Astropulse v505 v5.05 finished 8 Aug 2011 | 11:22:27 UTC Will hopefully validate when ap_validate will run again. Matthias ID: 1137685 ·

CryptokiD Send message Joined: 2 Dec 00 Posts: 150 Credit: 3,216,632 RAC: 0	Message 1137749 - Posted: 8 Aug 2011, 21:46:47 UTC - in response to Message 1137109. half the work units were invalid, running ap made my desktop laggy, when after endless tweaking of the app info. I did fast look through your results and found 8k and 16k values for FFA block params. Did you try lower values too? Could you list FFA params values you tried already? Also, could you remember non-overflowed results (not 30 pulses in both categories) that give invalid/inconclusive? Or all of them were overflows ? i did try some lower ffa values, but it didnt seem to matter, the desktop was always laggy when running astropulse 521. i don't remember the exact params i tried but i did go pretty low to try and iron things out. most of my errors were from the -30 overflows. since i stopped running astropulse i have been able to add 400mhz to my cuda card's memory speed and not get errors. and 15mhz to the core. it seems to me that running astropulse really taxes the heck out of a cuda card. you cant have the slightest instability or it will error out. with multibeam you can overclock to the point where you are unstable and still get 98% of the work units validated. by the way i did try running the card at default values for core shader and memory speed, just to see if that would help but i still got errors. it just didnt work for me. ID: 1137749 ·

Claggy Volunteer tester Send message Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4	Message 1137782 - Posted: 8 Aug 2011, 22:32:34 UTC - in response to Message 1137749. Last modified: 8 Aug 2011, 22:35:11 UTC half the work units were invalid, running ap made my desktop laggy, when after endless tweaking of the app info. I did fast look through your results and found 8k and 16k values for FFA block params. Did you try lower values too? Could you list FFA params values you tried already? Also, could you remember non-overflowed results (not 30 pulses in both categories) that give invalid/inconclusive? Or all of them were overflows ? i did try some lower ffa values, but it didnt seem to matter, the desktop was always laggy when running astropulse 521. i don't remember the exact params i tried but i did go pretty low to try and iron things out. most of my errors were from the -30 overflows. since i stopped running astropulse i have been able to add 400mhz to my cuda card's memory speed and not get errors. and 15mhz to the core. it seems to me that running astropulse really taxes the heck out of a cuda card. you cant have the slightest instability or it will error out. with multibeam you can overclock to the point where you are unstable and still get 98% of the work units validated. by the way i did try running the card at default values for core shader and memory speed, just to see if that would help but i still got errors. it just didnt work for me. You're running the ap_5.06_win_x86_SSE3_OpenCL_NV_r521.exe app on an AMD Athlon 64 Processor 3200+. Correct me if I'm wrong, but I'm sure it only has SSE2, not the SSE3 the app requires, which might be why almost every task is inconclusive. Claggy ID: 1137782 ·

CryptokiD Send message Joined: 2 Dec 00 Posts: 150 Credit: 3,216,632 RAC: 0	Message 1137902 - Posted: 9 Aug 2011, 4:49:39 UTC - in response to Message 1137782. according to cpu-z my cpu has mmx, 3dnow!, sse 1, 2 and 3, and is x86-64 capable. ID: 1137902 ·

Mike Volunteer tester Send message Joined: 17 Feb 01 Posts: 34258 Credit: 79,922,639 RAC: 80	Message 1137959 - Posted: 9 Aug 2011, 7:05:35 UTC You have to wait until the validators are running again to see if your units get validated. I had 35 in a row last week, all validated except 1. For screen lag you should try a lower unroll factor. Your GPU only has 4 compute units, so I would try unroll 4 or 3 to see if it gets better. From what I could see in your results, you only changed FFA values. Even with high params (16384) you have 2 results with 0/0 signals found. With each crime and every kindness we birth our future. ID: 1137959 ·

©2024 University of California

SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.