Public beta for nVidia AstroPulse, rev 521

Message boards : Number crunching : Public beta for nVidia AstroPulse, rev 521

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1137408 - Posted: 7 Aug 2011, 20:49:31 UTC - in response to Message 1137405.  

and also because of the excessive CPU usage.

Claggy


Let's be more precise. Could you point to reports about high CPU usage on 280.xx drivers? Maybe I missed them?

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1137415 - Posted: 7 Aug 2011, 20:53:54 UTC - in response to Message 1137408.  

And don't forget, it's a test! A good tester should try one config, record all possible issues, then try another one. We want to get the full picture here, not just find a good config for the tester himself and stay in the dark about all other possible configs.

From this point of view I would recommend that everyone who experiences high CPU usage on low-blanked tasks test this behavior with 280.xx drivers, report it, and, if the high CPU usage remains, suspend testing until the next binary/driver update or revert to a proven driver version like 267.xx.

Claggy
Volunteer tester
Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1137420 - Posted: 7 Aug 2011, 21:12:07 UTC - in response to Message 1137408.  
Last modified: 7 Aug 2011, 21:23:17 UTC

and also because of the excessive CPU usage.

Claggy


Let's be more precise. Could you point to reports about high CPU usage on 280.xx drivers? Maybe I missed them?

I've been doing AP and v7 benches all day with 280.19 and Cat 11.7/11.8. Both the NV_r521 apps and the ATI_r521 apps use lots of CPU, as does the v7 ATI__r331 app; bench results are in their respective threads.
I also did a couple of AP tasks with 280.19 and Cat 11.7. The NV_r521 app even got stalled by a CPU Einstein task starting; I had to suspend the Einstein task to get the GPU usage to go back up.

Here are the two NV_AP tasks that took a long time and had heavy CPU usage (normal CPU usage is around 600 secs for 0% Blanked tasks):

resultid=2014222887

resultid=2014199218

and here are two ATI_AP tasks that had excessive CPU usage with Cat 11.7:

resultid=2000626885

resultid=2000626711

I've since gone back down to 267.24 and Cat 11.5/SDK 2.5.

Claggy

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1137422 - Posted: 7 Aug 2011, 21:23:41 UTC - in response to Message 1137420.  

I see, thanks.
Interesting that the zero-blanked task has lower CPU time than the low-blanked one... elapsed time doubled... It's definitely not normal behavior.

I await further reports, preferably from hosts with NV GPUs only and from different NV GPU generations, non-Fermi too.

Claggy
Volunteer tester
Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1137425 - Posted: 7 Aug 2011, 21:33:30 UTC - in response to Message 1137422.  

I see, thanks.
Interesting that the zero-blanked task has lower CPU time than the low-blanked one... elapsed time doubled... It's definitely not normal behavior.

I await further reports, preferably from hosts with NV GPUs only and from different NV GPU generations, non-Fermi too.

The first NV task I ran with 280.19 had about 5% GPU usage. I suspended it after an hour and a half at 50%, let the GTX460 do a couple of shorties, then unsuspended the AP task, after which it showed almost normal GPU usage.

Claggy

Paul D Harris
Volunteer tester
Joined: 1 Dec 99
Posts: 1122
Credit: 33,600,005
RAC: 0
United States
Message 1137432 - Posted: 7 Aug 2011, 22:32:04 UTC - in response to Message 1137383.  

I added -no_cpu_lock to my cmdline, with a space before the -no_cpu_lock, following Raistmer's post:

-no_cpu_lock - disables affinity setting


<cmdline>-ffa_block 6144 -ffa_block_fetch 1536 -unroll 8 -instances_per_device 2 -hp -no_cpu_lock</cmdline>
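
For anyone checking their own cmdline, the line above can be split into its individual options to confirm that every flag is separated by a space. This is only a minimal sketch in Python: the function name `parse_cmdline` is made up for illustration, and it assumes every option starts with `-` and takes at most one value. That holds for the flags quoted in this thread but is not a documented rule of the app.

```python
import shlex


def parse_cmdline(cmdline: str) -> dict:
    """Split a cmdline string into {flag: value}; flags without a value map to True.

    Assumption (not from the app's documentation): a token that follows a flag
    and does not itself start with '-' is that flag's value.
    """
    tokens = shlex.split(cmdline)
    options = {}
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if not token.startswith("-"):
            raise ValueError(f"unexpected token {token!r}")
        if i + 1 < len(tokens) and not tokens[i + 1].startswith("-"):
            options[token] = tokens[i + 1]
            i += 2
        else:
            options[token] = True
            i += 1
    return options


opts = parse_cmdline(
    "-ffa_block 6144 -ffa_block_fetch 1536 -unroll 8 "
    "-instances_per_device 2 -hp -no_cpu_lock"
)
print(opts)
```

If the space is left out, `-hp-no_cpu_lock` comes back as a single unknown option rather than two separate flags, which is the mistake the space guards against.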


Cliff Harding
Volunteer tester
Joined: 18 Aug 99
Posts: 1432
Credit: 110,967,840
RAC: 67
United States
Message 1137449 - Posted: 8 Aug 2011, 0:46:43 UTC

Current System Info:
i7/950 64-bit, EVGA GTX460SE, EVGA GTS250, 6 GB RAM, nVIDIA 266.58, BOINC 6.12.33, beta nVIDIA Astropulse r521, BOINC Rescheduler 2.6
History:
BOINC 6.13.1, nVIDIA 270.61, beta nVIDIA Astropulse r521, BOINC Rescheduler 2.6

Under the conditions listed under History, I ran 4 AP tasks, which ended in -202 errors with numerous restarts. The AP tasks were rescheduled manually via the rescheduler.
http://setiathome.berkeley.edu/results.php?hostid=5501972&offset=0&show_names=0&state=5&appid=5

After reading further in this thread, I decided to revert both the nVIDIA driver and the BOINC application to the versions listed under Current System Info.

Questions:
1) The AP tasks currently being downloaded (1) or waiting to start (5) appear to be the normal AP tasks that I was getting prior to testing, though none has started at this point. Shouldn't there be an application description change to note that these tasks are for CUDA processing, as in the change from MB 6.03 to 6.10 (fermi)?

2) Do the AP tasks automatically run as CUDA tasks, or must they be rescheduled as such?

Have this old man's eyes missed something that touched on either of these questions? If so, I apologize.


I don't buy computers, I build them!!

perryjay
Volunteer tester
Joined: 20 Aug 02
Posts: 3377
Credit: 20,676,751
RAC: 0
United States
Message 1137456 - Posted: 8 Aug 2011, 1:20:03 UTC - in response to Message 1137449.  

Sorry, Cliff, but you are right: all the AP work you show is for the CPU. Any that you get for the GPU will be marked as astropulse_v505 5.05 (cuda) in your BM. Also, on your tasks page it will show up as Astropulse v505
Anonymous platform (NVIDIA GPU)


PROUD MEMBER OF Team Starfire World BOINC

Mike (Special Project $75 donor)
Volunteer tester
Joined: 17 Feb 01
Posts: 34258
Credit: 79,922,639
RAC: 80
Germany
Message 1137539 - Posted: 8 Aug 2011, 6:22:58 UTC

Correction.

If the units are rescheduled, they still show as CPU work.
You can only see in stderr.txt that they were finished by the GPU.
Look at my host: I have hundreds of MB tasks marked as ATI that were finished by the CPU.



With each crime and every kindness we birth our future.

Geek@Play
Volunteer tester
Joined: 31 Jul 01
Posts: 2467
Credit: 86,146,931
RAC: 0
United States
Message 1137549 - Posted: 8 Aug 2011, 7:10:12 UTC

Raistmer..............

On both of my boxes containing a GTX 580 I found very high CPU use while running the AP OpenCL app, using NVIDIA driver versions 280.19 and 275.33. I installed driver version 266.58 and CPU usage went back to normal. I was also seeing driver reset after Boinc stopped. Reduced unroll to 8 and solved this problem. Currently using this cmdline and count.

<cmdline>-ffa_block 6144 -ffa_block_fetch 1536 -unroll 8 -instances_per_device 4</cmdline>
<count>0.25</count>
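
In the settings above, `-instances_per_device 4` is paired with `<count>0.25</count>`: each task claims a quarter of a GPU, so four can run at once. A small Python sketch of that relationship follows. The helper names are hypothetical, and the sketch assumes the two values are meant to agree, as they do in this post.

```python
def expected_count(instances_per_device: int) -> float:
    """GPU fraction per task if the device is shared evenly between instances."""
    if instances_per_device < 1:
        raise ValueError("instances_per_device must be at least 1")
    return 1.0 / instances_per_device


def settings_agree(instances_per_device: int, count: float, tol: float = 1e-6) -> bool:
    """True when <count> matches the even share implied by -instances_per_device."""
    return abs(expected_count(instances_per_device) - count) <= tol


# Values taken from the post above: 4 instances, count 0.25.
print(settings_agree(4, 0.25))
# A mismatch, e.g. 2 instances with count 0.25, is flagged as inconsistent.
print(settings_agree(2, 0.25))
```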

Boinc....Boinc....Boinc....Boinc....

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1137560 - Posted: 8 Aug 2011, 8:37:05 UTC - in response to Message 1137549.  

I was also seeing driver reset after Boinc stopped. Reduced unroll to 8 and solved this problem. Currently using this cmdline and count.

<cmdline>-ffa_block 6144 -ffa_block_fetch 1536 -unroll 8 -instances_per_device 4</cmdline>
<count>0.25</count>


Thanks, that's new info. It means the driver reset at BOINC stop is not necessarily connected with any BOINC API flaws.

Richard Haselgrove (Project Donor)
Volunteer tester
Joined: 4 Jul 99
Posts: 14653
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1137567 - Posted: 8 Aug 2011, 9:43:22 UTC - in response to Message 1137560.  

I was also seeing driver reset after Boinc stopped. Reduced unroll to 8 and solved this problem. Currently using this cmdline and count.

<cmdline>-ffa_block 6144 -ffa_block_fetch 1536 -unroll 8 -instances_per_device 4</cmdline>
<count>0.25</count>

Thanks, that's new info. It means the driver reset at BOINC stop is not necessarily connected with any BOINC API flaws.

Well, there are a lot of reasons for driver resets - including overheating, undervolting, bad VRAM - and probably more, even before we get on to driver bugs and application quirks.

But we do need to get rid of all the known application issues, so we can clearly see whatever else is lying underneath, and move on to dealing with that.

Geek@Play
Volunteer tester
Joined: 31 Jul 01
Posts: 2467
Credit: 86,146,931
RAC: 0
United States
Message 1137578 - Posted: 8 Aug 2011, 11:37:24 UTC - in response to Message 1137560.  

I was also seeing driver reset after Boinc stopped. Reduced unroll to 8 and solved this problem. Currently using this cmdline and count.

<cmdline>-ffa_block 6144 -ffa_block_fetch 1536 -unroll 8 -instances_per_device 4</cmdline>
<count>0.25</count>


Thanks, that's new info. It means the driver reset at BOINC stop is not necessarily connected with any BOINC API flaws.


My unscientific observation is that it "felt like" it was due to the AP OpenCL app, since it only happened when that app had been running.
Boinc....Boinc....Boinc....Boinc....

Slavac
Volunteer tester
Joined: 27 Apr 11
Posts: 1932
Credit: 17,952,639
RAC: 0
United States
Message 1137654 - Posted: 8 Aug 2011, 17:32:18 UTC - in response to Message 1137578.  

This is new:

8/8/2011 12:32:02 PM | SETI@home | Task ap_22ap11ac_B5_P1_00172_20110804_31703.wu_1 exited with zero status but no 'finished' file
8/8/2011 12:32:02 PM | SETI@home | If this happens repeatedly you may need to reset the project.



Executive Director GPU Users Group Inc. -
brad@gpuug.org

Gundolf Jahn
Joined: 19 Sep 00
Posts: 3184
Credit: 446,358
RAC: 0
Germany
Message 1137677 - Posted: 8 Aug 2011, 19:00:17 UTC - in response to Message 1137654.  
Last modified: 8 Aug 2011, 19:01:47 UTC

That is not an error message.

And furthermore, it's not application specific but has to do with the heartbeat checks between every application and the BOINC client.
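
To see whether the message really does repeat for the same task, the event-log lines can be counted per task. The following is a rough Python sketch, not a BOINC tool: the function name is made up, and it assumes log lines have the `timestamp | project | message` layout shown in the post above.

```python
import re
from collections import Counter

# Matches the message text quoted in the post above and captures the task name.
NO_FINISHED = re.compile(r"Task (\S+) exited with zero status but no 'finished' file")


def count_no_finished(log_lines):
    """Count 'no finished file' messages per task name."""
    counts = Counter()
    for line in log_lines:
        parts = line.split(" | ", 2)  # timestamp, project, message
        if len(parts) != 3:
            continue  # not an event-log line in the assumed layout
        match = NO_FINISHED.search(parts[2])
        if match:
            counts[match.group(1)] += 1
    return counts


sample = [
    "8/8/2011 12:32:02 PM | SETI@home | Task ap_22ap11ac_B5_P1_00172_20110804_31703.wu_1 "
    "exited with zero status but no 'finished' file",
    "8/8/2011 12:32:02 PM | SETI@home | If this happens repeatedly you may need to reset the project.",
]
counts = count_no_finished(sample)
print(counts)
```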

Regards,
Gundolf
[edit]And never reset the project because of that recommendation. It's absolute nonsense![/edit]

Matthias Lehmkuhl (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 5 Oct 99
Posts: 28
Credit: 10,832,348
RAC: 53
Germany
Message 1137685 - Posted: 8 Aug 2011, 19:57:10 UTC
Last modified: 8 Aug 2011, 20:02:39 UTC

I was able to finish two more results.
http://setiathome.berkeley.edu/result.php?resultid=2029837505
runtime 4,042.37
CPU time 558.98
single pulses: 4
repetitive pulses: 0
percent blanked: 0.00
waiting for validation to Astropulse v505 v5.05 finished 22 Jun 2011 | 13:01:24 UTC
Will hopefully validate when ap_validate runs again.

Edit: sorry, this result ran on the CPU, not the GPU.
http://setiathome.berkeley.edu/result.php?resultid=2031637483
runtime 12.46
CPU time 10.44
In ap_remove_radar.cpp: get_indices_to_randomize: num_ffts_forecast < 100. Blanking too much RFI?
waiting for validation to Astropulse v505 v5.05 finished 8 Aug 2011 | 11:22:27 UTC
Will hopefully validate when ap_validate runs again.
Matthias


CryptokiD
Joined: 2 Dec 00
Posts: 150
Credit: 3,216,632
RAC: 0
United States
Message 1137749 - Posted: 8 Aug 2011, 21:46:47 UTC - in response to Message 1137109.  

Half the work units were invalid, and running AP made my desktop laggy, even after endless tweaking of the app info.

I took a quick look through your results and found 8k and 16k values for the FFA block params. Did you try lower values too? Could you list the FFA param values you have already tried?

Also, do you remember any non-overflowed results (not 30 pulses in both categories) that came back invalid/inconclusive? Or were all of them overflows?


I did try some lower FFA values, but it didn't seem to matter; the desktop was always laggy when running Astropulse 521. I don't remember the exact params I tried, but I did go pretty low to try and iron things out.

Most of my errors were from the -30 overflows.

Since I stopped running Astropulse I have been able to add 400 MHz to my CUDA card's memory speed, and 15 MHz to the core, without getting errors. It seems to me that running Astropulse really taxes the heck out of a CUDA card. You can't have the slightest instability or it will error out. With Multibeam you can overclock to the point where you are unstable and still get 98% of the work units validated.

By the way, I did try running the card at default values for core, shader and memory speed, just to see if that would help, but I still got errors. It just didn't work for me.

Claggy
Volunteer tester
Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1137782 - Posted: 8 Aug 2011, 22:32:34 UTC - in response to Message 1137749.  
Last modified: 8 Aug 2011, 22:35:11 UTC

Half the work units were invalid, and running AP made my desktop laggy, even after endless tweaking of the app info.

I took a quick look through your results and found 8k and 16k values for the FFA block params. Did you try lower values too? Could you list the FFA param values you have already tried?

Also, do you remember any non-overflowed results (not 30 pulses in both categories) that came back invalid/inconclusive? Or were all of them overflows?


I did try some lower FFA values, but it didn't seem to matter; the desktop was always laggy when running Astropulse 521. I don't remember the exact params I tried, but I did go pretty low to try and iron things out.

Most of my errors were from the -30 overflows.

Since I stopped running Astropulse I have been able to add 400 MHz to my CUDA card's memory speed, and 15 MHz to the core, without getting errors. It seems to me that running Astropulse really taxes the heck out of a CUDA card. You can't have the slightest instability or it will error out. With Multibeam you can overclock to the point where you are unstable and still get 98% of the work units validated.

By the way, I did try running the card at default values for core, shader and memory speed, just to see if that would help, but I still got errors. It just didn't work for me.

You're running the ap_5.06_win_x86_SSE3_OpenCL_NV_r521.exe app on an AMD Athlon 64 Processor 3200+. Correct me if I'm wrong, but I'm sure it only has SSE2, not the SSE3 the app requires, which might be why almost every task is inconclusive.
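
One way to settle the SSE3 question on a Linux host is to look at the CPU flags the kernel reports. The sketch below is Python and assumes a Linux-style /proc/cpuinfo, where SSE3 is listed under the flag name `pni`. The function name `has_sse3` is made up for illustration; on Windows a CPU identification tool would be needed instead.

```python
def has_sse3(cpuinfo_text: str) -> bool:
    """Return True if the first 'flags' line of /proc/cpuinfo text lists SSE3.

    Linux reports SSE3 as 'pni'; 'ssse3' is a different, later extension,
    so flags are compared as whole tokens.
    """
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags") and ":" in line:
            flags = line.split(":", 1)[1].split()
            return "pni" in flags
    return False


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as cpuinfo:
            print("SSE3 supported:", has_sse3(cpuinfo.read()))
    except OSError:
        print("/proc/cpuinfo not available on this system")
```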

Claggy

CryptokiD
Joined: 2 Dec 00
Posts: 150
Credit: 3,216,632
RAC: 0
United States
Message 1137902 - Posted: 9 Aug 2011, 4:49:39 UTC - in response to Message 1137782.  

According to CPU-Z, my CPU has MMX, 3DNow!, SSE 1, 2 and 3, and is x86-64 capable.

Mike (Special Project $75 donor)
Volunteer tester
Joined: 17 Feb 01
Posts: 34258
Credit: 79,922,639
RAC: 80
Germany
Message 1137959 - Posted: 9 Aug 2011, 7:05:35 UTC


You have to wait until the validators are running again to see if your units get validated.
I had 35 in a row last week, all validated except 1.

For screen lag you should try a lower unroll factor.
Your GPU only has 4 compute units, so I would try unroll 4 or 3 to see if it gets better.

From what I could see in your results, you only changed the FFA values.

Even with params as high as 16384 you have 2 results with 0/0 signals found.



With each crime and every kindness we birth our future.

 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.