VLARs now also sent to CUDA app?

Message boards : Number crunching : VLARs now also sent to CUDA app?
Ulrich Metzner
Volunteer tester
Joined: 3 Jul 02
Posts: 1256
Credit: 13,565,513
RAC: 13
Germany
Message 1765148 - Posted: 15 Feb 2016, 9:54:16 UTC

Hi there,

I recently got a VLAR WU on the CUDA 5.0 app:

http://setiathome.berkeley.edu/result.php?resultid=4719989683

Since it will be gone in a few hours, here is the stderr output:

Name	15oc15ab.6106.10701.8.35.245_1
Workunit	2059707558
Created	11 Feb 2016, 9:14:20 UTC
Sent	11 Feb 2016, 12:12:03 UTC
Report deadline	2 Apr 2016, 12:01:09 UTC
Received	15 Feb 2016, 0:41:20 UTC
Server state	Over
Outcome	Success
Client state	Done
Exit status	0 (0x0)
Computer ID	157931
Run time	5 hours 12 min 37 sec
CPU time	6 min 5 sec
Validate state	Valid
Credit	175.29
Device peak FLOPS	692.35 GFLOPS
Application version	SETI@home v8
Anonymous platform (NVIDIA GPU)
Peak working set size	91.43 MB
Peak swap size	127.82 MB
Peak disk usage	0.03 MB
Stderr output

<core_client_version>7.6.22</core_client_version>
<![CDATA[
<stderr_txt>
v8 task detected
setiathome_CUDA: Found 2 CUDA device(s):
nVidia Driver Version 361.75
  Device 1: GeForce GT 640, 2048 MiB, regsPerBlock 65536
     computeCap 3.0, multiProcs 2 
     pciBusID = 1, pciSlotID = 0
  Device 2: GeForce GT 430, 512 MiB, regsPerBlock 32768
     computeCap 2.1, multiProcs 2 
     pciBusID = 5, pciSlotID = 0
     clockRate = 1400 MHz
In cudaAcc_initializeDevice(): Boinc passed DevPref 1
setiathome_CUDA: CUDA Device 1 specified, checking...
   Device 1: GeForce GT 640 is okay
SETI@home using CUDA accelerated device GeForce GT 640
mbcuda.cfg, matching pci device processpriority key detected
mbcuda.cfg, matching pci device pfblockspersm key detected
pulsefind: blocks per SM 2 
mbcuda.cfg, matching pci device pfperiodsperlaunch key detected
pulsefind: periods per launch 100 (default)
Priority of process set to BELOW_NORMAL (default) successfully
Priority of worker thread set successfully

setiathome enhanced x41zi (baseline v8), Cuda 5.00

setiathome_v8 task detected
Detected Autocorrelations as enabled, size 128k elements.
Work Unit Info:
...............
WU true angle range is :  0.080614

GPU current clockRate = 901 MHz

re-using dev_GaussFitResults array for dev_AutoCorrIn, 4194304 bytes
re-using dev_GaussFitResults+524288x8 array for dev_AutoCorrOut, 4194304 bytes
Thread call stack limit is: 1k
Exit Status: 0
boinc_exit(): requesting safe worker shutdown ->
  Worker Acknowledging exit request, spinning-> boinc_exit(): received safe worker shutdown acknowledge ->
Cuda threadsafe ExitProcess() initiated, rval 0
v8 task detected
setiathome_CUDA: Found 2 CUDA device(s):
nVidia Driver Version 361.75
  Device 1: GeForce GT 640, 2048 MiB, regsPerBlock 65536
     computeCap 3.0, multiProcs 2 
     pciBusID = 1, pciSlotID = 0
  Device 2: GeForce GT 430, 512 MiB, regsPerBlock 32768
     computeCap 2.1, multiProcs 2 
     pciBusID = 5, pciSlotID = 0
     clockRate = 1400 MHz
In cudaAcc_initializeDevice(): Boinc passed DevPref 2
setiathome_CUDA: CUDA Device 2 specified, checking...
   Device 2: GeForce GT 430 is okay
SETI@home using CUDA accelerated device GeForce GT 430
mbcuda.cfg, matching pci device processpriority key detected
mbcuda.cfg, matching pci device pfblockspersm key detected
pulsefind: blocks per SM 4 (Fermi or newer default)
mbcuda.cfg, matching pci device pfperiodsperlaunch key detected
pulsefind: periods per launch 200 
Priority of process set to BELOW_NORMAL (default) successfully
Priority of worker thread set successfully

setiathome enhanced x41zi (baseline v8), Cuda 5.00

setiathome_v8 task detected
Detected Autocorrelations as enabled, size 128k elements.
Work Unit Info:
...............
WU true angle range is :  0.080614
re-using dev_GaussFitResults array for dev_AutoCorrIn, 4194304 bytes
re-using dev_GaussFitResults+524288x8 array for dev_AutoCorrOut, 4194304 bytes
Thread call stack limit is: 1k
cudaAcc_free() called...
cudaAcc_free() running...
cudaAcc_free() PulseFind freed...
cudaAcc_free() Gaussfit freed...
cudaAcc_free() AutoCorrelation freed...
cudaAcc_free() DONE.

Flopcounter: 63430621422201.633000

Spike count:    1
Autocorr count: 0
Pulse count:    7
Triplet count:  0
Gaussian count: 0
Worker preemptively acknowledging a normal exit.->
called boinc_finish
Exit Status: 0
boinc_exit(): requesting safe worker shutdown ->
boinc_exit(): received safe worker shutdown acknowledge ->
Cuda threadsafe ExitProcess() initiated, rval 0

</stderr_txt>
]]>


Have a look at the AR:
"WU true angle range is : 0.080614"

This WU ran nearly "forever" compared to other WUs, and I had a lot of lag while it was running. Is this a new adjustment to the server routine?
Aloha, Uli

ID: 1765148
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1765150 - Posted: 15 Feb 2016, 10:19:31 UTC - in response to Message 1765148.  

Have a look at the AR:
"WU true angle range is : 0.080614"

This WU ran nearly "forever" compared to other WUs, and I had a lot of lag while it was running. Is this a new adjustment to the server routine?

Yes, that matches the discussion and conclusion we reached a week ago. A small arithmetic error in preparing the splitters to handle Green Bank (and other) telescope data shifted the threshold for definition as a VLAR down by 50%, from 0.12 to 0.06 (see Panic Mode On (102) Server Problems?)
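To illustrate the effect of that shift, here is a minimal sketch of the cutoff logic (hypothetical names and logic; the real splitter code differs):

```python
# Hypothetical sketch of the VLAR cutoff described above; the actual
# splitter implementation differs, names here are illustrative only.

OLD_VLAR_THRESHOLD = 0.12     # original Arecibo bound
BUGGED_VLAR_THRESHOLD = 0.06  # bound after the arithmetic error (50% lower)

def is_vlar(angle_range: float, threshold: float = OLD_VLAR_THRESHOLD) -> bool:
    """A work unit counts as VLAR when its true angle range falls below the bound."""
    return angle_range < threshold

# The WU above (AR 0.080614) is VLAR under the original bound, so it would
# normally be kept off NVidia GPUs; under the bugged bound it was not
# classified as VLAR, so it went out to the CUDA app.
print(is_vlar(0.080614, OLD_VLAR_THRESHOLD))     # True
print(is_vlar(0.080614, BUGGED_VLAR_THRESHOLD))  # False
```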

Eric did reply "I'll fix it during this week's outage", but evidently something intervened and it dropped off the ToDo list. I'll remind him tomorrow, closer to the likely timeframe for action. (Today is a Federal Holiday - Presidents' Day - in the USA, so no point in writing today.)
ID: 1765150
Ulrich Metzner
Volunteer tester
Joined: 3 Jul 02
Posts: 1256
Credit: 13,565,513
RAC: 13
Germany
Message 1765159 - Posted: 15 Feb 2016, 11:57:32 UTC

Richard, thanks for the info!
I must have overlooked that in the panic thread.
Aloha, Uli

ID: 1765159
Profile petri33
Volunteer tester

Joined: 6 Jun 02
Posts: 1668
Credit: 623,086,772
RAC: 156
Finland
Message 1765280 - Posted: 15 Feb 2016, 22:31:32 UTC

Yes, they sure are slow: an AR of 0.06 in 14 minutes.

Name	14oc11af.23215.148572.15.42.51_0
Workunit	2059161656
Created	10 Feb 2016, 21:26:02 UTC
Sent	11 Feb 2016, 1:53:31 UTC
Report deadline	4 Apr 2016, 21:29:56 UTC
Received	11 Feb 2016, 4:59:36 UTC
Server state	Over
Outcome	Success
Client state	Done
Exit status	0 (0x0)
Computer ID	7475713
Run time	14 min 5 sec
CPU time	2 min
Validate state	Valid
Credit	105.25
Device peak FLOPS	7,698.43 GFLOPS
Application version	SETI@home v8
Anonymous platform (NVIDIA GPU)
Stderr output

<core_client_version>7.2.42</core_client_version>
<![CDATA[
<stderr_txt>
setiathome_CUDA: Found 4 CUDA device(s):
  Device 1: GeForce GTX 980, 4095 MiB, regsPerBlock 65536
     computeCap 5.2, multiProcs 16 
     pciBusID = 1, pciSlotID = 0
  Device 2: GeForce GTX 780, 3071 MiB, regsPerBlock 65536
     computeCap 3.5, multiProcs 12 
     pciBusID = 2, pciSlotID = 0
  Device 3: GeForce GTX 980, 4095 MiB, regsPerBlock 65536
     computeCap 5.2, multiProcs 16 
     pciBusID = 3, pciSlotID = 0
  Device 4: GeForce GTX 780, 3071 MiB, regsPerBlock 65536
     computeCap 3.5, multiProcs 12 
     pciBusID = 4, pciSlotID = 0
In cudaAcc_initializeDevice(): Boinc passed DevPref 4
setiathome_CUDA: CUDA Device 4 specified, checking...
   Device 4: GeForce GTX 780 is okay
SETI@home using CUDA accelerated device GeForce GTX 780
Using pfb = 64 from command line args
Using pfp = 3 from command line args

setiathome v8 enhanced x41p_zm, Cuda 7.50 special
Compiled with NVCC 7.5, using 6.5 libraries. Modifications done by petri33.



Detected setiathome_enhanced_v8 task. Autocorrelations enabled, size 128k elements.
Work Unit Info:
...............
WU true angle range is :  0.068821
Sigma 20
Thread call stack limit is: 1k
cudaAcc_free() called...
cudaAcc_free() running...
cudaAcc_free() PulseFind freed...
cudaAcc_free() Gaussfit freed...
cudaAcc_free() AutoCorrelation freed...
1,2,3,4,5,6,7,8,9,10,10,11,12,cudaAcc_free() DONE.
13
Flopcounter: 54084954807375.851562

Spike count:    4
Autocorr count: 0
Pulse count:    14
Triplet count:  4
Gaussian count: 0
06:55:48 (1116): called boinc_finish(0)

</stderr_txt>
]]>


To overcome Heisenbergs:
"You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones
ID: 1765280
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13727
Credit: 208,696,464
RAC: 304
Australia
Message 1765297 - Posted: 15 Feb 2016, 23:56:07 UTC - in response to Message 1765280.  

The reason for blocking VLARs from NVidia GPUs was that they caused significant system issues (screen lag, system becomes very sluggish/unresponsive).

Other than taking a long time to crunch (sometimes longer than the estimated times) my systems aren't showing any signs of sluggishness or lack of responsiveness.

So personally I wouldn't have any issue with them as long as the credit given reflected the work done.
Unfortunately that doesn't appear to be the case.

1,455.32 secs 66.27 credits
1,486.28 secs 101.89 credits
1,645.92 secs 123.29 credits
1,716.61 secs 77.68 credits
2,006.53 secs 91.27 credits
2,066.25 secs 88.43 credits
2,080.84 secs 109.99 credits
2,204.44 secs 103.19 credits
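To make the spread concrete, the figures above work out to nearly a 2x range in credit per hour (a rough illustrative calculation, not any metric the project itself computes):

```python
# Credit per hour for the eight tasks listed above; "credit rate" here is
# just an illustrative metric for comparing awards, not an official one.
tasks = [  # (run time in seconds, credit awarded)
    (1455.32, 66.27), (1486.28, 101.89), (1645.92, 123.29), (1716.61, 77.68),
    (2006.53, 91.27), (2066.25, 88.43), (2080.84, 109.99), (2204.44, 103.19),
]
rates = [credit * 3600.0 / secs for secs, credit in tasks]
print(f"credit/hour ranges from {min(rates):.0f} to {max(rates):.0f}")
# → credit/hour ranges from 154 to 270
```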
Grant
Darwin NT
ID: 1765297
Profile jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1765300 - Posted: 16 Feb 2016, 0:10:35 UTC

That's most definitely the very next challenge, as soon as the three main platforms, and possibly a fourth (the Jetson TK1, which I hear by PM today someone is working on :), come into lockstep. From there we basically make Petri-style streaming + CPU use + some other special optimisations configurable to how a user wants to run (with special tools to help decide how best to do so; minimal nerdiness required).

Big long drawn out process for me, but I think worth it in the long run, over churning out builds with compromises.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1765300
Profile petri33
Volunteer tester

Joined: 6 Jun 02
Posts: 1668
Credit: 623,086,772
RAC: 156
Finland
Message 1765492 - Posted: 16 Feb 2016, 15:24:43 UTC

The pay is not always that bad, but compared to 40 points in 70 seconds for shorties there is still a gap.

This is a 0.07 in 11 minutes (227 credits).

Name	19mr11ad.15949.4984.13.40.98_1
Workunit	2065134343
Created	16 Feb 2016, 4:44:32 UTC
Sent	16 Feb 2016, 9:40:38 UTC
Report deadline	9 Apr 2016, 19:38:48 UTC
Received	16 Feb 2016, 14:45:14 UTC
Server state	Over
Outcome	Success
Client state	Done
Exit status	0 (0x0)
Computer ID	7475713
Run time	11 min 12 sec
CPU time	2 min 8 sec
Validate state	Valid
Credit	227.85
Device peak FLOPS	7,698.43 GFLOPS
Application version	SETI@home v8
Anonymous platform (NVIDIA GPU)
Stderr output

<core_client_version>7.2.42</core_client_version>
<![CDATA[
<stderr_txt>
setiathome_CUDA: Found 4 CUDA device(s):
  Device 1: GeForce GTX 980, 4095 MiB, regsPerBlock 65536
     computeCap 5.2, multiProcs 16 
     pciBusID = 1, pciSlotID = 0
  Device 2: GeForce GTX 780, 3071 MiB, regsPerBlock 65536
     computeCap 3.5, multiProcs 12 
     pciBusID = 2, pciSlotID = 0
  Device 3: GeForce GTX 980, 4095 MiB, regsPerBlock 65536
     computeCap 5.2, multiProcs 16 
     pciBusID = 3, pciSlotID = 0
  Device 4: GeForce GTX 780, 3071 MiB, regsPerBlock 65536
     computeCap 3.5, multiProcs 12 
     pciBusID = 4, pciSlotID = 0
In cudaAcc_initializeDevice(): Boinc passed DevPref 1
setiathome_CUDA: CUDA Device 1 specified, checking...
   Device 1: GeForce GTX 980 is okay
SETI@home using CUDA accelerated device GeForce GTX 980
Using pfb = 64 from command line args
Using pfp = 3 from command line args

setiathome v8 enhanced x41p_zm, Cuda 7.50 special
Compiled with NVCC 7.5, using 6.5 libraries. Modifications done by petri33.



Detected setiathome_enhanced_v8 task. Autocorrelations enabled, size 128k elements.
Work Unit Info:
...............
WU true angle range is :  0.070283
Sigma 19
Thread call stack limit is: 1k
cudaAcc_free() called...
cudaAcc_free() running...
cudaAcc_free() PulseFind freed...
cudaAcc_free() Gaussfit freed...
cudaAcc_free() AutoCorrelation freed...
1,2,3,4,5,6,7,8,9,10,10,11,12,cudaAcc_free() DONE.
13
Flopcounter: 53215262177658.898438

Spike count:    4
Autocorr count: 1
Pulse count:    7
Triplet count:  2
Gaussian count: 0
16:44:55 (22723): called boinc_finish(0)

</stderr_txt>
]]>



To overcome Heisenbergs:
"You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones
ID: 1765492
kittyman Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1765495 - Posted: 16 Feb 2016, 15:28:07 UTC

And the kitties continue to crunch whatever is sent to them with the best apps available to them.
Without complaint.
Meow.

Although they do send the optimizers their best wishes and Godspeed.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1765495
Profile Sutaru Tsureku
Volunteer tester

Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 1769011 - Posted: 2 Mar 2016, 16:28:58 UTC - in response to Message 1765150.  
Last modified: 2 Mar 2016, 16:32:51 UTC

Ulrich Metzner wrote:
Have a look at the AR:
"WU true angle range is : 0.080614"

This WU ran nearly "forever" compared to other WUs, and I had a lot of lag while it was running. Is this a new adjustment to the server routine?


Richard Haselgrove wrote:

Yes, that matches the discussion and conclusion we reached a week ago. A small arithmetic error in preparing the splitters to handle Green Bank (and other) telescope data shifted the threshold for definition as a VLAR down by 50%, from 0.12 to 0.06 (see Panic Mode On (102) Server Problems?)

Eric did reply "I'll fix it during this week's outage", but evidently something intervened and it dropped off the ToDo list. I'll remind him tomorrow, closer to the likely timeframe for action. (Today is a Federal Holiday - Presidents' Day - in the USA, so no point in writing today.)


FYI

My PC still gets (and got) a few ARs <0.12 for the AMD/ATI GPU app (examples):

0.08x
29ja10aa.19856.74043.4.31.195_0
Created 2 Mar 2016, 3:20:51 UTC
Sent 2 Mar 2016, 7:19:26 UTC

0.09x
29ja10aa.19856.75270.4.31.229_1
Created 2 Mar 2016, 3:26:04 UTC
Sent 2 Mar 2016, 7:24:48 UTC

29ja10aa.19856.75270.4.31.233_1
Created 2 Mar 2016, 3:26:04 UTC
Sent 2 Mar 2016, 7:24:48 UTC
ID: 1769011
Profile petri33
Volunteer tester

Joined: 6 Jun 02
Posts: 1668
Credit: 623,086,772
RAC: 156
Finland
Message 1769013 - Posted: 2 Mar 2016, 16:51:23 UTC - in response to Message 1769011.  

Others seem to have received some VLAR WUs (AR 0.087915) too.

But the 26 min 40 sec from a 750Ti (Mac, Darwin) is not too bad.


http://setiathome.berkeley.edu/workunit.php?wuid=2080603401
To overcome Heisenbergs:
"You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones
ID: 1769013
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1769015 - Posted: 2 Mar 2016, 16:55:51 UTC - in response to Message 1769011.  

Eric restored the VLAR angle range bound to the original value of 0.12 (for Arecibo recordings) at 21:56 UTC last night (1 Mar 2016), round about the time the project was brought back up after maintenance.

https://setisvn.ssl.berkeley.edu/trac/changeset/3396

Obviously, tasks split before maintenance will still be working their way through the system, but those examples do look suspicious.

It's possible that the source code was updated, but new splitters aren't going to be deployed until after testing of the other change. Keep an eye on things, and let us know if you see any more.
ID: 1769015
Profile Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1769044 - Posted: 2 Mar 2016, 19:33:30 UTC - in response to Message 1769032.  
Last modified: 2 Mar 2016, 19:38:31 UTC

Agree still see a few on my cuda machine

Tut... Was this with Raistmer's SoG? Looking at the stderr, it looks like it was the OpenCL version.
ID: 1769044
Profile Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1769065 - Posted: 2 Mar 2016, 20:54:10 UTC - in response to Message 1769058.  

Yeah, I saw a few on my SoG machine as well.

AR 0.08
ID: 1769065
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1769220 - Posted: 3 Mar 2016, 12:18:15 UTC

And here, like WU 2081841575

created 3 Mar 2016, 2:00:06 UTC
WU true angle range is : 0.070042

I'll drop a line to Eric this afternoon.
ID: 1769220
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1769261 - Posted: 3 Mar 2016, 16:53:59 UTC - in response to Message 1769220.  

Eric says they deployed the new splitter to Beta first just in case, but they'll deploy it here 'soon'.
ID: 1769261
Ulrich Metzner
Volunteer tester
Joined: 3 Jul 02
Posts: 1256
Credit: 13,565,513
RAC: 13
Germany
Message 1769363 - Posted: 4 Mar 2016, 0:11:47 UTC - in response to Message 1769261.  

Eric says they deployed the new splitter to Beta first just in case, but they'll deploy it here 'soon'.

LOL, sorry about that... %)
Aloha, Uli

ID: 1769363
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1769549 - Posted: 4 Mar 2016, 20:06:52 UTC - in response to Message 1769261.  

Eric says they deployed the new splitter to Beta first just in case, but they'll deploy it here 'soon'.

This new splitter you speak of, does it have strange properties?
I'm just looking over the tasks at Beta and pondering what could be causing all those overflows.
https://setiweb.ssl.berkeley.edu/beta/results.php?hostid=77980&offset=40
ID: 1769549
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1769552 - Posted: 4 Mar 2016, 20:14:44 UTC - in response to Message 1769549.  

Dunno, you'd better ask Eric that.
ID: 1769552
