Crunching time with respect to Angle Range

W-K 666 Project Donor
Volunteer tester

Joined: 18 May 99
Posts: 19048
Credit: 40,757,560
RAC: 67
United Kingdom
Message 1998628 - Posted: 18 Jun 2019, 1:05:26 UTC

There have been some observations of variation in the crunch times of the blc** tasks. It is probably an effect of the AR (Angle Range), and the work of our dear departed friend Joe Segur might help to explain the variations.

Estimates and Deadlines revisited - 19 Dec 2007
ID: 1998628
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1998647 - Posted: 18 Jun 2019, 6:38:25 UTC - in response to Message 1998628.  

It would be interesting to look at the ARs, but I suspect the BLC variation will turn out to be a more subtle second-order effect.

We did that runtime research - over 11 years ago! - before there were any GPU applications. AR was certainly the most important determinant for runtime in those far-off CPU-only days: I wonder whether we could find (or if anybody could find us) an equivalent of the high-performance CPU hosts we used for that study. I suspect that we might find that 'runtime variation by BLC number' would be much reduced or even eliminated when CPU records are examined.

Instead, I think it might be some other factor in the search specification which dominates for GPUs. But I have no idea what that might be.
ID: 1998647
W-K 666 Project Donor
Volunteer tester

Joined: 18 May 99
Posts: 19048
Credit: 40,757,560
RAC: 67
United Kingdom
Message 1998652 - Posted: 18 Jun 2019, 7:24:20 UTC - in response to Message 1998647.  

> It would be interesting to look at the ARs, but I suspect the BLC variation will turn out to be a more subtle second-order effect.
>
> We did that runtime research - over 11 years ago! - before there were any GPU applications. AR was certainly the most important determinant for runtime in those far-off CPU-only days: I wonder whether we could find (or if anybody could find us) an equivalent of the high-performance CPU hosts we used for that study. I suspect that we might find that 'runtime variation by BLC number' would be much reduced or even eliminated when CPU records are examined.
>
> Instead, I think it might be some other factor in the search specification which dominates for GPUs. But I have no idea what that might be.

The ARs for blc41 look to be mid-range, at 0.047687, rather than the VLAR they are labeled as.
ID: 1998652
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1998653 - Posted: 18 Jun 2019, 7:26:36 UTC

OK, here's a first very rough look. I have an old laptop (8 years old now) which does have a GPU, but a weak and power-hungry one by modern standards, so I'm running it as a CPU-only machine now. In the last year it's done about 5,000 BLC jobs: I've plotted runtime vs BLC number.

[plot: CPU runtime vs BLC number]

So there is a trend, but I'm estimating no more than a 50% increase from BLC01 to BLC36. GPUs will have to wait - I've got to go out this morning.
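For anyone who wants to reproduce that kind of plot from their own task history, here's a minimal Python sketch. Everything about the input is an assumption: it expects a hypothetical tasks.csv export with 'name' and 'run_time' columns, and it pulls the BLC number out of the task name with a simple pattern.

# Minimal sketch: plot mean runtime per blcNN number from an exported task
# list. The tasks.csv file and its 'name' / 'run_time' columns are assumptions,
# e.g.  blc41_2bit_guppi_58543_62163_PSR...vlar , 5591.3
import csv
import re
from collections import defaultdict

import matplotlib.pyplot as plt

runtimes = defaultdict(list)
with open("tasks.csv", newline="") as f:
    for row in csv.DictReader(f):
        m = re.match(r"blc(\d+)_", row["name"])
        if m:                                   # skip Arecibo and other tasks
            runtimes[int(m.group(1))].append(float(row["run_time"]))

blc = sorted(runtimes)
means = [sum(runtimes[n]) / len(runtimes[n]) for n in blc]

plt.scatter(blc, means)
plt.xlabel("BLC number")
plt.ylabel("mean runtime (s)")
plt.title("Runtime vs BLC number")
plt.show()
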
ID: 1998653
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1998656 - Posted: 18 Jun 2019, 7:40:20 UTC - in response to Message 1998652.  
Last modified: 18 Jun 2019, 7:44:43 UTC

> The ARs for blc41 look to be mid-range, at 0.047687, rather than the VLAR they are labeled as.

No, you've slipped a decimal point.

Mid AR would be 0.47 - ten times higher.

VLAR used to be formally defined as 0.05 - the beam width of the Arecibo ALFA antenna. For GPU processing purposes, it was bumped up to around 0.12 or 0.13, which was the empirical point where the display lag on the early NVidia apps got intolerable.

The geometry of the Green Bank telescope is different, but I've no idea

(a) what the BLC beam width is - we could probably extract that from the data header.
(b) whether the VLAR cutoff has been adjusted in the GBT splitters - we probably can't.

Edit:
  <receiver_cfg>
    <s4_id>19</s4_id>
    <name>Green Bank Telescope, Rcvr8_10, Pol 0</name>
    <beam_width>0.025646461712448</beam_width>
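As a toy illustration of how those numbers relate, here's a short Python sketch. It assumes you have a workunit file to hand containing a <beam_width> element like the fragment above, and that you supply the task's angle range yourself (e.g. as reported in the app's stderr); the 0.05 and ~0.13 cutoffs are simply the values quoted above, not anything read from the splitters.

# Toy check (not an official tool): compare a task's angle range against the
# cutoffs quoted above, and read the beam width from a workunit header.
# Hypothetical usage:  python ar_check.py blc41_workunit.xml 0.047687
import re
import sys

VLAR_DEFINITION = 0.05   # historical Arecibo/ALFA definition mentioned above
GPU_VLAR_CUTOFF = 0.13   # approximate empirical GPU cutoff mentioned above

def beam_width_from_header(path):
    # Pull the <beam_width> value out of the workunit XML with a simple regex.
    with open(path) as f:
        m = re.search(r"<beam_width>([\d.]+)</beam_width>", f.read())
    return float(m.group(1)) if m else None

if __name__ == "__main__":
    wu_file, angle_range = sys.argv[1], float(sys.argv[2])
    print(f"beam width from header: {beam_width_from_header(wu_file)}")
    print(f"AR {angle_range} is {angle_range / VLAR_DEFINITION:.1f}x the old 0.05 VLAR definition")
    print(f"below the ~{GPU_VLAR_CUTOFF} GPU VLAR cutoff? {angle_range < GPU_VLAR_CUTOFF}")
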
ID: 1998656
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1998657 - Posted: 18 Jun 2019, 7:55:12 UTC - in response to Message 1998656.  
Last modified: 18 Jun 2019, 8:15:16 UTC

I just looked at all of my BLC41s on this one host and it appears the run-times are all over the place. Some aren't even labeled as VLAR, while others named blc41_2bit_guppi_58543_62163_PSR...vlar finish in what would be considered 'normal' times.
I only had one CPU task and it is running quickly; it will probably be finished in about 10 minutes.
This assortment of 41s is somewhat confusing: https://setiathome.berkeley.edu/results.php?hostid=6906726&offset=4100&show_names=1&state=0&appid=
ID: 1998657
Raistmer
Volunteer developer
Volunteer tester

Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1998662 - Posted: 18 Jun 2019, 8:20:21 UTC - in response to Message 1998647.  
Last modified: 18 Jun 2019, 8:29:20 UTC


> Instead, I think it might be some other factor in the search specification which dominates for GPUs. But I have no idea what that might be.

Number of "just under threshold" signals.
In Pulses and Triplets (at least) they will cause either serialization (perhaps for CUDA special) or re-calculation by CPU(for SoG).

EDIT: to prove or disprove this, those running OpenCL builds could look at the statistics part of stderr:


class Gaussian_transfer_not_needed: total=0, N=0, <>=0, min=0 max=0
class Gaussian_transfer_needed: total=0, N=0, <>=0, min=0 max=0


class Gaussian_skip1_no_peak: total=0, N=0, <>=0, min=0 max=0
class Gaussian_skip2_bad_group_peak: total=0, N=0, <>=0, min=0 max=0
class Gaussian_skip3_too_weak_peak: total=0, N=0, <>=0, min=0 max=0
class Gaussian_skip4_too_big_ChiSq: total=0, N=0, <>=0, min=0 max=0
class Gaussian_skip6_low_power: total=0, N=0, <>=0, min=0 max=0


class Gaussian_new_best: total=0, N=0, <>=0, min=0 max=0
class Gaussian_report: total=0, N=0, <>=0, min=0 max=0
class Gaussian_miss: total=0, N=0, <>=0, min=0 max=0


class PC_triplet_find_hit: total=41744, N=41744, <>=1, min=1 max=1
class PC_triplet_find_miss: total=832, N=832, <>=1, min=1 max=1


class PC_pulse_find_hit: total=30209, N=30209, <>=1, min=1 max=1
class PC_pulse_find_miss: total=12, N=12, <>=1, min=1 max=1
class PC_pulse_find_early_miss: total=7, N=7, <>=1, min=1 max=1
class PC_pulse_find_2CPU: total=0, N=0, <>=0, min=0 max=0


class PoT_transfer_not_needed: total=41737, N=41737, <>=1, min=1 max=1
class PoT_transfer_needed: total=840, N=840, <>=1, min=1 max=1

The names are mostly self-explanatory (I hope).

For example: class PC_triplet_find_miss: total=832, N=832, <>=1, min=1 max=1
So, 832 times triplets were re-checked by the CPU. Obviously a task can't have that many Triplets in its result, so they were signal candidates, not real Triplets.
Nevertheless, their number slowed down the progress of that particular task.
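To save counting those lines by hand, here's a minimal Python sketch, assuming you have a saved copy of a task's stderr in the format shown above (the file name is hypothetical). It tallies the "class ...: total=..." counters and reports the ones whose names suggest a CPU re-check.

# Minimal sketch: tally the "class <name>: total=<n>" statistics lines from a
# saved stderr file and report the counters whose names suggest a CPU
# re-check or miss, e.g. PC_triplet_find_miss or PC_pulse_find_2CPU.
import re
import sys

STAT_LINE = re.compile(r"class (\w+): total=(\d+)")

def parse_stats(path):
    stats = {}
    with open(path) as f:
        for line in f:
            m = STAT_LINE.search(line)
            if m:
                stats[m.group(1)] = int(m.group(2))
    return stats

if __name__ == "__main__":
    stats = parse_stats(sys.argv[1])          # e.g. python stderr_stats.py stderr.txt
    for name, total in sorted(stats.items()):
        if "miss" in name or "2CPU" in name:
            print(f"{name}: {total}")
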
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1998662
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1998664 - Posted: 18 Jun 2019, 8:33:42 UTC

Well, the first one didn't take long, so I had time to re-run it for a GPU. This is a 1050Ti running SoG under Windows - about 23,000 tasks this time, in slightly less than a year (I must have run the CPU apps during last year's WOW! challenge).

[plot: GPU runtime vs BLC number]

Well, I didn't expect that - a very similar 50% trend in runtime, though with greater variability. Back to the thinking board.
ID: 1998664
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13727
Credit: 208,696,464
RAC: 304
Australia
Message 1998667 - Posted: 18 Jun 2019, 9:03:23 UTC
Last modified: 18 Jun 2019, 9:09:57 UTC

The longest WU runtime I've had on this CPU (that I've noticed) has been 1hr 35min - an Arecibo VLAR. The longest BLC task was 1hr 32min; most finish in around 1hr.

I've got 2 BLC41s being processed on my CPU at the moment.
One is at 39.9% after 1hr 25min.
The other is at 21.2% after 49min.
Estimated remaining times are in flux, but run times are looking to be roughly 3 times the usual run time.

Generally BLC tasks finish faster on the CPU than similar Arecibo WUs, although they take longer on the GPU (Running SoG).
Grant
Darwin NT
ID: 1998667
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1998668 - Posted: 18 Jun 2019, 9:20:11 UTC - in response to Message 1998667.  

> Generally BLC tasks finish faster on the CPU than similar Arecibo WUs, although they take longer on the GPU (Running SoG).

Yes, that was my observation too, which led to my assumption that the BLC tasks weren't 'more' work than Arecibo, but 'harder' (for GPU) work.

But in that scenario, why did my CPU graph show such a trendline? It's running the x64 SSE3 VS2008 opti app (too old for AVX) - unlike that old work with Joe, where we concentrated on stock apps.
ID: 1998668
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13727
Credit: 208,696,464
RAC: 304
Australia
Message 1998669 - Posted: 18 Jun 2019, 9:45:31 UTC - in response to Message 1998668.  
Last modified: 18 Jun 2019, 9:58:03 UTC

> But in that scenario, why did my CPU graph show such a trendline? It's running the x64 SSE3 VS2008 opti app (too old for AVX) - unlike that old work with Joe, where we concentrated on stock apps.

That the difference in processing "effort" (for want of a better term) between different-AR BLC tasks on the GPU is strongly similar to the difference in "effort" between different-AR BLC tasks on the CPU makes sense to me, as it means that the difference in processing "effort" between the CPU and GPU applications for GBT and Arecibo work remains relatively constant.

i.e.
Using Arecibo WUs as our reference, the CPU application doesn't take much more time (if any) to process a similar-AR GBT WU (actually, it's often slightly faster). Whereas the GPU applications do have a significantly longer runtime compared to a similar-AR Arecibo WU, and that difference remains fairly constant across angle ranges.
If that makes any sense at all.
Grant
Darwin NT
ID: 1998669
