Crunching time with respect to Angle Range

W-K 666 Project Donor
Volunteer tester

Joined: 18 May 99
Posts: 19048
Credit: 40,757,560
RAC: 67
United Kingdom
Message 1998628 - Posted: 18 Jun 2019, 1:05:26 UTC

There have been some observations of variation in the crunch times of the blc** tasks. It is probably an effect of the AR (Angle Range), and the work of our dear departed friend Joe Segur might help to explain the variations.

Estimates and Deadlines revisited - 19 Dec 2007
ID: 1998628
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1998647 - Posted: 18 Jun 2019, 6:38:25 UTC - in response to Message 1998628.  

It would be interesting to look at the ARs, but I suspect the BLC variation will turn out to be a more subtle second-order effect.

We did that runtime research - over 11 years ago! - before there were any GPU applications. AR was certainly the most important determinant for runtime in those far-off CPU-only days: I wonder whether we could find (or if anybody could find us) an equivalent of the high-performance CPU hosts we used for that study. I suspect that we might find that 'runtime variation by BLC number' would be much reduced or even eliminated when CPU records are examined.

Instead, I think it might be some other factor in the search specification which dominates for GPUs. But I have no idea what that might be.
ID: 1998647
W-K 666 Project Donor
Volunteer tester

Joined: 18 May 99
Posts: 19048
Credit: 40,757,560
RAC: 67
United Kingdom
Message 1998652 - Posted: 18 Jun 2019, 7:24:20 UTC - in response to Message 1998647.  

> It would be interesting to look at the ARs, but I suspect the BLC variation will turn out to be a more subtle second-order effect.
>
> We did that runtime research - over 11 years ago! - before there were any GPU applications. AR was certainly the most important determinant for runtime in those far-off CPU-only days: I wonder whether we could find (or if anybody could find us) an equivalent of the high-performance CPU hosts we used for that study. I suspect that we might find that 'runtime variation by BLC number' would be much reduced or even eliminated when CPU records are examined.
>
> Instead, I think it might be some other factor in the search specification which dominates for GPUs. But I have no idea what that might be.

The ARs for blc41 look to be mid-range, at 0.047687, rather than the VLAR they are labeled as.
ID: 1998652
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1998653 - Posted: 18 Jun 2019, 7:26:36 UTC

OK, here's a first very rough look. I have an old laptop (8 years old now) which does have a GPU, but a weak and power-hungry one by modern standards, so I'm running it as a CPU-only machine now. In the last year it's done about 5,000 BLC jobs: I've plotted runtime vs BLC number.

[plot: CPU runtime vs BLC number]

So there is a trend, but I'm estimating no more than a 50% increase from BLC01 to BLC36. GPUs will have to wait - I've got to go out this morning.
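For anyone who wants to reproduce that kind of plot from their own task history, here's a minimal Python sketch. Everything about the input is an assumption: it expects a hypothetical tasks.csv export with 'name' and 'run_time' columns, and it pulls the BLC number out of the task name with a simple pattern.

# Minimal sketch: plot mean runtime per blcNN number from an exported task
# list. The tasks.csv file and its 'name' / 'run_time' columns are assumptions,
# e.g.  blc41_2bit_guppi_58543_62163_PSR...vlar , 5591.3
import csv
import re
from collections import defaultdict

import matplotlib.pyplot as plt

runtimes = defaultdict(list)
with open("tasks.csv", newline="") as f:
    for row in csv.DictReader(f):
        m = re.match(r"blc(\d+)_", row["name"])
        if m:                                   # skip Arecibo and other tasks
            runtimes[int(m.group(1))].append(float(row["run_time"]))

blc = sorted(runtimes)
means = [sum(runtimes[n]) / len(runtimes[n]) for n in blc]

plt.scatter(blc, means)
plt.xlabel("BLC number")
plt.ylabel("mean runtime (s)")
plt.title("Runtime vs BLC number")
plt.show()
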
ID: 1998653
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1998656 - Posted: 18 Jun 2019, 7:40:20 UTC - in response to Message 1998652.  
Last modified: 18 Jun 2019, 7:44:43 UTC

> The ARs for blc41 look to be mid-range, at 0.047687, rather than the VLAR they are labeled as.

No, you've slipped a decimal point.

Mid AR would be 0.47 - ten times higher.

VLAR used to be formally defined as 0.05 - the beam width of the Arecibo ALFA antenna. For GPU processing purposes, it was bumped up to around 0.12 or 0.13, which was the empirical point where the display lag on the early NVidia apps got intolerable.

The geometry of the Green Bank telescope is different, but I've no idea

(a) what the BLC beam width is - we could probably extract that from the data header.
(b) whether the VLAR cutoff has been adjusted in the GBT splitters - we probably can't.

Edit:
  <receiver_cfg>
    <s4_id>19</s4_id>
    <name>Green Bank Telescope, Rcvr8_10, Pol 0</name>
    <beam_width>0.025646461712448</beam_width>
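As a toy illustration of how those numbers relate, here's a short Python sketch. It assumes you have a workunit file to hand containing a <beam_width> element like the fragment above, and that you supply the task's angle range yourself (e.g. as reported in the app's stderr); the 0.05 and ~0.13 cutoffs are simply the values quoted above, not anything read from the splitters.

# Toy check (not an official tool): compare a task's angle range against the
# cutoffs quoted above, and read the beam width from a workunit header.
# Hypothetical usage:  python ar_check.py blc41_workunit.xml 0.047687
import re
import sys

VLAR_DEFINITION = 0.05   # historical Arecibo/ALFA definition mentioned above
GPU_VLAR_CUTOFF = 0.13   # approximate empirical GPU cutoff mentioned above

def beam_width_from_header(path):
    # Pull the <beam_width> value out of the workunit XML with a simple regex.
    with open(path) as f:
        m = re.search(r"<beam_width>([\d.]+)</beam_width>", f.read())
    return float(m.group(1)) if m else None

if __name__ == "__main__":
    wu_file, angle_range = sys.argv[1], float(sys.argv[2])
    print(f"beam width from header: {beam_width_from_header(wu_file)}")
    print(f"AR {angle_range} is {angle_range / VLAR_DEFINITION:.1f}x the old 0.05 VLAR definition")
    print(f"below the ~{GPU_VLAR_CUTOFF} GPU VLAR cutoff? {angle_range < GPU_VLAR_CUTOFF}")
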
ID: 1998656
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1998657 - Posted: 18 Jun 2019, 7:55:12 UTC - in response to Message 1998656.  
Last modified: 18 Jun 2019, 8:15:16 UTC

I just looked at all of my BLC41s on this one host and it appears the run-times are all over the place. Some aren't even labeled as VLAR, while others named blc41_2bit_guppi_58543_62163_PSR...vlar finish in what would be considered 'normal' times.
I only had one CPU task and it is running quickly; it will probably be finished in about 10 minutes.
This assortment of 41s is somewhat confusing: https://setiathome.berkeley.edu/results.php?hostid=6906726&offset=4100&show_names=1&state=0&appid=
ID: 1998657
Raistmer
Volunteer developer
Volunteer tester

Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1998662 - Posted: 18 Jun 2019, 8:20:21 UTC - in response to Message 1998647.  
Last modified: 18 Jun 2019, 8:29:20 UTC


> Instead, I think it might be some other factor in the search specification which dominates for GPUs. But I have no idea what that might be.

Number of "just under threshold" signals.
In Pulses and Triplets (at least) they will cause either serialization (perhaps for CUDA special) or re-calculation by CPU(for SoG).

EDIT: to prove or disprove this, those running OpenCL builds could look at the statistics part of stderr:


class Gaussian_transfer_not_needed: total=0, N=0, <>=0, min=0 max=0
class Gaussian_transfer_needed: total=0, N=0, <>=0, min=0 max=0


class Gaussian_skip1_no_peak: total=0, N=0, <>=0, min=0 max=0
class Gaussian_skip2_bad_group_peak: total=0, N=0, <>=0, min=0 max=0
class Gaussian_skip3_too_weak_peak: total=0, N=0, <>=0, min=0 max=0
class Gaussian_skip4_too_big_ChiSq: total=0, N=0, <>=0, min=0 max=0
class Gaussian_skip6_low_power: total=0, N=0, <>=0, min=0 max=0


class Gaussian_new_best: total=0, N=0, <>=0, min=0 max=0
class Gaussian_report: total=0, N=0, <>=0, min=0 max=0
class Gaussian_miss: total=0, N=0, <>=0, min=0 max=0


class PC_triplet_find_hit: total=41744, N=41744, <>=1, min=1 max=1
class PC_triplet_find_miss: total=832, N=832, <>=1, min=1 max=1


class PC_pulse_find_hit: total=30209, N=30209, <>=1, min=1 max=1
class PC_pulse_find_miss: total=12, N=12, <>=1, min=1 max=1
class PC_pulse_find_early_miss: total=7, N=7, <>=1, min=1 max=1
class PC_pulse_find_2CPU: total=0, N=0, <>=0, min=0 max=0


class PoT_transfer_not_needed: total=41737, N=41737, <>=1, min=1 max=1
class PoT_transfer_needed: total=840, N=840, <>=1, min=1 max=1

The names are mostly self-explanatory (I hope).

For example: class PC_triplet_find_miss: total=832, N=832, <>=1, min=1 max=1
So, 832 times triplets were re-checked by the CPU. Obviously a task can't have that many Triplets in its result, so they were signal candidates, not real Triplets.
Nevertheless, their number slowed down the progress of that particular task.
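To save counting those lines by hand, here's a minimal Python sketch, assuming you have a saved copy of a task's stderr in the format shown above (the file name is hypothetical). It tallies the "class ...: total=..." counters and reports the ones whose names suggest a CPU re-check.

# Minimal sketch: tally the "class <name>: total=<n>" statistics lines from a
# saved stderr file and report the counters whose names suggest a CPU
# re-check or miss, e.g. PC_triplet_find_miss or PC_pulse_find_2CPU.
import re
import sys

STAT_LINE = re.compile(r"class (\w+): total=(\d+)")

def parse_stats(path):
    stats = {}
    with open(path) as f:
        for line in f:
            m = STAT_LINE.search(line)
            if m:
                stats[m.group(1)] = int(m.group(2))
    return stats

if __name__ == "__main__":
    stats = parse_stats(sys.argv[1])          # e.g. python stderr_stats.py stderr.txt
    for name, total in sorted(stats.items()):
        if "miss" in name or "2CPU" in name:
            print(f"{name}: {total}")
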
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1998662
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1998664 - Posted: 18 Jun 2019, 8:33:42 UTC

Well, the first one didn't take long, so I had time to re-run it for a GPU. This is a 1050Ti running SoG under Windows - about 23,000 tasks this time, in slightly less than a year (I must have run the CPU apps during last year's WOW! challenge).

[plot: GPU runtime vs BLC number]

Well, I didn't expect that - a very similar 50% trend in runtime, though with greater variability. Back to the thinking board.
ID: 1998664
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13727
Credit: 208,696,464
RAC: 304
Australia
Message 1998667 - Posted: 18 Jun 2019, 9:03:23 UTC
Last modified: 18 Jun 2019, 9:09:57 UTC

The longest WU runtime I've had on this CPU (that I've noticed) has been 1hr 35min - an Arecibo VLAR. The longest BLC task was 1hr 32min; most finish in around 1hr.

I've got 2 BLC41s being processed on my CPU at the moment.
One is at 39.9% after 1hr 25min.
The other is at 21.2% after 49min.
Estimated remaining times are in flux, but run times are looking to be roughly 3 times the usual run time.

Generally BLC tasks finish faster on the CPU than similar Arecibo WUs, although they take longer on the GPU (Running SoG).
Grant
Darwin NT
ID: 1998667
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1998668 - Posted: 18 Jun 2019, 9:20:11 UTC - in response to Message 1998667.  

> Generally BLC tasks finish faster on the CPU than similar Arecibo WUs, although they take longer on the GPU (Running SoG).

Yes, that was my observation too, which led to my assumption that the BLC tasks weren't 'more' work than Arecibo, but 'harder' (for GPU) work.

But in that scenario, why did my CPU graph show such a trendline? It's running the x64 SSE3 VS2008 opti app (too old for AVX) - unlike that old work with Joe, where we concentrated on stock apps.
ID: 1998668
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13727
Credit: 208,696,464
RAC: 304
Australia
Message 1998669 - Posted: 18 Jun 2019, 9:45:31 UTC - in response to Message 1998668.  
Last modified: 18 Jun 2019, 9:58:03 UTC

> But in that scenario, why did my CPU graph show such a trendline? It's running the x64 SSE3 VS2008 opti app (too old for AVX) - unlike that old work with Joe, where we concentrated on stock apps.

That the difference in processing "effort" (for want of a better term) between different-AR BLC tasks on the GPU is strongly similar to the difference in "effort" between different-AR BLC tasks on the CPU makes sense to me, as it means that the difference in processing "effort" between the CPU and GPU applications for GBT and Arecibo work remains relatively constant.

i.e.
Using Arecibo WUs as our reference, the CPU application doesn't take much more time (if any) to process a similar-AR GBT WU (actually, it's often slightly faster). Whereas the GPU applications do have a significantly longer runtime compared to a similar-AR Arecibo WU, and that difference remains fairly constant across angle ranges.
If that makes any sense at all.
Grant
Darwin NT
ID: 1998669
