Gflop estimates?

Profile petri33
Volunteer tester

Send message
Joined: 6 Jun 02
Posts: 1668
Credit: 623,086,772
RAC: 156
Finland
Message 1830096 - Posted: 12 Nov 2016, 17:50:19 UTC

Hi,

It's been a while since I looked into the Boincmgr task properties.

Today I did. A task that took approximately 50 seconds was labeled as having over 15 000 something. Another task that took just over three minutes (300+ seconds) was labeled as having 5600 something. The something is Gflops.

I'd like to think that there is still some optimizing to be done.

What do you think?
To overcome Heisenbergs:
"You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones
ID: 1830096 · Report as offensive
EdwardPF
Volunteer tester

Send message
Joined: 26 Jul 99
Posts: 389
Credit: 236,772,605
RAC: 374
United States
Message 1830113 - Posted: 12 Nov 2016, 18:53:27 UTC - in response to Message 1830096.  
Last modified: 12 Nov 2016, 19:01:02 UTC

As you know, folks are working on optimizing the GPU performance ...

But it is interesting to look at your stats, for example Task 5284613511 and your wingman's Task 5284613512 (work unit 3216223047762).

You are reporting a flopcount of 32,103,790,054,745.789062; your wingman is reporting a flopcount of 3,216,223,047,762.2627, about 10% of your number.

It seems "someone" is counting flops wrong (or I can't copy/paste very well).

Any Ideas out there in smart people land?

Ed F

p.s. it looks like "SETI@home v8 v8.19 (opencl_nvidia_SoG) windows_intelx86" undercounts by 90% ???
ID: 1830113 · Report as offensive
Profile petri33
Volunteer tester

Send message
Joined: 6 Jun 02
Posts: 1668
Credit: 623,086,772
RAC: 156
Finland
Message 1830133 - Posted: 12 Nov 2016, 21:07:02 UTC - in response to Message 1830113.  

I have been cutting the lines of code that do nothing, but I've been very careful not to cut the lines that are affecting the 'estimated' FPU ops in the code. I've optimized the autocorr path to do a quarter of the original flops and memory access (that is not being counted anywhere) but I still reflect the original number.

My observation was about the hardness of a task before computing.
ID: 1830133 · Report as offensive
EdwardPF
Volunteer tester

Send message
Joined: 26 Jul 99
Posts: 389
Credit: 236,772,605
RAC: 374
United States
Message 1830135 - Posted: 12 Nov 2016, 21:18:39 UTC - in response to Message 1830133.  

Yes, your numbers correctly reflect "SETI@home v8 v8.00 windows_intelx86" (99.95%). I would like to hear from Raistmer about the apparent divergence from the standard release, if he knows (not that it makes any real difference).

Now back to your question ...

I love what you folks are doing!!

Ed F
ID: 1830135 · Report as offensive
Profile petri33
Volunteer tester

Send message
Joined: 6 Jun 02
Posts: 1668
Credit: 623,086,772
RAC: 156
Finland
Message 1830138 - Posted: 12 Nov 2016, 21:23:50 UTC - in response to Message 1830135.  

Thank You. I know Raistmer knows.

It is just a glitch in a code path that reports some human readable digits and the real science is not affected.
ID: 1830138 · Report as offensive
Profile jason_gee
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1830155 - Posted: 12 Nov 2016, 23:04:58 UTC - in response to Message 1830096.  
Last modified: 12 Nov 2016, 23:27:36 UTC

Yes. Best-case generic/baseline code, CPU or GPU, tends toward ~5% compute efficiency, with little latency hiding (except some already in libraries). Multi-instance GPU reaches the region of 10-15% in good cases, and your hand-tuned GPU code ~20-25%. With refined implementations along the same path, up to ~50% should be feasible (though with increasingly more work for diminishing returns and more hardware specificity).

Alternative methods of lower complexity than the Fourier analysis being used exist, and could easily drive a reduced/sparse form of the traditional Fourier analyses to draw out the same numbers. These include wavelet/chirplet multiresolution analysis and AI/deep-learning feature recognition approaches. Depending on the combination of techniques/algorithms used, that could reduce the problem from order number_of_chirps*nffts*ndatapoints^2*log(ndatapoints) down to O(n log n), though memory-practical algorithms will likely tile/block to something more like O(k*sqrt(n)*sqrt(n)*sqrt(n)*log(sqrt(n))).

So yeah, IMO there is still a lot of room for optimisation at hardware and algorithmic levels, though the infrastructure to support the more futuristic end isn't there just yet. The code (CPU and GPU in general) modularity and quality needs to ramp a few notches before some of the possible paths become feasible.

[Edit:] Analogy: a lot is possible with a Model-T chassis, but a lot more is possible with an F-22 Raptor frame, despite both being more or less a similar empty vessel.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1830155 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1830815 - Posted: 16 Nov 2016, 11:03:00 UTC - in response to Message 1830135.  

Since that number doesn't seem to allow correct credit to be granted anyway, I didn't follow its behavior in recent releases.
Any more observations (low-AR, high-AR) about the % of missing FLOPs? Is it the same % or quite different? (That would help to say where some counting was missed.)
Perhaps it was deliberately disabled to allow more linear percentage progress over the task duration (though it's far from linear anyway).
For VHAR, the SoG app enqueues almost all possible work to the GPU very fast and then just sits waiting for the GPU. That's where those "99% done" readings and the long wait afterwards come from.
One of the attempts was to simplify progress reporting to just the fraction of icfft done (but actually it would hit the same issue). Perhaps in that attempt I disabled some of the FLOPs counting.
Also, FLOP counting as it stands is quite a costly operation (it contains log(), for example). If FLOPs counting needs to be restored, I would look at how to simplify it in the first place.
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1830815 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Send message
Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1830819 - Posted: 16 Nov 2016, 11:23:54 UTC - in response to Message 1830815.  

I suspect you're right that flop counting and progress reporting are closely related, so it may be worth reminding you that we currently seem to have two different progress meters:

From state_sah (checkpoint file)
<prog>0.24597392</prog>

From boinc_task_state.xml (reported via manager)
<fraction_done>0.581406</fraction_done>

Same task, as close as possible to the same time.
ID: 1830819 · Report as offensive
EdwardPF
Volunteer tester

Send message
Joined: 26 Jul 99
Posts: 389
Credit: 236,772,605
RAC: 374
United States
Message 1830840 - Posted: 16 Nov 2016, 16:17:10 UTC - in response to Message 1830815.  

I hope this isn't making life miserable for you folks!!

My only purpose here was to point out that since we have at least 3 competing MB standards, they all ought to report nearly the same "stats".

Since AP does not report flopcount, it seems someone considered it an unnecessary overhead.

I like seeing the count so I can compare my/other hardware and know mine is running on par and not ill configured.

'nuf said, I guess, sorry for the distraction.

Ed F
ID: 1830840 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1830845 - Posted: 16 Nov 2016, 16:40:26 UTC - in response to Message 1830819.  


Same task, as close as possible to the same time.

VHAR or VLAR?
Better to re-check whether such a strong discrepancy occurs on VLAR/BLC VLAR too.
For such tasks pulse search is used on almost every icfft, so possible weirdness from SoG-specific work enqueuing is minimal. Omission of FLOP counting could still be present in such tasks too.
ID: 1830845 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Send message
Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1830851 - Posted: 16 Nov 2016, 17:26:53 UTC - in response to Message 1830845.  
Last modified: 16 Nov 2016, 17:48:59 UTC

In my experience, it's universal, but I'll check a few cases. First, a BLC VLAR (blc3_2bit_guppi_57424_40882_HIP63608_OFF_0012.16764.831.17.26.83.vlar):

<prog>0.37921291</prog>
<fraction_done>0.723879</fraction_done>

I'll keep an eye on other types as they roll through.

Next up VHAR: 07au09aa.27444.1759971.16.43.29:

<prog>0.08652382</prog>
<fraction_done>0.700190</fraction_done>

And to complete the set, mid-AR: 07au09aa.27444.1760789.16.43.39

<prog>0.22397651</prog>
<fraction_done>0.699606</fraction_done>

Methodology - open Windows explorer window to show the slot directory (all this is being done during live running, single task on GTX 750Ti). Let it run long enough to be interesting, select both files, right-click to open both files in Notepad++, separate tabs at the same time. Copy and paste data without allowing NP++ to refresh files if they change in the meantime.
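
For a quick comparison, the gap between the two meters can be expressed as the ratio of <fraction_done> to <prog>. The following is only a minimal sketch of that arithmetic; the helper name meter_ratio is hypothetical and is not part of the app or of the BOINC API.

```cpp
// Ratio between the BOINC-reported meter and the checkpoint meter.
// meter_ratio is a hypothetical helper used only for this illustration.
double meter_ratio(double prog, double fraction_done) {
    return fraction_done / prog;
}

// Readings copied from the three samples above.
const double kVlarRatio  = meter_ratio(0.37921291, 0.723879);  // BLC VLAR
const double kVharRatio  = meter_ratio(0.08652382, 0.700190);  // VHAR
const double kMidArRatio = meter_ratio(0.22397651, 0.699606);  // mid-AR
```

With these readings the ratio is a little under 2 for the BLC VLAR task, a little over 3 for the mid-AR task, and about 8 for the VHAR task.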
ID: 1830851 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1830853 - Posted: 16 Nov 2016, 17:44:12 UTC - in response to Message 1830851.  

Much stronger on VHAR, it seems.
But ~2x on VLAR too. So most probably it's a combination of 2 different causes.
The stronger (VHAR) case is unfixable without a performance drop. But why the 2x difference happens on VLAR is worth checking.

Do you have any idea from what app-reported data both values are calculated?
ID: 1830853 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Send message
Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1830857 - Posted: 16 Nov 2016, 17:55:41 UTC - in response to Message 1830853.  

do you have any idea from what app-reported data both values calculated?

No, I don't, but I think I have a machine downstairs with both your source code and the BOINC API code loaded, so I'll try walking it.

<fraction_done> must be done by/during a BOINC API call.

<prog> - these days - is a SETI-only value, generated during checkpointing.

In both cases, I presume we'll be looking at the parameters passed, or static variables referenced, in a standard function call.
ID: 1830857 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1830862 - Posted: 16 Nov 2016, 18:12:10 UTC - in response to Message 1830857.  
Last modified: 16 Nov 2016, 18:18:05 UTC

one of those...

// fraction_done sets the boinc_fraction_done() with a smooth transition 
// from using "progress" to "remaining" to determine the fraction done.
// The speed of the transition is determined by PROG_POWER.  6 means that
// "remaining" is the dominant term for about the final 1/6th of the run.
#define PROG_POWER 6
void fraction_done(double progress,double remaining) {
  double prog2=1.0-remaining;
//double weight = pow(prog2, PROG_POWER);
  double prog2_2 = prog2*prog2;
  double prog2_4 = prog2_2*prog2_2;
  double weight = prog2_2 * prog2_4; // using pow to compute integer power is wasteful!
  progress=std::min(progress,1.0);
  double prog=progress*(1.0-weight)+prog2*weight;
  boinc_fraction_done(prog);
}


And if the MSVC code highlighting is correct, the only place SoG calls it is at the end of the main loop, with (1, 0) params %)

So no idea where BOINC takes those values from.
ID: 1830862 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1830863 - Posted: 16 Nov 2016, 18:30:00 UTC - in response to Message 1830862.  

OK, it seems the checkpoint reports real progress as it's counted.
And the same value is reported in the "restarted @ %" stderr message.

What BOINC reports as progress, no idea; perhaps some fake value.

Throughout the app, progress is calculated in a very complex way, separately for each signal type.
Not sure the GPU build follows these calculations or needs them.
Now the question is how the progress variable is linked with FLOPs.
ID: 1830863 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Send message
Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1830864 - Posted: 16 Nov 2016, 18:31:53 UTC - in response to Message 1830862.  

I haven't walked the code yet, but I've mocked up a spreadsheet to reproduce those calculations.

Assuming for the sake of argument that progress is valid input...

If Remaining is zero, boinc_fraction_done is 1.
If Remaining is (1-progress), boinc_fraction_done == progress
If Remaining is small, boinc_fraction_done is similar to my observations.
ID: 1830864 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1830868 - Posted: 16 Nov 2016, 18:40:31 UTC - in response to Message 1830863.  
Last modified: 16 Nov 2016, 18:55:58 UTC

Well, no, FLOPs and progress are not connected.
FLOPs are calculated separately, and also in a complex way.
This is just for spikes:
state.FLOP_counter += 5 * (double) fftlen * log((double) fftlen) / log(2.0);

Nice overhead :P indeed.

Perhaps there is a way to compute such values for whole task with less overhead than for each and every separate iteration...

But where are FLOPs used currently (besides a nice line in the stderr output)?
ID: 1830868 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1830871 - Posted: 16 Nov 2016, 18:45:47 UTC - in response to Message 1830864.  
Last modified: 16 Nov 2016, 18:50:46 UTC

Regarding progress, I found my own hack already in place to feed BOINC a simpler but more correct progress value:

#if SIGNALS_ON_GPU
boinc_fraction_done((double)icfft/num_cfft);
#else

This reflects real (but non-linear) progress in a much simpler way than the original code (which is actually non-linear too).

Now I just need to drop the old-style progress calculations and compute the progress variable the same way. Then the checkpoint and BOINC will:
1) agree
2) show ~100% at the end of the task.
As I understand it, the checkpoint currently reports something far less than 100% at the end, right?
ID: 1830871 · Report as offensive
EdwardPF
Volunteer tester

Send message
Joined: 26 Jul 99
Posts: 389
Credit: 236,772,605
RAC: 374
United States
Message 1830872 - Posted: 16 Nov 2016, 18:46:00 UTC - in response to Message 1830868.  

fwiw

I looked at the last 100 verified WUs, and here is a summary of the WUs not "wingmanned" by another SoG:

angle range   SOG off by X   wingman's app
-----------   ------------   -------------
  0.413655          5.25     SETI@home v8 v8.00 windows_intelx86
  0.413655          5.25     SETI@home v8 v8.00 windows_intelx86
  0.413655          5.25     SETI@home v8 v8.00 windows_intelx86

  0.727836          6.26     SETI@home v8 v8.00 windows_intelx86
  0.727836          3.91     SETI@home v8 v8.00 (opencl_nvidia_mac) x86_64-apple-darwin
  0.727836          6.26     SETI@home v8 v8.00 windows_intelx86
  0.727836          6.26     SETI@home v8 v8.00 windows_intelx86
  0.727836          3.83     SETI@home v8 v8.00 (opencl_ati5_mac) x86_64-apple-darwin

  0.725423          6.25     SETI@home v8 v8.00 windows_intelx86

  0.447556          5.43     SETI@home v8 v8.00 windows_intelx86
  0.447556          5.43     SETI@home v8 v8.00 windows_intelx86
  0.447556          5.43     SETI@home v8 v8.00 windows_intelx86
  0.447556          5.43     SETI@home v8 v8.03 x86_64-apple-darwin
  0.447556          5.43     SETI@home v8 v8.03 x86_64-apple-darwin
  0.447556          2.74     SETI@home v8 v8.00 (opencl_ati5_mac) x86_64-apple-darwin

  0.008275          1.73     SETI@home v8 v8.00 windows_intelx86
  0.008275          1.89     SETI@home v8 v8.00 windows_intelx86
  0.008275          1.47     SETI@home v8 v8.00 windows_intelx86
  0.008275          1.47     SETI@home v8 v8.00 windows_intelx86
  0.008275          3.71     SETI@home v8 Anonymous platform (ATI GPU)
  0.008275          3.44     SETI@home v8 v8.00 (opencl_ati5_mac) x86_64-apple-darwin

  0.442300          1.00     SETI@home v8 v8.19 (opencl_ati5_nocal) windows_intelx86

  0.442194          5.55     SETI@home v8 v8.00 (cuda50) windows_intelx86

  0.480524          5.76     SETI@home v8 v8.00 (cuda42) windows_intelx86

  1.145850        297.11     SETI@home v8 v8.00 (cuda50) windows_intelx86 WU#5291893088

151.087829      16235.11     SETI@home v8 v8.00 windows_intelx86 WU#5291876397



Ed F
ID: 1830872 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1830873 - Posted: 16 Nov 2016, 18:49:17 UTC - in response to Message 1830872.  

Could you explain the table, and the second column in particular?
ID: 1830873 · Report as offensive