Gflop estimates?
Author | Message |
---|---|
petri33 Joined: 6 Jun 02 Posts: 1668 Credit: 623,086,772 RAC: 156 |
Hi, it's been a while since I looked into the Boincmgr task properties. Today I did. A task that took approximately 50 seconds was labeled as having over 15,000 something. Another task that took just over three minutes (300+ seconds) was labeled as having 5,600 something. The something is GFLOPs. I'd like to think that there is still some optimizing to be done. What do you think? To overcome Heisenbergs: "You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones |
EdwardPF Joined: 26 Jul 99 Posts: 389 Credit: 236,772,605 RAC: 374 |
As you know, folks are working on optimizing the GPU performance ... But it is interesting looking at your stats, for example your Task 5284613511 and your wingman's Task 5284613512 (work unit 3216223047762). You are reporting a flopcount of 32,103,790,054,745.789062; your wingman is reporting a flopcount of 3,216,223,047,762.2627, about 10% of your number. It seems "someone" is counting flops wrong (or I can't copy/paste very well). Any ideas out there in smart-people land? Ed F P.S. It looks like "SETI@home v8 v8.19 (opencl_nvidia_SoG) windows_intelx86" undercounts by 90%??? |
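The "10%" and "90%" figures in the post above are plain ratio arithmetic on the two reported flopcounts. A minimal sketch, assuming nothing beyond the two numbers quoted in the post; the function names `flop_ratio` and `undercount_percent` are made up for illustration and do not come from any SETI@home or BOINC code:

```cpp
// Ratio of the wingman's reported flopcount to your own.
// Hypothetical helper, not part of any SETI@home or BOINC source.
double flop_ratio(double wingman_flops, double own_flops) {
    return wingman_flops / own_flops;
}

// How far the wingman's count falls short of yours, in percent.
double undercount_percent(double wingman_flops, double own_flops) {
    return (1.0 - flop_ratio(wingman_flops, own_flops)) * 100.0;
}
```

Called with the two values from the post (3216223047762.2627 and 32103790054745.789062), `flop_ratio` comes out just above 0.10 and `undercount_percent` just under 90.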
petri33 Joined: 6 Jun 02 Posts: 1668 Credit: 623,086,772 RAC: 156 |
As you know, folks are working on optimizing the GPU performance ... I have been cutting the lines of code that do nothing, but I've been very careful not to cut the lines that are affecting the 'estimated' FPU ops in the code. I've optimized the autocorr path to do a quarter of the original flops and memory access (that is not being counted anywhere) but I still reflect the original number. My observation was about the hardness of a task before computing. To overcome Heisenbergs: "You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones |
EdwardPF Joined: 26 Jul 99 Posts: 389 Credit: 236,772,605 RAC: 374 |
Yes, your numbers correctly reflect "SETI@home v8 v8.00 windows_intelx86" (99.95%). I would like to hear from Raistmer about the apparent divergence from the standard release, if he knows (not that it makes any real difference). Now back to your question ... I love what you folks are doing!! Ed F |
petri33 Joined: 6 Jun 02 Posts: 1668 Credit: 623,086,772 RAC: 156 |
Yes, your numbers correctly reflect "SETI@home v8 v8.00 Thank You. I know Raistmer knows. It is just a glitch in a code path that reports some human readable digits and the real science is not affected. To overcome Heisenbergs: "You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones |
jason_gee Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
Hi, Yes. Best-case generic/baseline code, CPU or GPU, tends to ~5% compute efficiency with little latency hiding (except some already in libraries). Multi-instance GPU reaches in the region of 10-15% in good cases, and your hand-tuned GPU code ~20-25%. With refined implementations along the same path, up to ~50% should be feasible (though with increasingly more work for diminishing returns, and more hardware specificity). Alternative methods of lower complexity than the Fourier analysis being used exist, which could easily drive a reduced/sparse form of the traditional Fourier analyses to draw out the same numbers. These include wavelet/chirplet multiresolution analysis and AI/deep-learning feature recognition approaches. Depending on the combination of techniques/algorithms used, that could reduce the problem from order number_of_chirps*nffts*ndatapoints^2*log(ndatapoints) down to O(n log n), though memory-practical algorithms will likely tile/block to something more like O(k*sqrt(n)*sqrt(n)*sqrt(n)*log(sqrt(n))). So yeah, IMO there is still a lot of room for optimisation at hardware and algorithmic levels, though the infrastructure to support the more futuristic end isn't there just yet. The modularity and quality of the code (CPU and GPU in general) need to ramp up a few notches before some of the possible paths become feasible. [Edit:] Analogy: a lot is possible with a Model T chassis, but a lot more is possible with an F-22 Raptor frame, despite both being more or less a similar empty vessel. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to Live By: The Computer Science of Human Decisions. |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Yes, your numbers correctly reflect "SETI@home v8 v8.00 Because that number doesn't seem to allow correct credits anyway, I didn't follow its behavior in recent releases. Any more observations (low-AR, high-AR) about the % of missing FLOPs? Is it the same % or quite different? (This would help to tell where some counting is missed.) Perhaps it was deliberately disabled to allow a more linear percentage progress over the task duration (though it's far from linear anyway). For VHAR, the SoG app enqueues almost all possible work to the GPU very fast and then just sits waiting for the GPU. That's where those "99% done" and the long wait afterwards come from. One of the attempts was to simplify progress reporting to just the fraction of icfft done (but actually it will hit the same issue). Perhaps in that attempt I disabled some of the FLOP counting. Also, FLOP counting as it stands is quite a costly operation (it contains log(), for example). If FLOP counting needs to be restored, I would first look at how to simplify it. SETI apps news We're not gonna fight them. We're gonna transcend them. |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14649 Credit: 200,643,578 RAC: 874 |
Yes, your numbers correctly reflect "SETI@home v8 v8.00 I suspect you're right that flop counting and progress reporting are closely related, so it may be worth reminding you that we currently seem to have two different progress meters. From state_sah (checkpoint file): <prog>0.24597392</prog> From boinc_task_state.xml (reported via manager): <fraction_done>0.581406</fraction_done> Same task, as close as possible to the same time. |
EdwardPF Joined: 26 Jul 99 Posts: 389 Credit: 236,772,605 RAC: 374 |
I hope this isn't making life miserable for you folks!! My only purpose here was to point out that since we have at least 3 competing MB standards, they all ought to report nearly the same "stats". Since AP does not report flopcount, it seems someone considered it an unnecessary overhead. I like seeing the count so I can compare my hardware with others' and know mine is running on par and not ill-configured. 'Nuf said, I guess; sorry for the distraction. Ed F |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
VHAR or VLAR? Better to re-check whether such a strong discrepancy occurs on VLAR/BLC VLAR too. For such tasks pulse search is used on almost every icfft, so possible weirdness from SoG-specific work enqueuing is minimal. Omission of FLOP counting could still be present in such tasks too. SETI apps news We're not gonna fight them. We're gonna transcend them. |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14649 Credit: 200,643,578 RAC: 874 |
In my experience, it's universal, but I'll check a few cases. First, a BLC VLAR (blc3_2bit_guppi_57424_40882_HIP63608_OFF_0012.16764.831.17.26.83.vlar): <prog>0.37921291</prog> <fraction_done>0.723879</fraction_done> I'll keep an eye on other types as they roll through. Next up VHAR: 07au09aa.27444.1759971.16.43.29: <prog>0.08652382</prog> <fraction_done>0.700190</fraction_done> And to complete the set, mid-AR: 07au09aa.27444.1760789.16.43.39 <prog>0.22397651</prog> <fraction_done>0.699606</fraction_done> Methodology - open Windows explorer window to show the slot directory (all this is being done during live running, single task on GTX 750Ti). Let it run long enough to be interesting, select both files, right-click to open both files in Notepad++, separate tabs at the same time. Copy and paste data without allowing NP++ to refresh files if they change in the meantime. |
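To put the three measurements above side by side, one can divide the manager's `fraction_done` by the checkpoint's `prog` for each task type. This is only a sketch for comparing the numbers quoted in the post; the `Sample` struct and both function names are invented for illustration:

```cpp
#include <string>
#include <vector>

// One paired reading of the two progress meters (hypothetical type).
struct Sample {
    std::string kind;       // task type as named in the post
    double prog;            // <prog> from the checkpoint file
    double fraction_done;   // <fraction_done> from boinc_task_state.xml
};

// How many times larger the manager's value is than the checkpoint's.
double meter_ratio(const Sample& s) {
    return s.fraction_done / s.prog;
}

// The three readings quoted in the post above.
std::vector<Sample> observed_samples() {
    return {
        {"BLC VLAR", 0.37921291, 0.723879},
        {"VHAR",     0.08652382, 0.700190},
        {"mid-AR",   0.22397651, 0.699606},
    };
}
```

For these readings the ratio is roughly 1.9 for the BLC VLAR task, roughly 8.1 for VHAR and roughly 3.1 for mid-AR, so the gap between the two meters grows with angle range in this small sample.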
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Much stronger on VHAR, it seems. But ~2x on VLAR too. So most probably it's a combination of 2 different causes. The stronger (VHAR) case is unfixable without a performance drop. But why the 2x difference happens on VLAR is worth checking. Do you have any idea from what app-reported data both values are calculated? SETI apps news We're not gonna fight them. We're gonna transcend them. |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14649 Credit: 200,643,578 RAC: 874 |
do you have any idea from what app-reported data both values calculated? No, I don't, but I think I have a machine downstairs with both your source code and the BOINC API code loaded, so I'll try walking it. <fraction_done> must be done by/during a BOINC API call. <prog> - these days - is a SETI-only value, generated during checkpointing. In both cases, I presume we'll be looking at the parameters passed, or static variables referenced, in a standard function call. |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
One of those...

```cpp
// fraction_done sets the boinc_fraction_done() with a smooth transition
// from using "progress" to "remaining" to determine the fraction done.
// The speed of the transition is determined by PROG_POWER. 6 means that
// "remaining" is the dominant term for about the final 1/6th of the run.
#define PROG_POWER 6
void fraction_done(double progress, double remaining) {
    double prog2 = 1.0 - remaining;
    //double weight = pow(prog2, PROG_POWER);
    double prog2_2 = prog2 * prog2;
    double prog2_4 = prog2_2 * prog2_2;
    double weight = prog2_2 * prog2_4; // using pow to compute integer power is wasteful!
    progress = std::min(progress, 1.0);
    double prog = progress * (1.0 - weight) + prog2 * weight;
    boinc_fraction_done(prog);
}
```

And if the MSVC code highlighting is correct, the only place SoG calls it is at the end of the main loop, with 1,0 params %) So I have no idea where BOINC takes those values from. SETI apps news We're not gonna fight them. We're gonna transcend them. |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
OK, it seems the checkpoint reports real progress as it's counted. And the same value is reported in the "restarted @ %" stderr message. What BOINC reports as progress, I have no idea; perhaps some fake value. Though in the app, progress is calculated in a very complex way, separately for each signal type. I'm not sure the GPU build follows these calculations or needs them. Now the question is how the progress variable is linked with FLOPs. SETI apps news We're not gonna fight them. We're gonna transcend them. |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14649 Credit: 200,643,578 RAC: 874 |
I haven't walked the code yet, but I've mocked up a spreadsheet to reproduce those calculations. Assuming for the sake of argument that progress is valid input... If Remaining is zero, boinc_fraction_done is 1. If Remaining is (1-progress), boinc_fraction_done == progress If Remaining is small, boinc_fraction_done is similar to my observations. |
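The spreadsheet checks above can also be reproduced in a few lines of code. The sketch below repeats the arithmetic of the `fraction_done()` excerpt quoted earlier in the thread, but returns the blended value instead of passing it to `boinc_fraction_done()`; the name `blended_fraction` is hypothetical, and `progress` is taken as valid input, as in the post:

```cpp
#include <algorithm>

// Same blend as the quoted fraction_done(): weight = (1 - remaining)^6,
// result = progress * (1 - weight) + (1 - remaining) * weight.
// Hypothetical stand-alone version; it does not call the BOINC API.
double blended_fraction(double progress, double remaining) {
    double prog2 = 1.0 - remaining;
    double prog2_2 = prog2 * prog2;
    double weight = prog2_2 * prog2_2 * prog2_2;  // prog2 to the 6th power
    progress = std::min(progress, 1.0);
    return progress * (1.0 - weight) + prog2 * weight;
}
```

With `remaining == 0` the weight is 1 and the result is 1 whatever `progress` is; with `remaining == 1 - progress` both terms carry the same value and the result is `progress`. With a small `remaining`, for example `blended_fraction(0.24597392, 0.1)`, the result is pulled well above `progress` (a little over 0.59); the 0.1 here is an arbitrary illustration, not a value read from a real task.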
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Well, no, FLOPs and progress are not connected. FLOPs are calculated separately, and in a complex way too. This is just for spikes: state.FLOP_counter += 5 * (double) fftlen * log((double) fftlen) / log(2.0); Nice overhead :P indeed. Perhaps there is a way to compute such values for the whole task with less overhead than for each and every separate iteration... But where are FLOPs used currently (besides the nice line in the stderr output)? SETI apps news We're not gonna fight them. We're gonna transcend them. |
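One possible way to cut that overhead, sketched here under two assumptions that the thread does not confirm: that `fftlen` is always a power of two, and that the number of FFTs of a given length is known before the loop runs. The formula is the 5 * N * log2(N) term from the quoted line; all function names are made up for illustration:

```cpp
#include <cstdint>

// Integer log2, valid only when n is an exact power of two (assumption).
int log2_pow2(std::uint64_t n) {
    int k = 0;
    while (n > 1) {
        n >>= 1;
        ++k;
    }
    return k;
}

// Spike cost of one FFT of length fftlen: 5 * N * log2(N),
// without any floating-point log() calls.
double spike_flops(std::uint64_t fftlen) {
    return 5.0 * static_cast<double>(fftlen) * log2_pow2(fftlen);
}

// Cost of `count` FFTs of the same length, added to the counter once
// instead of once per iteration.
double spike_flops_batch(std::uint64_t fftlen, std::uint64_t count) {
    return static_cast<double>(count) * spike_flops(fftlen);
}
```

For example, `spike_flops(1024)` is 5 * 1024 * 10 = 51200, and a loop that runs 1000 such FFTs could add `spike_flops_batch(1024, 1000)` to the counter in a single step. Whether the other signal types can be batched the same way is not something the thread settles.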
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Regarding progress, I found my own hack, already in place, that feeds BOINC a simpler but more correct progress value:

```cpp
#if SIGNALS_ON_GPU
    boinc_fraction_done((double)icfft/num_cfft);
#else
```

This reflects real (but non-linear) progress in a much simpler way than the original code (which is actually non-linear too). Now I just need to drop the old progress calculations and compute the same value for the progress variable. Then the checkpoint and BOINC will: 1) agree; 2) show ~100% at the end of the task. As I understand it, the checkpoint currently reports something far less than 100% at the end, right? SETI apps news We're not gonna fight them. We're gonna transcend them. |
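If the same value is to feed both the checkpoint and BOINC, it may help to compute it in one place. A minimal sketch, assuming `icfft` is the current loop index and `num_cfft` the total count, as the quoted line suggests; the function name and the clamping are additions for illustration, not part of the app:

```cpp
// Fraction of the icfft loop completed, clamped to [0, 1].
// Hypothetical helper; the zero-total guard and clamping are assumptions.
double icfft_fraction(long icfft, long num_cfft) {
    if (num_cfft <= 0) {
        return 0.0;  // nothing to do yet, avoid dividing by zero
    }
    double f = static_cast<double>(icfft) / static_cast<double>(num_cfft);
    if (f < 0.0) return 0.0;
    if (f > 1.0) return 1.0;
    return f;
}
```

Both the `<prog>` value written at checkpoint time and the argument to `boinc_fraction_done()` could then come from the same call, which would make the two meters agree and reach 1.0 when `icfft == num_cfft`.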
EdwardPF Joined: 26 Jul 99 Posts: 389 Credit: 236,772,605 RAC: 374 |
fwiw, I looked at the last 100 verified WUs, and here is the summary of the WUs not "wing manned" by another SOG:

angle range   SOG off by X
-----------   ------------
0.413655          5.25   SETI@home v8 v8.00 windows_intelx86
0.413655          5.25   SETI@home v8 v8.00 windows_intelx86
0.413655          5.25   SETI@home v8 v8.00 windows_intelx86
0.727836          6.26   SETI@home v8 v8.00 windows_intelx86
0.727836          3.91   SETI@home v8 v8.00 (opencl_nvidia_mac) x86_64-apple-darwin
0.727836          6.26   SETI@home v8 v8.00 windows_intelx86
0.727836          6.26   SETI@home v8 v8.00 windows_intelx86
0.727836          3.83   SETI@home v8 v8.00 (opencl_ati5_mac) x86_64-apple-darwin
0.725423          6.25   SETI@home v8 v8.00 windows_intelx86
0.447556          5.43   SETI@home v8 v8.00 windows_intelx86
0.447556          5.43   SETI@home v8 v8.00 windows_intelx86
0.447556          5.43   SETI@home v8 v8.00 windows_intelx86
0.447556          5.43   SETI@home v8 v8.03 x86_64-apple-darwin
0.447556          5.43   SETI@home v8 v8.03 x86_64-apple-darwin
0.447556          2.74   SETI@home v8 v8.00 (opencl_ati5_mac) x86_64-apple-darwin
0.008275          1.73   SETI@home v8 v8.00 windows_intelx86
0.008275          1.89   SETI@home v8 v8.00 windows_intelx86
0.008275          1.47   SETI@home v8 v8.00 windows_intelx86
0.008275          1.47   SETI@home v8 v8.00 windows_intelx86
0.008275          3.71   SETI@home v8 Anonymous platform (ATI GPU)
0.008275          3.44   SETI@home v8 v8.00 (opencl_ati5_mac) x86_64-apple-darwin
0.442300          1.00   SETI@home v8 v8.19 (opencl_ati5_nocal) windows_intelx86
0.442194          5.55   SETI@home v8 v8.00 (cuda50) windows_intelx86
0.480524          5.76   SETI@home v8 v8.00 (cuda42) windows_intelx86
1.145850        297.11   SETI@home v8 v8.00 (cuda50) windows_intelx86   WU#5291893088
151.087829    16235.11   SETI@home v8 v8.00 windows_intelx86   WU#5291876397

Ed F |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Could you explain the table, and the second column in particular? SETI apps news We're not gonna fight them. We're gonna transcend them. |