Posts by Raistmer

1) Message boards : Number crunching : Multicore smartphones - how to use their full potential (Message 1906210)
Posted 1 day ago by Profile Raistmer
Post:

Anyway, when the phone is idle (second picture) I see that only one core seems active, while when running BOINC (first picture) it seems all are active.
Cheers

That's how it should be when each task is crunched on its own core and the phone uses all its computation nodes for crunching.
In my case, 2 tasks are launched by BOINC but only 1 of the 2 CPUs remains active.
So, each task takes ~2 times more elapsed time than CPU time to process, and the phone itself operates as a single-core device.
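For anyone who wants to check this on their own device, here is a minimal sketch (my illustration, not a BOINC tool) of the same diagnostic: compare a CPU-bound task's elapsed (wall-clock) time to its CPU time. Run two copies at once; if each reports a ratio near 2, they are being time-sliced on one core instead of running on separate cores.

#include <chrono>
#include <cstdio>
#include <ctime>

int main() {
    auto wall_start = std::chrono::steady_clock::now();
    std::clock_t cpu_start = std::clock();

    volatile double x = 0;                        // CPU-bound busy work
    for (long i = 0; i < 200000000L; ++i) x = x + i * 1e-9;

    double cpu = double(std::clock() - cpu_start) / CLOCKS_PER_SEC;
    double wall = std::chrono::duration<double>(
                      std::chrono::steady_clock::now() - wall_start).count();

    // ~1 means the task owned a core; ~2 means two tasks shared one core.
    std::printf("elapsed/CPU ratio: %.2f\n", wall / cpu);
}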
2) Message boards : Number crunching : Acer Iconia A510 tablet - how to crunch? (Message 1906206)
Posted 1 day ago by Profile Raistmer
Post:
Are there any Acer Iconia A510 owners?

How are you able to use this device for BOINC?
It seems it can't charge while switched ON at all...
3) Message boards : Number crunching : Task Deadline Discussion (Message 1906005)
Posted 2 days ago by Profile Raistmer
Post:
I admit that my example doesn’t cover all deadline misses.
Let's consider the separate categories:


1) "Drive-bys": Those users who visit the project because something they've read somewhere makes it seem interesting. They sign up, download a bunch of tasks, then immediately change their minds for some reason, and just simply drive away, leaving all their downloaded tasks to eventually time out. The timed-out tasks will never be replaced. A reduction in deadline times would simply clear out those WUs and their wingmens' pendings that much sooner.


This doesn't fall into “my case”.
Deadline reduction will not change the DB load (by load I mean DB record changes per second) because this load depends solely on the quota and on the creation rate of drive-away hosts. But it will decrease the average DB size (because expired WUs will be eliminated faster).
A much more effective way to improve things here, though, is better quota management (reducing the number of initial WUs available to such hosts). That would affect both components: decrease the DB load and decrease the average DB size. So, a deadline change is not the right way to get an improvement here.


2) Hosts that have died, or been abruptly replaced, or simply been shut down by established users for any number of reasons: Again, like in group 1, timed out tasks will not be replaced. Shorter deadlines will help.

Not “my case”, and quota management doesn't help here.
Shortening the deadline will not decrease the DB load, but it will decrease the average DB size.


3) Hosts with "ghosts": Those who have accumulated some number of tasks which still exist in the DB but, for whatever reason, no longer reside on the host and are unknown to the scheduler. These tasks have already been replaced on the host, but they can't be resent to other hosts until they time out, thus tying up DB space and resources unnecessarily. Shorter deadlines would certainly help with these, but so, in some cases, would more responsible users, or periodic temporary reactivation of the "lost" task resend mechanism, or even a web site method for users to trigger either resending or abandonment of such tasks.

If ghosts expire by missing the deadline and a host accumulates ghosts regularly, it's “my case”.


4) Hosts who steadily download large numbers of tasks without successfully completing any (such as the one with 6,000+ in the queue that I noted in earlier posts): These hosts are likely generating ghosts, so the daily downloads constitute a continuous, immediate replacement stream, long before the task deadlines are ever reached. Shorter deadlines would have a significant impact, though better quota management and some sort of user notification would be even more important areas to address.

“My case”. Deadline shortening will have a big impact, but a negative one – see my example.
Note: if it happens “long before the task deadlines are ever reached”, then it's out of consideration regarding deadlines. I consider only ghosts timed out by the deadline, of course. The others are a subject for the quota management change thread.


5) Hosts making sporadic contact: There are some with high turnaround times who also seem to be able to download an excessive number of tasks. They still seem to be doing some processing, occasionally returning batches of completed tasks, but they also have large numbers of time-outs. Reduced deadlines may or may not have an impact on those hosts, whose issues are probably better addressed through improved quota management.

If a host misses deadlines in a regular fashion, it's “my case”.


6) Hosts with temporary issues: There are certainly some who miss their deadlines for a short while, then resume normal processing. Perhaps the users went on extended vacations or business trips, took a while to replace a failed part in their PC, needed to cut their electric bill for awhile, etc. In some cases, shorter deadlines may cause tasks on these hosts to time out once they resume operations, even though those tasks might still be successfully completed. I think these hosts represent a small portion of the user base, but certainly shouldn't be ignored in the discussion.

Not “my case”. Deadline shortening will reduce the chances for such hosts to recover, so I expect a negative overall impact.


7) Hosts running multiple projects: Depending on resource shares, and BOINC's ability to manage same, some of these hosts seem to push the deadline limits time and again, sometimes even exceeding them.

If the deadline is missed, it's missed on a regular basis, so it's “my case”.


8) Slow, low-volume, low-power, and/or part-time hosts: "Slow" is a relative term in today's processing environment.

It’s “my case” too.

So, out of the 8 listed types of deadline miss, a clear positive impact from deadline shortening could be achieved only for type 2. For most of the others the impact will be negative.

And I think the share of type 2 deadline misses can't justify a deadline change.
4) Message boards : Number crunching : Task Deadline Discussion (Message 1905987)
Posted 2 days ago by Profile Raistmer
Post:
The first is assuming that the "bad" host is always sent a new task to replace each one that times out.

Most of the time. Otherwise it's a 1-task-per-day host with overhead so low that we don't speak about it here at all.


The second, I think, is the 24-hour purge cycle for validated WUs. In your example, even a 6-day deadline is an extremely short one, compared to either the existing deadlines or my proposed reduction. Having a purge cycle that represents such a high percentage of the deadline window likely inflates your DB occupancy numbers. Since I found that a deadline of slightly over 48 days was the average in my sample, a 1 day purge cycle would represent about 2.1% of that average. So, if you reduce the 48 days to 6, wouldn't it also be necessary to reduce the 24-hour purge cycle to about 3 hours, and use that for both your "long" and "short" deadline calculations?

No matter what the relative numbers are. With any numbers, the short deadline in my example will consume more DB resources than the long one.
And it's not a purge cycle, it's the computation cycle for the good host. The purge is considered to be immediate after the good result returns for validation.
Adding a non-zero purge only increases the difference between the long and short cases (because the inflated DB will last longer).

@would make a 4.8 day "short" deadline more appropriate than the 2 day one in your example, as well.@ The numbers in my example are completely arbitrary, chosen just to show the effect of deadline shortening in numbers. One can redo the proportion with more realistic numbers if one wants.
5) Message boards : Number crunching : Task Deadline Discussion (Message 1905894)
Posted 3 days ago by Profile Raistmer
Post:

There are several identifiable categories of hosts which miss the deadlines.


I'm out of time currently, so I will give these categories detailed consideration later.
6) Message boards : Number crunching : Task Deadline Discussion (Message 1905887)
Posted 3 days ago by Profile Raistmer
Post:
Well, just a simple example:
Let's consider 2 hosts, one missing the deadline on a regular basis, the other doing its work OK.
Let's arbitrarily set (just for simplicity) the deadlines to 6 days and 2 days (long and short) and the turnaround time for the good host to 1 day.
Long deadline case:

Day 1:
bad - 1, good - 1 (the number is the number of records the DB makes to account for each sent WU) DB size: 2
Day 2:
bad - 0, good - 1 DB size: 2
Day 3:
bad - 0, good - 1 DB size: 2
Day 4:
bad - 0, good - 1 DB size: 2
Day 5:
bad - 0, good - 1 DB size: 2
Day 6:
bad - 0, good - 1 DB size: 2
[Day 7:
bad - 1, good - 1 (maybe a resend of the bad host's expired task, but it doesn't matter, each sent WU needs its own record). DB size: 3, because the resend needs a day to be processed, so the DB holds 3 records: the expired one, the new one for bad, the resend for good
]
So, the total number of records made in the DB over 6 days is 1+6=7.
The mean size of the DB for days 2-7 (6 days; day 1 isn't a recurring one, so it's left out of the averaging) is 2 1/6.

Now the short deadline case:
Day 1:
bad - 1, good - 1 DB size: 2
Day 2:
bad - 0, good - 1 DB size: 2
Day 3:
bad - 1, good - 1 (it can be the same task as the bad host had the previous days, but no matter, it's a separate record for the new WU instance) DB size: 3
Note here that the old record for the bad host still remains on this day, because the good host spends a day processing the resend.
So, on day 3 we actually have 3 simultaneous records: the expired one for bad, the newly sent one for bad, and the resend for good.
Day 4:
bad - 0, good - 1 (here the DB shrinks to its "normal" size of 2 records) DB size: 2
Day 5:
bad - 1, good - 1 DB size: 3
Day 6:
bad - 0, good - 1 DB size: 2
[Day 7:
bad - 1, good - 1 DB size: 3]

So, the total number of records made in the DB over 6 days is 3+6=9.
The mean size of the DB for days 2-7 (6 days; day 1 isn't a recurring one, so it's left out of the averaging) is 2 1/2 = 2.5.

As one can see, the short deadline case increases both the DB load and the average storage size.

Where is the flaw?
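To make the tallies above easy to re-check with other numbers, here is a small sketch (my formalization of the example, not project code) of the two-host model: the good host returns one task per day, the bad host lets every task expire exactly at the deadline, and each expired record lingers one extra day in the DB while the resend is crunched.

#include <cstdio>

int main() {
    const int window = 6;            // days 1..6 create records; days 2..7 are averaged
    const int deadlines[] = {6, 2};  // long and short deadlines, in days

    for (int deadline : deadlines) {
        // DB load: one record per day for the good host, plus one record
        // for the bad host every `deadline` days.
        int records = window + window / deadline;

        // DB size: 2 live records (one per host) on every averaged day,
        // plus 1 extra on each day an expired record awaits its resend.
        int linger_days = window / deadline;   // expiries land on days d+1, 2d+1, ...
        double mean_size = 2.0 + (double)linger_days / window;

        std::printf("deadline %d days: %d records created, mean DB size %.3f\n",
                    deadline, records, mean_size);
    }
    // Prints 7 records / mean 2.167 for the 6-day deadline and
    // 9 records / mean 2.500 for the 2-day one, matching the tallies above.
}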
7) Message boards : Number crunching : Task Deadline Discussion (Message 1905575)
Posted 3 days ago by Profile Raistmer
Post:

As Raistmer has said, if you reduce the deadline you automatically increase the number of resends...
It seems to me that there's only a very tiny grain of truth to that. Again, using my data, out of 901 tasks still hanging around past the 80% mark, only 23 were eventually validated. The other 878 had to be resent no matter what. Those do not increase the number of resends.

Actually, they do. Both the never-returning hosts, as I said earlier, _and_ those who regularly miss the deadline.
If you trash 1 task per 3 weeks, that's 3 times fewer than if you trash 1 task each week. Over a year you will get 3 times more resends from such hosts (with N trashed tasks per cycle, a 3-week deadline produces about 17N resends per year versus 52N with a 1-week one), plus an increase in the number of such hosts, just because of the increased processing power required to finish in time.

@The other 878 had to be resent no matter what.@ You consider this a one-time event, but actually it should be considered a recurring event!
8) Message boards : Number crunching : Task Deadline Discussion (Message 1905573)
Posted 3 days ago by Profile Raistmer
Post:
Why not look from the other side?

Instead of making a shorter deadline, just increase the limit for the fastest hosts? The ones with very fast returns. They are the ones who actually run empty in the outages.

Nothing complex, something simple to code. Like, if the host returns the work in 1 day, increase the WU limit from 100 to 200 per GPU.

That could satisfy most of the hungry hosts. And it leaves some more room for the future, when even faster GPUs are available.

It doesn't mess with the rest of the SETI community and will not impact the size of the DB.

My 0.02 Cents.

That's the right approach IMO, especially because nothing really prevents doing this manually on each host by introducing "virtual devices" to BOINC (either by re-scheduling CPU<->GPU, or by running multiple BOINC instances, or even by creating some additional app_info.xml-based "accelerators" (not tested, but it seems possible)).
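For reference, a minimal sketch of the anonymous-platform app_info.xml mechanism such tricks build on (the app name matches SETI@home v8, but the executable name and the numbers here are hypothetical placeholders; whether extra "accelerators" declared this way gain any server-side quota is untested, as said above). A fractional <count> is the standard way to run several task instances per GPU:

<app_info>
  <app>
    <name>setiathome_v8</name>
  </app>
  <file_info>
    <name>MB8_opencl_gpu_app.exe</name>   <!-- hypothetical executable name -->
    <executable/>
  </file_info>
  <app_version>
    <app_name>setiathome_v8</app_name>
    <version_num>800</version_num>
    <avg_ncpus>0.1</avg_ncpus>
    <coproc>
      <type>ATI</type>
      <count>0.5</count>   <!-- fractional count: two task instances share one GPU -->
    </coproc>
    <file_ref>
      <file_name>MB8_opencl_gpu_app.exe</file_name>
      <main_program/>
    </file_ref>
  </app_version>
</app_info>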
9) Message boards : Number crunching : Task Deadline Discussion (Message 1905571)
Posted 3 days ago by Profile Raistmer
Post:
For now, at least, I would say that a 20% across-the-board reduction in deadlines could have a marked benefit to the project, with minimal impact to that tiny percentage of hosts who currently exceed the 80% of deadline threshold, at least based on the sample I analyzed.


Could you precisely formulate the "benefit"? And if the benefit is shrinkage of the BOINC DB, why is it required?
10) Message boards : Number crunching : Task Deadline Discussion (Message 1905568)
Posted 3 days ago by Profile Raistmer
Post:
I am a physicist, not a ghostbuster.
Tullio


One doesn't exclude the other ;) :D
11) Message boards : Number crunching : Task Deadline Discussion (Message 1905337)
Posted 4 days ago by Profile Raistmer
Post:

I don't see how shorter deadlines would increase the rate of re-sends from broken hosts. Maybe from very slow hosts, but from broken hosts they would need to be re-sent at some point anyway. There just should be a better mechanism for not sending many more tasks to these broken hosts.

Tom


Roughly speaking: N resends per 3 or more weeks (whatever the current deadline is) versus N per 1 week (or whatever the suggested deadline is).
That's for the never-returning hosts.
12) Message boards : Number crunching : Are we analyzing the data ? (Message 1905325)
Posted 4 days ago by Profile Raistmer
Post:
I returned a few weeks ago after a little more than a year away from the SETI forums. I was wondering if we are doing anything with all our data yet. From what I gathered, ntpckr died a few years ago as not being doable. I tried searching the boards for something solid but haven't found any real answers yet.

(If this thread is in the wrong topic then could a mod please move it.)

Thanx
Bob

Yes:
https://setiathome.berkeley.edu/forum_forum.php?id=1511
EDIT: and perhaps the wrong boards were "searched".
13) Message boards : Number crunching : Task Deadline Discussion (Message 1905324)
Posted 4 days ago by Profile Raistmer
Post:
One thing that would be fairly light on the servers would be limits set by actual valid results returned. Doing so would "starve" the persistent error/invalid generators, while allowing the "good boys" a few more tasks. This is already done to an extent, but the half-life on returning to normal is pretty short. This might also have a benefit in helping to clear out ghosts (but I haven't really thought about that side of things).


An improvement in quota management is definitely needed. For now, a host with a good + a bad GPU of the same vendor is completely invisible to quota management.
An especially bad case is when the slow GPU is the good one and the fast one is broken - a small rate of good tasks is enough to keep trashing many tasks at a high rate... giving almost nothing good in return.
14) Message boards : Number crunching : Task Deadline Discussion (Message 1905323)
Posted 4 days ago by Profile Raistmer
Post:

If deadlines are reduced this may allow for more work to be sent to 'reliable' hosts so they don't run dry during the weekly outage. The 100 task limit is a pain for modern high-end GPUs and if that limit could be increased it may actually improve processing power.

Tom


How so? You're mixing the 100-tasks-per-device limit with the deadline. From what do you infer that shortening the deadline would automatically increase the 100-per-device limit?

If that limit were increased, yes, performance would increase, but there's no need "to mix salt with hot". Shorter deadlines could also just increase the rate of re-sends from broken hosts, because then they would refresh their locked tasks more often. And this could result in shortening the "100" limit instead of increasing it.
15) Message boards : Number crunching : Task Deadline Discussion (Message 1905305)
Posted 5 days ago by Profile Raistmer
Post:
Until we go into real-time processing, shorter deadlines == lost processing power from partially-involved hosts and nothing more.
Deadlines should be set as long as the current server infrastructure allows.
16) Message boards : Number crunching : TestCase: blc25_2bit_guppi_57895_47387_HIP91358_0034.24610.818.23.46.191.vlar (Message 1904155)
Posted 10 days ago by Profile Raistmer
Post:
Well, I think the difference occurs well before the comparison block starts, at the peak/average calculation stage.
If one wants to get further with this task, one could just tweak the threshold a little (the threshold is stored inside the task itself) and this will make the app report more pulses.
That way even the apps that currently report 10 pulses can report that 11th pulse too. So we can see what they calculated for this task.
The area of interest (in task header) is:
  <analysis_cfg>
    <spike_thresh>24</spike_thresh>
    <spikes_per_spectrum>1</spikes_per_spectrum>
    <autocorr_thresh>17.7999992</autocorr_thresh>
    <autocorr_per_spectrum>1</autocorr_per_spectrum>
    <autocorr_fftlen>131072</autocorr_fftlen>
    <gauss_null_chi_sq_thresh>2.46249485</gauss_null_chi_sq_thresh>
    <gauss_chi_sq_thresh>1.41999996</gauss_chi_sq_thresh>
    <gauss_power_thresh>3</gauss_power_thresh>
    <gauss_peak_power_thresh>3.20000005</gauss_peak_power_thresh>
    <gauss_pot_length>64</gauss_pot_length>
    <pulse_thresh>20.7340908</pulse_thresh>
    <pulse_display_thresh>0.5</pulse_display_thresh>
    <pulse_max>40960</pulse_max>
    <pulse_min>16</pulse_min>
    <pulse_fft_max>8192</pulse_fft_max>
    <pulse_pot_length>256</pulse_pot_length>
    <triplet_thresh>9.73841</triplet_thresh>
    <triplet_max>131072</triplet_max>
    <triplet_min>16</triplet_min>
    <triplet_pot_length>256</triplet_pot_length>
    <pot_overlap_factor>0.5</pot_overlap_factor>
    <pot_t_offset>1</pot_t_offset>
    <pot_min_slew>0.00410000002</pot_min_slew>
    <pot_max_slew>0.0203000009</pot_max_slew>
    <chirp_resolution>0.1665</chirp_resolution>
    <analysis_fft_lengths>262136</analysis_fft_lengths>
    <bsmooth_boxcar_length>8192</bsmooth_boxcar_length>
    <bsmooth_chunk_size>32768</bsmooth_chunk_size>
    <chirps>
    <chirp_parameter_t>
      <chirp_limit>30</chirp_limit>
      <fft_len_flags>262136</fft_len_flags>
    </chirp_parameter_t>
    <chirp_parameter_t>
      <chirp_limit>100</chirp_limit>
      <fft_len_flags>65528</fft_len_flags>
    </chirp_parameter_t>
  </chirps>
  <pulse_beams>1</pulse_beams>
  <max_signals>30</max_signals>
  <max_spikes>8</max_spikes>
  <max_autocorr>8</max_autocorr>
  <max_gaussians>0</max_gaussians>
  <max_pulses>0</max_pulses>
  <max_triplets>0</max_triplets>
  <keyuniq>-12226561</keyuniq>
  <credit_rate>2.8499999</credit_rate>
</analysis_cfg>


In particular, these 2 thresholds:
<pulse_thresh>20.7340908</pulse_thresh>
<pulse_display_thresh>0.5</pulse_display_thresh>

If the threshold is lowered (I think the first one should be the one to change for making more pulses reportable), more signals will be reported.
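A minimal sketch (my illustration, not a project tool; the file names are hypothetical) of that tweak: load the WU file, lower the <pulse_thresh> value shown above, and write the result back for a bench run.

#include <fstream>
#include <iostream>
#include <sstream>
#include <string>

int main() {
    std::ifstream in("testcase.wu");   // hypothetical WU file name
    std::stringstream buf;
    buf << in.rdbuf();
    std::string wu = buf.str();

    const std::string tag = "<pulse_thresh>20.7340908</pulse_thresh>";
    std::size_t pos = wu.find(tag);
    if (pos == std::string::npos) { std::cerr << "tag not found\n"; return 1; }

    // Lower the threshold slightly so an 11th, just-below-threshold pulse
    // becomes reportable; how far to lower it is a matter of experiment.
    wu.replace(pos, tag.size(), "<pulse_thresh>20.5</pulse_thresh>");

    std::ofstream out("testcase_tweaked.wu");
    out << wu;
}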
17) Message boards : Number crunching : TestCase: blc25_2bit_guppi_57895_47387_HIP91358_0034.24610.818.23.46.191.vlar (Message 1903778)
Posted 12 days ago by Profile Raistmer
Post:
Such borderline effects could also come from using >= instead of > (a tiny illustration follows right below).
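A tiny sketch (my illustration) of that effect: two builds that agree on the threshold value but differ in the comparison operator, or by one ULP of rounding from a different summation order, will split on a peak that lands exactly on the threshold.

#include <cmath>
#include <cstdio>

int main() {
    float thresh = 20.7340908f;   // the pulse threshold from the task header
    float peak = thresh;          // a peak landing exactly on the threshold

    std::printf("'>'  reports: %d\n", peak > thresh);    // 0: not reported
    std::printf("'>=' reports: %d\n", peak >= thresh);   // 1: reported

    // One ULP of difference from a different summation order flips '>' too:
    float peak2 = std::nextafterf(thresh, 100.0f);
    std::printf("'>' with +1 ULP: %d\n", peak2 > thresh); // 1: reported
}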

Regarding thresholds - there are a few of them, with different degrees of computational complexity.
So a candidate is first checked with the easiest one, and if it passes that test, the next one is applied.

          // Cheapest test first: interpolate between the mean (avg) and the
          // full threshold using pulse_display_thresh to get the "display" threshold.
          thresh_scale = t_funct(di, num_adds, di+tabofst);
          dis_thresh = avg*(thresh_scale*pulse_display_thresh + (1 - pulse_display_thresh));
          // Bail-out: if the peak can't beat the best pulse so far even at the
          // lowest reachable threshold, skip this candidate entirely.
          float lowest_thresh = t_funct(baildi, num_adds << (ndivs - 1), baildi + bailtab);
          float bail_scale = best_pulse->score;
          if (bail_scale > 1.0f) bail_scale = 1.0f;
          else if (bail_scale < pulse_display_thresh) bail_scale = pulse_display_thresh;
          float bail_thresh = avg*(lowest_thresh*bail_scale + (1 - bail_scale));
          if (tmp_max < bail_thresh) {
            continue;
          }
          if (tmp_max > dis_thresh) {
            cur_thresh = thresh_scale * avg;
            cpy_thresh = cur_thresh + maxd;
            float snr_tmp = (float)((tmp_max-avg)*sqrt((float)num_adds)*ravg);
            float thresh  = (float)((cur_thresh-avg)*sqrt((float)num_adds)*ravg);
            // Best-pulse update: the score (snr/thresh) decides the winner.
            if (snr_tmp/thresh > best_pulse->score) {
              ReportPulseEvent(tmp_max*ravg, avg, cperiod,
                               TOffset+(int)(PulsePotLen/2), FOffset, snr_tmp, thresh, div, max_scale, 0);
              if (verbose >= 2 && verbose < 6) fprintf(stderr, "B:\tthreshold %.7g; unscaled peak power: %.7g exceeds threshold for %.4g%%\n",
                  cur_thresh, tmp_max, (tmp_max-cur_thresh)/cur_thresh*100.f);
            }
            // Reportable-pulse candidate: the peak must beat the full threshold
            // plus the margin (maxd) of the current maximum.
            if (tmp_max > cpy_thresh) {
              maxp = cperiod;
              maxd = tmp_max - cur_thresh;
              max = tmp_max;
              snr = snr_tmp;       //(tmp_max-avg)*(float)sqrt((float)num_adds_2)*ravg;
              fthresh = thresh;    //(cur_thresh-avg)*(float)sqrt((float)num_adds_2)*ravg;
              mmax_scale = max_scale;
              memcpy(FoldedPOT, div+stoffset, di*sizeof(float));
              max_cur_thresh = cur_thresh; //R: to print it if needed
            }
          } // end if (tmp_max > dis_thresh)

  if (maxp != 0) {
    // After the period loop: write the reportable pulse, if one was found.
    ReportPulseEvent(max*ravg, avg, maxp, TOffset+PulsePotLen/2, FOffset,
                     snr, fthresh, FoldedPOT, mmax_scale, 1);
    if (verbose >= 1) fprintf(stderr, "D:\tthreshold %.7g; unscaled peak power: %.7g exceeds threshold for %.4g%%\n",
        max_cur_thresh, max, (max-max_cur_thresh)/max_cur_thresh*100.f);
  }

ReportPulseEvent(...,0) is for the best-pulse update.
ReportPulseEvent(...,1) is for the reportable pulse write.

int ReportPulseEvent(float PulsePower,float MeanPower, float period,
                     int time_bin,int freq_bin, float snr, float thresh, float *folded_pot,
                     float max_scale, int write_pulse) {
  PULSE_INFO pi;
  pulse pulse;

  // pulse info
  pi.score=snr/thresh;
  pi.p.peak_power=PulsePower-1;
  pi.p.mean_power=MeanPower;
  pi.p.fft_len=ChirpFftPairs[analysis_state.icfft].FftLen;
  pi.p.chirp_rate=ChirpFftPairs[analysis_state.icfft].ChirpRate;
  pi.p.period=static_cast<float>(period*static_cast<double>(pi.p.fft_len)/swi.subband_sample_rate);
  pi.p.snr = snr;
  pi.p.thresh = thresh;
  pi.p.len_prof = len_prof;
  pi.freq_bin=freq_bin;
  pi.time_bin=time_bin;
  pi.p.freq=cnvt_bin_hz(freq_bin, pi.p.fft_len);
  double t_offset=(static_cast<double>(time_bin)+0.5)
       *static_cast<double>(pi.p.fft_len)/
         swi.subband_sample_rate;
  pi.p.detection_freq=calc_detection_freq(pi.p.freq,pi.p.chirp_rate,t_offset);
  pi.p.time=swi.time_recorded+t_offset/86400.0;
  time_to_ra_dec(pi.p.time, &pi.p.ra, &pi.p.decl);
  // ... (excerpt truncated here: the remainder stores the filled-in pulse
  // info and, when write_pulse is set, writes the reportable signal)
}


Those are the main stages of pulse checking.
18) Message boards : Number crunching : TestCase: blc25_2bit_guppi_57895_47387_HIP91358_0034.24610.818.23.46.191.vlar (Message 1903607)
Posted 12 days ago by Profile Raistmer
Post:
Well, the score for a Pulse signal is calculated as:
pi.score=snr/thresh;
where, since the sqrt(num_adds)*ravg factors in snr and thresh cancel, the score reduces to: float _snr = (tmp_max-avg)/(cur_thresh-avg);
So yes, a value of 1, or something very close to 1, is a good sign of a very small excess over the threshold. Thanks for bringing that to attention; it could help in spotting such cases in the future.

Answering another question: here is the signal report condition from the OpenCL PulseFind kernel:
(tmp_max>cur_thresh)
If true, the signal is reported (for re-check on the CPU). So it's a little different from the score analysis.
On the CPU, the signal report condition for the best-pulse update looks like:
float snr_tmp=(float)((tmp_max-avg)*sqrt((float)num_adds)*ravg);
float thresh=(float)((cur_thresh-avg)*sqrt((float)num_adds)*ravg);
if(snr_tmp/thresh>best_pulse->score){

and for reporting the pulse itself:
if (tmp_max>cpy_thresh){....

So, the score isn't directly used to determine whether a Pulse will be reported or not (but the score is directly used for the best Pulse update selection).
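A short sketch (my illustration, with made-up peak and threshold values) of how that plays out: a peak just above the threshold yields a score just above 1, while stronger peaks score well above it.

#include <cstdio>

int main() {
    const float avg = 1.0f;          // made-up mean power
    const float cur_thresh = 20.7f;  // made-up pulse threshold
    const float peaks[] = {20.8f, 25.0f, 40.0f};

    for (float tmp_max : peaks) {
        // score = snr/thresh reduces to (tmp_max-avg)/(cur_thresh-avg)
        float score = (tmp_max - avg) / (cur_thresh - avg);
        std::printf("peak %.1f -> score %.3f\n", tmp_max, score);
    }
    // Prints scores 1.005, 1.218, 1.980: only the first is "borderline".
}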
19) Message boards : Number crunching : TestCase: blc25_2bit_guppi_57895_47387_HIP91358_0034.24610.818.23.46.191.vlar (Message 1903522)
Posted 12 days ago by Profile Raistmer
Post:
Thanks for participating; I think the case IS closed. The task is above the threshold by a negligibly small value. So it's a typical example of a borderline discrepancy.
That's why I added the threshold display to stderr for almost all apps - to allow relatively fast analysis. If only the x86 CPU app had it too...
20) Message boards : Number crunching : TestCase: blc25_2bit_guppi_57895_47387_HIP91358_0034.24610.818.23.46.191.vlar (Message 1903520)
Posted 12 days ago by Profile Raistmer
Post:
Perhaps into a separate directory, or into the KWSN bench. I'm not too familiar with the KWSN bench layout under Linux.

