blanked AP tasks


Message boards : Number crunching : blanked AP tasks

atlov
Joined: 11 Aug 12
Posts: 5
Credit: 8,591,868
RAC: 4,945
Germany
Message 1468113 - Posted: 24 Jan 2014, 10:35:39 UTC

Hi folks!

I was wondering why S@H wastes resources on delivering AP workunits that are 100% blanked. My machine takes about 2 seconds to process one, but the project has to transfer >8 MBytes, which is really a waste of resources. Isn't it possible to filter out these tasks before distribution?

Ulrich Metzner
Volunteer tester
Joined: 3 Jul 02
Posts: 969
Credit: 8,151,090
RAC: 10,201
Germany
Message 1468120 - Posted: 24 Jan 2014, 11:06:36 UTC

+1!

Good question!
____________
Aloha, Uli

Josef W. Segur
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 4203
Credit: 1,030,189
RAC: 263
United States
Message 1468400 - Posted: 24 Jan 2014, 19:28:22 UTC - in response to Message 1468113.

Hi folks!

I was wondering why S@H wastes resources on delivering AP workunits that are 100% blanked. My machine takes about 2 seconds to process one, but the project has to transfer >8 MBytes, which is really a waste of resources. Isn't it possible to filter out these tasks before distribution?

The recurring problem with the beam 3 polarity 1 data, which caused your recent tasks to need 100% blanking, has indeed been an annoyance for a long time.

The proper fix would be for one of the staff to go to Arecibo and fix the intermittent hardware problem. I have no idea how much funding would have to increase to move that idea up into the practical category. Meanwhile the project limps along as best it can with what donations they get.

A server software change to work around the problem 'til it can really be fixed is of course possible, though such changes have a cost too. The code from the AP applications which checks the data for areas needing blanking could be copied into the AP splitters, for instance. Or when the data pipeline is dividing the data into tape sized files, the beam 3 polarity 1 data could be checked by gzipping it and calling it bad if that achieves more than about 25% compression. Either change would probably require modifying other code so the right thing can be done if bad data is detected.
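Joe's gzip idea can be sketched in a few lines. This is only a toy illustration under assumptions (the `looks_blanked` helper and the 1 MB test buffers are invented here; the ~25% figure is the threshold suggested above), not actual splitter code:

```python
import os
import zlib

def looks_blanked(data: bytes, threshold: float = 0.25) -> bool:
    """Healthy receiver data is noise-like and barely compresses, so if a
    gzip-style pass shaves off more than ~25% of the size, flag the block
    as bad (e.g. the recurring beam 3 polarity 1 hardware fault)."""
    saving = 1.0 - len(zlib.compress(data, 6)) / len(data)
    return saving > threshold

noisy = os.urandom(1 << 20)   # stand-in for good data: incompressible
stuck = b"\x00" * (1 << 20)   # stand-in for a dead channel: collapses
print(looks_blanked(noisy), looks_blanked(stuck))  # False True
```

Either way, as Joe notes, a flagged file would still need the surrounding pipeline changed so the right thing happens to it.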

With the servers in the colocation facility, there's currently no project impact from sending out extra 8 MB AP WUs. OTOH, for users on dial-up or some internet connection which is charged by the amount of data transferred there's obvious impact.
Joe

Cosmic_Ocean
Joined: 23 Dec 00
Posts: 2236
Credit: 8,446,025
RAC: 4,079
United States
Message 1468435 - Posted: 24 Jan 2014, 21:23:59 UTC - in response to Message 1468400.

A server software change to work around the problem 'til it can really be fixed is of course possible, though such changes have a cost too. The code from the AP applications which checks the data for areas needing blanking could be copied into the AP splitters, for instance. Or when the data pipeline is dividing the data into tape sized files, the beam 3 polarity 1 data could be checked by gzipping it and calling it bad if that achieves more than about 25% compression. Either change would probably require modifying other code so the right thing can be done if bad data is detected.

Quite true. I know there is already a system in place that detects when the local land-based radar interference is present and applies pseudo data to the tapes in those locations... so it is conceivable that it could also build a simple table that says "blanking starts at X-byte offset and has a duration of Y bytes." That table could be cross-checked by the splitters, and if a WU would be 100% blanked, don't bother sending it out at all.
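The table idea could look something like this. Purely a hypothetical sketch; the `blanked_fraction` function and the example ranges are invented, and it assumes the table's (offset, length) entries don't overlap:

```python
def blanked_fraction(wu_start, wu_len, blanked_ranges):
    """Fraction of a WU's byte range [wu_start, wu_start + wu_len) covered
    by the table's (offset, length) blanked ranges (assumed non-overlapping)."""
    covered = 0
    wu_end = wu_start + wu_len
    for off, length in blanked_ranges:
        lo, hi = max(wu_start, off), min(wu_end, off + length)
        if hi > lo:
            covered += hi - lo
    return covered / wu_len

table = [(0, 4096), (10000, 2000)]         # invented example entries
print(blanked_fraction(0, 4096, table))    # 1.0 -> splitter skips this WU
print(blanked_fraction(4096, 4096, table)) # 0.0 -> fine to send out
```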

Of course, the radar detection has no idea about B3_P1 being on the fritz again, so a relatively low-cost (to CPUs) periodic compression check on B3_P1 would be needed. I thought I remembered reading in one of the Tech News posts that the splitters were upgraded in a way that would detect 100% blanked WUs "much better", but the frequency of them seems to be about the same. So maybe the splitters have been upgraded with the functionality to handle them, but the secondary process that tells the splitters about 100% blanked tasks hasn't been implemented yet.
____________

Linux laptop uptime: 1484d 22h 42m
Ended due to UPS failure, found 14 hours after the fact

Sten-Arne
Volunteer tester
Joined: 1 Nov 08
Posts: 3333
Credit: 19,033,501
RAC: 21,108
Sweden
Message 1480579 - Posted: 22 Feb 2014, 9:42:20 UTC
Last modified: 22 Feb 2014, 9:49:31 UTC

Why on God's green Earth do we waste time crunching tasks that are 99.7% blanked? I have one now on my GPU, and since it is so heavily blanked (99.7%), it runs heavily on the CPU. It will take over 3 hours to crunch, and that is on an ATI HD7870.

6.04 astropulse_v6 (ati_opencl_100)
ap_17my13ag_B2_P0_00152_20140209_19932.wu_1

<fraction_blanked>0.997069</fraction_blanked>

Geeze...

Edit: Finished product: http://setiathome.berkeley.edu/result.php?resultid=3378986935

single pulses: 0
repetitive pulses: 0
percent blanked: 99.71
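For what it's worth, the `<fraction_blanked>` tag shown above is easy to check before a task starts crunching. A small sketch (the helper name is made up; it assumes the tag appears in the WU header exactly as quoted):

```python
import re

def fraction_blanked(wu_header: str) -> float:
    """Pull the <fraction_blanked> value out of an Astropulse WU header."""
    m = re.search(r"<fraction_blanked>([\d.]+)</fraction_blanked>", wu_header)
    return float(m.group(1)) if m else 0.0

header = "<fraction_blanked>0.997069</fraction_blanked>"
print(f"{fraction_blanked(header):.1%} blanked")  # 99.7% blanked
```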

____________

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 3368
Credit: 46,099,908
RAC: 24,560
Russia
Message 1480582 - Posted: 22 Feb 2014, 9:53:19 UTC - in response to Message 1480579.
Last modified: 22 Feb 2014, 9:54:39 UTC

Why on God's green Earth do we waste time crunching tasks that are 99.7% blanked? I have one now on my GPU, and since it is so heavily blanked (99.7%), it runs heavily on the CPU. It will take over 3 hours to crunch, and that is on an ATI HD7870.

6.04 astropulse_v6 (ati_opencl_100)
ap_17my13ag_B2_P0_00152_20140209_19932.wu_1

<fraction_blanked>0.997069</fraction_blanked>

Geeze...


Yes, such a % is absolute junk (and it especially hurts because such blanking slows the GPU app the most).

But... that's because nobody has developed anything better so far. There was Joe's mod for skipping blanked areas, but AFAIK he ran into some issues with the boundary areas between blanked and non-blanked parts. The stock algorithm uses noise substitution as blanking. Hence, such a blanked task can even return some reportable pulses (yes, this happens in white noise from time to time, and the blanking algorithm uses a type of shaped noise that can bring false positives too), but there is no reality in those pulses at all. But simply skipping such areas changes the boundary areas too, so the task would fail validation from time to time. We would need blanking-by-skipping to be implemented in stock as well. For some reason that step was not done, and now the mod looks abandoned. Maybe Joe would comment more on this topic.
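The difference between the two blanking strategies can be shown with a toy example (made-up arrays, nothing from the real apps): noise substitution keeps the timeline length intact but can itself produce reportable pulses, while skipping shortens the data so every later analysis window shifts relative to stock, which is the validation problem described above.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.standard_normal(1024)   # stand-in for receiver samples
blank = np.zeros(1024, dtype=bool)
blank[256:768] = True              # region flagged for blanking

# Stock-style: substitute noise in place, timeline length unchanged.
substituted = data.copy()
substituted[blank] = rng.standard_normal(int(blank.sum()))

# Joe's-mod-style: skip the samples entirely -- the array shrinks, so
# everything after the blanked region lands in different windows.
skipped = data[~blank]

print(substituted.size, skipped.size)  # 1024 512
```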
____________

Juha
Volunteer tester
Joined: 7 Mar 04
Posts: 175
Credit: 138,941
RAC: 2
Finland
Message 1480639 - Posted: 22 Feb 2014, 15:06:50 UTC - in response to Message 1480582.
Last modified: 22 Feb 2014, 15:07:02 UTC

But... that's because nobody has developed anything better so far. There was Joe's mod for skipping blanked areas, but AFAIK he ran into some issues with the boundary areas between blanked and non-blanked parts. The stock algorithm uses noise substitution as blanking. Hence, such a blanked task can even return some reportable pulses (yes, this happens in white noise from time to time, and the blanking algorithm uses a type of shaped noise that can bring false positives too), but there is no reality in those pulses at all.

Does the app report the blanked areas back to the server so that post-analysis can tell which of the pulses are junk and which are real?

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 3368
Credit: 46,099,908
RAC: 24,560
Russia
Message 1480723 - Posted: 22 Feb 2014, 19:59:48 UTC - in response to Message 1480639.

Good question.
indexes.txt contains the blanked areas, but it looks like it's not reported back to the server.
One more thing to raise with Eric, actually...
____________

Josef W. Segur
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 4203
Credit: 1,030,189
RAC: 263
United States
Message 1480777 - Posted: 22 Feb 2014, 22:12:52 UTC - in response to Message 1480579.

Why on Gods green Earth do we waste time crunching tasks that are 99.7% blanked. I have one now on my GPU, and since it is so heavily blanked 99.7%, it runs heavily on the CPU. It will take over 3 hours to crunch, and that is on a ATI HD7870.
...

Basically because that's the way Josh Von Korff decided it should be. 99.71% blanking means there are 6 good data chunks. Josh was fairly generous in defining how much before and after a detected RFI point should be blanked, so those 6 data chunks are very probably good data.
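As a quick sanity check on those numbers (using the 32K-sample chunking Joe describes further down the thread): 0.29% good data amounting to 6 chunks implies a total chunk count of roughly

```python
# Back-of-envelope check of "99.71% blanking means there are 6 good data
# chunks": dividing the good chunks by the good fraction gives the
# implied total chunk count per WU.
good_chunks = 6
fraction_good = 1 - 0.9971
print(round(good_chunks / fraction_good))  # 2069
```

which would be consistent with a fixed power-of-two chunk count per WU.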

OTOH, the data which the application decides is good may have already been replaced with shaped noise server-side.

As I see it, the essence of the problem is that cash contributions are only enough to cover the Colo costs, H.E. ISP costs, and about 1.5 staff. The staff has managed to keep the project limping along in spite of that. (I do realize that many who read these forums do contribute, but there have only been 3791 donations since February 1, 2012, and there are 136,196 active users.)
Joe

petri33
Volunteer tester
Joined: 6 Jun 02
Posts: 372
Credit: 66,654,279
RAC: 54,192
Finland
Message 1480782 - Posted: 22 Feb 2014, 22:50:43 UTC

Blanking is now done by generating standard noise (random numbers) on the CPU.

What if the signal was damped gradually to zero, or to the sample average, during blanking?

Any computationally viable constant value would do. There would be no pulses, no nothing. The damping would ensure that the border between a possible signal and the blanking would not produce any false reports. (Abrupt zeroing could produce a spike or a pulse; to represent square waves you need a lot of higher harmonics.) A gradual damping would eliminate those.
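The tapering idea, in a toy form (the array sizes, the `damp_into_blank` helper, and the 32-sample taper length are all invented for illustration):

```python
import numpy as np

def damp_into_blank(x, start, taper=32, level=0.0):
    """Fade x down to the constant `level` over `taper` samples before
    index `start`, then hold it there -- instead of an abrupt step, whose
    square edge needs many higher harmonics and can look like a pulse."""
    out = x.copy()
    ramp = np.linspace(1.0, 0.0, taper)
    out[start - taper:start] = level + (out[start - taper:start] - level) * ramp
    out[start:] = level
    return out

sig = np.ones(256)                      # stand-in signal at level 1.0
damped = damp_into_blank(sig, start=128)
print(damped[95], damped[128])          # 1.0 0.0
```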


____________

Josef W. Segur
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 4203
Credit: 1,030,189
RAC: 263
United States
Message 1480794 - Posted: 22 Feb 2014, 23:50:16 UTC - in response to Message 1480782.

Blanking is now done by generating standard noise (random numbers) on the CPU.

What if the signal was damped gradually to zero, or to the sample average, during blanking?

Any computationally viable constant value would do. There would be no pulses, no nothing. The damping would ensure that the border between a possible signal and the blanking would not produce any false reports. (Abrupt zeroing could produce a spike or a pulse; to represent square waves you need a lot of higher harmonics.) A gradual damping would eliminate those.


The data is analyzed in chunks of 32K samples for single pulse finding, and blanking is either applied or not to a complete chunk. Any edge effects are the same whether the chunk is data from the WU or faked, a matter of using a rectangular window on the 32K FFTs used for dedispersion. It's simply scientific nonsense to look in the fake data for single pulses, and Josh recognized that to the extent of having some non-working commented-out code to skip that processing. The primary problem was how to feed data to the FFA code which searches for repetitive pulses; something has to go into the time periods which are blanked. The FFA algorithm doesn't have edge effects, and the data from good chunks is normalized before the single pulse search starts, so my solution was to put flat line data at the correct average level into those time periods. I contend that's correct and believe Eric would agree if he ever has time to consider the change.

Raistmer's mention of possible edge effects relates to the sigind_v5 test WU. That has over 61% blanking, and when analyzed with the shaped-noise blanking it produces no reportable signals. When analyzed with my proposed change it does find one or two reportable repetitive pulses; I no longer remember exactly. My belief is that this simply indicates that folding in the shaped noise was suppressing some signals which really exist in the data from the WU.
Joe

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 3368
Credit: 46,099,908
RAC: 24,560
Russia
Message 1480868 - Posted: 23 Feb 2014, 7:56:02 UTC - in response to Message 1480794.

I contend that's correct and believe Eric would agree if he ever has time to consider the change.

Raistmer's mention of possible edge effects relates to the sigind_v5 test WU. That has over 61% blanking, and when analyzed with the shaped-noise blanking it produces no reportable signals. When analyzed with my proposed change it does find one or two reportable repetitive pulses; I no longer remember exactly. My belief is that this simply indicates that folding in the shaped noise was suppressing some signals which really exist in the data from the WU.
Joe


If Eric resembles me at all ;) I would say we just need to remind him about this issue once again. I think forgetting about some things is just human imperfection, not any intentional skipping of the issue. Maybe the time has come for such a reminder. Maybe some time will be allocated.
____________

Richard Haselgrove
Volunteer tester
Joined: 4 Jul 99
Posts: 8375
Credit: 46,695,811
RAC: 19,913
United Kingdom
Message 1480915 - Posted: 23 Feb 2014, 10:57:07 UTC - in response to Message 1480912.

And Sten-Arne,

Notice the error message from stderr output:

WARNING: BOINC supplied wrong platform!

Not a surprise that it became wrong here. I guess you received either the wrong driver or application for your platform, and that it was due to a server quirk or hiccup.

No.

Sten-Arne chooses to use a very old version of BOINC (v6.10.58) which doesn't understand OpenCL properly. It's OK - both he and the AP application know what is going on, and there are work-arounds in place.

That's why it's a 'warning' message, and not an 'error' message.

Jeff Buck
Joined: 11 Feb 00
Posts: 258
Credit: 28,743,682
RAC: 81,662
United States
Message 1507106 - Posted: 22 Apr 2014, 3:04:52 UTC

      Since a "reminder" was called for a couple months ago, I think I'll resurrect this discussion with AP task 3497681666, which ran on a GTX 660 for over 3.5 hours (and used 2.5+ hours of CPU time). The Stderr shows:

      percent blanked: 99.85

      When an AP task such as 3497598158 with "percent blanked: 0.00" runs on the same GPU in under 46 minutes (less than 20 min. CPU time) and a 100% blanked AP task such as 3497681654 is disposed of in 3.23 seconds (1.03 sec. CPU time), it strikes me as utterly perverse for a task that's 99.85% blanked to burn up so many resources to process so little data. (And that task was followed shortly thereafter by task 3497681640 which was 96.68% blanked and sucked up another 3.2 hours of Run Time.)

      There's GOT to be a better way! :^)

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 3368
Credit: 46,099,908
RAC: 24,560
Russia
Message 1507191 - Posted: 22 Apr 2014, 7:12:28 UTC - in response to Message 1507106.
Last modified: 22 Apr 2014, 7:13:13 UTC

As was said before, try to attract Eric's attention to this issue.
We need a stock build change to implement blanking by skipping. As Joe (and I) believe, such a change would not only be a great speedup (especially for GPU builds) but would be more scientifically correct too. But the validator doesn't know anything about that; it just compares one signal to another, fake or not. Hence, if a modded build skips the fake signals... the validator will mark such results as invalid!
That's why stock needs to be changed too; releasing a new optimized build is not enough.
____________


Copyright © 2014 University of California