why are the WU's recorded in 2007 still being sent.... what is the reason behind it..

Message boards : Number crunching : why are the WU's recorded in 2007 still being sent.... what is the reason behind it..
BYTES of GLORY...
Joined: 1 Mar 08
Posts: 14
Credit: 246,482
RAC: 0
India
Message 723396 - Posted: 8 Mar 2008, 14:13:12 UTC

Why are WUs recorded in 2007 still being sent out to be processed? Although I have received WUs recorded in Feb '08, I am also getting WUs recorded in March '07 and Feb '07. What is the reason behind this?

Please explain it in detail.
ID: 723396
SATAN
Joined: 27 Aug 06
Posts: 835
Credit: 2,129,006
RAC: 0
United Kingdom
Message 723398 - Posted: 8 Mar 2008, 14:28:03 UTC

There are millions of units still to be split and processed. The guys in the office are just sending out a mixture of new and old data.

If you want a more meaningful reason, you won't find one.
ID: 723398
Alinator
Volunteer tester

Joined: 19 Apr 05
Posts: 4178
Credit: 4,647,982
RAC: 0
United States
Message 723436 - Posted: 8 Mar 2008, 16:32:53 UTC
Last modified: 8 Mar 2008, 16:33:57 UTC

Another fundamental reason is that even if all the processing were done on the fastest hosts, running the fastest application available now, and assuming all the work were of the shortest-runtime type (and 'clean'), the telescope could still record data around an order of magnitude faster than it could be crunched.

When you take into account that the above represents only a small fraction of what actually exists in the field, you have your answer. ;-)

Alinator
ID: 723436
Stephen

Joined: 1 Sep 06
Posts: 103
Credit: 11,155,194
RAC: 0
Message 723437 - Posted: 8 Mar 2008, 16:33:14 UTC - in response to Message 723396.  

Why are WUs recorded in 2007 still being sent out to be processed? Although I have received WUs recorded in Feb '08, I am also getting WUs recorded in March '07 and Feb '07. What is the reason behind this?

Please explain it in detail.



I've got a WU from December of '06. They'll all get processed anyway, so why does it matter?
ID: 723437
Jim-R.
Volunteer tester
Joined: 7 Feb 06
Posts: 1494
Credit: 194,148
RAC: 0
United States
Message 723459 - Posted: 8 Mar 2008, 17:43:44 UTC - in response to Message 723396.  

Why are WUs recorded in 2007 still being sent out to be processed? Although I have received WUs recorded in Feb '08, I am also getting WUs recorded in March '07 and Feb '07. What is the reason behind this?

Please explain it in detail.


Any signal that has ever been recorded could show up in a work unit. The current application can analyze any of the signals we have, and with more sensitivity than before. So if there is questionable data among the old recordings, we might be asked to recrunch it with the more sensitive applications we have now.

One reason the '07 data is showing up is that we are only two months into '08 and there is a lot of data left over from '07. We don't crunch the data in real time: more data is recorded in a given period than can be split and crunched in that same period. We are crunching data 24/7, but the system at Arecibo is only recording data a small portion of that time.

Back when we were recording signals using the "linefeed" array, only one signal was being recorded at a time. Now we record seven beams at two polarizations each, so in effect, for every minute of recording time we have fourteen minutes of "signals", or data, to work with. About seven years of data were recorded with the linefeed array, and it was only last year, with the introduction of new, faster applications and improvements in computer technology, that we finally got "caught up". That was partly because during the transition from the linefeed to the new multibeam array there was quite a long time when absolutely no data was flowing from the telescope.

Also, with the more sensitive applications we have now, and ones to be developed in the future, it may benefit us to look at the old data once again. These signals (if any) have been traveling for hundreds or thousands of years, so what is the difference if we analyze the data and find a signal right now, or record the data and find the signal ten years from now?

We here on Earth are accustomed to picking up a telephone and having a real-time conversation with anyone, anywhere on Earth, and even in our near space. But space is vast, so for anything beyond Earth-Moon distances, time begins to play a very important factor in communications. This is why NASA and other space agencies are developing cutting-edge robotics and artificial intelligence systems: it would be impossible to drive a vehicle going only a few miles per hour (walking speed) on Mars if you had to do so over a radio link. By the time you saw a ravine and sent the signal to turn away from it, your robotic "self" would have already walked into it!

So just imagine the problems of trying to communicate with another civilization maybe hundreds or thousands of light years away. It would be a very long, drawn-out conversation indeed. "Hi, I'm on Earth. It's a pretty blue and green planet."... A hundred or more years later the other civilization gets this message and replies, "Hi Earth, I am on Bopa, a planet that looks red from space." Two hundred or more years after your message you finally get the reply, and your answer is, "Hi Bopa, we no longer have an Earth; we blew it to bits 50 years ago and now we live in space." I hope the scenario above never happens exactly like that, but my point is that by the time we receive a signal, the civilization that sent it may not even exist any more.

So to boil it all down: every second of data we have is important, no matter when it was actually recorded. We are not trying to communicate with aliens, just trying to find a signal that would mean we are not alone in the universe. That "signal" would have the same impact no matter when it was actually recorded. Time is not a factor in this.

Take this for instance. You have a garden and pick a fresh tomato out of it. If you don't eat it now, it will soon spoil and no longer be fit to eat. That would be like a person sitting at a receiver with a set of headphones on, listening to the signal as it is being received. (Remember, we have no way of talking back to the ones that sent the signal; the person is just sitting there listening to see if they can find one.) Now take that same tomato and put it in the freezer, and let's assume the freezer does no damage, so it would be a perfect fresh tomato when you took it back out. Say you had the tomato in the freezer for ten years: you eat it and still get the benefit from it. The time hasn't destroyed anything (again assuming the perfect freezing/thawing process).

This is what we are doing. We are "freezing" (recording) the signals at one time and "thawing/eating" them later, splitting the data into work units and crunching them. If a signal is found, it would still give us the same benefits today as it would if it had been found by someone sitting there with a set of headphones on when the signal was actually received. And actually, a person with a set of headphones wouldn't even be able to detect the type of signals we can find: very weak signals barely above the background noise level. So the time when a work unit's signals were recorded is meaningless as far as finding the signal goes.

Also, just because a signal was present at one time doesn't mean it is what we are looking for. That is the reason for the NTPCKR, or "nitpicker", program being developed. The key word here is "persistency": a signal being present from the same location in the sky at different times. When we get the nitpicker online and it has been able to chew on the data and spit out some meaningful results, I feel that some of the old tapes from many years ago might be brought out and reanalyzed using the new, more sensitive apps we now have available.

Sorry for my long-winded response, but I hope you have a better understanding of why it is not so important when the data was recorded, and of some reasons why we may still see even very old data flowing through at various times.

BTW, I never mentioned the radar-blanking signal. There is radar interference in much of the new data being recorded, and to keep from sending out thousands of noisy work units at a time (the ones that would return a -9 overflow error), large chunks of the data had to be left out of the splitter process. The very first multibeam recordings didn't have a "radar blanking" signal recorded with them. Once they found that the radar interfered with the main signal, they started using an extra antenna and receiver to detect when the radar was turned on or off, and recorded these exact times on an extra track of data.

Matt and Eric are working on a new splitter program that can use this data track to split work units all the way up to the exact time the radar turns on and resume splitting at the exact time the signal cuts off. This way we can use data that we are now "throwing out" as a buffer to make sure we don't send out the noisy work. Just speculating, but they might want to crunch all of the pre-blanking data now, before they implement the new splitter with the blanking feature. And when they get the new splitters running, I'm quite sure we will see older ('07) data coming through as we split out the chunks that were left out of the first runs as insurance against sending out the radar signal.

hope this helps.
Jim

Some people plan their life out and look back at the wealth they've had.
Others live life day by day and look back at the wealth of experiences and enjoyment they've had.
ID: 723459
Stephen

Joined: 1 Sep 06
Posts: 103
Credit: 11,155,194
RAC: 0
Message 723466 - Posted: 8 Mar 2008, 18:07:15 UTC - in response to Message 723459.  

Yea, that's what I meant to say......


Great answer Jim!
ID: 723466
BYTES of GLORY...
Joined: 1 Mar 08
Posts: 14
Credit: 246,482
RAC: 0
India
Message 723714 - Posted: 9 Mar 2008, 13:30:45 UTC - in response to Message 723459.  

ya i think i got it....
thanks for helping me out with this.....
i appreciate it...


thanks
-BYTES OF GLORY
ID: 723714
AndyW Project Donor
Volunteer tester
Joined: 23 Oct 02
Posts: 5862
Credit: 10,957,677
RAC: 18
United Kingdom
Message 723716 - Posted: 9 Mar 2008, 13:37:34 UTC

Jim, thanks for the detailed answer. I enjoyed reading that.

Sounds like plenty of work for the future, if SETI can keep the funds coming in.
ID: 723716
Fred J. Verster
Volunteer tester
Joined: 21 Apr 04
Posts: 3252
Credit: 31,903,643
RAC: 0
Netherlands
Message 723722 - Posted: 9 Mar 2008, 14:17:08 UTC - in response to Message 723716.  
Last modified: 9 Mar 2008, 14:20:57 UTC

Jim, thanks for the detailed answer. I enjoyed reading that.

Sounds like plenty of work for the future, if SETI can keep the funds coming in.


Yep, a very clear explanation.

I wonder how much more CPU time would be needed to process the amount of data from the multibeam recorder in REAL TIME?
ID: 723722
W-K 666 Project Donor
Volunteer tester

Joined: 18 May 99
Posts: 19014
Credit: 40,757,560
RAC: 67
United Kingdom
Message 723727 - Posted: 9 Mar 2008, 14:30:57 UTC

According to one of the news items Eric listed in post SETI@home in the News

As a result, the new instruments at Arecibo collect a whopping 300 gigabytes of data each day, or 100 terabytes per year.


I'll let you do the maths, it's Sunday, a day of rest.
ID: 723727
Stephen

Joined: 1 Sep 06
Posts: 103
Credit: 11,155,194
RAC: 0
Message 723738 - Posted: 9 Mar 2008, 15:23:03 UTC - in response to Message 723727.  

According to one of the news items Eric listed in post SETI@home in the News

As a result, the new instruments at Arecibo collect a whopping 300 gigabytes of data each day, or 100 terabytes per year.


I'll let you do the maths, it's Sunday, a day of rest.



Good thing I have a whole Sunday, Winter Knight.

I had to go to the internet wayback machine to grab this first bit of data. If this no longer applies, my calculations are for naught.

"Note: Each tape contains approximately 33000 blocks (1049600 bytes in size representing 1.7 seconds of telescope data). The splitter reads about 48 blocks at a time and converts them into 256 workunits, which eventually get transitioned into 1024 results to send to our clients."

So 33,000 divided into groups of 48 gives me 687.5 groups. Each of these groups yields 256 WU for a total of 176,000 WU. There are two results for each WU, so 352,000 results for 1 tape. Assuming these tapes are 50 GIG, and Arecibo is giving us 300 GIG/day, that means 6 x 352,000 or 2,112,000 results a day.
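That back-of-envelope calculation can be written out as a quick script (same assumptions as above: 50 GB tape files, 300 GB/day from Arecibo, two results per workunit):

```python
# Back-of-envelope check of the tape -> results arithmetic above.
# Assumptions (figures quoted in this post): 33,000 blocks/tape,
# 48 blocks per splitter group, 256 WUs per group, 2 results per WU,
# 50 GB tape files, 300 GB/day from Arecibo.
BLOCKS_PER_TAPE = 33_000
BLOCKS_PER_GROUP = 48
WU_PER_GROUP = 256
RESULTS_PER_WU = 2           # initial replication
TAPES_PER_DAY = 300 / 50     # 300 GB/day at 50 GB per tape

groups_per_tape = BLOCKS_PER_TAPE / BLOCKS_PER_GROUP        # 687.5
wu_per_tape = groups_per_tape * WU_PER_GROUP                # 176,000
results_per_tape = wu_per_tape * RESULTS_PER_WU             # 352,000
results_per_day = results_per_tape * TAPES_PER_DAY          # 2,112,000

print(f"{wu_per_tape:,.0f} WUs/tape, {results_per_day:,.0f} results/day")
```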

ID: 723738
W-K 666 Project Donor
Volunteer tester

Joined: 18 May 99
Posts: 19014
Credit: 40,757,560
RAC: 67
United Kingdom
Message 723743 - Posted: 9 Mar 2008, 15:34:06 UTC - in response to Message 723738.  

According to one of the news items Eric listed in post SETI@home in the News

As a result, the new instruments at Arecibo collect a whopping 300 gigabytes of data each day, or 100 terabytes per year.


I'll let you do the maths, it's Sunday, a day of rest.



Good thing I have a whole Sunday, Winter Knight.

I had to go to the internet wayback machine to grab this first bit of data. If this no longer applies, my calculations are for naught.

"Note: Each tape contains approximately 33000 blocks (1049600 bytes in size representing 1.7 seconds of telescope data). The splitter reads about 48 blocks at a time and converts them into 256 workunits, which eventually get transitioned into 1024 results to send to our clients."

So 33,000 divided into groups of 48 gives me 687.5 groups. Each of these groups yields 256 WU for a total of 176,000 WU. There are two results for each WU, so 352,000 results for 1 tape. Assuming these tapes are 50 GIG, and Arecibo is giving us 300 GIG/day, that means 6 x 352,000 or 2,112,000 results a day.

Did they have 50 Gig tapes in 1998, or whenever they started collecting data?

The old receiver had a single linefeed; the new multibeam receiver has 7 beams, each of which can produce horizontal and vertical polarisation. But multibeam units take only about 70% of the time to process compared to the old single-linefeed data.
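The comparison works out roughly like this (a sketch using the ~70% runtime figure above):

```python
# Relative crunching load of multibeam vs. the old linefeed, per hour of
# telescope time: 14 recorded streams instead of 1, but each multibeam
# unit takes only ~70% as long to process (the figure quoted above).
BEAMS = 7
POLARISATIONS = 2
RELATIVE_UNIT_COST = 0.70    # multibeam WU runtime vs. linefeed WU runtime

streams = BEAMS * POLARISATIONS              # 14 streams vs. 1
relative_load = streams * RELATIVE_UNIT_COST # ~9.8x the crunching per
print(f"~{relative_load:.1f}x")              # hour of telescope time
```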
ID: 723743
W-K 666 Project Donor
Volunteer tester

Joined: 18 May 99
Posts: 19014
Credit: 40,757,560
RAC: 67
United Kingdom
Message 723746 - Posted: 9 Mar 2008, 15:41:15 UTC

This post by Matt gives some indication of tape size. #612367
ID: 723746
Mumps [MM]
Volunteer tester
Joined: 11 Feb 08
Posts: 4454
Credit: 100,893,853
RAC: 30
United States
Message 723758 - Posted: 9 Mar 2008, 16:08:03 UTC - in response to Message 723738.  

"Note: Each tape contains approximately 33000 blocks (1049600 bytes in size representing 1.7 seconds of telescope data).


That seems to work out to about 33 Gig per tape if I read that part of the statement correctly.

The splitter reads about 48 blocks at a time and converts them into 256 workunits, which eventually get transitioned into 1024 results to send to our clients."

So 33,000 divided into groups of 48 gives me 687.5 groups. Each of these groups yields 256 WU for a total of 176,000 WU. There are two results for each WU, so 352,000 results for 1 tape.


So it'd actually seem to follow that it'd be

9 * 352,000 ~= 3.2 million results a day.

Assuming these tapes are 50 GIG, and Arecibo is giving us 300 GIG/day, that means 6 x 352,000 or 2,112,000 results a day.


So now it's just time to find the statistics relating to total WU's processed per day, rather than RAC.
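The per-tape size and daily totals from the quoted block figures work out like this (a rough sketch; whether a "Gig" means 10^9 or 2^30 bytes accounts for the spread around 33):

```python
# Per-tape size from the quoted figures: 33,000 blocks of 1,049,600 bytes.
BYTES_PER_BLOCK = 1_049_600
BLOCKS_PER_TAPE = 33_000
RESULTS_PER_TAPE = 352_000   # from the calculation quoted above

tape_bytes = BYTES_PER_BLOCK * BLOCKS_PER_TAPE
print(f"{tape_bytes / 1e9:.1f} GB ({tape_bytes / 2**30:.1f} GiB)")

# At 300 GB/day that is ~8.7 tapes, i.e. roughly 3.0-3.2 million results/day.
tapes_per_day = 300e9 / tape_bytes
print(f"{tapes_per_day * RESULTS_PER_TAPE:,.0f} results/day")
```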
ID: 723758
Stephen

Joined: 1 Sep 06
Posts: 103
Credit: 11,155,194
RAC: 0
Message 723759 - Posted: 9 Mar 2008, 16:10:02 UTC - in response to Message 723746.  

This post by Matt gives some indication of tape size. #612367


Even taking an average of 3.5 TB, divided into 60 tapes, gives an average of 58 GB per tape. Not too far off from the 50 GB they use today.

And since I've started to enjoy my Sunday, I'm going to say my math is close enough for government work and call it a day.
ID: 723759
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 723763 - Posted: 9 Mar 2008, 16:19:48 UTC - in response to Message 723758.  

So now it's just time to find the statistics relating to total WU's processed per day, rather than RAC.

Scarecrow's graphs are a useful alternative to the back of an envelope for that sort of question.

At the moment, we're getting through about 40,000 results an hour, or near enough 1,000,000 results (or 500,000 workunits) a day. That's a bit lower than usual, because almost all the tasks are long-running (very few shorties ATM): but it's also a bit higher than usual, because a bad splitter is causing some tasks to fail almost immediately.
ID: 723763
Mumps [MM]
Volunteer tester
Joined: 11 Feb 08
Posts: 4454
Credit: 100,893,853
RAC: 30
United States
Message 723772 - Posted: 9 Mar 2008, 16:38:46 UTC - in response to Message 723763.  

At the moment, we're getting through about 40,000 results an hour, or near enough 1,000,000 results (or 500,000 workunits) a day. That's a bit lower than usual, because almost all the tasks are long-running (very few shorties ATM): but it's also a bit higher than usual, because a bad splitter is causing some tasks to fail almost immediately.


So, does that average out to "right about usual?"

And it looks like the "Results Creation Rate" is about 85,000 per hour. Does that mean the splitter(s) aren't even splitting the tapes fast enough to keep up with real time? (That's about 2,040,000 results created per day.) Or isn't that what that stat represents on the Server Status page? (Time to check the wiki...)

I guess it's a good thing some data gets discarded due to the Radar Interference, and that the array isn't recording 24x7.
ID: 723772
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 723808 - Posted: 9 Mar 2008, 17:33:03 UTC - in response to Message 723772.  
Last modified: 9 Mar 2008, 17:36:45 UTC

At the moment, we're getting through about 40,000 results an hour, or near enough 1,000,000 results (or 500,000 workunits) a day. That's a bit lower than usual, because almost all the tasks are long-running (very few shorties ATM): but it's also a bit higher than usual, because a bad splitter is causing some tasks to fail almost immediately.


So, does that average out to "right about usual?"

Actually, it averages out to a bit below usual - the long-term figure is nearer 50,000 per hour. The lack of shorties is a bigger effect than the single rogue splitter.
And it looks like the "Results Creation Rate" is about 85,000 per hour. Does that mean the splitter(s) aren't even splitting the tapes fast enough to keep up with real time? (That's about 2,040,000 results created per day.) Or isn't that what that stat represents on the Server Status page? (Time to check the wiki...)

I guess it's a good thing some data gets discarded due to the Radar Interference, and that the array isn't recording 24x7.

At the moment, the slowest part of the system is us, the crunchers.

The splitters are only working part-time: they keep the 'Results ready to send' buffer roughly constant. As we draw off the results we need, the splitters create more work to replace it. But they could split faster, and Arecibo can record faster, than we're getting through it at the moment.
ID: 723808
Stephen

Joined: 1 Sep 06
Posts: 103
Credit: 11,155,194
RAC: 0
Message 723810 - Posted: 9 Mar 2008, 17:36:15 UTC - in response to Message 723808.  

The splitters are only working part-time: they keep the 'Results ready to send' buffer roughly constant. As we draw off the results we need, the splitters create more work to replace it. But they could split faster



Earlier this week they were splitting somewhere around 60 results/second. I've never seen them that high before. I'm assuming the new router is responsible for this increase in productivity.
ID: 723810
Josef W. Segur
Volunteer developer
Volunteer tester

Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 723956 - Posted: 9 Mar 2008, 23:22:05 UTC - in response to Message 723746.  

This post by Matt gives some indication of tape size. #612367

Post 645063 by Matt tells why the data that comes back from Arecibo on a 750 GB hard disk is broken up into disk files of about 50 GB.

The multibeam recorder samples 14 channels at 2500000 samples per second, giving 126 Gigasamples per hour. Each sample is encoded as 2 bits, so there's 31.5 Gigabytes of data per hour of recording time. The data for each channel is recorded in blocks on disk. I don't know how much overhead is involved, but there's information about where the telescope was looking, time, and channel at least. For simplicity, I equate a 50 GB disk file to about 1.5 hours of recorded data.

The splitters advance by 85 seconds for each group of WUs they produce (so there's an overlap of 22.37 seconds). For one hour of recorded data, 14 channels give a total of about 151793 WUs and twice as many initial replication Tasks. The unknown is how many hours a day the ALFA receivers will be in use, the 300 GB from the press release linked earlier implies about 9.5 hours.
                                                               Joe
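The arithmetic above can be reproduced with a quick script (the 256-WUs-per-channel-per-85-second-step figure is inferred from the totals, so treat it as an assumption):

```python
# Multibeam data-rate arithmetic from the post above:
# 14 channels at 2,500,000 samples/s, 2 bits per sample.
CHANNELS = 14
SAMPLE_RATE = 2_500_000          # samples/s per channel
BITS_PER_SAMPLE = 2
WU_ADVANCE_S = 85                # splitter step between WU groups
WU_PER_STEP = 256                # WUs per channel per step (inferred)

samples_per_hour = CHANNELS * SAMPLE_RATE * 3600             # 126e9
gb_per_hour = samples_per_hour * BITS_PER_SAMPLE / 8 / 1e9   # 31.5 GB
wus_per_hour = CHANNELS * (3600 / WU_ADVANCE_S) * WU_PER_STEP
hours_per_day = 300 / gb_per_hour                            # ~9.5 h

print(f"{gb_per_hour:.1f} GB/h, {wus_per_hour:,.0f} WUs/h, "
      f"{hours_per_day:.1f} h of recording per 300 GB day")
```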
ID: 723956
 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.