The Database

Message boards : SETI@home Science : The Database

cRunchy
Volunteer moderator
Joined: 3 Apr 99
Posts: 3550
Credit: 1,920,030
RAC: 3
United Kingdom
Message 2038754 - Posted: 18 Mar 2020, 19:28:41 UTC

There are a number of threads about the 'hibernation' of SETI@home, so I would rather not get into feelings here.

I would, however, like to understand the end product of our WUs as a database, and how the next step works.

It has been suggested that, for example, the database cannot be divided into parts and processed/tested against algorithms (models) - i.e. it cannot be sent out to us to process as work units.

I would like to understand this better in lay person's terms.

Can someone with knowledge help us understand 'the database', how it is structured from our WUs, and how models (e.g. 'that sounds like ET' and 'that is ET speaking to us') might be applied?

Other interesting questions are welcome.

ID: 2038754
rob smith
Volunteer moderator
Volunteer tester
Joined: 7 Mar 03
Posts: 20764
Credit: 416,307,556
RAC: 380
United Kingdom
Message 2038760 - Posted: 18 Mar 2020, 20:04:57 UTC

The work we did was to filter out the "dross", leaving potential signals. These potential signals were added to the big database with the intent of looking for repeating signals: those which came from the same place in the sky, at the same frequency, and with the same characteristics over a period of time. Initially it was thought that this would be possible in near-real-time, but a few things conspired against that approach, including the database size, the rate at which data was being added, and the fact that the available hardware was a long way short of the task.
Years went by, and the database grew even bigger - I'd guess by about an order of magnitude, maybe more.
A small sample of the database was selected for a trial using more modern techniques and a supercomputer - this approach is called "Nebula", led by David Anderson and supported by Eric Korpela. After about three years it is starting to show promise. A few issues with the way the data was stored in the database have come to light which are making the job harder than it could be, but these are surmountable. One thing that has become apparent is that the rate at which the database is growing would make it very difficult to do all the correlations required without missing any "new" data. Among the correlations required are location (fairly obvious), frequency, red-shift (despite the fact we are looking at GHz frequencies, not light, this is still an appropriate name), and Doppler shift (slightly different to red-shift, but related).
Because there was no attempt to sort the data as it was added to the database, you more or less have to look at every other signal for each signal (or group of signals) in turn. Nebula's first task is to do that sort, then try the various correlation tools to see what ties together - on billions of signal combinations!

That's a very rough description of the why and what.
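The "sort, then correlate" idea can be sketched in code. This is a hypothetical illustration only - the record layout, bin sizes, and field names are invented for the sketch, not the real SETI@home schema or the Nebula code:

```python
# Comparing every signal with every other is O(n^2) - infeasible for
# billions of signals. Bucketing by sky position and frequency first
# means only signals sharing a bucket need pairwise comparison.
from collections import defaultdict

def group_signals(signals, pos_bin=0.1, freq_bin=1000.0):
    """Bucket signals by coarse sky position (degrees) and frequency (Hz).

    `signals` is a list of (ra, dec, freq_hz, time) tuples - an assumed,
    simplified record layout for illustration.
    """
    bins = defaultdict(list)
    for ra, dec, freq, t in signals:
        key = (int(ra / pos_bin), int(dec / pos_bin), int(freq / freq_bin))
        bins[key].append((ra, dec, freq, t))
    return bins

signals = [
    (120.05, 33.02, 1_420_000_500.0, 100.0),        # first detection
    (120.07, 33.01, 1_420_000_900.0, 5_000_000.0),  # same spot, much later
    (200.00, -10.0, 1_500_000_000.0, 200.0),        # unrelated signal
]
bins = group_signals(signals)
# The first two signals land in the same bucket and become candidate
# "repeats"; the third sits alone and never needs comparing against them.
```

The point of the sketch is the complexity argument: one sort/bucketing pass replaces an all-against-all comparison over the whole database.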
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 2038760
William Rothamel
Joined: 25 Oct 06
Posts: 3756
Credit: 1,999,735
RAC: 4
United States
Message 2038818 - Posted: 19 Mar 2020, 1:03:31 UTC - in response to Message 2038760.  

Statistical analysis should have rejected the vast majority of the data. I still claim that with graphics chips screening the data coming off the antenna, this could have been accomplished in near real time.
ID: 2038818
Gary Charpentier
Volunteer tester
Joined: 25 Dec 00
Posts: 29145
Credit: 53,134,872
RAC: 32
United States
Message 2038820 - Posted: 19 Mar 2020, 1:14:48 UTC - in response to Message 2038818.  

Statistical analysis should have rejected the vast majority of the data. I still claim that with graphics chips screening the data coming off the antenna, this could have been accomplished in near real time.

GPUs didn't exist 21 years ago when collection started. Serendip is doing it in real time across a much broader frequency range, but it is nearly deaf compared to the deep look we give.
ID: 2038820
cRunchy
Volunteer moderator
Joined: 3 Apr 99
Posts: 3550
Credit: 1,920,030
RAC: 3
United Kingdom
Message 2038843 - Posted: 19 Mar 2020, 2:39:14 UTC - in response to Message 2038820.  

GPUs of sorts did exist, as they are just dedicated CPUs or maths processors.

TV cards with graphics processors were around.

Analog processing, which we try so hard to emulate with digital and AI today, was certainly around.

Honeywell (a name we hear little of today) is touting its breakthrough in quantum processing. Maybe a nod from Berkeley might get the SETI@home DB processing posed as a good test project.

I'm still unsure as to why we can't break up the database and ship out WUs to run algorithms against.

BreakThrough offers out parts of its database and some help for developers to join in, so it must be possible somehow.

I guess perspectives and technologies have changed over the years, but (in theory) the data in the database has not. We just have to apply our new tools to these old chunks of data.

(I assume the data in the database relates to chunks of the sky scanned and has not been fouled in some way.)
ID: 2038843
rob smith
Volunteer moderator
Volunteer tester
Joined: 7 Mar 03
Posts: 20764
Credit: 416,307,556
RAC: 380
United Kingdom
Message 2039072 - Posted: 20 Mar 2020, 6:57:14 UTC - in response to Message 2038818.  

Highly unlikely - think how long it takes a modern GPU to process a few seconds' worth of data, which is actually only a narrow frequency sample from the live data stream, from a single channel of a single multi-channel receiver. My best guess is that it is about 30 times too slow in the time domain and covers somewhat less than 1% of the frequency domain.
NtiPicker would certainly be a lot faster today than when it was first tried, as the hardware available then was so much slower, and it may have just about worked on the filtered data stream back then; but what it lacked was the historic data, which took years to build up. Now it's just a question of sifting through and correlating the data.
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 2039072
William Rothamel
Joined: 25 Oct 06
Posts: 3756
Credit: 1,999,735
RAC: 4
United States
Message 2039111 - Posted: 20 Mar 2020, 12:22:01 UTC - in response to Message 2039072.  

So what's in this so-called database? I presume these were work units that, across 3 computations of the same unit, repeatedly showed a strong signal above the noise.
I don't know whether any other criteria were applied in deciding to store them. Were they compared to a "clutter map" of known emissions at certain frequencies and locations? Were they cross-correlated with any type of square or sawtooth wave? Was there any search for simple modulation: on-off, frequency, or analog?

How is the Allen array handling its data?
ID: 2039111
rob smith
Volunteer moderator
Volunteer tester
Joined: 7 Mar 03
Posts: 20764
Credit: 416,307,556
RAC: 380
United Kingdom
Message 2039751 - Posted: 22 Mar 2020, 20:33:43 UTC

"So-called database"? A few years ago it was one of the largest non-commercial databases in the world, and bigger than most banks'!
The database holds millions, if not billions, of results culled by us from the data we ploughed through over the last twenty years.
Each entry comprises, among other things, the date, time, location, frequency, signal type, and signal strength, but the entries are in a somewhat random order. What has to be done is to sort them by location and frequency, correcting for "red-shift" and a few more bits, to see if "pairs" of data have been collected at different times. These times need to differ by months or years.
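As a toy illustration of that last step - once signals have been matched on location and frequency, finding "pairs" whose observation times differ by months - here is a hedged sketch; the gap threshold and record format are my assumptions, not the real schema:

```python
from datetime import datetime, timedelta

def find_repeats(detection_times, min_gap=timedelta(days=180)):
    """Return pairs of detections separated by at least `min_gap`.

    `detection_times` holds timestamps of signals that already match
    on location and frequency (red-shift corrected upstream).
    """
    times = sorted(detection_times)
    pairs = []
    for i, t1 in enumerate(times):
        for t2 in times[i + 1:]:
            if t2 - t1 >= min_gap:
                pairs.append((t1, t2))
    return pairs

obs = [datetime(2005, 3, 1), datetime(2005, 3, 2), datetime(2008, 7, 9)]
repeats = find_repeats(obs)
# The two 2005 detections are only a day apart, so the qualifying
# pairs are each of them paired with the 2008 detection.
```

A signal seen twice from the same spot, at the same frequency, years apart is exactly the kind of "persistent" candidate the sort is meant to surface.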
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 2039751
rob smith
Volunteer moderator
Volunteer tester
Joined: 7 Mar 03
Posts: 20764
Credit: 416,307,556
RAC: 380
United Kingdom
Message 2039752 - Posted: 22 Mar 2020, 20:37:40 UTC

The Allen telescope, like Arecibo, does a lot of different sorts of analysis apart from the bit we've been doing.
SETI@home has only ever worked with a very small part of the data collected; this is down to design decisions made during the very early development days.
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 2039752

©2022 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.