Reorganization of resources idea

Cheopis

Joined: 17 Sep 00
Posts: 156
Credit: 18,451,329
RAC: 0
United States
Message 1012895 - Posted: 6 Jul 2010, 19:32:24 UTC

I have a hard time imagining this has not been considered before, and I know the SETI@home team has been pretty busy recently, so this is a "maybe someday" request which might already be in the planning stages.

Perhaps allow home users to do splitting? I know this would require several different layers of implementation, but it seems as if the science the team is actually capable of performing is limited by the sheer amount of beancounting and non-analysis work that is required to keep new data coming in.

I'm envisioning the following implementation:

1) The same raw data is sent to several different users.

2) That data is then split into work units independently by each user.

3) Hashes of the resulting work units are cross-checked against each other.

4) When enough identical hashes are received, each user with a "good" dataset is assigned to analyze a certain percentage of the resulting data that they already have on their machines.


This implementation would remove splitting from the server side, eliminate the need to send independent work units to at least some users, and free those resources for actual science.
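
To make steps 3 and 4 concrete, here's a minimal sketch of how the server side might compare the work-unit hash lists reported by several clients and then hand each member of the agreeing set a slice of the analysis. None of this is existing SETI@home or BOINC code; the function names, quorum size, and round-robin assignment are all invented for illustration.

    from collections import Counter

    def form_quorum(reports, min_agree=3):
        # reports maps client_id -> list of work-unit hashes that client
        # produced by splitting the same raw tape locally (steps 2 and 3).
        if not reports:
            return None
        tally = Counter(tuple(hashes) for hashes in reports.values())
        winning, votes = tally.most_common(1)[0]
        if votes < min_agree:
            return None  # not enough identical splits yet; keep waiting
        agreeing = [cid for cid, h in reports.items() if tuple(h) == winning]
        return agreeing, list(winning)

    def assign_shares(agreeing, workunit_hashes):
        # Step 4: each client in the quorum analyzes an equal share of the
        # work units it already holds on disk, so nothing has to be re-sent.
        shares = {cid: [] for cid in agreeing}
        for i, wu in enumerate(workunit_hashes):
            shares[agreeing[i % len(agreeing)]].append(wu)
        return shares

With five clients reporting and min_agree=3, for example, any three identical hash lists would validate the split, and the analysis would be divided three ways.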

How would one get the large initial data sets to users? I'm thinking one might be able to find a respectable p2p provider out there willing to work with SETI@home and the BOINC project. That way the data would only need to be sent once, and the p2p network would propagate it to other SETI@home users. I know that several MMOs use dedicated channels on p2p networks for their client downloads, so this would be nothing new for them.

The same p2p network could be used for cross-checking the work units generated by splitting.
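
For the p2p side, the project would mainly need to publish a digest of each raw tape so a client can confirm that whatever it pulled off the swarm is intact before splitting it. A rough sketch, assuming SHA-256 and a digest list published alongside the tape (again, just an illustration, not anything the servers actually provide):

    import hashlib

    def verify_raw_tape(path, expected_sha256, chunk_size=1 << 20):
        # Hash the downloaded raw-data file in 1 MiB chunks and compare it
        # with the digest published by the project before splitting starts.
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                h.update(chunk)
        return h.hexdigest() == expected_sha256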
ML1
Volunteer moderator
Volunteer tester

Joined: 25 Nov 01
Posts: 20267
Credit: 7,508,002
RAC: 20
United Kingdom
Message 1014836 - Posted: 11 Jul 2010, 23:39:40 UTC - in response to Message 1012895.  

... Perhaps allow home users to do splitting? ...

No can do unless you can include some completely robust checking to guard against cheats.

... And the robust checking would require wasted duplicated effort in doing the splitting...

The present division of processing for s@h and BOINC makes good system sense.


A bigger question is whether we could usefully do any other types of processing with the Arecibo data. Astropulse is perhaps one example of an answer...

Also bear in mind that the present Berkeley system bottleneck appears to be the database used for the science and for BOINC processing.

I have my own suspicion that the entire BOINC system is limited by the maximum possible (disk) I/O speed for the database logging!

Keep searchin',
Martin

See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
Cheopis

Joined: 17 Sep 00
Posts: 156
Credit: 18,451,329
RAC: 0
United States
Message 1014888 - Posted: 12 Jul 2010, 4:19:14 UTC - in response to Message 1014836.  
Last modified: 12 Jul 2010, 4:21:02 UTC

... Perhaps allow home users to do splitting? ...

No can do unless you can include some completely robust checking to guard against cheats.

... And the robust checking would require wasted duplicated effort in doing the splitting...

<snip some interesting stuff>

Keep searchin',
Martin


Well, the robust data checking would simply be to give the same data to multiple users with comparable work completion rates, compare the results, and not allow clients to work on or report results for work units that are not "registered" as coming from one of the set of clients that generated identical data. Since I don't know how many work units a typical "split" of a data set generates, I can't say what an efficient number of clients per data set would be. I strongly suspect that each work unit is already distributed several times under the current system in any case. The number of clients needed to give sufficiently robust cross-checking of client-side splitting might be perfectly in line with the number of clients that already rework data for the current level of work-unit verification.
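
In code terms, the gate I have in mind is tiny: the server keeps a registry of which work-unit hashes came out of a validated split and which clients were in that quorum, and refuses results for anything else. A toy sketch (the names are made up, and the real accounting would obviously live in the BOINC database):

    def accept_result(registry, results, wu_hash, client_id, result):
        # registry maps work-unit hash -> set of client ids whose identical
        # splits "registered" that work unit; results stores accepted work.
        quorum = registry.get(wu_hash)
        if quorum is None or client_id not in quorum:
            return False  # unregistered work unit, or a client outside the quorum
        results.setdefault(wu_hash, {})[client_id] = result
        return True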
