Who analyzes the results?



Message boards : Number crunching : Who analyzes the results?

Author Message
Health Networks
Joined: 7 Mar 12
Posts: 16
Credit: 195,739
RAC: 0
United States
Message 1203786 - Posted: 8 Mar 2012, 18:05:01 UTC

Years ago I was involved with the team at Scripps that ran "FightAIDS@Home". Since then it has branched off into something else, and they've made some discoveries. I was super excited to be involved as a volunteer, and my expertise in building online communities was something they were looking for: a way to make the experience more fun for users and get them involved (stats, competition, etc.).

Unfortunately, when I finally showed up at their offices, it became all too clear that what I had feared from the start was true. All the data being crunched by the unsuspecting participants at home was sitting in a growing pile of paper printouts. Some piles were literally 5 feet high. The lead programmer at the time told me point blank: "We have the data. We just have no way to analyze it. There's too much."

As a result I lost a lot of hope in the project and did not continue with them. I have always wondered: wouldn't it take another 300,000 people with computers to analyze the data that the first 300,000 "crunch"?

I realize SETI may be unique in that the analyzing is done on the first pass, and maybe automated alerts are in place to trigger "finds", so crunching and analyzing are synonymous in this case. But with other projects, especially those involving health issues, protein folding, etc., I can't imagine the manpower needed to take all that data and do something with it!

tullio
Joined: 9 Apr 04
Posts: 3587
Credit: 362,356
RAC: 194
Italy
Message 1203809 - Posted: 8 Mar 2012, 18:58:15 UTC - in response to Message 1203786.
Last modified: 8 Mar 2012, 19:11:32 UTC

There is an article in "Nature" magazine of March 8 on text mining. Very interesting, especially for pharmaceutical firms looking for genome sequences in published texts. I think it could be applied to any printed text.
Tullio
Nature
____________

JLConawayII
Joined: 2 Apr 02
Posts: 186
Credit: 2,762,491
RAC: 0
United States
Message 1203873 - Posted: 8 Mar 2012, 21:17:47 UTC
Last modified: 8 Mar 2012, 21:23:32 UTC

In most cases, analyzing the results takes nowhere near the same processing power as creating them. That's the whole point of doing it this way: the projects don't have the supercomputing horsepower on their end to accomplish this level of calculation themselves. If you've designed your project so that you actually need as much or more compute power to analyze the results as it takes to create them, you've done something horribly wrong.

Take GPUGrid, for example. The thousands of WUs that are sent out to end users each calculate a small part of a molecular dynamics model. This takes enormous horsepower because every atomic interaction must be calculated. However, once this gets sent back it gets put together into a completed model that is (relatively) simple for the team to analyze. The WUs just have to be constructed so that their system knows where to put them in the model when they get sent back. This is a pretty oversimplified description of the process, but I trust the point is made.
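The scatter/gather pattern described above can be sketched in a few lines. This is purely illustrative (the function names and the toy "computation" are mine, not GPUGrid's or BOINC's actual code): expensive, embarrassingly parallel per-work-unit crunching, followed by a cheap central assembly step.

```python
# Minimal scatter/gather sketch of the pattern described above:
# expensive per-work-unit computation, cheap aggregation.
# Illustrative only -- not GPUGrid's or BOINC's actual pipeline.

def make_work_units(n_atoms, chunk):
    """Split the atom indices into tagged work units."""
    return [(i, list(range(i, min(i + chunk, n_atoms))))
            for i in range(0, n_atoms, chunk)]

def crunch(work_unit):
    """Stand-in for the expensive client-side computation."""
    tag, atoms = work_unit
    return tag, [a * a for a in atoms]   # pretend per-atom result

def assemble(results):
    """Cheap server-side step: order partial results by tag and concatenate."""
    model = []
    for _tag, partial in sorted(results):
        model.extend(partial)
    return model

units = make_work_units(n_atoms=10, chunk=4)
results = [crunch(u) for u in units]   # done by volunteers, in any order
model = assemble(results)              # done once, centrally
print(model)                           # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

The tags are what let the server know "where to put them in the model when they get sent back", regardless of the order results arrive in.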
____________

Health Networks
Joined: 7 Mar 12
Posts: 16
Credit: 195,739
RAC: 0
United States
Message 1203908 - Posted: 8 Mar 2012, 23:19:58 UTC - in response to Message 1203873.

Okay.

So who is analyzing the numbers for SETI?

Is there a room full of hundreds of people?

Kafo
Joined: 17 Dec 00
Posts: 14
Credit: 3,754,780
RAC: 1,974
Canada
Message 1203942 - Posted: 9 Mar 2012, 0:53:25 UTC

Seti@home has outsourced the analysis to an extra-terrestrial conglomerate located off planet.

B-Man
Volunteer tester
Joined: 11 Feb 01
Posts: 253
Credit: 147,366
RAC: 0
United States
Message 1203952 - Posted: 9 Mar 2012, 1:07:23 UTC
Last modified: 9 Mar 2012, 1:18:36 UTC

NTPCkr, the analysis program, discussion (thread starts back in 2008):
http://setiathome.berkeley.edu/forum_thread.php?id=46636#743622

The project had constructed a server (Synergy, I believe) to run the data-search algorithm on the returned data, but ran into several problems:
1) Several other servers died, and the analysis server took over the functions of the dead or dying machines.
2) The DB pounding slowed down the main project so much that both could not run at the same time, so for a while the project instituted 3-day analysis breaks per week; no one was happy.
3) The early version of the software returned many false positives, so the programs need to be rewritten, last I heard. The problem is that the staff are too busy fixing day-to-day problems to do this. Also, to test the code, the project may need to shut down other parts of the project to support DB access for the new NTPCkr program.

Solutions in progress and proposed:
1) Two new servers came in today that will ultimately speed up several parts of the backend systems to allow NTPCkr to run at the same time as primary data processing.
2) The Synergy server should be freed up to return to NTPCkr duty when the new servers come online.
3) A new RAID box is coming in with the new servers to allow faster science-DB access, so the validators of primary data processing and NTPCkr can hit the DB simultaneously.
Link to the staff post on the new RAID and servers: http://setiathome.berkeley.edu/forum_thread.php?id=67190#1203945
I know I have seen other discussions of proposed fixes but am not finding them right now. They may have been in the original NTPCkr thread at the end.
____________

B-Man
Volunteer tester
Joined: 11 Feb 01
Posts: 253
Credit: 147,366
RAC: 0
United States
Message 1214049 - Posted: 4 Apr 2012, 20:57:37 UTC

Matt just posted in the tech forum that the new builds of the analysis programs are building again on the test machines, and some of the features are starting to work. The project does post updates on the backend stuff; the update is the April 3rd, 2012 post.
____________

Slavac
Volunteer tester
Joined: 27 Apr 11
Posts: 1932
Credit: 17,952,639
RAC: 0
United States
Message 1214050 - Posted: 4 Apr 2012, 21:00:43 UTC - in response to Message 1214049.

The current plan is that GeorgeM will be taking on most if not all of the analytical work once it's properly configured. Prior to this, Synergy was doing the lion's share, but it has been retasked into other roles.

BTW, GeorgeM is a Synergy clone with 128 GB of RAM vs. Synergy's 96 GB, and it has updated CPUs. It's quite a beast.
____________


Executive Director GPU Users Group Inc. -
brad@gpuug.org

Michel448a
Volunteer tester
Joined: 27 Oct 00
Posts: 1201
Credit: 2,891,635
RAC: 0
Canada
Message 1214064 - Posted: 4 Apr 2012, 22:21:09 UTC
Last modified: 4 Apr 2012, 22:22:02 UTC

But... can't I have such an analyzer that would analyze my own work before I send it to SETI? In the same way GeorgeM would do it? ^^
____________

ignorance is no excuse
Joined: 4 Oct 00
Posts: 9529
Credit: 44,433,274
RAC: 0
Korea, North
Message 1214116 - Posted: 5 Apr 2012, 1:56:51 UTC - in response to Message 1214064.

Don't you think some sneaky coding wonk would use your idea to intentionally add data to a file and quickly make a mockery of the process?

Not that someone couldn't do that now. However, the work being done also needs corroborating results. Examining the result before making sure yours is actually good is putting the cart before the horse.
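The corroboration step mentioned above is worth a sketch. In BOINC-style projects the same work unit is replicated to several hosts, and a result only becomes canonical once a quorum of replicas agree; tampered or broken results get outvoted. The code below is a hedged illustration of that idea only (the function name and toy results are mine, not BOINC's actual validator API).

```python
# Illustrative quorum check, in the spirit of BOINC-style result validation.
# Not BOINC's real validator -- just the voting idea.

from collections import Counter

def validate(replica_results, quorum=2):
    """Return the canonical result if at least `quorum` replicas agree,
    else None (meaning the work unit would be re-issued to another host)."""
    counts = Counter(replica_results)
    result, votes = counts.most_common(1)[0]
    return result if votes >= quorum else None

print(validate(["42", "42", "999"]))  # "42": two honest hosts outvote one tampered result
print(validate(["42", "999"]))        # None: no quorum, re-issue the work unit
```

This is why analyzing your own result before it has been corroborated is backwards: until a quorum agrees, an individual result isn't trusted at all.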
____________
In a rich man's house there is no place to spit but his face.
Diogenes Of Sinope

End terrorism by building a school


Copyright © 2014 University of California