Thursday Thoughts (Oct 02 2008)

Message boards : Technical News : Thursday Thoughts (Oct 02 2008)

Profile Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Avatar

Send message
Joined: 1 Mar 99
Posts: 1444
Credit: 957,058
RAC: 0
United States
Message 814116 - Posted: 2 Oct 2008, 21:22:11 UTC

Not much to report, really. We had a couple of blips or brownouts, which were minor and easily corrected. I'm mostly spending my day working on R&D-type stuff (MySQL replication, radar blanking, etc.) and data pipeline management - this included boxing up freshly reformatted drives to ship to Arecibo.

One thing in the works, maybe, is changing the workunit redundancy to effectively zero. There is already the mechanism in BOINC to "trust" hosts that continually return validated work. These hosts are then sent workunits that only they will have to process (not a redundant "wingman"). No validation is required (or actually possible) upon returning the result, and no waiting on others for credit, either. Of course, even trusted hosts will get occasional tests to prove they are still trustworthy. Plus there are quick tests we can do on the backend in lieu of "comparison validation." Other pros for doing this include using half the resources for the same amount of science (hooray!) and potentially getting through our backlog of data twice as fast.
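
A rough sketch of the trust idea described above, for illustration only. This is not BOINC's actual code: the function names, the streak threshold, and the spot-check rate are all hypothetical values picked to show the shape of the logic.

```python
import random

def needs_wingman(consecutive_valid, trust_threshold=10, spot_check_rate=0.1, rng=random):
    """Decide whether a second copy of a workunit should be issued.

    All names and default values are hypothetical, not real BOINC settings.
    """
    if consecutive_valid < trust_threshold:
        # Host has not yet earned trust: always send a redundant copy.
        return True
    # Trusted host: still issue an occasional redundant copy as a test.
    return rng.random() < spot_check_rate

def update_streak(consecutive_valid, result_was_valid):
    """A failed comparison resets the streak, so trust has to be earned again."""
    return consecutive_valid + 1 if result_was_valid else 0
```

With these made-up defaults, a host needs ten validated results in a row before it receives unreplicated work, and roughly one in ten of its later workunits would still get a wingman.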

The cons are mostly concerns. If we try to keep up with current demand for work we'd have to run twice as many splitters, which is impossible given our current resources (we'd at least need more CPUs, more disks, and better disk I/O). Or we could split at today's rate and regularly run out of work, which might upset some people. If we do increase our splitter production rate and burn through our data, we will even more likely run out of work on a regular basis (since we can't pad fresh data with old data if we used up the old data).
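
The "twice as many splitters" figure follows from simple arithmetic, sketched below. The daily demand number is a made-up placeholder, not a real project statistic; only the ratio matters.

```python
def workunits_needed_per_day(results_requested_per_day, copies_per_workunit):
    """Workunits the splitters must produce to satisfy a given demand for results."""
    return results_requested_per_day / copies_per_workunit

demand = 1_000_000  # hypothetical number of results requested by hosts per day

with_wingman = workunits_needed_per_day(demand, 2)   # each workunit sent to two hosts
no_redundancy = workunits_needed_per_day(demand, 1)  # each workunit sent to one host

# The same demand needs twice as many distinct workunits without redundancy.
splitter_ratio = no_redundancy / with_wingman
print(splitter_ratio)
```

The same factor works against the backlog: at double the splitting rate, a fixed amount of recorded data lasts half as long.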

Just some thoughts for now. We haven't really decided on anything yet.

- Matt

-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude
ID: 814116 · Report as offensive
Profile Blurf
Volunteer tester

Send message
Joined: 2 Sep 06
Posts: 8962
Credit: 12,678,685
RAC: 0
United States
Message 814118 - Posted: 2 Oct 2008, 21:23:10 UTC

Thanks, Matt.


ID: 814118 · Report as offensive
Cosmic_Ocean
Avatar

Send message
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 814146 - Posted: 2 Oct 2008, 22:30:07 UTC

I think I kind of prefer having one wingman. Having two wingmen was a little overkill on redundancy, but I've noticed on the full MB WUs, and more noticeably on the AP WUs that if a Core 2 and an AMD are paired up, aside from the crunch time, the claimed credit is either really close, or somewhere between 30-60 apart--the lower one being the granted credit for both.

I'm not entirely sure why such differences show up since the MB WUs seem to have a very consistent claim value for all systems, but I have noticed it is there.

For instance, my Opteron 2210s usually do 740-745 credits for the claimed, and I've seen some Core 2's claim 720-771. Sometimes I'm the one that takes a small hit in granted credit, sometimes the wingman gets hit.

I know the vast majority of the participants/volunteers get upset when data runs out, but having a 3-5 day cache helps the situation some as well. Personally, I'd be fine if we had to go a few days without WUs, and I know some others feel the same way, though they would rather be crunching than not. As gets mentioned during extended outages/failures, there are other projects that can be crunched in the interim.

As far as disk I/O and capacity go, I support RAID 5, but RAID 10 does have better I/O; the only downside is the decrease in usable storage. I was doing some reading about RAID 5 the other day, and it turns out that the optimal number of disks is 9: performance increases significantly with a hardware controller up to 9 disks, and then the increases become smaller and start to plateau. Also, last week I mentioned the Areca RAID controller, and I have seen numerous reports that if you get the model with the SO-DIMM slot on it and put a 1 GB module in there, burst and sustained read/write speeds increase by 20-50%.

Just more things to think about, as if there weren't enough already.
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)
ID: 814146 · Report as offensive
Profile Dr. C.E.T.I.
Avatar

Send message
Joined: 29 Feb 00
Posts: 16019
Credit: 794,685
RAC: 0
United States
Message 814150 - Posted: 2 Oct 2008, 22:38:00 UTC


. . . Thanks for Your Thoughts and Post Matt


BOINC Wiki . . .

Science Status Page . . .
ID: 814150 · Report as offensive
Speedy
Volunteer tester
Avatar

Send message
Joined: 26 Jun 04
Posts: 1643
Credit: 12,921,799
RAC: 89
New Zealand
Message 814221 - Posted: 3 Oct 2008, 1:52:34 UTC - in response to Message 814116.  

Boxing up freshly reformatted drives to ship to Arecibo.

What good news. How long will it take for the drives to be returned filled with new data, and how many tapes are there to be processed while the formatted drives get filled up again? Thanks for the news updates and all the hard work, Matt and team. Have a great weekend.
Speedy
ID: 814221 · Report as offensive
John McLeod VII
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 15 Jul 99
Posts: 24806
Credit: 790,712
RAC: 0
United States
Message 814239 - Posted: 3 Oct 2008, 3:01:24 UTC

I am all for reducing the need for redundancy if it is not needed.

I would, however, worry about credit cheats and how to stop them. Perhaps only hosts that run a recent enough version of BOINC to use FLOPS counting can be trusted? And could any host running such a version that shows a significant disagreement over the credit request also become untrusted for a while?

Then you just have to worry about how often you have to do a check, and how fast a host becomes trusted.


BOINC WIKI
ID: 814239 · Report as offensive
Profile Gary Charpentier Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 25 Dec 00
Posts: 30648
Credit: 53,134,872
RAC: 32
United States
Message 814241 - Posted: 3 Oct 2008, 3:11:56 UTC - in response to Message 814116.  

One thing in the works, maybe, is changing the workunit redundancy to effectively zero. There is already the mechanism in BOINC to "trust" hosts that continually return validated work. These hosts are then sent workunits that only they will have to process (not a redundant "wingman"). No validation is required (or actually possible) upon returning the result, and no waiting on others for credit, either. Of course, even trusted hosts will get occasional tests to prove they are still trustworthy. Plus there are quick tests we can do on the backend in lieu of "comparison validation." Other pros for doing this include using half the resources for the same amount of science (hooray!) and potentially getting through our backlog of data twice as fast.

The cons are mostly concerns. If we try to keep up with current demand for work we'd have to run twice as many splitters, which is impossible given our current resources (we'd at least need more CPUs, more disks, and better disk I/O). Or we could split at today's rate and regularly run out of work, which might upset some people. If we do increase our splitter production rate and burn through our data, we will even more likely run out of work on a regular basis (since we can't pad fresh data with old data if we used up the old data).

Just some thoughts for now. We haven't really decided on anything yet.

- Matt


Just wondering: are we able to crunch faster than the data is collected? I'm not asking about the burst speed, because I'm sure the receiver records much faster than we process, but something more like an average, whether that is month to month or quarter to quarter.

Gary


ID: 814241 · Report as offensive
kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 814259 - Posted: 3 Oct 2008, 4:32:35 UTC
Last modified: 3 Oct 2008, 4:33:02 UTC

Just my thoughts.......

Don't do it.

There are too many examples of hosts gone wacky to trust the science of your project to accept single reported results.

Even my rigs, much OC'd, but basically trustworthy, have been known to go off on walkabout once in a while and start doing strange things when the RAM gets a bit confused.......

I would much rather wait for a wingman to confirm my results than to have me report something that is not valid science and not have a cross check on what I have reported.

It is your project, and DO do whatever you are comfortable with and see fit to implement......

Again, just my thoughts.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 814259 · Report as offensive
H Elzinga
Volunteer tester

Send message
Joined: 20 Aug 99
Posts: 125
Credit: 8,277,116
RAC: 0
Netherlands
Message 814315 - Posted: 3 Oct 2008, 7:19:22 UTC - in response to Message 814259.  

Just my thoughts.......

Don't do it.

There are too many examples of hosts gone wacky to trust the science of your project to accept single reported results.

Even my rigs, much OC'd, but basically trustworthy, have been known to go off on walkabout once in a while and start doing strange things when the RAM gets a bit confused.......

I would much rather wait for a wingman to confirm my results than to have me report something that is not valid science and not have a cross check on what I have reported.

It is your project, and DO do whatever you are comfortable with and see fit to implement......

Again, just my thoughts.


Agreed.

Recently had a power failure.
One host took too much time to do a shutdown, so the UPS went out before it finished.

2 results (dual CPU machine) showed a "checked but no consensus" state and were reissued.
After that they were thrown out.

Both got reported a few hours after the power failure.
This host takes a minimum of 20 h to process a unit, so these had to be running at that moment.

ID: 814315 · Report as offensive
Sirius B Project Donor
Volunteer tester
Avatar

Send message
Joined: 26 Dec 00
Posts: 24879
Credit: 3,081,182
RAC: 7
Ireland
Message 814334 - Posted: 3 Oct 2008, 9:09:11 UTC - in response to Message 814259.  

Just my thoughts.......

Don't do it.

There are too many examples of hosts gone wacky to trust the science of your project to accept single reported results.

Even my rigs, much OC'd, but basically trustworthy, have been known to go off on walkabout once in a while and start doing strange things when the RAM gets a bit confused.......

I would much rather wait for a wingman to confirm my results than to have me report something that is not valid science and not have a cross check on what I have reported.

It is your project, and DO do whatever you are comfortable with and see fit to implement......

Again, just my thoughts.



Definitely agree. Ramsey brought down my farm yesterday - that's all projects, not just Ramsey. On a trustworthy basis, that incident would have killed the trust already built up. Don't Do It!
ID: 814334 · Report as offensive
Profile ML1
Volunteer moderator
Volunteer tester

Send message
Joined: 25 Nov 01
Posts: 20283
Credit: 7,508,002
RAC: 20
United Kingdom
Message 814342 - Posted: 3 Oct 2008, 10:20:50 UTC - in response to Message 814334.  

Just my thoughts.......

Don't do it.

There are too many examples of hosts gone wacky to trust the science of your project to accept single reported results.

Even my rigs, much OC'd, but basically trustworthy, have been known to go off on walkabout once in a while...

Definitely agree. Ramsey brought down my farm yesterday - that's all projects, not just Ramsey. On a trustworthy basis, that incident would have killed the trust already built up. Don't Do It!

I think trust is the keyword there...

It would be trivially easy to start inflating the credit returns when a client notices that it has a singular WU. It would then be a game of how brazenly the credits could be bent until the 'trust' is eventually lost...

Also, is there not scientific merit in that all the signals listed in the Master Science Database have been validated? Other studies may use that data if the data is known to be reliable. Universe background noise studies? Find the general direction of ET by noticing a rise in the noise floor??


Keep searchin',
Martin

See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
ID: 814342 · Report as offensive
MarkJ Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 17 Feb 08
Posts: 1139
Credit: 80,854,192
RAC: 5
Australia
Message 814348 - Posted: 3 Oct 2008, 11:06:50 UTC - in response to Message 814116.  

One thing in the works, maybe, is changing the workunit redundancy to effectively zero. There is already the mechanism in BOINC to "trust" hosts that continually return validated work.

- Matt


I'd have to agree with the other guys: I wouldn't "trust" any host, mine included. I would recommend we stick with the current arrangements.

Thanks for the update, Matt. As always, nice to know what's happening.
BOINC blog
ID: 814348 · Report as offensive
Profile Geek@Play
Volunteer tester
Avatar

Send message
Joined: 31 Jul 01
Posts: 2467
Credit: 86,146,931
RAC: 0
United States
Message 814431 - Posted: 3 Oct 2008, 14:48:57 UTC - in response to Message 814348.  

One thing in the works, maybe, is changing the workunit redundancy to effectively zero. There is already the mechanism in BOINC to "trust" hosts that continually return validated work.

- Matt


Don't do it! I can wait on my wingman....no problem. The work must be valid before insertion into the scientific data base or questions will arise.
Boinc....Boinc....Boinc....Boinc....
ID: 814431 · Report as offensive
Profile ML1
Volunteer moderator
Volunteer tester

Send message
Joined: 25 Nov 01
Posts: 20283
Credit: 7,508,002
RAC: 20
United Kingdom
Message 814440 - Posted: 3 Oct 2008, 15:24:51 UTC - in response to Message 814431.  

Don't do it! I can wait on my wingman....no problem...

Hey! More importantly, there'd be no more forum posts for pending credit angst and wingman chasing! The forums would die!!... That just can't be done!

:-p

Happy crunchin',
Martin

See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
ID: 814440 · Report as offensive
Profile Gary Charpentier Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 25 Dec 00
Posts: 30648
Credit: 53,134,872
RAC: 32
United States
Message 814461 - Posted: 3 Oct 2008, 16:18:40 UTC

A story about a computer.

Some time back I had a machine that was very happy and crunching lots of work. Then one day I noticed a work unit that didn't validate. Strange. As time went on, over a month, it had more work units that didn't validate, but it always had good work units too. I began to suspect the machine might have a problem and made sure backups were being made and were readable. Then one day the machine was dead. If not for wingmen, I wouldn't have had notice that something was wrong.

Gary


ID: 814461 · Report as offensive
Profile Clyde C. Phillips, III

Send message
Joined: 2 Aug 00
Posts: 1851
Credit: 5,955,047
RAC: 0
United States
Message 814482 - Posted: 3 Oct 2008, 17:34:41 UTC

Just using one cruncher sounds nice. I wonder what the error rate has been lately? For example, can one be certain that 99 percent of all results crunched are correct?
ID: 814482 · Report as offensive
Profile KyleFL

Send message
Joined: 20 May 99
Posts: 17
Credit: 2,332,249
RAC: 0
Germany
Message 814487 - Posted: 3 Oct 2008, 17:46:11 UTC
Last modified: 3 Oct 2008, 17:47:08 UTC

I'll have to agree with the previous posters.

I feel safer with a wingman double-checking a task that one of my hosts crunched.

1. You get confirmation that the result you provided is correct.
2. You can check your crunching speed against your wingman's, because he crunched the exact same WU.


The crunching power will rise in the future as newer and faster CPUs come onto the market. Of course it can be tempting to double the speed with a single switchover, but I personally think the risk would be greater than the benefit.


Cu KyleFL
ID: 814487 · Report as offensive
jim little

Send message
Joined: 3 Apr 99
Posts: 112
Credit: 915,934
RAC: 0
United States
Message 814538 - Posted: 3 Oct 2008, 20:52:20 UTC

I have had a few, very few, where the wingman had near-zero results while mine were in the teens or perhaps a hundred. Choosing the smaller may not be using correct data!

A difference of a digit or two in a value in the teens or hundreds seems to be no great problem, but really large disagreements would wave a red flag to me and call for at least a coarse screening. Most of my wingman agreements are within one or two units, which should be no big deal. (I hope!)

Since I am running on two Macs, one a portable with dual processors and the other a big Mac with two dual processors, both should give similar answers. Results from other Intel chips should also agree. Other brands of CPU chips might show differences.

I quit using my older processors as the new ones are so much faster. And a single dual processor in the portable is a fast machine.

BTW, the portable is more efficient than the big box: about 35 watts for two processors, while the big one uses 245 watts for four. Both have a power factor of 0.99, so the watts and true energy are nearly identical.
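
Working those figures out per processor gives a rough comparison. This only divides the quoted wattage by the processor count; it assumes nothing about how much work each processor gets done, so it is watts per processor, not watts per result.

```python
def watts_per_processor(total_watts, processors):
    """Average power draw per processor for a machine."""
    return total_watts / processors

portable = watts_per_processor(35, 2)   # portable: 35 W across two processors
big_box = watts_per_processor(245, 4)   # big box: 245 W across four processors

# How many times more power the big box draws per processor.
ratio = big_box / portable
print(portable, big_box, ratio)
```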

One of these days I want to try the energy use in a dual quad machine. No, I am not going to use my box; one new machine a year is almost too much, and I have used that one this spring for the portable.

Final thought. Most data units will be uninteresting. But it only takes one BINGO......

duke

ID: 814538 · Report as offensive
Profile [KWSN]John Galt 007
Volunteer tester
Avatar

Send message
Joined: 9 Nov 99
Posts: 2444
Credit: 25,086,197
RAC: 0
United States
Message 814539 - Posted: 3 Oct 2008, 20:57:58 UTC - in response to Message 814259.  

Just my thoughts.......

Don't do it.

There are too many examples of hosts gone wacky to trust the science of your project to accept single reported results.

Even my rigs, much OC'd, but basically trustworthy, have been known to go off on walkabout once in a while and start doing strange things when the RAM gets a bit confused.......

I would much rather wait for a wingman to confirm my results than to have me report something that is not valid science and not have a cross check on what I have reported.

It is your project, and DO do whatever you are comfortable with and see fit to implement......

Again, just my thoughts.


Like the time you reported 28,212,776,635,318,302,094,458,356,388,446,235,923,207,
741,744,470,434,104,453,978,968,420,464,012,487,728,615,460,126,025,888,922,495,
329,680,983,667,039,227,542,120,518,375,956,424,460,107,196,736,376,759,573,283,
400,799,476,702,873,620,993,300,697,266,428,815,162,638,403,940,600,522,997,760.00 credits...

Could you imagine THAT???

All in good humor, my friend...
Clk2HlpSetiCty:::PayIt4ward

ID: 814539 · Report as offensive
Josef W. Segur
Volunteer developer
Volunteer tester

Send message
Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 814550 - Posted: 3 Oct 2008, 21:37:11 UTC

I think that the Adaptive replication feature would be very worthwhile if the servers can deliver the additional work and effective sanity checks can be provided for uncompared results.

The developers did think about possible abusers, so in that mode a Workunit page display does not include the table showing which host(s) have been tasked to produce a result. A user won't know if his host's result will be compared or accepted without comparison, so there's little motivation to play puerile games. It's not perfect, but maybe good enough. OTOH, I'd like to see it modified so that after a user's host has uploaded and reported a result, that user can see full detail on the Workunit.

If the project set the criteria for which hosts are considered reliable such that about 20% of active hosts are excluded, even reliable hosts will be doing work requiring validation nearly 20% of the time. I think that would be more than adequate to catch hosts which develop a problem before they've done significant harm. Bear in mind that 99.999...% of "signals" in the master science database are actually random noise, we just hope it isn't 100%. Adaptive replication in effect just adds another small noise factor.
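
One way to see where a figure like "nearly 20%" could come from is the toy model below. It is only an assumption-laden sketch, not how the BOINC scheduler actually assigns work: it assumes every host requests work at the same rate, copies are handed to hosts at random, a workunit gets a second copy only when its first copy went to an excluded host, and spot checks of reliable hosts are ignored.

```python
def compared_fraction_for_reliable_host(excluded_fraction):
    """Fraction of a reliable host's tasks that are second copies needing comparison.

    Toy model (all assumptions, see the text above): per workunit, a host's
    chance of receiving the first copy is proportional to 1, and its chance of
    receiving the second copy is proportional to excluded_fraction, because a
    second copy exists only when the first went to an excluded host.
    """
    u = excluded_fraction
    return u / (1.0 + u)

# With about 20% of hosts excluded, roughly one task in six is a comparison task.
print(compared_fraction_for_reliable_host(0.20))
```

Under these assumptions the result is about 17%, which is in line with "nearly 20%"; any spot checks of reliable hosts would push it higher.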

Any concern about credit claims could fairly easily be handled by the sanity checks which would replace actual comparative validation. That could be made as tight as necessary, even forcing a reissue to another host in questionable circumstances.
Joe
ID: 814550 · Report as offensive
 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.