Small Word (Sep 20 2007)



Message boards : Technical News : Small Word (Sep 20 2007)

Jesse Viviano
Joined: 27 Feb 00
Posts: 95
Credit: 474,230
RAC: 0
United States
Message 650411 - Posted: 28 Sep 2007, 20:03:01 UTC

How about this solution:

Whenever there is no new work to issue, generate results from the oldest work units that have not yet been completed and have not had any reissues, whether from no-reply results or from this new system. That way a work unit won't stall too long because of someone who panics and reformats after a rootkit infection (which, if written well enough, requires nothing short of a reformat and possibly a BIOS flash to cure 100%), malware so new that the antivirus software doesn't know how to deal with it, or some rude vacationer who does not abort his work. And since the work unit has already been split, there is no need to wait for the splitter to come up with the result.
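
A minimal sketch of the selection rule proposed above, for illustration only. It is not BOINC server code: the `WorkUnit` record, its field names, and the `limit` parameter are all hypothetical, chosen just to make the rule concrete.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class WorkUnit:
    # Hypothetical record; the real BOINC schema is not described in this thread.
    wu_id: int
    created_day: float   # day on which the splitter produced the work unit
    completed: bool      # True once the work unit has been completed
    reissue_count: int   # reissues so far, from no-reply timeouts or this scheme


def pick_reissue_candidates(work_units: List[WorkUnit],
                            new_work_available: bool,
                            limit: int) -> List[WorkUnit]:
    """Return the oldest unfinished, never-reissued work units.

    Only applies when there is no new work to issue, as in the proposal.
    """
    if new_work_available:
        return []
    eligible = [wu for wu in work_units
                if not wu.completed and wu.reissue_count == 0]
    eligible.sort(key=lambda wu: wu.created_day)  # oldest first
    return eligible[:limit]
```

Because a work unit becomes ineligible once it has been reissued, a scheme like this would send out at most one extra copy per work unit.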

PhonAcq
Joined: 14 Apr 01
Posts: 1622
Credit: 22,226,729
RAC: 4,471
United States
Message 650459 - Posted: 28 Sep 2007, 21:11:29 UTC

@n7rfa:
I agree that the outgoing traffic would be larger when the replication level is greater than 2. If it were set to 3, then there would be an immediate 50% increase in outgoing downloads and a 'waste' of 33% of the outgoing bandwidth. As wu's are validated, the one wu still not returned will need to be cancelled. This cancellation has little impact on network bandwidth. I think the analysis stops there, since the same logic applies to each wu. If a rep of 4 were selected, then we'd 'waste' 50% of the outgoing bandwidth. And so on.
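
The percentages above can be checked with a few lines of arithmetic. This is only a sketch under the post's own simplification: every initial copy is downloaded in full, the quorum is 2, and the copies beyond the quorum count as wasted. The function name is made up for the example.

```python
def replication_overhead(initial_copies: int, quorum: int = 2):
    """Return (increase in downloads, fraction of outgoing bandwidth wasted).

    Assumes each issued copy is fully downloaded and only `quorum`
    of them are actually needed.
    """
    if initial_copies < quorum:
        raise ValueError("initial_copies must be at least the quorum")
    extra = initial_copies - quorum
    increase = extra / quorum          # relative to issuing exactly `quorum` copies
    wasted = extra / initial_copies    # share of all downloads that were not needed
    return increase, wasted


for copies in (3, 4, 5):
    increase, wasted = replication_overhead(copies)
    print(f"replication {copies}: {increase:.0%} more downloads, {wasted:.0%} wasted")
```

For a replication of 3 this gives the 50% increase and 33% waste quoted above, and 50% waste for a replication of 4.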

On the plus side, I think the server-side database requirements might shrink, because the persistence time of each wu in the database would go down. So while more results would be registered in the tables, they would be there for a shorter period of time, at least if we all ran boinc 5.10.20.


@Jesse:
You are proposing a bandaid for outliers. It could run without changing anything else running today. This is fine and could be implemented easily enough, it seems.

I suppose at the cost of increased complexity one could issue wu's to clients that have similar tpt's. Slow clients matched with slow clients, etc. But this would require considerably more data analysis on the servers than I suspect is desired by the sysadmins there. One positive thing is that it would reduce the average size of the databases, because it should trim the pending-credit distribution.

n7rfa
Volunteer tester
Joined: 13 Apr 04
Posts: 370
Credit: 9,058,599
RAC: 0
United States
Message 650503 - Posted: 28 Sep 2007, 22:07:33 UTC - in response to Message 650459.

@n7rfa:
I agree that the outgoing traffic would be larger when the replication level is greater than 2. If it were set to 3, then there would be an immediate 50% increase in outgoing downloads and a 'waste' of 33% of the outgoing bandwidth. As wu's are validated, the one wu still not returned will need to be cancelled. This cancellation has little impact on network bandwidth. I think the analysis stops there, since the same logic applies to each wu. If a rep of 4 were selected, then we'd 'waste' 50% of the outgoing bandwidth. And so on.

On the plus side, I think the server-side database requirements might shrink, because the persistence time of each wu in the database would go down. So while more results would be registered in the tables, they would be there for a shorter period of time, at least if we all ran boinc 5.10.20.


@Jesse:
You are proposing a bandaid for outliers. It could run without changing anything else running today. This is fine and could be implemented easily enough, it seems.

I suppose at the cost of increased complexity one could issue wu's to clients that have similar tpt's. Slow clients matched with slow clients, etc. But this would require considerably more data analysis on the servers than I suspect is desired by the sysadmins there. One positive thing is that it would reduce the average size of the databases, because it should trim the pending-credit distribution.

You're assuming that they will be cancelled immediately. They aren't cancelled until the client connects to the server and then only if the BOINC Client supports the cancellation and the client hasn't already started crunching it.

1/3 of my pending WUs are because the other Client hasn't connected and I'm only looking at the pending work for August. More Results being sent out will not resolve this aspect of the problem. Only a shorter Deadline will help in this case.

Now let's consider the slow clients. As long as they are matched with 2 fast clients, they will be continually cancelling and downloading work. Oh, they will be crunching, but they will be continually wasting network bandwidth as well.

Remember, the client downloads have been impacted recently by the number of splitters that are running. And if we hit another run of "short" WUs, there will be even more impact on the network response.

In my opinion, shortening the Deadline to 3-4 weeks is the best all around solution to the "problem".
____________

Richard Haselgrove
Project donor
Volunteer tester
Joined: 4 Jul 99
Posts: 8551
Credit: 50,460,587
RAC: 51,411
United Kingdom
Message 650512 - Posted: 28 Sep 2007, 22:23:31 UTC

As Joe Segur has pointed out, the current formula gives a range of 13x between the shortest and the longest deadlines - 8.68 days to ~113 days. Yet my research suggested that the maximum variation to the rare extreme outliers was closer to 8x, and to commoner angle ranges nearer 6x.

That suggests it would be perfectly reasonable to compress the range of deadlines, keeping the shortest at 8.68 days, but bringing the longest down to say 50 days.
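
As a rough illustration of what such a compression could look like, the sketch below rescales deadlines linearly so that 8.68 days stays fixed and the longest deadline lands on 50 days. The linear mapping is an assumption made for this example; the actual deadline formula is not given in this thread, and 113 is the approximate upper end quoted above.

```python
SHORTEST_DAYS = 8.68   # shortest deadline quoted in the post
LONGEST_DAYS = 113.0   # approximate longest deadline quoted in the post


def compress_deadline(current_days: float, new_longest: float = 50.0) -> float:
    """Linearly remap a deadline from [8.68, 113] days onto [8.68, new_longest]."""
    fraction = (current_days - SHORTEST_DAYS) / (LONGEST_DAYS - SHORTEST_DAYS)
    return SHORTEST_DAYS + fraction * (new_longest - SHORTEST_DAYS)


# Range ratio before and after compression.
print(f"current range: {LONGEST_DAYS / SHORTEST_DAYS:.1f}x")
print(f"compressed range: {compress_deadline(LONGEST_DAYS) / SHORTEST_DAYS:.1f}x")
```

With a 50-day cap the range ratio drops from about 13x to a little under 6x, which is close to the 6x variation reported for the commoner angle ranges.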

PhonAcq
Joined: 14 Apr 01
Posts: 1622
Credit: 22,226,729
RAC: 4,471
United States
Message 650528 - Posted: 28 Sep 2007, 23:10:27 UTC

@n7rfa: Yes indeed you are right. I'm assuming continuous connection! So my suggestion is not as desirable as I had hoped. Regarding your comment about slow processors in my scheme: 1) statistically, the slow will not be matched with the fast until the fast are the dominant species and even then their caches will tend to grow so they look slow; and 2) when the slow simply cannot get credit because they are not fast, then account holders will either change projects or upgrade their hardware; in the latter case we all win.

General: I would not be in favor of decreasing the deadline times. It's a philosophical objection, I guess, which should count for something. Seti was supposed to be a noble, egalitarian project after all. Slow processors are a source of pride for some of us and we don't give a hoot about more objective arguments to the contrary.

Instead, as was suggested below more or less by Jesse, why not just run a process that finds the outlier wu's with a pending result that is way overdue for completion; then issue a redundant wu or two for cases that significantly exceed the norm for that client. The tpt data for each client is available, so why not use it at least for the outliers. Perhaps this is too big a programming challenge?
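
A sketch of what such an outlier scan might look like. "tpt" is not expanded in this thread, so the example treats it as the client's typical turnaround time in days, which is an assumption. The record layout and the `factor` threshold are hypothetical as well.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PendingResult:
    # Hypothetical record, not the BOINC database schema.
    result_id: int
    host_id: int
    days_outstanding: float    # time since the result was sent to the host
    host_typical_days: float   # assumed meaning of the host's "tpt"


def find_overdue(results: List[PendingResult],
                 factor: float = 3.0) -> List[PendingResult]:
    """Flag results outstanding far longer than is normal for that host.

    `factor` is an arbitrary example threshold: a result is flagged when it
    has been out more than `factor` times the host's typical turnaround.
    """
    return [r for r in results
            if r.days_outstanding > factor * r.host_typical_days]
```

Each flagged result would then be a candidate for one redundant copy, while hosts that are slow but consistent would not be flagged, because the threshold scales with their own history.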

PhonAcq
Joined: 14 Apr 01
Posts: 1622
Credit: 22,226,729
RAC: 4,471
United States
Message 650530 - Posted: 28 Sep 2007, 23:28:47 UTC

And so where has Matt been? I'm addicted to his updates on this board and he seems to have disappeared! Probably in some basement cutting a CD (or do you burn those? probably burn the bad ones)

Mentor397
Joined: 16 May 99
Posts: 17
Credit: 4,782,685
RAC: 400
United States
Message 650556 - Posted: 29 Sep 2007, 0:14:52 UTC

Okay, perhaps I haven't been reading this right, but let's pretend that 5 WU's are sent out and the quorum remains two. When two WU's are returned, the other three are cancelled, but what if they are being crunched as they are cancelled? I'm thinking that once again, the faster computers can run circles around the slower ones (which indeed happens already) AND prevent them from doing any work at all.

Eventually, the pending credit thing will even out. It cannot keep rising indefinitely. As I write this, my pending credit is 2,182. It will even out, and numbers will rise. However, as we are still technically recovering from a long period of Seti blues, that equilibrium hasn't been tested.

I don't feel bad that my cache is eight days. I've seen others that have it set for the whole ten, and with the difficulties the project is experiencing, I'd like to have a cushion in case something goes wrong, either on my end or on Seti's. That being said, I don't crunch for anyone else and really don't care to. While I'm sure there is plenty of useful science out there, I'm with this project because Seti interests me far more than saving the planet or curing disease. (wow, that made me sound heartless!)

But, up until this past March, my computers have ALWAYS sucked. In fact, the computer before this one was a P3-450. Apparently computers don't like to be dropped - write that down. I knew way back when I signed up that Seti was going to be about numbers for some people. While EVERYONE wants to contribute more than anyone else, there are going to be people who want to tweak the rules so they get their numbers first.

But, think about it. How much pending credit is lost? AFAIK, none. It may take a while, it may take a LONG while, but eventually people get their credit they want. Your computer(s) still does the same amount of work every single day whether your pending credit is 30 or 3000, it just means that you have more out there that you haven't gotten credit for YET. The ONLY justification I can see for crunchers to reset the deadlines is so they get their credit sooner, thereby eliminating the slower computers.

Whew, sorry this is so long. My first post on this board, tee hee! I'm all a-twitter!


____________

perryjay
Volunteer tester
Joined: 20 Aug 02
Posts: 3377
Credit: 15,675,685
RAC: 11,815
United States
Message 650570 - Posted: 29 Sep 2007, 0:46:31 UTC

My computers are far from the fastest in the world but they are all I can afford. They do the job and keep crunching along. http://setiathome.berkeley.edu/hosts_user.php?userid=258982 I keep them up to date with the latest BOINC and SETI apps so that they can at least try to keep up.

The only problem I have with the long completion dates is that at least two of my pending wingmen have gone AWOL without aborting their WUs or detaching from the project. One of them expires on the 5th of October. He got the WUs on Aug. 14 and has not been heard from since. Another one doesn't expire until sometime in Nov. That leaves the WU hanging for two months before it even goes out to someone else to complete.

I really hope someone comes up with some way to identify people like this so that the work could progress the way it is supposed to instead of just sitting there waiting on an answer that is never coming.
____________


PROUD MEMBER OF Team Starfire World BOINC

OzzFan
Volunteer tester
Joined: 9 Apr 02
Posts: 13620
Credit: 30,571,999
RAC: 21,049
United States
Message 650580 - Posted: 29 Sep 2007, 0:57:50 UTC - in response to Message 650556.

Okay, perhaps I haven't been reading this right, but let's pretend that 5 WU's are sent out and the quorum remains two. When two WU's are returned, the other three are cancelled, but what if they are being crunched as they are cancelled? I'm thinking that once again, the faster computers can run circles around the slower ones (which indeed happens already) AND prevent them from doing any work at all.


As the system is currently set up, after a quorum has been met, the servers will attempt to cancel the workunits sent out to all other hosts. If a host is already crunching a workunit that has been marked to be canceled, it is allowed to keep crunching until completion. The scientific value of that result is zilch, since the quorum has already been met, so the host is crunching purely for the credit: the server will still grant credit for work done before the WU deadline.
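
The behaviour described here, together with the conditions n7rfa lists earlier in the thread (the client has to connect, has to support cancellation, and must not have started the task), can be summarised as a small decision function. This is a sketch of the rules as stated in the thread, not actual BOINC code; all names are made up for the example.

```python
from enum import Enum


class Outcome(Enum):
    NOT_YET_NOTIFIED = "client has not contacted the server since the quorum was met"
    CANCELLED = "task dropped without being crunched"
    RUNS_TO_COMPLETION = "task finishes; credit granted if returned before the deadline"


def cancellation_outcome(client_has_connected: bool,
                         client_supports_cancel: bool,
                         already_started: bool) -> Outcome:
    """Decide what happens to a redundant copy once the quorum has been met."""
    if not client_has_connected:
        # The server can only pass on the cancellation when the client connects.
        return Outcome.NOT_YET_NOTIFIED
    if not client_supports_cancel or already_started:
        # Older clients, or tasks already being crunched, carry on to the end.
        return Outcome.RUNS_TO_COMPLETION
    return Outcome.CANCELLED
```

Only the last branch saves any crunching time, and even then the download bandwidth for that copy has already been spent.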
____________

OzzFan
Volunteer tester
Joined: 9 Apr 02
Posts: 13620
Credit: 30,571,999
RAC: 21,049
United States
Message 650636 - Posted: 29 Sep 2007, 1:27:24 UTC - in response to Message 650598.

Okay, perhaps I haven't been reading this right, but let's pretend that 5 WU's are sent out and the quorum remains two. When two WU's are returned, the other three are cancelled, but what if they are being crunched as they are cancelled? I'm thinking that once again, the faster computers can run circles around the slower ones (which indeed happens already) AND prevent them from doing any work at all.


As the system is currently set up, after a quorum has been met, the servers will attempt to cancel the workunits sent out to all other hosts. If a host is already crunching a workunit that has been marked to be canceled, it is allowed to keep crunching until completion. The scientific value of that result is zilch, since the quorum has already been met, so the host is crunching purely for the credit: the server will still grant credit for work done before the WU deadline.


And your point is......?
The system is working perfectly. Hosts were crunching WUs for nothing other than cobblestones much more before the change to an initial issue of 2. Now most of the crunching being done, even by dog-slow rigs, is valid science.

As the kitties have written, as it shall be done.


My point was to answer Mentor's question about what happens when a quorum is met and a workunit is scheduled for deletion and a host is currently crunching that workunit.

I thought my point was quite obvious being that I even quoted the portion of text that I was replying to.
____________

Heflin
Joined: 22 Sep 99
Posts: 81
Credit: 640,242
RAC: 0
United States
Message 650785 - Posted: 29 Sep 2007, 6:37:50 UTC

I guess I'm ABNORMAL.

I **LIKE** Pending credit! It is like ... my computers have worked faster than my wingmen's. Or I'm gonna get something for nothing in the near future. Kind of like Christmas morning: I'm up and see the presents but have not opened them yet.

Neither really accurate, but a more positive "feeling"

Maybe we should have a competition to see who can get the HIGHEST Pending Credit Score? Maybe a page to list folks via Pending Credits?

Sure, actual credit may be different but they still don't have any cash value.
____________
SETI@home since 1999
"Set it, and Forget it!"

Andy Lee Robinson
Joined: 8 Dec 05
Posts: 615
Credit: 42,155,936
RAC: 38,981
Hungary
Message 650796 - Posted: 29 Sep 2007, 8:41:04 UTC - in response to Message 650196.

(for those that don't remember, anything with an AR of 1.2 or above would have a [relatively] short crunching time and a very short deadline [some in the one week range], a cache full of these WU's would put the computer they were assigned to into EDF [and "no New Tasks"] for the duration of processing the cache!)


Solution is simple:
Don't eat what you can't chew - don't accept short WUs if at end of cache.
Don't get given what you can't eat - server doesn't give short WUs if cache is too big.
Spit out what you can't eat. - return WUs for someone else if you can't meet the deadline, or disappear.

No evil nasty horrible EDF required! What's the problem with EDF anyway?
You've got enough on your plate, and you want more?
No wonder the West has a weight problem!

Scrooge McDuck
Joined: 26 Nov 99
Posts: 21
Credit: 1,184,954
RAC: 0
Germany
Message 650829 - Posted: 29 Sep 2007, 11:06:13 UTC - in response to Message 650556.
Last modified: 29 Sep 2007, 11:26:30 UTC

I want to thank Mentor for his very good post. It summarizes the whole discussion perfectly. So one may beat on those who get large amounts of WUs and then simply go away, never to be seen again. But it's very important for the project that a normal user who doesn't have a state-of-the-art system is able to contribute, even if his system runs only some hours a day. He doesn't want to crunch simply for the credit if the quorum was already reached for his WU. So I will never buy or run a fast host simply for SETI. But it's the perfect usage for the idle cycles on my normal desktop machines. If hard deadlines prevented me from doing so, I would leave the project, and I think a lot of other people would do the same. This may not reduce the computing power significantly, but every user tells his friends, buddies... about the project.

So it's a simple question: What audience is Seti@home addressing? Is it only a group of maybe some 10K power users, using a farm of new systems in their company and running 24/7? Seti@home grew to its current state by millions using the simple and nice Seti Classic, without deadlines, and a fantastic team at Berkeley, knowing their work is appreciated by all those people worldwide. I think we should keep it this way, and the power users will get the credits some days or weeks later.
____________

Zentrallabor
Joined: 8 Jan 00
Posts: 6
Credit: 70,525
RAC: 0
Germany
Message 650837 - Posted: 29 Sep 2007, 12:00:49 UTC - in response to Message 650570.
Last modified: 29 Sep 2007, 12:18:20 UTC

@perryjay

..
The only problem I have with the long completion dates is that at least two of my pending wingmen have gone AWOL without aborting their WUs or detaching from the project. One of them expires on the 5th of October. He got the WUs on Aug. 14 and has not been heard from since. Another one doesn't expire until sometime in Nov. That leaves the WU hanging for two months before it even goes out to someone else to complete.

I really hope someone comes up with some way to identify people like this so that the work could progress the way it is supposed to instead of just sitting there waiting on an answer that is never coming.


I'm far from coming up with a solution to your problem, but the reason (at least for Josh's machine not sending back the results to WUs from Aug-14) seems simple: IMO he experimented with WinVista - had problems and now runs his machines again with WinXP ;)
Best would have been, if he cancelled the downloaded results before abandoning his WinVista-experience, but now it may be too late. :-(

I also think that such abandoned machines/clients account for most of the problem behind some users getting annoyed by their pending credits (not because of the credits themselves but because of lagging results which blow up the database). This is not a specific WinVista problem but a common problem with users trying out BOINC or new/unknown BOINC projects. A lot of people seem to try out CPDN - and then cancel the project because of the long running-times of a WU (or the requested HD-space).

Did anyone hear from the team if the long duration for such partly abandoned WUs is _really_ an issue for the S@H-servers? Maybe the servers can get along easily with 2megs of open results. ;-)

Right now my client was able to DL 3 WUs only after trying (only once manually triggered) for more than 30 minutes - such multi-requests are also something causing unnecessary load on the servers and network. When looking at the server-stats I think it's a problem of WU-creation: some weeks ago only 3-4 mb-splitters worked with 10-20 WUs/second creation, now 6 mb-splitters seem to create fewer WUs??

Regards, Chris

P.S. Thanks to OzzFan and the others for their replies to my earlier post. If there were good reasons for abandoning RRI, I won't waste another minute on it (it's a smaller problem of the "pending credits" (better: unvalidated results) - results will only stay a bit (at most 24h) longer unvalidated).
____________

DJStarfox
Joined: 23 May 01
Posts: 1043
Credit: 548,109
RAC: 125
United States
Message 650884 - Posted: 29 Sep 2007, 15:34:16 UTC - in response to Message 650837.

A lot of people seem to try out CPDN - and then cancel the project because of the long running-times of a WU (or the requested HD-space).

I'm guessing you're right on this issue. Although attrition is a much lower percentage of users on SETI than on CPDN, I think this does explain it well. I too have been in that situation; after you uninstall BOINC, it's too late to abort the WU.

I suppose a super-user would log into the website and manually abort a WU if such a form/button existed on their results page. That may make the BOINC system as a whole more complicated though.

Right now my client was able to DL 3 WUs only after trying (only once manually triggered) for more than 30 minutes - such multi-requests are also something causing unnecessary load on the servers and network. When looking at the server-stats I think it's a problem of WU-creation: some weeks ago only 3-4 mb-splitters worked with 10-20 WUs/second creation, now 6 mb-splitters seem to create fewer WUs??


Yes, the problem this year all along has been I/O contention. More splitters run, but each of them runs slower. They are competing for the same disk. I'm sure Matt has been playing with the number of splitters to find the best configuration overall.

John McLeod VII
Volunteer developer
Volunteer tester
Joined: 15 Jul 99
Posts: 24544
Credit: 522,500
RAC: 102
United States
Message 650977 - Posted: 29 Sep 2007, 18:41:32 UTC - in response to Message 650884.

A lot of people seem to try out CPDN - and then cancel the project because of the long running-times of a WU (or the requested HD-space).

I'm guessing you're right on this issue. Although attrition is a much lower percentage of users on SETI than on CPDN, I think this does explain it well. I too have been in that situation; after you uninstall BOINC, it's too late to abort the WU.

I suppose a super-user would log into the website and manually abort a WU if such a form/button existed on their results page. That may make the BOINC system as a whole more complicated though.

Right now my client was able to DL 3 WUs only after trying (only once manually triggered) for more than 30 minutes - such multi-requests are also something causing unnecessary load on the servers and network. When looking at the server-stats I think it's a problem of WU-creation: some weeks ago only 3-4 mb-splitters worked with 10-20 WUs/second creation, now 6 mb-splitters seem to create fewer WUs??


Yes, the problem this year all along has been I/O contention. More splitters run, but each of them runs slower. They are competing for the same disk. I'm sure Matt has been playing with the number of splitters to find the best configuration overall.

Matt has also been playing games with moving some of the HD capacity to different volumes.
____________


BOINC WIKI

Invisible Man
Joined: 24 Jun 01
Posts: 22
Credit: 1,129,336
RAC: 0
United Kingdom
Message 650983 - Posted: 29 Sep 2007, 18:59:58 UTC
Last modified: 29 Sep 2007, 19:01:05 UTC

Help - Come back Matt, from wherever you are. Reason: project is down.

Viv.
____________

edjcox
Joined: 20 May 99
Posts: 65
Credit: 3,999,536
RAC: 415
United States
Message 651291 - Posted: 30 Sep 2007, 2:19:52 UTC

Noted that LIDOS got a lot of press recently about the discovery of a series of repetitive emanations from the vicinity of a Pulsar cluster. They are of course capitalizing on this and turning a "discovery" into a request and justification for more funding.


So what, if anything, does SETI have in its database about that region of space? Or are we once more not able to view that area of space due to Arecibo's narrow sliver of the sky?

Anyway, I would like to hear from some "experts" and what they make of the report.
____________
Never engage stupid people at their level, they then have the home court advantage.....

Invisible Man
Joined: 24 Jun 01
Posts: 22
Credit: 1,129,336
RAC: 0
United Kingdom
Message 651391 - Posted: 30 Sep 2007, 8:25:02 UTC

Many thanks somebody. All the red status blocks are now green again. Well done.

Viv.
____________


Copyright © 2014 University of California