WU Deadlines

Message boards : Number crunching : WU Deadlines

Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15175
Credit: 4,362,181
RAC: 3
Netherlands
Message 146124 - Posted: 2 Aug 2005, 15:49:46 UTC - in response to Message 146115.

Even with short connection times, hosts receive more units than can be crunched in that time period. Hosts are doing more than one project, with selectable variables for percentage allocation, selectable switching times, etc.

So...
Take out multiple units per host. That's no longer necessary.
Take out crunching for multiple projects. That's no longer necessary.

See the end result: Seti Classic with a new cruncher.

It's all he may want. :)

ID: 146124

W-K 666 Project Donor
Volunteer tester
Joined: 18 May 99
Posts: 13932
Credit: 40,757,560
RAC: 67
United Kingdom
Message 146115 - Posted: 2 Aug 2005, 15:32:29 UTC

@PhonAcq

I don't believe you have thought your suggestion through before posting it.

Even with short connection times, hosts receive more units than can be crunched in that time period. Hosts are doing more than one project, with selectable variables for percentage allocation, selectable switching times, etc. As I said before, I've had units granted credit quicker than a lot of hosts can crunch one unit. The data that would need to be held about each host would far outweigh any advantages gained. If you are sent more than one unit, there is no guarantee that you receive them in the same order, etc.

There are areas, even in first world countries, where broadband is impossible. A little of the work I do sometimes requires me to go to places where there are no public communications whatsoever. The company I work for is contracted to Trinity House, the UK organisation responsible for lighthouses etc.

It is just as feasible, looking back at the units I received between the 7th and the 17th of July, for me to suggest that they send out 5 copies of each unit, as I had at least two where 2 hosts failed to meet the deadline, and so they are still in the database.

Andy
ID: 146115

Ace Casino
Joined: 5 Feb 03
Posts: 285
Credit: 29,750,804
RAC: 15
United States
Message 146112 - Posted: 2 Aug 2005, 15:26:24 UTC

PhonAcq,
In one breath you say why not extend the deadline.
In the next breath you say the 4th person reporting should not get full or possibly any credit.
If SETI sends out 4 then, regardless of the time it takes, everyone should get credit.
1. If only the first 3 to report got credit (or recognition for their work), a lot of people would leave feeling their contribution was a waste of time.
2. It would mean that to join this project you would need a new computer every 6-12 months (that's how fast a new computer becomes obsolete). So the fastest would get in first, and tough luck for the people with slower computers.
3. There are a lot of people like myself who keep an old computer running for the fun and science of it. We will never be the top cruncher, nor do we care to be. I don't optimize. I just let the computer run and crunch.

Take a deep breath and keep on crunch'n

ID: 146112

Paul D. Buck
Volunteer tester
Joined: 19 Jul 00
Posts: 3898
Credit: 1,158,042
RAC: 0
United States
Message 146088 - Posted: 2 Aug 2005, 14:24:06 UTC

@ML1

If I was that annoyed with something I would be doing something else with my time.

@PhonAcq

One of the other items in the works is that the scheduler should be trying to send work to computers that have roughly identical turnaround times. But, again, this has no relationship to the quorum size or deadline length.

Once again the point is to try to get the work unit processed and returned in the shortest amount of time so that the database has the fewest possible records stored.

The need for the hand-shaking is to be sure that the client and server both keep accurate track of the work to be processed by the client, and that the client in fact does know about the work it is supposed to be doing. The primary cause of the errors is the communication exchange being interrupted, with one side "knowing" something that the other side never heard. Loss on the server-to-client side leads to "ghost" work on the server. Loss on the client-to-server side leads to work reported that is never recorded. Again, this is addressing a different set of problems and has nothing to do with deadlines or quorum size.
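
Editor's sketch: the two failure modes described above can be illustrated with a small comparison of what each side believes. This is not BOINC's actual handshake; the function name `reconcile`, its arguments, and the `wu_*` result names are all hypothetical, and it assumes each side can simply list the results it knows about.

```python
def reconcile(server_in_progress, client_on_hand, client_reported):
    """Compare the server's view of a host with the host's own view.

    Hypothetical helper for illustration only; not part of BOINC.
    """
    server = set(server_in_progress)
    on_hand = set(client_on_hand)
    reported = set(client_reported)
    return {
        # Server thinks the host has it, but the host never heard of it:
        # the server-to-client reply was lost ("ghost" work).
        "ghosts": sorted(server - on_hand - reported),
        # Host believes it reported the result, but the server still
        # lists it as outstanding: the client-to-server report was lost.
        "lost_reports": sorted(server & reported),
        # Both sides agree the host is still working on these.
        "in_sync": sorted(server & on_hand),
    }


state = reconcile(
    server_in_progress=["wu_1", "wu_2", "wu_3"],
    client_on_hand=["wu_2"],
    client_reported=["wu_3"],
)
print(state)
```

In this toy run `wu_1` is a ghost and `wu_3` is a lost report; a real handshake would then resend or re-report them, which is beyond this sketch.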
ID: 146088

ML1
Volunteer moderator
Volunteer tester
Joined: 25 Nov 01
Posts: 10633
Credit: 7,508,002
RAC: 20
United Kingdom
Message 146043 - Posted: 2 Aug 2005, 11:19:30 UTC - in response to Message 146002.

There are those that are so hostile at times I am amazed that they participate in this project at all.

Is it all hostility?

Or just overzealous passion and frustration?

(And sometimes a little ignorance and impatience...)

Cheers,
Martin
See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
ID: 146043

PhonAcq
Joined: 14 Apr 01
Posts: 1656
Credit: 30,658,217
RAC: 1
United States
Message 146041 - Posted: 2 Aug 2005, 11:14:46 UTC

Randy, you are 100% correct, and it is the same for all of us. Getting credit makes the project fun. I think it would be more fun if that credit was also impactful, that is, if you are also one of the first three reporting valid, self-consistent results. So I think BOINC should send work units to clients that have approximately the same successful turnaround times, so that the clients would return results at about the same time and the need for computing a wasteful fourth WU is minimized. One way to do this is to keep a score for each client, which is the time within which the client will respond with a valid result with 95% (two standard deviations) probability. Perhaps with such an approach the need for subsequent handshaking (see below) would go away. Cheers.
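
Editor's sketch: one possible reading of the per-client score proposed above, in Python. It takes the score to be the mean of a host's past turnaround times plus two sample standard deviations, then groups hosts whose scores fall in the same 24-hour bucket. The function names, the bucket width, and the sample histories are assumptions for illustration, and so is the idea that turnaround times are roughly normally distributed; none of this is BOINC scheduler code.

```python
from statistics import mean, stdev


def turnaround_score(turnaround_hours):
    """Mean plus two sample standard deviations of past turnaround times."""
    if len(turnaround_hours) < 2:
        raise ValueError("need at least two past results to estimate spread")
    return mean(turnaround_hours) + 2 * stdev(turnaround_hours)


def group_by_score(scores, bucket_hours=24.0):
    """Put hosts with similar scores into the same bucket (hypothetical policy)."""
    buckets = {}
    for host, score in scores.items():
        buckets.setdefault(int(score // bucket_hours), []).append(host)
    return buckets


# Made-up turnaround histories, in hours.
history = {
    "fast_a": [5.0, 6.0, 7.0],
    "fast_b": [8.0, 10.0, 12.0],
    "slow_c": [90.0, 100.0, 110.0],
}
scores = {host: turnaround_score(times) for host, times in history.items()}
groups = group_by_score(scores)
print(scores)
print(groups)
```

With these made-up numbers the two fast hosts land in the same bucket and the slow host in another, so copies of one work unit would go to hosts expected to report at about the same time.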
May this Farce be with You
ID: 146041

Ace Casino
Joined: 5 Feb 03
Posts: 285
Credit: 29,750,804
RAC: 15
United States
Message 146037 - Posted: 2 Aug 2005, 10:27:18 UTC

I think I might be a good example for those of you wondering if you will get credits when others have already been granted credits.

In this WU the other 3 have already been granted credit a week ago:
http://setiweb.ssl.berkeley.edu/workunit.php?wuid=21585872
I'm still crunching it, should report today, and "will" get credit.

Happens all the time to me. In this WU you will see that the other 3 reported on the 24th of July and were granted credit. I came along today, 9 days later, and was granted credit.
http://setiweb.ssl.berkeley.edu/workunit.php?wuid=21488447

As a couple of people said above: as long as you report before the deadline you will get credit, even if the others were already granted credit.
At least this is my experience.
ID: 146037

Paul D. Buck
Volunteer tester
Joined: 19 Jul 00
Posts: 3898
Credit: 1,158,042
RAC: 0
United States
Message 146002 - Posted: 2 Aug 2005, 6:32:16 UTC

There is no magic here. The quorum size of three makes sense. It also makes sense in the context of this project to attempt to turn the work around as quickly as possible. In that light, because there is a need for three, and there is a significant percentage of work units that lose one result, the decision was made to issue a fourth right away. One of the things that is a given in this DC arena is that there is enough capacity to make this a viable and practical choice.
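
Editor's sketch: a rough way to see why issuing a fourth copy up front helps. It assumes each copy is lost independently with the same probability, which is a simplification, and the 10% loss rate is a made-up figure, not a project statistic. The function name `prob_quorum` is hypothetical.

```python
from math import comb


def prob_quorum(copies, quorum, p_lost):
    """Probability that at least `quorum` of `copies` results come back,
    assuming each copy is lost independently with probability `p_lost`."""
    p_ok = 1.0 - p_lost
    return sum(
        comb(copies, k) * p_ok**k * p_lost ** (copies - k)
        for k in range(quorum, copies + 1)
    )


three_copies = prob_quorum(3, 3, 0.10)  # all three must return
four_copies = prob_quorum(4, 3, 0.10)   # any three of four will do
print(f"{three_copies:.4f} {four_copies:.4f}")  # prints "0.7290 0.9477"
```

Under these assumptions, sending three copies completes the quorum without a re-issue about 73% of the time, while sending four does so about 95% of the time.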

You are correct that there are good ideas, some posted here, some only in the mailing lists, some in the bug base. All of them are known to the project types and they choose those that they find practical and useful. The only "weakness" is that we have to speculate as to the reasons for the choices. But it is not our project. It is theirs, and it is up to them to make the choices. One of those choices is to limit communication to the participant base. I don't like it, but can certainly understand why. There are those that are so hostile at times I am amazed that they participate in this project at all.

You do mix up signals with work units, but that is not terribly important. The choice of 3 results is one of those deceptively simple choices. But, truth be told, 3 is good enough for a project that has little probability of accomplishing anything and there is a small need to say, yes, the analysis is valid.

The changes in the hand-shaking between the clients and the servers have no connection to the choice of quorum size; they only address an issue that in the early design stages was thought to be insignificant. Experience has shown that there are enough errors in the process that a more complicated mechanism is called for. Like all systems, BOINC was developed with some ideas on what would work and how much complication needed to be added to the system. It was felt that the process of data interchange between the server and client would be good enough without additional complexity, and so that is how it was built.

Now we know that those assumptions are not good enough. So, complexity is being added to address the issues. For those that don't like the development team this is added "proof" that they don't know what they are doing. Yet this is a logical choice: make the system, field it, find out what does not work, fix that and move on. When I taught computer science classes I always pointed out that when developing code you never "optimize" the code as you are writing it. You write it to be as clean, clear, logical, and structured as possible. Only after the system is operational do you look for the places it is slow. THEY WILL NEVER BE WHERE YOU EXPECT THEM.

Anyway ... food for thought ...
ID: 146002

PhonAcq
Joined: 14 Apr 01
Posts: 1656
Credit: 30,658,217
RAC: 1
United States
Message 145905 - Posted: 2 Aug 2005, 1:04:34 UTC
Last modified: 2 Aug 2005, 1:05:26 UTC

I've noticed a couple of replies in John's vein, encouraging me to editorialize (but not personalize): This project might be more productive if we spent more time asking how one might do it better and less time accepting things as they are. Bad ideas should be exposed as such, but rebuttal by referencing a 'higher' authority is rarely credible in science. Conversely, with a new idea someone may actually be able to contribute more than hot transistors to the project. (Of course, the higher authority needs to be part of the discussion, or at least be informed of developments somehow.)

So from this perspective, why is it that the project administrators have deemed it useful to take 4 signals after there are 3 consistent ones? If true, then why not always take 4, and not 3? There may be a perfectly good reason, but redundancy would seem to be a weak one (given 3 have already been accepted). Can someone point to the mathematics?

The other point is that the algorithm should be designed to be self adapting and robust, which minimizes the need for ad hoc, artificial time limits. I hope the handshake idea is really going to happen, and is not a rumor, because it will help with an efficient algorithm.

May this Farce be with You
ID: 145905

John McLeod VII
Volunteer developer
Volunteer tester
Joined: 15 Jul 99
Posts: 24806
Credit: 790,712
RAC: 0
United States
Message 145897 - Posted: 2 Aug 2005, 0:25:28 UTC - in response to Message 145690.

The reason for the 4th issue was to handle those occasions where one of the three does not return the result. In this case you now have a work unit and two results in the database for 4 weeks. Even with the 4th result pending, and even if it is late, it should return within the original two-week deadline. If it does not, the work is purged because we did form a Quorum of Results.

So, in the normal course of events the work is held in the database for only two weeks and no re-issue is required.

The hand shaking you talk about at the end of your post is a part of the BOINC Daemon and the scheduler code that is being worked on right now. It is being tested on Einstein@Home ...


Good to know boinc's algorithms are going to be improved. And I guess development of the hand shake is an indication my position is more or less accepted. In general, a well engineered system will only use crowbars for the unexpected, not as a matter of course. So the two week deadline will remain ad hoc in my mind, and I will hope the improved boinc will be more self adaptive.

If BOINC hands you work, and you get the work done correctly and on time, you should get credit for it, as the project administrators have deemed it useful to the project when it was sent out. If it is late, and it becomes part of the quorum anyway, then you still get credit because it happened to be useful even though it was late. If it is late, and the quorum is already formed, then you are out of luck as you did not meet your end of the agreement (getting the work back on time) and the work was not needed at that point. So, if the project decides to send 4 copies of a WU then all 4 get credit if they get returned on time.
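
Editor's sketch: the credit rules in the paragraph above, written out as a small Python function. The function and argument names are hypothetical and this is only a restatement of the post, not the project's validator code.

```python
def gets_credit(correct, on_time, used_in_quorum):
    """Restate the credit rules from the post above.

    correct        -- the result validates against the others
    on_time        -- it was returned before the deadline
    used_in_quorum -- it ended up being one of the results forming the quorum
    """
    if not correct:
        return False          # incorrect work is never credited
    if on_time:
        return True           # correct and on time: always credited
    return used_in_quorum     # late work counts only if it was still needed


# The fourth copy of a WU, returned on time after the quorum formed:
print(gets_credit(correct=True, on_time=True, used_in_quorum=False))
# A late result that arrives after the quorum is already complete:
print(gets_credit(correct=True, on_time=False, used_in_quorum=False))
```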


BOINC WIKI
ID: 145897

PhonAcq
Joined: 14 Apr 01
Posts: 1656
Credit: 30,658,217
RAC: 1
United States
Message 145690 - Posted: 1 Aug 2005, 17:07:10 UTC - in response to Message 145633.

The reason for the 4th issue was to handle those occasions where one of the three does not return the result. In this case you now have a work unit and two results in the database for 4 weeks. Even with the 4th result pending, and even if it is late, it should return within the original two-week deadline. If it does not, the work is purged because we did form a Quorum of Results.

So, in the normal course of events the work is held in the database for only two weeks and no re-issue is required.

The hand shaking you talk about at the end of your post is a part of the BOINC Daemon and the scheduler code that is being worked on right now. It is being tested on Einstein@Home ...


Good to know boinc's algorithms are going to be improved. And I guess development of the hand shake is an indication my position is more or less accepted. In general, a well engineered system will only use crowbars for the unexpected, not as a matter of course. So the two week deadline will remain ad hoc in my mind, and I will hope the improved boinc will be more self adaptive.
May this Farce be with You
ID: 145690

W-K 666 Project Donor
Volunteer tester
Joined: 18 May 99
Posts: 13932
Credit: 40,757,560
RAC: 67
United Kingdom
Message 145672 - Posted: 1 Aug 2005, 16:35:49 UTC - in response to Message 145666.

P.S. VISTA (M$ name for new windoze ver.) is an acronym for

  • Virus's
  • Intruders
  • Spyware
  • Trojans
  • Adware


Just thought you'd like to know.


ROTFL!

Nice one. Mind if I quote you on that? :)

(And as far as I know, M$ is the only system that supports those things!)

Cheers,
Martin


Of course you can quote it. It actually came to me from my son, registered as Nutter. He has never claimed to be sane, and to prove it he has a BSc in Computer Science.

Andy
ID: 145672

ML1
Volunteer moderator
Volunteer tester
Joined: 25 Nov 01
Posts: 10633
Credit: 7,508,002
RAC: 20
United Kingdom
Message 145666 - Posted: 1 Aug 2005, 16:19:50 UTC - in response to Message 145557.
Last modified: 1 Aug 2005, 16:20:20 UTC

P.S. VISTA (M$ name for new windoze ver.) is an acronym for

  • Virus's
  • Intruders
  • Spyware
  • Trojans
  • Adware


Just thought you'd like to know.


ROTFL!

Nice one. Mind if I quote you on that? :)

(And as far as I know, M$ is the only system that supports those things!)

Cheers,
Martin
See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
ID: 145666

RDC
Volunteer tester
Joined: 17 May 99
Posts: 544
Credit: 1,215,728
RAC: 0
United States
Message 145634 - Posted: 1 Aug 2005, 13:56:14 UTC - in response to Message 145608.

People who can return results quickly but choose long queues in order to increase their earned cobblestones will rethink that strategy. People, when faced with getting no credit for their work due to slow computing, will help the economy by buying broadband and getting a new computer, perhaps a MacTel!


Hope you'll buy me a faster PC and the broadband connection to go with it since it's your idea.


To truly explore, one must keep an open mind...
ID: 145634

Paul D. Buck
Volunteer tester
Joined: 19 Jul 00
Posts: 3898
Credit: 1,158,042
RAC: 0
United States
Message 145633 - Posted: 1 Aug 2005, 13:51:24 UTC

The reason for the 4th issue was to handle those occasions where one of the three does not return the result. In this case you now have a work unit and two results in the database for 4 weeks. Even with the 4th result pending, and even if it is late, it should return within the original two-week deadline. If it does not, the work is purged because we did form a Quorum of Results.

So, in the normal course of events the work is held in the database for only two weeks and no re-issue is required.

The hand shaking you talk about at the end of your post is a part of the BOINC Daemon and the scheduler code that is being worked on right now. It is being tested on Einstein@Home ...
ID: 145633

PhonAcq
Joined: 14 Apr 01
Posts: 1656
Credit: 30,658,217
RAC: 1
United States
Message 145608 - Posted: 1 Aug 2005, 12:02:00 UTC
Last modified: 1 Aug 2005, 12:02:58 UTC

I'm not sure anyone is rebutting my position objectively. To repeat, if the Seti computer system already has decided a result, which seems to occur when there are three self-consistent returned results, the next result is over-redundant and is not needed, by definition. The science result has been determined by this point. Thus the late arrivers are not contributing. They don't deserve full credit. An a priori wu deadline/time limit is not needed.

Accepting this, the database requirements may actually reduce, because the system need not keep the latecomers around. Just assimilate the initial results and clear the space.

When there are system malfunctions like those of a couple of weeks ago, people will not lose work units via timeouts.

People who can return results quickly but choose long queues in order to increase their earned cobblestones will rethink that strategy. People, when faced with getting no credit for their work due to slow computing, will help the economy by buying broadband and getting a new computer, perhaps a MacTel!

What would be good would be for command central to develop a system-derived abort signal, which would be issued to a client to abort any work unit that is already stale. Again, the compute power would not be wasted on overly redundant computing.

See also: Is redundant SETI computing absurd?
May this Farce be with You
ID: 145608

Pappa
Volunteer tester
Joined: 9 Jan 00
Posts: 2562
Credit: 12,301,681
RAC: 0
United States
Message 145568 - Posted: 1 Aug 2005, 5:05:52 UTC - in response to Message 145148.

John

I remember a similar post in which it was stated that "multiple machines are in progress", then the first three that hit high/low are tossed, and then the "assigned" machines are "averaged"... This is a Very Good Idea!

So obviously there is a change that no one has posted... AND looking at some results, you are stating the truth...

This really is SAD! They Lied Again! With respect to the last outage, then users have to waste more "Donated" time...

Yes, it really is hard to be "Positive" when people in charge have dictated that "Users" do not matter... We can get More!


Look it up. If it has three results that are already validated, abort it. Otherwise, keep crunching, you may be the third result, and prevent the need for someone else to do the crunching. If there are three results and they are not validating, then keep crunching as you may make a quorum with two of the others and get the science complete and not require another computer to do the crunching.

If it is going to be more than a couple of days late, abort it anyway as it is likely, but not guaranteed that the replacement will be back before yours is in.

So it really depends.
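
Editor's sketch: the rule of thumb above for a work unit that is running late, as a small Python function. The names are hypothetical, and "more than a couple of days" is taken here to mean more than 2 days, which is an assumption. The validated-result count would be looked up by hand on the work unit page; nothing here talks to the server.

```python
def should_abort(validated_results, expected_days_late, quorum=3, max_days_late=2):
    """Decide whether to abort a late work unit, per the advice above."""
    if validated_results >= quorum:
        return True   # the science is done; more crunching adds nothing
    if expected_days_late > max_days_late:
        return True   # the replacement copy will likely be back first
    return False      # keep crunching; this result may complete the quorum


print(should_abort(validated_results=3, expected_days_late=0))
print(should_abort(validated_results=2, expected_days_late=1))
print(should_abort(validated_results=2, expected_days_late=5))
```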


Please consider a Donation to the Seti Project.

ID: 145568

SURVEYOR
Volunteer tester
Joined: 19 Oct 02
Posts: 375
Credit: 608,422
RAC: 0
United States
Message 145564 - Posted: 1 Aug 2005, 4:54:43 UTC

I just received 4 wu's with a 2 month deadline for the test project [Beta].
Maybe they are testing the long deadline?

Fred
BOINC Alpha, BOINC Beta, LHC Alpha, Einstein Alpha
ID: 145564

W-K 666 Project Donor
Volunteer tester
Joined: 18 May 99
Posts: 13932
Credit: 40,757,560
RAC: 67
United Kingdom
Message 145557 - Posted: 1 Aug 2005, 3:45:59 UTC - in response to Message 145449.

Yes, I forgot that the db keeps growing. But the deadline is not important if the process described below is slightly modified, to wit, as soon as there are three (or four or whatever) consistent results, close the wu. The decision on the wu has been made, de facto. Don't invoke a time limit. Then when there are system issues that delay the return of results no one gets harmed.

Of course, a lot of people are getting credit for that fourth returned result, me included. But I think the credit is not due to us because the decision on the unit was already complete. So the 4th result is not adding to the 'science' except for adding a wee bit more confidence in the result.


I think that it would upset a lot of people. I had a unit yesterday morning that I received at about 03:00 UTC, and by the time I looked at about 08:30 BST in the UK it had already been granted credit. There have to be a lot of hosts that cannot crunch a unit in four and a half hours; my second machine, which is offline, takes 6h:30m on a good day.

Andy

P.S. VISTA (M$ name for new windoze ver.) is an acronym for

  • Virus's
  • Intruders
  • Spyware
  • Trojans
  • Adware



Just thought you'd like to know.

ID: 145557

Steve Cressman
Volunteer tester
Joined: 6 Jun 02
Posts: 583
Credit: 65,644
RAC: 0
Canada
Message 145542 - Posted: 1 Aug 2005, 2:46:28 UTC - in response to Message 145449.

Yes Of course, a lot of people are getting credit for that fourth returned result, me included. But I think the credit is not due to us because the decision on the unit was already complete. So the 4th result is not adding to the 'science' except for adding a wee bit more confidence in the result.


Your thinking here needs to be revised a little bit. Please go to the Wiki and read about credit and related links. You will then learn there are many reasons it is done the way it is :)

98SE XP2500+ @ 2.1 GHz Boinc v5.8.8

And God said "Let there be light." But then the program crashed because he was trying to access the 'light' property of a NULL universe pointer.
ID: 145542
 
©2020 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.