Small Word (Sep 20 2007)

Message boards : Technical News : Small Word (Sep 20 2007)

Sirius B Project Donor
Volunteer tester
Joined: 26 Dec 00
Posts: 24913
Credit: 3,081,182
RAC: 7
Ireland
Message 646212 - Posted: 22 Sep 2007, 0:32:35 UTC - in response to Message 646184.  

The servers are not down; they are just having to deal with a large volume of traffic due to small work units and 10-day caches.


Thanks Lee
ID: 646212
Robert
Joined: 2 May 00
Posts: 5
Credit: 12,853,177
RAC: 10
United States
Message 646260 - Posted: 22 Sep 2007, 2:13:51 UTC - in response to Message 646132.  

An adjustment is in order, and I suspect that it will be done when enough data is in place to do so.


Mark, I'm sure there's enough data now! As you know, it's so annoying when someone just buggers off and leaves the rest of us high and dry for another couple of months.

The BOINC system needs a heartbeat detector so that if someone doesn't make *any* contact at all for a week, then all uncrunched WUs are returned to the pool.
If they are actively crunching through their cache, then deadlines could be extended for any WUs in progress but at risk of not completing.
The worst that can happen is that if they do come back, then they have to download some new WUs. An itsy bitsy inconvenience compared to our enduring frustration!

Some people have obviously gone AWOL, and an admin override to return WUs to the pool would be nice.

Make it clearer in the docs that it is very impolite to leave a project in limbo.

Oh, and while I'm at it, implementing redirect after post when submitting a message here is the correct way to write a bulletin board, to avoid double posts and paying twice etc.

Andy.


If you're looking for a candidate to flush, how about this guy? He's holding 52 hostages; I'm one of them.



Add me to the list also... I can't even find this person in the user list!

That's my IP. I run several computers. What's the issue?
ID: 646260
Philadelphia
Volunteer tester
Joined: 12 Feb 07
Posts: 1590
Credit: 399,688
RAC: 0
United States
Message 646302 - Posted: 22 Sep 2007, 3:27:22 UTC - in response to Message 646260.  

[snip]

That's my IP. I run several computers. What's the issue?


There are 52 WUs that were downloaded between August 13 and 18, over a month ago, that are not being crunched.
ID: 646302
Robert
Joined: 2 May 00
Posts: 5
Credit: 12,853,177
RAC: 10
United States
Message 646314 - Posted: 22 Sep 2007, 3:52:22 UTC - in response to Message 646302.  

[snip]

That's my IP. I run several computers. What's the issue?


There are 52 WUs that were downloaded between August 13 and 18, over a month ago, that are not being crunched.


Not sure what's happening there; all WUs I can track at the moment have been crunched. Will have another beer and check my servers.
ID: 646314
Gary Roberts
Volunteer tester
Joined: 31 Oct 99
Posts: 95
Credit: 2,301,228
RAC: 0
Australia
Message 646343 - Posted: 22 Sep 2007, 6:02:22 UTC - in response to Message 646314.  

....
That's my IP. I run several computers. What's the issue?


There are 52 WUs that were downloaded between August 13 and 18, over a month ago, that are not being crunched.


Not sure what's happening there; all WUs I can track at the moment have been crunched. Will have another beer and check my servers.


Despite your assertion to the contrary, the host with the 52 "hostages" is NOT one of your machines. You said that it had your IP - which is something that you cannot know since IP addresses are not published. Actually I presume you meant host ID and it certainly isn't one of yours. You don't even seem to have a dual core 6600 in your list, let alone a matching host ID.

Why did you think it was one of your machines?

ID: 646343
Henk Haneveld
Volunteer tester
Joined: 16 May 99
Posts: 154
Credit: 1,577,293
RAC: 1
Netherlands
Message 646443 - Posted: 22 Sep 2007, 9:19:47 UTC - in response to Message 645989.  
Last modified: 22 Sep 2007, 9:20:15 UTC

The BOINC system needs a heartbeat detector so that if someone doesn't make *any* contact at all for a week, then all uncrunched WUs are returned to the pool.


A week is too short. I have an old host that does about 1 WU per day.
But recently it downloaded an Einstein result and went into EDF for 2 weeks because of the short return time of these huge things.

A heartbeat detector is in principle OK, but it should look at an interval of at least a month.
ID: 646443
Andy Lee Robinson
Joined: 8 Dec 05
Posts: 630
Credit: 59,973,836
RAC: 0
Hungary
Message 646456 - Posted: 22 Sep 2007, 10:03:49 UTC - in response to Message 646443.  
Last modified: 22 Sep 2007, 10:11:13 UTC

The BOINC system needs a heartbeat detector so that if someone doesn't make *any* contact at all for a week, then all uncrunched WUs are returned to the pool.


A week is too short. I have an old host that does about 1 WU per day.
But recently it downloaded an Einstein result and went into EDF for 2 weeks because of the short return time of these huge things.

A heartbeat detector is in principle OK, but it should look at an interval of at least a month.


A week isn't too short, you'll still be contacting the server occasionally while processing a long WU.

A heartbeat is just to say "I'm here and working", and probably a daily event. If you do 1 WU a day, and have a 2-day cache, then you'll have 2-3 WUs in your queue. If you run continuously, then the most that anyone would wait for your results is 2-3 days.

If you run 8 hours a day, then the most anyone would wait is 8-9 days.

If you take on Einstein too, and can't complete your seti cache, then they should be returned to the pool in advance of the deadline. It is really no big deal to give up unprocessed WUs.

If your host contacts SETI daily, then it will know you're alive and could extend the deadline for a WU *in progress* and could even extend deadlines for cached WUs, though this situation should never arise as they should have been returned.

If you switch off your machine for a week, and don't make any contact (regardless of WU size or machine power) then your queued WUs would return to the pool, and life goes on for everyone else. You get new ones when you reconnect, but of course lose the one that the machine was working on.

If you do plan to go away without the machine, then set no new work, otherwise try to suspend after a WU has completed.

Perhaps an AWAY button would be an idea to return queued WUs to the pool, while allowing the current WU to complete.

Einstein has some 56 hour WUs (P4 estimate) with a deadline of 3 weeks.
SETI has 8 hour WUs with a deadline of 8 weeks. Go figure.
The current system works, but I think is flawed and could be done more intelligently and equitably for all.
ID: 646456
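
The rules proposed in the post above can be written down as a small decision function. This is only a sketch of the idea as described in the thread, not how the BOINC server works: the one-week silence limit comes from the post, while the one-day "at risk" margin and all names (`QueuedResult`, `heartbeat_action`, `SILENCE_LIMIT`, `AT_RISK_MARGIN`) are assumptions made up for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum


class Action(Enum):
    KEEP = "keep"                      # leave the result with the host
    EXTEND_DEADLINE = "extend"         # host is alive, WU in progress but at risk
    RETURN_TO_POOL = "return_to_pool"  # host is silent, give the WU to someone else


@dataclass
class QueuedResult:
    started: bool       # True once the host has begun crunching this WU
    deadline: datetime  # current report deadline


# "A week" is the figure from the post; it is not an actual BOINC setting.
SILENCE_LIMIT = timedelta(days=7)
# Hypothetical: a started WU counts as "at risk" when its deadline is this close.
AT_RISK_MARGIN = timedelta(days=1)


def heartbeat_action(last_contact: datetime, result: QueuedResult,
                     now: datetime) -> Action:
    """Decide what to do with one result held by one host."""
    if now - last_contact > SILENCE_LIMIT:
        # No contact at all for over a week: hand the work back.
        return Action.RETURN_TO_POOL
    if result.started and result.deadline - now < AT_RISK_MARGIN:
        # Host is alive and working, but the WU may miss its deadline.
        return Action.EXTEND_DEADLINE
    return Action.KEEP


if __name__ == "__main__":
    now = datetime(2007, 9, 22)
    silent = heartbeat_action(datetime(2007, 9, 10),
                              QueuedResult(False, datetime(2007, 10, 30)), now)
    print(silent.value)
```

Note that the function only needs the time of last contact, which the server records anyway, so no extra "heartbeat" packet is strictly required for this check.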
kittyman Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51478
Credit: 1,018,363,574
RAC: 1,004
United States
Message 646457 - Posted: 22 Sep 2007, 10:03:59 UTC - in response to Message 646443.  

The BOINC system needs a heartbeat detector so that if someone doesn't make *any* contact at all for a week, then all uncrunched WUs are returned to the pool.


A week is too short. I have an old host that does about 1 WU per day.
But recently it downloaded an Einstein result and went into EDF for 2 weeks because of the short return time of these huge things.

A heartbeat detector is in principle OK, but it should look at an interval of at least a month.

I agree. There are many slow crunchers out there that still contribute and should not be cut off. If the servers could determine whether there is any work really in progress, no matter how slow, the host should not be disturbed.

"Time is simply the mechanism that keeps everything from happening all at once."

ID: 646457
Henk Haneveld
Volunteer tester
Joined: 16 May 99
Posts: 154
Credit: 1,577,293
RAC: 1
Netherlands
Message 646474 - Posted: 22 Sep 2007, 11:20:25 UTC - in response to Message 646456.  

[snip]


If my slow host is working in EDF on an Einstein result (a result takes about 250 hrs), then it does not contact SETI for more than a week unless it has a SETI result with a short deadline.

Your idea of a host telling the server "I'm here and working" will create a lot of extra traffic on the already very busy servers.


ID: 646474
Andy Lee Robinson
Joined: 8 Dec 05
Posts: 630
Credit: 59,973,836
RAC: 0
Hungary
Message 646477 - Posted: 22 Sep 2007, 11:29:17 UTC - in response to Message 646474.  

If my slow host is working in EDF on an Einstein result (a result takes about 250 hrs), then it does not contact SETI for more than a week unless it has a SETI result with a short deadline.

Your idea of a host telling the server "I'm here and working" will create a lot of extra traffic on the already very busy servers.


as indeed it would if everyone upgraded to Quad cores!

What's the bandwidth problem with a simple packet?
ID: 646477
HDRW
Joined: 18 Oct 02
Posts: 14
Credit: 189,189
RAC: 0
United Kingdom
Message 646490 - Posted: 22 Sep 2007, 12:29:47 UTC

If it's suspected that a host isn't working (by heartbeat or whatever method), rather than returning their WUs to the pool, why not issue them to others as the 3rd cruncher? The current standard is that 2 people get each WU, but is there anything in the logic that stops a third one being added at some point? That way all 3 people should get the credit if they do finally submit the result.

Or maybe I've misunderstood how it works...

Cheers,

Howard

ID: 646490
Henk Haneveld
Volunteer tester
Joined: 16 May 99
Posts: 154
Credit: 1,577,293
RAC: 1
Netherlands
Message 646506 - Posted: 22 Sep 2007, 12:52:18 UTC - in response to Message 646490.  
Last modified: 22 Sep 2007, 12:53:26 UTC

If it's suspected that a host isn't working (by heartbeat or whatever method), rather than returning their WUs to the pool, why not issue them to others as the 3rd cruncher? The current standard is that 2 people get each WU, but is there anything in the logic that stops a third one being added at some point? That way all 3 people should get the credit if they do finally submit the result.

Or maybe I've misunderstood how it works...

Cheers,

Howard


Good idea. And if a host is running BOINC 5.10.xx, then the unreturned result will get the "Aborted by server" message.

Another thing came to mind just now. There is no need for a heartbeat message from a host. The server already knows when the last contact was. It is in the computer info on your account page.
ID: 646506
Andy Lee Robinson
Joined: 8 Dec 05
Posts: 630
Credit: 59,973,836
RAC: 0
Hungary
Message 646511 - Posted: 22 Sep 2007, 13:02:11 UTC - in response to Message 646506.  

Another thing came to mind just now. There is no need for a heartbeat message from a host. The server already knows when the last contact was. It is in the computer info on your account page.


Same thing, different name. The client contacts the server at least as often as the cache setting, so the server can know this. It just doesn't care about delinquent users!
ID: 646511
S@NL - XP_Freak
Joined: 10 Jul 99
Posts: 99
Credit: 6,248,265
RAC: 0
Netherlands
Message 646521 - Posted: 22 Sep 2007, 13:58:47 UTC - in response to Message 646064.  
Last modified: 22 Sep 2007, 14:02:03 UTC

If you're looking for a candidate to flush, how about this guy? He's holding 52 hostages; I'm one of them.

It's already down to 40. :)

And in 3 days time it will be down to 23.


Goodbye Seti Classic
ID: 646521
Josef W. Segur
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 646523 - Posted: 22 Sep 2007, 14:00:28 UTC - in response to Message 646477.  

If my slow host is working in EDF on an Einstein result (a result takes about 250 hrs), then it does not contact SETI for more than a week unless it has a SETI result with a short deadline.

Your idea of a host telling the server "I'm here and working" will create a lot of extra traffic on the already very busy servers.


as indeed it would if everyone upgraded to Quad cores!

What's the bandwidth problem with a simple packet?

For a host which is not always connected, none of these ideas make sense.

The deadlines range from 8.68 days to about 113 days because the splitter estimate calculations assume there's a crunch time range of about 13 times. The actual range is much less, and there's no logical reason not to change those calculations to match.
Joe
ID: 646523
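
As a quick check on the figures in the post above, the two quoted deadlines can be divided to see what spread they imply. This is just arithmetic on the numbers given (8.68 days and about 113 days); the function names are invented for illustration and nothing here reflects the actual splitter code.

```python
# Deadline figures quoted in the post above.
SHORTEST_DEADLINE_DAYS = 8.68
LONGEST_DEADLINE_DAYS = 113.0


def deadline_spread(shortest: float, longest: float) -> float:
    """Ratio between the longest and shortest deadline."""
    return longest / shortest


def longest_deadline(shortest: float, spread: float) -> float:
    """Longest deadline that results from a given spread factor."""
    return shortest * spread


spread = deadline_spread(SHORTEST_DEADLINE_DAYS, LONGEST_DEADLINE_DAYS)
print(f"implied spread: {spread:.2f}x")
```

The ratio comes out at roughly 13, which matches the "crunch time range of about 13 times" mentioned in the post. Plugging a smaller spread into `longest_deadline` shows how far the longest deadline would shrink if the estimate were changed to match the actual range.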
Philadelphia
Volunteer tester
Joined: 12 Feb 07
Posts: 1590
Credit: 399,688
RAC: 0
United States
Message 646526 - Posted: 22 Sep 2007, 14:02:24 UTC - in response to Message 646521.  

If you're looking for a candidate to flush, how about this guy? He's holding 52 hostages; I'm one of them.

It's already down to 40. :)


While that is good news, the bad news is the reason for the drop is a result of expired deadlines, which isn't good. If those deadlines had been months out, as some are now, who knows when they would have expired.

There really should be a workaround for computers that at least appear to be idle; a month qualifies as idle to me :)
ID: 646526
perryjay
Volunteer tester
Joined: 20 Aug 02
Posts: 3377
Credit: 20,676,751
RAC: 0
United States
Message 646537 - Posted: 22 Sep 2007, 14:29:32 UTC - in response to Message 646526.  

If you're looking for a candidate to flush, how about this guy? He's holding 52 hostages; I'm one of them.

It's already down to 40. :)


While that is good news, the bad news is the reason for the drop is a result of expired deadlines, which isn't good. If those deadlines had been months out, as some are now, who knows when they would have expired.

There really should be a workaround for computers that at least appear to be idle; a month qualifies as idle to me :)


You mean like this guy? http://setiathome.berkeley.edu/workunit.php?wuid=147057956
I guess he decided SETI wasn't for him. I'll find out in a couple of weeks.



PROUD MEMBER OF Team Starfire World BOINC
ID: 646537
KWSN THE Holy Hand Grenade!
Volunteer tester
Joined: 20 Dec 05
Posts: 3187
Credit: 57,163,290
RAC: 0
United States
Message 646545 - Posted: 22 Sep 2007, 14:37:56 UTC - in response to Message 646523.  

[snip]

What's the bandwidth problem with a simple packet?

For a host which is not always connected, none of these ideas make sense.
[snip]
Joe


I agree; some hosts don't connect except when specifically directed to, e.g. computers still on dial-up (like me!). My computers (except one, on a cable modem - at a remote location) usually spend about 23 hrs a day on "network activity suspended", because the modem isn't connected - or the modem (one modem, computers networked) is handling my other Internet traffic, which I don't want slowed down by BOINC traffic...

.

Hello, from Albany, CA!...
ID: 646545
KWSN THE Holy Hand Grenade!
Volunteer tester
Joined: 20 Dec 05
Posts: 3187
Credit: 57,163,290
RAC: 0
United States
Message 646547 - Posted: 22 Sep 2007, 14:45:43 UTC
Last modified: 22 Sep 2007, 14:47:43 UTC

Everyone - please remember that there may be reasons why a person "drops out" of SETI other than getting tired of the project or the project's woes... I've had to drop WUs because a) something major happened to my BOINC installation, causing me to have to detach and re-install (losing all issued WUs in progress...), and b) my OS decided to take a dive, again losing all WUs in progress. (NTM the use of that computer for a number of days!)
.

Hello, from Albany, CA!...
ID: 646547
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14679
Credit: 200,643,578
RAC: 874
United Kingdom
Message 646575 - Posted: 22 Sep 2007, 15:37:51 UTC
Last modified: 22 Sep 2007, 15:38:16 UTC

I don't see any reason why these ideas shouldn't be put forward as a serious proposal, notwithstanding dialup, EDF or anything else.

The longest supported cache length is 10 days. Double that for luck: so say "Any host which has not contacted the server for over 3 weeks (21 days), and which has WUs 'in progress' assigned to it, isn't pulling its weight: it is causing stress to the project by blocking scarce database and storage resources."

In the first instance, we don't have to do anything about the idle host. Just mark its WUs for re-issue, and let them be sent again to the next candidate who asks. It becomes a bit like an initial replication of two-and-a-bit, and in the process slightly reduces the demand on the splitters (since the re-issued WU datapak is already, by definition, on the download server).

That should ensure that the quorum is met within about a month, and the idle host's result becomes 'Redundant - cancelled by server' (subject to the usual vagaries of download error, compute error, no consensus, etc. etc. - take it as read that these would generate re-sends as normal).

If the idle host wakes up again at this point, nothing is lost: whatever the reason for the absence (holiday, summer heat, breakdown, road trip without internet access, priority given to other BOINC projects), s@h work continues according to the rules of BOINC. Newer clients will react to the server cancellation, and download replacement work: older clients will just crunch it anyway, and claim (and be awarded) credits for the work even though it's redundant.

The more interesting question is: what would happen if the idle host remains AWOL and incommunicado, even after the quorum is formed? There could be another two months or more to go before the ultimate deadline.

Personally, I think the project should bite the bullet, and be firm about this: if a host has not made any contact at all with the project for over a month, and a quorum has been formed, then the WU should be assimilated unconditionally and the associated results deleted and purged from the BOINC database.

We might get a few - a very, very few - who wake up like Rip van Winkle, crunch their ancient WUs, and try to cash them in for credit: and then complain when the server rejects them. But I think the good of the project is more important than the feelings of this particular category of users. It would need to be explained, politely but firmly, that Berkeley needed its storage space back, and has acted accordingly.

And we should instill in users - current and new - a sense of community. In my book, it is plain antisocial to download a whole bunch of WUs and then deliberately walk away without crunching them, aborting them, detaching from the project, or in some such way telling the servers that the results won't be coming back.

Note that I say 'deliberately'. Nobody would criticise anyone who loses a few WUs to a comms or computer glitch, or even a whole cache-full to a hard drive failure or suchlike. Events like that are unplanned, and can usually be sorted out in under a month anyway: if you can re-attach under the same ID, or merge hosts, you can nowadays get the lost WUs reissued and crunch them as if nothing had happened.
ID: 646575
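
The two-stage proposal in the post above (re-issue after three weeks of silence, then assimilate and purge once a quorum exists and the host has been silent for over a month) can be sketched as follows. This is an illustration of the proposal only, not existing BOINC server behaviour; the 21-day figure is from the post, "over a month" is taken as 30 days by assumption, and the names `HostState`, `Step` and `next_step` are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum


class Step(Enum):
    NONE = "none"                          # leave everything as it is
    REISSUE = "reissue"                    # mark the host's WUs for re-issue
    ASSIMILATE_AND_PURGE = "assimilate"    # close the WU, delete the idle result


REISSUE_AFTER = timedelta(days=21)  # "over 3 weeks (21 days)" from the post
PURGE_AFTER = timedelta(days=30)    # "over a month"; 30 days is an assumption


@dataclass
class HostState:
    last_contact: datetime       # last time the host talked to the server
    has_work_in_progress: bool   # any WUs still assigned and unreturned


def next_step(host: HostState, quorum_formed: bool, now: datetime) -> Step:
    """Pick the next action for one WU held by a possibly idle host."""
    if not host.has_work_in_progress:
        return Step.NONE
    idle_for = now - host.last_contact
    if idle_for > PURGE_AFTER and quorum_formed:
        # Other hosts have already validated the WU; stop waiting.
        return Step.ASSIMILATE_AND_PURGE
    if idle_for > REISSUE_AFTER:
        # Send the WU to the next candidate; the idle host keeps its copy.
        return Step.REISSUE
    return Step.NONE


if __name__ == "__main__":
    now = datetime(2007, 9, 22)
    idle_host = HostState(datetime(2007, 8, 18), True)
    print(next_step(idle_host, quorum_formed=False, now=now).value)
    print(next_step(idle_host, quorum_formed=True, now=now).value)
```

In this sketch a host that stays silent past a month without a quorum simply remains in the re-issue state, since the post only calls for purging once the quorum has been formed.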


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.