Small Word (Sep 20 2007)

Message boards : Technical News : Small Word (Sep 20 2007)
Jesse Viviano
Joined: 27 Feb 00
Posts: 100
Credit: 3,949,583
RAC: 0
United States
Message 646585 - Posted: 22 Sep 2007, 15:50:44 UTC - in response to Message 646547.  

Everyone - please remember that there may be reasons why a person "drops out" of SETI other than getting tired of the project or the project's woes... I've had to drop WU's because a) something major happened to my BOINC installation, causing me to have to detach and re-install; (losing all issued WU's in progress...) and b) my OS decided to take a dive, again losing all WU's in progress. (NTM the use of that computer for a number of days!)

I once had to drop all work units because I opened a Trojan horse that was so new that Symantec had no idea about it. I wound up having to abort all my results, upload the new trojan to Symantec, perform a backup, reformat, and reinstall. Symantec later emailed me stating that I sent something in that they did not know about, and had a link to definitions that detected the new threat. Unfortunately, by that time, I had already nuked my hard drive. I did not want to have my hijacked laptop possibly be part of any spam botnet. The definitions were able to catch a piece of the Trojan I accidentally backed up, though. Anyways, I am pretty happy with Norton because it has caught some emails that contained BIOS-erasing Trojan horses and other really nasty malware packages. I understand that someone has to be the first victim of some malware who uploads it to Symantec or other antivirus company before anyone can be protected from it. However, if it happens too often, I will get upset and switch.
ID: 646585 · Report as offensive
John McLeod VII
Volunteer developer
Volunteer tester
Joined: 15 Jul 99
Posts: 24806
Credit: 790,712
RAC: 0
United States
Message 646606 - Posted: 22 Sep 2007, 16:09:52 UTC

There is no heartbeat detector time that works. I have a couple of hosts that entered EDF for CPDN with a collection of other tasks on the host. All of those other tasks were completed and returned by the deadline, but the hosts were scheduled to be in EDF for well over 6 months. These hosts only contacted CPDN to upload intermediate work. They did not contact any of the other projects. One of those is still going and has a completion time in January for the CPDN task (late, but not much of a problem for CPDN). The other crashed the CPDN task.

Just be patient. Your credit will come eventually.


BOINC WIKI
ID: 646606 · Report as offensive
Henk Haneveld
Volunteer tester
Joined: 16 May 99
Posts: 154
Credit: 1,577,293
RAC: 1
Netherlands
Message 646610 - Posted: 22 Sep 2007, 16:13:47 UTC

I can agree to the proposal of Richard Haselgrove.

The only thing that is not completely right is the maximum cache size.

It is 10 days of connection interval + 10 days of additional cache for a total of 20 days. I don't think many people will have this setting, but it is theoretically possible. So I think 1 month is the better cut-off time.
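
The arithmetic behind that limit can be written out as a short sketch. The two setting names below are illustrative stand-ins for the "connection interval" and "additional cache" preferences mentioned above, not actual BOINC identifiers.

```python
# Hypothetical names for the two preferences discussed above.
MAX_CONNECT_INTERVAL_DAYS = 10   # largest "connection interval" setting
MAX_ADDITIONAL_CACHE_DAYS = 10   # largest "additional cache" setting


def max_cache_days(connect_interval=MAX_CONNECT_INTERVAL_DAYS,
                   additional=MAX_ADDITIONAL_CACHE_DAYS):
    """Largest amount of queued work, in days, that a host could hold."""
    return connect_interval + additional


def cutoff_covers_cache(cutoff_days):
    """True if a no-contact cut-off is longer than the largest possible cache."""
    return cutoff_days > max_cache_days()


print(max_cache_days())          # 20
print(cutoff_covers_cache(7))    # False: a one-week cut-off is shorter than the cache
print(cutoff_covers_cache(30))   # True: one month leaves some margin
```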
ID: 646610 · Report as offensive
1mp0£173
Volunteer tester
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 646622 - Posted: 22 Sep 2007, 16:25:23 UTC - in response to Message 646547.  

Everyone - please remember that there may be reasons why a person "drops out" of SETI other than getting tired of the project or the project's woes...

I would suggest that the vast majority of crunchers are not even aware of the "project's woes."

BOINC is loaded, and otherwise ignored.

ID: 646622 · Report as offensive
DJStarfox
Joined: 23 May 01
Posts: 1066
Credit: 1,226,053
RAC: 2
United States
Message 646714 - Posted: 22 Sep 2007, 20:53:48 UTC - in response to Message 646456.  

The BOINC system needs a heartbeat detector so that if someone doesn't make *any* contact at all for a week, then all uncrunched WUs are returned to the pool.


A week is too short. I have an old host that does about 1 WU per day.
But recently it downloaded an Einstein result and went into EDF for 2 weeks because of the short return time of these huge things.

A heartbeat detector is in principle OK, but it should look at an interval of at least a month.


A week isn't too short, you'll still be contacting the server occasionally while processing a long WU.

A heartbeat is just to say "I'm here and working", and probably a daily event. If you do 1 WU a day, and a 2 day cache, then you'll have 2-3 WUs in your queue. If you run continuously then the most that anyone would wait for your results is 2-3 days.

If you run 8 hours a day, then the most anyone would wait is 8-9 days.
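
As a rough check on those figures, here is a minimal sketch of the waiting-time estimate. It assumes "1 WU a day" means 24 hours of crunching per WU on a machine that runs around the clock; the function name is made up for illustration.

```python
def worst_case_wait_days(queued_wus, crunch_hours_per_wu, run_hours_per_day):
    """Days until the last WU in the queue is finished.

    Assumes the WUs are crunched one after another and the host only
    makes progress while it is switched on.
    """
    total_crunch_hours = queued_wus * crunch_hours_per_wu
    return total_crunch_hours / run_hours_per_day


# Three queued WUs at 24 crunch-hours each:
print(worst_case_wait_days(3, 24, 24))  # 3.0 days when running continuously
print(worst_case_wait_days(3, 24, 8))   # 9.0 days when running 8 hours a day
```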

If you take on Einstein too, and can't complete your seti cache, then they should be returned to the pool in advance of the deadline. It is really no big deal to give up unprocessed WUs.

If your host contacts SETI daily, then it will know you're alive and could extend the deadline for a WU *in progress* and could even extend deadlines for cached WUs, though this situation should never arise as they should have been returned.

If you switch off your machine for a week, and don't make any contact (regardless of WU size or machine power) then your queued WUs would return to the pool, and life goes on for everyone else. You get new ones when you reconnect, but of course lose the one that the machine was working on.
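
A minimal sketch of the proposed heartbeat check might look like the following. Everything here is hypothetical: the host records, field names and the one-week timeout illustrate the proposal and are not part of the BOINC server. The WU a host was actively crunching is not modelled; only queued WUs are returned.

```python
from datetime import datetime, timedelta

HEARTBEAT_TIMEOUT = timedelta(days=7)  # the one-week threshold proposed above


def reclaim_stale_work(hosts, now, timeout=HEARTBEAT_TIMEOUT):
    """Return queued WUs from silent hosts to the pool.

    hosts maps a host id to {"last_contact": datetime, "queued": list of WU ids}.
    Queues of hosts that have been silent longer than `timeout` are emptied
    in place, and their WU ids are returned so they can be reissued.
    """
    pool = []
    for host in hosts.values():
        if now - host["last_contact"] > timeout:
            pool.extend(host["queued"])
            host["queued"] = []
    return pool


now = datetime(2007, 9, 24)
hosts = {
    "silent": {"last_contact": datetime(2007, 8, 30), "queued": ["wu1", "wu2"]},
    "active": {"last_contact": datetime(2007, 9, 23), "queued": ["wu3"]},
}
print(reclaim_stale_work(hosts, now))  # ['wu1', 'wu2']
```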

If you do plan to go away without the machine, then set no new work, otherwise try to suspend after a WU has completed.

Perhaps an AWAY button would be an idea to return queued WUs to the pool, while allowing the current WU to complete.

Einstein has some 56 hour WUs (P4 estimate) with a deadline of 3 weeks.
SETI has 8 hour WUs with a deadline of 8 weeks. Go figure.
The current system works, but I think is flawed and could be done more intelligently and equitably for all.


There is nothing wrong with the system, only the implementation of it by individual projects. Workunits can send "trickles" to the project server that indicate their progress. For example, CPDN does this, and its project servers use this as a "client heartbeat" for the issued workunits. Since their WUs take 6-12 months to crunch each, trickles are critical for the efficiency of the project. I think Einstein and perhaps others could be improved by using trickles. SETI's WUs are really too small IMHO for trickles, and their servers are already overloaded without another server function to run.
ID: 646714 · Report as offensive
Tommy
Joined: 26 Jul 00
Posts: 9
Credit: 530,369
RAC: 0
United States
Message 646886 - Posted: 23 Sep 2007, 1:44:31 UTC - in response to Message 645328.  

MATT, ARE WE HAVING PROBLEMS IN GRANTING CREDIT? I HAVE NEVER HAD SO MANY PENDING WORK UNITS, OVER 600+ HOURS. MY PERCEPTION IS THAT IT STARTED BUILDING UP AFTER THE LAST BACKUP OUTAGE, COULD BE WRONG ON THAT.


There's nothing Matt can do.

The other systems need to return their results before you can be granted credit.


By now you probably agree with me there is a problem. I have been a user for over 6 years now (I have some understanding of how the system works, at a very low level, and do not want to know more; all I want to do is have my 3 PCs crunch their little processors until they are content). I now have 19 WU pending; that has NEVER happened to me! The rate of credit declined sharply on either Monday or Tuesday.

I have been reading all the posts today, some very good thoughts out there.
ID: 646886 · Report as offensive
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13751
Credit: 208,696,464
RAC: 304
Australia
Message 646893 - Posted: 23 Sep 2007, 1:57:40 UTC - in response to Message 646886.  

By now you probably agree with me there is a problem.

There is no problem.
You just have to wait for others to return their results.

Grant
Darwin NT
ID: 646893 · Report as offensive
Gary Charpentier
Volunteer tester
Joined: 25 Dec 00
Posts: 30692
Credit: 53,134,872
RAC: 32
United States
Message 647731 - Posted: 24 Sep 2007, 8:04:57 UTC - in response to Message 646886.  

Nothing wrong or out of the ordinary. As more people cache more work it will take longer for the results to get returned. Expect the granting of credit to get slower. Your machine alone returning a workunit means nothing. Another machine must also return it and with the same results. Just checked and I have 43 for about 3000 cobblestones. Higher than when 3 computers were needed for a result but average since they went down to 2 computers.

Why you ask? Well with people caching more work it means that you are more likely to get paired with a slow computer (slow to return result) rather than a fast one. That's life.
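
The effect of a slow partner can be sketched in a few lines. This is a simplified model: it only looks at when results come back and ignores the check that the results agree; the function name and the day-based times are made up for illustration.

```python
def credit_granted_at(return_times, quorum=2):
    """Time at which credit can be granted for one workunit.

    return_times lists when each host returned its result (e.g. in days
    after issue). Credit waits for the quorum-th result to arrive, so the
    slower partner sets the pace. Returns None while still pending.
    """
    if len(return_times) < quorum:
        return None
    return sorted(return_times)[quorum - 1]


print(credit_granted_at([1.0]))        # None: still waiting for a partner
print(credit_granted_at([1.0, 9.5]))   # 9.5: the slow host decides
```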

Gary

MATT, ARE WE HAVING PROBLEMS IN GRANTING CREDIT? I HAVE NEVER HAD SO MANY PENDING WORK UNITS, OVER 600+ HOURS. MY PERCEPTION IS THAT IT STARTED BUILDING UP AFTER THE LAST BACKUP OUTAGE, COULD BE WRONG ON THAT.


There's nothing Matt can do.

The other systems need to return their results before you can be granted credit.


By now you probably agree with me there is a problem. I have been a user for over 6 years now (I have some understanding of how the system works, at a very low level, and do not want to know more; all I want to do is have my 3 PCs crunch their little processors until they are content). I now have 19 WU pending; that has NEVER happened to me! The rate of credit declined sharply on either Monday or Tuesday.

I have been reading all the posts today, some very good thoughts out there.


ID: 647731 · Report as offensive
Palomar Jack
Joined: 9 Nov 04
Posts: 44
Credit: 503,405
RAC: 0
United States
Message 647737 - Posted: 24 Sep 2007, 9:00:01 UTC - in response to Message 647731.  

Well with people caching more work it means that you are more likely to get paired with a slow computer (slow to return result) rather than a fast one. That's life.


No, it's not "life", it's inconsiderate. For example, if you "recruited" Aunt Bertha's computer and it's only on a couple of hours a week to check email and you really think it needs to run a DC project, you don't need a ten day cache. It won't run out of work if there's a two or three day outage, it just won't. And even if it did, just how much credit are you going to lose? A dozen or so off of your RAC will not... make... a difference.
ID: 647737 · Report as offensive
RandyC
Joined: 20 Oct 99
Posts: 714
Credit: 1,704,345
RAC: 0
United States
Message 647745 - Posted: 24 Sep 2007, 9:52:43 UTC - in response to Message 647737.  

Well with people caching more work it means that you are more likely to get paired with a slow computer (slow to return result) rather than a fast one. That's life.


No, it's not "life", it's inconsiderate. For example, if you "recruited" Aunt Bertha's computer and it's only on a couple of hours a week to check email and you really think it needs to run a DC project, you don't need a ten day cache. It won't run out of work if there's a two or three day outage, it just won't. And even if it did, just how much credit are you going to lose? A dozen or so off of your RAC will not... make... a difference.


Which just goes to show that every user's situation is different. So different users/machines need different cache settings. BOINC allows that. That's life in the DC world.
ID: 647745 · Report as offensive
n7rfa
Volunteer tester
Joined: 13 Apr 04
Posts: 370
Credit: 9,058,599
RAC: 0
United States
Message 647903 - Posted: 24 Sep 2007, 13:31:25 UTC - in response to Message 647737.  

Well with people caching more work it means that you are more likely to get paired with a slow computer (slow to return result) rather than a fast one. That's life.


No, it's not "life", it's inconsiderate. For example, if you "recruited" Aunt Bertha's computer and it's only on a couple of hours a week to check email and you really think it needs to run a DC project, you don't need a ten day cache. It won't run out of work if there's a two or three day outage, it just won't. And even if it did, just how much credit are you going to lose? A dozen or so off of your RAC will not... make... a difference.

Some folks have gotten "gun shy" because of the recent problems with getting work. As a result, they've upped their cache size to make sure that they will get through any rough times.

Then there are those clients that for one reason or another haven't connected to the Berkeley servers for weeks. Their work won't be re-issued until it hits the deadline and then you have to wait some more for the new system to return the result.

I'm connected 24x7 and personally, I've upped my cache from 3-5 days to 7 days. This allows for a 4 day weekend, 1 day for Berkeley to fix things, the Tuesday outage, and the splitters to get caught up on either Wednesday or Thursday.

I'm currently running my Windows systems down to zero so I can work on them. Once I'm done with the work, I'll go back to 7 days.

ID: 647903 · Report as offensive
OzzFan
Volunteer tester
Joined: 9 Apr 02
Posts: 15691
Credit: 84,761,841
RAC: 28
United States
Message 647909 - Posted: 24 Sep 2007, 13:36:53 UTC - in response to Message 647737.  
Last modified: 24 Sep 2007, 13:38:26 UTC

Well with people caching more work it means that you are more likely to get paired with a slow computer (slow to return result) rather than a fast one. That's life.


No, it's not "life", it's inconsiderate. For example, if you "recruited" Aunt Bertha's computer and it's only on a couple of hours a week to check email and you really think it needs to run a DC project, you don't need a ten day cache. It won't run out of work if there's a two or three day outage, it just won't. And even if it did, just how much credit are you going to lose? A dozen or so off of your RAC will not... make... a difference.


If you only run the computer for a couple hours per week, the % (% of time computer is running & while running, % of time BOINC is allowed to run) stats would drop, affecting the calculation for a 10 day cache. That user won't get a full 10 day cache for very long. And if they only run the machine a couple of hours a day and bumped the cache up to 10, the % stats are already low, which will prevent the user from getting a full 10 day cache.
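
To put rough numbers on that, here is a simplified model of the scaling described above. It is only a sketch of the idea, not BOINC's actual work-fetch code; the function and parameter names are assumptions.

```python
def effective_cache_hours(cache_days, on_fraction, active_fraction):
    """Crunch-hours of work a host would be given for its cache setting.

    on_fraction:     share of time the computer is switched on
    active_fraction: share of that time BOINC is allowed to run
    """
    return cache_days * 24 * on_fraction * active_fraction


# A machine that is on for 2 hours out of the 168 in a week:
hours = effective_cache_hours(10, 2 / 168, 1.0)
print(round(hours, 2))                     # 2.86 crunch-hours for a "10 day" cache
print(effective_cache_hours(10, 1.0, 1.0)) # 240.0 crunch-hours for a 24/7 host
```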

BOINC is designed so that your scenario of inconsiderate people just doesn't happen. So any pending credits you have are due to the speed of the crunchers themselves and not an over-inflated cache setting. Not everyone has a fast computer.

As said, that's life.
ID: 647909 · Report as offensive
Richard Haselgrove
Volunteer tester
Joined: 4 Jul 99
Posts: 14654
Credit: 200,643,578
RAC: 874
United Kingdom
Message 647938 - Posted: 24 Sep 2007, 14:27:27 UTC - in response to Message 647909.  

Well with people caching more work it means that you are more likely to get paired with a slow computer (slow to return result) rather than a fast one. That's life.

No, it's not "life", it's inconsiderate. For example, if you "recruited" Aunt Bertha's computer and it's only on a couple of hours a week to check email and you really think it needs to run a DC project, you don't need a ten day cache. It won't run out of work if there's a two or three day outage, it just won't. And even if it did, just how much credit are you going to lose? A dozen or so off of your RAC will not... make... a difference.

If you only run the computer for a couple hours per week, the % (% of time computer is running & while running, % of time BOINC is allowed to run) stats would drop, affecting the calculation for a 10 day cache. That user won't get a full 10 day cache for very long. And if they only run the machine a couple of hours a day and bumped the cache up to 10, the % stats are already low, which will prevent the user from getting a full 10 day cache.

Well, the user will still get a 10 day cache, but 10 days of their usual pattern of BOINC activity, rather than 10 days of 24/7 mega-overclock multicore multiCPU, which is what people tend to notice.
BOINC is designed so that your scenario of inconsiderate people just doesn't happen.

LOL - now there's a statement! Could we offer BOINC to the UN, as the solution to all those inconsiderate people who keep starting wars, killing each other, that sort of thing?

Seriously, BOINC reacts well to past events - sometimes by adjusting averages, sometimes by taking immediate corrective action (viz. RDCF). But it doesn't predict future events ("Motherboard going to fail in nine days time, I see. Better stop fetching more work, then."), it doesn't prevent external failures ("You knocked the network cable out of the wall jack while you were hoovering last night. I've plugged it back in for you."), and it doesn't prevent its human operator from making spontaneous changes of plan ("Someone was rude to me on a forum. I'm not going to crunch for that lot again."). That's where the inconsiderate/antisocial behaviour comes in - not just having a cache (I have one myself, though less than 10 days) but having a very large cache and just sitting on it, not returning work so that the project, and other crunchers, can clear the results and move on.
ID: 647938 · Report as offensive
OzzFan
Volunteer tester
Joined: 9 Apr 02
Posts: 15691
Credit: 84,761,841
RAC: 28
United States
Message 647941 - Posted: 24 Sep 2007, 14:38:51 UTC - in response to Message 647938.  
Last modified: 24 Sep 2007, 14:41:54 UTC

Well, the user will still get a 10 day cache, but 10 days of their usual pattern of BOINC activity, rather than 10 days of 24/7 mega-overclock multicore multiCPU, which is what people tend to notice.


Umm.. yeah, that's what I said. 8-)

Seriously, BOINC reacts well to past events - sometimes by adjusting averages, sometimes by taking immediate corrective action (viz. RDCF). But it doesn't predict future events ("Motherboard going to fail in nine days time, I see. Better stop fetching more work, then."), it doesn't prevent external failures ("You knocked the network cable out of the wall jack while you were hoovering last night. I've plugged it back in for you."), and it doesn't prevent its human operator from making spontaneous changes of plan ("Someone was rude to me on a forum. I'm not going to crunch for that lot again."). That's where the inconsiderate/antisocial behaviour comes in - not just having a cache (I have one myself, though less than 10 days) but having a very large cache and just sitting on it, not returning work so that the project, and other crunchers, can clear the results and move on.


No, it doesn't know future events, but the second it becomes the past, it learns from it. So no, it won't know about a motherboard failure, etc. But it will eventually notice that you're not running at 100% 24/7 and adjust accordingly. It will straighten itself out.

I thought I've shown myself to be a reasonable person. I didn't know someone would take a statement of mine to the extreme and infer something that certainly wasn't intended. (Ah, who am I kidding? That's the nature of humans! To take things to extremes when making a point. LOL)

Your point is taken and understood, but in doing so, you lost or obfuscated the meaning of my post.
ID: 647941 · Report as offensive
Andy Lee Robinson
Joined: 8 Dec 05
Posts: 630
Credit: 59,973,836
RAC: 0
Hungary
Message 647986 - Posted: 24 Sep 2007, 16:45:22 UTC - in response to Message 646893.  

By now you probably agree with me there is a problem.

There is no problem.
You just have to wait for others to return their results.


I have no problem with having to wait while other users are working through their caches, but I keep a 1-2 day cache so that I don't keep others waiting.

I do (as should all of us) have a problem with those that have buggered off, for whatever reason, and there's only a blunt and unintelligent tool of WU expiry to recover from that situation. The deadlines for that are just far too long.

I'm simply proposing a better solution that will enhance workflow, and make the system more efficient and responsive and make more people happy.

The credit delay is minimal annoyance, but as a perfectionist and professional system analyst, suboptimal systems irritate me. WU management can and should be improved.

Any user that has gone AWOL should have the WUs returned to the pool. Period.
I'll leave it to dynamicists to decide thresholds.

Those that opt to process a large cache offline can set a flag to be ignored.
Those that are connected 24x7 should be more involved and the core should have more awareness of the participants' status.

It is simply unacceptable that parts of a supercomputer should just disappear without intervention for months on end while running a job!!!

Andy.
ID: 647986 · Report as offensive
ML1
Volunteer moderator
Volunteer tester
Joined: 25 Nov 01
Posts: 20372
Credit: 7,508,002
RAC: 20
United Kingdom
Message 648008 - Posted: 24 Sep 2007, 17:20:23 UTC - in response to Message 647986.  

... It is simply unacceptable that parts of a supercomputer should just disappear without intervention for months on end while running a job!!!

That is actually a design parameter of Boinc...

The system recovers gracefully even if a proportion of the supporting hosts disappear without trace!


However, I guess the credit whores will always be upset at whatever small delay is incurred for their credits...

Happy crunchin',
Martin

See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
ID: 648008 · Report as offensive
Richard Haselgrove
Volunteer tester
Joined: 4 Jul 99
Posts: 14654
Credit: 200,643,578
RAC: 874
United Kingdom
Message 648017 - Posted: 24 Sep 2007, 17:40:18 UTC - in response to Message 648008.  

However, I guess the credit whores will always be upset at whatever small delay is incurred for their credits...

Happy crunchin',
Martin

Martin,

I hope you will agree, from my posting history, that I'm not a credit whore (and I'm sure that you weren't directing that comment at me - no offence taken).

But I get upset too. I get upset because the scientific aims of the project are put at risk, because the ancient and underfunded hardware is stressed beyond its limits by keeping an excessive and unnecessary volume of archival material in working, front-line, active storage.

I agree that BOINC is designed to tolerate and recover from stresses like this, but I still think, and say, that it's antisocial to test those stress limits when they can be avoided with a little thoughtfulness.
ID: 648017 · Report as offensive
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13751
Credit: 208,696,464
RAC: 304
Australia
Message 648034 - Posted: 24 Sep 2007, 18:28:33 UTC - in response to Message 647986.  

The deadlines for that are just far too long.

Shorter deadlines would be good; 2-3 weeks for the longest of Work Units should be more than enough for the slowest of crunchers. It'd help reduce the load on the servers by having less "in progress" work.
But as it is, the credit does come through eventually; it's not a big deal, just an extremely minor one IMHO.
Grant
Darwin NT
ID: 648034 · Report as offensive
john_morriss
Joined: 5 Nov 99
Posts: 72
Credit: 1,969,221
RAC: 48
Canada
Message 648037 - Posted: 24 Sep 2007, 18:36:48 UTC

If we're looking for bad examples, try Computer # 3639917

It's been MIA since Aug 30, sitting on 120 uncrunched Results, including one of mine. It's part of a dozen or so computers run by Technical Edge. All the others are reporting in almost daily...

I also noticed that this computer is paired on one WU with Computer # 3742648 who was heard from Sept 10, with 446 Results. That's a lot of strain on the database from just two computers...

Interesting what you can find out just poking around...

ID: 648037 · Report as offensive
WimTea
Volunteer tester
Joined: 15 Feb 02
Posts: 34
Credit: 909,865
RAC: 0
Netherlands
Message 648059 - Posted: 24 Sep 2007, 19:31:42 UTC

Who can beat comp #3768714 with 719 results, not heard from since Sept 5th...

Really, I don't care if it takes a week or two months for a result to get granted credits by getting validated; as long as it meets the project's scientific goals I consider that to be perfectly OK.
ID: 648059 · Report as offensive
 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.