Composite Head (Nov 05 2008)

DJStarfox

Joined: 23 May 01
Posts: 1066
Credit: 1,226,053
RAC: 2
United States
Message 827636 - Posted: 6 Nov 2008, 19:55:39 UTC - in response to Message 827602.  

> > ... When developing a new feature or program in the business world, we should always ...
>
> In case you hadn't noticed, this project has nothing to do with the business world.
>
> Indeed, in the business world this project simply would not exist.
>
> Now realign reality to academia and note that this project runs on academia with zero secure funding and erratic, negligible funding.
>
> I certainly wouldn't work under those conditions. It's a good job that Matt can skive off for a few days to go busking in the streets!
>
> Keep searchin',
> Martin


You seem to have missed my point. It has nothing to do with "business world" vs academia. My point was that when making changes to a system, it's a good idea to consider, "How much time will I have to spend keeping this thing going once I start?" Consider the machine's workload too.

It's been my experience that keeping this simple idea in mind (even though a little more "thought" and planning time goes into it at first) saves both time and resources later.
ID: 827636
1mp0£173
Volunteer tester

Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 827637 - Posted: 6 Nov 2008, 19:57:57 UTC - in response to Message 827384.  

> Ned, thank you for always pointing out the silver lining in our cloud of worry, so to speak.
>
> Of course, it's possible that we are becoming the "flagship" project for no-budget big-computing, with the most error-prone network communications and no scientific results. But that would probably tarnish your silver.
>
> So I guess we should just hum the "don't worry; be happy" jingle along with Bobby McFerrin. We really don't need to get any better, after all.

I really think you are missing the point, and I think that point is incredibly important.

All network communications are error prone, from the simplest possible home network to the largest enterprise-wide WAN/LAN. We tend to be unaware because the glitches are generally unusual and short, and in the case of an enterprise network, there are often redundant paths.

But the real reason is, for every network, there exists some level where "good enough" really is.

A home network with 0.1% packet loss really is "good enough."

So, let's take a step back, and look at BOINC.

BOINC needs to be able to talk to the SETI servers pretty infrequently. If you are carrying a 3 or 4 day cache, as I suspect most of us in the forums are, BOINC really only needs to talk to the project once every 3 or 4 days. In reality, most of us probably "top up" several times a day.

My informal observation is that the SETI servers (not counting planned outages) are probably somewhere in the mid-90% reliability range. (Better if you define "up" as "able to get work" only, but most who complain would say that "up" means splitters, upload server, scheduler, download server, etc.)

My gut feeling is that something around 75% is enough to keep everyone reasonably topped up, and to be able to accept uploads and reports before deadlines are reached.

Yes, I'm dead serious. Pushing reliability into the 99.999% range would make SETI run better, but their budget would need to increase by an order of magnitude.

So, while I'm all for improving things wherever possible, I think we need to measure "good" and "bad" with the right yardstick. I want better, but it's a want, not a requirement.

If BOINC can keep the cache from going empty, and report all work before the deadline, then the network and servers are in fact "good enough."

... and "good enough" really is good enough.
ID: 827637
Pooh Bear 27
Volunteer tester
Joined: 14 Jul 03
Posts: 3224
Credit: 4,603,826
RAC: 0
United States
Message 827699 - Posted: 7 Nov 2008, 0:50:03 UTC - in response to Message 827567.  

> After years of being involved with Seti in a casual "hobby" sort of way, I'd like to make the following observation.
>
> Seems hundreds of thousands of us realize we donate our spare computer cycles for the folks at the projects we're involved with to do with as they wish. If there are issues with the project, so be it; they will resolve them. There is the other group, the "10 percenters", who think the Seti group and especially Matt are employed to squeeze the most credit out of personal machines and should be available 24/7 to make it happen.
>
> I understand this is THEIR project, not mine, and appreciate the fact they do what they do so I can dream of being the guy that discovers E.T. so he can find us and destroy us (just kidding).
>
> Matt and the Seti Staff, you do great work and should be applauded. The rest of you, relax.


Your post is how I believe a lot of people feel; they also stay quiet and behind the scenes, like yourself. It echoes my sentiments about the projects pretty well. I have been more vocal than some, but as of late have sat back and just watched. You gave me inspiration to say something again.

With that, I have stated before that we are volunteering here, and are not forced to be here. We are at the mercy of the project. It could shut down tomorrow, then what would people do with their time?

People need to just relax and know they are working towards a common goal, one we may not even know the answer to in our lifetimes. It's not worth the stress of flipping out when a service isn't running correctly, or there is no work, etc. Enjoy the project as it is.

To Matt, Eric, and the rest of the team: keep up the grand work you do each and every day.

My movie https://vimeo.com/manage/videos/502242
ID: 827699
KWSN THE Holy Hand Grenade!
Volunteer tester
Joined: 20 Dec 05
Posts: 3187
Credit: 57,163,290
RAC: 0
United States
Message 827708 - Posted: 7 Nov 2008, 1:34:32 UTC
Last modified: 7 Nov 2008, 1:36:23 UTC

... back to technical news:

We still get the occasional "HTTP service unavailable"...

This, even though the computer in question had nothing else downloading, and no other computers were using the (shared) dial-up line.

[add]
Oh, and Spanish headers/footers are back!
[/add]
.

Hello, from Albany, CA!...
ID: 827708
PhonAcq

Joined: 14 Apr 01
Posts: 1656
Credit: 30,658,217
RAC: 1
United States
Message 827868 - Posted: 7 Nov 2008, 15:58:29 UTC - in response to Message 827637.  

> I really think you are missing the point, and I think that point is incredibly important.
> [...]
> ... and "good enough" really is good enough.


Ok, a more serious but short reply.
* Zeroeth, Matt and friends are doing fine; criticizing the project in a helpful way should not be construed differently.
* First, all these comparison-to-other-projects arguments are specious. We have what we have, regarding resources. Comparisons may suggest our limitations, but another project's reality is not our reality.
* Second, that said, we need to strive to be the best at what we do with what we have. So, once realized, our limitations need to be addressed. In the network reliability context, repeatedly reminding us that we don't need six-nines of network reliability is tedious; that point, and the fact that we aren't going to get there within current fiscal limitations, is well understood and accepted. But pointing out obvious and correctable issues should help move the project forward. And hiding behind Pollyanna's petticoat, by denouncing every attempted objective criticism, doesn't help anyone, save for Pollyanna, who might be a bit excited.
- Case in point: anyone looking at the message logs will likely see many places where the connection to the project was made but the servers are not responding, followed rapidly by many repeated attempts of this nature. That is an example of network waste that commandcentral should be fixing, and it has nothing directly to do with cache size.
- Another example of a correctable problem has to do with the constant topping off of the cache. 300K hosts asking for 3 seconds of work is nonsense. Again, probably nothing to do with cache size.
- And so on.
* Nth, if this project is not in some way 'our' project, then why are any of us here? Really, saying otherwise is nonsense. We are partnering with commandcentral. They need us in order to make any progress at all; we need them to provide organization and lead us forward. We share a joint vision of discovery. So I think abdicating our joint ownership is just a childish cop-out.
ID: 827868
kittyman
Volunteer tester
Joined: 9 Jul 00
Posts: 51478
Credit: 1,018,363,574
RAC: 1,004
United States
Message 827895 - Posted: 7 Nov 2008, 16:35:30 UTC

We are all in this project together.......
Matt, Eric, et al, would not be able to continue the project without us, and vice versa.......

Constructive criticism is just that.........constructive.

There is a difference between constructive criticism and just carping about the fact that things may not be running as well as they might.

There have been many instances over the years where we lowly users have pointed the admins of the project toward looking at a problem from a different view......and many times to our joint success in solving it..........

Soooooooo....

If you are really being constructive, please continue with your observations. Many times a new set of eyes brings light to things that another person may not see.

OTOH......if you just wish to complain...........well, let's just leave it at that.
"Time is simply the mechanism that keeps everything from happening all at once."

ID: 827895
1mp0£173
Volunteer tester

Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 827900 - Posted: 7 Nov 2008, 16:49:38 UTC - in response to Message 827868.  

> Ok, a more serious but short reply.
> [...] And hiding behind Pollyanna's petticoat, by denouncing every attempted objective criticism, doesn't help anyone [...]
> - Case in point: anyone looking at the message logs will likely see many places where the connection to the project was made but the servers are not responding, followed rapidly by many repeated attempts of this nature. That is an example of network waste that commandcentral should be fixing, and it has nothing directly to do with cache size.
> [...]

You are actually making my point, even though you might not think so.

What you're calling "Pollyanna-ish" is not wishful thinking. You see each attempt to connect that fails as a complete and utter failure; the fact that it retries is a clear demonstration that it's not.

You are arguing that because BOINC can't connect 100% of the time, the server side needs to be fixed.

I would argue that the client side of BOINC is a little too aggressive: load could be reduced by some cooperative scheduling, and BOINC owns both halves of the transaction.

I'm also pointing out again that in a perfect world, we would not have to ever retry. We don't live in a perfect world.

The biggest single "issue" when a connection fails is an entry in the logs. The work gets reported eventually.

You apparently missed where I said that better is better, and that we should always aspire to better.

But, it is clearly working, or credit would not be granted.

The model for BOINC is not that different from SMTP: RFC-2821 describes in some detail how a client (everything that sends mail is a client, even if it's a server) should deal with a server being down and unreachable. The big difference between BOINC and the server at your ISP is that you can read the logs under BOINC.
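
To make the comparison concrete, here is a sketch of that SMTP-style retry discipline in Python. RFC 2821 suggests waiting at least 30 minutes between attempts and giving up only after 4-5 days; the doubling-with-jitter policy is an illustrative choice, not BOINC's actual algorithm:

import random

# SMTP-style retry schedule: keep retrying a failed transfer on a
# timer, with a growing interval, and give up only after days.
# Constants loosely follow RFC 2821's suggestions (>= 30 min between
# tries, give up after 4-5 days); the doubling policy is an assumption.

INITIAL_RETRY_S = 30 * 60        # first retry after 30 minutes
MAX_RETRY_S     = 4 * 3600       # cap the interval at 4 hours
GIVE_UP_S       = 5 * 24 * 3600  # abandon the attempt after 5 days

def retry_schedule():
    """Yield successive wait times (in seconds) until we give up."""
    waited, interval = 0.0, float(INITIAL_RETRY_S)
    while waited < GIVE_UP_S:
        # jitter +/-10% so thousands of hosts don't retry in lockstep
        delay = interval * random.uniform(0.9, 1.1)
        waited += delay
        yield delay
        interval = min(interval * 2, MAX_RETRY_S)

for i, delay in enumerate(retry_schedule(), start=1):
    print(f"retry {i} after {delay / 60:.0f} min")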
ID: 827900
kittyman
Volunteer tester
Joined: 9 Jul 00
Posts: 51478
Credit: 1,018,363,574
RAC: 1,004
United States
Message 827925 - Posted: 7 Nov 2008, 18:51:14 UTC

The one thing I do agree with (and I do not endorse much tampering with the status quo) would be the suggestion about a little modification of Boinc client work requests......
It is hardly necessary for Boinc to request 1,000 seconds of work to top off a 10-day cache that is otherwise full. It could wait until a certain percentage of the cache needed filling.....not too large a percentage, or you could get into problems if the servers happened to be down or out of ready-to-send work when the request was finally made......
As with any other proposed modifications to the Boinc machinery, one would have to assess the impact on those who have a 1 day cache or less, and the impact on every other project under the Boinc umbrella......
IE....what might work for Seti could sometimes wreak havoc with other projects.....

And I really think the final answer lies with the Seti infrastructure.....

In other words......fix the broken wheel instead of spending a lot of resources trying to figure out how to make the wagon lighter....
"Time is simply the mechanism that keeps everything from happening all at once."

ID: 827925
Richard Haselgrove
Volunteer tester

Joined: 4 Jul 99
Posts: 14679
Credit: 200,643,578
RAC: 874
United Kingdom
Message 827930 - Posted: 7 Nov 2008, 19:13:24 UTC

The point is, you have to work out which wheel has the puncture before you try to fix it.

If the problem is the database, then I agree the cache top-ups are a bad idea: why keep hitting the scheduler with lots of little requests, when you could get the job done with one big one?

But if the problem is the download line, then exactly the opposite is true: you want a smooth, even flow down the pipe, not lots of data requests jostling and elbowing each other out of the way. It's the same number of bytes either way in the end.
ID: 827930
1mp0£173
Volunteer tester

Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 827941 - Posted: 7 Nov 2008, 19:48:38 UTC - in response to Message 827925.  

> The one thing I do agree with (and I do not endorse much tampering with the status quo) would be the suggestion about a little modification of Boinc client work requests......
> It is hardly necessary for Boinc to request 1,000 seconds of work to top off a 10-day cache that is otherwise full. It could wait until a certain percentage of the cache needed filling.....not too large a percentage, or you could get into problems if the servers happened to be down or out of ready-to-send work when the request was finally made......
>
> Remember that a request for 1 second of work is really a request for just one work unit....

It isn't the 1000 second request though, it is that each server at SETI has some maximum number of requests per second, and that those servers really run best at about 90% of that maximum.

There are two ways (in general) to accomplish that:

1) Increase capacity.

2) Redistribute load.

I like #2 for a couple of different reasons. First, this is a software change that is relatively low cost. Second, it comes out of the BOINC budget, not the SETI budget.

If the BOINC client did a better job of spreading the load, then the projects would all benefit from fewer "difficult" peaks.
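
As a sketch of what #2 could look like inside the client (purely illustrative; the window size is an invented number):

import random

# Idea: instead of every client contacting the scheduler the moment its
# cache dips, smear the contacts over a window.  If N clients would all
# hit the server in the same second, a random offset of up to `window`
# seconds cuts the peak request rate by roughly a factor of `window`.

def smeared_request_time(due_time: float, window: float = 600.0) -> float:
    """When to actually contact the scheduler: the nominal due time
    plus a uniform random offset inside the smoothing window."""
    return due_time + random.uniform(0.0, window)

# e.g. 300,000 hosts all nominally due at t=0, smeared over 10 minutes:
# peak load drops from a 300,000-request spike to ~500 requests/sec.
print(smeared_request_time(0.0))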
ID: 827941
W-K 666
Volunteer tester

Joined: 18 May 99
Posts: 19401
Credit: 40,757,560
RAC: 67
United Kingdom
Message 828016 - Posted: 7 Nov 2008, 22:46:47 UTC
Last modified: 7 Nov 2008, 22:48:37 UTC

What I and, I think, PhonAcq are seeing and trying to get fixed is the multiple requests, as seen below;

07/11/2008 12:21:39|SETI@home|Sending scheduler request: To fetch work
07/11/2008 12:21:39|SETI@home|Requesting 20516 seconds of new work, and reporting 2 completed tasks
07/11/2008 12:21:58|SETI@home|Computation for task 14oc08ab.5573.72.15.8.73_0 finished
07/11/2008 12:21:58|SETI@home|Starting 13oc08ac.8979.4980.16.8.42_0
07/11/2008 12:21:58|SETI@home|Starting task 13oc08ac.8979.4980.16.8.42_0 using setiathome_enhanced version 528
07/11/2008 12:22:01|SETI@home|[file_xfer] Started upload of file 14oc08ab.5573.72.15.8.73_0_0
07/11/2008 12:22:06|SETI@home|[file_xfer] Finished upload of file 14oc08ab.5573.72.15.8.73_0_0
07/11/2008 12:22:06|SETI@home|[file_xfer] Throughput 21925 bytes/sec
07/11/2008 12:23:10|SETI@home|Scheduler RPC succeeded [server version 603]
07/11/2008 12:23:10|SETI@home|Deferring communication for 11 sec
07/11/2008 12:23:10|SETI@home|Reason: requested by project
07/11/2008 12:23:12|SETI@home|[file_xfer] Started download of file 04oc08aa.6632.12342.7.8.223
07/11/2008 12:23:12|SETI@home|[file_xfer] Started download of file 03oc08aa.7093.37235.9.8.173
07/11/2008 12:23:17|SETI@home|[file_xfer] Finished download of file 04oc08aa.6632.12342.7.8.223
07/11/2008 12:23:17|SETI@home|[file_xfer] Throughput 92767 bytes/sec
07/11/2008 12:23:17|SETI@home|[file_xfer] Started download of file 03oc08aa.7093.37235.9.8.179
07/11/2008 12:23:18|SETI@home|[file_xfer] Finished download of file 03oc08aa.7093.37235.9.8.173
07/11/2008 12:23:18|SETI@home|[file_xfer] Throughput 76726 bytes/sec
07/11/2008 12:23:18|SETI@home|[file_xfer] Started download of file 03oc08aa.7093.37235.9.8.181
07/11/2008 12:23:21|SETI@home|[file_xfer] Finished download of file 03oc08aa.7093.37235.9.8.179
07/11/2008 12:23:21|SETI@home|[file_xfer] Throughput 126935 bytes/sec
07/11/2008 12:23:21|SETI@home|[file_xfer] Started download of file 03oc08aa.7093.37235.9.8.171
07/11/2008 12:23:22|SETI@home|[file_xfer] Finished download of file 03oc08aa.7093.37235.9.8.181
07/11/2008 12:23:22|SETI@home|[file_xfer] Throughput 105021 bytes/sec
07/11/2008 12:23:22|SETI@home|[file_xfer] Started download of file 14oc08ae.15250.8252.8.8.37
07/11/2008 12:23:25|SETI@home|[file_xfer] Finished download of file 03oc08aa.7093.37235.9.8.171
07/11/2008 12:23:25|SETI@home|[file_xfer] Throughput 123185 bytes/sec
07/11/2008 12:23:26|SETI@home|[file_xfer] Finished download of file 14oc08ae.15250.8252.8.8.37
07/11/2008 12:23:26|SETI@home|[file_xfer] Throughput 128533 bytes/sec
07/11/2008 12:23:27|SETI@home|Sending scheduler request: To fetch work
07/11/2008 12:23:27|SETI@home|Requesting 9089 seconds of new work, and reporting 1 completed tasks
07/11/2008 12:23:42|SETI@home|Scheduler RPC succeeded [server version 603]
07/11/2008 12:23:42|SETI@home|Deferring communication for 11 sec
07/11/2008 12:23:42|SETI@home|Reason: requested by project
07/11/2008 12:23:44|SETI@home|[file_xfer] Started download of file 14oc08ae.15250.8252.8.8.249
07/11/2008 12:23:44|SETI@home|[file_xfer] Started download of file 14oc08ae.15250.8252.8.8.235
07/11/2008 12:23:48|SETI@home|[file_xfer] Finished download of file 14oc08ae.15250.8252.8.8.249
07/11/2008 12:23:48|SETI@home|[file_xfer] Throughput 120839 bytes/sec
07/11/2008 12:23:48|SETI@home|[file_xfer] Finished download of file 14oc08ae.15250.8252.8.8.235
07/11/2008 12:23:48|SETI@home|[file_xfer] Throughput 120412 bytes/sec
07/11/2008 12:23:48|SETI@home|[file_xfer] Started download of file 14oc08ae.15250.8252.8.8.232
07/11/2008 12:23:52|SETI@home|[file_xfer] Finished download of file 14oc08ae.15250.8252.8.8.232
07/11/2008 12:23:52|SETI@home|[file_xfer] Throughput 136079 bytes/sec
07/11/2008 12:23:58|SETI@home|Sending scheduler request: To fetch work
07/11/2008 12:23:58|SETI@home|Requesting 329 seconds of new work

Here we have three requests for work in 2m:20s. This repeatedly happens when a task completes quicker than predicted and lowers the TDCF (task duration correction factor).
There is also a problem that, when this happens, BOINC requests work during, and sometimes before, the associated upload, so that even if the original request filled the cache, the host is left with an uploaded but not reported task.

My requests to JM7 have been for some sort of hysteresis in requests for work, and for work requests to be delayed for a short period after a task completes.

The argument against hysteresis that has been raised is that users want their cache filled at all times, even if this causes server overload, and even though it can unbalance resource shares when the initial project cannot supply work and the client then goes to another project for it.
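
For the record, the hysteresis being asked for here is the classic two-threshold scheme. A minimal sketch in Python, with invented threshold values rather than a concrete proposal:

# Two-threshold ("hysteresis") work fetch: only start asking for work
# once the cache falls below a low-water mark, then fill it back up to
# the high-water mark.  Between the marks, stay quiet -- no tiny
# top-up requests.  The threshold values are invented for illustration.

LOW_WATER_S  = 2.0 * 86400   # start fetching below 2 days of cached work
HIGH_WATER_S = 3.0 * 86400   # fetch enough to get back to 3 days

def work_to_request(cached_work_s: float) -> float:
    """Seconds of work to ask for, or 0 if no request should be made."""
    if cached_work_s >= LOW_WATER_S:
        return 0.0                       # inside the deadband: do nothing
    return HIGH_WATER_S - cached_work_s  # one big request, not many small

print(work_to_request(2.5 * 86400))  # 0.0 -> no 3-second "top-ups"
print(work_to_request(1.5 * 86400))  # 129600.0 -> one 1.5-day request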
ID: 828016
1mp0£173
Volunteer tester

Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 828080 - Posted: 8 Nov 2008, 1:07:05 UTC - in response to Message 828016.  

> What I and, I think, PhonAcq are seeing and trying to get fixed is the multiple requests, as seen below;

I understand that: you want the scheduler to have a little bit of "deadband" so that it doesn't want to connect as frequently.

You're looking at the top of one component, and saying "we could optimize this function."

I'm actually looking at this a layer lower.

I'm saying "the BOINC client connects to servers (upload server, download server and scheduling server) and is very aggressive in talking to them."

So, let's say that BOINC wants to download work. What would happen if, just before it started a download, it simply decided not to? If it got ready to grab the file and then said "oh, I'm going to wait..."

What if it just plain skipped 3 attempts out of every 4? How would that affect the BOINC servers?

Taking your scheduler example, BOINC wants to request work, so it gets ready to request, and then "rolls the dice" and skips the request.

I used the example of only doing 1 in 4 because if you haven't seen variable persistence in action, it seems more comfortable. In reality, a persistence of 1 in 20 is probably a good target.
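
A minimal sketch of that dice roll, assuming the fixed 1-in-20 persistence suggested above (illustrative only, not actual BOINC client code):

import random

# "Variable persistence": each time the client is ready to contact a
# server, it proceeds only with probability p, and otherwise defers to
# the next opportunity.  With p = 1/20, on average 20 ready-points pass
# before a request actually goes out, which smears the fleet's requests
# out in time instead of letting them pile up at the same instant.

PERSISTENCE = 1.0 / 20.0

def should_contact_server() -> bool:
    """Roll the dice: proceed with probability PERSISTENCE."""
    return random.random() < PERSISTENCE

deferrals = 0
while not should_contact_server():
    deferrals += 1  # skip this opportunity; try again at the next one
print(f"deferred {deferrals} times before contacting the server")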
ID: 828080
kittyman
Volunteer tester
Joined: 9 Jul 00
Posts: 51478
Credit: 1,018,363,574
RAC: 1,004
United States
Message 828093 - Posted: 8 Nov 2008, 1:21:13 UTC - in response to Message 828016.  

> What I and, I think, PhonAcq are seeing and trying to get fixed is the multiple requests...
> [log and discussion snipped - see message 828016 above]



The baseline here is that the servers need to be able to handle the load, and not Boinc.......

This is only gonna get worse as the user base and computing power increases.....

It's the servers that need to be fixed and/or upgraded.

Eric and Matt are doing the best with what they have available......there is no mistake about that.

If anybody thinks there is............please buy a ticket to Berkeley and help them sort out the server settings.........

I am not saying that they don't know what they are doing, but there is always somebody who knows the insides of things better than anybody else.......

If you are that person.........please step up to the plate.....

I am sorry that I am not that person, or I would be on a flight to CA right now.........
"Time is simply the mechanism that keeps everything from happening all at once."

ID: 828093
W-K 666
Volunteer tester

Joined: 18 May 99
Posts: 19401
Credit: 40,757,560
RAC: 67
United Kingdom
Message 828132 - Posted: 8 Nov 2008, 2:03:52 UTC - in response to Message 828080.  

> I understand that: you want the scheduler to have a little bit of "deadband" so that it doesn't want to connect as frequently.
> [...]
> I used the example of only doing 1 in 4 because if you haven't seen variable persistence in action, it seems more comfortable. In reality, a persistence of 1 in 20 is probably a good target.

If I understand you correctly, the net effect would be the same: a request declined this time may or may not be declined the next time.
As there are two actions that consume server resources, requests and reports, what happens on reports? And, probably more complex, what happens when the client reports and also calculates that the cache needs a top-up? Especially as it would appear that a significant proportion of users want reporting to be done ASAP.
ID: 828132
W-K 666
Volunteer tester

Joined: 18 May 99
Posts: 19401
Credit: 40,757,560
RAC: 67
United Kingdom
Message 828145 - Posted: 8 Nov 2008, 2:34:35 UTC - in response to Message 828093.  


> The baseline here is that the servers need to be able to handle the load, and not Boinc.......

The problem is that it is BOINC that controls the processes that define the load.

> This is only gonna get worse as the user base and computing power increases.....

That's true.

> It's the servers that need to be fixed and/or upgraded.

So if BOINC could be made to slow down requests and reports at peak time, then it is quite probable that the present servers at Berkeley could handle the load.
And the servers can be replaced/upgraded at a slower pace.

> Eric and Matt are doing the best with what they have available......there is no mistake about that.

Very true.

> If anybody thinks there is............please buy a ticket to Berkeley and help them sort out the server settings.........
>
> I am not saying that they don't know what they are doing, but there is always somebody who knows the insides of things better than anybody else.......
>
> If you are that person.........please step up to the plate.....
>
> I am sorry that I am not that person, or I would be on a flight to CA right now.........

Unfortunately that person is not me; I'm just an electronics guy who happens to be quite good at problem solving, or so they say.
ID: 828145
kittyman
Volunteer tester
Joined: 9 Jul 00
Posts: 51478
Credit: 1,018,363,574
RAC: 1,004
United States
Message 828147 - Posted: 8 Nov 2008, 2:42:56 UTC - in response to Message 828145.  


> The problem is that it is BOINC that controls the processes that define the load.
> [...]
> So if BOINC could be made to slow down requests and reports at peak time, then it is quite probable that the present servers at Berkeley could handle the load.
> [...]

It just needs to be fixed on the server end.......and you know it....
All the tweaking in the world on the user end of things might fix the situation for the short term, but the only answer is to have enough capacity available to handle the demand.......and you all know it.


So how 'bout we stop talking about theoretical fixes and start talking about getting hardware that will handle the problem??
"Time is simply the mechanism that keeps everything from happening all at once."

ID: 828147
John McLeod VII
Volunteer developer
Volunteer tester
Joined: 15 Jul 99
Posts: 24806
Credit: 790,712
RAC: 0
United States
Message 828149 - Posted: 8 Nov 2008, 2:47:07 UTC - in response to Message 828132.  

> [...]
> As there are two actions that consume server resources, requests and reports, what happens on reports? And, probably more complex, what happens when the client reports and also calculates that the cache needs a top-up? Especially as it would appear that a significant proportion of users want reporting to be done ASAP.

There is a post by Eric or Matt around someplace that discusses this. The conclusion is that the more things that get put into a single update (requests and reports), the better. Much of the cost is per update, and there is a smaller cost per request and report.
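
A toy cost model makes that conclusion easy to see; the cost numbers here are invented for illustration:

# Each scheduler contact has a fixed per-update cost plus a small
# per-item cost, so batching N reports and a work request into one RPC
# is far cheaper for the server than N+1 separate RPCs.

PER_UPDATE_COST = 100.0  # connection setup, auth, DB transaction, ...
PER_ITEM_COST   = 5.0    # marginal cost of one report or one request

def server_cost(updates: int, items: int) -> float:
    return updates * PER_UPDATE_COST + items * PER_ITEM_COST

# 4 results reported one at a time, plus a separate work request:
print(server_cost(updates=5, items=5))  # 525.0
# the same 5 items folded into a single update:
print(server_cost(updates=1, items=5))  # 125.0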


BOINC WIKI
ID: 828149
1mp0£173
Volunteer tester

Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 828159 - Posted: 8 Nov 2008, 3:08:32 UTC - in response to Message 828093.  

> The baseline here is that the servers need to be able to handle the load, and not BOINC.......


But the servers *are* BOINC. The client is *also* BOINC.

There is a huge opportunity here as a result: Slow the clients down, get more successful transactions, more success means less wasted bandwidth/CPU cycles, means everything gets FASTER.
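
A toy model of why slower clients can mean a faster system: once offered load (new requests plus retries) exceeds server capacity, the retry backlog snowballs; just below capacity, everything gets served. The numbers are invented:

# A server that can complete 100 requests/s; anything it can't serve
# this second is retried the next second on top of the new arrivals.

CAPACITY = 100.0  # requests the server can complete per second

def backlog_after(seconds: int, offered_per_s: float) -> float:
    backlog = 0.0
    for _ in range(seconds):
        demand = offered_per_s + backlog       # new requests plus retries
        backlog = max(0.0, demand - CAPACITY)  # what couldn't be served
    return backlog

print(backlog_after(60, offered_per_s=120.0))  # 1200.0 -- runaway retries
print(backlog_after(60, offered_per_s=95.0))   # 0.0 -- everything served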
ID: 828159
W-K 666
Volunteer tester

Joined: 18 May 99
Posts: 19401
Credit: 40,757,560
RAC: 67
United Kingdom
Message 828163 - Posted: 8 Nov 2008, 3:17:21 UTC - in response to Message 828159.  

Couldn't have said that better if I tried.

But we have to remember that we need to slow down reports as well as requests.

@JM7

Were you thinking of Rom's BOINC Client: The evils of 'Returning Results Immediately'
ID: 828163
PhonAcq

Joined: 14 Apr 01
Posts: 1656
Credit: 30,658,217
RAC: 1
United States
Message 828176 - Posted: 8 Nov 2008, 4:14:54 UTC

One point: the ul/dl pings that bother me use up bandwidth to the boinc/seti servers. At one time that link was pegged near 100%. Matt's -allapps switch seems to have reduced the load to around 80%. If I understand all this, we are still wasting bandwidth, but it is no longer an obvious bottleneck. So one must look further upstream. Maybe Matt can find an analogous -allapps switch??
ID: 828176