Working as Expected (Jul 13 2009)

Dave

Send message
Joined: 13 Jul 09
Posts: 1
Credit: 5,218
RAC: 0
United States
Message 918452 - Posted: 16 Jul 2009, 14:16:17 UTC

I have a simple mind and I am not following this discussion too well ;-\
What is the estimate for when I'll be able to upload completed work units?
Or should I suspend processing until uploads are functioning again?

Many thanks,
Dave G.
"Per Ardua, ad Astra"[/img]
ID: 918452 · Report as offensive
Profile Virtual Boss*
Volunteer tester
Avatar

Send message
Joined: 4 May 08
Posts: 417
Credit: 6,440,287
RAC: 0
Australia
Message 918453 - Posted: 16 Jul 2009, 14:21:30 UTC - in response to Message 918452.  

I have a simple mind and I am not following this discussion too well ;-\
What is the estimate for when I'll be able to upload completed work units?
Or should I suspend processing until uploads are functioning again?

Many thanks,
Dave G.
"Per Ardua, ad Astra"[/img]


Welcome to the message boards.

Uploads are getting through, although it's still a bit patchy.

No need to suspend processing; things should be improving with time.
Flying high with Team Sicituradastra.
ID: 918453 · Report as offensive
lobozmarcin
Volunteer tester

Send message
Joined: 19 Jul 02
Posts: 1
Credit: 1,421
RAC: 0
Poland
Message 918455 - Posted: 16 Jul 2009, 14:39:10 UTC

I have a problem:

2009-07-16 15:27:18|SETI@home|Backing off 3 hr 39 min 10 sec on upload of 01dc08ad.14412.20113.8.8.66_1_0
2009-07-16 15:27:19||Internet access OK - project servers may be temporarily down.
2009-07-16 15:27:36|SETI@home|Sending scheduler request: Requested by user. Requesting 0 seconds of work, reporting 0 completed tasks
2009-07-16 15:27:41|SETI@home|Scheduler request completed: got 0 new tasks


It's been the same problem for 3 days now.
ID: 918455 · Report as offensive
Profile Virtual Boss*
Volunteer tester
Avatar

Send message
Joined: 4 May 08
Posts: 417
Credit: 6,440,287
RAC: 0
Australia
Message 918459 - Posted: 16 Jul 2009, 14:49:46 UTC - in response to Message 918455.  

I have a problem:

2009-07-16 15:27:18|SETI@home|Backing off 3 hr 39 min 10 sec on upload of 01dc08ad.14412.20113.8.8.66_1_0
2009-07-16 15:27:19||Internet access OK - project servers may be temporarily down.
2009-07-16 15:27:36|SETI@home|Sending scheduler request: Requested by user. Requesting 0 seconds of work, reporting 0 completed tasks
2009-07-16 15:27:41|SETI@home|Scheduler request completed: got 0 new tasks


It's been the same problem for 3 days now.


Welcome to the message boards.

Your work is slowly getting through; 16 tasks have uploaded and been reported today (UTC).

After a few more have uploaded, you will get some new work.

Flying high with Team Sicituradastra.
ID: 918459 · Report as offensive
Profile Jord
Volunteer tester
Avatar

Send message
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 918460 - Posted: 16 Jul 2009, 14:53:13 UTC - in response to Message 918422.  
Last modified: 16 Jul 2009, 14:53:52 UTC

But I accept your point about needing to check the CPU overhead of the unzip process on Bruno when the zips arrive.

Why would they need to be decompressed on Bruno? As zips they take up less space on disk as well as on the wire. I would expect that once you really need the results, you can import them on whatever server you're using to look at them and decompress them there. Or is that too simple?
ID: 918460 · Report as offensive
CryptokiD
Avatar

Send message
Joined: 2 Dec 00
Posts: 150
Credit: 3,216,632
RAC: 0
United States
Message 918468 - Posted: 16 Jul 2009, 15:30:58 UTC
Last modified: 16 Jul 2009, 15:34:54 UTC

You all see those short work units? They take only a few minutes for the CPU or GPU to complete. Why not stack those shorties into a big zipped file and send one single file to a user? It would reduce the number of simultaneous connections to both the server AND the client if you had 40 or 50 work units in one large compressed file. When the client is done with the work units, it can compress them back into one file and send it back to the server.

If a client got bored and only managed to crunch half the work units, it could still compress the half it finished into one file and send that. When the file gets to the server, the server would decompress it into the individual work units, count and file away the ones that made it back, and reissue work units for the ones which the client did not complete or errored out.

This would greatly reduce the amount of hammering the SETI servers see from clients begging to upload or download. Instead of seeing thousands of little files coming and going, we would see hundreds of large files, which means fewer inbound and outbound connections. The 100 Mbit pipe would still get saturated to capacity, but at least the number of connections would decrease.

You could even go so far as to limit the number of downloads an individual client is allowed per day. Set it at, for example, 2: only twice a day can a client request a new compressed stack of work units, and only if it has already sent the previous ones back. The BOINC app only lets you download up to 100 WUs a day when you first join. Why not compress those 100 into one file, send it out, and hope you get some of it back a week later? And just like the current BOINC app, if a client shows that it can handle 100 a day, then the number could gradually increase.

Again, it wouldn't solve the bandwidth issue, but it would greatly reduce the number of connection attempts, which are in and of themselves bandwidth hogs.

On the computers on my account alone, they are trying to upload a completed work unit every few seconds. This could be reduced to twice per computer per day.
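
Just to make the batching concrete, here's a minimal sketch of the client side in Python. Everything in it is invented for illustration (the upload directory, the .result extension, the function name); this is not how BOINC actually names or stores result files.

[code]
import zipfile
from pathlib import Path

def bundle_finished_results(upload_dir: Path, bundle_path: Path) -> list[str]:
    """Pack every finished result file into a single zip for one upload."""
    finished = sorted(upload_dir.glob("*.result"))  # hypothetical extension
    with zipfile.ZipFile(bundle_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for result in finished:
            zf.write(result, arcname=result.name)  # flat names, no paths
    return [r.name for r in finished]
[/code]

One connection then carries the whole batch instead of one connection per result.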
ID: 918468 · Report as offensive
Profile HAL9000
Volunteer tester
Avatar

Send message
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 918474 - Posted: 16 Jul 2009, 16:10:33 UTC - in response to Message 918441.  

One thought I have had.....

BUT it would require a change to the Boinc client software.

I'll throw it in the ring anyway

It seems a lot of the problem is the continual hammering of the upload server with attempts to upload each result individually.

Why not get Boinc to apply the backoff to ALL results attempting to upload to that SAME server that caused the initial backoff.

This would mean having a backoff clock for each upload server, instead of for each result.

This would mean just one or two results (whatever your # of simultaneous transfers setting) would make the attempt; then the rest of the waiting results (up to 1000s in some cases) would be backed off as well and give the servers a breather.

Not being a programmer, I'm not sure how difficult this would be to implement (it doesn't seem like it would be, to me), and the benefits of reduced bandwidth waste should be substantial.

Please feel free to comment.


We were just talking about that last night in the panic thread here. Seems that something like that might be coming. :D
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the [url=http://tinyurl.com/8y46zvu]BP6/VP6 User Group[/url]
ID: 918474 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Send message
Joined: 4 Jul 99
Posts: 14644
Credit: 200,643,578
RAC: 874
United Kingdom
Message 918475 - Posted: 16 Jul 2009, 16:14:21 UTC - in response to Message 918460.  

But I accept your point about needing to check the CPU overhead of the unzip process on Bruno when the zips arrive.

Why would they need to be decompressed on Bruno? As zips they take up less space on disk as well as on the wire. I would expect that once you really need the results, you can import them on whatever server you're using to look at them and decompress them there. Or is that too simple?

The advantage of decompressing them on Bruno is that they end up in exactly the same place as they would have done under the existing upload handler: no change is required to all the cross-mounted complexity of the SETI file system. And the files are available immediately and individually: you don't have to teach the validator how to go fossicking around in any number of zip archives when it needs a file.

And that also makes the whole system reversible: if something goes wrong with the remote concentrator, use DNS to point all of us back to Bruno. Gets sticky again, of course, but lets the project limp along until the concentrator is revived.
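
For what it's worth, the unzip step on Bruno might look something like this sketch, assuming plain zip bundles arrive from the concentrator (paths and names are illustrative, not the project's real layout):

[code]
import zipfile
from pathlib import Path

def unpack_bundle(bundle: Path, upload_dir: Path) -> None:
    """Expand a forwarded bundle so each result lands exactly where the
    existing upload handler would have written it; the validator's file
    lookup then needs no change at all."""
    with zipfile.ZipFile(bundle) as zf:
        for name in zf.namelist():
            # Refuse path tricks; every member must be a bare filename.
            if "/" in name or "\\" in name or name.startswith(".."):
                raise ValueError("suspicious member name: " + name)
            zf.extract(name, upload_dir)
    bundle.unlink()  # the individual files are now the authoritative copies
[/code]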
ID: 918475 · Report as offensive
Aurora Borealis
Volunteer tester
Avatar

Send message
Joined: 14 Jan 01
Posts: 3075
Credit: 5,631,463
RAC: 0
Canada
Message 918484 - Posted: 16 Jul 2009, 17:01:45 UTC - in response to Message 918441.  
Last modified: 16 Jul 2009, 17:08:10 UTC

One thought I have had.....

BUT it would require a change to the Boinc client software.

I'll throw it in the ring anyway

It seems a lot of the problem is the continual hammering of the upload server with attempts to upload each result individually.

Why not get Boinc to apply the backoff to ALL results attempting to upload to that SAME server that caused the initial backoff.

This would mean having a backoff clock for each upload server, instead of for each result.

This would mean just one or two results (whatever your # of simultaneous transfers setting) would make the attempt; then the rest of the waiting results (up to 1000s in some cases) would be backed off as well and give the servers a breather.

Not being a programmer, I'm not sure how difficult this would be to implement (it doesn't seem like it would be, to me), and the benefits of reduced bandwidth waste should be substantial.

Please feel free to comment.

A change implementing this idea has already been checked in by the BOINC developer. It will probably be incorporated in the next version released for testing. As I understand it, this was tried a couple of years ago with some negative effects that will need to be looked at again.

EDIT: Oops, I should have read the top of the thread before replying.
ID: 918484 · Report as offensive
Profile Virtual Boss*
Volunteer tester
Avatar

Send message
Joined: 4 May 08
Posts: 417
Credit: 6,440,287
RAC: 0
Australia
Message 918485 - Posted: 16 Jul 2009, 17:03:05 UTC - in response to Message 918474.  

We were just talking about that last night in the panic thread here. Seems that something like that might be coming. :D



Thanks HAL9000, I had not seen that thread yet. Looks like it may be good news. :)

Flying high with Team Sicituradastra.
ID: 918485 · Report as offensive
Anthony Liggins

Send message
Joined: 23 Aug 99
Posts: 14
Credit: 609,816
RAC: 0
United Kingdom
Message 918488 - Posted: 16 Jul 2009, 17:11:57 UTC - in response to Message 917472.  

Since CUDA was introduced into the project, the BOINC servers have been taking on an ever-increasing load. As people populate their spare PCIe slots with extra GPUs, adding an extra 112 or more cores each time they do, your bandwidth woes will only increase.

SETI@home has become a victim of its own success where CUDA is concerned. The best thing to do here is to limit the amount each GPU can download each day through the web interface; cutting it by one third or one half will free up a good portion of bandwidth. This will also decrease the load on the backend, as you will not need to create so many multibeam WUs. Increasing the chirp rate will affect the slower CPUs far more than GPUs. :-(

I have been browsing stats, looking at computers attached to BOINC, and I have noticed fellow participants who have between 1,500 and 5,000 WUs downloaded onto their PCs. I would consider this somewhat excessive, which is why I am making this suggestion.

Once implemented, you should notice a difference within 24 hours, and then hopefully people will not feel so frustrated when trying to upload their finished WUs. That should limit the number of people going red in the face and blowing off steam on this forum, well, maybe until the next managed emergency comes along. Participants need to remember that science does not hurt anyone if it is running late.

Anthony.

A 10 year veteran of Seti@home :-)
ID: 918488 · Report as offensive
Profile Jord
Volunteer tester
Avatar

Send message
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 918489 - Posted: 16 Jul 2009, 17:13:14 UTC - in response to Message 918475.  

you don't have to teach the validator how to go fossicking around in any number of zip archives when it needs a file.

Install Windows XP or above and let it do the looking into that archive. ;-)

Just kidding!
I forgot about the validator needing to be able to check the contents. Smacks head.
ID: 918489 · Report as offensive
Profile skildude
Avatar

Send message
Joined: 4 Oct 00
Posts: 9541
Credit: 50,759,529
RAC: 60
Yemen
Message 918490 - Posted: 16 Jul 2009, 17:21:04 UTC

Matt mentioned that they are starting to run low on work. I wonder how many of the old tapes are available to run Astropulse on. Obviously regular S@H has been run on the old tapes (1999 to the start of Astropulse), but there are years of tapes to run Astropulse on. Does the fact that they aren't available have to do with the RFI and other factors previously mentioned?


In a rich man's house there is no place to spit but his face.
Diogenes Of Sinope
ID: 918490 · Report as offensive
Cosmic_Ocean
Avatar

Send message
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 918493 - Posted: 16 Jul 2009, 17:28:58 UTC - in response to Message 918490.  

Matt mentioned that they are starting to run low on work. I wonder how many of the old tapes are available to run Astropulse on. Obviously regular S@H has been run on the old tapes (1999 to the start of Astropulse), but there are years of tapes to run Astropulse on. Does the fact that they aren't available have to do with the RFI and other factors previously mentioned?

Yes, it does have to do with the RFI/radar. A while back, the recorder was set up so that one of the 14 channels of data holds the "chirping" of the radar, and the remaining channels have the data that gets cut into WUs. That's what I think I remember reading a long time ago.

Actually, what I remember reading is that we were only using 12 channels at the time, so there were two free channels left: one was used for the radar chirping, and the other was still available for future use. No idea where to even try to find that reference now.

But at any rate, for tapes recorded before that 13th channel was used for radar chirping, there is no way of knowing where the chirps actually are, and that's where the software radar blanker that Matt has been working on comes into play. Once he gets that up and running, it can pre-process the older tapes, find where it thinks the radar is, and fill in that 13th channel with the chirps so the splitters can do what they normally do.
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving up)
ID: 918493 · Report as offensive
Nicolas
Avatar

Send message
Joined: 30 Mar 05
Posts: 161
Credit: 12,985
RAC: 0
Argentina
Message 918523 - Posted: 16 Jul 2009, 19:56:50 UTC - in response to Message 918399.  

It would need a change to the transitioner/validator logic and timing, to avoid those 'validate error' results we get when the report arrives before the upload; but with BOINC doing delayed/batched reporting anyway, the overall effect wouldn't be big. It would finally give me an excuse to ditch v5.10.13 (no point in early reporting!). All it really needs is that if the validator can't find the file it needs at the first attempt, it goes into backoff/retry (like much of the rest of BOINC) instead of immediate error.

It's possible for the scheduler to ignore reports if the file hasn't been uploaded yet; the client would just keep them queued for a while longer and try reporting them later. This can be done for individual workunits (there is a separate 'ack' for each, which the client must receive before it gets rid of the task locally).
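
As a sketch, that scheduler-side check could be as simple as the following; the directory and file naming are invented, and BOINC's real reply handling is considerably more involved.

[code]
from pathlib import Path

UPLOAD_DIR = Path("/upload")  # illustrative; not the real SETI layout

def acks_for(reported_tasks: list[str]) -> list[str]:
    """Acknowledge only the reported tasks whose result files have arrived;
    the client keeps the rest queued and reports them again later."""
    return [task for task in reported_tasks
            if (UPLOAD_DIR / (task + "_0")).exists()]  # hypothetical naming
[/code]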

Contribute to the Wiki!
ID: 918523 · Report as offensive
Nicolas
Avatar

Send message
Joined: 30 Mar 05
Posts: 161
Credit: 12,985
RAC: 0
Argentina
Message 918525 - Posted: 16 Jul 2009, 20:05:07 UTC - in response to Message 918441.  

Why not get Boinc to apply the backoff to ALL results attempting to upload to that SAME server that caused the initial backoff.

Already implemented, see http://boinc.berkeley.edu/trac/changeset/18593.

There hasn't been a new version released since then, though.
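
In rough terms, the change amounts to one backoff clock per upload server instead of one per result. Here is a toy Python model of that behaviour (not the changeset's actual code, which lives in the C++ client; all the numbers are invented):

[code]
import random
import time

class ServerBackoff:
    """One exponential backoff clock per upload host, shared by every
    result queued for that host."""

    def __init__(self) -> None:
        self.next_try: dict[str, float] = {}  # host -> earliest retry time
        self.failures: dict[str, int] = {}    # host -> consecutive failures

    def may_try(self, host: str) -> bool:
        return time.time() >= self.next_try.get(host, 0.0)

    def record_failure(self, host: str) -> None:
        n = self.failures.get(host, 0) + 1
        self.failures[host] = n
        # Randomised exponential backoff, capped at four hours: one failed
        # transfer now backs off every upload headed for this host.
        delay = min(4 * 3600, 60 * 2 ** n) * random.uniform(0.5, 1.0)
        self.next_try[host] = time.time() + delay

    def record_success(self, host: str) -> None:
        self.failures.pop(host, None)
        self.next_try.pop(host, None)
[/code]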


Contribute to the Wiki!
ID: 918525 · Report as offensive
Profile Jord
Volunteer tester
Avatar

Send message
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 918528 - Posted: 16 Jul 2009, 20:14:45 UTC - in response to Message 918525.  

You know, some people had pointed that out already in this same thread... ;-)
ID: 918528 · Report as offensive
1mp0£173
Volunteer tester

Send message
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 918557 - Posted: 16 Jul 2009, 22:33:07 UTC - in response to Message 918399.  

Would a set of remote upload servers acting as "data aggregators" work?

It is an interesting idea.

The biggest single issue as I see it:

Work is uploaded, and the moment the upload completes it is available for processing on the upload server.

Then, the result is reported. At this point, it is marked in the database as received, and subject to validation.

The validator doesn't have to check to see if the result is in local storage, because it is in local storage by definition.

This change means you have a new state: reported but not in local storage.

BOINC would have to know about that, and have some way of dealing with it (rescanning the database and checking to see if the result is actually here), probably by having the "unzip" process on the upload server report arrivals.

There is also a chance that the result gets lost between the off-site server and the "true" upload server.

I like the idea of doing just one, near Berkeley.

What I'm not sure about: the change that Eric made to shorten the "pending connection" queue suggests that the number of simultaneous connections is a big issue; this just moves that issue from the upload server to the server near the edge.

... but, a better idea (related to the thread which I haven't worked my way through) might be to zip all of the pending uploads into one file. All the client really needs to know is what is in the zip -- then let that go to Bruno.

The downside is that you have to push all of the work through in one session, and the bigger the .zip file the more bytes/packets you have to push through in a row....
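
To make that extra state concrete, a toy sketch; all the names are invented, and the real result states would live in the project database rather than in code like this.

[code]
from dataclasses import dataclass
from enum import Enum, auto

class ResultState(Enum):
    IN_PROGRESS = auto()
    REPORTED_NOT_LOCAL = auto()  # the new state: reported, file in transit
    RECEIVED = auto()            # file present locally, ready to validate
    VALIDATED = auto()

@dataclass
class Result:
    name: str
    file_is_local: bool = False
    state: ResultState = ResultState.IN_PROGRESS

def on_report(r: Result) -> None:
    # A report may now arrive before the aggregator has forwarded the file.
    r.state = (ResultState.RECEIVED if r.file_is_local
               else ResultState.REPORTED_NOT_LOCAL)

def on_bundle_unpacked(r: Result) -> None:
    # Called by the "unzip" process on the upload server once the file lands.
    r.file_is_local = True
    if r.state is ResultState.REPORTED_NOT_LOCAL:
        r.state = ResultState.RECEIVED
[/code]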

ID: 918557 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Send message
Joined: 4 Jul 99
Posts: 14644
Credit: 200,643,578
RAC: 874
United Kingdom
Message 918560 - Posted: 16 Jul 2009, 22:45:39 UTC - in response to Message 918557.  

What I'm not sure about: the change that Eric made to shorten the "pending connection" queue suggests that the number of simultaneous connections is a big issue; this just moves that issue from the upload server to the server near the edge.

Previous observations, over numerous surges/dips, are that the number of simultaneous connections only becomes a problem when it coincides with extremely heavy (93+ Mbit, 98% utilisation) download demand. The supposition has been that this is link saturation with protocol packets instead of data packets. If the protocol packets can be intercepted at the bottom of the hill, the theory is that there's some gain to be had.
ID: 918560 · Report as offensive
1mp0£173
Volunteer tester

Send message
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 918561 - Posted: 16 Jul 2009, 22:57:25 UTC - in response to Message 918560.  

What I'm not sure about: the change that Eric made to shorten the "pending connection" queue suggests that the number of simultaneous connections is a big issue; this just moves that issue from the upload server to the server near the edge.

Previous observations, over numerous surges/dips, are that the number of simultaneous connections only becomes a problem when it coincides with extremely heavy (93+ Mbit, 98% utilisation) download demand. The supposition has been that this is link saturation with protocol packets instead of data packets. If the protocol packets can be intercepted at the bottom of the hill, the theory is that there's some gain to be had.

The interesting thing that we saw when Eric made his change was a sudden, dramatic increase in bandwidth used, from somewhere around 40 megabits to something near 90 megabits -- Eric said "tripled."

In other words, we were under 50% utilization when the servers were flooded with queued connections.

I'm not really disagreeing, I'm just saying that the server out on the edge is going to be subject to all of the problems Bruno faces now -- and be more accessible.

One change from your design that I would make: I would try to keep two connections going at speed at all times, so that if one connection stalled for any reason the other could use that bandwidth -- and each time a transfer completes, I'd start making a new .zip file, instead of doing it hourly or somesuch.
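
That pipeline might look roughly like this sketch, where build_bundle and upload are stand-in stubs rather than real transfer code:

[code]
import queue
import threading
import time

def build_bundle() -> str:
    """Stand-in: in practice, zip whatever results finished since last time."""
    time.sleep(1)
    return "bundle.zip"

def upload(bundle: str) -> None:
    """Stand-in for one HTTP POST carrying a whole bundle."""
    time.sleep(2)

bundles: "queue.Queue[str]" = queue.Queue(maxsize=2)

def builder() -> None:
    while True:
        bundles.put(build_bundle())  # blocks once two bundles are waiting

def uploader() -> None:
    while True:
        upload(bundles.get())  # if one uploader stalls, the other keeps moving

for target in (builder, uploader, uploader):
    threading.Thread(target=target, daemon=True).start()

time.sleep(10)  # let the demo pipeline run for a bit
[/code]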
ID: 918561 · Report as offensive