Message boards :
Number crunching :
The spoofed client - Whatever about
Keith Myers Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
I remember reading somewhere that the main reason for the limit was to stop so many outstanding unused task assignments from piling up in the database. Previously anyone could download up to 10,000 tasks at a time. The problem was that a number of people would download 10,000 tasks and then never run BOINC again, leaving those tasks in the database for months before timing out. That's the reason I remember for the limit being put into place: to prevent the "hit and run" hosts at the beginning of the project, when Setiathome had wide publicity and the general public was being introduced to distributed computing for the first time. Lots of people joined the project, downloaded tasks, and then never ran the client to process the work, or simply used the Windows app-removal tool to remove the application, orphaning 10,000 tasks in the database that took up space and didn't clear out for 7 weeks.

If a participant was serious about crunching for the project, and it wasn't just a passing fad, they would continue to crunch and then get more work. In the days of CPU-only processing, 100 tasks took several days to complete. The limit has never been updated to recognize that the primary producers are now GPUs, which process work in a fraction of the time a CPU takes.

Seti@Home classic workunits: 20,676 CPU time: 74,226 hours A proud member of the OFA (Old Farts Association)
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13727 Credit: 208,696,464 RAC: 304 |
How about another version of people's recollections? It wasn't about people that signed up and didn't return work. It wasn't about the amount of work returned after an outage. It wasn't about systems pumping out invalid/erroneous work. The issue was the large amount of work in progress. The server systems at the time just couldn't cope with it, as many people had 10+10 caches giving them 20 days' worth of work. Unfortunately all that work in progress kept bringing the servers crashing down, or just grinding them to a halt.

The changes made some time ago that reduced the length of the outages may or may not have improved the server systems' ability to cope with large amounts of work in progress, but as the limits haven't been changed there's no way to know for sure. However, given that we often see diminished splitter output, resulting in occasional periods of not being able to get work on some requests, because the file deleters and the subsequent purging can't keep up, it is highly likely that increasing the amount of work a system can carry would just lead to further server system issues. Maybe once the new upload server (which also runs file deleters and validators) comes online, the file delete/purge/splitter output issues will be resolved, and then we can revisit the server side limits.

Yes, the spoofed clients do allow a greater crunching contribution. Yes, the spoofed clients are responsible for increasing the load on the servers. As would be the case for more people joining up, which has been stated in Seti mailouts as necessary in order to process all the data so far received from the telescopes. (Each spoofed client, depending on the value used, is equivalent to 22 to 30 systems with 1 GPU for the work-in-progress load.)

As for me: while I would like to be able to crunch all the time, even through extended outages, if I run out of work, then I run out of work and my systems get a break. Others put them towards backup projects. The fact is that while I don't like the server side work limits, the project has them in place for a reason, which they have explained and I understand, so I personally don't feel it is right to work around them. But that's just how I feel about the situation. *shrug*

Grant Darwin NT
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
"The issue was the large amount of work in progress..."

Yes, I believe that falls under the "outstanding unused task assignments in the Database" purview. After thinking about it, I remembered the distinct relationship between the 'In progress' and 'Validation pending' columns. Raising or lowering the first will result in the second falling or rising by nearly the same amount, with very little change in the column that matters, the 'All' column. This is probably due to the Host running through almost 3 times the current 'In Progress' number a day. In fact, you might not see a significant change in the 'All' column until the 'In Progress' number exceeds the 'Number of tasks today' number for a full day. Hmmm, I'll have to test that theory, it sounds interesting.

One sure way to reduce the "outstanding unused task assignments in the Database" number would be to change the maximum cache allowed to One Day for All Hosts. That wouldn't be so bad, considering the faster Hosts only have a few Hours' worth of cache the way things are set. I mean, why should some people have Days of cache while others only have Hours? Sounds fair to Me... <nods head>
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13727 Credit: 208,696,464 RAC: 304 |
"I mean, why should some people have Days of cache while others only have Hours? Sound fair to Me..."

And by the same argument there shouldn't be any optimised applications, because it means they will run out of work in outages. Nor should there be any recent high-performance hardware used for crunching. Sorry, but facetious arguments don't do anything to support your position; this one actually counters it and supports the other view.

Grant Darwin NT
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
You seem totally oblivious to the fact that SETI is crying for MORE WORK to be accomplished. The only way to produce more work is to keep the machines running, and obtain faster machines, which burn through more work. Why have work sitting unused for Days on some machines while others are Out of Work?
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13727 Credit: 208,696,464 RAC: 304 |
"You seem totally oblivious to the fact SETI is crying for MORE WORK to be accomplished."

And it would appear you didn't read all of my original post. I suggest you re-read it. All of it. To make it easier, I draw your attention to the 3rd paragraph:

"Yes, the spoofed clients do allow a greater crunching contribution. Yes the spoofed clients are responsible for increasing the load on the servers. As would be the case for more people joining up, which has been stated in Seti mailouts as necessary in order to process all the data so far received from the telescopes (Each spoofed client, depending on the value used, is equivalent to 22 to 30 systems with 1 GPU for the work in progress load)."

Grant Darwin NT
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
So, why are you complaining about a suggestion that would lower the total amount of work outstanding, while allowing the faster machines to work unimpeded? |
Gary Charpentier Send message Joined: 25 Dec 00 Posts: 30639 Credit: 53,134,872 RAC: 32 |
"I remember reading somewhere the main reason for the limit was to stop having so many outstanding unused task assignments in the Database. Previously anyone could download up to 10000 tasks at a time. The problem was, there was a number of people who would download 10000 tasks and then never run BOINC again, causing those 10000 tasks to remain in the database for months before timing out."

Seti Classic had a one-work-unit limit. You had to return it before you could get another. Huge numbers of users were on dialup. Outages could last weeks. Somehow I can't agree with your recollection.
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13727 Credit: 208,696,464 RAC: 304 |
"So, why are you complaining about a suggestion that would lower the total amount of work outstanding, while allowing the faster machines to work unimpeded?"

Sure, if everyone had the same daily limit, that would be best, but that wouldn't be the case, and you would then be limiting slower systems. And given the work slower systems can do, and what the faster systems can do, the reduction in the slower systems' caches wouldn't be nearly enough to fill the faster systems' caches.

Seti has never guaranteed work 24/7/365, and one of the main aims of Seti is to make it possible for the widest number of systems to participate. If you choose to get more powerful hardware, fine. If you choose to develop and use more efficient applications, fine. But when you start to impact others that don't choose to do the same as you, that is not fine. Suggesting that the amount of work slower systems can get should be limited, so you can continue to run spoofed clients, is not fine.

If I run out of work during an outage, I run out of work. It's not a big deal. Others choose to run backup projects. Yeah, I'd like to be processing work 24/7/365, but when stuff happens and I'm out of work, it's not a big deal. I don't see why it's such a big deal for you?

Grant Darwin NT
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
It's a numbers game. Consider 80,000 machines with, say, 100 unused tasks each on them. Those tasks would easily compensate for the few extra hours the, say, 100 faster machines would use. Why would setting the cache at one day limit the slower machines? The only change would be fewer unused tasks sitting on them.

Actually I don't care if everyone's machines go down a few hours a day, I just thought it would be nice to try to help SETI when they ask for more production. As I see it the tasks are available, they just need to be available to the machines that can use them. Right now that doesn't appear to be a problem; I certainly don't see SETI complaining about tasks being made available to the machines that need them. The only complaints I see are from a few private individuals on this board.
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13727 Credit: 208,696,464 RAC: 304 |
"It's a numbers game. Consider 80000 machines with say 100 unused tasks each on them. Those tasks would easily compensate for the few extra hours the say 100 faster machines would use Why would setting the cache at one day limit the slower machines? The only change would be fewer unused tasks sitting on them."

Odd terminology you use there: "Unused." So by your very definition, spoofed clients have thousands of unused tasks. That's a whole lot more than a slow system. Please, be honest and call them what they are: a buffer against server issues. They are processed, so they are not "unused."

"Actually I don't care if everyones machines goes down a few hours a day, I just thought it would be nice to try to Help SETI when they ask for more production."

You have helped, by providing Petri's application and making it easier to set up. But if down time wasn't an issue, there wouldn't be spoofing. It was the whole reason it came about: to overcome the server side limits and allow faster systems to remain productive through outages.

"As I see it the tasks are available, they just need to be available to the machines that can use them. Right now that doesn't appear to be a problem, I certainly don't see SETI complaining about tasks being made available to the machines that need them. The only complaints I see are from a few private individuals on this board."

And here I am repeating myself yet again: spoofed clients have the same effect as 22-30 individual systems with 1 GPU on the server load. I don't like the server side limits, but I understand why they were introduced, so I go along with that decision (I can't compile a spoofed client, but I am aware of the .xml file workaround and I don't make use of it). Until we find ET, the servers all collapse under the load, or we find out what is required and get hardware capable of meeting the load of 4-day+ caches without server side limits, things are what they are. Such is life.

Grant Darwin NT
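For readers unfamiliar with the ".xml file workaround" mentioned above: one commonly cited form of it (my assumption — the post doesn't say which file is meant) is the `<ncpus>` option in BOINC's cc_config.xml, which makes the client report more CPUs than the machine actually has, inflating any per-CPU server-side task limits. A minimal sketch:

```xml
<!-- cc_config.xml, placed in the BOINC data directory.
     Illustrative only: <ncpus> overrides the CPU count the client
     reports to projects; values and effect depend on the project's
     per-CPU scheduling limits. -->
<cc_config>
  <options>
    <ncpus>64</ncpus>
  </options>
</cc_config>
```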
Keith Myers Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
(I can't compile a spoofed client) Why not? The instructions are here. https://setiweb.ssl.berkeley.edu/sah_porting.php https://boinc.berkeley.edu/trac/wiki/BuildSystem https://boinc.berkeley.edu/trac/wiki/SoftwarePrereqsUnix If you want to build for Windows, sign up for a free account at AppVeyor. https://www.appveyor.com/ Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13727 Credit: 208,696,464 RAC: 304 |
"(I can't compile a spoofed client) Why not? The instructions are here."

In which case, I choose not to.

Grant Darwin NT
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
Here's a rough assessment of the theory: 3 Hosts producing about the same amount of work using different In Progress numbers. Hypothesis: different In Progress numbers will interact with turn-around time, resulting in little difference in the all-important 'All' number due to the change in 'Validation pending' numbers.

Host 1; State: All (29150) · In progress (500) · Validation pending (17846) · Validation inconclusive (352) · Valid (10452) · Invalid (0) · Error (0)
Host 2; State: All (28428) · In progress (3200) · Validation pending (14583) · Validation inconclusive (425) · Valid (10220) · Invalid (0) · Error (0)
Host 3; State: All (26577) · In progress (5172) · Validation pending (10703) · Validation inconclusive (374) · Valid (10229) · Invalid (2) · Error (97)

At first glance it would appear the Host with the highest In Progress number is leaving the smallest footprint on the database, due to a lower Validation pending number. Pretty much the opposite of what some here are postulating. Of course, more testing is required...
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13727 Credit: 208,696,464 RAC: 304 |
"Here's a rough assessment of the Theory, 3 Hosts producing about the same amount of work using different In Progress numbers."

Ok, I see where you're heading with this.

Seti@home has imposed limits on the number of WUs a single host can have in its cache, due to its servers having issues with large numbers of Results-out-in-the-field. Once the number of Results-returned-per-hour reaches a certain (rather variable) threshold, the servers effectively grind to a halt, as they can't meet the I/O load the high return rate puts on them. The lower the number of Results-out-in-the-field, the higher the number of Results-returned-per-hour can be before things fall over.

At present Seti@home doesn't have enough computing power to process all the available data, so either a huge number of new crunchers is required, or a significant number of high-throughput systems; more than likely it needs both: a high number of new crunchers with high-throughput systems. But quite simply, Seti@home's current servers aren't capable of handling much more of a load than they already have.

We know that the faster you return work, the higher the number of Pendings; the slower you return work, the lower the number of Pendings. TBar is proposing that for a system of a given output (WUs per hour, day or whatever), the number of Tasks (WUs) in the All column is the same regardless of the In Progress value, as the higher the In Progress number, the lower the Validation Pending number; the lower the In Progress number, the higher the Validation Pending number. (Which would be rather unfortunate if true, as it would mean high-performance systems, even without spoofing, put a huge strain on the servers because of both their high return rates and their All tasks value, compared to much slower systems with their extremely low return rates and much lower All tasks values. Extrapolating from that, since the Seti servers can only handle extremely high Results-returned-per-hour numbers with extremely low Results-out-in-the-field numbers, it would seem lots of very slow systems with only a few WUs on each would be able to return more work per hour without the Seti servers falling over, due to their much lower All tasks numbers and therefore much lower Results-out-in-the-field numbers. But I think getting a million or so Android devices to replace the current higher-performance PC hardware would be a bit of an ask...)

Grant Darwin NT
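The two server quantities discussed above are linked by Little's law: in steady state, Results-out-in-the-field equals Results-returned-per-hour multiplied by the average time a result spends in the field. A small sketch of that arithmetic; all figures are illustrative, not actual Seti@home numbers:

```python
# Little's law: results_out_in_field = return_rate * avg_hours_in_field.
# Figures below are illustrative only, not real Seti@home server numbers.

def results_out_in_field(returned_per_hour, avg_hours_in_field):
    """Steady-state count of results out in the field (Little's law)."""
    return returned_per_hour * avg_hours_in_field

# The same return rate keeps far more results in the field when each
# result lingers longer (bigger caches, longer deadlines):
print(results_out_in_field(140_000, 24))   # held ~1 day  -> 3360000
print(results_out_in_field(140_000, 96))   # held ~4 days -> 13440000
```

This is also why shortening deadlines or caches, which lowers the average time in the field, would reduce the out-in-the-field load at a given return rate.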
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
Sorry, what you posted just doesn't make any sense. The current top-level machines produce well over a hundred times more work than the older machines that caused the problem with the database, yet each of those old machines left a footprint in the database nearly half the size of a current top machine's, despite the current machines being hundreds of times more productive. The problem with all those old machines (many more than today's numbers) wasn't what you are trying to attribute to today's machines; they were simply leaving too large a footprint in the database with their 10,000 tasks apiece, even though most of those tasks were simply sitting there unused.

You really think it would be easier for the servers to deal with the thousands and thousands of Androids it would take to replace those top machines? Instead of three accounts, it would take thousands and thousands of Android accounts, requiring thousands and thousands of server contacts instead of just three. The only time there is an I/O problem is when there is an overflow storm; not only is that being fixed but, it appears, they can avoid it by providing different data sources instead of just one BLC group. Not sure what to make of your last post.
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
"Host 1; State: All (29150) · In progress (500) · Validation pending (17846) · Validation inconclusive (352) · Valid (10452) · Invalid (0) · Error (0)"

Another look at those three top Hosts:

Host 1; State: All (28282) · In progress (500) · Validation pending (17600) · Validation inconclusive (337) · Valid (9845)
Host 2; State: All (27806) · In progress (3200) · Validation pending (14143) · Validation inconclusive (448) · Valid (10015)
Host 3; State: All (27821) · In progress (6500) · Validation pending (10860) · Validation inconclusive (378) · Valid (9987)

This is more like what I would expect: the 'All' numbers are very close together. Any perceived 'help' to the database from lowering the cache setting is offset by an increase in Pending tasks. This is for Hosts completing 10000 tasks a day; is it the same for those completing 5000 per day? 1000 a day? Since Validated tasks are removed after 24 hours, I assume we need to consider this a daily cycle. Hmmm, I suppose I need to order another 1070 to break out of this deadlock...
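Those numbers are consistent with a simple steady-state model (mine, not from the thread): if a task leaves the Pending column only when the slower of the two replicas is returned, then its database lifetime is the larger of the host's own turnaround and the wingman's, so the combined In progress + Pending footprint is set by the wingman's turnaround and doesn't change with the host's cache size, as long as the cache drains faster than the wingman returns. A rough sketch under those assumptions:

```python
# Steady-state sketch (a model I'm assuming, not from the thread) of why
# a bigger cache may not change a host's database footprint. Assumption:
# a task leaves Pending only when the slower replica is returned, so its
# database lifetime is max(own turnaround, wingman turnaround).

def db_footprint(tasks_per_day, cache_days, wingman_days):
    """Per-host In progress / Pending counts via Little's law."""
    in_progress = tasks_per_day * cache_days                       # queued in the cache
    pending = tasks_per_day * max(0.0, wingman_days - cache_days)  # waiting on wingman
    return in_progress, pending, in_progress + pending

# Host doing 10,000 tasks/day, wingmen averaging 2 days turnaround:
# In progress and Pending trade off; the total stays the same.
for cache in (0.5, 1.0, 1.5):
    ip, pend, total = db_footprint(10_000, cache, 2.0)
    print(f"cache {cache} d: in progress {ip:7.0f}, pending {pend:7.0f}, total {total:7.0f}")
```

The trade-off only breaks down once the cache grows past the wingman turnaround, at which point In progress keeps growing with nothing left in Pending to offset it.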
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13727 Credit: 208,696,464 RAC: 304 |
"Sorry, what you posted just doesn't make any sense. The current top level machines produce well over a hundred times more work than the older machines that caused the problem with the Database simply because they were all leaving a footprint in the database nearly half the size of the current top machines which are hundreds times more productive. The problem with all those old machines, many more than todays numbers, wasn't what you are trying to attribute to today's machines, they were simply leaving too large of a footprint in the database with their 10000 tasks a piece even though most of those 10000 tasks were simply sitting there unused."

Now think carefully about what you are saying there; you're contradicting yourself. And please, stop making things up by talking about "Unused" WUs.

The Seti@home server problem is the Results-out-in-the-field number impacting the amount of work it can receive each hour. The more Results-out-in-the-field, the lower the number of Results-returned-per-hour can be before problems occur. The fewer Results-out-in-the-field, the higher the number of Results-returned-per-hour can be before problems occur. That is the Seti@home server issue.

What you have proposed, and appear to be supporting with evidence, is that the amount cached by a system (In progress) for a given work output doesn't make any difference to the Results-out-in-the-field number, as the higher the In Progress number, the lower the Pendings, and the lower the In Progress number, the higher the Pendings, resulting in the same All tasks number. And the fact appears to be that the higher the throughput of a system, the higher the All tasks number, which makes up the Results-out-in-the-field number. So what you have proposed and appear to be supporting with evidence is that the higher the production of a machine, the greater its impact on the Results-out-in-the-field number. And it's the Results-out-in-the-field number that had to be limited for the servers to handle the Results-returned-per-hour load.

"You really think it would be easier for the servers to deal with those thousands and thousands of Androids it would take to replace those top machines? Instead of three accounts, it would take thousands and thousands of Android accounts requiring thousands and thousands of server contacts instead of just three."

I agree, it would produce a whole different load on the database. But as the Results-out-in-the-field would be significantly reduced, it may or may not be able to cope with the change in loading. While the post was rather tongue-in-cheek, it is still factually correct if your hypothesis about All tasks/Pending and In progress is true.

"it appears they can avoid it by providing different data sources instead of just one BLC group."

That would help, but it would appear they need to work on it. Other than the odd resend, all my present WUs are BLC32s. And it wouldn't be of any use if Seti@home gets its wish and gets more processing power for the project, as the Results-received-in-last-hour and Results-out-in-the-field would both increase, lowering the I/O problem threshold.

"The only time there is an I/O problem is when there is an Overflow storm,"

Not just a noise-bomb storm; it also occurs when there are lots of shorties in the mix, or when the number of crunchers or high-performance machines increases, since the Results-out-in-the-field and Results-returned-per-hour will both increase and actually lower the I/O bottleneck threshold. Personally I'm hopeful that the new upload server, which also does file deletion and Validation duties, will help reduce the present bottlenecks (or better yet, remove them. Which would of course expose the next bottleneck area).

Grant Darwin NT
Unixchick Send message Joined: 5 Mar 12 Posts: 815 Credit: 2,361,516 RAC: 22 |
I think the seti team has been doing a great job of upgrading the servers (with more upgrades planned). The unplanned outages are only a bad memory, and the planned outages have been of a reasonable (your definition may vary) length. There tends to be a horrible crunch right after the outage, and I think the issue isn't with the people on the boards trying to do their best to lessen the load on the system one way or another. I think we can have a healthy debate with science and numbers to figure out what works. But hopefully, as Grant said, "Personally i'm hopeful that the new upload server, which also does file deletion & Validation duties, will help reduce the present bottlenecks (or better yet, remove them. Which would of course expose the next bottleneck area)." If so, these discussions are only important in the moment, as the system changes and thus the bottleneck changes.

I'm glad we have a community who is passionate about how they SETI, and that we can all SETI the way we want. The team recently even changed the code to allow more ghosts to be recovered at one time. It means they are paying attention and tweaking things within the money/time constraints to help the system. If the spoofing caused a big problem, I'm guessing a message would be sent out, or something put into place.
MarkJ Send message Joined: 17 Feb 08 Posts: 1139 Credit: 80,854,192 RAC: 5 |
So, following the logic of reducing the out-in-the-field number, surely shortening the deadlines would improve that. The project could adjust it, wait a while to see the effect, and then look at increasing the per-host limits. Once that's done you don't need the spoofed client. BOINC blog
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.