Message boards :
Number crunching :
GPU max 100wu Why?
Author | Message |
---|---|
TRuEQ & TuVaLu Joined: 4 Oct 99 Posts: 505 Credit: 69,523,653 RAC: 10 |
Is there any chance that the number of WUs per GPU can be increased to something like 200? With the faster GPUs, hosts run out of WUs during server maintenance and outages. I suggest an increase in the WU limit per GPU! //TRuEQ TRuEQ & TuVaLu |
Ian&Steve C. Joined: 28 Sep 99 Posts: 4267 Credit: 1,282,604,591 RAC: 6,640 |
Preaching to the choir, unfortunately. Seti@Home classic workunits: 29,492 CPU time: 134,419 hours |
TRuEQ & TuVaLu Joined: 4 Oct 99 Posts: 505 Credit: 69,523,653 RAC: 10 |
I know the limit of 100 WUs max was set at some point because the servers' supply was being drained by some people. But that was a long time ago, and if I remember correctly it was before the servers were co-located to a different site. Now, with much faster GPUs, better servers, and a better internet connection, it should be possible to increase the limit from 100 to 150, 200, or maybe even more. //TRuEQ |
Bernie Vine Joined: 26 May 99 Posts: 9954 Credit: 103,452,613 RAC: 328 |
From what I remember, the 100 WU limit was imposed because after an outage (any outage, in fact) the database was unable to keep up with the large number of results being returned in one go. So in fact, now that GPUs have got faster and there are many multi-GPU machines, it would probably be even worse; add in Linux and the special sauce and, well... Someone should do a few sums on how many WUs the top 100 machines process per hour, multiply that by the nominal length of an outage, and see what total number of WUs would be returned from just the top 100. |
Keith Myers Joined: 29 Apr 01 Posts: 13161 Credit: 1,160,866,277 RAC: 1,873 |
From what I remember, the 100 Wu limit was imposed because after the outage (any outage in fact) the database was unable to keep up with the large amount of results being returned in one go. I do 17.5K tasks per day, so after a typical 6-hour Tuesday outage I need to report 4,375 tasks which I've finished during the outage. If we just select the top 20 hosts, which would have similar numbers to report, that tallies up to 87.5K tasks. Seti@Home classic workunits: 20,676 CPU time: 74,226 hours A proud member of the OFA (Old Farts Association) |
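Keith's figures above are easy to verify; a quick sketch using only the numbers from his post (the "top 20 hosts at a similar rate" extrapolation is his assumption):

```python
# Back-of-the-envelope check of the post-outage backlog figures.
tasks_per_day = 17_500      # one top host's daily throughput (from the post)
outage_hours = 6            # typical Tuesday maintenance window
top_hosts = 20              # hosts assumed to crunch at a similar rate

backlog_per_host = tasks_per_day * outage_hours / 24
total_backlog = backlog_per_host * top_hosts

print(backlog_per_host)     # 4375.0 tasks finished during the outage, per host
print(total_backlog)        # 87500.0 tasks waiting to be reported at once
```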
TRuEQ & TuVaLu Joined: 4 Oct 99 Posts: 505 Credit: 69,523,653 RAC: 10 |
From what I remember, the 100 Wu limit was imposed because after the outage (any outage in fact) the database was unable to keep up with the large amount of results being returned in one go. Then maybe the restriction should be on how the results are returned, instead of not sending new WUs. Maybe just 50 or 100 at a time... not all. //TRuEQ |
Ian&Steve C. Joined: 28 Sep 99 Posts: 4267 Credit: 1,282,604,591 RAC: 6,640 |
The servers already restrict how many WUs they will take back at one time from each host. If I have 5,000 completed WUs ready to report, on the next communication I will tell the server I have 5,000 ready, but the server will only take anywhere from 100-250 or so at one time. It really doesn't matter how many you have ready; the server only takes what it wants. But you can also limit how many you send back at a time via an option in the app_config file. Some people have found this beneficial after a Tuesday maintenance outage, but I haven't really seen it matter too much personally. Seti@Home classic workunits: 29,492 CPU time: 134,419 hours |
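The client-side throttle Ian describes is, to the best of my knowledge, the `<max_tasks_reported>` option; in recent BOINC clients it lives in cc_config.xml rather than app_config.xml. A sketch (check the BOINC client configuration docs, as the element name and file may differ by client version):

```xml
<!-- cc_config.xml: limit how many finished tasks are reported per scheduler contact -->
<cc_config>
  <options>
    <!-- report at most 100 results per request; 0 means no limit -->
    <max_tasks_reported>100</max_tasks_reported>
  </options>
</cc_config>
```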
Keith Myers Joined: 29 Apr 01 Posts: 13161 Credit: 1,160,866,277 RAC: 1,873 |
I restrict reporting to 100 tasks at a time, since that is the highest value that connects consistently after an outage. Anything higher rapidly drops the percentage of successful reports toward zero. It takes several hours to report all work after an outage. Also, reporting usually only succeeds with NNT (No New Tasks) set. So it takes several hours to report after the project comes back, and then several hours after that to even start getting new work. Seti@Home classic workunits: 20,676 CPU time: 74,226 hours A proud member of the OFA (Old Farts Association) |
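Keith's "several hours" follows directly from the batch size; a rough sketch, where the 5-minute average gap between scheduler contacts is an assumption for illustration (real BOINC backoff intervals vary):

```python
# How long reporting a post-outage backlog takes at 100 tasks per contact.
backlog = 4375            # tasks finished during a 6-hour outage (from the earlier post)
per_rpc = 100             # tasks reported per scheduler contact
minutes_between = 5       # assumed average gap between contacts

rpcs_needed = -(-backlog // per_rpc)      # ceiling division
hours_to_report = rpcs_needed * minutes_between / 60

print(rpcs_needed)                  # 44 scheduler contacts
print(round(hours_to_report, 1))    # 3.7 hours just to report, before new work arrives
```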
Grant (SSSF) Joined: 19 Aug 99 Posts: 13720 Credit: 208,696,464 RAC: 304 |
Is there any chance that the number of wu's per GPU can be increased to like 200?? Probably when the existing servers are replaced with systems that can meet that load. I figure $500,000 would probably be enough (as a minimum) to meet the extra demand increasing the server side limits would create, $900,000+ for a bit of future proofing. Grant Darwin NT |
Bernie Vine Joined: 26 May 99 Posts: 9954 Credit: 103,452,613 RAC: 328 |
I thought it wasn't a server restriction, but a database one. There is a limit to the number of queries it can process irrespective of hardware. I might be wrong but that is what I seem to remember. |
rob smith Joined: 7 Mar 03 Posts: 22160 Credit: 416,307,556 RAC: 380 |
Most of the "top twenty" users don't complain; they just grit their teeth when the weekly outage comes along. It is only one or two who do complain. I very much doubt that the impact of Keith or Petri with their massive crunchers is as big as one thinks, as they both have strategies in place to control uploads and downloads to a "sensible number per cycle". Indeed, if one looks at the top twenty users, many of them have very large numbers of computers, not "mega crunchers", thus they might be considered "average" users. If one considers RAC as an indication of work done per day, then the top twenty computers (not users) represent a very small drop in the ocean of demand at the end of an outage, when everyone is returning results and requesting new work. The sheer number of users returning one or two tasks results in a very large number of "contact queries", which are among the hardest hitters on the servers; sending out work is remarkably "query efficient", and indeed sending a bundle of ten tasks to a single computer is more efficient through the whole process than sending the same ten tasks to ten individual computers. Splitting work off to another server dedicated to one group of users would insert another layer of query complexity into an already messy set of queries, and could well result in a reduction in overall performance, not the desired improvement. Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13720 Credit: 208,696,464 RAC: 304 |
I just feel it is somehow wrong that, out of 93,895 active users, just the top 20 cause inconvenience to all the others. How are they inconveniencing others? The fact is the project keeps asking for more processing power, more users. That will put a much greater load on things than exists now. The top 20 are just providing exactly what Seti has asked of everyone: provide more computing resources. Grant Darwin NT |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13720 Credit: 208,696,464 RAC: 304 |
I thought it wasn't a server restriction, but a database one. The problem is the load on the database, and that limit is due to the hardware it is running on. More powerful hardware would be able to meet that load. The biggest improvement would be suitable flash storage, but then the system would be limited by the existing hardware and internal network speeds, so those would also need to be upgraded to take full advantage of the benefits that all-flash storage would provide. Whenever you remove one bottleneck in a system, it's not long till you find the next one, and the one after that, and the one after that, etc. (think of a never-ending game of whack-a-mole). Grant Darwin NT |
Brent Norman Joined: 1 Dec 99 Posts: 2786 Credit: 685,657,289 RAC: 835 |
One thing that can easily be done, and helps a lot, is to simply run as many "shorty" tasks through as you can before maintenance. They take half the time to run, so going into the outage with your cache full of the longer-running tasks makes a big difference. I can usually make it through a normal maintenance with 1,500 GPU tasks, with rescheduling. |
juan BFP Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799 |
Hi All. As we all know, the 100 WU limit was put in place to try to keep the DB size at a manageable number, given the software/hardware used. That was an "emergency" solution, and at the time, with less powerful GPUs, it had little impact on the heavy users. But that has changed. The real cause of the extremely large DB size is not the "mega crunchers", as was suggested in this thread. Why? I will try to explain. Mega crunchers return their work in around 1 day, so in theory their work could be purged very fast and not impact the size of the DB. But the problem is the way BOINC works: it needs a wingmate to check the results, and there is the problem. Normally the wingmates of the "mega crunchers" are "common users" who take weeks to send back their work, some an entire month, and some even time out. The DB needs to keep track of all that for the entire period, and that makes its size grow bigger and bigger. So the problem with the size of the DB can't be attributed to the top 20 "mega cruncher" hosts. Someone could say I'm defending myself; I respect their opinions, but no, I'm just explaining the problem from a different POV. There is no "easy" solution to the DB size vs. WU cache size problem. More powerful servers, a different DB engine, more... more & more... But that all costs a lot more. IMHO the right thing to do, without spending 500K or more as suggested, is to limit the cache size of every host (mega or common) to its turnaround time, and to set the time-out of the WUs to a lower level. So a "mega cruncher" who does 5K WU per day could receive 5K WU, and a "slower user" who does 2 WU per day receives 2. Not exactly, but something like what GPUGrid does, by only sending a new WU when the host reports the old one crunched. That would keep everyone "happy" and help everyone get through the outages (which are very frequent these days, BTW) without problems.

Some could say nobody promised 24/7 work, and that is right, but why does a host that crunches 2-3 WU a day need a 100 WU cache? And why do we need these extremely long time-outs? More than a month to time out a WU, in days when even a cell phone can crunch a WU in a few days? Or a 10-day cache, a relic of the old telephone modem times? Just simple resource management could make all the users "happy". As the newer GPUs and high core count CPUs spread through the Setiverse, the problem with the 100 limit will increase proportionally. You no longer need a "mega cruncher" to have a problem with the 100 WU limit. It's simple math: a 2080 Ti can crunch a WU in less than 30 secs, so a 100 WU cache lasts less than an hour. And it's not only the GPUs; a new generation of CPUs has 40 or more cores, and in that case the 100 WUs will last for only a few hours. There are workarounds, like rescheduling, yes, but they are not for common users. These workarounds need users who know what they are doing, to avoid adding even more stress to the DB. My 0.02 |
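juan's proposal above amounts to sizing each host's cache from its measured daily throughput; a minimal sketch of that idea (the function name, floor, and cap are hypothetical illustrative choices, not a real BOINC scheduler API):

```python
def cache_limit(completed_last_day: int, floor: int = 10, cap: int = 5000) -> int:
    """Per-host WU cache sized to roughly one day of measured throughput.

    A host returning 5000 WU/day may hold ~5000; a host doing 2 WU/day
    holds only a small floor. All numbers are purely illustrative.
    """
    return max(floor, min(completed_last_day, cap))

print(cache_limit(5000))   # 5000 -> a mega-cruncher keeps a day of work
print(cache_limit(2))      # 10   -> a slow host gets the minimum floor
```

The cap plays the role today's flat 100 WU limit plays for everyone, but only fast hosts ever reach it.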
Sirius B Joined: 26 Dec 00 Posts: 24877 Credit: 3,081,182 RAC: 7 |
There are workarounds, like rescheduling, yes, but they are not for common users. These workarounds need "special" users who know what they are doing, to avoid adding even more stress to the DB. That's the most asinine piece of bovine excrement I've ever read! |
Brent Norman Joined: 1 Dec 99 Posts: 2786 Credit: 685,657,289 RAC: 835 |
You don't need high-end GPUs to run out of work. My lower end consumer 1050Ti runs out on most outages. |
juan BFP Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799 |
You don't need high-end GPUs to run out of work. Compare the 1050Ti (especially running the Linux special sauce builds) with the GPUs we had at the time the 100 WU limit was introduced: it is a powerful GPU. Back then, the available GPUs crunched a WU in about 15 minutes to half an hour. LOL. Basically any new GPU-based host has the same problem. That's why I said the problem with the 100 WU limit will increase very fast as more and more users add newer hardware. |
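The scaling described in this exchange can be put in numbers; a small sketch using the per-WU run times quoted in the thread (one WU at a time assumed, which real hosts often exceed):

```python
def cache_hours(wu_cache: int, seconds_per_wu: float) -> float:
    """How long a WU cache lasts, in hours, at a given per-WU run time."""
    return wu_cache * seconds_per_wu / 3600

print(cache_hours(100, 30))       # ~0.83 h: a 2080 Ti class GPU drains 100 WUs fast
print(cache_hours(100, 15 * 60))  # 25.0 h: a GPU of the era the limit was set
```

The same 100 WU cache that once covered a full day now covers less than an hour, which is the whole complaint in this thread.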
Sirius B Joined: 26 Dec 00 Posts: 24877 Credit: 3,081,182 RAC: 7 |
Basically any new GPU based host has the same problem. That's why I said the problem with the 100 WU limit will increase very fast as more and more users add newer hardware. So you propose a "special elite group" of crunchers to differentiate from "commoners"? Also, when was the last time you saw a single-processor computer crunching 2-3 WUs a day? |
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.