Can not get ANY Cache built up !
Floyd | Joined: 19 May 11 | Posts: 524 | Credit: 1,870,625 | RAC: 0
I'm just getting WUs to run as I send them in... no extra ones at all. I have to watch the time go down and hit Update to get any then. Is everyone having this same problem? If not, what can I do to get (at least a day's) cache going again? Someone mentioned "flops"? Upload is empty and my DCF is: Task duration correction factor 0.202202. Average turnaround time 0.16 days.
SETI@home Enhanced (anonymous platform, CPU): Number of tasks completed 1413, Max tasks per day 168, Number of tasks today 16, Consecutive valid tasks 71, Average processing rate 19.683215936176, Average turnaround time 1.76 days.
SETI@home Enhanced (anonymous platform, nvidia GPU): Number of tasks completed 2835, Max tasks per day 1447, Number of tasks today 36, Consecutive valid tasks 1349, Average processing rate 154.05440154138, Average turnaround time 0.28 days.
Khangollo | Joined: 1 Aug 00 | Posts: 245 | Credit: 36,410,524 | RAC: 0
From what I'm observing, ~19 hours of WUs is the maximum cache now, at least on my machines. I will not get any more WUs than that (2-3 shorties in a batch), and if a machine tries (or has more WUs from before that crippling was introduced) it gets 'This computer has reached a limit on tasks in progress'. My time estimates are correct and DCF is around 1 (due to <flops> entries). This doesn't work well together with daily project outages and ALL new workunits being "shorties", which makes faster machines contact the servers literally every 5 minutes.
Floyd | Joined: 19 May 11 | Posts: 524 | Credit: 1,870,625 | RAC: 0
From what I'm observing, ~19 hours of WUs is the maximum cache now, at least on my machines.
I wish I could get a 19-hour cache... I have NOTHING in cache... :-(
Khangollo | Joined: 1 Aug 00 | Posts: 245 | Credit: 36,410,524 | RAC: 0
Try adding <flops> values in your app_info.xml, as described in http://setiathome.berkeley.edu/forum_thread.php?id=62293#1055179; maybe that'll help. Also, I'm correcting myself on that 19-hour cache: the CPU has a <5 hour cache; I forgot to divide it by 4 for 4 cores... I was able to download over 400 WUs for the GPU for a huge 19-hour cache, but for the CPU it doesn't let me download more than 40. So apparently it's not a fixed 25/core as thought before. Hopefully this is only temporary (?).
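For anyone unsure what that looks like: a minimal sketch of a <flops> entry inside an app_info.xml <app_version> block. The structure follows the standard BOINC anonymous-platform format, but the app name, file name, version number and flops value here are illustrative only; one approach people use is the host's APR figure (in GFLOPS, from the host details page) multiplied by 1e9.

```xml
<!-- Fragment of app_info.xml (illustrative names and values, not a
     drop-in file). The <flops> element tells the client how fast this
     app version is, so runtime estimates no longer rely on DCF alone. -->
<app_version>
  <app_name>setiathome_enhanced</app_name>
  <version_num>610</version_num>
  <!-- One approach: APR from the host details page (in GFLOPS) x 1e9.
       e.g. an APR of 154.05 gives roughly 1.54e11. -->
  <flops>1.54e11</flops>
  <file_ref>
    <file_name>MB_6.10_CUDA.exe</file_name>
    <main_program/>
  </file_ref>
</app_version>
```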
Floyd | Joined: 19 May 11 | Posts: 524 | Credit: 1,870,625 | RAC: 0
Can somebody explain DCF to me? I was thinking that the lower the number, the better? Or is being right at 1.0, instead of 0.202202, better? Being a noob about all this can be frustrating sometimes...
LadyL | Joined: 14 Sep 11 | Posts: 1679 | Credit: 5,230,097 | RAC: 0
Can somebody explain DCF to me?
DCF is the Duration Correction Factor. It is used locally on your machine to get good estimates of task runtimes. Depending on the project, there can be two sorts of values:

Old system (projects which haven't adopted 'CreditNew'): The project roughly estimates how long a WU will take. When starting out, you receive a WU which is usually overestimated. It gets crunched, the system notices it was faster than expected, and adapts by lowering the DCF (it may also go the other way round). You get the next one, DCF drops again, until the estimated time fits the actual runtime quite well. The combined estimates are used to determine work fetch, so if they are too high, you will have less cache than you set. Flip side: if runtimes are heavily underestimated, your cache will be far, far higher than you wanted. DCF can be any number, as it's linked to the ratio of your machine's speed to the speed of the project's reference machine. NB this does not work very well if applications differ greatly in speed, such as GPU/CPU: DCF sawtooths between high CPU values and low GPU values.

New system (SETI, and projects that have 'upgraded' to CreditNew): The server keeps a record of how fast you have crunched a particular unit with a particular application, and once 10 tasks have validated it uses this to calculate APR (average processing rate, found under application details on the host details page). APR is used to calculate the estimated runtime, and DCF stays close to 1. NB the server does not distinguish beyond 'anonymous platform, CPU/NV/ATI'; if you change application and the runtime changes significantly, APR will be slow to follow and DCF will move.

Until last week, estimates here were APR-driven and DCF on most hosts should have been around 1. Then capping happened. Now tasks (on hosts running anonymous platform) arrive overestimated and the client tries to compensate by lowering DCF (as per the 'old system').

Regarding flops and cache: if present, flops will be used locally to calculate estimates. With APR estimation working as it should, these are not really needed. As it's currently not working, inserting <flops> into app_info.xml helps to get better estimates and consequently better work fetch. Note however that there is a limit in place on 'work in progress'; this limit may be smaller than your cache setting.

Also, there is a problem with the router, leading to trouble connecting to the servers from some parts of the globe. If your reports and up-/downloads are still working (albeit slowly), you are not affected.
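The difference between the two estimating schemes can be sketched roughly like this (a simplification with made-up function names, not actual BOINC client code):

```python
def old_estimate(task_flops, benchmark_flops, dcf):
    """Old system: a benchmark-based guess, scaled by the locally
    maintained duration correction factor."""
    return task_flops / benchmark_flops * dcf

def new_estimate(task_flops, apr_gflops):
    """New system (CreditNew): the server-side APR drives the
    estimate, so DCF can stay close to 1."""
    return task_flops / (apr_gflops * 1e9)

# A hypothetical 30 TFLOP workunit on a host with APR = 154.05 GFLOPS
# (the GPU figure quoted earlier in this thread): roughly 195 seconds.
estimate = new_estimate(30e12, 154.05)
```

The point of the new scheme is that the APR is tracked per application on the server, so a fast GPU app and a slow CPU app can each get sensible estimates, where a single shared DCF cannot satisfy both at once.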
W-K 666 | Joined: 18 May 99 | Posts: 19048 | Credit: 40,757,560 | RAC: 67
Another thing about DCF, as Richard found out: there is a safety factor built in; if the DCF goes too low, it restricts d/loads. I don't know the value of 'too low'.
rob smith | Joined: 7 Mar 03 | Posts: 22189 | Credit: 416,307,556 | RAC: 380
DCF appears to be very poor at predicting the run times of most jobs. On one of my hosts I currently have a load of "shorties" that will take about 4 minutes on the GPU; these are predicted as going to take about 1 hour (better than yesterday's wild guess of a couple of hours). At the same time, a few shorties for the CPU, which will take between two and three hours, are also predicting about 1 hour. I've just received what I assume to be a handful of longer tasks for the GPU, with initial estimates of about 4 hours, against my expectation of about 20-30 minutes. This is a real case of breaking something that was working fairly well in order to achieve a steadier download rate. (I've also had a crop of very short run tasks, which I assume were caused by "high fault rate" tasks. This is not an uncommon feature of the data, just "one of those things" that has always happened, and always will.)
Bob Smith, Member of Seti PIPPS (Pluto is a Planet Protest Society), Somewhere in the (un)known Universe?
Richard Haselgrove | Joined: 4 Jul 99 | Posts: 14650 | Credit: 200,643,578 | RAC: 874
Another thing about DCF, as Richard found out: there is a safety factor built in; if the DCF goes too low, it restricts d/loads.
0.02. Above that is fine; below it, downloads are restricted. I don't know what happens if you end up with exactly 0.020000.....
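That cut-off behaviour can be pictured as a simple guard (a sketch of what Richard describes, not the real client source; since he leaves the exact-0.02 case open, this sketch arbitrarily treats the boundary as unrestricted):

```python
DCF_FLOOR = 0.02  # the threshold Richard quotes

def downloads_restricted(dcf: float) -> bool:
    """Richard's observation: above 0.02 is fine, below it
    the client restricts downloads. Behaviour at exactly
    0.02 is unknown; this sketch assumes unrestricted."""
    return dcf < DCF_FLOOR
```

(With Floyd's DCF of 0.202202 downloads would be unrestricted; a DCF like Dave's 0.015777, reported later in this thread, would fall below the floor.)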
W-K 666 | Joined: 18 May 99 | Posts: 19048 | Credit: 40,757,560 | RAC: 67
DCF appears to be very poor at predicting the run times of most jobs - on one of my hosts currently I have a load of "shorties" that will take about 4 minutes on the GPU, these are predicted as going to take about 1 hour (better than yesterday's wild guess of a couple of hours), and at the same time a few shorties for the CPU which will take between two and three hours, these are also predicting about 1 hour.
DCF is only a prediction modifier. Also, there is only one DCF value per project, not one per application. Therefore it is only at a correct value for one of the applications once in a blue moon (well, it might be more frequent than that, but not much), and never for the others. Dr. A apparently said he couldn't do individual per-application DCFs, but Jason, the one from Lunatics, proved it could be done. Dr. A's solution to the problem is credit_new, which has brought back the variable credits problem and is not doing too well with APR at predicting estimated processing time yet, though as we can see it seems to be work (back) in progress.
Richard Haselgrove | Joined: 4 Jul 99 | Posts: 14650 | Credit: 200,643,578 | RAC: 874
DCF appears to be very poor at predicting the run times of most jobs - on one of my hosts currently I have a load of "shorties" that will take about 4 minutes on the GPU, these are predicted as going to take about 1 hour (better than yesterday's wild guess of a couple of hours), and at the same time a few shorties for the CPU which will take between two and three hours, these are also predicting about 1 hour.
Exactly so. Something got (accidentally) broken while trying to fix something else. We're busily trying to work with the project staff to ensure that they know what's currently broken, and what else they need to avoid breaking in the course of fixing the current breakage. It won't be fixed instantaneously, especially while all the other little fires (like the misbehaving HE router) need to be fought at the same time. But in the medium term, we ought to get back to the situation as it was a couple of weeks ago, with APR sorting out the large-scale differences between the runtimes of the different applications, and DCF merely fluctuating slightly above and below 1.0000 with the minor variations between tasks.
rob smith | Joined: 7 Mar 03 | Posts: 22189 | Credit: 416,307,556 | RAC: 380
DCF is a farce, a very sad farce. I watched as a CPU task completed. Before completion the GPU tasks were sitting at about a 45-minute guesstimate (compared to 4 minutes run time). The CPU task finished in 3 hours, and "instantly" the guesstimate for the GPU tasks went to over three hours; under-damped or what? Now, if DCF is being used in any way other than to guess at how many tasks are needed by a given host, then it's really going to cause some major problems in the credits system, and should be consigned to the scrap bin of bad ideas.
Bob Smith, Member of Seti PIPPS (Pluto is a Planet Protest Society), Somewhere in the (un)known Universe?
Richard Haselgrove | Joined: 4 Jul 99 | Posts: 14650 | Credit: 200,643,578 | RAC: 874
DCF is a farce, a very sad farce.
DCF has been around since BOINC v4.70 (according to the unofficial BOINC Wiki), and was a very good idea in its time. Unfortunately, its time was an era when computers had CPUs only (not general-purpose computing GPUs), and when BOINC projects only had one application. There are still computers, and projects, like that around: for them, DCF is perfectly adequate. For projects like this one, using GPUs and with multiple (AP/MB) applications, DCF on its own was no longer enough, and the replacement, introduced some 15 months ago, has been more-or-less working ever since (the way that all good ideas work: so unobtrusively that most of the time you don't notice it's there). Until it breaks. Which it did a couple of weeks ago. You're seeing the results of the breakage of the 'new' way, and a reminder of why the 'old' way was inadequate for us. Yes, it's a pain, and yes, it needs fixing. But not by throwing out the old, tried-and-tested DCF. That would involve getting everybody who saw the problem to download and install a new version of the BOINC client (which doesn't exist yet), and getting every project that any one of those users attaches to to upgrade to the latest version of the BOINC server software, which I, for one, would be very hesitant about suggesting any project administrator should do. Sorry, I'm afraid we're in the realm of 'fix it', not 'throw it away and buy a new one'. As so often, the only advice is to have patience.
rob smith | Joined: 7 Mar 03 | Posts: 22189 | Credit: 416,307,556 | RAC: 380
One has to ask: what changed a couple of weeks ago to cause this change in behaviour? I didn't update anything at my end, so??
Bob Smith, Member of Seti PIPPS (Pluto is a Planet Protest Society), Somewhere in the (un)known Universe?
Richard Haselgrove | Joined: 4 Jul 99 | Posts: 14650 | Credit: 200,643,578 | RAC: 874
One has to ask, what changed a couple of weeks ago to cause this change in behaviour - I didn't update anything at my end, so??
The servers were changed - nothing at our end. If you really want to know, read Shorties estimate up from three minutes to six hours after today's outage! (That was the outage of 13 September. Was it really only nine days ago?)
W-K 666 | Joined: 18 May 99 | Posts: 19048 | Credit: 40,757,560 | RAC: 67
DCF is a farce, a very sad farce.
DCF was introduced to modify the estimated processing time. The initial estimate was calculated from the estimated Flops for the task and the computer's benchmark figures. The benchmark figures were never an accurate measure of the computer system and could vary greatly on the same computer under different operating systems. Because there is only one DCF per project, on a project that runs more than one application, like Seti, the DCF will vary depending on which application processed the last task, and on Seti, for MB tasks, on the angle_range of the task. This usually means a graph of DCF against time looks like a saw blade, as it is increased immediately when a task takes longer than predicted and decreased slowly when a task finishes quicker than predicted. When the credit_new system gets sorted out, the APR (of which there is one per application) will be used in the predicted processing time calculation, which will render the DCF effectively obsolete, as it should then stay at the default value of 1.00.
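The asymmetric update behind that saw-blade shape (jump up immediately, drift down slowly) can be sketched as follows; this is an illustration of the behaviour described above, not the client's actual code, and the 1% drift rate is an assumed value:

```python
def update_dcf(dcf, estimated_secs, actual_secs, drift=0.01):
    """Sawtooth DCF update: if a task ran longer than the estimate
    implies, adopt the new ratio at once; if it ran shorter, move
    only a small fraction of the way toward it (drift is assumed)."""
    ratio = actual_secs / estimated_secs
    if ratio > dcf:
        return ratio                              # immediate increase
    return (1 - drift) * dcf + drift * ratio      # slow decrease

dcf = 1.0
# A GPU shorty finishing far faster than estimated barely moves DCF...
dcf = update_dcf(dcf, estimated_secs=3600, actual_secs=240)
# ...but one slow CPU task snaps it straight back up.
dcf = update_dcf(dcf, estimated_secs=3600, actual_secs=10800)
```

Run the fast-task update many times in a row and DCF decays gradually; a single slow task undoes all of it at once, which is exactly the spike pattern a mixed GPU/CPU host sees.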
W-K 666 | Joined: 18 May 99 | Posts: 19048 | Credit: 40,757,560 | RAC: 67
One has to ask, what changed a couple of weeks ago to cause this change in behaviour - I didn't update anything at my end, so??
I reported a problem that needed fixing for new or re-attached computers. And, as others noticed, it would have been a major problem when the new Seti application being tested over at SetiBeta is released.
Floyd | Joined: 19 May 11 | Posts: 524 | Credit: 1,870,625 | RAC: 0
Well... LOL... I finally got an AP WU and I was pleased; I figured about 15 hours of work there... after 21 seconds it was gone... LOL... can't win for losing, it seems... I guess we'll just have to hang in there until this all gets worked out. Then maybe we can get back to full speed crunching, without babysitting our computers 24/7 trying to get them uploaded and downloaded... Nothing is good all the time, not even ice cream... :-)
Dave Lewis | Joined: 12 Apr 99 | Posts: 34 | Credit: 53,432,603 | RAC: 108
Another thing about DCF, as Richard found out, there is a safety factor built in, if the DCF goes too low, then it restricts d/loads.
Richard, I just checked and my DCF is 0.015777 as I write this, so I'm below the threshold you describe. About 12 hours ago my cache had built up to around 145 WUs. This morning the cache was empty, and the process seems to be: get 1 WU, process it, upload the processed WU, wait for the next WU to download. Is modifying app_info.xml by adding <flops> values the way to go to effect a change in DCF?
W-K 666 | Joined: 18 May 99 | Posts: 19048 | Credit: 40,757,560 | RAC: 67
Well... LOL... I finally got an AP WU and I was pleased; I figured about 15 hours of work there... after 21 seconds it was gone... LOL... can't win for losing, it seems...
50% of the AP tasks I d/loaded today did the same: d/loaded, processed, uploaded, reported, and validated, all within an hour.
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.