Panic Mode On (84) Server Problems?
Author | Message |
---|---|
Tom* Joined: 12 Aug 11 Posts: 127 Credit: 20,769,223 RAC: 9 |
Mike, Juan is a total proponent of freeing cores to process GPU tasks more efficiently. Thanks for the knowledge that angle ranges above 1.0 do less science; I suspected that. But your "Unknown ATI", likely due to version 6.12 BOINC, seems to confound BOINC with an APR of: SETI@home v7 (anonymous platform, ATI GPU), Average processing rate 2315.981175065. Just how do you get that fantastic APR? Juan takes just as long to process a VHAR as a normal angle range. Is this an issue between CUDA and OpenCL? If so, GREAT, Jason has something to work on. My GTX660 doesn't take quite 800 secs to process a VHAR, but it still takes 650 to 700 secs. PS - Did your RAC drop more than 30% (new length of tasks) when moving to V7? I don't remember V6 taking a credit dump on VHARs, but that might be due to no autocorrelations? |
Mike Joined: 17 Feb 01 Posts: 34257 Credit: 79,922,639 RAC: 80 |
The APR is a server issue. I didn't lose any of my RAC, but that's simply because I upgraded from an HD 5850 to an HD 7970 just a few days before the V7 release. Juan's times are not very good, especially on VHARs. This indicates not enough CPU time available, at least at WU start sometimes. It's definitely not the apps. I check hundreds of hosts a week. Sciman Steve's host is how it should work with nvidia. With each crime and every kindness we birth our future. |
juan BFP Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799 |
Sorry, I can't really understand what you're saying. You need to free a core to crunch an MB? I know that happens with AP, but I've never seen that on MB with Intel CPUs. This particular host has 2x690 and runs 2 WUs at a time on each GPU, and it already reaches 98-99% GPU utilization. Freeing more cores makes no difference; I tried freeing 1, 2 or even 3 and the times are actually almost the same. I don't do any additional heavy tasks on that host; without BOINC running, the actual workload of the host never goes above 5% at most. Both GPUs are EVGA Classified and I use no OC. So please, someone tell me what is wrong, since you say the times are not very good. Before V7, this particular host was #5~#7 in the SETI top computers (running cuda50 x41zc from Jason) with about 100-120K of RAC, crunching only MB, and I made no modifications to its configuration; I just took out one 670 to help with the hot air flow. SETI@home v7 (anonymous platform, nvidia GPU): Number of tasks completed 1757; Max tasks per day 3288; Number of tasks today 351; Consecutive valid tasks 879; Average processing rate 180.40354863605; Average turnaround time 0.15 days. Are those bad times? Check these 2 WUs with almost the same AR (0.4...): WU 1 - http://setiathome.berkeley.edu/result.php?resultid=3076351507 WU 2 - http://setiathome.berkeley.edu/result.php?resultid=3076460895 Time (GPU+CPU) and credit: WU 1 - 1,278.78+179.52, 97.61; WU 2 - 905.52+97.06, 103.22. WU 1 took more time to process and got less credit than WU 2, with almost the same AR. That all really bugs my mind... I need a beer... BTW I will PM Steve for some help; I'm sure he will give me a hand... |
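The comparison juan describes can be put into numbers. The following is a minimal sketch, not anything the project itself uses: it computes credit per second for the two work units quoted in the post. The dictionary and function names are hypothetical, and it assumes the first figure in each "GPU+CPU" pair is the run time in seconds.

```python
# Figures copied from juan's post; all names here are illustrative only.
work_units = {
    "WU 1": {"gpu_s": 1278.78, "cpu_s": 179.52, "credit": 97.61},
    "WU 2": {"gpu_s": 905.52, "cpu_s": 97.06, "credit": 103.22},
}


def credit_per_second(wu):
    """Credit granted per second of the first (GPU) time figure."""
    return wu["credit"] / wu["gpu_s"]


rates = {name: credit_per_second(wu) for name, wu in work_units.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.4f} credit/s")

# How much better WU 2 was paid per second than WU 1.
ratio = rates["WU 2"] / rates["WU 1"]
print(f"WU 2 / WU 1 credit per second: {ratio:.2f}")
```

On these figures WU 2 earned roughly 1.5 times the credit per second of WU 1 at almost the same angle range, which is the discrepancy juan is asking about.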
Lionel Joined: 25 Mar 00 Posts: 680 Credit: 563,640,304 RAC: 597 |
"Well, look at it from their side, why should they do it?" Bill, your comment indicates to me that you missed the last sentence: "As to cost, I'm quite sure they could get a bunch of 2nd or 3rd year computer science students to do the analysis, write new code and test it for the appropriate credit." I just want to make sure that you do understand what I am saying through this... |
Lionel Joined: 25 Mar 00 Posts: 680 Credit: 563,640,304 RAC: 597 |
"Well, look at it from their side, why should they do it?" Bill, in case you haven't seen this, I have copied it in from another thread. Look at the data. The picture is clear. There is an issue with the logic of determining v7 credits. I've had a look at the data around v7 and v6 WUs as well as AP WUs, and below is a quick observational analysis of my data and the impact of v7 in relation to granted credits. Under v6, I was roughly averaging 100 credits per Work Unit (WU). Under v7, it seems that the average is sitting around 75-80 credits per WU. In looking at run times and taking the outliers out, run time was around 600-660 seconds (10-11 minutes) per WU for v6, and appears to be around 800-1100 seconds (13+ to 18+ minutes) for a v7 WU. CPU time seems to have gone up by a factor of 2-3, from 50-60 seconds for v6 to 90-180 seconds for v7. So doing a quick Back of the Envelope on the run-time midpoints (10.5/15.83=0.66) shows that, from a WU processing/throughput capability, I can expect to do roughly 66% of the volume of WUs that I did before (for example, if I was doing 400 WUs per day under v6, I can now expect to do around 264 WUs per day under v7). Looking at the impact on credit gives 0.66*0.775 = 0.514 or 51.4%. In essence I can expect that daily credit for v7 will drop to circa 51% of what I was getting under v6. I am aware of the comments around "that the system needs time to settle down" and that "it thinks all the WUs coming back at the moment are easy, hence the low credit"; however, if the system continues to perform as is, then I can expect to see no change from the current trajectory. To test the assumption, I have looked at credit per day pre v7 and post v7 implementation (the data is contained below). The average daily credit prior to migration was 221,878. Following migration on 1st June, the average daily credit is showing as 102,376, which is circa 46.1% of the previous daily average under v6. 
Up to and including 22 June, I saw no real change in the daily total even though there was comment that the credit system had been tweaked. As part of my thinking, I decided to "benchmark" v7 credits against Astropulse credits. I started this on 23 June with the migration of a single box. Over the next day or two, I noticed what appeared to be an increase in daily totals, so I decided to migrate the other two boxes to AP only to see what the full impact would be. The assumption that I was running with was that pre v7, v6 and AP should have been fairly well benchmarked against each other and that the granting of credits would be in approximate equilibrium on a daily basis. If this was the case, then I would expect to see a rise in daily totals from circa 102k to circa 220k and for the daily total to remain around 220k. From 23 June onwards, totals increased on a daily basis towards the peak of 222k on 29 June. Unfortunately there has been a lack of AP WUs starting around 30 June, and so daily totals have declined over the last few days. My observation here (albeit based on a small amount of data) is that v6 and AP seemed to be fairly well benchmarked against each other. The issue is with v7. V7 is not well benchmarked against v6 WUs, nor is it well benchmarked against AP WUs. In fact, one can almost double their existing daily run rate through only processing AP WUs. 
Daily run rates:
2013.05.16 - 244,130
2013.05.17 - 220,168
2013.05.18 - 231,098
2013.05.19 - 226,353
2013.05.20 - 224,723
2013.05.21 - 210,477
2013.05.22 - 0
2013.05.23 - 431,485
2013.05.24 - 229,312
2013.05.25 - 228,767
2013.05.26 - 239,021
2013.05.27 - 231,271
2013.05.28 - 231,050
2013.05.29 - 0
2013.05.30 - 392,635
2013.05.31 - 209,556
Migration of all boxes to v7 on "day 1"
2013.06.01 - 123,072
2013.06.02 - 94,061
2013.06.03 - 102,333
2013.06.04 - 99,896
2013.06.05 - 65,653
2013.06.06 - 112,209
2013.06.07 - 102,538
2013.06.08 - 110,760
2013.06.09 - 89,757
2013.06.10 - 96,018
2013.06.11 - 111,653
2013.06.12 - 90,091
2013.06.13 - 119,848
2013.06.14 - 99,884
2013.06.15 - 104,561
2013.06.16 - 110,566
2013.06.17 - 110,603
2013.06.18 - 102,856
2013.06.19 - 85,268
2013.06.20 - 140,694
2013.06.21 - 70,247
2013.06.22 - 109,698
Migration of all boxes away from v7 to AP only (over period of ~4 days)
2013.06.23 - 126,873
2013.06.24 - 141,847
2013.06.25 - 169,625
2013.06.26 - 169,517
2013.06.27 - 183,693
2013.06.28 - 206,019
2013.06.29 - 222,126
2013.06.30 - 211,468 (Lack of AP WUs from here)
2013.07.01 - 182,394 |
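Lionel's arithmetic can be reproduced in a few lines. This is a minimal sketch under stated assumptions, not project code: the daily totals are copied from the post, the variable names are hypothetical, and the two zero days are kept in the average because each appears to be followed by a roughly double day.

```python
# Daily credit totals copied from Lionel's post (16 May - 31 May, v6).
pre_v7 = [
    244130, 220168, 231098, 226353, 224723, 210477, 0, 431485,
    229312, 228767, 239021, 231271, 231050, 0, 392635, 209556,
]
# Daily credit totals after migrating all boxes to v7 (1 June - 22 June).
post_v7 = [
    123072, 94061, 102333, 99896, 65653, 112209, 102538, 110760,
    89757, 96018, 111653, 90091, 119848, 99884, 104561, 110566,
    110603, 102856, 85268, 140694, 70247, 109698,
]

# Back-of-envelope prediction from the post, using run-time midpoints
# in minutes and the midpoint of 75-80 credits against ~100 under v6.
throughput_ratio = 10.5 / 15.83
credit_ratio = 77.5 / 100
predicted = throughput_ratio * credit_ratio

# Observed averages and their ratio.
pre_avg = sum(pre_v7) / len(pre_v7)
post_avg = sum(post_v7) / len(post_v7)
observed = post_avg / pre_avg

print(f"predicted v7/v6 daily credit: {predicted:.3f}")
print(f"average before v7: {pre_avg:,.0f}")
print(f"average after v7:  {post_avg:,.0f}")
print(f"observed v7/v6 daily credit:  {observed:.3f}")
```

With these inputs the prediction comes out near 51% and the observed ratio near 46%, matching the 221,878 and 102,376 averages quoted in the post.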
betreger Joined: 29 Jun 99 Posts: 11361 Credit: 29,581,041 RAC: 66 |
Mike, "Sciman Steve's host is how it should work with nvidia." is not fair. Steve is running a one-off, heavily OC'd system vs AFAIK "stock" hardware in a production environment. |
juan BFP Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799 |
"Mike, 'Sciman Steve's host is how it should work with nvidia.' is not fair. Steve is running a one-off, heavily OC'd system vs AFAIK 'stock' hardware in a production environment." LOL - That's what I say, I'm a poor, poor BFB cruncher who can only run "stock" hardware... now just #5 in the SETI top participants... with a great help from CreditNew, of course! |
bill Joined: 16 Jun 99 Posts: 861 Credit: 29,352,955 RAC: 0 |
Understood your point perfectly. What you seem to be missing is that anybody's work will be put to whatever use the 'powers that be' consider more productive. Increasing the RAC for those that are dissatisfied is so far down on the list that it's beyond comprehension. Sorry, but there it is. I wish you luck. |
Lionel Joined: 25 Mar 00 Posts: 680 Credit: 563,640,304 RAC: 597 |
"Understood your point perfectly. What you seem to..." Bill, I am not asking for development of a differential credit system where those that are dissatisfied get more credit than those that are not. This in itself would lead to further dissatisfaction. I have provided data and commentary that supports the proposition that there is a problem with the credit system. The numbers above are indicative of what many others have also seen. I am sure that if you looked at daily RAC, say for the top 1,000 users pre and post the introduction of v7, you would see a similar impact. This is not the grumbling of a single user, but the angst created by an ill-tested system among a reasonable proportion of users. What I find hard to accept is the lack of acknowledgement, investigation and ownership that Korpela, Anderson and possibly others are showing towards this issue. We are not solving for unknown particles here; all that needs to be done is "root cause analysis", and I am in a way staggered by what appears to be a fingers-in-the-ears "la la la" approach, hoping that the problem goes away. If you talk to them as you indicate, then you are in a prime position to effect an investigation and get them to be proactive, albeit somewhat late. |
bill Joined: 16 Jun 99 Posts: 861 Credit: 29,352,955 RAC: 0 |
"Understood your point perfectly. What you seem to..." I agree that your perception is definitely different from that of the folks in the lab. I suggest that you try e-mailing Dr. Anderson for his take on all this. His e-mail address is not that hard to find. I know I found it with very little looking. I'm sure a direct dialogue would be much better than relaying through a third party. That way any misconceptions are minimized, and I have no interest in playing pivot man in any such conversation. |
tbret Joined: 28 May 99 Posts: 3380 Credit: 296,162,071 RAC: 40 |
Ah, I see your problem, Lionel! You think "science" has something to do with observations of outcomes predicted by a theory. No. No. NO! Bad, Lionel! Bad, bad, Lionel! Science is about whatever our equations predict and whatever inadequacy we can find in our observations that dismisses the falsification of our theory. Nobody is going to listen to you! You are surrounded by Marsupials and we all know that people who live around Marsupials are, uh, upside-down and possibly backward. Living in a less-evolved world as you do, and standing on your head (from a planetary perspective -- well, at least the way globes are usually labeled) has obviously made all of the blood stagnate in your brain. I'm sorry to have to point out these things, but you obviously have no point. The credits that you are receiving are exactly the same as they were for v6 of the application. No, really. You think you see something different? That's because you're looking. Remember that "credits" could be waves or particles. When you aren't looking they are like waves... the waves you get when someone is leaving and receding into the distance. They are just as large as they ever were, but only appear smaller from your perspective, and since there is no preferred perspective, you are wrong. So, if you don't look - RAC goes bye-bye and you don't notice. But if you DO look, then you will see your RAC as particles, and with your frame of reference, it can appear that your credits are going back in time and choosing to present themselves as being small from the beginning, even though they would have been waving if you hadn't looked. Nobody has ever heard of a large particle. If they were large, they'd be balls, and nobody would call them particles. So, you have to realize that by looking you're making your balls into particles and they appear small. This is known as the duelality of credits. 
Juan thinks the whole thing is like a random number generator, but he only thinks that because of the Uncertainty Principle. You see, Juan can either have a high degree of confidence in the speed with which his RAC declines, OR he can have an idea of what his RAC is, but he cannot know both with any degree of certainty and then only from his perspective. So, at the time he looks at his credit he "fixes" that credit in time. The other person who owns the rig against which his work unit validated may or may not have looked at THEIR work unit. This is a double-blind, single-blind, dual double-slit experiment where the person who is blind might get the same credit for more or less work depending on whether Juan looks at his result while sitting on his up-quark. If Juan's is up, then we know the other validator's credit must be down. That's why when you are sitting in a blind you might hear something go quark-quark before you look. I plan on taking two identical laptops and copying a work unit onto each. I will then climb a water tower and put one on top and let the other crunch at ground-level. The one near the top should crunch the work unit in less time compared to the one on the ground. If it doesn't, I will throw it off of the top of the water tower and count until it hits the ground, which will tell me how fast my RAC fell. And no matter what I observe, someone will tell me that I didn't. The real problem with the RAC we're playing with is that it is too small. Why do I say that? Because if I have a RAC of 125,000 and I add a GTX 670 to it, I *expect* to do more to my RAC than to go to 145,000. My RAC feels like it got bigger by too little. If I have four GPUs and my RAC is 60,000 and I go buy another $400 GPU and my RAC only increases to 75,000 I haven't had a "psychologically rewarding experience." If it went from 600,000 to 750,000, well, that's a whole 150,000!!! Look at my RAC climb!! AND really low RAC makes it hard as hell to see a small improvement. 
You have a 10,000 RAC. You make a 3% improvement, you have a RAC of 10,300. BUT, if you have a RAC of 1,000,000 a 3% improvement is 30,000!!! Hey, HEY!! We all know it's easier to see 30,000 than 300 of something. Nobody cares. It doesn't change the "SCIENCE!" of shoveling sand into a bucket and sifting it and taking the bucket of non-sand to the scientist so he can see if the stuff in the bottom that isn't sand might be something interesting. And he hasn't said a word about whether we've brought him anything interesting or not in a long time. We're arguing about whether our bucket holds 5 or 40. 5 or 40 "what" doesn't matter. BUT, if we are being *paid* by the "x", then whether our bucket holds 5 of them or 40 of them does matter. All anyone around here wants is a consistent "x" and I'm tellin' you (preaching to the choir), as long as someone *believes* their formula is correct, all of the observation in the world isn't going to shake that belief. I also notice that people who have a small RAC are the most likely to say size doesn't matter. {yes, I am very tired and a little loopy} |
James Sotherden Joined: 16 May 99 Posts: 10436 Credit: 110,373,059 RAC: 54 |
I see that it's the ones with the bigger RAC that seem to cry the most. O woe is me, I went from #23 to #24. Give it a rest, Dr. A couldn't care less what anyone's RAC is. What with all the tear-jerking calls for a better way to calculate RAC, I wish he'd say screw it and go back to the way it was. Crunch one, you get a 1. That would be fair, right? Will we lose folks who crunch? Maybe, but we lose them all the time for various reasons. I see you are all still here. Old James |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13731 Credit: 208,696,464 RAC: 304 |
"I wish he'd say screw it and go back to the way it was. Crunch one, you get a 1. That would be fair, right?" Nope. Hence the Cobblestone. Just a shame CreditNew borked it. Grant Darwin NT |
Mike Joined: 17 Feb 01 Posts: 34257 Credit: 79,922,639 RAC: 80 |
Juan, first of all, no offense meant. Look at this unit, for example: http://setiathome.berkeley.edu/result.php?resultid=3076512395 This is a VHAR unit and it took 200 seconds longer than other VHARs on the same host. That indicates the CPU did not have enough free cycles at startup. 4 690s are physically 8 680s, so in theory you'd need 8 free cores to feed the GPUs properly. At least 4 cores. As far as I can see, your host with 3 670s has only 5K less RAC, which shows that one is perfect. One 690 should produce more than 5K. All I want to say is that this has nothing to do with CreditNew. Each host with 4+ cards has the same issue. And I'm not talking about overclocked hardware. With each crime and every kindness we birth our future. |
juan BFP Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799 |
"First of all no offense meant." That I clearly understand, and it is the same from my side. My question was only aimed at learning how to optimize the host. Really, the numbers bug my mind. Even with the slowness problem (I will run some tests during this week to see what I can do), the difference in credit on the already processed WUs (on the same host) seems totally weird; more processing time getting less credit is what I can't understand. The 3x670 host is actually a 690+670 host. A 670 is faster than 1/2 of a 690, so the RAC differences are not so big. The APR on this host is 193.14636426995, about 10% faster than the 2x690 host. Later (the host is at a remote location) I will change the configuration on the 2x690 host and free 4 cores to feed the GPUs; let's see what I get. But that raises the question: why did it work fine before with V6 and not with V7? |
Mike Joined: 17 Feb 01 Posts: 34257 Credit: 79,922,639 RAC: 80 |
"First of all no offense meant." It's not a does-work or does-not-work situation. It's an optimizing thing. It works, but not perfectly. V7 uses more CPU cycles than V6 does. So if 4 GPU instances start at the same time, there is simply a shortfall in feeding the GPUs. But this is a long-term thing. I also freed one more CPU core since the V7 release and, as you can see, I don't suffer anything. You need patience, and to try for weeks, not days. With each crime and every kindness we birth our future. |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13731 Credit: 208,696,464 RAC: 304 |
"You need patience and try for weeks not days." Almost (but not quite) 2 months now. RAC is still falling. Grant Darwin NT |
juan BFP Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799 |
"So if 4 GPU instances start at the same time there is simply a lack in feeding the GPUs." That is very unlikely. This host runs 24/7, so if that is happening it is surely very rare, and it must be a hell of a coincidence that we looked at exactly that WU. "I also freed one CPU core more since V7 release and as you can see I don't suffer anything." I can understand that, but you don't have 2 very fast GPUs in the same host, so it is easy not to suffer too much. The faster your GPUs, the bigger the fall; that is what I see in the field, just look at the posts. Those who run mostly CPU work suffer a lot less than those of us who depend on GPU work. "You need patience and try for weeks not days." I know that, and totally agree. To run SETI, patience is always needed: too many problems and too little support/feedback from the lab. The only thing that actually helps is the volunteers like you who try to help the community. Thanks for your help, let's see what I get. |
Mike Joined: 17 Feb 01 Posts: 34257 Credit: 79,922,639 RAC: 80 |
Like you said, I only have 1 GPU, and it happens at least 4 times a day that 3 instances start at the same time. I only noticed the problem because of those occurrences. With each crime and every kindness we birth our future. |
juan BFP Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799 |
Except when I reboot the host (something very rare, normally just when Windows or nvidia pushes some update), I don't really see more than 2 WUs starting at any given time, and it normally takes a few seconds to load the data from the HD to the GPU to crunch. That could be because I use an Intel CPU, different from the AMD you use, and my CPU load for other tasks (besides the BOINC tasks themselves) is very, very low (normally less than 2%, 5% tops); I could say this is almost a crunching-only host. But I will try to free 2 more cores (actually 2 cores are already free) later, ASAP, when I arrive at the remote location. I don't like to make any changes to the configuration remotely; you know, sh*** happens... and our friend Murphy is always around. |