Message boards : Number crunching : AstroPulse Work Fetch Thread
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
1) I don't see how a 10-day cache setting could prevent the GPU from getting work if no bugs are involved. If the project adds additional limitations, like the current 100-per-device limit, the cache should fill up to 100 tasks and be kept near 100 tasks at all times - that's what a 10/10 setting should imply. And there should certainly be no complications regarding receiving CPU + GPU work in a single request. Exactly. The question originally arose as a potential case of the server 'blocking' or 'refusing' GPU requests when coupled with CPU requests. I think we've checked that out thoroughly, and seen no evidence of a bug of that nature. Any imbalance in work allocation is due to problems of supply and demand - there isn't enough AP work in existence to fill every request on every occasion. On that basis, I see no issue that I need or want to report to the project team (or to BOINC), and I'll bow out of the conversation here.
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
This would be Hilarious if it wasn't so depressing. Let me see if I have this straight: I took the time and trouble to run a test, as requested, and now I'm catching flak over it? People are arguing over cache settings? I have a stalker doing nothing but deliberately insulting me? You can't make this stuff up. Have at it, boys - I hope you appreciate the test. If you want another, I'll run it, then step back and observe. *cheers*
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
I think we've checked that out thoroughly, and seen no evidence of a bug of that nature. Any imbalance in work allocation is due to problems of supply and demand - there isn't enough AP work in existence to fill every request on every occasion. On that basis, I see no issue that I need or want to report to the project team (or to BOINC) Good if so. That means the server is able to fulfill CPU and GPU work requests in a single communication transaction, provided there is work available to send for both devices, right? But it also means the client should continually (every 5 minutes, currently) ask the server whenever the GPU work queue is below 100 tasks. Right? Or are there additional backoffs I didn't account for?
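Raistmer's model of the client's fetch decision can be sketched roughly like this. This is a simplified illustration, not actual BOINC client code; the 100-task limit and the ~5-minute server delay are the values discussed in the thread, and the function name is made up for the sketch:

```python
import time

TASK_LIMIT_PER_DEVICE = 100   # server-side per-device limit discussed in the thread
SERVER_MIN_DELAY = 300        # seconds; the ~5-minute request interval mentioned above

def should_request_gpu_work(gpu_queue_len, last_request_time, backoff_until, now=None):
    """Decide whether the client may ask for more GPU work right now.

    A toy model of the behaviour Raistmer describes: ask whenever the
    GPU queue is below the limit, unless a delay or backoff is running.
    """
    now = time.time() if now is None else now
    if gpu_queue_len >= TASK_LIMIT_PER_DEVICE:
        return False                  # cache already full
    if now - last_request_time < SERVER_MIN_DELAY:
        return False                  # respect the server-imposed delay
    if now < backoff_until:
        return False                  # client-side resource backoff still running
    return True

print(should_request_gpu_work(42, 0, 0, now=1000))   # True: below limit, no backoffs
print(should_request_gpu_work(100, 0, 0, now=1000))  # False: cache at the limit
```

The point of the sketch is the question Raistmer raises: whether anything besides the queue length and the server delay (i.e., the third condition) can keep the client from asking.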
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Well, I see some counter-productive behavior in the provided log:

Wed Oct 29 21:56:33 2014 | SETI@home | [sched_op] estimated total ATI task duration: 0 seconds

That is, the client will not ask for GPU work for LONGER than the server-imposed delay. That's not wise behavior when the device sits idle. It can be allowed only if the GPU has enough work in its cache to sustain a few more empty work fetches. Is that the case?
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
I see no issue that I need or want to report to the project team (or to BOINC), and I'll bow out of the conversation here. Could you please point to the place in the provided logs that made you think so?
W-K 666 Send message Joined: 18 May 99 Posts: 19014 Credit: 40,757,560 RAC: 67 |
This would be Hilarious if it wasn't so depressing. I presume you are talking about me. Well, go back and read every post about cache settings - it has been discussed many times. And you will find that trying to set a large cache, especially when the project cannot fairly grant you your wishes, is doomed to failure. I have tried to show you what the better settings are, and you choose to reject them. Be it on your own head then; you will be the one waiting for your GPUs to start getting work. We all know BOINC has its flaws, and wish it were re-coded each time there is a technology advance. That is not going to happen. From personal experience I know that when it was first released, BOINC was designed for single-CPU personal computers. I was running a dual P3, before P4 HT computers, and the designer struggled then to fix flaws when one wanted to run two projects at the same time. The only way then to run with the best efficiency was with ZERO CACHE. And don't try to tell me how to do testing; as you are running at Beta also, you should recognise the name. So if you are going to do testing, put some thought in and don't try to run BOINC beyond its limits, or you will run into problems that are probably not going to be fixed in the foreseeable future.
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
This would be Hilarious if it wasn't so depressing. You still don't understand that I run a Mac, do you? Do you realize the Only App a Mac with ATI cards can run is AP? Do you realize APs are Hit and Miss? That means if you want to keep it running as long as possible, you Fill the Cache when possible. I see you have two Windows 7 machines; ever try to keep a Mac filled with the only task the cards will run? Please, enough with the cache talk. I've run this machine for a couple of months and didn't bring up the subject until other users started asking why they were only receiving CPU AP tasks. Apparently very few people around here could give them the Correct answer. Well, I just did. The Server changed to filling the AP CPU cache First. How many people in this thread gave them the correct answer? Seems most of the people here didn't have a clue.
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
I think we've checked that out thoroughly, and seen no evidence of a bug of that nature. Any imbalance in work allocation is due to problems of supply and demand - there isn't enough AP work in existence to fill every request on every occasion. On that basis, I see no issue that I need or want to report to the project team (or to BOINC)

Wed Oct 29 21:56:33 2014 | SETI@home | [sched_op] estimated total ATI task duration: 0 seconds

That is the standard BOINC client backoff that applies across all applications, all projects. You can see it lower down in the log, as well:

Wed Oct 29 21:56:38 2014 | | [work_fetch] --- state for ATI ---

"inc" (for increment) is the nominal baseline for the resource backoff: the actual value is given a randomisation tweak (up or down) to avoid hosts getting into lockstep and always hitting the server at the same time. This backoff prevents the client requesting work for a while, when there is no work to be had. It's one of the reasons why I tried so hard in the original thread to get TBar to be clear and precise in his use of language, and to post enhanced debug logs relevant to his concerns. Now that the debug data is available for all to see, we can have more constructive conversations like this. The 'resource backoff' is set - by the client, please remember that - whenever the server replies 'no tasks' to a request for work for that resource. So yes, it's very relevant to TBar's original concern - but let's please make an effort to understand how and why BOINC works the way it does, before asking for changes to suit a particular special case. There are projects out there - LHC is a good case in point - which have batches of workunits available sporadically, but often no work available at all. Even LHC's BOINC administrator doesn't know when work might be available - the task batches are submitted remotely by scientists working on the LHC accelerator upgrade.
They don't need simulations every day, but when they have work to be done, they'd like it done quickly. This log for LHC shows a lot of the features of the backoff system in action:

30/10/2014 12:09:23 | | [work_fetch] --- state for Intel GPU ---

LHC is a CPU-only project, so it never issues iGPU tasks - that's why the backoff increment has reached the maximum value of 86,400 seconds (24 hours). This request was initiated because of the CPU shortfall - but note that the BOINC client has piggy-backed an iGPU request on the RPC while the channel is open, just in case a new application has been deployed since we last asked. As it happens, there are no CPU tasks this morning either, so the client has started the CPU backoff at the minimum 600 seconds (randomised slightly downwards, in this case). 'inc' will be doubled at each successive work request failure, until the maximum of 86,400 is reached. LHC scientists want the work processed quickly once it becomes available, so they have set a server backoff of just 6 seconds. If there were no other limiting factor in place, clients which were idle between batches of work could hammer the server 10,000 times a day with fruitless (and pointless) requests for work. I don't know the details of the remote job submission process, but when the scientists submit a new batch of work, I doubt it all appears instantaneously on the server as available for download. Just like our own splitters, it'll take a finite amount of time for the full set of tasks to appear. Having all active hosts contact the server at random intervals of up to 24 hours apart practically guarantees that the first few tasks will be collected within seconds of the batch becoming available, but the server isn't crashed by a massive DDOS attack before the batch preparation is complete. And so on.
LHC's batch system requires a different systems analysis approach from SETI's steady-state (for MB, anyway) non-urgent processing, and the two servers have been configured differently by the projects' respective administration teams. Those tools (and there are more that I haven't touched on in this very abbreviated summary) are used by server administrators in the light of their experience and knowledge of their own particular project. I think you'll find that requests for turning and tweaking particular knobs are better received if you show some understanding of the complex and difficult sets of interrelationships that the server team are managing. |
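The resource backoff Richard describes can be sketched as follows. The 600-second minimum, 86,400-second maximum, and doubling-per-failure come from the post; the size of the randomisation spread is an assumption for illustration, not the actual BOINC constant:

```python
import random

MIN_BACKOFF = 600      # seconds - the minimum mentioned in the post
MAX_BACKOFF = 86400    # seconds - the 24-hour cap mentioned in the post

def next_backoff(inc):
    """Double the nominal backoff 'inc' after a failed work request, capped at 24 h."""
    return min(inc * 2, MAX_BACKOFF)

def randomized_delay(inc, spread=0.5):
    """Apply a random tweak (up or down) so hosts don't hit the server in lockstep.
    The +/-50% spread is an assumption, not BOINC's actual randomisation factor."""
    return inc * random.uniform(1 - spread, 1 + spread)

# Simulate successive 'no tasks' replies for one resource:
inc = MIN_BACKOFF
for _ in range(9):
    inc = next_backoff(inc)
print(inc)  # reaches the 86400 cap
```

This shows why an idle resource on a workless project (like LHC's never-used iGPU) ends up asking only about once a day, while a host that just got work resets back to frequent requests.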
W-K 666 Send message Joined: 18 May 99 Posts: 19014 Credit: 40,757,560 RAC: 67 |
The Server changed to filling the AP CPU cache First. In my experience BOINC has always filled the CPU cache first. GPUs were added later, and are therefore probably 2nd in the queue; I doubt they even thought to check which part of the computer was the hungriest. If I missed anything, it is because, to put it bluntly, I don't come to Number Crunching that often these days - only to see if other people notice server issues, or, like at the moment, when new apps are introduced. WHY? Because I am fed up with repeating the same old advice, which some people choose to ignore because they think they know better, every few months. P.S. If you do find anything and submit a report to Dr A, his response will be, as always, that A) he doesn't expect you to keep a large cache, B) you should be running multiple projects and applications, and C) you should only be running BOINC/projects on a computer that has spare cycles when it would normally be switched on for business or pleasure - not one running specifically for BOINC.
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
The Server changed to filling the AP CPU cache First. Yes, I've heard about DA before; it's sad. Well, I've been running this Mac from download to download since the Beta ATI App was released almost 17 months ago. Believe me, the server used to fill the GPU cache first, and most of the resends went to the GPUs. I used to have to jump through hoops to get a couple of CPU tasks to run, because it would sometimes be a day before the GPU cache was full and it would start downloading CPU tasks. Now it's the other way around. That's alright - 17 months of trying to keep my Mac running has taught me how to compensate in a number of ways. But others haven't gone through that particular school.
HAL9000 Send message Joined: 11 Sep 99 Posts: 6534 Credit: 196,805,888 RAC: 57 |
Yesterday, after AP work started going out, I checked my two i5-4670K systems with BOINC 7.2.42 to see how they were getting work. Most requests would only send 3 or 4 tasks at a time. Sometimes they would be all CPU, sometimes all ATI, but most of the time it would be a mix of CPU & ATI. I didn't have any extra debug flags set, as this was more of a "let's see what BOINC does with how I'm set up now" test. When I go home for lunch I'll set up a new BOINC instance with debug flags set to see how it operates from the perspective of a new host. SETI@home classic workunits: 93,865 CPU time: 863,447 hours Join the [url=http://tinyurl.com/8y46zvu]BP6/VP6 User Group[
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
LHC's batch system requires a different systems analysis approach from SETI's steady-state (for MB, anyway) Very true. Apart from the fact that LHC-related tuning became part of overall client behavior, not project-specific behavior - and that I consider a wrong move. If they need such behavior, they should make it available as project-specific settings (backoffs and so on). In the AstroPulse case the client needs to ask as often as the server allows, and that LHC-inspired backoff prevents it. Hence hosts backed off by the client, not the server (!), can ALWAYS miss rare AP tasks under some circumstances. Not good. But it's just a collateral issue beside the main issue being discussed here: filling the CPU preferentially over the GPU. Though such backoffs can add to this issue too.
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
Just thought I'd mention I'm still working on filling my Mac's cache. It's been over a day now and it's just gone above 300 on its way to 400. Don't worry, it's also been this way for a while now. It's just another one of my observations from repeated events I thought someone might be interested in. It will get there eventually.
Juha Send message Joined: 7 Mar 04 Posts: 388 Credit: 1,857,738 RAC: 0 |
Didn't this discussion start when someone's host received work only for CPU even when the GPU was idle? I believe it has always been considered a bug when the server allocates work only for a non-idle resource when the host requests work for both non-idle and idle resources. |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
Didn't this discussion start when someone's host received work only for CPU even when the GPU was idle? I believe it has always been considered a bug when the server allocates work only for a non-idle resource when the host requests work for both non-idle and idle resources. There are 2 related events at play. This thread was about the delay in receiving GPU work when starting from an empty/near-empty cache. If you just wait long enough, or change your settings, you will eventually receive GPU work - at least I always have. The other is when a new app version is released and you have to wait until you have 11 *valid* completions before the time estimates drop from extremely high values to sane ones. In those cases some people receive only enough GPU tasks to start the GPUs, then start receiving CPU tasks with estimates that send the CPUs into panic mode. There have been cases where GPUs have been stopped due to CPUs being in panic mode. It is also very difficult to download any new tasks once the CPUs are in panic mode. I've been able to work around this issue by entering a flops setting in my app_info file. I assume if you wait long enough - maybe a week, maybe more - you will eventually receive GPU tasks. I, for one, have experienced both of these events. It would also appear that a few others have as well.
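The reason a flops entry in app_info helps is that BOINC derives a task's duration estimate roughly as estimated work content divided by projected speed; a pinned flops value replaces the pessimistic guess used before any valid completions exist. A sketch of that arithmetic - the fpops and speed figures here are illustrative assumptions, not actual AstroPulse values:

```python
def estimated_runtime_seconds(rsc_fpops_est, projected_flops):
    """Rough model of BOINC's task duration estimate: work content / speed.
    Before enough valid completions, projected_flops can be far too low,
    inflating the estimate; a <flops> entry in app_info.xml pins it."""
    return rsc_fpops_est / projected_flops

AP_FPOPS_EST = 1.0e16   # illustrative work-content estimate (an assumption)

# With a pessimistic speed guess, the estimate is huge - enough to trigger panic mode:
print(estimated_runtime_seconds(AP_FPOPS_EST, 1.5e10) / 3600)   # ~185 hours

# With a flops value near the host's real rate, the estimate becomes sane:
print(estimated_runtime_seconds(AP_FPOPS_EST, 1.0e12) / 3600)   # ~2.8 hours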
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
BTW, I was going to run another test with a Host running XP that hasn't completed any APv7 tasks to see how it worked with the CPU panic. It's the same machine that runs Ubuntu and Vista. I already know how Vista responded and since entering a flops setting cured it, I have a good idea of the problem. If it would help in some way I'll run the test. I'll have to stop Ubuntu and remove the ATI 7750 to run XP, but I'm willing if it would help. |
Josef W. Segur Send message Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0 |
For anonymous platform hosts, the method of choosing the best app version has not fundamentally changed. The primary decision is based on the "projected flops" value for the version, and that value is essentially the same as the "Average processing rate" (APR) shown on the host's Application details page. For TBar's Mac, with the AP v7 CPU APR at ~53 GFLOPS and the AP v7 GPU APR at ~1020 GFLOPS, any work request for both CPU and GPU work should get GPU work first. Only if enough GPU tasks were chosen to reach the 100-per-GPU limit should tasks for the CPU be sent. That's from my review of the current BOINC source code. The scheduler reply files say "<scheduler_version>705", which only says the server build was made from source since the minor version was bumped to 5 last June. It is possible there was some temporary bug no longer seen in the source, or the compiler produced code which doesn't always work, or any other example of Murphy's Law. There is of course a lot more detail surrounding the choice of best version. Much of that relates to BOINC options not used here, but the problem could possibly be caused by some logic which ought not to be active. And finally, it's worth mentioning that there's a tie-in to the sched_customize source which a project can use to modify how the Scheduler works; that's how .vlar MB tasks are kept off GPUs, for instance. In short, looking at the source code has confirmed that the BOINC developers have not reversed their basic scheme of delivering work to the resource which can do it quickest. It did not reveal what is bypassing that basic scheme. Joe
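Joe's description of the selection rule amounts to something like this - a toy sketch of the scheme he describes, not the actual scheduler source; the APR values are the ones quoted for TBar's Mac:

```python
def best_app_version(versions):
    """Pick the resource whose app version has the highest projected flops (APR).
    'versions' maps a resource name to its APR in GFLOPS."""
    return max(versions, key=versions.get)

# APR values for TBar's Mac quoted in the post:
ap_v7 = {"cpu": 53.0, "ati_gpu": 1020.0}
print(best_app_version(ap_v7))  # ati_gpu - so GPU work should be chosen first
```

By this rule, CPU tasks should only be sent once the GPU side is at its 100-task limit, which is exactly why the observed CPU-first behavior looks like something bypassing the basic scheme.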
HAL9000 Send message Joined: 11 Sep 99 Posts: 6534 Credit: 196,805,888 RAC: 57 |
BTW, I was going to run another test with a Host running XP that hasn't completed any APv7 tasks to see how it worked with the CPU panic. It's the same machine that runs Ubuntu and Vista. I already know how Vista responded, and since entering a flops setting cured it, I have a good idea of the problem. I didn't end up setting up my system to test until I got home from work, instead of at lunch like I had planned. So host 7421701 is now, after a server hiccup, moving right along, with cache settings of 10 & 10. The first request gave me 4 CPU tasks. The second netted 2 GPU, then 3 more GPU on the 3rd request as well. The fourth resulted in 3 GPU & 1 CPU. With CPU estimates at 72 hours it did fill up on CPU eventually. At the moment the CPU tasks are running in High Priority & BOINC is still requesting more GPU work. Which looks like this:

30-Oct-2014 22:04:50 [SETI@home] update requested by user
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
BTW, I was going to run another test with a Host running XP that hasn't completed any APv7 tasks to see how it worked with the CPU panic. It's the same machine that runs Ubuntu and Vista. I already know how Vista responded, and since entering a flops setting cured it, I have a good idea of the problem. That's nice, so why do you suppose my 3 Core2Quads were so different? You saw all 3 of them loading nothing but CPU tasks. They did start loading GPU tasks after the CPU run finished. The results from one Core2Quad are above. The major difference would appear to be that you have a newer/faster CPU and just one card. That card seems to have a much faster FLOPS listed in the results - 4,032.00 GFLOPS. Mine are lower even though one is the same card - Device peak FLOPS 1,008.00 GFLOPS, http://setiathome.berkeley.edu/result.php?resultid=3809653402 Other differences: my Vista machine had a CPU estimate of 180 hours where you say yours was 72 hours, and the Vista GPU says 422.26 GFLOPS. I have to agree with Joe: my Mac should have been receiving GPU tasks but it wasn't. It used to, up until the day Claggy remarked about the change; I had witnessed the change myself and was glad someone else noticed it. It's been that way ever since. I guess I'm going to have to run that XP test now. On second thought, I just checked my Mountain Lion Host. That Host has only one completed GPU APv7 and No CPU APv7s. It would be much easier to test that machine.
HAL9000 Send message Joined: 11 Sep 99 Posts: 6534 Credit: 196,805,888 RAC: 57 |
I didn't end up setting up my system to test until I got home from work, instead of at lunch like I had planned. So host 7421701 is now, after a server hiccup, moving right along, with cache settings of 10 & 10. The first request gave me 4 CPU tasks. The second netted 2 GPU, then 3 more GPU on the 3rd request as well. The fourth resulted in 3 GPU & 1 CPU. With CPU estimates at 72 hours it did fill up on CPU eventually. It is curious as to why the behavior is different. I could run a similar test on my i7-860 w/ a 5750, or my i3-390m w/ a 6360m, if you think the slower CPU might be a factor. I could always load up Vista on the i7 as well, but I really don't think the OS should be a factor, unless there is code in the BOINC Client to act differently with different OS versions. I'm not sure which mechanism is responsible for generating the Device peak FLOPS, but bumping the GPU clock will give a higher value in the results: HD 6870 @ 4256 GFLOPS with a 950 MHz clock. Mine is displaying twice what it should & yours is showing half of what it should. Looking at some old AP v6 results with Cat 11.12, my GFLOPS looks like it is always double what it should be: HD 5750 @ 2016 GFLOPS & HD 6870 @ 4097 GFLOPS. Also the 6870 GPU estimates for my host are 00:50:32.
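For reference, the "Device peak FLOPS" figures in these posts line up with the usual shaders x clock x 2 arithmetic. A small sketch - the 2-FLOP-per-clock multiply-add convention is an assumption here, and the stream-processor counts are the published specs for these cards:

```python
def peak_gflops(stream_processors, clock_mhz, flops_per_clock=2):
    """Theoretical peak for these AMD GPUs: shaders x clock x FLOP-per-cycle.
    flops_per_clock=2 assumes one multiply-add per shader per cycle."""
    return stream_processors * clock_mhz * flops_per_clock / 1000.0

# HD 6870: 1120 stream processors at the 950 MHz clock mentioned above
print(peak_gflops(1120, 950))   # 2128.0 - so the reported 4256 is exactly double
# HD 5750: 720 stream processors at the stock 700 MHz clock
print(peak_gflops(720, 700))    # 1008.0 - matches TBar's reported 1,008.00 GFLOPS
```

That arithmetic is consistent with the observation that one host's figures are doubled while the other's match the theoretical peak.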
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.