GUPPI Rescheduler for Linux and Windows - Move GUPPI work to CPU and non-GUPPI to GPU

Message boards : Number crunching : GUPPI Rescheduler for Linux and Windows - Move GUPPI work to CPU and non-GUPPI to GPU

Profile Stubbles
Volunteer tester
Joined: 29 Nov 99
Posts: 358
Credit: 5,909,255
RAC: 0
Canada
Message 1809211 - Posted: 15 Aug 2016, 4:22:57 UTC - in response to Message 1809202.  
Last modified: 15 Aug 2016, 4:32:45 UTC

I am seeing almost all of the Arecibo nonVLAR work get assigned to the GPU's in the past couple of days.
Mr Kevvy's .exe will ***always*** transfer ALL nonVLARs to GPU. This is a known issue for PCs that have 6+ CPU cores crunching guppis and a non-high-end GPU.
This was the case with my HP Z400 with Xeon W3550 (8 cores with HT on) and one GTX 750 Ti. The GPU queue can grow by 100 on some days.
But my tests have shown (in the thread that was removed due to the ongoing "August 2016" saga...and I'm fine with that) that:
the BOINC client refuses to ask for more tasks when it has reached 1,000 tasks (or slightly more). So any PC processing more than 100 tasks/day can NEVER exceed a stash of 10 days...which means hoarding will never apply, even with a faulty DQO that only reassigns tasks to another queue.
(But a Core2Quad with anything less than a GTX750 could have a stash >10days)
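A quick back-of-envelope check of that cap, as a minimal Python sketch (the 1,000-task ceiling is the figure observed above, not a documented constant):

```python
# Back-of-envelope check of the client-side task cap described above.
# Assumption: the client stops requesting work at ~1,000 queued tasks
# (the figure quoted in this post), so the maximum stash in days is
# simply cap / throughput.

TASK_CAP = 1000  # observed client-side ceiling (per this post)

def max_stash_days(tasks_per_day, cap=TASK_CAP):
    """Days of work the queue can hold at a given crunch rate."""
    return cap / tasks_per_day

# A PC crunching 100 tasks/day can never hold more than 10 days of work:
print(max_stash_days(100))   # 10.0
# ...while a slower host doing 50 tasks/day could hold a 20-day stash:
print(max_stash_days(50))    # 20.0
```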

I haven't dropped in on the developer forum lately and don't know if any changes have been implemented in the project code. Richard Haselgrove is always on top of this. Maybe he will chime in on this thread, or I should contact him directly to see if anything has been changed by D.A.; it could also be simple coincidence.
If you're running "Anonymous Platform", then the only things that can be changed are at the server end/side (other than the server sending you a few nonTask files that your Boinc Client would likely ignore).

If Richard is the go-to guy for such info, please ask him as it might partially answer my unresolved thread that has fallen off the 1st page of NC.
Keep in mind that this current thread is still slightly taboo ...and now for 2 reasons: DQO & me! lol
It might be better to PM him or start a new thread (with a link to my unresolved thread). For a new thread, I would suggest concentrating your question on the future, such as:
When will the Guppi to nonVLAR ratio be increased in favour of more Guppis?

Cheers "nefarious" DQOers! lol
RobG :-}
ID: 1809211
Profile Stubbles
Volunteer tester
Joined: 29 Nov 99
Posts: 358
Credit: 5,909,255
RAC: 0
Canada
Message 1809213 - Posted: 15 Aug 2016, 5:08:54 UTC - in response to Message 1809193.  
Last modified: 15 Aug 2016, 5:10:35 UTC

Does this look normal?

I looked at your image again and just noticed the "Waiting to run" guppis at the top.
I'm guessing these might have started on the GPU when you ran the DQO,
and are now waiting in the CPU queue, at their new priority, to be continued soon.
If that's the case, then this is a scenario where the DQO should have been run earlier than it was. (I know you just started using it, so this is normal.)

Keep in mind, it could also be an indication that there were already 100 guppis in the CPU queue when you last ran the DQO, and it left some guppis in the GPU queue (if there were >100 on the PC).
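That cap-limited behaviour can be sketched as follows; this is a minimal illustration assuming a 100-task CPU-queue cap as described above, not the DQO's actual code:

```python
# Sketch of the queue-cap behaviour described above. The 100-task CPU-queue
# cap is an assumption taken from this post, not a documented constant.

CPU_CAP = 100

def move_guppis(cpu_queue_len, gpu_guppis, cap=CPU_CAP):
    """Return (moved, left_on_gpu) for a single rescheduler pass."""
    room = max(0, cap - cpu_queue_len)
    moved = min(room, gpu_guppis)
    return moved, gpu_guppis - moved

# CPU queue already full: nothing moves, all 30 guppis stay on the GPU.
print(move_guppis(100, 30))   # (0, 30)
# Partial room: 20 move across, 10 are left behind.
print(move_guppis(80, 30))    # (20, 10)
```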

FYI, I find BoincTasks much easier to use than the BOINC Manager when dealing with more than 200 tasks, since it groups the tasks by device queue, making it easier to spot a guppi in the GPU queue or a nonVLAR in the CPU queue.
R :-}
ID: 1809213
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 9649
Credit: 888,720,719
RAC: 1,710,707
United States
Message 1809217 - Posted: 15 Aug 2016, 5:29:07 UTC
Last modified: 15 Aug 2016, 5:30:56 UTC

Stubbles, I have been getting about 5:1 Guppi to nonVLAR tasks since the Guppis started flowing. This is just the luck of the draw from when I request tasks from the scheduler and pull from the 100 task server buffer. I looked at the SVN repository and didn't see any changes checked in recently that might account for the change I'm seeing in nonVLAR tasks. In my case and opposite to yours, almost all of my nonVLAR, Arecibo work is getting assigned to the GPU's by the server by default. That is before I run the script and app. That is why I don't see many tasks moved each day when I run the script. When I first started using the script a few weeks ago, I was getting 30-70 tasks swapped and reassigned each day.

It's up to Jason, Raistmer, Petri, Juha, Eric and the rest of the developers to plan out where the project needs to go with regard to optimum resource utilization. They are planning big changes in cross-platform development, where they only need to write code once and apply it across all hardware platforms without having to resort to hand-writing and debugging for every new piece of hardware or device-driver change. This is not a simple or easily accomplished task and will likely take years to come to fruition.
Seti@Home classic workunits:20,676 CPU time:74,226 hours
ID: 1809217
Profile Stubbles
Volunteer tester
Joined: 29 Nov 99
Posts: 358
Credit: 5,909,255
RAC: 0
Canada
Message 1809231 - Posted: 15 Aug 2016, 6:42:30 UTC - in response to Message 1809217.  
Last modified: 15 Aug 2016, 6:43:41 UTC

Stubbles, I have been getting about 5:1 Guppi to nonVLAR tasks since the Guppis started flowing. This is just the luck of the draw from when I request tasks from the scheduler and pull from the 100 task server buffer. I looked at the SVN repository and didn't see any changes checked in recently that might account for the change I'm seeing in nonVLAR tasks. In my case and opposite to yours, almost all of my nonVLAR, Arecibo work is getting assigned to the GPU's by the server by default. That is before I run the script and app. That is why I don't see many tasks moved each day when I run the script. When I first started using the script a few weeks ago, I was getting 30-70 tasks swapped and reassigned each day.

On the server status page, there is only 1 scheduling server (called: synergy).
So (unless it discriminates based on IP or some other param), over the course of a week, our ratios should be fairly close.
If it does discriminate on IP (or, let's say, total credits as an example), that could explain our different observations (we're currently not covering the same time period).
Very strange!
ID: 1809231
Mark Stevenson Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 8 Sep 11
Posts: 1700
Credit: 170,392,567
RAC: 19,437
United Kingdom
Message 1809232 - Posted: 15 Aug 2016, 6:53:24 UTC - in response to Message 1809231.  
Last modified: 15 Aug 2016, 7:05:00 UTC

On the server status page, there is only 1 scheduling server (called: synergy).


But there are TWO download servers, called georgem and vader. I think you will find the scheduling server tells the download servers to send out the WUs to the host that's asking for work, and that they work on a "rota", switching between the two machines after a certain amount of time.
The download servers can only send out whatever work has been split at the time; depending on which "tape" is being split at that moment, it could be from Green Bank or Arecibo. All the hardware knows is there's a tape to be split.
Life is what you make of it :-)

When i'm good i'm very good , but when i'm bad i'm shi#eloads better ;-) In't I " buttercups " p.m.s.l at authoritie !!;-)
ID: 1809232
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 9649
Credit: 888,720,719
RAC: 1,710,707
United States
Message 1809240 - Posted: 15 Aug 2016, 7:46:56 UTC - in response to Message 1809231.  
Last modified: 15 Aug 2016, 7:47:21 UTC


On the server status page, there is only 1 scheduling server (called: synergy).
So (unless it discriminates based on IP or some other param), over the course of a week, our ratios should be fairly close.
If it does discriminate on IP (or lets say total credits as an example), that could explain our different observations (currently not covering the same time-period).
Very strange!


The mix of tasks you get when you request work is just the luck of the draw. Think of how many hundreds of requests the download servers get every minute. They just pull from synergy, the scheduling server, which only has a 100-task buffer at any time. It gets emptied by the download servers and constantly refilled by the splitters. The splitters have a never-ending rotation of tapes between the two scopes. Only over a very long period, because of the large number of tasks it serves up, would the chance of getting either a nonVLAR or a guppi approach the 50-50 odds of flipping a coin. It's just my luck to be getting mostly nonVLARs lately, and yours to be getting mostly guppis. Three months later, the ratio could flip and we would get exactly the opposite mix. There is no discrimination going on; it's just probability coming into play.
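The luck-of-the-draw argument can be sketched with a toy simulation; the 50/50 split and the batch sizes here are illustrative assumptions, not measured project figures:

```python
import random

# Toy model of the "luck of the draw" described above. Assumption (not a
# measured project figure): the splitter output is 50/50 guppi vs nonVLAR.
# Even then, the mix any one host sees over a short window is lopsided;
# only a long-run sample settles near the true ratio.

def observed_guppi_ratio(n_tasks, p_guppi=0.5, rng=None):
    """Fraction of guppis in a random batch of n_tasks downloads."""
    rng = rng or random
    guppis = sum(rng.random() < p_guppi for _ in range(n_tasks))
    return guppis / n_tasks

rng = random.Random(42)
# Five "days" of 20-task batches: the ratios jump around...
daily = [observed_guppi_ratio(20, rng=rng) for _ in range(5)]
# ...but 100,000 draws land very close to the true 0.5:
long_run = observed_guppi_ratio(100_000, rng=rng)
```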
Seti@Home classic workunits:20,676 CPU time:74,226 hours
ID: 1809240
rob smith Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer moderator
Volunteer tester

Joined: 7 Mar 03
Posts: 17727
Credit: 403,109,559
RAC: 153,251
United Kingdom
Message 1809248 - Posted: 15 Aug 2016, 8:47:13 UTC

The split of VLARs/non-VLARs is down to what is on the tapes, and that is outside the control of the project.
With the GBT ("guppi") and Arecibo data there is a tiny bit of control in the project's hands - they can decide which pile of tapes to draw from next. However, we are going to see a decrease in the number of "Arecibo" tapes in the foreseeable future, and those tapes are the source of the majority of non-VLAR data.
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1809248
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 13131
Credit: 148,947,316
RAC: 175,322
United Kingdom
Message 1809255 - Posted: 15 Aug 2016, 9:26:46 UTC - in response to Message 1809240.  

Richard Haselgrove is always on top of this. Maybe he will chime in on this thread, or I should contact him directly to see if anything has been changed by D.A.; it could also be simple coincidence.

The mix of tasks you get when you request work is just the luck of the draw. Think of how many hundreds of requests the download servers get every minute. They just pull from synergy, the scheduling server, which only has a 100-task buffer at any time. It gets emptied by the download servers and constantly refilled by the splitters. The splitters have a never-ending rotation of tapes between the two scopes. Only over a very long period, because of the large number of tasks it serves up, would the chance of getting either a nonVLAR or a guppi approach the 50-50 odds of flipping a coin. It's just my luck to be getting mostly nonVLARs lately, and yours to be getting mostly guppis. Three months later, the ratio could flip and we would get exactly the opposite mix. There is no discrimination going on; it's just probability coming into play.

No, there hasn't been any significant change in the scheduler code recently - what little BOINC development work that has been done in the last few months has mainly been to support newer versions of VirtualBox for CERN.

I'd agree with the second quote I've pulled out of the discussion. It's random. You say it could be different three months later: I'd add that it could be different three weeks, three days, three hours, three (well, five) minutes later.

I've been watching three identical machines, and distributing VLARs to SoG and Arecibo to cuda. Sometimes one machine will have a surfeit of Guppies, another will be stuffed with Arecibo. By the next day, they may have swapped places - or they may not. The one thing I can say is that they're all on the same router, so have the same IP address as far as the server infrastructure is concerned. Any suggestion of discrimination by IP is a paranoid conspiracy theory too far. The staff simply don't have the time, or more importantly the inclination, to micro-manage the system in that way. They have set the system up to manage itself 24/7, while they work 9-5 at their day jobs and relax at home after hours. They probably only spend a few minutes a day checking that no alarms have gone off anywhere: provided they're getting something close to the two million results they expect every day, they don't obsess about the fine detail.

They leave that to us...
ID: 1809255
Profile Stubbles
Volunteer tester
Joined: 29 Nov 99
Posts: 358
Credit: 5,909,255
RAC: 0
Canada
Message 1809256 - Posted: 15 Aug 2016, 9:30:28 UTC - in response to Message 1809255.  

They leave that to us...

Well said! Thanks Richard
ID: 1809256
Profile Shaggie76
Joined: 9 Oct 09
Posts: 279
Credit: 236,099,244
RAC: 199,920
Canada
Message 1809287 - Posted: 15 Aug 2016, 12:28:18 UTC

I would feel a lot better about tools like these if the server was properly informed about the switch (ie: the recorded stats are wrong and record that the GUPPI ran on your GPU). If the accounting was accurate I'd be all for these tools (and even help work on them).
ID: 1809287
Profile Stubbles
Volunteer tester
Joined: 29 Nov 99
Posts: 358
Credit: 5,909,255
RAC: 0
Canada
Message 1809290 - Posted: 15 Aug 2016, 12:32:50 UTC - in response to Message 1809287.  

I would feel a lot better about tools like these if the server was properly informed about the switch (ie: the recorded stats are wrong and record that the GUPPI ran on your GPU). If the accounting was accurate I'd be all for these tools (and even help work on them).

It was only after Mr Kevvy's script went public that I realized there was a reporting issue to the upload server.
I'd like to submit a potential bug report...but where do I do that?
ID: 1809290
Profile Stubbles
Volunteer tester
Joined: 29 Nov 99
Posts: 358
Credit: 5,909,255
RAC: 0
Canada
Message 1809293 - Posted: 15 Aug 2016, 12:38:39 UTC - in response to Message 1809287.  

I would feel a lot better about tools like these if the server was properly informed about the switch (ie: the recorded stats are wrong and record that the GUPPI ran on your GPU). If the accounting was accurate I'd be all for these tools (and even help work on them).

Until then, how about I direct your eyes to:
msg 1809207 above.
More specifically:
I had already asked Shaggie if he could write a script to find out the contribution to the daily project output (of ~150G cr/day) of:
- the top 10,000 PCs; and
I'll now add:
- the "Anonymous Platform" hosts within the top 10,000 PCs.
...but he was too busy back then with his incredible GPU comparison charts.
Maybe someone else might want to tap his shoulder to see if he has more time and interest now.

Consider yourself tapped!
RobG :-D
ID: 1809293
Al Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Joined: 3 Apr 99
Posts: 1677
Credit: 435,687,019
RAC: 103,091
United States
Message 1809300 - Posted: 15 Aug 2016, 13:05:11 UTC - in response to Message 1809207.  
Last modified: 15 Aug 2016, 13:13:52 UTC

...and until Jason has worked his magic on the guts of BOINC and that gets built into the system by default.
Could you expand on this Al? I thought Jason was working on Cuda...or are you referring to another Jason?

Sorry, my bad, I misspoke about that; you are correct, he is working on the CUDA side of things. Wishful thinking, I guess. Seeing that BOINC has become such a big, multi-project thing, is there anyone on a dedicated team (of 1+ people?) who works on this software on a regular basis? I believe I read that it got moved from the purview of Mr. Big to a committee late last year or early this year, and from my experience working for businesses large enough to need committees, they most often produced varying amounts of heat but usually very little light. If that's true, I hope BOINC doesn't fall down that rabbit hole. Ish.

*edit* Just ran it again; this time it swapped 75 tasks back and forth, and this time I have 23 tasks in the Waiting to Run status. So yeah, I think I would definitely need to run this more often. I just finished reading through all the responses since my post, and I'm not sure if I got an exact yes or no about whether it is possible to run it too often, and whether that would cause any harm. I would suppose it would just say 0 tasks moved, as there wasn't anything for it to do? Thanks!

ID: 1809300
Profile Stubbles
Volunteer tester
Joined: 29 Nov 99
Posts: 358
Credit: 5,909,255
RAC: 0
Canada
Message 1809308 - Posted: 15 Aug 2016, 13:25:53 UTC - in response to Message 1809300.  

*edit* Just ran it again; this time it swapped 75 tasks back and forth, and this time I have 23 tasks in the Waiting to Run status. So yeah, I think I would definitely need to run this more often. I just finished reading through all the responses since my post, and I'm not sure if I got an exact yes or no about whether it is possible to run it too often, and whether that would cause any harm. I would suppose it would just say 0 tasks moved, as there wasn't anything for it to do? Thanks!

Exactly: 0 + 0
I tried that testing scenario (sorry I didn't mention it).
The only problem I encountered was when I launched my front-end script twice by mistake: the first one, on finishing, restarted the client while the second one was still processing the client_state.xml file. But since the transfer was 0 + 0, it didn't change a thing.

That reminds me that I never tested that with a client_state.xml that has tasks to reassign. But there's always a backup of client_state.xml that can be restored if someone starts the front-end script twice.
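For illustration, a front-end wrapper of the kind described above might look like the following hedged Python sketch; the paths, the "rescheduler" binary name, and the service name are placeholders, and this is not Mr Kevvy's actual script:

```python
import subprocess
import time
from pathlib import Path

# Hedged sketch of a front-end wrapper: quit the client, back up
# client_state.xml, run the rescheduler, restart the client. The BOINC
# directory, the "rescheduler" binary name, and the systemd service name
# are all assumptions -- adjust for your own install.

def reschedule_commands(boinc_dir):
    """Return the commands the wrapper would run, in order."""
    state = Path(boinc_dir) / "client_state.xml"
    return [
        ["boinccmd", "--quit"],                   # stop the client cleanly
        ["cp", str(state), str(state) + ".bak"],  # restorable backup first
        ["./rescheduler"],                        # placeholder binary name
        ["systemctl", "start", "boinc-client"],   # bring the client back
    ]

def run(boinc_dir="/var/lib/boinc-client", dry_run=True):
    for cmd in reschedule_commands(boinc_dir):
        if dry_run:
            print(" ".join(cmd))  # preview only; nothing is executed
        else:
            subprocess.run(cmd, check=True)
            if cmd[0] == "boinccmd":
                time.sleep(5)  # let the client flush client_state.xml
```

Keeping the backup step before the rescheduler runs is what makes the double-launch scenario above recoverable: the .bak copy can always be restored.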
Now I remember why I only like programming MS Access prototype databases: they don't have to be perfect! You only need to convince the stakeholder to finance the project past proof-of-concept. lol

RobG :-)

PS: T -2h35mins in case anyone isn't looking at https://www.seti-germany.de/Wow/
ID: 1809308
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 13131
Credit: 148,947,316
RAC: 175,322
United Kingdom
Message 1809312 - Posted: 15 Aug 2016, 13:30:49 UTC - in response to Message 1809300.  

...and until Jason has worked his magic on the guts of BOINC and that gets built into the system by default.
Could you expand on this Al? I thought Jason was working on Cuda...or are you referring to another Jason?

Sorry, my bad, I misspoke about that; you are correct, he is working on the CUDA side of things.

Yes, in the present tense, CUDA is his thing. But he did (historically - around 2013) look in some detail at the server runtime estimation code, which is tightly bound with CreditNew - and he referred to that today, in message 1809230.

Wishful thinking, I guess. Seeing that BOINC has become such a big, multi-project thing, is there anyone on a dedicated team (of 1+ people?) who works on this software on a regular basis? I believe I read that it got moved from the purview of Mr. Big to a committee late last year or early this year, and from my experience working for businesses large enough to need committees, they most often produced varying amounts of heat but usually very little light. If that's true, I hope BOINC doesn't fall down that rabbit hole. Ish.

There was at one time a list of some 140 volunteers who had contributed to BOINC code, but the number of active contributors has dropped to (probably) single digits, and the largest single group of them seem to work for the Einstein project. Those people are the best route in, but there's a strong bias against "not invented here" code, which is hard to overcome. Speaking bluntly, unless you speak David Anderson's sort of language (and sadly, I don't think Jason's input this morning is likely to pass that test), there's very little chance of getting major functional changes accepted into the master codebase.
ID: 1809312
Stephen "Heretic" Special Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 4632
Credit: 144,751,068
RAC: 233,683
Australia
Message 1809348 - Posted: 15 Aug 2016, 14:56:00 UTC - in response to Message 1809193.  
Last modified: 15 Aug 2016, 15:15:06 UTC

*edit* Just ran it again before I head upstairs (the first time was ~3 hours ago); it moved 24 tasks each way, which is about half of the currently running tasks. Wonder if it needs to be run more frequently? I have these tasks now waiting to run, after running the script again. Does this look normal?


. . I have found that on restart, CPU tasks that were running before the "requeue" took place take a while to kick off, and in that delay period other tasks start up instead. This leaves the first tasks in "waiting to run" status, but they will be the next cabs on the rank and will kick off OK when the tasks that displaced them have finished.
ID: 1809348
Stephen "Heretic" Special Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 4632
Credit: 144,751,068
RAC: 233,683
Australia
Message 1809354 - Posted: 15 Aug 2016, 15:14:44 UTC - in response to Message 1809287.  

I would feel a lot better about tools like these if the server was properly informed about the switch (ie: the recorded stats are wrong and record that the GUPPI ran on your GPU). If the accounting was accurate I'd be all for these tools (and even help work on them).


. . So is there any way to tell the scheduler that this is in fact the case? That job "nnnnnnnn" was distributed as a GPU task but executed on the CPU instead? In the meantime, it does not seem to interfere with WU distribution.
ID: 1809354
rob smith Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer moderator
Volunteer tester

Joined: 7 Mar 03
Posts: 17727
Credit: 403,109,559
RAC: 153,251
United Kingdom
Message 1809369 - Posted: 15 Aug 2016, 15:54:16 UTC

The database records which type of processor a task was sent to, but this is not compared with the type of processor on which the task was actually run for any practical purpose.
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1809369
Al Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Joined: 3 Apr 99
Posts: 1677
Credit: 435,687,019
RAC: 103,091
United States
Message 1809399 - Posted: 15 Aug 2016, 17:20:59 UTC - in response to Message 1809312.  
Last modified: 15 Aug 2016, 17:22:36 UTC

There was at one time a list of some 140 volunteers who had contributed to BOINC code, but the number of active contributors has dropped to (probably) single digits, and the largest single group of them seem to work for the Einstein project. Those people are the best route in, but there's a strong bias against "not invented here" code, which is hard to overcome. Speaking bluntly, unless you speak David Anderson's sort of language (and sadly, I don't think Jason's input this morning is likely to pass that test), there's very little chance of getting major functional changes accepted into the master codebase.
Thanks for the info, Richard. So from that, could I take it that Einstein is the 'glamour' project right now, and even though SETI is the 'old man' around here (and still the biggest?), it's sort of the tail wagging the dog, so to speak: they have more devs and are more 'popular', so they get the changes they want, as long as they develop them? Are they the keepers of the keys? I'm just curious how enhancements and updates are allowed into the program, especially ones that affect our little corner of BOINC. Does DA have much of an interest in SETI any longer, or are we now sort of the ugly redheaded stepchild of BOINC?

ID: 1809399
Al Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Joined: 3 Apr 99
Posts: 1677
Credit: 435,687,019
RAC: 103,091
United States
Message 1809405 - Posted: 15 Aug 2016, 17:31:19 UTC

Oh, and just ran it again; this time it was 19 tasks moved each way, and 9 tasks are waiting to run. One question: before running it, I manually shut down BOINC, but it seems to shut down BoincTasks as well? The last line in the window says something about restarting BOINC, but it doesn't restart; at least it's not in the taskbar at the bottom. I need to restart both programs. Is this normal?

ID: 1809405



 
©2019 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.