Is it possible to swap a guppi assigned to GPU with a Arecibo assigned to CPU?

Profile Stubbles
Volunteer tester
Joined: 29 Nov 99
Posts: 358
Credit: 5,909,255
RAC: 0
Canada
Message 1798552 - Posted: 24 Jun 2016, 23:51:50 UTC

Hello all,

After installing Lunatics v0.45 Beta3, I started to wonder whether it might be possible to reassign WUs from CPU to GPU and vice versa, since the setup had reassigned CUDA-labelled WUs to be crunched by SoG.
As the post title implies, the idea would be to swap guppi...vlars assigned to the GPU with Arecibo WUs assigned to the CPU, in order to optimize crunch times both ways with Nvidia GPUs.

If there are other planned initiatives to resolve the current situation, please let me know.
Rob :-}
ID: 1798552 · Report as offensive
Profile Mr. Kevvy Crowdfunding Project Donor*Special Project $250 donor
Volunteer moderator
Volunteer tester
Joined: 15 May 99
Posts: 3776
Credit: 1,114,826,392
RAC: 3,319
Canada
Message 1798559 - Posted: 24 Jun 2016, 23:57:34 UTC - in response to Message 1798552.  
Last modified: 25 Jun 2016, 0:32:14 UTC

Indeed it is... I have been working on a program to do just this. It works fine and dandy in Linux, but because SoG changes the version numbers I still have to get it working 100% in Windows and test it.

I'm hoping that will finally be done this weekend. I will be starting a thread here for it. It's a simple command-line utility that needs to be run manually after closing BOINC, but it doesn't need to be done more than a few times a day and the command window can be left open so it's not much work.

Good timing asking this. :^)

Edit: I may PM you... actually I am pretty sure of it... lol... to give it a try when I think it's ready for actual use. If not, the thread will be called "GUPPI Rescheduler". Short of witty names these days.
ID: 1798559 · Report as offensive
Profile Stubbles
Volunteer tester
Joined: 29 Nov 99
Posts: 358
Credit: 5,909,255
RAC: 0
Canada
Message 1798566 - Posted: 25 Jun 2016, 0:28:27 UTC - in response to Message 1798559.  

WoW! ...looking forward to it :-D
ID: 1798566 · Report as offensive
Profile Sutaru Tsureku
Volunteer tester

Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 1799187 - Posted: 28 Jun 2016, 8:15:44 UTC - in response to Message 1798559.  
Last modified: 28 Jun 2016, 8:22:20 UTC

You don't need to create such a tool.

If you do a forum search for 'rescheduler' you will find such a tool.

But (since the release of CreditNew) the admins aren't pleased about the usage of such tools, because it screws up (the correct calculation of) the credit granting for you and your wingmen (-> fewer credits).
ID: 1799187 · Report as offensive
Profile Mr. Kevvy Crowdfunding Project Donor*Special Project $250 donor
Volunteer moderator
Volunteer tester
Joined: 15 May 99
Posts: 3776
Credit: 1,114,826,392
RAC: 3,319
Canada
Message 1799198 - Posted: 28 Jun 2016, 11:04:46 UTC - in response to Message 1799187.  
Last modified: 28 Jun 2016, 11:32:48 UTC

You don't need to create such a tool.

If you do a forum search for 'rescheduler' you will find such a tool.


I doubt it... I Googled extensively before working on this. If you're referring to Fred's Rescheduler, it was only hosted at its own forum and appears to have been taken offline; as it wasn't maintained, it was no longer compatible anyway. So please provide a link.

There's no possible way that I can think of that this will affect anyone's credit. The entire premise it is based on, which is pretty well documented, is that the GUPPI work units pay the same credit as the non-GUPPIs but take 3x or more the time to complete on GPUs, hence the large RAC hit and correspondingly much less work done. Again, a link to any data supporting this would be appreciated (and any corroborative statement to the effect that "the admins aren't pleased about the usage of such tools" would be good too; if there is such a statement, then I won't be releasing this).

I did make some progress on the weekend: it now preserves the version number, but I still have to add preservation of the platform name as well. (I've also been rather busy, including receiving the replacement board and getting the failed machine back.) So it will be another couple of days at least before I have anything worth putting out for testing.
ID: 1799198 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1799202 - Posted: 28 Jun 2016, 11:26:49 UTC - in response to Message 1799198.  
Last modified: 28 Jun 2016, 11:27:47 UTC

I would say there are quite a few urban myths around the credit area currently.
Local re-scheduling (that is, inside a single host) is much preferable to "global re-scheduling", which amounts to aborting tasks en masse.
So if a robust re-scheduling tool existed, that would be good.
BTW, old Fred's re-scheduler most probably could be configured for that too. It has a configuration tab.
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1799202 · Report as offensive
Profile Sutaru Tsureku
Volunteer tester

Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 1799221 - Posted: 28 Jun 2016, 14:02:25 UTC
Last modified: 28 Jun 2016, 14:12:03 UTC

I don't want to throw oil on the fire.

Rescheduling, and the resulting screw-up of the credit granting, has been discussed in this forum many times.

Each app has its own 'average processing rate'. E.g. the server sends a WU to the GPU and it is calculated on the GPU in maybe 10 minutes. Another WU which was sent to your GPU is now calculated on the CPU (because a rescheduling tool was used); the server (the CreditNew calculation) thinks the WU was calculated on the GPU and is confused because it took maybe 5 hours.
You can find the related threads with a forum search.

I don't want to give too many tips...
Fred's rescheduler is still available in his forum for download (he made this tool before CreditNew was installed here at SETI@home). He also has a warning near the download URL that it 'may result in less credits, for you or your wing man'.
Instructions are also available in this forum for updating the tool to SETI v8.

I was in e-mail contact with, among others, Eric (the SETI@home Director) a long time ago, when CreditNew was installed here.
I asked why they don't post 'please don't use reschedulers' in the forum, and Eric replied (as I remember) something like 'that would do/change nothing'.

Maybe you don't trust my statements, or maybe you don't have his e-mail address; here it is.
ID: 1799221 · Report as offensive
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1799227 - Posted: 28 Jun 2016, 14:31:46 UTC

I thought the Rescheduling Myth had been laid to rest some time ago, back when a couple of users were rescheduling thousands of CPU APs to run on their GPUs. It went on for months without any adverse effect on the granted credit. The only outcome was that they managed to raise the CPU APR to the point that the tasks would time out when they actually did run them on the CPU. If changing the APR affected credit, then you would see a large change when switching to running multiple tasks per GPU, as that lowers your APR to a half, a third, or less depending on how many instances you run.

I don't see anyone complaining about running multiple tasks per GPU...
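
A rough way to see the arithmetic in that last point: APR is essentially a task's estimated work divided by elapsed time, so running N tasks at once (which roughly multiplies each task's elapsed time by N) divides the APR by about N. A toy calculation in Python, with made-up numbers rather than real SETI@home figures:

# Toy illustration only -- both numbers below are made up, not real task sizes.
task_gflops = 40000.0            # assumed estimated work per task, in GFLOPs
elapsed_one_at_a_time = 1000.0   # assumed seconds per task when run singly
for n in (1, 2, 3):
    elapsed = elapsed_one_at_a_time * n   # n concurrent tasks share the GPU
    print(f"{n} per GPU -> APR ~ {task_gflops / elapsed:.1f} GFLOPS")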
ID: 1799227 · Report as offensive
Claggy
Volunteer tester

Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1799261 - Posted: 28 Jun 2016, 23:54:36 UTC - in response to Message 1798552.  
Last modified: 28 Jun 2016, 23:56:25 UTC

Hello all,

After installing Lunatics v0.45 Beta3, I started to wonder whether it might be possible to reassign WUs from CPU to GPU and vice versa, since the setup had reassigned CUDA-labelled WUs to be crunched by SoG.
As the post title implies, the idea would be to swap guppi...vlars assigned to the GPU with Arecibo WUs assigned to the CPU, in order to optimize crunch times both ways with Nvidia GPUs.

If there are other planned initiatives to resolve the current situation, please let me know.
Rob :-}

Yes, it can be done. While resend lost tasks isn't active, there is a condition that will trigger it anyway: a second reporting of a completed task.

It involves:
    1. backing up client_state.xml,
    2. setting the project preferences to only ask for the device you want the task on,
    3. reporting a task, then shutting down BOINC,
    4. restoring client_state.xml with its <rpc_seqno> incremented by 2, and deleting the <file>, <workunit> and <result> references for the task you want resent,
    5. starting BOINC and requesting tasks.
As long as the device asking for tasks isn't at its task limit, the lost tasks should be resent to the device asking for them.
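
For readers who want to see the shape of step 4, here is a minimal, purely illustrative Python sketch of that edit, applied to the restored backup. The function name, argument order and the <file_info> fallback tag are my assumptions (element names vary between client versions); it is not a released tool and assumes BOINC is fully shut down with another copy of client_state.xml kept safe.

# Illustrative sketch only (not a released tool). Run against the restored
# backup of client_state.xml while BOINC is shut down; keep another copy safe.
import sys
import xml.etree.ElementTree as ET

def make_ghost(state_file, wu_name, out_file):
    tree = ET.parse(state_file)
    root = tree.getroot()                         # <client_state>

    # Bump the SETI@home project's <rpc_seqno> by 2, as described above.
    for proj in root.iter("project"):
        if "setiathome" in (proj.findtext("master_url") or ""):
            seq = proj.find("rpc_seqno")
            if seq is not None:
                seq.text = str(int(seq.text) + 2)

    # Delete the <workunit>, <result> and <file>/<file_info> entries that
    # reference the task, so the server treats it as lost on the next request.
    doomed = {"workunit", "result", "file", "file_info"}
    for child in list(root):                      # these sit at the top level
        if child.tag in doomed:
            name = child.findtext("name") or ""
            wu = child.findtext("wu_name") or ""
            if name.startswith(wu_name) or wu.startswith(wu_name):
                root.remove(child)

    tree.write(out_file)

if __name__ == "__main__":
    make_ghost(*sys.argv[1:4])    # client_state.xml  <workunit name>  output file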

Here's an example of it happening:

http://setiathome.berkeley.edu/forum_thread.php?id=77597&postid=1697668

The double reporting of a task does still trigger a resend lost tasks event:

Here I reported a task and got a task (without resend lost tasks occurring):

02-Jul-2015 08:33:42 [SETI@home] update requested by user
02-Jul-2015 08:33:42 [SETI@home] Sending scheduler request: Requested by user.
02-Jul-2015 08:33:42 [SETI@home] Reporting 1 completed tasks
02-Jul-2015 08:33:42 [SETI@home] Requesting new tasks for CPU
02-Jul-2015 08:33:43 [SETI@home] Scheduler request completed: got 1 new tasks
02-Jul-2015 08:35:16 [SETI@home] Started download of 31dc12ad.30467.464287.438086664204.12.215.vlar
02-Jul-2015 08:35:19 [SETI@home] Finished download of 31dc12ad.30467.464287.438086664204.12.215.vlar

After restoring the reported task back into the client state, and reporting it again (with suitably increased cache values), my four ghosts were resent:

Thu Jul 2 08:44:52 2015 | SETI@home | update requested by user
Thu Jul 2 08:44:55 2015 | SETI@home | sched RPC pending: Requested by user
Thu Jul 2 08:44:55 2015 | SETI@home | [sched_op] Starting scheduler request
Thu Jul 2 08:44:55 2015 | SETI@home | Sending scheduler request: Requested by user.
Thu Jul 2 08:44:55 2015 | SETI@home | Reporting 1 completed tasks
Thu Jul 2 08:44:55 2015 | SETI@home | Requesting new tasks for CPU
Thu Jul 2 08:44:55 2015 | SETI@home | [sched_op] CPU work request: 997588.31 seconds; 0.00 devices
Thu Jul 2 08:44:57 2015 | SETI@home | Scheduler request completed: got 4 new tasks
Thu Jul 2 08:44:57 2015 | SETI@home | [sched_op] Server version 707
Thu Jul 2 08:44:57 2015 | SETI@home | Resent lost task 13se12af.32371.215837.438086664207.12.127.vlar_1
Thu Jul 2 08:44:57 2015 | SETI@home | Resent lost task 19se12af.753.16836.438086664199.12.153_1
Thu Jul 2 08:44:57 2015 | SETI@home | Resent lost task 19se12af.753.16836.438086664199.12.215_1
Thu Jul 2 08:44:57 2015 | SETI@home | Resent lost task 20au12ag.15429.18063.438086664201.12.128_0

Thu Jul 2 08:44:57 2015 | SETI@home | Project requested delay of 303 seconds
Thu Jul 2 08:44:57 2015 | SETI@home | [sched_op] estimated total CPU task duration: 883795 seconds
Thu Jul 2 08:44:57 2015 | SETI@home | [sched_op] handle_scheduler_reply(): got ack for task 11se12ab.7663.16427.438086664201.12.135_0
Thu Jul 2 08:44:57 2015 | SETI@home | [sched_op] Deferring communication for 00:05:03
Thu Jul 2 08:44:57 2015 | SETI@home | [sched_op] Reason: requested by project
Thu Jul 2 08:45:25 2015 | SETI@home | Started download of 13se12af.32371.215837.438086664207.12.127.vlar
Thu Jul 2 08:45:25 2015 | SETI@home | Started download of 19se12af.753.16836.438086664199.12.153
Thu Jul 2 08:45:28 2015 | SETI@home | Finished download of 13se12af.32371.215837.438086664207.12.127.vlar
Thu Jul 2 08:45:28 2015 | SETI@home | Finished download of 19se12af.753.16836.438086664199.12.153
Thu Jul 2 08:45:28 2015 | SETI@home | Started download of 19se12af.753.16836.438086664199.12.215
Thu Jul 2 08:45:28 2015 | SETI@home | Started download of 20au12ag.15429.18063.438086664201.12.128
Thu Jul 2 08:45:36 2015 | SETI@home | Finished download of 19se12af.753.16836.438086664199.12.215
Thu Jul 2 08:45:36 2015 | SETI@home | Finished download of 20au12ag.15429.18063.438086664201.12.128


Claggy
ID: 1799261 · Report as offensive
Profile Stubbles
Volunteer tester
Joined: 29 Nov 99
Posts: 358
Credit: 5,909,255
RAC: 0
Canada
Message 1799284 - Posted: 29 Jun 2016, 2:51:42 UTC - in response to Message 1799261.  

Hey all,
Thanks for the replies and input. Being fairly new to this forum, I certainly didn't want to stir up some old stuff that was controversial in the past.

In the spirit of clarity, I would like to point out that I am looking for a "fairly simple way" of swapping x WUs assigned to GPU with x WUs assigned to CPU, so that there is always a 1:1 swap. Here's the scenario I was thinking of:

Already in client cache: 100 WUs assigned to GPU, and 100 WUs assigned to CPU
Of those assigned to GPU, b are of type: blc...guppi...vlar
Of those assigned to CPU, a are of type: Arecibo ##mm##...

if b>a: swap(a)
else: swap(b)
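
A minimal sketch of that selection logic in Python (illustrative only; the task-name patterns and the min() cut-off are my assumptions, and the reading/writing of client_state.xml is left out entirely):

# Illustrative only: pick equal numbers of tasks to move in each direction.
def plan_swap(gpu_tasks, cpu_tasks):
    guppi_on_gpu = [t for t in gpu_tasks
                    if t.startswith("blc") and ".vlar" in t]            # B
    arecibo_on_cpu = [t for t in cpu_tasks
                      if not t.startswith("blc") and ".vlar" not in t]  # A
    n = min(len(guppi_on_gpu), len(arecibo_on_cpu))  # enforce the 1:1 constraint
    return guppi_on_gpu[:n], arecibo_on_cpu[:n]      # (move to CPU, move to GPU)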


From what I understand of Claggy's post, it doesn't seem "fairly simple", and it also doesn't put a constraint on a 1-to-1 swap.

Is there a reason why there isn't something being done on the server side?
To me it seems like a simple scheduling exercise taught in a CS201 course! lol

Cheers,
Rob :-}
ID: 1799284 · Report as offensive
Profile Mr. Kevvy Crowdfunding Project Donor*Special Project $250 donor
Volunteer moderator
Volunteer tester
Joined: 15 May 99
Posts: 3776
Credit: 1,114,826,392
RAC: 3,319
Canada
Message 1799286 - Posted: 29 Jun 2016, 3:01:38 UTC - in response to Message 1799284.  

In the spirit of clarity, I would like to point out that I am looking for a "fairly simple way" of swapping x WUs assigned to GPU with x WUs assigned to CPU, so that there is always a 1:1 swap.
...
To me it seems like a simple scheduling exercise taught in a CS201 course!


What I have so far working on Linux is a simple command-line app. It does do an almost 1:1 swap... it moves all the non-VLAR, non-GUPPI MB work units assigned to the CPU to the GPU. It then moves up to that many GPU-assigned GUPPI work units to CPU assignment. The "up to" keeps the queue from growing without limit, and the faster GPU will run down a slightly overloaded MB queue. It's required to quit BOINC Manager first and restart it afterwards so that the files are not in use.

Here is what the output currently looks like on Linux moving actual work around:

kevin@KevsNewToy /media/kevin/SSD/BOINC $ ./GUPPIRescheduler
Mr. Kevvy's GUPPI Rescheduler v0.3 - (c)2016 Kevin Dorner

Reading configuration files...
Found sched_request app_version for GPU=1 version_num=800, CPU=0 version_num=800 and plan_class=opencl_nvidia_sah
Searching for and moving work units in client state...
Matching client state changes to scheduler files...
Writing updated configuration files...
Done: 2 non-GUPPI workunits moved to GPU and 2 GUPPI workunits moved to CPU.
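
The core of such a move, once the tasks are chosen, is just pointing each <result> in client_state.xml at the other device's <app_version>. A rough sketch of that single edit follows (my illustration, not Mr. Kevvy's actual code; the field names are those commonly found in client_state.xml, and any that are absent in a given client version are simply skipped):

import xml.etree.ElementTree as ET

# Rough illustration only (not Mr. Kevvy's code): retarget one <result> at a
# different <app_version> by copying over the fields BOINC matches on.
def retarget(result_elem, app_version_elem):
    for field in ("plan_class", "version_num", "platform"):
        src = app_version_elem.find(field)
        dst = result_elem.find(field)
        if src is not None and dst is not None:
            dst.text = src.text

Whether the cached sched_request/sched_reply files also need matching edits is picked up a few posts further down.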

ID: 1799286 · Report as offensive
Profile Jeff Buck Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1799300 - Posted: 29 Jun 2016, 4:37:13 UTC

For those interested in some hard data in regard to rescheduling VLARs from GPU to CPU and non-VLARs from CPU to GPU, the table below contains a small sample of tasks from two of my machines.

Yes, I've been rescheduling VLARs and non-VLARs for a little over a week now, after feeling like I had waited long enough for project administration to implement some sort of VLARs-to-CPUs, non-VLARs-to-GPUs option. I'm satisfied with the results so far and can see no more impact to the credits than is caused by the random number generator that assigns them in the first place.

VLAR Rescheduling - Credit Comparison
Host: 6942834 (AMD Athlon(tm) 64 X2 Dual Core Processor 4800+ w/ NVIDIA GeForce GT 630)
VLARs
   Run on GPU (as originally scheduled)  |    Run on CPU (as originally scheduled)    |    Run on CPU (rescheduled from GPU)			
Task #	      AR      Run Time	Credits	 |  Task #         AR      Run Time  Credits  |  Task #        AR      Run Time  Credits
4993528119  0.009093   2:11:09  118.50   |  5008922033   0.008328   5:21:27   82.30   |  5008922165  0.008749   5:06:20  100.32
4993528118  0.009093   2:07:35  117.12	 |  4996046222   0.011193   5:34:10   98.26   |  5008448553  0.008749   5:26:16   91.73
4993205931  0.009236   2:06:24  126.80	 |  4995601343   0.008251   5:01:52  102.82   |  4996408050  0.009326   4:52:23  118.85
4993206390  0.009236   2:18:06   88.43	 |  4994390993   0.008735   5:03:36  100.12   |  4999213561  0.008251   5:07:44  106.11

Non-VLARs (approx. normal AR)
   Run on GPU (as originally scheduled)  |     Run on CPU (as originally scheduled)   |    Run on GPU (rescheduled from CPU)			
Task #	      AR      Run Time  Credits  |  Task #         AR      Run Time  Credits  |  Task #        AR      Run Time  Credits
5008448355  0.416306   1:10:49  111.53   |  4992035680   0.422157   5:49:21  103.52   |  5008150151  0.416043   1:30:00  109.70
5008245540  0.416235   1:11:23  118.92   |  4990401265   0.410352   7:36:01  122.05   |  4995310479  0.425248   1:09:50  100.30
5008150014  0.416043   1:14:04  114.88   |  4990223694   0.426270   6:51:07  106.43   |  (no more available in same approx. AR)			
4997635645  0.405997   1:14:47  107.64   |  4989815754   0.426296   6:35:29  113.67

Host: 7057115 (Intel(R) Xeon(R) CPU E5430 Dual Quad Core - 3 GPUs but using only NVIDIA GTX 670 in examples)
VLARs
   Run on GPU (as originally scheduled)  |    Run on CPU (as originally scheduled)      |    Run on CPU (rescheduled from GPU)			
Task #	      AR      Run Time	Credits	 |  Task #         AR      Run Time  Credits  |  Task #        AR      Run Time  Credits
5009326355  0.011098   0:59:30   67.35   |  4998596435   0.008582   3:29:24 	95.90   |  5005715436  0.009883   2:53:42  174.39
5009319007  0.011107   0:52:38   73.52   |  4998573969   0.007367   3:41:18 	81.81   |  5005696056  0.009883   3:00:58   75.95
5009318351  0.011107   1:12:54  143.42   |  4998573960   0.008582   3:36:21 	81.09   |  5005696054  0.009883   3:00:39   77.07
5009318998  0.011107   1:35:08   90.18   |  4998573956   0.008582   3:24:56 	76.55   |  5005602376  0.007995   2:57:01   90.37

Non-VLARs (approx. normal AR)
   Run on GPU (as originally scheduled)  |     Run on CPU (as originally scheduled)   |    Run on GPU (rescheduled from CPU)			
Task #	      AR      Run Time  Credits  |  Task #         AR      Run Time  Credits  |  Task #        AR      Run Time  Credits
5009318949  0.414921   0:35:37   81.11   |  4995157224   0.442436   5:03:02  101.95   |  4999189921  0.408052   0:30:16  105.91
5005650157  0.426203   0:28:36   53.68   |  4995150139   0.442730   5:19:15  118.65   |  4998551166  0.431675   0:25:45	 116.46
5005556435  0.426150   0:28:50  111.66   |  4994207188   0.422130   5:36:43  127.21   |  4997635293  0.405690   0:30:25	 110.20
5005470004  0.423057   0:26:44  115.31   |  4994151453   0.422029   5:10:36   85.22   |  4996909612  0.442163   0:27:01	  59.37

All tasks used for the comparisons achieved normal completion (i.e., no -9 overflows). All GPU tasks were run using cuda50. In fact, all my hosts are running either cuda50 or cuda42, making the rescheduling much simpler, since both CPU and GPU apps currently have the same version number (8.00).

I'm not going to try to do any statistical analysis of those credit numbers (good luck to anyone who wants to try), but I think it should be clear that the range of credits granted to tasks run as originally scheduled is little different than the range for those that were rescheduled.

Most of the tasks listed have already dropped off the BOINC database, but I still have all the task detail pages in my archives, if anyone's interested in more detail on any of them.
ID: 1799300 · Report as offensive
Profile Stubbles
Volunteer tester
Joined: 29 Nov 99
Posts: 358
Credit: 5,909,255
RAC: 0
Canada
Message 1799439 - Posted: 29 Jun 2016, 19:30:37 UTC - in response to Message 1799286.  

My concern with non-1:1 swapping is that I see 3 general scenarios.
First, let's assume a computer with 1 mid-range Nvidia GPU, where the GPU (supported by 1 CPU core) has about the same productivity as the rest of the CPU.

A = # of non-vlars assigned to CPU (aka Arecibo)
B = # of guppi assigned to GPU (aka Blc...)

Scenarios:
    1. A = B somewhat (give or take 10% or so)
    2. A > B
    3. A < B

I am not concerned with Scenario 1, and in Kevin's Linux app description above, scenario 3 seems to be taken care of with the "up to".
As for scenario 2, what if A = dozens and B = 0 and the script is run too often?
As for scenario 3, is there a chance that the GPU never gets new WUs assigned to it by the server because the local script/app keeps the GPU's local cache at more than 100 WUs at all times? If so, then overall wouldn't that computer be sent fewer guppis, since (from my observations) there seems to be no predictability as to the long runs of only As on the CPU and mostly Bs assigned to the GPU?

I'm asking the last 2 points as questions since I don't know what happens on the server side when a task report is sent back if it had been assigned to the CPU originally but gets processed by the GPU.

Another concern I have is with the assumption that the GPU cache has a quicker turnaround. On most heavy crunching machines, that is usually the case, but there could be situations where someone puts an older GPU into a new PC since they might be upgrading their primary rig incrementally based on available funds over time.

I might be biased in favour of the 1:1 swap since I don't know enough about what happens on the server side and the client side...but it does seem safer for a v1.0 of such an app.

Looking forward to learning some more from all your replies!
Cheers,
Rob ;-)


ID: 1799439 · Report as offensive
Profile Jeff Buck Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1799476 - Posted: 29 Jun 2016, 21:44:06 UTC - in response to Message 1799439.  

I can only tell you what my rather limited experience has been over the last 9 days or so, and that is that your scenario #3 is almost always the one that I've run into. I think only once have I made a rescheduling run where there were more non-VLARs assigned to CPU than there were VLARs assigned to GPU, and the difference was only a couple of tasks. Usually, it's greatly tilted in the other direction. It's impossible to know whether that will remain generally true over the long haul or not, but Breakthrough Listen seems to be generating far more new data from Greenbank than what little we've gotten from Arecibo.

Now, my little app, which appears to work just fine for my 5 GPU-enabled boxes running Cuda, isn't as sophisticated as what it appears Mr. Kevvy is developing, and is not something I plan to build up and distribute. It does not, for instance, do the approximate 1:1 replacement that he's doing, which I think is an excellent approach. I simply move all currently available VLARs from GPU to CPU and all non-VLARs from CPU to GPU. Since your scenario #3 is almost always in play, that tends to create a temporary imbalance, with an overage of tasks on the CPU and an underage on the GPU.

I usually maintain no more than a 2-day work buffer on any of my boxes so, on my two dual-core boxes with GT 630s, which are never even close to any queue limits, that imbalance is never a big deal and corrects itself very quickly. I find that rescheduling once, or occasionally twice, a day works best. On my dual-core with a GTX 550Ti, I think I've exceeded the 100-task CPU limit once, for a short time. (In fact, right now that machine has just 43 tasks on the CPU, all VLARs. Unfortunately, out of the 100 tasks on the GPU, 72 of those are also VLARs, which I'm just going to let run for now. I don't reschedule until the scheduler starts requesting more CPU tasks.)

My two big boxes, one with 3 GPUs and one with 4 GPUs, are a different story. With all those GPUs to reschedule "from" (at 100 tasks per GPU), the CPU queue is almost certain to go over the 100-task limit with all the VLARs. I don't see that as a significant problem, as long as I just let it steadily drain back down before I make another rescheduling run. It appears that one run every 4-5 days is about right. Of course, that means that the GPUs still end up processing a lot of #@#$%#$^ VLARs, but at least I know that they're also doing all of the non-VLARs. Again, I don't make a rescheduling run at least until the scheduler starts requesting CPU work again. Implementing more of a 1-for-1 swap approach would certainly avoid the temporary imbalance in the queue, but in the end wouldn't actually change the overall CPU/GPU balance and would just require more frequent rescheduling runs.

Anyway, I'd really prefer to see the project treat VLAR MBs as an entirely different data class, alongside non-VLAR MBs and APs, which could be selected and scheduled in the same manner that those two are. That would avoid this whole rescheduling conundrum, but I finally got tired of waiting for that to happen. I figured I'd be old and gray by then.....oh, wait....too late!
ID: 1799476 · Report as offensive
Profile Stubbles
Volunteer tester
Joined: 29 Nov 99
Posts: 358
Credit: 5,909,255
RAC: 0
Canada
Message 1799501 - Posted: 29 Jun 2016, 22:56:35 UTC - in response to Message 1799476.  

Wow! Thanks Jeff for sharing :-D
I thought going over the 100 limit might be a problem. I guess not!
Also, sounds like any type of swapping is not very beneficial for multiple GPU heavy crunchers.

Any thoughts about how many GPUs (like the GTX 750 Ti, which I have 2 of) might be the limit in an average box (say, a quad core with or w/o HT) in order to hit a sweet spot where A >= B most of the time? Obviously it depends on what the server throws at your rig (which doesn't seem to be very predictable).

The reason I'm asking is because the GTX 750 Ti seems to be one of the best value GPUs and it doesn't require an extra power source. Usually, if the mobo has 2 PCIe 16x slots avail, it's fairly easy to throw in 2 GTX 750 Ti
...and within a month you have a cruncher in the top 1000 (as per RAC rank) since it currently only requires ~13.5k.

Obviously the situation will likely be different early next year with all the new GPUs to be sold at the end of the year to support VR.

R ;-)
ID: 1799501 · Report as offensive
Profile Jeff Buck Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1799522 - Posted: 30 Jun 2016, 0:16:56 UTC - in response to Message 1799501.  

I don't think temporarily going over the 100 task limit by a modest amount should be a problem, but others may differ on that point. In a sense, I can justify it by the fact that I also have 3 crunchers which never come anywhere near that limit, and one that rarely does. For instance, my daily driver here currently only has 10 tasks assigned to the CPU. Even when I next make a rescheduling run (probably shortly after I write this) it appears that only 9 additional tasks will be added to the CPU, still 81 tasks under that arbitrary limit. I also have an old laptop running, with no GPU, that currently has just 7 tasks assigned, 93 tasks under the limit. Now, I could certainly bump up my work buffer to 10+ days, like some SETIzens do, and still be under 100, but I won't. So, I don't feel particularly guilty about having my two big crunchers temporarily exceed that limit after a rescheduling run.

<soapbox> Besides, I think many folks here feel like the CPU limit should be a core-based limit, rather than a host-based limit. And even if that limit is much less than 100 (50/40/25/whatever), an 8-core system should still be able to have more than 100 tasks ready to run on the CPUs.</soapbox>

I wouldn't say that swapping isn't beneficial for multiple GPU heavy crunchers. If I use RAC as a measure (tenuous, at best, I know, but...), even my big boxes have increased about 5-6% in the last week. Now, one of my concerns with rescheduling was that the rescheduling itself (quantum rescheduling?) would skew the credits and render that measure more unreliable than usual. That's why I did the comparisons shown in my post last evening, which I believe show no material effect on credits.

As far as finding a combination of CPUs and GPUs which would render rescheduling a (carbon) neutral endeavor, I haven't a clue. Certainly, you could look at your existing setup and physically count how many of each would theoretically be swapped in a rescheduling run at this moment. Then, perhaps do that once a day at the same time for a week and see how the mix changes. Then estimate how much additional processing time the CPU will be facing versus how much less the GPU will have. Then do math. Good luck! ;^)
ID: 1799522 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13720
Credit: 208,696,464
RAC: 304
Australia
Message 1799537 - Posted: 30 Jun 2016, 1:25:09 UTC - in response to Message 1799522.  

<soapbox> Besides, I think many folks here feel like the CPU limit should be a core-based limit, rather than a host-based limit.</soapbox>

Not really.
As it is, CPUs already have much larger caches than GPUs.
What would be good is if the 100 WU limit were per socket, not per device. That would allow those with multi-socket systems to get through the weekly outages without running out of work.
Grant
Darwin NT
ID: 1799537 · Report as offensive
Profile BilBg
Volunteer tester
Joined: 27 May 07
Posts: 3720
Credit: 9,385,827
RAC: 0
Bulgaria
Message 1799692 - Posted: 30 Jun 2016, 16:45:33 UTC - in response to Message 1799286.  
Last modified: 30 Jun 2016, 16:52:16 UTC

Matching client state changes to scheduler files...

If this means your script edits sched_request_setiathome.berkeley.edu.xml
- I don't think that is needed - BOINC creates a new sched_request_ file (without reading the previous one) before every scheduler request ([Update])
(i.e. BOINC never reads/parses sched_request_ files (you may even delete them), it only writes to them.)

The only thing your script needs to analyse and edit is client_state.xml
(the analysis of <app_version> ... <version_num> <plan_class> has to be done on client_state.xml and not on sched_request_*)
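
A minimal read-only sketch of that analysis step (illustrative Python only; it assumes the usual client_state.xml layout and nothing more):

import xml.etree.ElementTree as ET

# Read-only illustration: list the <app_version> entries in client_state.xml so
# a rescheduler knows which version_num / plan_class / platform combinations exist.
def list_app_versions(path="client_state.xml"):
    root = ET.parse(path).getroot()
    return [{
        "app_name": av.findtext("app_name"),
        "version_num": av.findtext("version_num"),
        "plan_class": av.findtext("plan_class", default=""),  # empty usually means a CPU app
        "platform": av.findtext("platform"),
    } for av in root.iter("app_version")]

for v in list_app_versions():
    print(v)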
 


- ALF - "Find out what you don't do well ..... then don't do it!" :)
 
ID: 1799692 · Report as offensive
Profile Mr. Kevvy Crowdfunding Project Donor*Special Project $250 donor
Volunteer moderator
Volunteer tester
Joined: 15 May 99
Posts: 3776
Credit: 1,114,826,392
RAC: 3,319
Canada
Message 1799694 - Posted: 30 Jun 2016, 16:53:21 UTC - in response to Message 1799692.  
Last modified: 30 Jun 2016, 16:53:35 UTC

Matching client state changes to scheduler files...

The only thing your script needs to edit is client_state.xml


I will try that out... early in testing I found it was losing work when client_state was out of sync with sched_request and sched_reply, but that may have had another cause. Thank you.
ID: 1799694 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1799697 - Posted: 30 Jun 2016, 17:05:09 UTC - in response to Message 1799694.  

... early in testing I found it was losing work when client_state was out of sync with sched_request and sched_reply, but that may have had another cause. Thank you.

'Request' should certainly be irrelevant, but the client needs to read the 'reply' file and merge the contents into the next written copy of client_state.xml

client_state.xml is the vital file, and it is only read once, as the client starts up (after that, the state is held in memory only, and there is no direct way of editing that). So you have to ensure that the client has shut down and written the definitive state file before starting any editing.
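
In script form, that constraint means bracketing any edit with a clean shutdown and restart. A hedged sketch follows (boinccmd --quit and starting the client with boinc --daemon are standard BOINC commands, but exact binary names, paths and wait times vary per platform; the edit itself is left as a stub):

import subprocess, time

def edit_state(path="client_state.xml"):
    pass   # whatever rescheduling edits you want to make go here (stub)

# 1. Ask the running client to shut down cleanly, so it writes a final,
#    definitive client_state.xml before anything is touched.
subprocess.run(["boinccmd", "--quit"], check=True)
time.sleep(15)   # crude; poll for the process to exit in real use

# 2. Edit the state file only while no client is running.
edit_state()

# 3. Restart the client; it reads client_state.xml exactly once at startup.
subprocess.Popen(["boinc", "--daemon"])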
ID: 1799697 · Report as offensive