BOINC 7.6.9 - Scheduler behaves strange.

Message boards : Number crunching : BOINC 7.6.9 - Scheduler behaves strange.

Ulrich Metzner
Volunteer tester
Joined: 3 Jul 02
Posts: 1256
Credit: 13,565,513
RAC: 13
Germany
Message 1741498 - Posted: 12 Nov 2015, 6:55:53 UTC

Hi there,

For about 2 days now, the scheduler in BOINC 7.6.9 has been behaving a little strangely on my system:

I have set AP7 GPU WUs to use 1 GPU + 0.5 CPU, so when 2 GPU AP7 tasks are running, one additional AP7 CPU task runs on my Core 2 Duo. For about 2 days now, 2(!) CPU tasks have been running, fighting each other for the remaining core. I changed nothing in app_config.xml; its timestamp is 12 August 2015.

So what is happening here?! :?
Aloha, Uli

WezH
Volunteer tester
Joined: 19 Aug 99
Posts: 576
Credit: 67,033,957
RAC: 95
Finland
Message 1741559 - Posted: 12 Nov 2015, 16:02:40 UTC - in response to Message 1741498.  

Just a guess:

BOINC thinks those two APs are in danger of missing their deadline, so it has put them in High Priority mode.

I don't know how to see in BOINC Manager whether those APs are in HP mode.

I use Efmer BoincTasks, which does show HP mode (and lots more).
Ulrich Metzner
Volunteer tester
Joined: 3 Jul 02
Posts: 1256
Credit: 13,565,513
RAC: 13
Germany
Message 1741639 - Posted: 12 Nov 2015, 23:35:15 UTC

Well, no, this never happened before...

Does nobody else have a helpful answer?
Oh dear... :/
Aloha, Uli

Darth Beaver
Joined: 20 Aug 99
Posts: 6728
Credit: 21,443,075
RAC: 3
Australia
Message 1741657 - Posted: 13 Nov 2015, 1:24:05 UTC

Ulrich Metzner, hmm, you say you have it set for 1 GPU + 0.5 CPU. I'm not sure what you mean; my client says

Running (0.04 CPU's + 0.5 GPU's)

I am running 2 units per GPU, so if you're only doing one unit per GPU, yours should say this instead:

Running (0.04 CPU's + 1 GPU's)

So I'm thinking you have changed the wrong setting:

<app_version>
    <app_name>astropulse_v7</app_name>
    <version_num>710</version_num>
    <platform>windows_x86_64</platform>
    <avg_ncpus>0.04</avg_ncpus>
    <max_ncpus>0.2</max_ncpus>
    <plan_class>opencl_nvidia_100</plan_class>
    <cmdline></cmdline>
    <coproc>
        <type>CUDA</type>
        <count>0.5</count>
    </coproc>
    ...
</app_version>


Have you changed the <avg_ncpus>0.04</avg_ncpus> and <max_ncpus>0.2</max_ncpus> settings to 1 instead of 0.04 and 0.2? If so, you are asking it to use a whole core rather than just 4% of one, which would mean you're asking it to run 3 units on a 2-core system.

Sorry if I misunderstood what you're asking; if so, it needs a bit more explanation from you.
BilBg
Volunteer tester
Joined: 27 May 07
Posts: 3720
Credit: 9,385,827
RAC: 0
Bulgaria
Message 1741690 - Posted: 13 Nov 2015, 6:04:25 UTC - in response to Message 1741498.  

So you use:
<app_config>
   <app>
      <name>astropulse_v7</name>
      <gpu_versions>
         <gpu_usage>1.0</gpu_usage>
         <cpu_usage>0.5</cpu_usage>
      </gpu_versions>
   </app>
</app_config>
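To check what the client should make of an app_config.xml like the one above, a short script can read the <gpu_usage> and <cpu_usage> values and work out how many tasks fit on one GPU. This is a minimal sketch using Python's standard xml.etree.ElementTree; the helper name summarize_app_config and the rule "tasks per GPU = floor(1 / gpu_usage)" are my own simplification, not BOINC code.

```python
import math
import xml.etree.ElementTree as ET

# The app_config.xml from the post above, embedded for the example.
APP_CONFIG = """
<app_config>
   <app>
      <name>astropulse_v7</name>
      <gpu_versions>
         <gpu_usage>1.0</gpu_usage>
         <cpu_usage>0.5</cpu_usage>
      </gpu_versions>
   </app>
</app_config>
"""


def summarize_app_config(xml_text):
    """Hypothetical helper: map each app to (gpu_usage, cpu_usage, tasks_per_gpu)."""
    root = ET.fromstring(xml_text.strip())
    summary = {}
    for app in root.findall("app"):
        gpu = app.find("gpu_versions")
        if gpu is None:
            continue  # this app has no GPU override, nothing to report
        gpu_usage = float(gpu.findtext("gpu_usage"))
        cpu_usage = float(gpu.findtext("cpu_usage"))
        # Simplification: one GPU holds floor(1 / gpu_usage) tasks at a time,
        # so 1.0 -> 1 task per GPU and 0.5 -> 2 tasks per GPU.
        tasks_per_gpu = math.floor(1.0 / gpu_usage)
        summary[app.findtext("name")] = (gpu_usage, cpu_usage, tasks_per_gpu)
    return summary


if __name__ == "__main__":
    for name, (gpu, cpu, per_gpu) in summarize_app_config(APP_CONFIG).items():
        print(f"{name}: {per_gpu} task(s) per GPU, {cpu} CPU budgeted per task")
```

Changing <gpu_usage> to 0.5 makes the same script report 2 tasks per GPU, which corresponds to the "0.5 GPU's" status string quoted earlier in the thread.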


And you see 2 AstroPulse GPU tasks running - one per GPU (GeForce GT 640 + GeForce GT 430)?
And you also see 2 CPU tasks running (= total 4 tasks running)?


As a test for a possible rounding bug in BOINC, you may try <cpu_usage>0.51</cpu_usage> or even <cpu_usage>0.99</cpu_usage>
(both 0.51 and 0.99 should free 1 core when 2 GPU tasks are running)

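You may also set some log_flags in cc_config.xml to see what BOINC is "thinking", for example:
<cc_config>
   <log_flags>
      <coproc_debug>1</coproc_debug>
      <cpu_sched>1</cpu_sched>
      <cpu_sched_debug>1</cpu_sched_debug>
   </log_flags>
</cc_config>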
 


- ALF - "Find out what you don't do well ..... then don't do it!" :)
 
William
Volunteer tester
Joined: 14 Feb 13
Posts: 2037
Credit: 17,689,662
RAC: 0
Message 1741715 - Posted: 13 Nov 2015, 10:02:19 UTC

A couple of debug flags, as already stated. cpu_sched_debug will show whether EDF is involved [though that only affects the order in which tasks are run, not how many run or where].
You might want to add the GPU flags too.

Flags can now be conveniently set and unset via Options > Event Log options.

Which rig? Ah, sorry, the XP one.

Finally, did that start with the upgrade to 7.6.9, or was it running OK at first?
Any other changes to the system? Drivers?
A person who won't read has no advantage over one who can't read. (Mark Twain)
Ulrich Metzner
Volunteer tester
Joined: 3 Jul 02
Posts: 1256
Credit: 13,565,513
RAC: 13
Germany
Message 1741734 - Posted: 13 Nov 2015, 12:06:44 UTC

Hi there,

Lots of good advice, thanks!
Meanwhile it has stopped running 2 CPU tasks and runs only 1.
I use the following configuration in app_info.xml:

...
    <app_version>
        <app_name>astropulse_v7</app_name>
        <version_num>710</version_num>
        <platform>windows_intelx86</platform>
        <avg_ncpus>0.5</avg_ncpus>
        <max_ncpus>0.5</max_ncpus>
        <plan_class>cuda_opencl_100</plan_class>
        <cmdline></cmdline>
        <coproc>
            <type>CUDA</type>
            <count>1</count>
        </coproc>
        <file_ref>
            <file_name>AP7_win_x86_SSE2_OpenCL_NV_r2887.exe</file_name>
            <main_program/>
        </file_ref>
        <file_ref>
            <file_name>libfftw3f-3-3-4_x86.dll</file_name>
        </file_ref>
        <file_ref>
            <file_name>ap_cmdline_win_x86_SSE2_OpenCL_NV.txt</file_name>
            <open_name>ap_cmdline.txt</open_name>
        </file_ref>
        <file_ref>
            <file_name>AstroPulse_NV_config.xml</file_name>
            <open_name>AstroPulse_NV_config.xml</open_name>
        </file_ref>
    </app_version>
...

I found out I could influence the number of CPU threads by changing the cache size. With 10 days it ran 2 CPU tasks; with a lower setting of ~5 days it ran only 1. Setting the CPU usage per GPU task to 0.51 also solved the problem. :?
Aloha, Uli

Darth Beaver
Joined: 20 Aug 99
Posts: 6728
Credit: 21,443,075
RAC: 3
Australia
Message 1741753 - Posted: 13 Nov 2015, 14:12:36 UTC - in response to Message 1741734.  

I found out I could influence the number of CPU threads by changing the cache size. With 10 days it ran 2 CPU tasks; with a lower setting of ~5 days it ran only 1.


Sounds like a bug that might have to be reported.
Darth Beaver
Joined: 20 Aug 99
Posts: 6728
Credit: 21,443,075
RAC: 3
Australia
Message 1741762 - Posted: 13 Nov 2015, 14:51:43 UTC

Ulrich Metzner, I am doing a test with your CPU settings, and I can already see that, if anything, it has slowed things down.

I believe the CPU only passes data to and from the GPU, in small amounts at a time; the GPU does all the work.

So dedicating 1 core to this wastes the core, as it sits there doing very little until it has to pass another bit of data to the GPU.

Having it set to 1 means you can't do 2 units per GPU with 1 core. Even though you may have an older CPU, I don't think it's so slow that you need more than 0.2, or 20%.

So I would change <avg_ncpus> back to 0.2 (<avg_ncpus>0.2</avg_ncpus>) to compensate for having an older CPU (being safe here), and change <max_ncpus>??</max_ncpus>, the maximum amount of CPU it can use, to 0.2 or 0.5, although I think 0.5 is a waste of cycles.

You will then be able to do 2 units per GPU and none on the CPU, or 1 unit on each GPU, leaving 1 CPU core free for other things or for 1 unit on the CPU.

I'm sure that if I'm wrong, someone will correct me.

I'm changing mine back to the default values; I can't see any real improvement in the units it has just done.
William
Volunteer tester
Joined: 14 Feb 13
Posts: 2037
Credit: 17,689,662
RAC: 0
Message 1741769 - Posted: 13 Nov 2015, 15:04:17 UTC - in response to Message 1741734.  
Last modified: 13 Nov 2015, 15:13:08 UTC

I found out I could influence the number of CPU threads by changing the cache size. With 10 days it ran 2 CPU tasks; with a lower setting of ~5 days it ran only 1. Setting the CPU usage per GPU task to 0.51 also solved the problem. :?

Which part of the cache setting: the first 'keep at least' setting (which, incidentally, for historical reasons BOINC uses as the old 'connect at least every x days') or the second 'additional work'?

If you used the first, then you probably sent BOINC into EDF. The 'high priority' run state message you used to get in that case was removed; these days you need cpu_sched_debug to check for EDF.
Running EDF might make BOINC use all cores, regardless of how much you reserved for the GPUs.

The 0.51 oddity has something to do with the way the CPU fractions are added up to reserve a full CPU, IIRC. Richard may remember more.

NB: BOINC only 'reserves' a core when the combined CPU usage as specified exceeds 1 (or 2, 3, etc.); BOINC then reduces the number of CPU tasks accordingly. Having 2 GPU tasks with 0.2 CPU each doesn't do anything at all.
Whether it is actually necessary to have a free core for feeding the GPUs, and what the performance gain might be if you do, is a completely different question.
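The reservation rule described above can be written as a small model: add up the CPU fraction of every running GPU task, reserve as many whole cores as that total covers (counting exactly 1.0 as one core, which matches Uli's normal 2 × 0.5 setup), and give the remaining cores to CPU tasks, unless EDF lets CPU tasks take every core. This is a hypothetical sketch of the behaviour described in this thread, not BOINC's scheduler code; cpu_tasks_allowed and its edf switch are made-up names.

```python
import math


def cpu_tasks_allowed(n_cores, gpu_task_cpu_usage, edf=False):
    """Hypothetical model of how many CPU-only tasks run beside the GPU tasks.

    n_cores            -- CPU cores BOINC is allowed to use
    gpu_task_cpu_usage -- the cpu_usage value of each running GPU task
    edf                -- True if queued CPU tasks are in deadline (EDF) mode
    """
    if edf:
        # As described in the thread: under EDF the client may fill every
        # core, whatever fraction was reserved for the GPU tasks.
        return n_cores
    # Whole cores covered by the summed fractions. The tiny tolerance is a
    # choice made for this sketch so that e.g. ten tasks at 0.1, which sum to
    # 0.9999999999999999 in floating point, still count as one full core.
    reserved = math.floor(sum(gpu_task_cpu_usage) + 1e-9)
    return max(n_cores - reserved, 0)


if __name__ == "__main__":
    # Uli's Core 2 Duo: two GPU tasks at 0.5 CPU each.
    print(cpu_tasks_allowed(2, [0.5, 0.5]))            # -> 1 (normal case)
    print(cpu_tasks_allowed(2, [0.5, 0.5], edf=True))  # -> 2 (deadline pressure)
    print(cpu_tasks_allowed(2, [0.2, 0.2]))            # -> 2 (0.2 each reserves nothing)
```

In this model the 2(!) CPU tasks Uli saw only appear when edf is True, which is consistent with the EDF explanation given above.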
A person who won't read has no advantage over one who can't read. (Mark Twain)
Ulrich Metzner
Volunteer tester
Joined: 3 Jul 02
Posts: 1256
Credit: 13,565,513
RAC: 13
Germany
Message 1741786 - Posted: 13 Nov 2015, 15:48:44 UTC
Last modified: 13 Nov 2015, 15:53:31 UTC

@Glenn:
When I use both cores for CPU crunching, the GPUs starve at ~60% usage, so overall system throughput drops. If I crunch 2 WUs per GPU, the system begins to lag and video streaming is a PITA.

@William:
Yes, I changed the first "Store at least xx days of work" setting for this. Maybe I sent BOINC into EDF; I can't reproduce it any more. I want to keep one core free for feeding the GPUs, for the reasons mentioned above.

[edit]
That's also the reason I keep using driver version 337.88 on this machine. All newer drivers upset the well-tuned balance between maximum performance and stutter-free video streaming. See also:
http://setiathome.berkeley.edu/forum_thread.php?id=77227&postid=1672992#1672992
Unfortunately, all the pictures there are gone... :/
Aloha, Uli

Darth Beaver
Joined: 20 Aug 99
Posts: 6728
Credit: 21,443,075
RAC: 3
Australia
Message 1741935 - Posted: 14 Nov 2015, 3:02:20 UTC - in response to Message 1741786.  
Last modified: 14 Nov 2015, 3:34:55 UTC

Ulrich Metzner

When I use both cores for CPU crunching, the GPUs starve at ~60% usage


Yes, that is correct, it will starve. You only have 2 cores, so running APs on both of them means you can't really run any on the GPUs, as you do need at least 1 core for those. So only run 1 unit on each GPU; that should still leave you 1 core to either run a unit or do other things.

With MBs it's not such a problem with the newer versions of BOINC.

A Core 2 Duo is not really up to feeding 2 GPUs; you just don't have enough cores.

My second machine is a Core 2 Quad, and I run only 1 GPU (a GTX 650) in it. I do only 2 units on the GPU and 2 units on the CPU, as I've noticed that if I try to use a 3rd core, GPU utilisation drops by about 3%. So even though I have 4 cores, even I'm limited.

If I use all the cores and 2 units on the GPU, it will hang.

I will be putting another GTX 650 in that machine, but once I do, I will run units only on the GPUs and not on the CPU; there just won't be enough cores to do any on the CPU.

Changing <avg_ncpus>0.2</avg_ncpus> will not help you much. If I were you, I would try using a cmdline and a single unit on each GPU; you may find that gives you more throughput. (I can't believe I just said that I agreed with someone :-) )

Or maybe it's time to upgrade. If you look on eBay, you may well be able to get a Core 2 Quad with 4 cores for not much more than AU$60 (50 euros), a cheap upgrade that will let you use the GPUs better.

EDIT: Look at the RAC of the Core 2 Quad; it has not topped out yet, as I only went back to crunching 2 weeks ago. Today's RAC is 6996, and it has gone down in the last couple of days because of the server problems.
Brent Norman
Volunteer tester
Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1741962 - Posted: 14 Nov 2015, 5:04:50 UTC

@Ulrich

I think your problem is RAM; 2 GB for 4 tasks is really pushing it.

I can't keep the 750 Ti in my AMD 4200+ busy with 1 GB of RAM... It seems fine with MB tasks, but it is REALLY slow with AP tasks compared to my i5.
Darth Beaver
Joined: 20 Aug 99
Posts: 6728
Credit: 21,443,075
RAC: 3
Australia
Message 1741973 - Posted: 14 Nov 2015, 6:09:01 UTC

Brent, I can see your i5 times suggest you're doing 3-4 units on the 750 GPU.

That machine HAS 4 CORES, so yes, you can do up to 3 APs on it (2 is better).

Your AMD HAS 2 CORES, hence it will be slow doing APs if you're asking it to do more than 2 units.

Ulrich

The RAM you have could be a problem. It's been a while since I used XP, but maybe raise it to 4 GB; beyond that, you will need to upgrade the operating system to at least Win 7.
Ulrich Metzner
Volunteer tester
Joined: 3 Jul 02
Posts: 1256
Credit: 13,565,513
RAC: 13
Germany
Message 1742019 - Posted: 14 Nov 2015, 12:54:59 UTC

Hi there,

I found out that BOINC is running in EDF:

...
14/11/2015 13:37:21 |  | [cpu_sched_debug] Request CPU reschedule: Core client configuration
14/11/2015 13:37:21 |  | [cpu_sched_debug] schedule_cpus(): start
14/11/2015 13:37:21 | SETI@home | [cpu_sched_debug] reserving 1.000000 of coproc NVIDIA
14/11/2015 13:37:21 | SETI@home | [cpu_sched_debug] add to run list: ap_19oc11ae_B0_P0_00052_20151112_02655.wu_0 (NVIDIA GPU, FIFO) (prio -1.000000)
14/11/2015 13:37:21 | SETI@home | [cpu_sched_debug] reserving 1.000000 of coproc NVIDIA
14/11/2015 13:37:21 | SETI@home | [cpu_sched_debug] add to run list: ap_19oc11ae_B0_P1_00052_20151112_02694.wu_0 (NVIDIA GPU, FIFO) (prio -1.020745)
14/11/2015 13:37:21 | SETI@home | [cpu_sched_debug] add to run list: ap_20oc11aa_B3_P0_00232_20151031_30892.wu_0 (CPU, EDF) (prio -1.041491)
14/11/2015 13:37:21 | SETI@home | [cpu_sched_debug] add to run list: ap_25se11ah_B0_P0_00344_20151031_03356.wu_1 (CPU, EDF) (prio -1.041667)
14/11/2015 13:37:21 |  | [cpu_sched_debug] enforce_run_list(): start
14/11/2015 13:37:21 |  | [cpu_sched_debug] preliminary job list:
14/11/2015 13:37:21 | SETI@home | [cpu_sched_debug] 0: ap_19oc11ae_B0_P0_00052_20151112_02655.wu_0 (MD: no; UTS: no)
14/11/2015 13:37:21 | SETI@home | [cpu_sched_debug] 1: ap_19oc11ae_B0_P1_00052_20151112_02694.wu_0 (MD: no; UTS: yes)
14/11/2015 13:37:21 | SETI@home | [cpu_sched_debug] 2: ap_20oc11aa_B3_P0_00232_20151031_30892.wu_0 (MD: yes; UTS: no)
14/11/2015 13:37:21 | SETI@home | [cpu_sched_debug] 3: ap_25se11ah_B0_P0_00344_20151031_03356.wu_1 (MD: yes; UTS: no)
14/11/2015 13:37:21 |  | [cpu_sched_debug] final job list:
14/11/2015 13:37:21 | SETI@home | [cpu_sched_debug] 0: ap_20oc11aa_B3_P0_00232_20151031_30892.wu_0 (MD: yes; UTS: no)
14/11/2015 13:37:21 | SETI@home | [cpu_sched_debug] 1: ap_25se11ah_B0_P0_00344_20151031_03356.wu_1 (MD: yes; UTS: no)
14/11/2015 13:37:21 | SETI@home | [cpu_sched_debug] 2: ap_19oc11ae_B0_P1_00052_20151112_02694.wu_0 (MD: no; UTS: yes)
14/11/2015 13:37:21 | SETI@home | [cpu_sched_debug] 3: ap_19oc11ae_B0_P0_00052_20151112_02655.wu_0 (MD: no; UTS: no)
14/11/2015 13:37:21 | SETI@home | [cpu_sched_debug] scheduling ap_20oc11aa_B3_P0_00232_20151031_30892.wu_0 (high priority)
14/11/2015 13:37:21 | SETI@home | [cpu_sched_debug] scheduling ap_25se11ah_B0_P0_00344_20151031_03356.wu_1 (high priority)
14/11/2015 13:37:21 | SETI@home | [cpu_sched_debug] scheduling ap_19oc11ae_B0_P1_00052_20151112_02694.wu_0
14/11/2015 13:37:21 | SETI@home | [cpu_sched_debug] scheduling ap_19oc11ae_B0_P0_00052_20151112_02655.wu_0
14/11/2015 13:37:21 |  | [cpu_sched_debug] enforce_run_list: end
...
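When the event log gets long, a few lines of Python can pick out which tasks the scheduler put on the run list in EDF mode. This sketch assumes the "[cpu_sched_debug] add to run list: NAME (RESOURCE, MODE) (prio X)" line format seen in the excerpt above; other client versions may word these messages differently, and the function name edf_tasks is made up.

```python
import re
import sys

# Matches run-list lines such as:
# ... [cpu_sched_debug] add to run list: ap_xxx.wu_0 (CPU, EDF) (prio -1.041491)
RUN_LIST_RE = re.compile(
    r"\[cpu_sched_debug\] add to run list: (?P<task>\S+) "
    r"\((?P<resource>[^,]+), (?P<mode>\w+)\) \(prio (?P<prio>-?[\d.]+)\)"
)


def edf_tasks(log_lines):
    """Return (task, resource, prio) for every run-list entry marked EDF."""
    found = []
    for line in log_lines:
        m = RUN_LIST_RE.search(line)
        if m and m.group("mode") == "EDF":
            found.append((m.group("task"), m.group("resource"), float(m.group("prio"))))
    return found


if __name__ == "__main__":
    # Usage: python edf_tasks.py saved_event_log.txt  (the file name is arbitrary)
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for task, resource, prio in edf_tasks(log):
            print(f"{task} on {resource} (prio {prio})")
```

On Uli's excerpt this picks out the two CPU AstroPulse tasks, the same two that the final job list schedules as "(high priority)".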


RAM shouldn't be an issue: the machine has 4 GB, but the 2 graphics cards eat into it because their memory is mapped into the "real" 4 GB address space.
See the screenshots from Task Manager:

This system runs comfortably with the status quo.
Aloha, Uli

Brent Norman
Volunteer tester
Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1742024 - Posted: 14 Nov 2015, 13:36:25 UTC

Your cache is too large! BOINC will take over your CPU even if you are limiting it... it says, "I NEED TO RUN THESE before the deadline."

I have seen this before on my AMD; BOINC completely took over when my cache was too big.

Set your cache to a realistic value; I think 3 days should be good for that computer.
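Why a bigger cache can tip the client into deadline mode can be shown with a crude model: queue the cached CPU tasks in order on the cores left for CPU work, and flag any task whose estimated finish time falls after its deadline. A bigger cache means more queued hours, so more tasks get flagged. This is a hypothetical simplification, not the client's own deadline check; the function name at_risk, the first-come order and all task lengths and deadlines in the example are made up for illustration, not taken from Uli's host.

```python
import heapq


def at_risk(tasks, cores):
    """Return names of queued CPU tasks that would finish after their deadline.

    tasks -- list of (name, remaining_hours, hours_until_deadline), in run order
    cores -- number of cores available to CPU tasks
    """
    free_at = [0.0] * cores  # time at which each core becomes free
    heapq.heapify(free_at)
    late = []
    for name, remaining, deadline in tasks:
        start = heapq.heappop(free_at)  # earliest available core
        finish = start + remaining
        heapq.heappush(free_at, finish)
        if finish > deadline:
            late.append(name)
    return late


if __name__ == "__main__":
    # Made-up numbers: twelve 20-hour CPU tasks, each due in 200 hours.
    queue = [(f"ap_{i}", 20.0, 200.0) for i in range(12)]
    print(at_risk(queue, cores=1))  # -> ['ap_10', 'ap_11']
    print(at_risk(queue, cores=2))  # -> []
```

In this toy run the queue overflows one core but fits on two, which is the same kind of pressure that would push a real client to run a second CPU task; with a smaller queue, one core is enough.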
ChrisD
Volunteer tester
Joined: 25 Sep 99
Posts: 158
Credit: 2,496,342
RAC: 0
Denmark
Message 1742025 - Posted: 14 Nov 2015, 13:36:42 UTC
Last modified: 14 Nov 2015, 13:38:28 UTC

This is not the only thing that does not work with 7.6.9.

Project: SETI@Home, Resource share 600
Project: SETI@HomeBeta, Resource share 100.

This setting should give me 1 hour of crunching for Beta for every 6 hours for SETI@Home.

Here is what happened:

As soon as I enabled the Beta project, the machine started crunching Beta tasks and kept at it for a solid 10 hours before I decided to take action.

All SETI@Home tasks were suspended. That seemed OK, but with a setting like "switch between tasks every 30 minutes", the Beta tasks should have been suspended after 30 minutes and normal SETI tasks should have resumed.

The only way I can get normal SETI tasks running is to stop work fetch on Beta and then suspend all Beta tasks.

??

ChrisD
rob smith
Volunteer moderator
Volunteer tester
Joined: 7 Mar 03
Posts: 22190
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1742035 - Posted: 14 Nov 2015, 14:39:45 UTC

This is not the only thing that does not work with 7.6.9.

Project: SETI@Home, Resource share 600
Project: SETI@HomeBeta, Resource share 100.

This setting should give me 1 hour of crunching for Beta for every 6 hours for SETI@Home.


While one might expect that, it hasn't been the case for a good few years (and many versions of BOINC).
The project-scheduling part of BOINC works on a "work achieved balance", not a time slice. If the share between two projects is set at 6:4 and you start with a work-achieved balance of 5:5, the scheduler will preferentially run the first project until the balance is approximately 6:4; then it will run the other project for a bit, and swap back and forth so that, in the long term, the work-achieved balance stays at approximately 6:4.
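The "work achieved balance" idea can be sketched as: pick the project whose fraction of the work done so far is furthest below its fraction of the resource share. This only illustrates the principle in the post above; the function next_project and the plain ratio comparison are assumptions, and the real client keeps its own accounting.

```python
def next_project(resource_share, work_done):
    """Return the project that is furthest behind its resource share.

    resource_share -- {project: share}, e.g. {"SETI@home": 600, "SETI@home Beta": 100}
    work_done      -- {project: work achieved so far, in any consistent unit}
    """
    total_share = sum(resource_share.values())
    total_work = sum(work_done.values())

    def deficit(project):
        target = resource_share[project] / total_share
        actual = work_done.get(project, 0.0) / total_work if total_work else 0.0
        return target - actual  # positive means the project is owed work

    return max(resource_share, key=deficit)


if __name__ == "__main__":
    shares = {"SETI@home": 600, "SETI@home Beta": 100}
    done = {"SETI@home": 1000.0, "SETI@home Beta": 0.0}
    print(next_project(shares, done))  # -> SETI@home Beta
```

With ChrisD's shares of 600 and 100 and no Beta work done yet, the sketch keeps choosing Beta until Beta's fraction of the work done reaches about 1/7, which matches the long Beta run he describes; with Rob's 6:4 share and a 5:5 balance, it chooses the first project.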
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ChrisD
Volunteer tester
Joined: 25 Sep 99
Posts: 158
Credit: 2,496,342
RAC: 0
Denmark
Message 1742093 - Posted: 14 Nov 2015, 17:42:10 UTC - in response to Message 1742035.  

If I understand you right, the project scheduler will keep running Beta tasks until the Beta score reaches 1/6 of my SETI@Home score.

This may take a while :)

So what I do is simply suspend SETI@Home for a couple of hours every day. During that time the computer chews its way through Beta tasks.

Maybe some time in the future this ratio could apply to CPU cores. That way my 8-core CPU would crunch 6 standard SETI tasks and 2 Beta tasks.

Wishful thinking :)

Thanks for your answer, and happy crunching.

ChrisD
rob smith
Volunteer moderator
Volunteer tester
Joined: 7 Mar 03
Posts: 22190
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1742094 - Posted: 14 Nov 2015, 17:47:02 UTC

Err, just about. It is actually some sort of rolling average, not "all time", but I'm not sure of the time base for the average.
I agree, it would be nice if it were some sensible time slice, as that would be far easier to understand and explain.
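A "rolling average" of this kind can be pictured as an exponentially decaying total, where older work counts for less and less. The sketch below is only an illustration: the name decayed_total and the 10-day half-life are assumptions, since the thread does not say what time base the client actually uses.

```python
def decayed_total(samples, now, half_life_days):
    """Sum (day, work) samples, halving each sample's weight every half_life_days."""
    return sum(
        work * 0.5 ** ((now - day) / half_life_days)
        for day, work in samples
    )


if __name__ == "__main__":
    HALF_LIFE = 10.0  # assumed value, for illustration only
    history = [(0, 100.0), (10, 100.0)]  # 100 units of work on day 0 and on day 10
    print(decayed_total(history, now=10, half_life_days=HALF_LIFE))  # -> 150.0
```

Because old work fades, a project with a long history does not stay "ahead" forever, so the catch-up period after adding a new project is finite rather than based on all-time totals.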
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?



 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.