All CPU tasks not running. Now all are "Waiting to run"

Questions and Answers : Unix/Linux

Keith Myers (Volunteer tester, United States)
Message 1969035 - Posted: 7 Dec 2018, 2:12:01 UTC

OK, after changing my cc_config to exclude the new 2080 card from both Einstein and GPUGrid, now only the four gpus are running tasks. None of the cpu cores are running cpu tasks.

cpu_sched_debug shows only "using 4 out of 24 CPUs"

So what has happened to the cpu tasks? Did the exclude of the 2080 card shift work away from the cpus?
Even suspending all projects other than Seti won't get the cpu tasks that were already started running again.

Anyone have a clue what has happened on the host?
https://setiathome.berkeley.edu/show_host_detail.php?hostid=8030022

Keith Myers (Volunteer tester, United States)
Message 1969039 - Posted: 7 Dec 2018, 2:40:51 UTC

If I remove my exclusion statements from cc_config.xml then I am able to run cpu tasks normally. So how do I exclude the RTX 2080 from Einstein and GPUGrid projects and still run Seti cpu tasks?
These are my exclusion statements.

    <exclude_gpu>
        <url>http://einstein.phys.uwm.edu/</url>
        <device_num>0</device_num>
        <app>hsgamma_FGRPB1G</app>
    </exclude_gpu>
    <exclude_gpu>
        <url>http://www.gpugrid.net/</url>
        <device_num>0</device_num>
        <app>acemdshort</app>
    </exclude_gpu>
    <exclude_gpu>
        <url>http://www.gpugrid.net/</url>
        <device_num>0</device_num>
        <app>acemdlong</app>
    </exclude_gpu>
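
These elements live inside the <options> section of cc_config.xml; for completeness, the whole file looks roughly like this (a sketch, with the same URLs, apps and device number as above):

<cc_config>
    <options>
        <exclude_gpu>
            <url>http://einstein.phys.uwm.edu/</url>
            <device_num>0</device_num>
            <app>hsgamma_FGRPB1G</app>
        </exclude_gpu>
        <!-- the two GPUGRID excludes above go here as further <exclude_gpu> elements -->
    </options>
</cc_config>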


The exclusions are picked up in the Event Log properly with no syntax errors.

06-Dec-2018 18:29:27 [---] OS: Linux Ubuntu: Ubuntu 18.04.1 LTS [4.15.0-42-generic]
06-Dec-2018 18:29:27 [---] Memory: 31.34 GB physical, 2.00 GB virtual
06-Dec-2018 18:29:27 [---] Disk: 227.74 GB total, 204.38 GB free
06-Dec-2018 18:29:27 [---] Local time is UTC -8 hours
06-Dec-2018 18:29:27 [Einstein@Home] Found app_config.xml
06-Dec-2018 18:29:27 [GPUGRID] Found app_config.xml
06-Dec-2018 18:29:27 [Milkyway@Home] Found app_config.xml
06-Dec-2018 18:29:27 [SETI@home] Found app_config.xml
06-Dec-2018 18:29:27 [---] Config: GUI RPC allowed from any host
06-Dec-2018 18:29:27 [Einstein@Home] Config: excluded GPU.  Type: all.  App: hsgamma_FGRPB1G.  Device: 0
06-Dec-2018 18:29:27 [GPUGRID] Config: excluded GPU.  Type: all.  App: acemdshort.  Device: 0
06-Dec-2018 18:29:27 [GPUGRID] Config: excluded GPU.  Type: all.  App: acemdlong.  Device: 0
06-Dec-2018 18:29:27 [---] Config: use all coprocessors


Richard Haselgrove (Volunteer tester, United Kingdom)
Message 1969090 - Posted: 7 Dec 2018, 14:32:25 UTC

I've run a very similar config for several years, with no problems:

    <exclude_gpu>
        <url>http://www.gpugrid.net/</url>
        <device_num>1</device_num>
        <type>NVIDIA</type>
    </exclude_gpu>
It starts up as you would expect:

18-Nov-2018 16:16:34 [---] Starting BOINC client version 7.14.2 for windows_x86_64
18-Nov-2018 16:16:34 [---] log flags: file_xfer, sched_ops, task, cpu_sched, sched_op_debug
18-Nov-2018 16:16:34 [---] Libraries: libcurl/7.47.1 OpenSSL/1.0.2g zlib/1.2.8
18-Nov-2018 16:16:34 [---] Data directory: D:\BOINCdata
18-Nov-2018 16:16:34 [---] Running under account Richard Haselgrove
18-Nov-2018 16:16:36 [---] CUDA: NVIDIA GPU 0: GeForce GTX 970 (driver version 373.06, CUDA version 8.0, compute capability 5.2, 4096MB, 3066MB available, 4087 GFLOPS peak)
18-Nov-2018 16:16:36 [---] CUDA: NVIDIA GPU 1: GeForce GTX 750 Ti (driver version 373.06, CUDA version 8.0, compute capability 5.0, 2048MB, 1904MB available, 1639 GFLOPS peak)
18-Nov-2018 16:16:36 [---] OpenCL: NVIDIA GPU 0: GeForce GTX 970 (driver version 373.06, device version OpenCL 1.2 CUDA, 4096MB, 3066MB available, 4087 GFLOPS peak)
18-Nov-2018 16:16:36 [---] OpenCL: NVIDIA GPU 1: GeForce GTX 750 Ti (driver version 373.06, device version OpenCL 1.2 CUDA, 2048MB, 1904MB available, 1639 GFLOPS peak)
18-Nov-2018 16:16:36 [---] OpenCL: Intel GPU 0: Intel(R) HD Graphics 4600 (driver version 10.18.10.3621, device version OpenCL 1.2, 1298MB, 1298MB available, 192 GFLOPS peak)
18-Nov-2018 16:16:36 [---] OpenCL CPU: Intel(R) Core(TM) i5-4690 CPU @ 3.50GHz (OpenCL driver vendor: Intel(R) Corporation, driver version 3.0.1.10878, device version OpenCL 1.2 (Build 76413))

18-Nov-2018 16:16:36 [---] Processor: 4 GenuineIntel Intel(R) Core(TM) i5-4690 CPU @ 3.50GHz [Family 6 Model 60 Stepping 3]

18-Nov-2018 16:16:36 [Einstein@Home] Found app_config.xml
18-Nov-2018 16:16:36 [GPUGRID] Found app_config.xml
18-Nov-2018 16:16:36 [SETI@home] Found app_config.xml

18-Nov-2018 16:16:36 [GPUGRID] Config: excluded GPU.  Type: NVIDIA.  App: all.  Device: 1
18-Nov-2018 16:16:36 [---] Config: use all coprocessors
The object of the exercise is to run GPUGrid only on the fastest card, but allow backup projects to run on that card if GPUGrid is out of work. The slight difference - I've excluded by <type> explicitly, and not worried about <app> - shouldn't make any difference, according to the manual:

<exclude_gpu>
Don't use the given GPU for the given project. If <device_num> is not specified, exclude all GPUs of the given type. <type> is required if your computer has more than one type of GPU; otherwise it can be omitted. <app> specifies the short name of an application (i.e. the <name> element within the <app> element in client_state.xml). If specified, only tasks for that app are excluded. You may include multiple <exclude_gpu> elements. If you change GPU exclusions, you must restart the BOINC client for these changes to take effect. If you want to exclude the GPU use for all projects, look at the <ignore_ati_dev>, <ignore_nvidia_dev> and <ignore_intel_dev> options further down. Requires a client restart.
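
Taken literally, that means an exclude with no <device_num> covers every GPU of the given type for that project; something like this sketch (not from my own config):

    <exclude_gpu>
        <url>http://www.gpugrid.net/</url>
        <type>NVIDIA</type>
    </exclude_gpu>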
The only curiosity occurs when I examine the scheduling decisions:

07-Dec-2018 13:32:00 [---] [cpu_sched_debug] schedule_cpus(): start
07-Dec-2018 13:32:00 [GPUGRID] [cpu_sched_debug] add to run list: e9s181_e7s90p0f68-PABLO_v2Q9UM73_MOR_14_IDP-0-2-RND5567_0 (NVIDIA GPU, FIFO) (prio -1.151271)
07-Dec-2018 13:32:00 [SETI@home] [cpu_sched_debug] add to run list: blc13_2bit_guppi_58405_83308_HIP86141_0020.11756.818.22.45.156.vlar_0 (NVIDIA GPU, FIFO) (prio -2.615060)
07-Dec-2018 13:32:00 [SETI@home] [cpu_sched_debug] add to run list: blc13_2bit_guppi_58405_85972_GJ687_0028.11957.818.22.45.238.vlar_0 (NVIDIA GPU, FIFO) (prio -2.717740)
07-Dec-2018 13:32:00 [SETI@home] [cpu_sched_debug] add to run list: blc13_2bit_guppi_58405_83308_HIP86141_0020.11735.818.22.45.96.vlar_0 (NVIDIA GPU, FIFO) (prio -2.820420)
07-Dec-2018 13:32:00 [SETI@home] [cpu_sched_debug] add to run list: blc13_2bit_guppi_58405_83308_HIP86141_0020.11746.818.21.44.96.vlar_1 (NVIDIA GPU, FIFO) (prio -2.923100)
07-Dec-2018 13:32:00 [SETI@home] [cpu_sched_debug] add to run list: blc13_2bit_guppi_58405_83308_HIP86141_0020.11756.818.22.45.187.vlar_1 (NVIDIA GPU, FIFO) (prio -3.025780)
07-Dec-2018 13:32:00 [SETI@home] [cpu_sched_debug] add to run list: blc13_2bit_guppi_58405_85972_GJ687_0028.28463.818.22.45.204.vlar_1 (NVIDIA GPU, FIFO) (prio -3.128460)
07-Dec-2018 13:32:00 [SETI@home] [cpu_sched_debug] add to run list: blc13_2bit_guppi_58405_84630_HIP86141_0024.28612.818.21.44.182.vlar_0 (NVIDIA GPU, FIFO) (prio -3.231140)
07-Dec-2018 13:32:00 [SETI@home] [cpu_sched_debug] add to run list: blc13_2bit_guppi_58405_84630_HIP86141_0024.28612.818.21.44.179.vlar_1 (NVIDIA GPU, FIFO) (prio -3.333820)
07-Dec-2018 13:32:00 [Einstein@Home] [cpu_sched_debug] reserving 1.000000 of coproc intel_gpu
07-Dec-2018 13:32:00 [Einstein@Home] [cpu_sched_debug] add to run list: p2030.20170408.G39.56+01.96.N.b1s0g0.00000_1540_1 (Intel GPU, FIFO) (prio -0.062618)
07-Dec-2018 13:32:00 [NumberFields@home] [cpu_sched_debug] add to run list: wu_sf5_DS-13x11_Grp157060of666667_0 (CPU, FIFO) (prio -0.019780)
07-Dec-2018 13:32:00 [NumberFields@home] [cpu_sched_debug] add to run list: wu_sf5_DS-13x11_Grp182717of666667_0 (CPU, FIFO) (prio -0.020064)
07-Dec-2018 13:32:00 [NumberFields@home] [cpu_sched_debug] add to run list: wu_sf5_DS-13x11_Grp160060of666667_0 (CPU, FIFO) (prio -0.020348)
07-Dec-2018 13:32:00 [NumberFields@home] [cpu_sched_debug] add to run list: wu_sf5_DS-13x11_Grp186020of666667_0 (CPU, FIFO) (prio -0.020632)
07-Dec-2018 13:32:00 [---] [cpu_sched_debug] enforce_run_list(): start
07-Dec-2018 13:32:00 [---] [cpu_sched_debug] preliminary job list:
07-Dec-2018 13:32:00 [GPUGRID] [cpu_sched_debug] 0: e9s181_e7s90p0f68-PABLO_v2Q9UM73_MOR_14_IDP-0-2-RND5567_0 (MD: no; UTS: yes)
07-Dec-2018 13:32:00 [SETI@home] [cpu_sched_debug] 1: blc13_2bit_guppi_58405_83308_HIP86141_0020.11756.818.22.45.156.vlar_0 (MD: no; UTS: yes)
07-Dec-2018 13:32:00 [SETI@home] [cpu_sched_debug] 2: blc13_2bit_guppi_58405_85972_GJ687_0028.11957.818.22.45.238.vlar_0 (MD: no; UTS: no)
07-Dec-2018 13:32:00 [SETI@home] [cpu_sched_debug] 3: blc13_2bit_guppi_58405_83308_HIP86141_0020.11735.818.22.45.96.vlar_0 (MD: no; UTS: no)
07-Dec-2018 13:32:00 [SETI@home] [cpu_sched_debug] 4: blc13_2bit_guppi_58405_83308_HIP86141_0020.11746.818.21.44.96.vlar_1 (MD: no; UTS: no)
07-Dec-2018 13:32:00 [SETI@home] [cpu_sched_debug] 5: blc13_2bit_guppi_58405_83308_HIP86141_0020.11756.818.22.45.187.vlar_1 (MD: no; UTS: no)
07-Dec-2018 13:32:00 [SETI@home] [cpu_sched_debug] 6: blc13_2bit_guppi_58405_85972_GJ687_0028.28463.818.22.45.204.vlar_1 (MD: no; UTS: no)
07-Dec-2018 13:32:00 [SETI@home] [cpu_sched_debug] 7: blc13_2bit_guppi_58405_84630_HIP86141_0024.28612.818.21.44.182.vlar_0 (MD: no; UTS: no)
07-Dec-2018 13:32:00 [SETI@home] [cpu_sched_debug] 8: blc13_2bit_guppi_58405_84630_HIP86141_0024.28612.818.21.44.179.vlar_1 (MD: no; UTS: no)
07-Dec-2018 13:32:00 [Einstein@Home] [cpu_sched_debug] 9: p2030.20170408.G39.56+01.96.N.b1s0g0.00000_1540_1 (MD: no; UTS: yes)
07-Dec-2018 13:32:00 [NumberFields@home] [cpu_sched_debug] 10: wu_sf5_DS-13x11_Grp157060of666667_0 (MD: no; UTS: yes)
07-Dec-2018 13:32:00 [NumberFields@home] [cpu_sched_debug] 11: wu_sf5_DS-13x11_Grp182717of666667_0 (MD: no; UTS: yes)
07-Dec-2018 13:32:00 [NumberFields@home] [cpu_sched_debug] 12: wu_sf5_DS-13x11_Grp160060of666667_0 (MD: no; UTS: yes)
07-Dec-2018 13:32:00 [NumberFields@home] [cpu_sched_debug] 13: wu_sf5_DS-13x11_Grp186020of666667_0 (MD: no; UTS: no)
07-Dec-2018 13:32:00 [---] [cpu_sched_debug] final job list:
07-Dec-2018 13:32:00 [GPUGRID] [cpu_sched_debug] 0: e9s181_e7s90p0f68-PABLO_v2Q9UM73_MOR_14_IDP-0-2-RND5567_0 (MD: no; UTS: yes)
07-Dec-2018 13:32:00 [SETI@home] [cpu_sched_debug] 1: blc13_2bit_guppi_58405_83308_HIP86141_0020.11756.818.22.45.156.vlar_0 (MD: no; UTS: yes)
07-Dec-2018 13:32:00 [Einstein@Home] [cpu_sched_debug] 2: p2030.20170408.G39.56+01.96.N.b1s0g0.00000_1540_1 (MD: no; UTS: yes)
07-Dec-2018 13:32:00 [SETI@home] [cpu_sched_debug] 3: blc13_2bit_guppi_58405_85972_GJ687_0028.11957.818.22.45.238.vlar_0 (MD: no; UTS: no)
07-Dec-2018 13:32:00 [SETI@home] [cpu_sched_debug] 4: blc13_2bit_guppi_58405_83308_HIP86141_0020.11735.818.22.45.96.vlar_0 (MD: no; UTS: no)
07-Dec-2018 13:32:00 [SETI@home] [cpu_sched_debug] 5: blc13_2bit_guppi_58405_83308_HIP86141_0020.11746.818.21.44.96.vlar_1 (MD: no; UTS: no)
07-Dec-2018 13:32:00 [SETI@home] [cpu_sched_debug] 6: blc13_2bit_guppi_58405_83308_HIP86141_0020.11756.818.22.45.187.vlar_1 (MD: no; UTS: no)
07-Dec-2018 13:32:00 [SETI@home] [cpu_sched_debug] 7: blc13_2bit_guppi_58405_85972_GJ687_0028.28463.818.22.45.204.vlar_1 (MD: no; UTS: no)
07-Dec-2018 13:32:00 [SETI@home] [cpu_sched_debug] 8: blc13_2bit_guppi_58405_84630_HIP86141_0024.28612.818.21.44.182.vlar_0 (MD: no; UTS: no)
07-Dec-2018 13:32:00 [SETI@home] [cpu_sched_debug] 9: blc13_2bit_guppi_58405_84630_HIP86141_0024.28612.818.21.44.179.vlar_1 (MD: no; UTS: no)
07-Dec-2018 13:32:00 [NumberFields@home] [cpu_sched_debug] 10: wu_sf5_DS-13x11_Grp157060of666667_0 (MD: no; UTS: yes)
07-Dec-2018 13:32:00 [NumberFields@home] [cpu_sched_debug] 11: wu_sf5_DS-13x11_Grp182717of666667_0 (MD: no; UTS: yes)
07-Dec-2018 13:32:00 [NumberFields@home] [cpu_sched_debug] 12: wu_sf5_DS-13x11_Grp160060of666667_0 (MD: no; UTS: yes)
07-Dec-2018 13:32:00 [NumberFields@home] [cpu_sched_debug] 13: wu_sf5_DS-13x11_Grp186020of666667_0 (MD: no; UTS: no)
07-Dec-2018 13:32:00 [GPUGRID] [cpu_sched_debug] scheduling e9s181_e7s90p0f68-PABLO_v2Q9UM73_MOR_14_IDP-0-2-RND5567_0
07-Dec-2018 13:32:00 [SETI@home] [cpu_sched_debug] scheduling blc13_2bit_guppi_58405_83308_HIP86141_0020.11756.818.22.45.156.vlar_0
07-Dec-2018 13:32:00 [Einstein@Home] [cpu_sched_debug] scheduling p2030.20170408.G39.56+01.96.N.b1s0g0.00000_1540_1
07-Dec-2018 13:32:00 [NumberFields@home] [cpu_sched_debug] scheduling wu_sf5_DS-13x11_Grp157060of666667_0
07-Dec-2018 13:32:00 [NumberFields@home] [cpu_sched_debug] scheduling wu_sf5_DS-13x11_Grp182717of666667_0
07-Dec-2018 13:32:00 [NumberFields@home] [cpu_sched_debug] scheduling wu_sf5_DS-13x11_Grp160060of666667_0
07-Dec-2018 13:32:00 [NumberFields@home] [cpu_sched_debug] all CPUs used (4.60 >= 4), skipping wu_sf5_DS-13x11_Grp186020of666667_0
07-Dec-2018 13:32:00 [---] [cpu_sched_debug] enforce_run_list: end
The client adds too many SETI tasks (to be run on device 1, the 750Ti) to the preliminary and final job lists, but is only allowed to schedule one of them.

I have seen cases where the provisional list is filled up with so many (too many) infeasible tasks that it never adds feasible ones from other projects:

Client: premature check for max_concurrent can starve resources #1677

but we're getting into very deep waters there.

I think I'd suggest:

1) Include a <type> tag in the GPU exclude (can't do any harm; see the sketch below)
2) Post a fuller startup log, including the GPU detection lines and CPU core count
3) Stand by to post a <cpu_sched_debug> event log (just one cycle!) as above, if changing the exclude doesn't cure it.
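
For clarity, applying (1) to the excludes posted above would look something like this sketch (same URL, app and device number, just with <type> added; repeat for the other two excludes):

    <exclude_gpu>
        <url>http://einstein.phys.uwm.edu/</url>
        <device_num>0</device_num>
        <type>NVIDIA</type>
        <app>hsgamma_FGRPB1G</app>
    </exclude_gpu>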

Keith Myers (Volunteer tester, United States)
Message 1969112 - Posted: 7 Dec 2018, 17:58:34 UTC - in response to Message 1969090.

I've run a very similar config for several years, with no problems: [...]

I have seen cases where the provisional list is filled up with so many (too many) infeasible tasks that it never adds feasible ones from other projects:

Client: premature check for max_concurrent can starve resources #1677

but we're getting into very deep waters there.

I think I'd suggest:

1) Include a <type> tag in the GPU exclude (can't do any harm)
2) Post a fuller startup log, including the GPU detection lines and CPU core count
3) Stand by to post a <cpu_sched_debug> event log (just one cycle!) as above, if changing the exclude doesn't cure it.

Hi and thanks, Richard. I wondered about <type> when the status reported in the log shows <all>, and whether it affected cpus for some reason. That github bug might be coming into play, as I DO use a max_concurrent statement in every project's app_config.

Keith Myers (Volunteer tester, United States)
Message 1969117 - Posted: 7 Dec 2018, 18:13:35 UTC

OK, just tried with the type added and the app removed for both projects. It is still stopping all cpu work.

Fri 07 Dec 2018 10:05:32 AM PST |  | Starting BOINC client version 7.8.3 for x86_64-pc-linux-gnu
Fri 07 Dec 2018 10:05:32 AM PST |  | log flags: file_xfer, sched_ops, task, sched_op_debug
Fri 07 Dec 2018 10:05:32 AM PST |  | Libraries: libcurl/7.58.0 OpenSSL/1.0.2n zlib/1.2.11 libidn2/2.0.4 libpsl/0.19.1 (+libidn2/2.0.4) nghttp2/1.30.0 librtmp/2.3
Fri 07 Dec 2018 10:05:32 AM PST |  | Data directory: /home/keith/Desktop/BOINC
Fri 07 Dec 2018 10:05:33 AM PST |  | CUDA: NVIDIA GPU 0: GeForce RTX 2080 (driver version 410.78, CUDA version 10.0, compute capability 7.5, 4096MB, 3978MB available, 21197 GFLOPS peak)
Fri 07 Dec 2018 10:05:33 AM PST |  | CUDA: NVIDIA GPU 1: GeForce GTX 1080 Ti (driver version 410.78, CUDA version 10.0, compute capability 6.1, 4096MB, 3976MB available, 11974 GFLOPS peak)
Fri 07 Dec 2018 10:05:33 AM PST |  | CUDA: NVIDIA GPU 2: GeForce GTX 1070 (driver version 410.78, CUDA version 10.0, compute capability 6.1, 4096MB, 3984MB available, 6463 GFLOPS peak)
Fri 07 Dec 2018 10:05:33 AM PST |  | CUDA: NVIDIA GPU 3: GeForce GTX 1070 (driver version 410.78, CUDA version 10.0, compute capability 6.1, 4096MB, 3984MB available, 6463 GFLOPS peak)
Fri 07 Dec 2018 10:05:33 AM PST |  | OpenCL: NVIDIA GPU 0: GeForce RTX 2080 (driver version 410.78, device version OpenCL 1.2 CUDA, 7944MB, 3978MB available, 21197 GFLOPS peak)
Fri 07 Dec 2018 10:05:33 AM PST |  | OpenCL: NVIDIA GPU 1: GeForce GTX 1080 Ti (driver version 410.78, device version OpenCL 1.2 CUDA, 11178MB, 3976MB available, 11974 GFLOPS peak)
Fri 07 Dec 2018 10:05:33 AM PST |  | OpenCL: NVIDIA GPU 2: GeForce GTX 1070 (driver version 410.78, device version OpenCL 1.2 CUDA, 8120MB, 3984MB available, 6463 GFLOPS peak)
Fri 07 Dec 2018 10:05:33 AM PST |  | OpenCL: NVIDIA GPU 3: GeForce GTX 1070 (driver version 410.78, device version OpenCL 1.2 CUDA, 8120MB, 3984MB available, 6463 GFLOPS peak)
Fri 07 Dec 2018 10:05:33 AM PST | SETI@home | Found app_info.xml; using anonymous platform
Fri 07 Dec 2018 10:05:33 AM PST |  | Host name: Numbskull
Fri 07 Dec 2018 10:05:33 AM PST |  | Processor: 24 AuthenticAMD AMD Ryzen Threadripper 2920X 12-Core Processor [Family 23 Model 8 Stepping 2]
Fri 07 Dec 2018 10:05:33 AM PST |  | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid amd_dcm aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb hw_pstate sme ssbd ibpb vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsavec xgetbv1 xsaves clzero irperf xsaveerptr arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif overflow_recov succor smca
Fri 07 Dec 2018 10:05:33 AM PST |  | OS: Linux Ubuntu: Ubuntu 18.04.1 LTS [4.15.0-42-generic]
Fri 07 Dec 2018 10:05:33 AM PST |  | Memory: 31.34 GB physical, 2.00 GB virtual
Fri 07 Dec 2018 10:05:33 AM PST |  | Disk: 227.74 GB total, 204.41 GB free
Fri 07 Dec 2018 10:05:33 AM PST |  | Local time is UTC -8 hours
Fri 07 Dec 2018 10:05:33 AM PST | Einstein@Home | Found app_config.xml
Fri 07 Dec 2018 10:05:33 AM PST | GPUGRID | Found app_config.xml
Fri 07 Dec 2018 10:05:33 AM PST | Milkyway@Home | Found app_config.xml
Fri 07 Dec 2018 10:05:33 AM PST | SETI@home | Found app_config.xml
Fri 07 Dec 2018 10:05:33 AM PST |  | Config: GUI RPC allowed from any host
Fri 07 Dec 2018 10:05:33 AM PST | Einstein@Home | Config: excluded GPU.  Type: NVIDIA.  App: all.  Device: 0
Fri 07 Dec 2018 10:05:33 AM PST | GPUGRID | Config: excluded GPU.  Type: NVIDIA.  App: all.  Device: 0
Fri 07 Dec 2018 10:05:33 AM PST |  | Config: use all coprocessors


Here is the cpu_sched_debug output.

Fri 07 Dec 2018 10:10:15 AM PST |  | [cpu_sched_debug] Request CPU reschedule: Core client configuration
Fri 07 Dec 2018 10:10:15 AM PST |  | [cpu_sched_debug] schedule_cpus(): start
Fri 07 Dec 2018 10:10:15 AM PST | SETI@home | [cpu_sched_debug] add to run list: blc13_2bit_guppi_58405_84963_HIP86525_0025.24020.409.22.45.121.vlar_1 (NVIDIA GPU, FIFO) (prio -0.998093)
Fri 07 Dec 2018 10:10:15 AM PST | SETI@home | [cpu_sched_debug] add to run list: blc13_2bit_guppi_58405_85640_HIP85417_0027.25988.0.22.45.231.vlar_1 (NVIDIA GPU, FIFO) (prio -1.009073)
Fri 07 Dec 2018 10:10:15 AM PST | SETI@home | [cpu_sched_debug] add to run list: blc13_2bit_guppi_58405_85972_GJ687_0028.25257.818.22.45.165.vlar_1 (NVIDIA GPU, FIFO) (prio -1.020053)
Fri 07 Dec 2018 10:10:15 AM PST | SETI@home | [cpu_sched_debug] add to run list: blc13_2bit_guppi_58405_85640_HIP85417_0027.25988.0.22.45.222.vlar_1 (NVIDIA GPU, FIFO) (prio -1.031033)
Fri 07 Dec 2018 10:10:15 AM PST | Milkyway@Home | [cpu_sched_debug] add to run list: de_modfit_sim19fixed_bundle4_4s_NoContraintsWithDisk260_1_1541104502_13085492_1 (NVIDIA GPU, FIFO) (prio -1.034328)
Fri 07 Dec 2018 10:10:15 AM PST | SETI@home | [cpu_sched_debug] add to run list: blc13_2bit_guppi_58405_85640_HIP85417_0027.25988.0.22.45.225.vlar_1 (NVIDIA GPU, FIFO) (prio -1.042013)
Fri 07 Dec 2018 10:10:15 AM PST | SETI@home | [cpu_sched_debug] add to run list: blc13_2bit_guppi_58405_85640_HIP85417_0027.25988.0.22.45.227.vlar_1 (NVIDIA GPU, FIFO) (prio -1.052993)
Fri 07 Dec 2018 10:10:15 AM PST | SETI@home | [cpu_sched_debug] add to run list: blc13_2bit_guppi_58405_85640_HIP85417_0027.25988.0.22.45.216.vlar_1 (NVIDIA GPU, FIFO) (prio -1.063973)
Fri 07 Dec 2018 10:10:15 AM PST | SETI@home | [cpu_sched_debug] add to run list: blc13_2bit_guppi_58405_85640_HIP85417_0027.25988.0.22.45.226.vlar_1 (NVIDIA GPU, FIFO) (prio -1.074953)
Fri 07 Dec 2018 10:10:15 AM PST | SETI@home | [cpu_sched_debug] add to run list: blc13_2bit_guppi_58405_85640_HIP85417_0027.25988.0.22.45.230.vlar_1 (NVIDIA GPU, FIFO) (prio -1.085933)
Fri 07 Dec 2018 10:10:15 AM PST | SETI@home | [cpu_sched_debug] add to run list: blc13_2bit_guppi_58405_85972_GJ687_0028.25257.818.22.45.37.vlar_1 (NVIDIA GPU, FIFO) (prio -1.096913)
Fri 07 Dec 2018 10:10:15 AM PST | SETI@home | [cpu_sched_debug] add to run list: blc13_2bit_guppi_58405_85640_HIP85417_0027.25988.0.22.45.232.vlar_1 (NVIDIA GPU, FIFO) (prio -1.107893)
Fri 07 Dec 2018 10:10:15 AM PST | SETI@home | [cpu_sched_debug] add to run list: blc13_2bit_guppi_58405_85972_GJ687_0028.25257.818.22.45.162.vlar_0 (NVIDIA GPU, FIFO) (prio -1.118873)
Fri 07 Dec 2018 10:10:15 AM PST | SETI@home | [cpu_sched_debug] add to run list: blc13_2bit_guppi_58406_00572_HIP85973_0031.25872.0.21.44.215.vlar_1 (NVIDIA GPU, FIFO) (prio -1.129853)
Fri 07 Dec 2018 10:10:15 AM PST | SETI@home | [cpu_sched_debug] add to run list: blc13_2bit_guppi_58405_85972_GJ687_0028.25257.818.22.45.143.vlar_1 (NVIDIA GPU, FIFO) (prio -1.140832)
Fri 07 Dec 2018 10:10:15 AM PST | SETI@home | [cpu_sched_debug] add to run list: blc13_2bit_guppi_58405_85640_HIP85417_0027.25988.0.22.45.223.vlar_1 (NVIDIA GPU, FIFO) (prio -1.151812)
Fri 07 Dec 2018 10:10:15 AM PST | SETI@home | [cpu_sched_debug] add to run list: blc13_2bit_guppi_58405_85640_HIP85417_0027.25988.0.22.45.228.vlar_1 (NVIDIA GPU, FIFO) (prio -1.162792)
Fri 07 Dec 2018 10:10:15 AM PST | Milkyway@Home | [cpu_sched_debug] add to run list: de_modfit_sim19fixed_bundle6_2s_NoContraintsWithDisk140_1_1541104502_13324137_0 (NVIDIA GPU, FIFO) (prio -1.231968)
Fri 07 Dec 2018 10:10:15 AM PST |  | [cpu_sched_debug] enforce_run_list(): start
Fri 07 Dec 2018 10:10:15 AM PST |  | [cpu_sched_debug] preliminary job list:
Fri 07 Dec 2018 10:10:15 AM PST | SETI@home | [cpu_sched_debug] 0: blc13_2bit_guppi_58405_84963_HIP86525_0025.24020.409.22.45.121.vlar_1 (MD: no; UTS: yes)
Fri 07 Dec 2018 10:10:15 AM PST | SETI@home | [cpu_sched_debug] 1: blc13_2bit_guppi_58405_85640_HIP85417_0027.25988.0.22.45.231.vlar_1 (MD: no; UTS: yes)
Fri 07 Dec 2018 10:10:15 AM PST | SETI@home | [cpu_sched_debug] 2: blc13_2bit_guppi_58405_85972_GJ687_0028.25257.818.22.45.165.vlar_1 (MD: no; UTS: yes)
Fri 07 Dec 2018 10:10:15 AM PST | SETI@home | [cpu_sched_debug] 3: blc13_2bit_guppi_58405_85640_HIP85417_0027.25988.0.22.45.222.vlar_1 (MD: no; UTS: no)
Fri 07 Dec 2018 10:10:15 AM PST | Milkyway@Home | [cpu_sched_debug] 4: de_modfit_sim19fixed_bundle4_4s_NoContraintsWithDisk260_1_1541104502_13085492_1 (MD: no; UTS: no)
Fri 07 Dec 2018 10:10:15 AM PST | SETI@home | [cpu_sched_debug] 5: blc13_2bit_guppi_58405_85640_HIP85417_0027.25988.0.22.45.225.vlar_1 (MD: no; UTS: no)
Fri 07 Dec 2018 10:10:15 AM PST | SETI@home | [cpu_sched_debug] 6: blc13_2bit_guppi_58405_85640_HIP85417_0027.25988.0.22.45.227.vlar_1 (MD: no; UTS: no)
Fri 07 Dec 2018 10:10:15 AM PST | SETI@home | [cpu_sched_debug] 7: blc13_2bit_guppi_58405_85640_HIP85417_0027.25988.0.22.45.216.vlar_1 (MD: no; UTS: no)
Fri 07 Dec 2018 10:10:15 AM PST | SETI@home | [cpu_sched_debug] 8: blc13_2bit_guppi_58405_85640_HIP85417_0027.25988.0.22.45.226.vlar_1 (MD: no; UTS: no)
Fri 07 Dec 2018 10:10:15 AM PST | SETI@home | [cpu_sched_debug] 9: blc13_2bit_guppi_58405_85640_HIP85417_0027.25988.0.22.45.230.vlar_1 (MD: no; UTS: no)
Fri 07 Dec 2018 10:10:15 AM PST | SETI@home | [cpu_sched_debug] 10: blc13_2bit_guppi_58405_85972_GJ687_0028.25257.818.22.45.37.vlar_1 (MD: no; UTS: no)
Fri 07 Dec 2018 10:10:15 AM PST | SETI@home | [cpu_sched_debug] 11: blc13_2bit_guppi_58405_85640_HIP85417_0027.25988.0.22.45.232.vlar_1 (MD: no; UTS: no)
Fri 07 Dec 2018 10:10:15 AM PST | SETI@home | [cpu_sched_debug] 12: blc13_2bit_guppi_58405_85972_GJ687_0028.25257.818.22.45.162.vlar_0 (MD: no; UTS: no)
Fri 07 Dec 2018 10:10:15 AM PST | SETI@home | [cpu_sched_debug] 13: blc13_2bit_guppi_58406_00572_HIP85973_0031.25872.0.21.44.215.vlar_1 (MD: no; UTS: no)
Fri 07 Dec 2018 10:10:15 AM PST | SETI@home | [cpu_sched_debug] 14: blc13_2bit_guppi_58405_85972_GJ687_0028.25257.818.22.45.143.vlar_1 (MD: no; UTS: no)
Fri 07 Dec 2018 10:10:15 AM PST | SETI@home | [cpu_sched_debug] 15: blc13_2bit_guppi_58405_85640_HIP85417_0027.25988.0.22.45.223.vlar_1 (MD: no; UTS: no)
Fri 07 Dec 2018 10:10:15 AM PST | SETI@home | [cpu_sched_debug] 16: blc13_2bit_guppi_58405_85640_HIP85417_0027.25988.0.22.45.228.vlar_1 (MD: no; UTS: no)
Fri 07 Dec 2018 10:10:15 AM PST | Milkyway@Home | [cpu_sched_debug] 17: de_modfit_sim19fixed_bundle6_2s_NoContraintsWithDisk140_1_1541104502_13324137_0 (MD: no; UTS: no)
Fri 07 Dec 2018 10:10:15 AM PST |  | [cpu_sched_debug] final job list:
Fri 07 Dec 2018 10:10:15 AM PST | SETI@home | [cpu_sched_debug] 0: blc13_2bit_guppi_58405_84963_HIP86525_0025.24020.409.22.45.121.vlar_1 (MD: no; UTS: yes)
Fri 07 Dec 2018 10:10:15 AM PST | SETI@home | [cpu_sched_debug] 1: blc13_2bit_guppi_58405_85640_HIP85417_0027.25988.0.22.45.231.vlar_1 (MD: no; UTS: yes)
Fri 07 Dec 2018 10:10:15 AM PST | SETI@home | [cpu_sched_debug] 2: blc13_2bit_guppi_58405_85972_GJ687_0028.25257.818.22.45.165.vlar_1 (MD: no; UTS: yes)
Fri 07 Dec 2018 10:10:15 AM PST | SETI@home | [cpu_sched_debug] 3: blc13_2bit_guppi_58405_85640_HIP85417_0027.25988.0.22.45.222.vlar_1 (MD: no; UTS: yes)
Fri 07 Dec 2018 10:10:15 AM PST | Milkyway@Home | [cpu_sched_debug] 4: de_modfit_sim19fixed_bundle4_4s_NoContraintsWithDisk260_1_1541104502_13085492_1 (MD: no; UTS: no)
Fri 07 Dec 2018 10:10:15 AM PST | SETI@home | [cpu_sched_debug] 5: blc13_2bit_guppi_58405_85640_HIP85417_0027.25988.0.22.45.225.vlar_1 (MD: no; UTS: no)
Fri 07 Dec 2018 10:10:15 AM PST | SETI@home | [cpu_sched_debug] 6: blc13_2bit_guppi_58405_85640_HIP85417_0027.25988.0.22.45.227.vlar_1 (MD: no; UTS: no)
Fri 07 Dec 2018 10:10:15 AM PST | SETI@home | [cpu_sched_debug] 7: blc13_2bit_guppi_58405_85640_HIP85417_0027.25988.0.22.45.216.vlar_1 (MD: no; UTS: no)
Fri 07 Dec 2018 10:10:15 AM PST | SETI@home | [cpu_sched_debug] 8: blc13_2bit_guppi_58405_85640_HIP85417_0027.25988.0.22.45.226.vlar_1 (MD: no; UTS: no)
Fri 07 Dec 2018 10:10:15 AM PST | SETI@home | [cpu_sched_debug] 9: blc13_2bit_guppi_58405_85640_HIP85417_0027.25988.0.22.45.230.vlar_1 (MD: no; UTS: no)
Fri 07 Dec 2018 10:10:15 AM PST | SETI@home | [cpu_sched_debug] 10: blc13_2bit_guppi_58405_85972_GJ687_0028.25257.818.22.45.37.vlar_1 (MD: no; UTS: no)
Fri 07 Dec 2018 10:10:15 AM PST | SETI@home | [cpu_sched_debug] 11: blc13_2bit_guppi_58405_85640_HIP85417_0027.25988.0.22.45.232.vlar_1 (MD: no; UTS: no)
Fri 07 Dec 2018 10:10:15 AM PST | SETI@home | [cpu_sched_debug] 12: blc13_2bit_guppi_58405_85972_GJ687_0028.25257.818.22.45.162.vlar_0 (MD: no; UTS: no)
Fri 07 Dec 2018 10:10:15 AM PST | SETI@home | [cpu_sched_debug] 13: blc13_2bit_guppi_58406_00572_HIP85973_0031.25872.0.21.44.215.vlar_1 (MD: no; UTS: no)
Fri 07 Dec 2018 10:10:15 AM PST | SETI@home | [cpu_sched_debug] 14: blc13_2bit_guppi_58405_85972_GJ687_0028.25257.818.22.45.143.vlar_1 (MD: no; UTS: no)
Fri 07 Dec 2018 10:10:15 AM PST | SETI@home | [cpu_sched_debug] 15: blc13_2bit_guppi_58405_85640_HIP85417_0027.25988.0.22.45.223.vlar_1 (MD: no; UTS: no)
Fri 07 Dec 2018 10:10:15 AM PST | SETI@home | [cpu_sched_debug] 16: blc13_2bit_guppi_58405_85640_HIP85417_0027.25988.0.22.45.228.vlar_1 (MD: no; UTS: no)
Fri 07 Dec 2018 10:10:15 AM PST | Milkyway@Home | [cpu_sched_debug] 17: de_modfit_sim19fixed_bundle6_2s_NoContraintsWithDisk140_1_1541104502_13324137_0 (MD: no; UTS: no)
Fri 07 Dec 2018 10:10:15 AM PST | SETI@home | [cpu_sched_debug] scheduling blc13_2bit_guppi_58405_84963_HIP86525_0025.24020.409.22.45.121.vlar_1
Fri 07 Dec 2018 10:10:15 AM PST | SETI@home | [cpu_sched_debug] scheduling blc13_2bit_guppi_58405_85640_HIP85417_0027.25988.0.22.45.231.vlar_1
Fri 07 Dec 2018 10:10:15 AM PST | SETI@home | [cpu_sched_debug] scheduling blc13_2bit_guppi_58405_85972_GJ687_0028.25257.818.22.45.165.vlar_1
Fri 07 Dec 2018 10:10:15 AM PST | SETI@home | [cpu_sched_debug] scheduling blc13_2bit_guppi_58405_85640_HIP85417_0027.25988.0.22.45.222.vlar_1
Fri 07 Dec 2018 10:10:15 AM PST |  | [cpu_sched_debug] using 4.00 out of 24 CPUs
Fri 07 Dec 2018 10:10:15 AM PST |  | [cpu_sched_debug] enforce_run_list: end


Richard Haselgrove (Volunteer tester, United Kingdom)
Message 1969124 - Posted: 7 Dec 2018, 18:50:28 UTC - in response to Message 1969117.
Last modified: 7 Dec 2018, 18:59:19 UTC

Well, according to cpu_sched_debug, it has only ever considered 17 tasks as possibilities for your 28 resources. That's not a good start.

Every task that it has considered requires at least part of an NVidia GPU. Would I be right in assuming that the SETI tasks are config'd to require 1+1? That would mean that the first four off the top of the list would completely use all four GPUs, and four of the 24 CPU cores - as stated at the bottom. And there are no tasks in the 'run list' which fit the remaining CPUs.

To confirm that we're on the right track, the only thing you can do quickly and easily is to selectively suspend the GPU-only tasks in your cache. By the time you're down to about a dozen 'ready to run', you should start to see CPU tasks starting up. Once they've started, BOINC will probably let them run to completion, so you should get enough free time to make a cup of coffee, but in no way can it be a long-term solution on a machine with that much grunt.

Your trouble is that BOINC is designed and intended to run by itself, making its own decisions about what to run and when. Over the years, extra frilly bits have been added round the edges: starting with support for GPUs, then adding cc_config.xml and app_config.xml controls, a bit at a time. The resulting mess can in no way be described as "designed", and in cases like this you've bumped into edges where one set of controls interferes with some other set of controls.

It's a long time since I opened that Github issue - over 2 years, as you'll have seen - and in all that time, no developer has shown any interest. I need to go and re-read the code around line 130 to remind myself what the problem is: while I do that, could you summarise what your various max_concurrent restrictions are? We need to consider all of them, in the round.

Edit - another one for the reading list: https://github.com/BOINC/boinc/commit/952a495fb7c99f79692921fbb2afc306e8a88401

Keith Myers (Volunteer tester, United States)
Message 1969137 - Posted: 7 Dec 2018, 20:12:40 UTC

These are my app_config.xml files for each project.

Seti

<app_config>
  <app_version>
    <app_name>setiathome_v8</app_name>
    <plan_class>cuda90</plan_class>
    <avg_ncpus>1</avg_ncpus>
    <ngpus>1</ngpus>
    <cmdline>-nobs</cmdline>
  </app_version>

  <app_version>
    <app_name>astropulse_v7</app_name>
    <plan_class>opencl_nvidia_100</plan_class>
    <avg_ncpus>1</avg_ncpus>
    <ngpus>1</ngpus>
  </app_version>
  <project_max_concurrent>16</project_max_concurrent>
</app_config>


Einstein

<app_config>
  <app>
    <name>hsgamma_FGRPB1G</name>
    <gpu_versions>
      <gpu_usage>1</gpu_usage>
      <cpu_usage>0.1</cpu_usage>
    </gpu_versions>
  </app>

  <project_max_concurrent>2</project_max_concurrent>
</app_config>


GPUGrid

<app_config>
  <app>
    <name>acemdlong</name>
    <max_concurrent>1</max_concurrent>
    <gpu_versions>
      <gpu_usage>1</gpu_usage>
      <cpu_usage>.1</cpu_usage>
    </gpu_versions>
  </app>

  <app>
    <name>acemdshort</name>
    <max_concurrent>1</max_concurrent>
    <gpu_versions>
      <gpu_usage>1</gpu_usage>
      <cpu_usage>.1</cpu_usage>
    </gpu_versions>
  </app>
</app_config>


Richard Haselgrove (Volunteer tester, United Kingdom)
Message 1969145 - Posted: 7 Dec 2018, 20:33:56 UTC - in response to Message 1969137.

It would be worth knocking out the project_max_concurrent for SETI for a few minutes, just to see what happens. You can edit the file any time, 'Read config files' from the Manager Options menu, and it should take effect immediately - no restart required. Wait 60 seconds before touching anything else.

Keith Myers (Volunteer tester, United States)
Message 1969147 - Posted: 7 Dec 2018, 20:43:03 UTC - in response to Message 1969145.

It would be worth knocking out the project_max_concurrent for SETI for a few minutes, just to see what happens. You can edit the file any time, 'Read config files' from the Manager Options menu, and it should take effect immediately - no restart required. Wait 60 seconds before touching anything else.

That did the trick. All the cpu tasks are running. But it's not at all what I want: I wish to run only 12 cpu tasks, on the physical cores, and it is currently running 20 cpu tasks. I will play around some more with the app_config.

Richard Haselgrove (Volunteer tester, United Kingdom)
Message 1969148 - Posted: 7 Dec 2018, 20:44:31 UTC - in response to Message 1969147.

OK, I'll go and play with some beer glasses while you play with the app_config.

Keith Myers (Volunteer tester, United States)
Message 1969151 - Posted: 7 Dec 2018, 20:49:56 UTC

Just tried a max_concurrent in the Seti V8 section of the app_config file instead of a project_max_concurrent. Re-read the config files and it is not being applied.
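
For clarity, the attempt looked roughly like this (a sketch, with 12 as the target count from earlier; as far as I know, max_concurrent goes in an <app> section rather than an <app_version> section):

<app_config>
  <app>
    <name>setiathome_v8</name>
    <max_concurrent>12</max_concurrent>
  </app>
  <!-- the cuda90 and opencl_nvidia_100 app_version sections from the earlier post stay unchanged -->
</app_config>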

Keith Myers (Volunteer tester, United States)
Message 1969174 - Posted: 7 Dec 2018, 22:38:46 UTC

Well, if no one can help sort out the incompatibility between the gpu exclude option and the project_max_concurrent option, I am going to have to suspend the Einstein and GPUGrid projects on that host. It is unacceptable to have 90 minutes of run_time and only 48 minutes of cpu_time; the computer is way overcommitted on cpu tasks.

Richard Haselgrove (Volunteer tester, United Kingdom)
Message 1969181 - Posted: 7 Dec 2018, 23:09:38 UTC - in response to Message 1969151.

Just tried a max_concurrent in the Seti V8 section of the app_config file instead of a project_max_concurrent. Re-read the config files and it is not being applied.
The trouble there is that both CPU tasks and GPU tasks are for the same 'application' - setiathome_v8.

What you need is to be able to distinguish between application versions, so you can set a separate max_concurrent for CPU versions and GPU versions. But no-one has coded for that yet.

Ian&Steve C. (United States)
Message 1969209 - Posted: 8 Dec 2018, 1:24:17 UTC

Hey Keith, I know you prefer to use project_max_concurrent (and I do too), but what happens if you remove that and also restrict your CPU use percentage to something like 80%? I think that will give you the right number of CPU tasks and GPU tasks (12+4).
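
Rough arithmetic, as I understand the setting: the "use at most N% of the CPUs" preference caps the total number of logical CPUs BOINC will use at floor(N% x 24) on your box. With the four GPU tasks each reserving a full CPU (avg_ncpus 1 in your app_config), a cap of about 67% gives 16 usable CPUs, i.e. 12 CPU tasks alongside the 4 GPU tasks; 80% (19 CPUs) would allow roughly 15 CPU tasks.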

Keith Myers (Volunteer tester, United States)
Message 1969247 - Posted: 8 Dec 2018, 3:52:58 UTC - in response to Message 1969209.

Hey Keith, I know you prefer to use project_max_concurrent (and I do too), but what happens if you remove that and also restrict your CPU use percentage to something like 80%? I think that will give you the right number of CPU tasks and GPU tasks (12+4).

Hi Ian, yes, that is exactly what I did, as suggested by Richard, and it is what allowed the cpu tasks to start running again. Unfortunately, it allows ALL threads to be used for cpu tasks, and that absolutely kills the run_times of each cpu task. The cpu becomes extremely overcommitted, with cpu tasks running for over 2 hours while using only 45 minutes of actual cpu_time.

I can't use the global cpu percentage because it will affect all hosts. I don't have any issues with my other hosts using 100% of the cpu; it's just this new TR build, with the need to exclude the RTX 2080, that is having issues.

Ian&Steve C. (United States)
Message 1969257 - Posted: 8 Dec 2018, 4:28:19 UTC - in response to Message 1969247.
Last modified: 8 Dec 2018, 4:30:03 UTC

I meant only the CPU % setting in the individual host settings, not a global config across all hosts: the compute settings on that one system. I've never seen changing this on one computer cause the setting to change on another; it just stays for that one system.

Keith Myers (Volunteer tester, United States)
Message 1969262 - Posted: 8 Dec 2018, 4:53:04 UTC - in response to Message 1969257.
Last modified: 8 Dec 2018, 4:56:07 UTC

I meant only the CPU % setting in the individual host settings, not a global config across all hosts: the compute settings on that one system. I've never seen changing this on one computer cause the setting to change on another; it just stays for that one system.

I can't use the Local preferences, since I run 4 projects on the host at all times and can't set a venue that isn't applied globally. I've tried multiple times in the past to set settings for only a single host; it never works, and affects all hosts globally. If I ran only a single project on a host it would work, but not with multiple projects running concurrently.

As Richard keeps pointing out to me, BOINC is supposed to run without user intervention on its defaults and just isn't capable of running all the new bits that have been added on over the years.

Your trouble is that BOINC is designed and intended to run by itself, making its own decisions about what to run and when. Over the years, extra frilly bits have been added round the edges: starting with support for GPUs, then adding cc_config.xml and app_config.xml controls, a bit at a time. The resulting mess can in no way be described as "designed", and in cases like this you've bumped into edges where one set of controls interferes with some other set of controls.


Keith Myers (Volunteer tester, United States)
Message 1969294 - Posted: 8 Dec 2018, 9:50:27 UTC

I thought maybe the debt imbalance between projects had been paid off during the Seti task shortage, with all my other projects running gpu tasks. For a couple of hours I actually had only 12 cpu tasks running and none waiting, until even the cpu tasks ran dry.

But everything went back to what I have been fighting once the Seti caches filled up again. So I took someone's suggestion to try limiting the cpu % in Local Preferences. This seemed to take, and I dropped back down to the preferred 12 cpu tasks with 4 gpu tasks running. So once again I am trying to use the gpu excludes and still run my other projects concurrently with Seti.

Calling it a night; I will see in the morning whether the host is still running the preferred task count and has cleared out the waiting-to-run cpu tasks. Fingers crossed.

Richard Haselgrove (Volunteer tester, United Kingdom)
Message 1969297 - Posted: 8 Dec 2018, 10:03:17 UTC - in response to Message 1969294.

'Debt' as a concept and technique was removed from BOINC in 2010, as part of the changes that introduced CreditNew. There is still a concept of Resource Share and balance between projects - both for work fetch and for CPU scheduling - but it's now based on REC. Unless you've updated your cc_config.xml file, REC has a half-life of 10 days - which IMHO is too slow. I usually set 1 day.
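
The option lives in cc_config.xml; a minimal sketch of the 1-day setting (the rest of the file stays as it was):

<cc_config>
    <options>
        <rec_half_life_days>1</rec_half_life_days>
    </options>
</cc_config>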

A local CPU % setting will prevent CPU over-commitment leading to over-heating by reducing the number of stressful SETI tasks running concurrently. But it will prevent low-stress tasks from other projects running on the 'spare' CPU cores.

I've never found a way of squaring the complex set of circles that Keith has boxed himself in with, but I'll keep thinking about it.

Richard Haselgrove (Volunteer tester, United Kingdom)
Message 1969310 - Posted: 8 Dec 2018, 13:24:28 UTC

I've added some further analysis to #1677. We'll see what happens.