All CPU tasks not running. Now all are: - "Waiting to run"

Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1972880 - Posted: 1 Jan 2019, 0:58:22 UTC

I still want to learn how to compile the binaries on Linux. The only way to learn is to try it. So I was going to get all the prerequisite libraries and packages installed. I also want to modify work_fetch.cpp for my version, like TBar did. You gave me the links for the BOINC compiling instructions I should need. I was just waiting for the master branch to be finalized.

So I am still game to try compiling for Linux eventually.
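For reference, the build typically looks roughly like the following on Ubuntu. This is a sketch based on the standard BOINC build instructions, not a verified recipe; package names and configure flags may differ by release:

```
# Install common build prerequisites (Ubuntu package names; may vary)
sudo apt-get install build-essential git automake libtool pkg-config \
    libcurl4-openssl-dev libssl-dev

# Fetch the source and build only the client (no server, no Manager GUI)
git clone https://github.com/BOINC/boinc.git
cd boinc
./_autosetup
./configure --disable-server --disable-manager
make
```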
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
Message 1972902 - Posted: 1 Jan 2019, 2:39:26 UTC

That compiled version from Bintray must have been built on an older Linux release than my Ubuntu 18.04. My system uses libpng16.so.16; that is why libpng12.so was not found and was flagged as obsolete in my software sources.

I don't think I will attempt to run that Bintray version again. I found I could not connect to the scheduler with my own version until I backed out everything I had installed to run that one.
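As a quick diagnostic, `ldd` shows which shared libraries a prebuilt binary expects (the path below is hypothetical):

```
# List unresolved shared-library dependencies of a downloaded binary
ldd ./boinc | grep "not found"
# A line like "libpng12.so.0 => not found" means the binary was linked
# on a distribution older than the one it is now running on.
```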
Message 1973115 - Posted: 3 Jan 2019, 1:17:07 UTC

I seem to have figured out how to compile a client with Jord's assistance. But I can't tell whether anything is fixed yet because I can't get any Seti work.

Richard, is this version the one you said you wanted me to try? https://github.com/BOINC/boinc/commit/9f8f52b7824164091828bc586189771971b399d5

I remember you said something about one of the versions being less talkative. I see error messages about not having a CPU application for my projects other than Seti, and a lot of informative output when cpu_sched_debug is set in Options.
This is the Event Log at startup.

./boinc
02-Jan-2019 17:06:31 [---] Starting BOINC client version 7.15.0 for x86_64-pc-linux-gnu
02-Jan-2019 17:06:31 [---] This a development version of BOINC and may not function properly
02-Jan-2019 17:06:31 [---] log flags: file_xfer, sched_ops, task, sched_op_debug
02-Jan-2019 17:06:31 [---] Libraries: libcurl/7.61.0 OpenSSL/1.1.1 zlib/1.2.11 libidn2/2.0.5 libpsl/0.20.2 (+libidn2/2.0.4) nghttp2/1.32.1 librtmp/2.3
02-Jan-2019 17:06:31 [---] Data directory: /home/keith/Desktop/BOINC
02-Jan-2019 17:06:31 [---] CUDA: NVIDIA GPU 0: GeForce GTX 1080 (driver version 410.78, CUDA version 10.0, compute capability 6.1, 4096MB, 3980MB available, 9523 GFLOPS peak)
02-Jan-2019 17:06:31 [---] CUDA: NVIDIA GPU 1: GeForce GTX 1070 (driver version 410.78, CUDA version 10.0, compute capability 6.1, 4096MB, 3984MB available, 6463 GFLOPS peak)
02-Jan-2019 17:06:31 [---] CUDA: NVIDIA GPU 2: GeForce GTX 1070 (driver version 410.78, CUDA version 10.0, compute capability 6.1, 4096MB, 3984MB available, 6463 GFLOPS peak)
02-Jan-2019 17:06:31 [---] OpenCL: NVIDIA GPU 0: GeForce GTX 1080 (driver version 410.78, device version OpenCL 1.2 CUDA, 8120MB, 3980MB available, 9523 GFLOPS peak)
02-Jan-2019 17:06:31 [---] OpenCL: NVIDIA GPU 1: GeForce GTX 1070 (driver version 410.78, device version OpenCL 1.2 CUDA, 8120MB, 3984MB available, 6463 GFLOPS peak)
02-Jan-2019 17:06:31 [---] OpenCL: NVIDIA GPU 2: GeForce GTX 1070 (driver version 410.78, device version OpenCL 1.2 CUDA, 8116MB, 3984MB available, 6463 GFLOPS peak)
02-Jan-2019 17:06:31 [SETI@home] Found app_info.xml; using anonymous platform
02-Jan-2019 17:06:31 [---] [libc detection] gathered: 2.28, Ubuntu GLIBC 2.28-0ubuntu1
02-Jan-2019 17:06:31 [---] Host name: Mal
02-Jan-2019 17:06:31 [---] Processor: 16 AuthenticAMD AMD Ryzen 7 2700X Eight-Core Processor [Family 23 Model 8 Stepping 2]
02-Jan-2019 17:06:31 [---] Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb hw_pstate sme ssbd sev ibpb vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsavec xgetbv1 xsaves clzero irperf xsaveerptr arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif overflow_recov succor smca
02-Jan-2019 17:06:31 [---] OS: Linux Ubuntu: Ubuntu 18.10 [4.18.0-13-generic|libc 2.28 (Ubuntu GLIBC 2.28-0ubuntu1)]
02-Jan-2019 17:06:31 [---] Memory: 15.65 GB physical, 2.00 GB virtual
02-Jan-2019 17:06:31 [---] Disk: 99.41 GB total, 84.38 GB free
02-Jan-2019 17:06:31 [---] Local time is UTC -8 hours
02-Jan-2019 17:06:31 [Einstein@Home] Found app_config.xml
02-Jan-2019 17:06:31 [GPUGRID] Found app_config.xml
02-Jan-2019 17:06:31 [Milkyway@Home] Found app_config.xml
02-Jan-2019 17:06:31 [SETI@home] Found app_config.xml
02-Jan-2019 17:06:31 [SETI@home] Max 11 concurrent jobs
02-Jan-2019 17:06:31 [Einstein@Home] hsgamma_FGRPB1G: Max 2 concurrent jobs
02-Jan-2019 17:06:31 [GPUGRID] acemdlong: Max 1 concurrent jobs
02-Jan-2019 17:06:31 [GPUGRID] acemdshort: Max 1 concurrent jobs
02-Jan-2019 17:06:31 [Milkyway@Home] milkyway: Max 2 concurrent jobs
02-Jan-2019 17:06:31 [---] Config: GUI RPC allowed from any host
02-Jan-2019 17:06:31 [Einstein@Home] Config: excluded GPU. Type: NVIDIA. App: all. Device: 0
02-Jan-2019 17:06:31 [GPUGRID] Config: excluded GPU. Type: NVIDIA. App: all. Device: 0
02-Jan-2019 17:06:31 [---] Config: use all coprocessors
02-Jan-2019 17:06:31 [Einstein@Home] URL http://einstein.phys.uwm.edu/; Computer ID 12762011; resource share 25
02-Jan-2019 17:06:31 [Einstein@Home] Your settings do not allow fetching tasks for CPU. To fix this, you can change Project Preferences on the project's web site.
02-Jan-2019 17:06:31 [GPUGRID] URL http://www.gpugrid.net/; Computer ID 495691; resource share 25
02-Jan-2019 17:06:31 [GPUGRID] Your settings do not allow fetching tasks for CPU. To fix this, you can change Project Preferences on the project's web site.
02-Jan-2019 17:06:31 [Milkyway@Home] URL http://milkyway.cs.rpi.edu/milkyway/; Computer ID 793614; resource share 25
02-Jan-2019 17:06:31 [Milkyway@Home] Your settings do not allow fetching tasks for CPU. To fix this, you can change Project Preferences on the project's web site.
02-Jan-2019 17:06:31 [SETI@home] URL http://setiathome.berkeley.edu/; Computer ID 8646147; resource share 900
02-Jan-2019 17:06:31 [SETI@home] General prefs: from SETI@home (last modified 26-Dec-2018 22:58:31)
02-Jan-2019 17:06:31 [SETI@home] Host location: none
02-Jan-2019 17:06:31 [SETI@home] General prefs: using your defaults
02-Jan-2019 17:06:31 [---] Preferences:
02-Jan-2019 17:06:31 [---] max memory usage when active: 12823.21 MB
02-Jan-2019 17:06:31 [---] max memory usage when idle: 16029.01 MB
02-Jan-2019 17:06:31 [---] max disk usage: 8.00 GB
02-Jan-2019 17:06:31 [---] (to change preferences, visit a project web site or select Preferences in the Manager)
02-Jan-2019 17:06:31 [---] Setting up project and slot directories
02-Jan-2019 17:06:31 [---] Checking active tasks
02-Jan-2019 17:06:31 [---] Setting up GUI RPC socket
02-Jan-2019 17:06:31 [---] Checking presence of 430 project files
02-Jan-2019 17:06:31 Initialization completed
Einstein@Home: setting reason to BUFFER_FULL
GPUGRID: setting reason to BUFFER_FULL
02-Jan-2019 17:06:31 [SETI@home] [sched_op] Starting scheduler request
02-Jan-2019 17:06:31 [SETI@home] Sending scheduler request: To fetch work.
02-Jan-2019 17:06:31 [SETI@home] Requesting new tasks for CPU and NVIDIA GPU
02-Jan-2019 17:06:31 [SETI@home] [sched_op] CPU work request: 1221514.43 seconds; 16.00 devices
02-Jan-2019 17:06:31 [SETI@home] [sched_op] NVIDIA GPU work request: 87082.43 seconds; 3.00 devices
02-Jan-2019 17:06:34 [SETI@home] Scheduler request completed: got 0 new tasks
02-Jan-2019 17:06:34 [SETI@home] [sched_op] Server version 709
02-Jan-2019 17:06:34 [SETI@home] Project has no tasks available
02-Jan-2019 17:06:34 [SETI@home] Project requested delay of 303 seconds
02-Jan-2019 17:06:34 [SETI@home] [sched_op] Deferring communication for 00:05:03
02-Jan-2019 17:06:34 [SETI@home] [sched_op] Reason: requested by project
Einstein@Home: setting reason to BUFFER_FULL
GPUGRID: setting reason to BUFFER_FULL
Einstein@Home: setting reason to BUFFER_FULL
GPUGRID: setting reason to BUFFER_FULL
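The log flags shown in the startup lines above are set in cc_config.xml; a minimal sketch matching the flags reported in this log:

```xml
<!-- cc_config.xml fragment: the documented BOINC log_flags options -->
<cc_config>
  <log_flags>
    <file_xfer>1</file_xfer>
    <sched_ops>1</sched_ops>
    <task>1</task>
    <sched_op_debug>1</sched_op_debug>
    <cpu_sched_debug>1</cpu_sched_debug>
  </log_flags>
</cc_config>
```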
Message 1973123 - Posted: 3 Jan 2019, 2:51:52 UTC

I think the beta branch client is working and is not showing any of the issues from the original post. I'm not running it on the same host, but on a similar one: only 3 cards instead of 4, no RTX 2080, and only 16 threads instead of 24. But the <gpu_exclude>0</gpu_exclude> statements for both Einstein and GPUGrid are in place, along with a slightly modified <project_max_concurrent>11</project_max_concurrent> statement. The max concurrent statement is being obeyed and all my CPU tasks are running.
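For context, the exclusions and limits described here correspond to BOINC's documented cc_config.xml <exclude_gpu> option and the app_config.xml <project_max_concurrent> option. A sketch, with the URL and device number taken from the log output earlier in the thread:

```xml
<!-- cc_config.xml fragment: exclude GPU 0 for one project -->
<cc_config>
  <options>
    <exclude_gpu>
      <url>http://einstein.phys.uwm.edu/</url>
      <device_num>0</device_num>
      <type>NVIDIA</type>
    </exclude_gpu>
  </options>
</cc_config>

<!-- app_config.xml fragment (placed in the project's directory):
     cap the total number of concurrent jobs for that project -->
<app_config>
  <project_max_concurrent>11</project_max_concurrent>
</app_config>
```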

So I think that David's PR #2918 commit is working as he intended. It seems to have fixed my issue.
Message 1973134 - Posted: 3 Jan 2019, 5:21:30 UTC

I may have spoken too enthusiastically, it seems. For some reason I am only running on two GPUs out of three. I removed the gpu_exclude from both Einstein and GPUGrid in an attempt to clear out the task cache so I can get back to my normal daily driver, but nothing runs on GPU 1 for some reason. It was running earlier when I had Seti work on board. Currently I have only Einstein and GPUGrid work, plus a dozen completed MilkyWay tasks that I can't report because the MW server database is offline.

Also, for some reason I am seeing a message about a missing ATI card library. I don't have any ATI cards and have never installed any ATI drivers.
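That "[coproc] ATI: libaticalrt.so: cannot open shared object file" line is most likely just the client probing for each vendor's GPU runtime by trying to load its shared library and logging the loader error when it is absent, which is harmless on a system with no ATI cards. A sketch of that detection pattern (hypothetical helper, not actual BOINC code):

```python
# Probe for a GPU vendor runtime the way a detection routine might:
# attempt to load the shared library and report the loader error if absent.
import ctypes

def probe_gpu_runtime(soname: str) -> str:
    """Try to load a GPU vendor runtime; return a status string."""
    try:
        ctypes.CDLL(soname)
        return f"{soname}: loaded"
    except OSError as exc:
        # On Linux this is the dlerror() text, e.g.
        # "libaticalrt.so: cannot open shared object file: No such file or directory"
        return str(exc)

if __name__ == "__main__":
    print("[coproc] ATI:", probe_gpu_runtime("libaticalrt.so"))
```

Either way the probe succeeds or fails cleanly; a missing library simply means no GPUs of that vendor will be detected.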

This is the Event Log on startup.

Wed 02 Jan 2019 09:08:10 PM PST | | Starting BOINC client version 7.15.0 for x86_64-pc-linux-gnu
Wed 02 Jan 2019 09:08:10 PM PST | | This a development version of BOINC and may not function properly
Wed 02 Jan 2019 09:08:10 PM PST | | log flags: file_xfer, sched_ops, task, coproc_debug, sched_op_debug
Wed 02 Jan 2019 09:08:10 PM PST | | Libraries: libcurl/7.61.0 OpenSSL/1.1.1 zlib/1.2.11 libidn2/2.0.5 libpsl/0.20.2 (+libidn2/2.0.4) nghttp2/1.32.1 librtmp/2.3
Wed 02 Jan 2019 09:08:10 PM PST | | Data directory: /home/keith/Desktop/BOINC
Wed 02 Jan 2019 09:08:10 PM PST | | [coproc] launching child process at /home/keith/Desktop/BOINC/boinc
Wed 02 Jan 2019 09:08:10 PM PST | | [coproc] with data directory /home/keith/Desktop/BOINC
Wed 02 Jan 2019 09:08:10 PM PST | | CUDA: NVIDIA GPU 0: GeForce GTX 1080 (driver version 410.78, CUDA version 10.0, compute capability 6.1, 4096MB, 3980MB available, 9523 GFLOPS peak)
Wed 02 Jan 2019 09:08:10 PM PST | | CUDA: NVIDIA GPU 1: GeForce GTX 1070 (driver version 410.78, CUDA version 10.0, compute capability 6.1, 4096MB, 3984MB available, 6463 GFLOPS peak)
Wed 02 Jan 2019 09:08:10 PM PST | | CUDA: NVIDIA GPU 2: GeForce GTX 1070 (driver version 410.78, CUDA version 10.0, compute capability 6.1, 4096MB, 3984MB available, 6463 GFLOPS peak)
Wed 02 Jan 2019 09:08:10 PM PST | | OpenCL: NVIDIA GPU 0: GeForce GTX 1080 (driver version 410.78, device version OpenCL 1.2 CUDA, 8120MB, 3980MB available, 9523 GFLOPS peak)
Wed 02 Jan 2019 09:08:10 PM PST | | OpenCL: NVIDIA GPU 1: GeForce GTX 1070 (driver version 410.78, device version OpenCL 1.2 CUDA, 8120MB, 3984MB available, 6463 GFLOPS peak)
Wed 02 Jan 2019 09:08:10 PM PST | | OpenCL: NVIDIA GPU 2: GeForce GTX 1070 (driver version 410.78, device version OpenCL 1.2 CUDA, 8116MB, 3984MB available, 6463 GFLOPS peak)
Wed 02 Jan 2019 09:08:10 PM PST | | [coproc] NVIDIA library reports 3 GPUs
Wed 02 Jan 2019 09:08:10 PM PST | | [coproc] ATI: libaticalrt.so: cannot open shared object file: No such file or directory
Wed 02 Jan 2019 09:08:10 PM PST | SETI@home | Found app_info.xml; using anonymous platform
Wed 02 Jan 2019 09:08:10 PM PST | | [libc detection] gathered: 2.28, Ubuntu GLIBC 2.28-0ubuntu1
Wed 02 Jan 2019 09:08:10 PM PST | | Host name: Mal
Wed 02 Jan 2019 09:08:10 PM PST | | Processor: 16 AuthenticAMD AMD Ryzen 7 2700X Eight-Core Processor [Family 23 Model 8 Stepping 2]
Wed 02 Jan 2019 09:08:10 PM PST | | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb hw_pstate sme ssbd sev ibpb vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsavec xgetbv1 xsaves clzero irperf xsaveerptr arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif overflow_recov succor smca
Wed 02 Jan 2019 09:08:10 PM PST | | OS: Linux Ubuntu: Ubuntu 18.10 [4.18.0-13-generic|libc 2.28 (Ubuntu GLIBC 2.28-0ubuntu1)]
Wed 02 Jan 2019 09:08:10 PM PST | | Memory: 15.65 GB physical, 2.00 GB virtual
Wed 02 Jan 2019 09:08:10 PM PST | | Disk: 99.41 GB total, 84.25 GB free
Wed 02 Jan 2019 09:08:10 PM PST | | Local time is UTC -8 hours
Wed 02 Jan 2019 09:08:10 PM PST | Einstein@Home | Found app_config.xml
Wed 02 Jan 2019 09:08:10 PM PST | GPUGRID | Found app_config.xml
Wed 02 Jan 2019 09:08:10 PM PST | Milkyway@Home | Found app_config.xml
Wed 02 Jan 2019 09:08:10 PM PST | SETI@home | Found app_config.xml
Wed 02 Jan 2019 09:08:10 PM PST | SETI@home | Max 11 concurrent jobs
Wed 02 Jan 2019 09:08:10 PM PST | Einstein@Home | hsgamma_FGRPB1G: Max 2 concurrent jobs
Wed 02 Jan 2019 09:08:10 PM PST | GPUGRID | acemdlong: Max 1 concurrent jobs
Wed 02 Jan 2019 09:08:10 PM PST | GPUGRID | acemdshort: Max 1 concurrent jobs
Wed 02 Jan 2019 09:08:10 PM PST | Milkyway@Home | milkyway: Max 2 concurrent jobs
Wed 02 Jan 2019 09:08:10 PM PST | | Config: GUI RPC allowed from any host
Wed 02 Jan 2019 09:08:10 PM PST | | Config: use all coprocessors
Wed 02 Jan 2019 09:08:10 PM PST | Einstein@Home | URL http://einstein.phys.uwm.edu/; Computer ID 12762011; resource share 25
Wed 02 Jan 2019 09:08:10 PM PST | Einstein@Home | Your settings do not allow fetching tasks for CPU. To fix this, you can change Project Preferences on the project's web site.
Wed 02 Jan 2019 09:08:10 PM PST | GPUGRID | URL http://www.gpugrid.net/; Computer ID 495691; resource share 25
Wed 02 Jan 2019 09:08:10 PM PST | GPUGRID | Your settings do not allow fetching tasks for CPU. To fix this, you can change Project Preferences on the project's web site.
Wed 02 Jan 2019 09:08:10 PM PST | Milkyway@Home | URL http://milkyway.cs.rpi.edu/milkyway/; Computer ID 793614; resource share 25
Wed 02 Jan 2019 09:08:10 PM PST | Milkyway@Home | Your settings do not allow fetching tasks for CPU. To fix this, you can change Project Preferences on the project's web site.
Wed 02 Jan 2019 09:08:10 PM PST | SETI@home | URL http://setiathome.berkeley.edu/; Computer ID 8646147; resource share 900
Wed 02 Jan 2019 09:08:10 PM PST | SETI@home | General prefs: from SETI@home (last modified 26-Dec-2018 22:58:31)
Wed 02 Jan 2019 09:08:10 PM PST | SETI@home | Host location: none
Wed 02 Jan 2019 09:08:10 PM PST | SETI@home | General prefs: using your defaults
Wed 02 Jan 2019 09:08:10 PM PST | | Preferences:
Wed 02 Jan 2019 09:08:10 PM PST | | max memory usage when active: 12823.21 MB
Wed 02 Jan 2019 09:08:10 PM PST | | max memory usage when idle: 16029.01 MB
Wed 02 Jan 2019 09:08:10 PM PST | | max disk usage: 8.00 GB
Wed 02 Jan 2019 09:08:10 PM PST | | (to change preferences, visit a project web site or select Preferences in the Manager)
Wed 02 Jan 2019 09:08:10 PM PST | | Setting up project and slot directories
Wed 02 Jan 2019 09:08:10 PM PST | | Checking active tasks
Wed 02 Jan 2019 09:08:10 PM PST | | Setting up GUI RPC socket
Wed 02 Jan 2019 09:08:10 PM PST | | Checking presence of 375 project files
Wed 02 Jan 2019 09:08:10 PM PST | GPUGRID | [coproc] Assigning NVIDIA instance 0 to e10s10_e3s26p2f345-PABLO_V3_p27_isolated_IDP_IDP-1-4-RND9664_0
Wed 02 Jan 2019 09:08:10 PM PST | GPUGRID | [coproc] Assigning NVIDIA instance 1 to e14s30_e7s44p1f53-PABLO_V3_p27_sj403_IDP-0-4-RND2801_0
Wed 02 Jan 2019 09:08:10 PM PST | Einstein@Home | [coproc] Assigning NVIDIA instance 2 to LATeah2006L_1100.0_0_0.0_353313_1
Wed 02 Jan 2019 09:08:10 PM PST | Milkyway@Home | [sched_op] Starting scheduler request
Wed 02 Jan 2019 09:08:10 PM PST | Milkyway@Home | Sending scheduler request: To report completed tasks.
Wed 02 Jan 2019 09:08:10 PM PST | Milkyway@Home | Reporting 10 completed tasks
Wed 02 Jan 2019 09:08:10 PM PST | Milkyway@Home | Not requesting tasks: "no new tasks" requested via Manager
Wed 02 Jan 2019 09:08:10 PM PST | Milkyway@Home | [sched_op] CPU work request: 0.00 seconds; 0.00 devices
Wed 02 Jan 2019 09:08:10 PM PST | Milkyway@Home | [sched_op] NVIDIA GPU work request: 0.00 seconds; 0.00 devices
Wed 02 Jan 2019 09:08:12 PM PST | Milkyway@Home | Scheduler request completed
Wed 02 Jan 2019 09:08:12 PM PST | Milkyway@Home | [sched_op] Server version 707
Wed 02 Jan 2019 09:08:12 PM PST | Milkyway@Home | Server can't open database
Wed 02 Jan 2019 09:08:12 PM PST | Milkyway@Home | Project requested delay of 900 seconds
Wed 02 Jan 2019 09:08:12 PM PST | Milkyway@Home | [sched_op] Deferring communication for 00:15:00
Wed 02 Jan 2019 09:08:12 PM PST | Milkyway@Home | [sched_op] Reason: project requested delay
Wed 02 Jan 2019 09:08:12 PM PST | Milkyway@Home | [sched_op] Deferring communication for 00:32:04
Wed 02 Jan 2019 09:08:12 PM PST | Milkyway@Home | [sched_op] Reason: project is down
Wed 02 Jan 2019 09:09:11 PM PST | GPUGRID | [coproc] NVIDIA instance 0; 1.000000 pending for e10s10_e3s26p2f345-PABLO_V3_p27_isolated_IDP_IDP-1-4-RND9664_0
Wed 02 Jan 2019 09:09:11 PM PST | Einstein@Home | [coproc] NVIDIA instance 0; 1.000000 pending for LATeah2006L_1100.0_0_0.0_353313_1
Wed 02 Jan 2019 09:09:11 PM PST | GPUGRID | [coproc] NVIDIA instance 0: confirming 1.000000 instance for e10s10_e3s26p2f345-PABLO_V3_p27_isolated_IDP_IDP-1-4-RND9664_0
Wed 02 Jan 2019 09:09:11 PM PST | Einstein@Home | [coproc] NVIDIA instance 2: confirming 1.000000 instance for LATeah2006L_1100.0_0_0.0_353313_1
Wed 02 Jan 2019 09:09:11 PM PST | GPUGRID | [coproc] Assigning NVIDIA instance 1 to e14s30_e7s44p1f53-PABLO_V3_p27_sj403_IDP-0-4-RND2801_0
Wed 02 Jan 2019 09:10:11 PM PST | GPUGRID | [coproc] NVIDIA instance 0; 1.000000 pending for e10s10_e3s26p2f345-PABLO_V3_p27_isolated_IDP_IDP-1-4-RND9664_0
Wed 02 Jan 2019 09:10:11 PM PST | Einstein@Home | [coproc] NVIDIA instance 0; 1.000000 pending for LATeah2006L_1100.0_0_0.0_353313_1
Wed 02 Jan 2019 09:10:11 PM PST | GPUGRID | [coproc] NVIDIA instance 0: confirming 1.000000 instance for e10s10_e3s26p2f345-PABLO_V3_p27_isolated_IDP_IDP-1-4-RND9664_0
Wed 02 Jan 2019 09:10:11 PM PST | Einstein@Home | [coproc] NVIDIA instance 2: confirming 1.000000 instance for LATeah2006L_1100.0_0_0.0_353313_1
Wed 02 Jan 2019 09:10:11 PM PST | GPUGRID | [coproc] Assigning NVIDIA instance 1 to e14s30_e7s44p1f53-PABLO_V3_p27_sj403_IDP-0-4-RND2801_0
Wed 02 Jan 2019 09:10:30 PM PST | | Re-reading cc_config.xml
Wed 02 Jan 2019 09:10:30 PM PST | | Config: GUI RPC allowed from any host
Wed 02 Jan 2019 09:10:30 PM PST | | Config: use all coprocessors
Wed 02 Jan 2019 09:10:30 PM PST | | log flags: file_xfer, sched_ops, task, coproc_debug, cpu_sched_debug, sched_op_debug
Wed 02 Jan 2019 09:10:30 PM PST | Einstein@Home | Found app_config.xml
Wed 02 Jan 2019 09:10:30 PM PST | GPUGRID | Found app_config.xml
Wed 02 Jan 2019 09:10:30 PM PST | Milkyway@Home | Found app_config.xml
Wed 02 Jan 2019 09:10:30 PM PST | SETI@home | Found app_config.xml
Wed 02 Jan 2019 09:10:30 PM PST | | [cpu_sched_debug] Request CPU reschedule: Core client configuration
Wed 02 Jan 2019 09:10:31 PM PST | | [cpu_sched_debug] schedule_cpus(): start
Wed 02 Jan 2019 09:10:31 PM PST | GPUGRID | [cpu_sched_debug] reserving 1.000000 of coproc NVIDIA
Wed 02 Jan 2019 09:10:31 PM PST | GPUGRID | [cpu_sched_debug] add to run list: e10s10_e3s26p2f345-PABLO_V3_p27_isolated_IDP_IDP-1-4-RND9664_0 (NVIDIA GPU, FIFO) (prio -0.984000)
Wed 02 Jan 2019 09:10:31 PM PST | GPUGRID | [cpu_sched_debug] reserving 1.000000 of coproc NVIDIA
Wed 02 Jan 2019 09:10:31 PM PST | GPUGRID | [cpu_sched_debug] add to run list: e14s30_e7s44p1f53-PABLO_V3_p27_sj403_IDP-0-4-RND2801_0 (NVIDIA GPU, FIFO) (prio -1.011610)
Wed 02 Jan 2019 09:10:31 PM PST | Einstein@Home | [cpu_sched_debug] reserving 1.000000 of coproc NVIDIA
Wed 02 Jan 2019 09:10:31 PM PST | Einstein@Home | [cpu_sched_debug] add to run list: LATeah2006L_1100.0_0_0.0_353313_1 (NVIDIA GPU, FIFO) (prio -1.016000)
Wed 02 Jan 2019 09:10:31 PM PST | | [cpu_sched_debug] enforce_run_list(): start
Wed 02 Jan 2019 09:10:31 PM PST | | [cpu_sched_debug] preliminary job list:
Wed 02 Jan 2019 09:10:31 PM PST | GPUGRID | [cpu_sched_debug] 0: e10s10_e3s26p2f345-PABLO_V3_p27_isolated_IDP_IDP-1-4-RND9664_0 (MD: no; UTS: yes)
Wed 02 Jan 2019 09:10:31 PM PST | GPUGRID | [cpu_sched_debug] 1: e14s30_e7s44p1f53-PABLO_V3_p27_sj403_IDP-0-4-RND2801_0 (MD: no; UTS: no)
Wed 02 Jan 2019 09:10:31 PM PST | Einstein@Home | [cpu_sched_debug] 2: LATeah2006L_1100.0_0_0.0_353313_1 (MD: no; UTS: yes)
Wed 02 Jan 2019 09:10:31 PM PST | | [cpu_sched_debug] final job list:
Wed 02 Jan 2019 09:10:31 PM PST | GPUGRID | [cpu_sched_debug] 0: e10s10_e3s26p2f345-PABLO_V3_p27_isolated_IDP_IDP-1-4-RND9664_0 (MD: no; UTS: yes)
Wed 02 Jan 2019 09:10:31 PM PST | Einstein@Home | [cpu_sched_debug] 1: LATeah2006L_1100.0_0_0.0_353313_1 (MD: no; UTS: yes)
Wed 02 Jan 2019 09:10:31 PM PST | GPUGRID | [cpu_sched_debug] 2: e14s30_e7s44p1f53-PABLO_V3_p27_sj403_IDP-0-4-RND2801_0 (MD: no; UTS: no)
Wed 02 Jan 2019 09:10:31 PM PST | GPUGRID | [coproc] NVIDIA instance 0; 1.000000 pending for e10s10_e3s26p2f345-PABLO_V3_p27_isolated_IDP_IDP-1-4-RND9664_0
Wed 02 Jan 2019 09:10:31 PM PST | Einstein@Home | [coproc] NVIDIA instance 0; 1.000000 pending for LATeah2006L_1100.0_0_0.0_353313_1
Wed 02 Jan 2019 09:10:31 PM PST | GPUGRID | [coproc] NVIDIA instance 0: confirming 1.000000 instance for e10s10_e3s26p2f345-PABLO_V3_p27_isolated_IDP_IDP-1-4-RND9664_0
Wed 02 Jan 2019 09:10:31 PM PST | Einstein@Home | [coproc] NVIDIA instance 2: confirming 1.000000 instance for LATeah2006L_1100.0_0_0.0_353313_1
Wed 02 Jan 2019 09:10:31 PM PST | GPUGRID | [coproc] Assigning NVIDIA instance 1 to e14s30_e7s44p1f53-PABLO_V3_p27_sj403_IDP-0-4-RND2801_0
Wed 02 Jan 2019 09:10:31 PM PST | GPUGRID | [cpu_sched_debug] scheduling e10s10_e3s26p2f345-PABLO_V3_p27_isolated_IDP_IDP-1-4-RND9664_0
Wed 02 Jan 2019 09:10:31 PM PST | Einstein@Home | [cpu_sched_debug] scheduling LATeah2006L_1100.0_0_0.0_353313_1
Wed 02 Jan 2019 09:10:31 PM PST | GPUGRID | [cpu_sched_debug] skipping e14s30_e7s44p1f53-PABLO_V3_p27_sj403_IDP-0-4-RND2801_0; max concurrent limit 1 reached
Wed 02 Jan 2019 09:10:31 PM PST | | [cpu_sched_debug] using 2.00 out of 16 CPUs
Wed 02 Jan 2019 09:10:31 PM PST | | [cpu_sched_debug] enforce_run_list: end
Wed 02 Jan 2019 09:10:44 PM PST | | Re-reading cc_config.xml
Wed 02 Jan 2019 09:10:44 PM PST | | Config: GUI RPC allowed from any host
Wed 02 Jan 2019 09:10:44 PM PST | | Config: use all coprocessors
Wed 02 Jan 2019 09:10:44 PM PST | | log flags: file_xfer, sched_ops, task, coproc_debug, cpu_sched_debug, sched_op_debug
Wed 02 Jan 2019 09:10:44 PM PST | Einstein@Home | Found app_config.xml
Wed 02 Jan 2019 09:10:44 PM PST | GPUGRID | Found app_config.xml
Wed 02 Jan 2019 09:10:44 PM PST | Milkyway@Home | Found app_config.xml
Wed 02 Jan 2019 09:10:44 PM PST | SETI@home | Found app_config.xml
Wed 02 Jan 2019 09:10:44 PM PST | | [cpu_sched_debug] Request CPU reschedule: Core client configuration
Wed 02 Jan 2019 09:10:45 PM PST | | [cpu_sched_debug] schedule_cpus(): start
Wed 02 Jan 2019 09:10:45 PM PST | GPUGRID | [cpu_sched_debug] reserving 1.000000 of coproc NVIDIA
Wed 02 Jan 2019 09:10:45 PM PST | GPUGRID | [cpu_sched_debug] add to run list: e10s10_e3s26p2f345-PABLO_V3_p27_isolated_IDP_IDP-1-4-RND9664_0 (NVIDIA GPU, FIFO) (prio -0.984013)
Wed 02 Jan 2019 09:10:45 PM PST | GPUGRID | [cpu_sched_debug] reserving 1.000000 of coproc NVIDIA
Wed 02 Jan 2019 09:10:45 PM PST | GPUGRID | [cpu_sched_debug] add to run list: e14s30_e7s44p1f53-PABLO_V3_p27_sj403_IDP-0-4-RND2801_0 (NVIDIA GPU, FIFO) (prio -1.011623)
Wed 02 Jan 2019 09:10:45 PM PST | Einstein@Home | [cpu_sched_debug] reserving 1.000000 of coproc NVIDIA
Wed 02 Jan 2019 09:10:45 PM PST | Einstein@Home | [cpu_sched_debug] add to run list: LATeah2006L_1100.0_0_0.0_353313_1 (NVIDIA GPU, FIFO) (prio -1.015987)
Wed 02 Jan 2019 09:10:45 PM PST | | [cpu_sched_debug] enforce_run_list(): start
Wed 02 Jan 2019 09:10:45 PM PST | | [cpu_sched_debug] preliminary job list:
Wed 02 Jan 2019 09:10:45 PM PST | GPUGRID | [cpu_sched_debug] 0: e10s10_e3s26p2f345-PABLO_V3_p27_isolated_IDP_IDP-1-4-RND9664_0 (MD: no; UTS: yes)
Wed 02 Jan 2019 09:10:45 PM PST | GPUGRID | [cpu_sched_debug] 1: e14s30_e7s44p1f53-PABLO_V3_p27_sj403_IDP-0-4-RND2801_0 (MD: no; UTS: no)
Wed 02 Jan 2019 09:10:45 PM PST | Einstein@Home | [cpu_sched_debug] 2: LATeah2006L_1100.0_0_0.0_353313_1 (MD: no; UTS: yes)
Wed 02 Jan 2019 09:10:45 PM PST | | [cpu_sched_debug] final job list:
Wed 02 Jan 2019 09:10:45 PM PST | GPUGRID | [cpu_sched_debug] 0: e10s10_e3s26p2f345-PABLO_V3_p27_isolated_IDP_IDP-1-4-RND9664_0 (MD: no; UTS: yes)
Wed 02 Jan 2019 09:10:45 PM PST | Einstein@Home | [cpu_sched_debug] 1: LATeah2006L_1100.0_0_0.0_353313_1 (MD: no; UTS: yes)
Wed 02 Jan 2019 09:10:45 PM PST | GPUGRID | [cpu_sched_debug] 2: e14s30_e7s44p1f53-PABLO_V3_p27_sj403_IDP-0-4-RND2801_0 (MD: no; UTS: no)
Wed 02 Jan 2019 09:10:45 PM PST | GPUGRID | [coproc] NVIDIA instance 0; 1.000000 pending for e10s10_e3s26p2f345-PABLO_V3_p27_isolated_IDP_IDP-1-4-RND9664_0
Wed 02 Jan 2019 09:10:45 PM PST | Einstein@Home | [coproc] NVIDIA instance 0; 1.000000 pending for LATeah2006L_1100.0_0_0.0_353313_1
Wed 02 Jan 2019 09:10:45 PM PST | GPUGRID | [coproc] NVIDIA instance 0: confirming 1.000000 instance for e10s10_e3s26p2f345-PABLO_V3_p27_isolated_IDP_IDP-1-4-RND9664_0
Wed 02 Jan 2019 09:10:45 PM PST | Einstein@Home | [coproc] NVIDIA instance 2: confirming 1.000000 instance for LATeah2006L_1100.0_0_0.0_353313_1
Wed 02 Jan 2019 09:10:45 PM PST | GPUGRID | [coproc] Assigning NVIDIA instance 1 to e14s30_e7s44p1f53-PABLO_V3_p27_sj403_IDP-0-4-RND2801_0
Wed 02 Jan 2019 09:10:45 PM PST | GPUGRID | [cpu_sched_debug] scheduling e10s10_e3s26p2f345-PABLO_V3_p27_isolated_IDP_IDP-1-4-RND9664_0
Wed 02 Jan 2019 09:10:45 PM PST | Einstein@Home | [cpu_sched_debug] scheduling LATeah2006L_1100.0_0_0.0_353313_1
Wed 02 Jan 2019 09:10:45 PM PST | GPUGRID | [cpu_sched_debug] skipping e14s30_e7s44p1f53-PABLO_V3_p27_sj403_IDP-0-4-RND2801_0; max concurrent limit 1 reached
Wed 02 Jan 2019 09:10:45 PM PST | | [cpu_sched_debug] using 2.00 out of 16 CPUs
Wed 02 Jan 2019 09:10:45 PM PST | | [cpu_sched_debug] enforce_run_list: end
Wed 02 Jan 2019 09:11:45 PM PST | | [cpu_sched_debug] Request CPU reschedule: periodic CPU scheduling
Wed 02 Jan 2019 09:11:45 PM PST | | [cpu_sched_debug] schedule_cpus(): start
Wed 02 Jan 2019 09:11:45 PM PST | GPUGRID | [cpu_sched_debug] reserving 1.000000 of coproc NVIDIA
Wed 02 Jan 2019 09:11:45 PM PST | GPUGRID | [cpu_sched_debug] add to run list: e10s10_e3s26p2f345-PABLO_V3_p27_isolated_IDP_IDP-1-4-RND9664_0 (NVIDIA GPU, FIFO) (prio -0.984069)
Wed 02 Jan 2019 09:11:45 PM PST | GPUGRID | [cpu_sched_debug] reserving 1.000000 of coproc NVIDIA
Wed 02 Jan 2019 09:11:45 PM PST | GPUGRID | [cpu_sched_debug] add to run list: e14s30_e7s44p1f53-PABLO_V3_p27_sj403_IDP-0-4-RND2801_0 (NVIDIA GPU, FIFO) (prio -1.011678)
Wed 02 Jan 2019 09:11:45 PM PST | Einstein@Home | [cpu_sched_debug] reserving 1.000000 of coproc NVIDIA
Wed 02 Jan 2019 09:11:45 PM PST | Einstein@Home | [cpu_sched_debug] add to run list: LATeah2006L_1100.0_0_0.0_353313_1 (NVIDIA GPU, FIFO) (prio -1.015931)
Wed 02 Jan 2019 09:11:45 PM PST | | [cpu_sched_debug] enforce_run_list(): start
Wed 02 Jan 2019 09:11:45 PM PST | | [cpu_sched_debug] preliminary job list:
Wed 02 Jan 2019 09:11:45 PM PST | GPUGRID | [cpu_sched_debug] 0: e10s10_e3s26p2f345-PABLO_V3_p27_isolated_IDP_IDP-1-4-RND9664_0 (MD: no; UTS: yes)
Wed 02 Jan 2019 09:11:45 PM PST | GPUGRID | [cpu_sched_debug] 1: e14s30_e7s44p1f53-PABLO_V3_p27_sj403_IDP-0-4-RND2801_0 (MD: no; UTS: no)
Wed 02 Jan 2019 09:11:45 PM PST | Einstein@Home | [cpu_sched_debug] 2: LATeah2006L_1100.0_0_0.0_353313_1 (MD: no; UTS: yes)
Wed 02 Jan 2019 09:11:45 PM PST | | [cpu_sched_debug] final job list:
Wed 02 Jan 2019 09:11:45 PM PST | GPUGRID | [cpu_sched_debug] 0: e10s10_e3s26p2f345-PABLO_V3_p27_isolated_IDP_IDP-1-4-RND9664_0 (MD: no; UTS: yes)
Wed 02 Jan 2019 09:11:45 PM PST | Einstein@Home | [cpu_sched_debug] 1: LATeah2006L_1100.0_0_0.0_353313_1 (MD: no; UTS: yes)
Wed 02 Jan 2019 09:11:45 PM PST | GPUGRID | [cpu_sched_debug] 2: e14s30_e7s44p1f53-PABLO_V3_p27_sj403_IDP-0-4-RND2801_0 (MD: no; UTS: no)
Wed 02 Jan 2019 09:11:45 PM PST | GPUGRID | [coproc] NVIDIA instance 0; 1.000000 pending for e10s10_e3s26p2f345-PABLO_V3_p27_isolated_IDP_IDP-1-4-RND9664_0
Wed 02 Jan 2019 09:11:45 PM PST | Einstein@Home | [coproc] NVIDIA instance 0; 1.000000 pending for LATeah2006L_1100.0_0_0.0_353313_1
Wed 02 Jan 2019 09:11:45 PM PST | GPUGRID | [coproc] NVIDIA instance 0: confirming 1.000000 instance for e10s10_e3s26p2f345-PABLO_V3_p27_isolated_IDP_IDP-1-4-RND9664_0
Wed 02 Jan 2019 09:11:45 PM PST | Einstein@Home | [coproc] NVIDIA instance 2: confirming 1.000000 instance for LATeah2006L_1100.0_0_0.0_353313_1
Wed 02 Jan 2019 09:11:45 PM PST | GPUGRID | [coproc] Assigning NVIDIA instance 1 to e14s30_e7s44p1f53-PABLO_V3_p27_sj403_IDP-0-4-RND2801_0
Wed 02 Jan 2019 09:11:45 PM PST | GPUGRID | [cpu_sched_debug] scheduling e10s10_e3s26p2f345-PABLO_V3_p27_isolated_IDP_IDP-1-4-RND9664_0
Wed 02 Jan 2019 09:11:45 PM PST | Einstein@Home | [cpu_sched_debug] scheduling LATeah2006L_1100.0_0_0.0_353313_1
Wed 02 Jan 2019 09:11:45 PM PST | GPUGRID | [cpu_sched_debug] skipping e14s30_e7s44p1f53-PABLO_V3_p27_sj403_IDP-0-4-RND2801_0; max concurrent limit 1 reached
Wed 02 Jan 2019 09:11:45 PM PST | | [cpu_sched_debug] using 2.00 out of 16 CPUs
Wed 02 Jan 2019 09:11:45 PM PST | | [cpu_sched_debug] enforce_run_list: end
Wed 02 Jan 2019 09:12:24 PM PST | | Re-reading cc_config.xml
Wed 02 Jan 2019 09:12:24 PM PST | | Config: GUI RPC allowed from any host
Wed 02 Jan 2019 09:12:24 PM PST | | Config: use all coprocessors
Wed 02 Jan 2019 09:12:24 PM PST | | log flags: file_xfer, sched_ops, task, sched_op_debug
Wed 02 Jan 2019 09:12:24 PM PST | Einstein@Home | Found app_config.xml
Wed 02 Jan 2019 09:12:24 PM PST | GPUGRID | Found app_config.xml
Wed 02 Jan 2019 09:12:24 PM PST | Milkyway@Home | Found app_config.xml
Wed 02 Jan 2019 09:12:24 PM PST | SETI@home | Found app_config.xml
Wed 02 Jan 2019 09:12:28 PM PST | | Re-reading cc_config.xml
Wed 02 Jan 2019 09:12:28 PM PST | | Config: GUI RPC allowed from any host
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1973134 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Send message
Joined: 4 Jul 99
Posts: 14654
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1973170 - Posted: 3 Jan 2019, 9:51:07 UTC - in response to Message 1973115.  

Richard, is this version the one you said you wanted me to try. https://github.com/BOINC/boinc/commit/9f8f52b7824164091828bc586189771971b399d5
Yes, that's the one. All the real work was done before that one.

You have these lines in the log at startup:

02-Jan-2019 17:06:31 [Einstein@Home] Your settings do not allow fetching tasks for CPU. To fix this, you can change Project Preferences on the project's web site.
02-Jan-2019 17:06:31 [GPUGRID] Your settings do not allow fetching tasks for CPU. To fix this, you can change Project Preferences on the project's web site.
02-Jan-2019 17:06:31 [Milkyway@Home] Your settings do not allow fetching tasks for CPU. To fix this, you can change Project Preferences on the project's web site.
Before that final change, they were repeated every time you contacted those servers - too much. Once at startup is enough.

Are these lines actually appearing like this (without timestamp) in your Event Log? They shouldn't be.

Einstein@Home: setting reason to BUFFER_FULL
GPUGRID: setting reason to BUFFER_FULL
Einstein@Home: setting reason to BUFFER_FULL
GPUGRID: setting reason to BUFFER_FULL
ID: 1973170 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Send message
Joined: 4 Jul 99
Posts: 14654
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1973173 - Posted: 3 Jan 2019, 10:01:28 UTC - in response to Message 1973134.  

Also I am picking up for some reason something about an ATI card library being missing. I don't have any ATI cards and I have never installed any ATI drivers.

Wed 02 Jan 2019 09:08:10 PM PST | | [coproc] NVIDIA library reports 3 GPUs
Wed 02 Jan 2019 09:08:10 PM PST | | [coproc] ATI: libaticalrt.so: cannot open shared object file: No such file or directory
That is absolutely normal. BOINC uses those library files (installed as part of the GPU drivers) to work out what GPUs are available and ready to run. It has found your NVIDIA driver, and used it to find your NVIDIA cards. It's just letting you know that it can't find an ATI driver, in case you might have installed a card but no driver.

Note that it's a <coproc_debug> message - most people wouldn't see it or be worried by it, unless they were deliberately trying to hunt down a problem.
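For the curious, that probe is essentially just a dynamic-library load attempt. A minimal Python sketch of the same idea (the library name is the real one from the log; `probe_lib` is an illustrative helper of mine, not BOINC code):

```python
import ctypes

def probe_lib(name):
    """Return True if the shared library loads, False otherwise.

    This mirrors the kind of dlopen() probe the client makes at
    startup for each vendor's GPU runtime library.
    """
    try:
        ctypes.CDLL(name)
        return True
    except OSError:
        return False

# On a host with only NVIDIA drivers, the ATI probe simply fails,
# which is exactly what the [coproc] log line above is reporting.
print(probe_lib("libaticalrt.so"))
```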
ID: 1973173 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1973233 - Posted: 3 Jan 2019, 16:43:55 UTC

Hi Richard, yes those BUFFER_FULL messages are appearing in the log exactly as shown, without a timestamp.

Thanks for the explanation about the ATI library. I guess I'd never set <coproc_debug> before; I had never seen that message and panicked.

Any ideas or clues in the log I posted as to why I wasn't processing on the gpu 1 card, even though there were tasks available and the max_concurrent statements shouldn't have affected any of the projects or limited them to a single card?
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1973233 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Send message
Joined: 4 Jul 99
Posts: 14654
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1973267 - Posted: 3 Jan 2019, 21:00:32 UTC - in response to Message 1973233.  

Hi Richard, yes those BUFFER_FULL messages are appearing in the log exactly as shown, without a timestamp.
That's very odd. Those particular words, in that order, aren't found anywhere in the BOINC code base. How, exactly, are you retrieving those log messages? Please limit yourself as far as possible to copying from BOINC Manager or from a file on disk, to reduce the number of external influences we have to exclude.

Any ideas or clues in the log I posted as to why I wasn't processing on the gpu 1 card, even though there were tasks available and the max_concurrent statements shouldn't have affected any of the projects or limited them to a single card?
The clues would be:

Wed 02 Jan 2019 09:08:10 PM PST | GPUGRID | acemdlong: Max 1 concurrent jobs
Wed 02 Jan 2019 09:08:10 PM PST | GPUGRID | acemdshort: Max 1 concurrent jobs

Wed 02 Jan 2019 09:08:10 PM PST | GPUGRID | [coproc] Assigning NVIDIA instance 0 to e10s10_e3s26p2f345-PABLO_V3_p27_isolated_IDP_IDP-1-4-RND9664_0
Wed 02 Jan 2019 09:08:10 PM PST | GPUGRID | [coproc] Assigning NVIDIA instance 1 to e14s30_e7s44p1f53-PABLO_V3_p27_sj403_IDP-0-4-RND2801_0
Wed 02 Jan 2019 09:08:10 PM PST | Einstein@Home | [coproc] Assigning NVIDIA instance 2 to LATeah2006L_1100.0_0_0.0_353313_1

Referring to https://www.gpugrid.net/results.php?hostid=495691&show_names=1, both those PABLO tasks come from the long queue, so instance 1 falls foul of the first max_concurrent test and will have been blocked, as shown later by

Wed 02 Jan 2019 09:10:31 PM PST | GPUGRID | [cpu_sched_debug] skipping e14s30_e7s44p1f53-PABLO_V3_p27_sj403_IDP-0-4-RND2801_0; max concurrent limit 1 reached

cpu_sched_debug tried

Wed 02 Jan 2019 09:10:31 PM PST | | [cpu_sched_debug] preliminary job list:
Wed 02 Jan 2019 09:10:31 PM PST | GPUGRID | [cpu_sched_debug] 0: e10s10_e3s26p2f345-PABLO_V3_p27_isolated_IDP_IDP-1-4-RND9664_0 (MD: no; UTS: yes)
Wed 02 Jan 2019 09:10:31 PM PST | GPUGRID | [cpu_sched_debug] 1: e14s30_e7s44p1f53-PABLO_V3_p27_sj403_IDP-0-4-RND2801_0 (MD: no; UTS: no)
Wed 02 Jan 2019 09:10:31 PM PST | Einstein@Home | [cpu_sched_debug] 2: LATeah2006L_1100.0_0_0.0_353313_1 (MD: no; UTS: yes)

but then threw out instance 1 to get

Wed 02 Jan 2019 09:10:31 PM PST | | [cpu_sched_debug] final job list:
Wed 02 Jan 2019 09:10:31 PM PST | GPUGRID | [cpu_sched_debug] 0: e10s10_e3s26p2f345-PABLO_V3_p27_isolated_IDP_IDP-1-4-RND9664_0 (MD: no; UTS: yes)
Wed 02 Jan 2019 09:10:31 PM PST | Einstein@Home | [cpu_sched_debug] 1: LATeah2006L_1100.0_0_0.0_353313_1 (MD: no; UTS: yes)
Wed 02 Jan 2019 09:10:31 PM PST | GPUGRID | [cpu_sched_debug] 2: e14s30_e7s44p1f53-PABLO_V3_p27_sj403_IDP-0-4-RND2801_0 (MD: no; UTS: no)

- putting back the one it had just thrown out!

Did you actually have a second Einstein task available to take advantage of

Wed 02 Jan 2019 09:08:10 PM PST | Einstein@Home | hsgamma_FGRPB1G: Max 2 concurrent jobs?

It's very hard to work through all these details without having the whole broad picture spread out for consideration.
ID: 1973267 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1973269 - Posted: 3 Jan 2019, 21:26:11 UTC

Hi Richard,

Yes, that post was copied directly from the Event Log generated by BOINC Manager. All I have is the stdoutdae.txt file in the directory; the earliest entry is a 20:40 startup. There is no stdoutdae.old file that would have the 17:40 log entries I copied from the Event Log. I assume that's because the main log file never grew past 2 MB. I don't know what happened to the earlier entries; I have done no erasing or editing of log files.

I have 96 Einstein tasks on that host, and had over a hundred at the time I posted about the dropping of a gpu card from computation.

I have 2 Long Runs and 1 Short Run GPUGrid tasks onboard and in progress.

I have 8 MilkyWay tasks on board and in progress.

So there was no shortage of work that could have been crunched on the third gpu, even discounting the max of 1 on each GPUGrid work type.

My app_config file limits:

SETI: <project_max_concurrent>11</project_max_concurrent>
Einstein: <max_concurrent>2</max_concurrent>
MilkyWay: <max_concurrent>2</max_concurrent>
GPUGrid long: <max_concurrent>1</max_concurrent>
GPUGrid short: <max_concurrent>1</max_concurrent>
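For reference, those limits correspond to roughly the following app_config.xml fragments. This is only a sketch assuming the standard BOINC layout; the GPUGRID app names are the ones that appear in the logs earlier in the thread (acemdlong, acemdshort), and the other projects' app names are omitted rather than guessed.

```xml
<!-- SETI@home app_config.xml: project-wide cap -->
<app_config>
    <project_max_concurrent>11</project_max_concurrent>
</app_config>

<!-- GPUGRID app_config.xml: one cap per queue, app names from the log -->
<app_config>
    <app>
        <name>acemdlong</name>
        <max_concurrent>1</max_concurrent>
    </app>
    <app>
        <name>acemdshort</name>
        <max_concurrent>1</max_concurrent>
    </app>
</app_config>
```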
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1973269 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1974227 - Posted: 9 Jan 2019, 4:13:59 UTC

Hi Richard, just wanted to ping this thread to see whether there is any further progress on #PR2918, or whether development has stopped at the last commit by DA. Also, any response to my last observation that a gpu was excluded from computing, despite a viable configuration and work that should have allowed it to run?

Any idea when the pull request will be merged into master?
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1974227 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Send message
Joined: 4 Jul 99
Posts: 14654
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1974301 - Posted: 9 Jan 2019, 15:56:39 UTC - in response to Message 1974227.  

My position is still exactly the same as in my Catch-22 post. Once the code is merged into master, we're stuck with it. And although I have the power to merge it, I can't exercise that power irresponsibly.

And I won't exercise it until I'm convinced that the code is fit for purpose.

And that means testing. I've tested the key features as best I can on my more limited machines, but from what I read here (especially the post starting "I may have spoke too enthusiastically it seems. For some reason I am only running on two gpus out of three."), you weren't yet convinced.

We've been delayed by holidays and server problems, but we seem to be back in business now - after last night's spring clean, caches appear to have been refilled and 'ready to send' is positive, though not yet full. So I think now might be the right time to re-establish the test conditions that led to the opening of this thread, and get a clear 'Yay or Nay?' as to whether this code solves it.

If it works, please say so. If it doesn't, please supply all the conditions, logs, state files, and as much other evidence as possible to help us track down any remaining problems.

And remember that we're only looking at the problem identified in the title: "All CPU tasks not running." - so the critical event log flag needed in evidence is <cpu_sched_debug>. We've noticed work fetch issues along the way, but we can leave those for now and come back to them as a separate issue later.
ID: 1974301 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1974320 - Posted: 9 Jan 2019, 17:07:52 UTC - in response to Message 1974301.  

OK, I will put that commit branch client back into play on the host that had the original issue. I have full caches on all projects. I will use <cpu_sched_debug> in the log.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1974320 · Report as offensive
Profile Jord
Volunteer tester
Avatar

Send message
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 1974377 - Posted: 9 Jan 2019, 23:17:38 UTC - in response to Message 1972842.  

So, it gets built, put out to the punters in what is still known (for historical reasons) as alpha testing, and, if nothing too drastic is seen, it gets released.
And then, when a big bug hampered only one of the platforms, after a further round of testing and squishing that bug, the RM decided that a new release of the client for that platform alone wasn't needed. That's when I stopped being a tester and walked out of GitHub. I doubt anyone has noticed.

And by then it's too late, the bugs are in the wild, and everyone has forgotten what they wrote six months, two years, and in the case of the addition of app_config.xml files, probably five years ago.
Talking about app_config.xml, have you noticed that it doesn't have a named entry anymore in the documentation, that it's now under "project level configuration"? Cos that's more important.
ID: 1974377 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1974416 - Posted: 10 Jan 2019, 1:45:16 UTC
Last modified: 10 Jan 2019, 2:42:07 UTC

OK, I just ran the commit for #PR2918 on the original host with the problem.

It solves the original problem, where <exclude_gpu> in cc_config.xml combined with <project_max_concurrent> in app_config.xml prevented all cpu tasks from running.

Project max concurrent is limited to 16 Seti tasks. The MilkyWay, Einstein and GPUGrid max concurrent limits are obeyed.

However, it introduces new problems. First, it does not ask for any work when reporting; instead it prints "Not requesting tasks: at max_concurrent limit" for each work request.

Consequently, it never updates after each 305-second countdown; the only way to report finished work is to do a manual update.

I have uploaded a zip file containing the app_config, cc_config, app_info files plus the normal Event Log output after startup and through the first reporting of tasks after manual Update.

I also included the log entries when cpu_sched_debug is set.

The Google Drive link is:

https://drive.google.com/open?id=1Ib_BGpdTLUp57MwQBFSKdkIILVJIFjzH

What other files or information are needed to show that the commit does not function correctly?
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1974416 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Send message
Joined: 4 Jul 99
Posts: 14654
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1974478 - Posted: 10 Jan 2019, 10:00:38 UTC - in response to Message 1974416.  

If you've got that config set up, or can easily recreate it, could you please run the Event Log with both

<rr_simulation>
<work_fetch_debug>

set for long enough to catch a SETI server contact - reporting work and (presumably not) requesting work. Then reset to normal; those flags produce a humongous volume of output.
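(For anyone reproducing this: those flags live in the <log_flags> section of cc_config.xml - standard layout shown below - and take effect after Options -> Read config files in the Manager, or a client restart.)

```xml
<cc_config>
    <log_flags>
        <rr_simulation>1</rr_simulation>
        <work_fetch_debug>1</work_fetch_debug>
    </log_flags>
</cc_config>
```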

I'll look at the other files once the caffeine levels have risen a bit. If tasks are running at the correct level, then I can merge this particular PR as meeting its objective. Then David has a clean code base to do the second stage of work, addressing the new work fetch problem.
ID: 1974478 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Send message
Joined: 4 Jul 99
Posts: 14654
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1974483 - Posted: 10 Jan 2019, 10:44:52 UTC

Just seen David's emails. Leave him to me - the reply will probably contain words like "Then the design is bad", but I will try to contain myself: no point in getting him mad at both of us. There is also a conference call this evening (19:00 UTC) when this will get mentioned.
ID: 1974483 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1974547 - Posted: 10 Jan 2019, 18:51:45 UTC - in response to Message 1974478.  

I ran with rr_simulation and work_fetch_debug, but since I don't have an unlimited Event Log file size set, the log was truncated and had nothing useful in it. Those flags do produce a huge amount of information.

Since the host would not run correctly with that client, I reverted to the configuration that runs properly: no project_max_concurrent, with the cpu count limited by Local Preferences.

I can revert back to the bad client with not too much trouble.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1974547 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1974549 - Posted: 10 Jan 2019, 19:06:15 UTC

OK, I am really confused. From David's reply, it seems that <project_max_concurrent> is designed to let the project process only N tasks and then quit?

That is not how the parameter is described in the documentation.

project_max_concurrent
A limit on the number of running jobs for this project.

It should NOT prevent the client from continuing to fetch replacement work to maintain the host's cache.

That is not how it works on my earlier clients. What it allows me to do is avoid starting a cpu task on every available cpu thread, which leads to overcommitment of cpu resources and much greater run_times versus actual computation cpu_times.

I like to run only on the physical threads of a cpu by assigning affinity and priority with the schedtool utility.
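schedtool is one way to do that; for reference, the same pinning can be sketched natively with Python's os.sched_setaffinity (Linux only). The interleaved-sibling assumption below is mine, not from the thread; verify the real layout with lscpu on the actual host:

```python
import os

def physical_core_set(n_logical, smt=2):
    """One logical CPU per physical core, assuming SMT siblings are
    interleaved (0 and 1 share a core, 2 and 3 share a core, ...).
    Check the real topology with `lscpu --extended` before relying
    on this; sibling numbering varies between CPUs and kernels.
    """
    return {cpu for cpu in range(n_logical) if cpu % smt == 0}

# e.g. a 16-thread CPU: one logical CPU per physical core.
# (Uncomment to actually pin the current process; pid 0 = self.)
# os.sched_setaffinity(0, physical_core_set(os.cpu_count() or 1))
print(sorted(physical_core_set(16)))  # → [0, 2, 4, 6, 8, 10, 12, 14]
```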
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1974549 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1974557 - Posted: 10 Jan 2019, 19:20:25 UTC

I also wanted to point out that I run BOTH MB and AP apps. So, based on your previous comments that max_concurrent cannot be used on apps within the same project, the only way to limit the total number of running tasks for Seti is to use project_max_concurrent.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1974557 · Report as offensive