BOINC assigns device X - Problem

RueiKe Special Project $250 donor
Volunteer tester
Joined: 14 Feb 16
Posts: 492
Credit: 378,512,430
RAC: 785
Taiwan
Message 1797502 - Posted: 20 Jun 2016, 5:33:20 UTC

I remember that originally I could tell which graphics card a task ran on by looking at the summary of each task for this entry:
BOINC assigns device 0
I did some of my early troubleshooting with this information, but now, every task says device 0. Maybe I messed something up when I tried to manually update to a new app version from Lunatics. Any recommendation on what could be wrong?

It is this computer: http://setiathome.berkeley.edu/show_host_detail.php?hostid=7953787

Thanks!
GitHub: Ricks-Lab
Instagram: ricks_labs

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1797526 - Posted: 20 Jun 2016, 8:12:20 UTC - in response to Message 1797502.  

Most probably a new BOINC version that dropped support for the --device N command-line parameter.
SETI apps news
We're not gonna fight them. We're gonna transcend them.

RueiKe Special Project $250 donor
Volunteer tester
Joined: 14 Feb 16
Posts: 492
Credit: 378,512,430
RAC: 785
Taiwan
Message 1797530 - Posted: 20 Jun 2016, 8:53:53 UTC - in response to Message 1797526.  

Most probably a new BOINC version that dropped support for the --device N command-line parameter.


I have not changed BOINC. The only major changes were installing Lunatics 0.44 with the installer and, later, trying to manually install MB app 3401. I messed things up and finally got it running again with the 0.44 apps. I looked at the stderr from another Fury user:
Stderr output
<core_client_version>7.6.32</core_client_version>
<![CDATA[
<stderr_txt>
Running on device number: 3
Defaults scaling is disabled, basic defaults will be used. Tuning on user's discretion.
Number of app instances per device set to:1


Difference is that mine is missing the line:
Running on device number: x
Also, I am running client 7.6.22, whereas the sample above is from 7.6.32.

The output above is from rev 3430, whereas mine is 3330. Could it be an issue with the app?
GitHub: Ricks-Lab
Instagram: ricks_labs

Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13720
Credit: 208,696,464
RAC: 304
Australia
Message 1797531 - Posted: 20 Jun 2016, 9:00:05 UTC - in response to Message 1797530.  
Last modified: 20 Jun 2016, 9:00:33 UTC

Difference is that mine is missing the line:
Running on device number: x
Also, I am running client 7.6.22, whereas the sample above is from 7.6.32.

The output above is from rev 3430, whereas mine is 3330. Could it be an issue with the app?

I'd say the application, as it is responsible for what is written in the Stderr output; however, I don't know where the application gets the GPU information from: the system, BOINC, or bits of both?
Grant
Darwin NT

Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1797533 - Posted: 20 Jun 2016, 9:28:41 UTC - in response to Message 1797531.  

Difference is that mine is missing the line:
Running on device number: x
Also, I am running client 7.6.22, whereas the sample above is from 7.6.32.

The output above is from rev 3430, whereas mine is 3330. Could it be an issue with the app?

I'd say the application, as it is responsible for what is written in the Stderr output; however, I don't know where the application gets the GPU information from: the system, BOINC, or bits of both?

It's the role of BOINC to 'schedule' the GPU applications - give instructions for which task/application pairing should be run on which GPU. BOINC gave that instruction via a command line in the early versions (before OpenCL was fully supported), but we should have switched to reading init_data.xml several years ago - around December 2012, with v7.0.40: there hasn't been any change in that procedure in recent versions, so far as I can tell.

The only other possibility is that RueiKe has introduced a --device 0 directive into one or other of the many places where manual configuration is possible via a manual edit.
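
For anyone who wants to see what BOINC actually handed to a task, here is a minimal Python sketch that reads the GPU assignment out of an init_data.xml. It assumes the file carries `gpu_type`, `gpu_device_num` and `gpu_opencl_dev_index` elements directly under the root and that it parses as ordinary XML; the sample text is a trimmed, made-up stand-in, not a real file from this host.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical stand-in for a slot's init_data.xml.
# The element names are an assumption about what the client writes.
SAMPLE_INIT_DATA = """<app_init_data>
  <gpu_type>ATI</gpu_type>
  <gpu_device_num>2</gpu_device_num>
  <gpu_opencl_dev_index>2</gpu_opencl_dev_index>
</app_init_data>"""


def read_gpu_assignment(xml_text):
    """Return the GPU-related fields found in init_data.xml text."""
    root = ET.fromstring(xml_text)

    def int_or_none(tag):
        # Missing or empty elements are reported as None, not as device 0.
        text = root.findtext(tag)
        return int(text) if text is not None and text.strip() else None

    return {
        "gpu_type": root.findtext("gpu_type"),
        "gpu_device_num": int_or_none("gpu_device_num"),
        "gpu_opencl_dev_index": int_or_none("gpu_opencl_dev_index"),
    }


if __name__ == "__main__":
    print(read_gpu_assignment(SAMPLE_INIT_DATA))
```

To try it on a live task, pass in the text of the init_data.xml from the slot directory of a running task instead of the sample string.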

Harri Liljeroos
Joined: 29 May 99
Posts: 3988
Credit: 85,281,665
RAC: 126
Finland
Message 1797548 - Posted: 20 Jun 2016, 11:40:08 UTC

I have this problem too. Here http://setiathome.berkeley.edu/result.php?resultid=4994650280 is a task that was run today. It has the line
BOINC assigns device 0
but is missing the line
Running on device number: 0
And it was actually running on device 1 based on the reported compute units (4).

This machine has two GPUs: device 0 is a GTX970 and device 1 is a GTX650Ti. The application used was SoG r3472 from Lunatics 0.45b3. BOINC is 7.6.22 on Win7 64-bit.

RueiKe Special Project $250 donor
Volunteer tester
Joined: 14 Feb 16
Posts: 492
Credit: 378,512,430
RAC: 785
Taiwan
Message 1797549 - Posted: 20 Jun 2016, 11:54:23 UTC - in response to Message 1797533.  

The only other possibility is that RueiKe has introduced a --device 0 directive into one or other of the many places where manual configuration is possible via a manual edit.


I have checked the following places:

mb_cmdline_win_x86_SSE2_OpenCL_ATi_HD5.txt
mb_cmdline_win_x86_SSE2_OpenCL_ATi.txt
both have this entry:
-instances_per_device 1 -no_cpu_lock -sbs 975 -period_iterations_num 4 -spike_fft_thresh 4096 -tune 1 64 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 64 -oclfft_tune_cw 64 -hp


app_config.xml
<app_config>
  <app>
    <name>setiathome_v8</name>
    <gpu_versions>
      <gpu_usage>1</gpu_usage>
      <cpu_usage>1.6666</cpu_usage>
    </gpu_versions>
  </app>
  <app>
    <name>astropulse_v7</name>
    <gpu_versions>
      <gpu_usage>1</gpu_usage>
      <cpu_usage>1.6666</cpu_usage>
    </gpu_versions>
  </app>
</app_config>

cc_config.xml
<cc_config>
  <options>
    <use_all_gpus>1</use_all_gpus>
  </options>
</cc_config>

I have used grep in almost all directories for \-device and found no suspicious occurrences.
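
For anyone without grep to hand, the same search can be sketched in Python. This is only an illustration: the function name, the choice of `.txt` and `.xml` files, and the directory you point it at are all assumptions, not anything BOINC or Lunatics provides.

```python
from pathlib import Path


def find_device_directives(root, needle="-device", suffixes=(".txt", ".xml")):
    """Return (path, line number, line) for every line containing `needle`.

    Only files whose extension is in `suffixes` are scanned, so binaries
    are skipped. Note that "-instances_per_device" does not match,
    because the character before "device" there is an underscore.
    """
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix.lower() not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                hits.append((str(path), lineno, line.strip()))
    return hits


if __name__ == "__main__":
    # Scans the current directory; point it at the BOINC data directory instead.
    for path, lineno, line in find_device_directives("."):
        print(f"{path}:{lineno}: {line}")
```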
GitHub: Ricks-Lab
Instagram: ricks_labs

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1797579 - Posted: 20 Jun 2016, 15:48:17 UTC - in response to Message 1797549.  

Remove -no_cpu_lock. Does that change anything?
SETI apps news
We're not gonna fight them. We're gonna transcend them.

Brent Norman Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor
Volunteer tester
Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1797583 - Posted: 20 Jun 2016, 16:28:02 UTC

From a quick look at your Inconclusives from back on May 7, it looks like the same thing.

On May 5 you have an AP task that does show running on device 0.

Maybe you're just thinking of different reporting with different apps.

RueiKe Special Project $250 donor
Volunteer tester
Joined: 14 Feb 16
Posts: 492
Credit: 378,512,430
RAC: 785
Taiwan
Message 1797663 - Posted: 21 Jun 2016, 0:51:03 UTC - in response to Message 1797579.  

Remove -no_cpu_lock. Does that change anything?


I have removed it and do see a difference, though still not correct:

With -no_cpu_lock
BOINC assigns device 0
Info: BOINC provided OpenCL device ID used

Without:
BOINC assigns device 0
1 slot of 64 used for this instance
Info: BOINC provided OpenCL device ID used

I see the slot number differing between tasks but device is still always 0. Tasks are taking much longer now, so I need to add back -no_cpu_lock.

I think 3430 is supposed to fix the slowdown seen when -no_cpu_lock is not used, so maybe I should try to upgrade to those apps.
GitHub: Ricks-Lab
Instagram: ricks_labs

RueiKe Special Project $250 donor
Volunteer tester
Joined: 14 Feb 16
Posts: 492
Credit: 378,512,430
RAC: 785
Taiwan
Message 1797664 - Posted: 21 Jun 2016, 0:53:03 UTC - in response to Message 1797583.  

From a quick look at your Inconclusives from back on May 7, it looks like the same thing.

On May 5 you have an AP task that does show running on device 0.

Maybe you're just thinking of different reporting with different apps.


This has been an issue for a while. I think I caused it when I attempted to install 3401 apps, but it could have happened when I installed Lunatics and just didn't notice.
GitHub: Ricks-Lab
Instagram: ricks_labs

Brent Norman Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor
Volunteer tester
Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1797666 - Posted: 21 Jun 2016, 1:11:14 UTC - in response to Message 1797664.  
Last modified: 21 Jun 2016, 1:13:11 UTC

May 7 you were using r3330 and it has the same output as now.

EDIT: Scratch that comment, you still have r3330.

RueiKe Special Project $250 donor
Volunteer tester
Joined: 14 Feb 16
Posts: 492
Credit: 378,512,430
RAC: 785
Taiwan
Message 1797702 - Posted: 21 Jun 2016, 4:03:57 UTC - in response to Message 1797668.  

I have not added any option that says which device number should be used for which device, but despite that, the command line, which can be seen in the Task Manager Processes tab, always has "--device 0" after the filename.


Thanks for this! I used task manager on two systems. My main desktop which just runs standard apps has --device 0 shown in the command line. The system in question doesn't have anything specified in the command line after the executable.
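
If you copy the command line out of Task Manager, a small Python sketch like the one below can pull out the device number, or report that none was passed. The executable names in the example are placeholders, and the only flag it looks for is the `--device N` form quoted in this thread.

```python
import re

# Matches "--device N" only when it stands alone as an argument, so
# options such as "-instances_per_device 1" are not mistaken for it.
_DEVICE_FLAG = re.compile(r"(?<!\S)--device\s+(\d+)(?!\S)")


def device_from_cmdline(cmdline):
    """Return the N from "--device N" in a command line, or None if absent."""
    match = _DEVICE_FLAG.search(cmdline)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    # Placeholder command lines, not copied from a real host.
    print(device_from_cmdline("some_app.exe --device 0"))
    print(device_from_cmdline("some_app.exe"))
```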
GitHub: Ricks-Lab
Instagram: ricks_labs

[DPC] hansR Project Donor
Volunteer tester
Joined: 14 Jul 00
Posts: 47
Credit: 235,829,569
RAC: 8
Netherlands
Message 1797712 - Posted: 21 Jun 2016, 5:02:07 UTC - in response to Message 1797668.  

I think you should read this as: iGPU device 0 being the first device of type iGPU, and NVIDIA device 0 being the first device of type NVIDIA.

RueiKe Special Project $250 donor
Volunteer tester
Joined: 14 Feb 16
Posts: 492
Credit: 378,512,430
RAC: 785
Taiwan
Message 1797824 - Posted: 22 Jun 2016, 0:35:02 UTC

I have upgraded to the r3430 MB app which doesn't require -no_cpu_lock to perform well, so I have now removed the option. I am getting slot numbers in the output but device is still always 0.
GitHub: Ricks-Lab
Instagram: ricks_labs

RueiKe Special Project $250 donor
Volunteer tester
Joined: 14 Feb 16
Posts: 492
Credit: 378,512,430
RAC: 785
Taiwan
Message 1797911 - Posted: 22 Jun 2016, 13:15:33 UTC

I just verified that AP tasks show the proper device number:
<core_client_version>7.6.22</core_client_version>
<![CDATA[
<stderr_txt>
Running on device number: 2
CPU affinity adjustment enabled, fixed CPU 6 will be used
Number of app instances per device set to:1
Maximum single buffer size set to:768MB
Priority of worker thread raised successfully
Priority of process adjusted successfully, high priority class used
OpenCL platform detected: Advanced Micro Devices, Inc.
BOINC assigns device 2
Info: BOINC provided OpenCL device ID used

Does this give a clue to the source of the problem?
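
To compare many tasks at once, the two relevant stderr lines can be pulled out with a short Python sketch. It relies only on the two line formats quoted in this thread ("Running on device number: N" and "BOINC assigns device N"); the function name is made up for illustration.

```python
import re

_RUNNING_ON = re.compile(r"Running on device number:\s*(\d+)")
_BOINC_ASSIGNS = re.compile(r"BOINC assigns device\s+(\d+)")


def parse_device_lines(stderr_text):
    """Extract both device numbers from a task's stderr text.

    A value of None means the corresponding line was not present,
    which is the symptom described for the MB tasks in this thread.
    """
    running = _RUNNING_ON.search(stderr_text)
    assigned = _BOINC_ASSIGNS.search(stderr_text)
    return {
        "running_on": int(running.group(1)) if running else None,
        "boinc_assigns": int(assigned.group(1)) if assigned else None,
    }


# Excerpts taken from the stderr output quoted in this thread.
AP_STDERR = """Running on device number: 2
OpenCL platform detected: Advanced Micro Devices, Inc.
BOINC assigns device 2
Info: BOINC provided OpenCL device ID used
"""

MB_STDERR = """BOINC assigns device 0
Info: BOINC provided OpenCL device ID used
"""

if __name__ == "__main__":
    print(parse_device_lines(AP_STDERR))
    print(parse_device_lines(MB_STDERR))
```

A task where `running_on` is None but `boinc_assigns` is 0 matches the faulty pattern; it does not by itself say which GPU the task really ran on.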
GitHub: Ricks-Lab
Instagram: ricks_labs

RueiKe Special Project $250 donor
Volunteer tester
Joined: 14 Feb 16
Posts: 492
Credit: 378,512,430
RAC: 785
Taiwan
Message 1813334 - Posted: 29 Aug 2016, 2:53:06 UTC

I finally figured this out. I did a nearly clean install of BOINC by uninstalling BOINC and Lunatics, and removing the project directory. I then installed BOINC. It came up working fine, but my command line options were missing. I added them back and the problem returned. I found that putting a space after the last item in the command line fixed the problem.
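
The workaround can be applied without hand-editing by a small Python sketch like this one. It simply makes sure the options text ends in exactly one space. Why that helps is not established here, and treating a trailing newline as something to replace with a space is an assumption; the function names are made up.

```python
from pathlib import Path


def with_trailing_space(options):
    """Return the options text ending in exactly one trailing space.

    Empty or whitespace-only text is returned unchanged.
    """
    stripped = options.rstrip()
    if not stripped:
        return options
    return stripped + " "


def fix_cmdline_file(path):
    """Rewrite a command-line options file so it ends with a space.

    Returns True if the file was changed, False if it was already fine.
    """
    p = Path(path)
    original = p.read_text(encoding="utf-8")
    updated = with_trailing_space(original)
    if updated != original:
        p.write_text(updated, encoding="utf-8")
        return True
    return False


if __name__ == "__main__":
    print(repr(with_trailing_space("-sbs 975 -hp")))
```

Call `fix_cmdline_file` on the relevant mb_cmdline file in the project directory, ideally after making a backup copy.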
GitHub: Ricks-Lab
Instagram: ricks_labs

Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13720
Credit: 208,696,464
RAC: 304
Australia
Message 1813352 - Posted: 29 Aug 2016, 5:05:59 UTC - in response to Message 1813334.  
Last modified: 29 Aug 2016, 5:06:45 UTC

I finally figured this out. I did a nearly clean install of BOINC by uninstalling BOINC and Lunatics, and removing the project directory. I then installed BOINC. It came up working fine, but my command line options were missing. I added them back and the problem returned. I found that putting a space after the last item in the command line fixed the problem.


I added the command line settings not long after using the Lunatics Installer Beta v4 to install the current SoG application. I don't remember if I checked the Stderr output before adding the command line settings, but when I did check it, I noticed that it always reports
BOINC assigns device 0
regardless of which device it is.

I've just put a space at the end of my command line settings in mb_cmdline_win_x86_SSE3_OpenCL_NV_SoG.txt
We'll see if that lets the correct device number be reported.
Grant
Darwin NT

Jeff Buck Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor
Volunteer tester
Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1813359 - Posted: 29 Aug 2016, 5:39:43 UTC

I just noticed this happening today with SoG r3500. I've normally been running Cuda50 on my new host 8064262 which has 4 GTX 960s. However, I took some time today to switch a handful of tasks to SoG, intermixed with the Cuda tasks so I could get some comparisons between the two apps for tasks that were split from the same files and had matching ARs. I realized later, when I was pulling the results to put into my spreadsheet, that while the matching Cuda tasks showed as being distributed across all 4 GPUs, every one of the SoG tasks (all 11 of them) showed "BOINC assigns device 0". Pretty much a statistical impossibility, I think, although I didn't actually watch the tasks run, so I didn't see the true device numbers that each task really ran on.

Just to be sure though, once I saw this thread resurface this evening, I went and swapped another group of four tasks over to SoG and this time recorded which GPU got which task, as follows:

Task 5123149033 - Device 0
Task 5123193201 - Device 1
Task 5123193448 - Device 1
Task 5123193447 - Device 3

Here's a screenshot of the properties for that last one:


The Stderr for every one shows "BOINC assigns device 0", making it impossible to know which GPU the task actually ran on. That makes it difficult to accurately match test results across app, GPU, and AR. Perhaps not such a big deal on this box, although even with 4 GTX 960s, there are 3 different clock speeds involved. However, if I wanted to do something similar on my host with a GTX 670, GTX 780, and GTX 960, I'd have to pay attention to the tasks at some point while they're actually running, and not just pick up the results later. This would also be a long-term problem if it ever became necessary to identify a specific GPU when one starts to cough up hairballs.

Today's testing was run plain vanilla, with no command line parameters for the SoG tasks. The host is on BOINC 7.6.22.

Has anybody done anything to look into this problem?

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1813375 - Posted: 29 Aug 2016, 7:11:02 UTC - in response to Message 1813359.  


Today's testing was run plain vanilla, with no command line parameters for the SoG tasks. The host is on BOINC 7.6.22.

Put some option into the command line and add a space after it, or just put some spaces into the command line. Does that help?
SETI apps news
We're not gonna fight them. We're gonna transcend them.