Need help with "mb_cmdline-8.00_windows_intel_opencl_ati5_sah" settings

Message boards : Number crunching : Need help with "mb_cmdline-8.00_windows_intel_opencl_ati5_sah" settings

Stick Project Donor
Volunteer tester
Joined: 26 Feb 00
Posts: 100
Credit: 5,283,449
RAC: 5
United States
Message 1766338 - Posted: 19 Feb 2016, 23:27:57 UTC

Task 4729691359 on Computer 3049387 is consistently causing my display driver to stop responding, after which the task hangs. It's a problem that this computer has occasionally had in the past and, until now, has been solved by tinkering with -sbs and -period_iterations_num values in the appropriate cmdline file. (I first reported the problem over a year ago in Message 1593206 and received advice relating to these settings at that time.)

But on this task, I have tried -sbs values ranging from 1 to 256 and -period_iterations_num values from 100 to 6400, and the task has always hung at the same point: 81.078%.
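For anyone repeating a sweep like this, here is a minimal sketch that lists candidate option lines. It assumes the cmdline file holds one line of space-separated options in the "-sbs 128 -period_iterations_num 100" form used in this thread. Stepping by powers of two is only an assumption to keep the list short, and the sketch does not know the app's real limits, so it rejects nothing except non-positive values. The function names are hypothetical.

```python
from itertools import product
from typing import List


def doubling(start: int, stop: int) -> List[int]:
    """Return start, 2*start, 4*start, ... while the value stays <= stop."""
    if start <= 0:
        raise ValueError("start must be positive")
    values = []
    value = start
    while value <= stop:
        values.append(value)
        value *= 2
    return values


def cmdline_text(sbs: int, period_iterations_num: int) -> str:
    """Format one options line like '-sbs 128 -period_iterations_num 100'.

    Only rejects non-positive or non-integer values; the app's real
    accepted ranges are not known to this sketch.
    """
    for name, value in (("-sbs", sbs), ("-period_iterations_num", period_iterations_num)):
        if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
            raise ValueError(f"{name} needs a positive integer, got {value!r}")
    return f"-sbs {sbs} -period_iterations_num {period_iterations_num}"


# The ranges tried in the post: -sbs 1..256, -period_iterations_num 100..6400.
SBS_VALUES = doubling(1, 256)
PIN_VALUES = doubling(100, 6400)
CANDIDATES = [cmdline_text(s, p) for s, p in product(SBS_VALUES, PIN_VALUES)]

if __name__ == "__main__":
    print(len(CANDIDATES), "combinations, e.g.", CANDIDATES[0])
```

Each candidate line would go into the cmdline file for the matching plan class, and the task's Stderr output is the place to confirm the app actually picked it up.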

I have 2 questions: 1/ What are the practical limits and range of settings for -sbs and -period_iterations_num values? and 2/ Are there any other cmdline settings I should try tinkering with?


BTW: If anyone reads my original post and its responses and wonders if I might have screwed up again adjusting these settings, I can assure you that, this time, I am doing it correctly. How do I know? Because Task 4730351237 was having similar problems which I was able to solve by adjusting -sbs and -period_iterations_num values and its Stderr output file documents those adjustments.
ID: 1766338
Stick Project Donor
Volunteer tester
Joined: 26 Feb 00
Posts: 100
Credit: 5,283,449
RAC: 5
United States
Message 1768203 - Posted: 27 Feb 2016, 21:57:52 UTC

I've pretty much concluded that certain calculations required to analyze Task 4729691359 are beyond the capability of my GPU. Since I posted the first message, I experimented a little more with -sbs and -period_iterations_num values. I also experimented a little with -spike_fft_thresh and -no_caching. And I also changed to the equivalent Lunatics app and tried tweaking all these settings again with it. All to no avail.

Then I downloaded Task 4744139436 and pretty quickly ran into the same problem.

Last year, problem WUs like these were pretty rare and were almost always fixable by tweaking -sbs, etc. Now it seems like most WUs have this problem and a lot of them can't be fixed. I can think of 3 big differences from last year: 1/ I updated from Win 7 to Win 10; 2/ Seti updated to MB 8.0 from MB 7; and 3/ Seti has expanded from Arecibo to multiple sources and a wider sky search. I wonder which of these factors is most responsible for the change.
ID: 1768203
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1768404 - Posted: 28 Feb 2016, 18:53:32 UTC - in response to Message 1768203.  

I think the main change is f..ed Windows 10 and its drivers.
I had no issues with an HD6950 set to period_iterations_num 1 (!) for years under Vista x86.
Now I am evaluating Win10 x64, and a VLAR task causes a driver restart even with 500 iterations...

I would recommend just disabling that damned watchdog in the Windows registry, if that's available under Win10. I will post how once I find it for my own host.
ID: 1768404
HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1768407 - Posted: 28 Feb 2016, 19:10:06 UTC - in response to Message 1768404.  

I think the registry values you are wanting to adjust are TdrDelay & TdrLevel.
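For reference, TdrDelay and TdrLevel are REG_DWORD values under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers. Per Microsoft's TDR registry-key documentation, TdrDelay is in seconds (default 2), TdrLevel 0 turns detection off and 3 (the default) recovers on timeout, and the keys are described as intended for testing and debugging. The minimal sketch below only builds the text of a .reg file you could review before importing; the function name and the 20-second value are illustrative assumptions, not recommendations.

```python
from typing import Optional

# Registry key that holds the Windows TDR (GPU watchdog) settings.
GRAPHICS_DRIVERS_KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers"

# TdrLevel values this sketch accepts: 0 = detection off, 3 = recover on timeout (default).
ALLOWED_TDR_LEVELS = (0, 3)


def tdr_reg_text(tdr_delay_seconds: int, tdr_level: Optional[int] = None) -> str:
    """Return the text of a .reg file that sets TdrDelay (and optionally TdrLevel)."""
    if not 0 < tdr_delay_seconds <= 0xFFFFFFFF:
        raise ValueError("TdrDelay must fit in a positive REG_DWORD")
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{GRAPHICS_DRIVERS_KEY}]",
        f'"TdrDelay"=dword:{tdr_delay_seconds:08x}',  # REG_DWORD, 8 hex digits
    ]
    if tdr_level is not None:
        if tdr_level not in ALLOWED_TDR_LEVELS:
            raise ValueError(f"TdrLevel {tdr_level} not in {ALLOWED_TDR_LEVELS}")
        lines.append(f'"TdrLevel"=dword:{tdr_level:08x}')
    # regedit's own exports use CRLF line endings.
    return "\r\n".join(lines) + "\r\n"


if __name__ == "__main__":
    # Example only: lengthen the delay to 20 seconds, leave TdrLevel untouched.
    print(tdr_reg_text(20))
```

Lengthening the delay corresponds to raising TdrDelay, while disabling the watchdog outright corresponds to TdrLevel 0. Back up the key before importing anything, and expect to restart Windows for the change to apply.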
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group (http://tinyurl.com/8y46zvu)
ID: 1768407
Stick Project Donor
Volunteer tester
Joined: 26 Feb 00
Posts: 100
Credit: 5,283,449
RAC: 5
United States
Message 1768458 - Posted: 28 Feb 2016, 22:14:56 UTC - in response to Message 1768404.  

Quoting Raistmer: "I would recommend just disabling that damned watchdog in the Windows registry, if that's available under Win10. I will post how once I find it for my own host."

I forgot to mention that I had tried that as well (after rereading the advice HAL9000 had given in reply to my original post last year). I didn't disable it but tried extending the delay by a factor of ten or more. It had no effect on how far the calculations went before stopping - it only extended the time before screen flicker and recovery. As for -period_iterations_num, I tried values up into the thousands before giving up.

FYI: I aborted both tasks today and reinstalled Lunatics - this time without the GPU option for MB 8.0.
ID: 1768458
Stick Project Donor
Volunteer tester
Joined: 26 Feb 00
Posts: 100
Credit: 5,283,449
RAC: 5
United States
Message 1768502 - Posted: 29 Feb 2016, 1:15:08 UTC
Last modified: 29 Feb 2016, 1:17:41 UTC

First of all, I want to make this very clear: I might know how to read Stderr output files but I do NOT pretend to understand what I am reading. That being said, after I aborted the tasks I talked about in the previous posts here, I read through their Stderr output files and something doesn't seem to add up.

The most obvious thing was that, in both files and in all of the entries reflecting the numerous task restarts, none indicates that -sbs or -period_iterations_num ever changed. That is, every entry states "Single buffer allocation size: 128MB" and "period_iterations_num=50".

This made me wonder if I had somehow screwed up again. I thought I had been setting the proper cmdline file with numerical variations of "-sbs 128 -period_iterations_num 100" each time I restarted. Could I have been adjusting the wrong file? Or did I screw up on the format again? (I really don't think so.)

Then I noticed this, near the bottom of both Stderr files and directly underneath the last entry of "period_iterations_num=50": "Not using mb_cmdline.txt-file, using commandline options".

I would think that, if I were adjusting the wrong file, this "Not using mb_cmdline.txt-file, using commandline options" message would come after every restart, not just after one of them. And, I would think that if I had used the wrong format, a different message might be generated.
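One way to check this systematically is to tally the values reported at each restart in a saved copy of the Stderr text. This is a minimal sketch, assuming the log lines look exactly like "Single buffer allocation size: 128MB", "period_iterations_num=50" and "Not using mb_cmdline.txt-file, using commandline options", the only formats shown in this thread; other app builds may word them differently, and the function names are hypothetical.

```python
import re
from typing import Dict, List

# Line formats as quoted in this thread; other app builds may word them differently.
SBS_RE = re.compile(r"Single buffer allocation size:\s*(\d+)\s*MB")
PIN_RE = re.compile(r"period_iterations_num=(\d+)")
NOT_USING_FILE = "Not using mb_cmdline.txt-file, using commandline options"


def summarize_stderr(text: str) -> Dict[str, List[int]]:
    """List every reported buffer size and period_iterations_num, in log order."""
    return {
        "sbs_mb": [int(v) for v in SBS_RE.findall(text)],
        "period_iterations_num": [int(v) for v in PIN_RE.findall(text)],
    }


def settings_ever_changed(summary: Dict[str, List[int]]) -> bool:
    """True if any reported value differs between restarts."""
    return any(len(set(values)) > 1 for values in summary.values())


def not_using_file_count(text: str) -> int:
    """How many times the 'Not using mb_cmdline.txt-file' notice appears."""
    return text.count(NOT_USING_FILE)
```

A summary where every restart shows the same 128MB and 50 values means none of the edited settings ever reached the app.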

But, as I said at the beginning, I do NOT pretend to understand what I have read. I hope someone who does understand can explain.
ID: 1768502
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1768569 - Posted: 29 Feb 2016, 10:23:16 UTC - in response to Message 1768502.  

If the app receives a valid command from the user (by any of the possible means), it reports it in stderr.
Hence, your commands were invalid. Perhaps you edited the wrong mb_cmdline* file; there are a few of them, each for a different plan class's app.
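Since there are several mb_cmdline* files, a quick way to see which one belongs to a given app is to filter by the plan-class string in the file name. This is a minimal sketch: the project folder is whatever your BOINC data directory uses (passed in as an argument), and matching on a substring such as "opencl_ati5_sah", taken from the thread title, is an assumption about how the files are named. The function names are hypothetical.

```python
from pathlib import Path
from typing import List


def cmdline_files(project_dir: Path, plan_class: str) -> List[Path]:
    """mb_cmdline* files in project_dir whose names contain plan_class (case-insensitive)."""
    wanted = plan_class.lower()
    return sorted(
        p for p in project_dir.glob("mb_cmdline*")
        if p.is_file() and wanted in p.name.lower()
    )


def show_cmdline_files(project_dir: Path, plan_class: str) -> None:
    """Print each matching file name with its current contents."""
    for path in cmdline_files(project_dir, plan_class):
        print(f"{path.name}: {path.read_text().strip()!r}")


if __name__ == "__main__":
    import sys
    # Hypothetical usage: python find_cmdline.py <project folder> opencl_ati5_sah
    show_cmdline_files(Path(sys.argv[1]), sys.argv[2])
```

If more than one file matches, or none does, that alone is a hint the edits may be going into a file the running app never reads.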
ID: 1768569
HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1768682 - Posted: 29 Feb 2016, 21:56:48 UTC - in response to Message 1768502.  

Which driver release are you using? I just noticed your OpenCL version is 1800.11, but I don't know what driver that belongs to. I have seen a few others that have had issues with the same OpenCL version. It could be unrelated, but figured I'd mention it.
Cat 15.7 = 1800.5
Cat 15.7.1 = 1800.8
Cat 15.11 = 1912.5
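HAL9000's list can be treated as a small lookup table. This minimal sketch contains only the three pairs given above; anything else, such as the 1800.11 reported for Stick's host, comes back as unknown rather than being guessed. The names are hypothetical.

```python
from typing import Optional, Tuple

# OpenCL runtime version -> Catalyst release; only the three pairs listed above.
OPENCL_TO_CATALYST = {
    "1800.5": "Cat 15.7",
    "1800.8": "Cat 15.7.1",
    "1912.5": "Cat 15.11",
}


def version_key(version: str) -> Tuple[int, ...]:
    """Split '1800.11' into (1800, 11) so versions sort numerically part by part."""
    return tuple(int(part) for part in version.strip().split("."))


def catalyst_for(opencl_version: str) -> Optional[str]:
    """Return the Catalyst release for a listed OpenCL version, else None."""
    return OPENCL_TO_CATALYST.get(opencl_version.strip())
```

Versions are kept as strings and compared part by part because, read as decimal numbers, 1800.11 would sort below 1800.8.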
ID: 1768682
rics
Joined: 20 May 08
Posts: 1
Credit: 6,604,190
RAC: 5
Japan
Message 1768787 - Posted: 1 Mar 2016, 7:22:34 UTC - in response to Message 1768682.  

It is probably the driver provided by Windows Update on Windows 10.

Driver version 15.201.1151-150821a
Catalyst version 2015.0821.2209.38003 (CAT 15.8?)
ID: 1768787
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1768802 - Posted: 1 Mar 2016, 8:58:15 UTC

Another possibility for dealing with this issue is to use binary kernels generated for this GPU model under a different driver/OS.
This approach has been used before to work around broken AMD drivers.
ID: 1768802
HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1768836 - Posted: 1 Mar 2016, 16:26:52 UTC - in response to Message 1768787.  

Quoting rics:
"It is probably the driver provided by Windows Update on Windows 10.

Driver version 15.201.1151-150821a
Catalyst version 2015.0821.2209.38003 (CAT 15.8?)"

That is probably where it is coming from, but I didn't want to dig through the Windows Update Catalog to find it.
ID: 1768836
HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1768837 - Posted: 1 Mar 2016, 16:38:15 UTC - in response to Message 1768802.  

Quoting Raistmer:
"Another possibility for dealing with this issue is to use binary kernels generated for this GPU model under a different driver/OS.
This approach has been used before to work around broken AMD drivers."

Should removing the VM & Wisdom files be one of the first steps in troubleshooting? Wisdom files take some time to be rebuilt, so maybe just the VM files?
ID: 1768837
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1768839 - Posted: 1 Mar 2016, 16:39:43 UTC - in response to Message 1768837.  

What VM??
The *.bin* files should be replaced with ones taken from a working driver version (renamed to match the original files' names). The Wisdom file can be left as is.
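A minimal sketch of that swap, under these assumptions: you already have *.bin* files saved from a driver version where the app worked; a saved file and a current one correspond when their names match up to and including ".bin" (the suffix after it, such as "_19125VM" in the names HAL9000 lists next, looks driver-specific); and BOINC is stopped while you do this. The folder names and the matching rule are assumptions, not documented app behavior, and the function names are hypothetical. Originals are backed up first, and nothing except *.bin* files is touched, so the Wisdom file stays as is.

```python
import shutil
from pathlib import Path
from typing import Dict, List


def bin_stem(name: str) -> str:
    """Part of a file name up to and including '.bin' (assumed driver-independent)."""
    idx = name.find(".bin")
    if idx < 0:
        raise ValueError(f"not a .bin cache file: {name}")
    return name[: idx + len(".bin")]


def swap_bin_files(project_dir: Path, donor_dir: Path, backup_dir: Path) -> List[str]:
    """Overwrite each *.bin* file in project_dir with the matching donor file.

    Originals are copied to backup_dir first; files without a donor are left alone.
    Returns the names of the files that were replaced.
    """
    donors: Dict[str, Path] = {
        bin_stem(p.name): p for p in donor_dir.glob("*.bin*") if p.is_file()
    }
    backup_dir.mkdir(parents=True, exist_ok=True)
    replaced = []
    for target in sorted(project_dir.glob("*.bin*")):
        if not target.is_file():
            continue
        donor = donors.get(bin_stem(target.name))
        if donor is None:
            continue
        shutil.copy2(target, backup_dir / target.name)  # keep the original
        shutil.copyfile(donor, target)                  # donor content, original name
        replaced.append(target.name)
    return replaced
```

Restoring is the same copy in reverse from the backup folder if the swapped kernels make things worse.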
ID: 1768839
HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1768843 - Posted: 1 Mar 2016, 17:03:20 UTC - in response to Message 1768839.  

Quoting Raistmer:
"What VM??
The *.bin* files should be replaced with ones taken from a working driver version (renamed to match the original files' names). The Wisdom file can be left as is."

Sorry, my brain likes to think of them as VM files rather than .bin files because of the way they end. But I mean these files:
MultiBeam_Kernels_r3330.clHD5_Hawaii.bin_V7_19125VM
MB_clFFTplan_Hawaii_8_gr64_lr16_wg256_tw0_ls1024_bn16_cw16_r3330.bin_19125VM
...
MB_clFFTplan_Hawaii_524288_gr64_lr16_wg256_tw0_ls1024_bn16_cw16_r3330.bin_19125VM

Also I just noticed the "V7" in MultiBeam_Kernels_r3330.clHD5_Hawaii.bin_V7_19125VM. Is that meant to be the MB version, or something different?
ID: 1768843
Stick Project Donor
Volunteer tester
Joined: 26 Feb 00
Posts: 100
Credit: 5,283,449
RAC: 5
United States
Message 1768865 - Posted: 1 Mar 2016, 23:05:47 UTC - in response to Message 1768787.  

Quoting rics:
"It is probably the driver provided by Windows Update on Windows 10.

Driver version 15.201.1151-150821a
Catalyst version 2015.0821.2209.38003 (CAT 15.8?)"

I had updated to Catalyst 15.8 while still using Windows 7. And that setup was working fine (using cmdline settings). The problems definitely started after updating to Windows 10. And, come to think of it, really started last month after a rash of problems installing some Windows 10 updates. (I gave up on trying "MS fixit, etc." and finally just reinstalled Windows 10 to get going again.) Not sure I am up to trying the .bin file approach but I will definitely keep checking the AMD website for updates.

Quoting Raistmer:
"If the app receives a valid command from the user (by any of the possible means), it reports it in stderr.
Hence, your commands were invalid. Perhaps you edited the wrong mb_cmdline* file; there are a few of them, each for a different plan class's app."

It's possible that I picked the wrong cmdline file after I installed Lunatics. But I am pretty sure I was doing it correctly while I was running the standard MB 8.0 app (see the thread title). Was the Stderr file erased/reset by the change to Lunatics?
ID: 1768865



 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.