SETI@home v8.12 Windows GPU applications support thread

Message boards : Number crunching : SETI@home v8.12 Windows GPU applications support thread

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1814981 - Posted: 4 Sep 2016, 9:26:54 UTC - in response to Message 1814980.  
Last modified: 4 Sep 2016, 9:27:47 UTC

the GPU temperature is still 72°C

TThrottle will let you limit the GPU temperature: it automatically inserts small (millisecond) pauses in the app's computing threads when the temperature reaches the limit you set.

Actually, GPU temperature regulation is still an open field, AFAIK.
Pausing the CPU thread that feeds the GPU can't be enough to throttle the GPU when long asynchronous kernel chains are queued, and the VHAR + NV SoG app combination produces such chains lasting dozens of seconds.
A more robust method would be simply to decrease the GPU frequency and then the voltage.
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1814981
robertmiles
Volunteer tester
Joined: 16 Jan 12
Posts: 213
Credit: 4,117,756
RAC: 6
United States
Message 1815035 - Posted: 4 Sep 2016, 17:50:20 UTC

I'm now seeing a GPU temperature of 62 C, but only if I leave Afterburner running from the console. Afterburner does not seem to run properly in the background, at least for the fan profile portion.
ID: 1815035
Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1815044 - Posted: 4 Sep 2016, 18:21:01 UTC - in response to Message 1815035.  

I might suggest you try SIV for your fan control. SIV is mainly a monitoring tool, but it can also control a variety of hardware; it's something of a Swiss Army knife application. If you invoke it with the -GPUCTL parameter, you get a nice 6-point fan control that lets you set fan speed curves against graphics card temperature. After starting it with the -GPUCTL parameter, you will find the fan control under [Machine] > [GPU Points]. I just leave it running on the desktop all the time and have it autostarted. Give it a look; you might become a fan, as I am.

SIV64
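A six-point fan control like the one described above is, in effect, a piecewise-linear map from GPU temperature to fan speed. The sketch below only illustrates that idea: the six points are hypothetical values, not SIV defaults, and SIV's own handling between and beyond the points may differ.

```python
from bisect import bisect_right

# Hypothetical six-point curve: (GPU temperature in C, fan speed in %).
# Illustrative values only, not SIV defaults.
CURVE = [(30, 30), (45, 40), (55, 50), (65, 65), (72, 85), (80, 100)]


def fan_speed(temp_c, curve=CURVE):
    """Return the fan speed (%) for temp_c by linear interpolation between points."""
    temps = [t for t, _ in curve]
    if temp_c <= temps[0]:
        return float(curve[0][1])    # clamp below the first point
    if temp_c >= temps[-1]:
        return float(curve[-1][1])   # clamp above the last point
    i = bisect_right(temps, temp_c)  # curve[i-1] <= temp_c < curve[i]
    (t0, s0), (t1, s1) = curve[i - 1], curve[i]
    return s0 + (s1 - s0) * (temp_c - t0) / (t1 - t0)


if __name__ == "__main__":
    for t in (25, 62, 72, 90):
        print(f"{t} C -> {fan_speed(t):.1f} %")
```

With these made-up points, 62 C falls between (55, 50) and (65, 65) and maps to 60.5 %; steeper segments near the top of the curve are what make the fan ramp up quickly once the card gets hot.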
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1815044
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1815142 - Posted: 5 Sep 2016, 6:59:32 UTC - in response to Message 1815035.  

I'm now seeing a GPU temperature of 62 C, but only if I leave Afterburner running from the console. Afterburner does not seem to run properly in the background, at least for the fan profile portion.


. . I have it running constantly in the background and pop up the graph display from time to time to check things, so the fan control is continuous. But there is a little trick to it. When you have set things as you want, click the detach button on the graph part of the GUI and the graphs will display in their own separate window. Now right-click on the left-hand edge of the top bar of this window and a menu will appear; one option is "Always on top". Untick this option, and then you can minimise that window. Minimise the main Afterburner window too, and it will just be an icon on the taskbar.

Stephen
ID: 1815142
Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1815193 - Posted: 5 Sep 2016, 15:55:10 UTC - in response to Message 1815142.  

Sounds like a neat trick with Afterburner and that you got it to do what you want.
ID: 1815193
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1815266 - Posted: 5 Sep 2016, 23:12:31 UTC - in response to Message 1815193.  

Sounds like a neat trick with Afterburner and that you got it to do what you want.


Hi Keith,

. . Yep, I am pretty happy with the way Afterburner is working for me. I have it on all three rigs, and it is a big part of how I measure the "success" of any changes I make to those rigs. It tells me a lot.

Stephen

ID: 1815266
Garry
Joined: 7 Jul 02
Posts: 40
Credit: 535,102
RAC: 1
United States
Message 1815289 - Posted: 6 Sep 2016, 1:23:25 UTC

Continuing a thread at http://setiathome.berkeley.edu/forum_thread.php?id=80174&postid=1815198; Raistmer asked me to bring it here. Glad to.


I scanned the Event Log. It includes an entry there for downloading the subject task. No later entry looks like a driver restart, unless I don't understand the wording of one of those.


The system Event Log, not BOINC's.


Still not sure I have the right log. I found BOINC/stdoutdae.txt. Same thing as I said before: It includes an entry there for downloading the subject task. No later entry looks like a driver restart, unless I don't understand the wording of one of those.


Before the above, the contents of mb_cmdline-8.12_windows_intel__opencl_ati5_SoG.txt were "-sbs 192 -period_iterations_num 100". Unless I hear another recommendation from you, I'll drop the 100 to 10, see what happens, and maybe build up slowly from there.

I wrote that the default is 500. You run at 100. And now you want to go to 10? Wrong direction.

(Blush.) My mistake. I've set it for closer to 250 and resumed. Keyboard latency is sometimes long, but I'll keep experimenting and watching the time data, too.

Thanks for being so kind to the inexperienced. I'm on some sites where the experts make the welcome a little thin. This is better.
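Why 10 is the "wrong direction" can be illustrated with a deliberately simplified model. Assume (an assumption for illustration, not the app's documented internals) that -period_iterations_num N splits the longest PulseFind call into N roughly equal GPU launches; each launch then holds the GPU for about total/N, so a larger N gives the desktop more frequent chances to redraw. The 2000 ms total is a made-up figure.

```python
def per_launch_ms(total_ms, period_iterations_num):
    """Approximate GPU time held by each launch when one long kernel call is
    split into period_iterations_num equal parts (simplified model)."""
    if period_iterations_num < 1:
        raise ValueError("period_iterations_num must be a positive integer")
    return total_ms / period_iterations_num


if __name__ == "__main__":
    total = 2000.0  # hypothetical duration of the unsplit call, in ms
    for n in (10, 100, 250, 500):
        print(f"-period_iterations_num {n:>3}: ~{per_launch_ms(total, n):.0f} ms per launch")
```

Under this model, going from 100 to 10 makes each launch ten times longer, which is the opposite of what a laggy desktop needs. The cost of a very large N (presumably extra launch overhead) is not modelled here.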
ID: 1815289
Wiggo
Joined: 24 Jan 00
Posts: 34744
Credit: 261,360,520
RAC: 489
Australia
Message 1815298 - Posted: 6 Sep 2016, 2:33:40 UTC - in response to Message 1815289.  
Last modified: 6 Sep 2016, 2:34:27 UTC

Still not sure I have the right log. I found BOINC/stdoutdae.txt. Same thing as I said before: It includes an entry there for downloading the subject task. No later entry looks like a driver restart, unless I don't understand the wording of one of those.

The "Event Log" being alluded to is in the Windows Event Viewer. Expand "Windows Logs" in the left pane, then highlight "Application" and "System" one at a time and check each for listed errors. ;-)

Cheers.
ID: 1815298
Garry
Joined: 7 Jul 02
Posts: 40
Credit: 535,102
RAC: 1
United States
Message 1815302 - Posted: 6 Sep 2016, 3:43:48 UTC - in response to Message 1815298.  

The "Event Log" being alluded to is in the Windows Event Viewer. Expand "Windows Logs" in the left pane, then highlight "Application" and "System" one at a time and check each for listed errors. ;-)

Cheers.


Thanks kindly for the assist, Wiggo! (You and I must be similar in age. I have more hair on top, but less in the beard. I don't know which is better!)

Raistmer or others: I found the log Wiggo pointed me to. I think Raistmer was asking about drivers stopping. If he means the display driver, my logs tell me I've had 290 of those events (ID 4101) since 1/25/2016. Two were today, one morning and one evening. The evening event was associated with a three-minute interruption of the computer. Something took exclusive hold of the computer for that time before returning processing to the OS. As part of the computer recovery, I got the dialog box saying the display driver had crashed and recovered.

I consider it reasonable that I've had 290 display driver crashes in that time. I can't estimate the portion of them tied to BOINC processing. Even tonight's: I don't have evidence whether it was BOINC or something else that grabbed the computer.
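Counting those event ID 4101 entries (display driver stopped responding and recovered) doesn't have to be done by scrolling in Event Viewer: Windows' built-in wevtutil tool can query the System log directly. The sketch below just builds and runs that query from Python; it assumes a Windows machine with wevtutil on the PATH, and the helper name is made up for illustration.

```python
import subprocess
import sys


def driver_reset_query(max_events=20):
    """Build a wevtutil command listing the newest event-ID-4101 entries
    (display driver reset and recovered) in the Windows System log."""
    return [
        "wevtutil", "qe", "System",
        "/q:*[System[(EventID=4101)]]",  # XPath filter on the event ID
        f"/c:{max_events}",              # maximum number of events returned
        "/rd:true",                      # newest events first
        "/f:text",                       # human-readable output
    ]


if __name__ == "__main__":
    if sys.platform == "win32":
        result = subprocess.run(driver_reset_query(), capture_output=True, text=True)
        print(result.stdout or result.stderr)
    else:
        print("wevtutil is only available on Windows")
```

Comparing the timestamps it prints against task start times in BOINC is one way to judge how many resets coincide with GPU crunching.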
ID: 1815302
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1815342 - Posted: 6 Sep 2016, 9:06:41 UTC - in response to Message 1815302.  


I consider it reasonable that I've had 290 display driver crashes in that time. I can't estimate the portion of them tied to BOINC processing.

Even if the driver restarted for some non-BOINC reason (hardly likely), the effect on the SETI MultiBeam OpenCL app would be just the same: it hangs with zero CPU consumption and zero progress until it is restarted.
More probably, it's the OpenCL app's kernels that cause the OS to restart the driver.

Try removing all modifiers/options from the MB app's command line and let it run with the default config. Do the driver restarts remain?
If so, post again and we'll continue troubleshooting (please include links to the particular results in your posts).
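The hang described above (zero CPU consumption and zero progress until restart) is easy to test for if a task's CPU time and progress are recorded every few minutes, e.g. copied from BOINC Manager. The helper below is hypothetical, not part of BOINC or the app; the sampling, field names and 10-minute window are assumptions for illustration.

```python
from typing import NamedTuple


class Sample(NamedTuple):
    t: float              # wall-clock time of the sample, seconds
    cpu_time: float       # task CPU time so far, seconds
    fraction_done: float  # progress, 0.0 to 1.0


def looks_hung(samples, window_s=600.0):
    """True if neither CPU time nor progress increased over the last
    window_s seconds (hypothetical stall heuristic)."""
    if not samples or samples[-1].t - samples[0].t < window_s:
        return False  # not enough history to decide
    latest = samples[-1]
    # newest sample that is at least window_s older than the latest one
    baseline = [s for s in samples if latest.t - s.t >= window_s][-1]
    return (latest.cpu_time <= baseline.cpu_time
            and latest.fraction_done <= baseline.fraction_done)
```

A task that is merely slow still accumulates CPU time, so requiring both values to stand still keeps this from flagging a legitimately long GPU stretch.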
ID: 1815342
Garry
Joined: 7 Jul 02
Posts: 40
Credit: 535,102
RAC: 1
United States
Message 1815404 - Posted: 6 Sep 2016, 20:42:18 UTC - in response to Message 1815342.  

Raistmer:

I consider it reasonable that I've had 290 display driver crashes in that time. I can't estimate the portion of them tied to BOINC processing.

Even if the driver restarted for some non-BOINC reason (hardly likely), the effect on the SETI MultiBeam OpenCL app would be just the same: it hangs with zero CPU consumption and zero progress until it is restarted.
More probably, it's the OpenCL app's kernels that cause the OS to restart the driver.

Try removing all modifiers/options from the MB app's command line and let it run with the default config. Do the driver restarts remain?
If so, post again and we'll continue troubleshooting (please include links to the particular results in your posts).

By "command line of MB app", do you mean the five files BOINC\projects\setiathome.berkeley.edu\mb_cmdline-8.12_windows_intel*.txt? If so, I'll say all the 290 faults that occurred before around Aug 24 (http://setiathome.berkeley.edu/forum_thread.php?id=80174&postid=1812193) are the answer. Before that, all five files had zero length; I'm guessing that's running "with default config".

About that time, Mike (http://setiathome.berkeley.edu/show_user.php?userid=9826) told me how to experiment with values for -period_iterations_num. Now that I know I can pause the S@H task, change the file, and collect valid information, I'll collect data faster. I didn't get that from Mike; I wish I'd asked.
ID: 1815404
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1815446 - Posted: 6 Sep 2016, 22:03:33 UTC - in response to Message 1815404.  

By "command line of MB app", do you mean the five files BOINC\projects\setiathome.berkeley.edu\mb_cmdline-8.12_windows_intel*.txt? If so, I'll say all the 290 faults that occurred before around Aug 24 (http://setiathome.berkeley.edu/forum_thread.php?id=80174&postid=1812193) are the answer. Before that, all five files had zero length; I'm guessing that's running "with default config".

About that time, Mike (http://setiathome.berkeley.edu/show_user.php?userid=9826) told me how to experiment with values for -period_iterations_num. Now that I know I can pause the S@H task, change the file, and collect valid information, I'll collect data faster. I didn't get that from Mike; I wish I'd asked.

Yes, those files are the "command line" too.
Clear the log and watch whether you get new driver restarts with -period_iterations_num 500.
ID: 1815446
Garry
Joined: 7 Jul 02
Posts: 40
Credit: 535,102
RAC: 1
United States
Message 1815952 - Posted: 9 Sep 2016, 17:41:01 UTC - in response to Message 1815446.  

Clear the log and watch whether you get new driver restarts with -period_iterations_num 500.


I had no display driver crashes from Mon 9/5 to today (Fri 9/9). This morning, I had two. I was processing a task from opencl_ati5_cat132 (task https://setiathome.berkeley.edu/result.php?resultid=5145532737; work unit https://setiathome.berkeley.edu/workunit.php?wuid=2259686777). When the task arrived, mb*opencl_ati5_SoG.txt was
-sbs 192 -period_iterations_num 406

I had a freeze of over a minute, apparently including a driver crash. I changed the 406 to 442, thinking I'd consider 500 an outside upper bound.

Within 20 minutes, I had another freeze of over a minute, again including a driver crash. I changed the 442 to 787, thinking I'll need to consider 1000 an upper bound and hoping to come down again from the current value. That task continues to run, now with intermittent slight latency (a big improvement!).
ID: 1815952
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1815959 - Posted: 9 Sep 2016, 17:59:50 UTC - in response to Message 1815952.  

Another thing to try is switching to a different driver version.
Some AMD drivers, although they work, generate an extremely bad binary for the PulseFind kernel (excessive register spilling).
Your APU should be faster than my C-60, and on the C-60 I can use much lower values without driver restarts...
ID: 1815959
Garry
Joined: 7 Jul 02
Posts: 40
Credit: 535,102
RAC: 1
United States
Message 1816147 - Posted: 10 Sep 2016, 14:39:08 UTC - in response to Message 1815959.  

Another thing to try is switching to a different driver version.

The AMD site ran a piece of code on my computer and reported I already have the latest driver. Sigh. Good idea.

I'm kinda thinking I'm making an error somewhere that we haven't found yet.

I suspended BOINC. The mouse and keyboard latency entirely went away.

I resumed BOINC and suspended S@H. The mouse and keyboard latency entirely went away.

I resumed S@H. The mouse and keyboard latency returned.

The new file I'm processing (for another minute) is for opencl_ati5_cat132.

Is mb_cmdline-8.12_windows_intel__opencl_ati5_sah.txt the right file to control that app?

I'll guess not: I have it set to -sbs 192 -period_iterations_num 3000 and I'm still getting long mouse and keyboard latency.

If that's the right file, what other error (maybe a simple one) could I be making?

After I change that file, can I get the app to read it by using BOINC Manager >> Activity >> Suspend, wait something like 10 seconds, and BOINC Manager >> Activity >> Resume?

The next task is for opencl_ati5_SoG_cat132. Will the right file to control it be mb_cmdline-8.12_windows_intel__opencl_ati5_SoG.txt?

That file is
task https://setiathome.berkeley.edu/result.php?resultid=5147003185
work unit https://setiathome.berkeley.edu/workunit.php?wuid=2260374396

Thanks. This is a small pain ...
ID: 1816147
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1816170 - Posted: 10 Sep 2016, 17:17:21 UTC - in response to Message 1816147.  

The best way to make sure the app received your options is to look at the stderr.txt file in the corresponding slot folder under the BOINC data directory.
The app reports every setting it understood and applied.
And if even setting it to 3000 (actually too big to be useful) did not help, then I'd suggest simply setting the GPU to suspend while there is user activity. Some other kernel is causing the delay, not Partial PulseFind.
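Checking stderr.txt can be scripted. The sketch below walks every slot folder and collects lines that echo a command-line option. It assumes the default data directory C:\ProgramData\BOINC seen in this thread, and that the app echoes options with wording like "period_iterations_num=3000" or "Single buffer allocation size: 192MB" (as quoted from stderr.txt elsewhere in this thread); other builds may word them differently.

```python
from pathlib import Path

# Default BOINC data directory on Windows, as seen in this thread (assumption).
BOINC_DATA = Path(r"C:\ProgramData\BOINC")

# Substrings assumed to mark echoed settings in stderr.txt.
MARKERS = ("period_iterations_num", "Single buffer allocation size")


def echoed_lines(text):
    """Return the stripped lines of a stderr.txt text that mention any marker."""
    return [line.strip() for line in text.splitlines()
            if any(marker in line for marker in MARKERS)]


def echoed_settings(data_dir=BOINC_DATA):
    """Map each slot folder name to the matching lines of its stderr.txt."""
    found = {}
    for stderr in sorted(data_dir.glob("slots/*/stderr.txt")):
        hits = echoed_lines(stderr.read_text(errors="replace"))
        if hits:
            found[stderr.parent.name] = hits
    return found


if __name__ == "__main__":
    for slot, lines in echoed_settings().items():
        print(f"slot {slot}:")
        for line in lines:
            print("   ", line)
```

Because a restarted task appends to the same stderr.txt, the last matching line in a slot is the setting the current run is using.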
ID: 1816170
Garry
Joined: 7 Jul 02
Posts: 40
Credit: 535,102
RAC: 1
United States
Message 1816268 - Posted: 11 Sep 2016, 1:34:04 UTC - in response to Message 1816170.  

The best way to make sure the app received your options is to look at the stderr.txt file in the corresponding slot folder under the BOINC data directory.

Thanks! Good info. I have three such slot directories. Other projects are using slots 0 and 2; S@H has slot 1.

Using BOINC Manager, I just suspended and resumed S@H. C:\ProgramData\BOINC\slots\1\stderr.txt didn't change date/time group (from about 5 hours ago) and reflects the content of C:\ProgramData\BOINC\projects\setiathome.berkeley.edu\mb_cmdline-8.12_windows_intel__opencl_ati5_SoG.txt when the currently running task started.

It seems the app looks at the mb file at startup. It seems the app doesn't look at it for a suspend/resume cycle.

In coming days, I'll check about a full hibernation cycle and about an operating system restart.

I collected information about the previous task thinking the suspend/resume cycle was enough. Let's ignore that data: 3000 is a bogus number. That task caused problems at 350; I'll call that an experimental lower bound. I don't have a valid attempt demonstrating an upper bound.

The current task is running at 1000; I'm looking for its upper bound.
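The bracketing described here amounts to a manual bisection: a value known to cause trouble (350) and a value not yet seen to cause trouble bound the search, and each trial roughly halves the gap. In the sketch below, causes_driver_reset is a hypothetical stand-in for running tasks at a given value and watching the event log; in practice each trial takes hours, and one clean run does not prove a value is safe.

```python
def smallest_safe_value(bad, good, causes_driver_reset, tolerance=25):
    """Bisect between `bad` (known to cause driver resets) and `good`
    (assumed safe); return the smallest value found safe, to within
    `tolerance`. causes_driver_reset(n) is a hypothetical trial function."""
    if not bad < good:
        raise ValueError("expected bad < good")
    while good - bad > tolerance:
        mid = (bad + good) // 2
        if causes_driver_reset(mid):
            bad = mid   # still resets: the answer lies above mid
        else:
            good = mid  # no reset seen: try lower values
    return good


if __name__ == "__main__":
    # Simulated machine that resets below 620 (unknown in real life).
    print(smallest_safe_value(350, 1000, lambda n: n < 620))
```

With that simulated threshold, the search settles on 634 after five trials, compared with many more if the value were stepped up by a fixed increment.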

Thanks for the time. Your programming on S@H is valuable to me. I hope for all of us that we one day find something interesting.
ID: 1816268
BilBg
Volunteer tester
Joined: 27 May 07
Posts: 3720
Credit: 9,385,827
RAC: 0
Bulgaria
Message 1816355 - Posted: 11 Sep 2016, 12:12:25 UTC - in response to Message 1816268.  

It seems the app looks at the mb file at startup. It seems the app doesn't look at it for a suspend/resume cycle.

Every time the app .exe starts, it reads the cmdline file, whether it is starting a new task (0% progress) or resuming work on an 'old' one.

If you are not sure which mb_cmdline file to use, put the same line with switches in all of them.
Maybe with a small difference, e.g. -period_iterations_num 2500 in one and -period_iterations_num 3000 in another, to see which one is used.
You will see in stderr.txt:
Single buffer allocation size: 192MB
period_iterations_num=3000
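The trick of putting slightly different values in each file can be automated. This hypothetical helper writes the same switches to every mb_cmdline-8.12_windows_intel*.txt file in the project folder (path taken from this thread) but gives each file a distinct -period_iterations_num, and returns which value went into which file, so the number echoed in stderr.txt identifies the file that was read. It overwrites the files, so back them up first.

```python
from pathlib import Path

# Project folder as given in this thread; adjust for your BOINC data directory.
PROJECT_DIR = Path(r"C:\ProgramData\BOINC\projects\setiathome.berkeley.edu")


def cmdline_for(period_iterations_num, sbs=192):
    """Build the switch line; the trailing space is optional padding."""
    return f"-sbs {sbs} -period_iterations_num {period_iterations_num} "


def tag_cmdline_files(project_dir=PROJECT_DIR, base=500, step=10):
    """Overwrite each mb_cmdline file with a distinct -period_iterations_num
    value and return {value: file name}. Replaces the files' contents!"""
    tags = {}
    files = sorted(project_dir.glob("mb_cmdline-8.12_windows_intel*.txt"))
    for i, path in enumerate(files):
        value = base + i * step
        path.write_text(cmdline_for(value))
        tags[value] = path.name
    return tags
```

Calling tag_cmdline_files() and then restarting the GPU task gives a mapping to compare against the period_iterations_num line in the slot's stderr.txt.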

Some people report that you need to put a space at the beginning and end of the line with switches, like: " -period_iterations_num 500 -sbs 192 "


A "suspend/resume cycle" is easily done by checking and then unchecking "Snooze GPU" in the menu you get by right-clicking the BOINC Manager icon.
After "Snooze GPU", wait a few seconds (or check in Windows Task Manager, Process Explorer, ...) for the app to really exit.


S@H has slot 1

Every newly started task will use a different slot #: the first one that is free.
 


- ALF - "Find out what you don't do well ..... then don't do it!" :)
 
ID: 1816355
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1816377 - Posted: 11 Sep 2016, 13:02:45 UTC - in response to Message 1816355.  

Some people report that you need to put a space at the beginning and end of the line with switches, like: " -period_iterations_num 500 -sbs 192 "

I believe this is an urban myth: not the reports themselves, but the idea that either space is needed.

When the parameter list is passed as an actual command line, BOINC provides padding (two spaces) after the executable name to guard the subsequent list.

When the commands are read from a separate file, there's no evident need for a leading or trailing guard space - and I've just tested on a machine here that it makes no difference.

What is important is that every separate element - both the commands (starting '-') and the numeric parameters - is separated by a space. Because of that, it can be helpful to put a space at the *end* of the file, so that if you come back later and paste an extra command on the end of the line, you don't forget the space at that point...
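That point (whitespace between elements matters, padding at the ends doesn't) matches how an ordinary whitespace tokenizer behaves. The sketch below splits a cmdline string that way and pairs each switch with the number after it. It is an illustration, not the app's actual parser, and it assumes every switch takes exactly one numeric argument, which holds for the two switches used in this thread but not necessarily for others.

```python
def parse_switches(line):
    """Split a cmdline string on whitespace and pair each '-switch' with the
    number that follows it (assumes one numeric argument per switch)."""
    tokens = line.split()  # leading, trailing and repeated whitespace is ignored
    options = {}
    i = 0
    while i < len(tokens):
        name = tokens[i]
        if not name.startswith("-") or i + 1 >= len(tokens):
            raise ValueError(f"unexpected token {name!r}")
        options[name.lstrip("-")] = int(tokens[i + 1])  # ValueError if not a number
        i += 2
    return options
```

Padded and unpadded lines parse identically here, while a forgotten space such as "-sbs 192-period_iterations_num 500" produces the token "192-period_iterations_num", which this sketch rejects with ValueError.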

Every newly started task will use a different slot #: the first one that is free.

Once a task has been allocated a slot number at initial launch, it will retain that number throughout the run, no matter how many times it's paused and restarted during the run: some non-volatile information, like elapsed time, progress and checkpoints, is preserved in the slot directory until the task has completed. Most times, the next task to start will be allocated the same slot number, but not always.

If you want to inspect the data for a running or paused task, highlight it in BOINC Manager and click the 'properties' command button: the working directory is among the properties, in the format 'slots/n'.
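A toy model of the slot behaviour described in these two posts: a task takes the lowest-numbered free slot on first launch and keeps it through pauses and restarts until it completes. This only illustrates that description, not BOINC's client code, and real allocation may not always match it.

```python
class SlotTable:
    """Toy model: a task keeps its slot from first launch until completion."""

    def __init__(self):
        self.by_task = {}  # task name -> slot number

    def start(self, task):
        """Return the task's slot, assigning the lowest free one on first launch."""
        if task not in self.by_task:  # restarts reuse the existing slot
            used = set(self.by_task.values())
            slot = 0
            while slot in used:
                slot += 1
            self.by_task[task] = slot
        return self.by_task[task]

    def complete(self, task):
        """Free the task's slot once it has finished."""
        self.by_task.pop(task, None)
```

For example, with tasks already in slots 0, 1 and 2, finishing the task in slot 1 makes slot 1 the next one handed out, which is why the next S@H task often, but not necessarily, lands in the same folder.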
ID: 1816377
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1816620 - Posted: 12 Sep 2016, 10:16:49 UTC

I don't think I've seen this message in an NV SoG r3528 result before:

Triplet: peak=10.79628, time=85.92, period=3.185, d_freq=1420626577.79, chirp=-53.559, fft_len=128
ERROR: OpenCL kernel/call 'clEnqueueMapBuffer(gpu_triplet_result_flag)' call failed (-36) in file ..\analyzePoT.cpp near line 3407.
Waiting 30 sec before restart...

Task is 5150526259: it restarted after the next task had finished with the GPU, and completed normally.

stderr says 'Restarted at 32.05 percent', but it was much later in the run (beyond 80%) when it paused and drew attention to itself. Although I didn't watch this particular task restart, I've noticed before that when SoG tasks are paused for any reason, they resume showing a much earlier progress point for a second or two, then jump back almost to the pause point. Purely cosmetic, but it looks odd.
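The (-36) in that message is a raw OpenCL status code; in the standard OpenCL headers, -36 is CL_INVALID_COMMAND_QUEUE. That fits the command queue having become unusable, but the code alone doesn't say why. The small decoder below is a sketch for looking such numbers up; the regex assumes the "call failed (N)" wording seen in this stderr excerpt, and the table covers only a handful of codes.

```python
import re

# A few status codes from the standard OpenCL header (cl.h); not exhaustive.
CL_ERRORS = {
    0: "CL_SUCCESS",
    -4: "CL_MEM_OBJECT_ALLOCATION_FAILURE",
    -5: "CL_OUT_OF_RESOURCES",
    -6: "CL_OUT_OF_HOST_MEMORY",
    -30: "CL_INVALID_VALUE",
    -34: "CL_INVALID_CONTEXT",
    -36: "CL_INVALID_COMMAND_QUEUE",
    -38: "CL_INVALID_MEM_OBJECT",
}


def decode_cl_error(message):
    """Find 'failed (<code>)' in an app error line and name the OpenCL status.
    Returns None if the line contains no such code."""
    match = re.search(r"failed \((-?\d+)\)", message)
    if match is None:
        return None
    code = int(match.group(1))
    return CL_ERRORS.get(code, f"unknown OpenCL status {code}")


if __name__ == "__main__":
    line = ("ERROR: OpenCL kernel/call 'clEnqueueMapBuffer(gpu_triplet_result_flag)' "
            "call failed (-36) in file ..\\analyzePoT.cpp near line 3407.")
    print(decode_cl_error(line))
```

Matching on "failed (" rather than any parenthesised number keeps the buffer name in "clEnqueueMapBuffer(gpu_triplet_result_flag)" from being mistaken for the code.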
ID: 1816620


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.