SETI@home v8.12 Windows GPU applications support thread
Author | Message |
---|---|
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
"the GPU temperature is still 72°C" Actually, temperature regulation for GPUs is an open field, AFAIK. Pausing the CPU feeding thread can't be enough to throttle the GPU when there are long async kernel chains, and the VHAR + NV SoG app combination produces such chains lasting dozens of seconds. A more robust method would be simply to decrease the GPU frequency and then the voltage. SETI apps news We're not gonna fight them. We're gonna transcend them. |
robertmiles Send message Joined: 16 Jan 12 Posts: 213 Credit: 4,117,756 RAC: 6 |
I'm now seeing a GPU temperature of 62 C, but only if I leave Afterburner running from the console. Afterburner does not seem to run properly in the background, at least for the fan profile portion. |
Keith Myers Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
I might suggest you try SIV for your fan control. SIV is mainly a monitoring tool but it also can control a variety of hardware too. Kinda a Swiss Army Knife application. If you invoke it with the -GPUCTL parameter, you get a nice 6 point fan control which allows you to set fan speed curves against graphic card temperatures. After starting it with the -GPUCTL parameter, you will find the fan control under [Machine] > [GPU Points]. I just leave it running on the Desktop all the time and have it autostarted. Give it a look, you might become a fan as I am. SIV64 Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
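A multi-point fan curve of the kind described above can be pictured as linear interpolation between (temperature, fan speed) points. The following is only a minimal sketch of that idea: the six points are made-up illustrative values, and nothing here is claimed to be how SIV or Afterburner actually computes fan speed.

```python
from bisect import bisect_right

# Hypothetical 6-point curve: (GPU temperature in deg C, fan speed in %).
# Illustrative values only, not SIV or Afterburner defaults.
FAN_CURVE = [(30, 20), (45, 30), (55, 45), (65, 60), (75, 80), (85, 100)]


def fan_speed(temp_c, curve=FAN_CURVE):
    """Return a fan speed (%) by linear interpolation between curve points."""
    temps = [t for t, _ in curve]
    if temp_c <= temps[0]:
        return float(curve[0][1])    # clamp below the first point
    if temp_c >= temps[-1]:
        return float(curve[-1][1])   # clamp above the last point
    i = bisect_right(temps, temp_c)  # index of the first point hotter than temp_c
    (t0, s0), (t1, s1) = curve[i - 1], curve[i]
    return s0 + (s1 - s0) * (temp_c - t0) / (t1 - t0)


if __name__ == "__main__":
    for t in (62, 72):
        print(t, "deg C ->", fan_speed(t), "% fan")
```

With these example points, the two temperatures mentioned in the thread (62 and 72) fall on different segments of the curve, which is the whole point of having several control points rather than a single threshold.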
Stephen "Heretic" Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
"I'm now seeing a GPU temperature of 62 C, but only if I leave Afterburner running from the console. Afterburner does not seem to run properly in the background, at least for the fan profile portion." I have it running constantly in the background and pop up the graph display from time to time to check things, so the fan control is continuous. But there is a little trick to it. When you have set things as you want, click the detach button on the graph part of the GUI and the graphs will display in their own separate window. Now right-click on the left-hand edge of the top bar of this window and a menu will appear where one option is "Always on top". Untick this option; then you can minimise that window. Minimise the main Afterburner window and it will just be an icon on the taskbar. Stephen |
Keith Myers Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
Sounds like a neat trick with Afterburner and that you got it to do what you want. |
Stephen "Heretic" Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
"Sounds like a neat trick with Afterburner and that you got it to do what you want." Hi Keith, yep, I am pretty happy with the way Afterburner is working for me. I have it on all three rigs and it is a big part of how I measure the "success" of any changes I make to those rigs. It tells me a lot. Stephen |
Garry Send message Joined: 7 Jul 02 Posts: 40 Credit: 535,102 RAC: 1 |
Continuing a thread at http://setiathome.berkeley.edu/forum_thread.php?id=80174&postid=1815198; Raistmer asked me to bring it here. Glad to.
Still not sure I have the right log. I found BOINC/stdoutdae.txt. Same thing as I said before: It includes an entry there for downloading the subject task. No later entry looks like a driver restart, unless I don't understand the wording of one of those.
(Blush.) My mistake. I've set it for closer to 250 and resumed. Keyboard latency is sometimes long, but I'll keep experimenting and watching the time data, too. Thanks for being so kind to the inexperienced. I'm on some sites where the experts make the welcome a little thin. This is better. |
Wiggo Send message Joined: 24 Jan 00 Posts: 34744 Credit: 261,360,520 RAC: 489 |
"Continuing a thread at http://setiathome.berkeley.edu/forum_thread.php?id=80174&postid=1815198; Raistmer asked me to bring it here. Glad to." The "Event Log" being alluded to is in the Windows "Event Viewer". Expand "Windows Logs" in the left pane, then highlight "Application" and "System" one at a time and check each for listed errors. ;-) Cheers. |
Garry Send message Joined: 7 Jul 02 Posts: 40 Credit: 535,102 RAC: 1 |
"Continuing a thread at http://setiathome.berkeley.edu/forum_thread.php?id=80174&postid=1815198; Raistmer asked me to bring it here. Glad to." Thanks kindly for the assist, Wiggo! (You and I must be similar in age. I have more hair on top, but less in the beard. I don't know which is better!) Raistmer or others: I found the log Wiggo pointed me to. I think Raistmer was asking about drivers stopping. If he means the display driver, my logs tell me I've had 290 of those events (ID 4101) since 1/25/2016. Two were today, one in the morning and one in the evening. The evening event was associated with a three-minute interruption of the computer: something took exclusive hold of the computer for that time before returning processing to the OS. As part of the computer's recovery, I got the dialog box saying the display driver had crashed and recovered. I consider it reasonable that I've had 290 display driver crashes in that time. I can't estimate the portion of them tied to BOINC processing. Even for tonight's, I don't have evidence whether it was BOINC or something else that grabbed the computer. |
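For anyone who would rather count those entries than scroll through Event Viewer, the Windows command-line tool `wevtutil` can query a log by event ID. The sketch below is a hedged illustration: it assumes the display-driver events (ID 4101) are in the System log, and that `wevtutil`'s XML output is a bare sequence of `<Event>` elements, which is why it is wrapped in a synthetic root before parsing. The helper names are hypothetical.

```python
import subprocess
import sys
import xml.etree.ElementTree as ET


def build_query(event_id=4101, log="System"):
    """Build a wevtutil command line that dumps matching events as XML."""
    xpath = f"*[System[(EventID={event_id})]]"
    return ["wevtutil", "qe", log, f"/q:{xpath}", "/f:xml"]


def count_events(xml_text):
    """Count <Event> elements in wevtutil's XML output.

    Assumption: the output has no single root element, so a synthetic
    <Events> root is added before parsing.
    """
    root = ET.fromstring(f"<Events>{xml_text}</Events>")
    return len(root)  # number of direct children, one per event


if __name__ == "__main__" and sys.platform == "win32":
    out = subprocess.run(build_query(), capture_output=True, text=True, check=True)
    print("Event ID 4101 entries:", count_events(out.stdout))
```

Counting alone does not say which crashes were tied to BOINC; comparing the event timestamps against task start times would still be a manual step.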
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Even if the driver restarted for some non-BOINC reason (unlikely), the effect on the SETI MultiBeam OpenCL app will be just the same: it will hang with zero CPU consumption and zero progress until restarted. More probably it's the OpenCL app's kernels that cause the OS to restart the driver. Try removing all modifiers/options from the command line of the MB app and let it run with the default config. Do the driver restarts remain? If so, post again and we'll continue troubleshooting (please include links to the particular results in your posts). |
Garry Send message Joined: 7 Jul 02 Posts: 40 Credit: 535,102 RAC: 1 |
Raistmer: "I consider it reasonable that I've had 290 display driver crashes in that time. I can't estimate the portion of them tied to BOINC processing." By "command line of MB app", do you mean the five files BOINC\projects\setiathome.berkeley.edu\mb_cmdline-8.12_windows_intel*.txt? If so, I'll say all of the 290 faults that occurred before around Aug 24 (http://setiathome.berkeley.edu/forum_thread.php?id=80174&postid=1812193) are the answer. Before that, all five files had zero length; I'm guessing that's running "with default config". About that time, Mike (http://setiathome.berkeley.edu/show_user.php?userid=9826) told me how to experiment with values for -period_iterations_num. Now that I know I can pause the S@H task, change the file, and collect valid information, I'll collect data faster. I didn't get that from Mike; I wish I'd asked. |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Raistmer: yes, that's the "command line" too. Clear the log and watch if you will have new driver restarts with -period_iterations_num 500. |
Garry Send message Joined: 7 Jul 02 Posts: 40 Credit: 535,102 RAC: 1 |
"Clear the log and watch if you will have new driver restarts with -period_iterations_num 500" I had no display driver crashes from Mon 9/5 to today (Thu 9/9). This morning, I had two. I was processing a task from opencl_ati5_cat132 (task https://setiathome.berkeley.edu/result.php?resultid=5145532737; work unit https://setiathome.berkeley.edu/workunit.php?wuid=2259686777). When the task arrived, mb*opencl_ati5_SoG.txt was: -sbs 192 -period_iterations_num 406. I had a freeze of over a minute, apparently including a driver crash. I changed the 406 to 442, thinking I'd consider 500 an outside upper bound. Within 20 minutes, I had another freeze of over a minute, again including a driver crash. I changed the 442 to 787, thinking I'll need to consider 1000 an upper bound and hoping to come down again from the current value. That task continues to run, now with intermittent slight latency (a big improvement!). |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Another thing to try is to change the driver to some other version. Some AMD drivers, though working, generate an extremely bad binary for the PulseFind kernel (excessive register spilling). Your APU should be faster than my C-60, and on the C-60 I can use much lower values without driver restarts... |
Garry Send message Joined: 7 Jul 02 Posts: 40 Credit: 535,102 RAC: 1 |
"Another thing to try is to change the driver to some other version." The AMD site ran a piece of code on my computer and reported I already have the latest driver. Sigh. Good idea, though. I'm kinda thinking I'm making an error somewhere that we haven't found yet. I suspended BOINC: the mouse and keyboard latency went away entirely. I resumed BOINC and suspended S@H: the latency again went away entirely. I resumed S@H: the latency returned. The new file I'm processing (for another minute) is for opencl_ati5_cat132. Is mb_cmdline-8.12_windows_intel__opencl_ati5_sah.txt the right file to control that app? I'll guess not: I have it set to -sbs 192 -period_iterations_num 3000 and I'm still getting long mouse and keyboard latency. If that's the right file, what other error (maybe a simple one) could I be making? After I change that file, can I get the app to read it by using BOINC Manager >> Activity >> Suspend, waiting something like 10 seconds, and then BOINC Manager >> Activity >> Resume? The next task is for opencl_ati5_SoG_cat132. Will the right file to control it be mb_cmdline-8.12_windows_intel__opencl_ati5_SoG.txt? That file is task https://setiathome.berkeley.edu/result.php?resultid=5147003185, work unit https://setiathome.berkeley.edu/workunit.php?wuid=2260374396. Thanks. This is a small pain ... |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
The best way to ensure the app received your options is to look in the stderr.txt file in the corresponding slot folder under the BOINC data directory. The app reports every setting it understood and applied. And if even setting it to 3000 (actually too big to be useful) did not help, then I'd propose just setting the GPU to suspend when there is user activity. Some other kernel is causing the delay, not Partial PulseFind. |
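The stderr.txt check can be scripted rather than done by hand. This is a minimal sketch, assuming the slot folders are laid out as `slots/<n>/stderr.txt` under the BOINC data directory and that the app reports the setting in the `period_iterations_num=3000` form quoted elsewhere in this thread. The function names are hypothetical, and only that one setting is matched.

```python
import re
from pathlib import Path

# Matches the one report line format seen in this thread,
# e.g. "period_iterations_num=3000".
SETTING_RE = re.compile(r"\b(period_iterations_num)=(\d+)")


def reported_settings(stderr_text):
    """Return {setting: value} as reported by the app in stderr text.

    If the setting appears several times (e.g. after restarts),
    the last reported value wins.
    """
    return {name: int(value) for name, value in SETTING_RE.findall(stderr_text)}


def scan_slots(data_dir):
    """Map each slots/<n>/stderr.txt under data_dir to its reported settings."""
    results = {}
    for path in sorted(Path(data_dir).glob("slots/*/stderr.txt")):
        results[str(path)] = reported_settings(path.read_text(errors="replace"))
    return results


if __name__ == "__main__":
    # The data directory is an assumption; adjust it for your install.
    for path, settings in scan_slots(r"C:\ProgramData\BOINC").items():
        print(path, settings)
```

An empty dictionary for a slot means either that the slot belongs to another project or that the app never reported the setting, so the value in the cmdline file was not picked up.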
Garry Send message Joined: 7 Jul 02 Posts: 40 Credit: 535,102 RAC: 1 |
"The best way to ensure the app received your options is to look in the stderr.txt file in the corresponding slot folder under the BOINC data directory." Thanks! Good info. I have three such slot directories. Other projects are using slots 0 and 2; S@H has slot 1. Using BOINC Manager, I just suspended and resumed S@H. C:\ProgramData\BOINC\slots\1\stderr.txt didn't change date/time group (from about 5 hours ago) and reflects the content that C:\ProgramData\BOINC\projects\setiathome.berkeley.edu\mb_cmdline-8.12_windows_intel__opencl_ati5_SoG.txt had when the currently running task started. It seems the app looks at the mb file at startup. It seems the app doesn't look at it for a suspend/resume cycle. In the coming days, I'll check a full hibernation cycle and an operating system restart. I collected information about the previous task thinking the suspend/resume cycle was enough. Let's ignore that data; 3000 is a bogus number. That task caused problems at 350; I'll call that an experimental lower bound. I don't have a valid attempt demonstrating an upper bound. The current task is running at 1000; I'm looking for its upper bound. Thanks for the time. Your programming on S@H is valuable to me. I hope for all of us that we one day find something interesting. |
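The bracketing approach described in this post (a value known to cause freezes, a value believed to be safe, and trials in between) is essentially a bisection. The sketch below only suggests which value to try next; it assumes that higher -period_iterations_num values never behave worse than lower ones, and it treats 1000 as a proven-good value purely for illustration, which the thread has not yet confirmed. The names are hypothetical.

```python
def next_trial(known_bad, known_good):
    """Midpoint of the bracket, or None once the bracket cannot shrink."""
    if known_good - known_bad <= 1:
        return None
    return (known_bad + known_good) // 2


def update(known_bad, known_good, trial, had_problem):
    """Shrink the bracket after observing one trial value."""
    return (trial, known_good) if had_problem else (known_bad, trial)


def search(known_bad, known_good, causes_problem):
    """Bisect until the bracket is one step wide.

    causes_problem(value) stands in for "run tasks at this value and
    watch for freezes or driver restarts", which in practice is a
    manual observation spread over hours or days.
    """
    while (trial := next_trial(known_bad, known_good)) is not None:
        known_bad, known_good = update(
            known_bad, known_good, trial, causes_problem(trial)
        )
    return known_bad, known_good


if __name__ == "__main__":
    # First value to try between the thread's 350 and an assumed-good 1000.
    print(next_trial(350, 1000))
```

Because driver restarts are intermittent, a trial with no freeze is weaker evidence than a trial with one, so it may be worth running a value across several tasks before moving the "good" end of the bracket down.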
BilBg Send message Joined: 27 May 07 Posts: 3720 Credit: 9,385,827 RAC: 0 |
"It seems the app looks at the mb file at startup. It seems the app doesn't look at it for a suspend/resume cycle." On every start of the app .exe, the app reads the cmdline file, no matter whether it is a new task (0% progress) or a restart of work on an 'old' task. If you are not sure which mb_cmdline file to use, put the same line of switches in all of them, maybe with a small difference, e.g. -period_iterations_num 2500 in one and -period_iterations_num 3000 in another, to see which one is used. You will see in stderr.txt: Single buffer allocation size: 192MB period_iterations_num=3000 Some people report that you need to put a space at the beginning and end of the line of switches, like: " -period_iterations_num 500 -sbs 192 " A "suspend/resume cycle" is easily done by checking/unchecking "Snooze GPU" from the BOINC Manager icon's right-click menu. Wait a few seconds after "Snooze GPU" (or check in Windows Task Manager, Process Explorer, ...) for the app to really exit. "S@H has slot 1" Every newly started task will use a different slot #: the first which is free. - ALF - "Find out what you don't do well ..... then don't do it!" :) |
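The "put a slightly different value in each file" trick can be automated. This is a rough sketch under stated assumptions: the files match `mb_cmdline*.txt` in the project folder named earlier in the thread, and the base value, step and `-sbs 192` are arbitrary illustrative choices. The function name is hypothetical. It overwrites the files when `dry_run` is off, so back them up first.

```python
from pathlib import Path


def tag_cmdline_files(project_dir, base_value=2500, step=100, sbs=192, dry_run=True):
    """Give every mb_cmdline*.txt file a distinct -period_iterations_num.

    Returns {file name: switch line}. With dry_run=True nothing is written,
    so the plan can be reviewed first.
    """
    plan = {}
    files = sorted(Path(project_dir).glob("mb_cmdline*.txt"))
    for i, path in enumerate(files):
        line = f"-sbs {sbs} -period_iterations_num {base_value + i * step}"
        plan[path.name] = line
        if not dry_run:
            path.write_text(line)  # replaces the file's previous content
    return plan


if __name__ == "__main__":
    # Project path as given in the thread; adjust it for your install.
    project = r"C:\ProgramData\BOINC\projects\setiathome.berkeley.edu"
    for name, line in tag_cmdline_files(project).items():
        print(name, "->", line)
```

Whichever value then shows up as `period_iterations_num=...` in the running task's stderr.txt identifies the file the app actually read. The distinct values are for identification only and should be replaced with a sensible setting afterwards.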
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
"Some people report that you need to put a space at the beginning and end of the line of switches, like: ' -period_iterations_num 500 -sbs 192 '" I believe this to be an urban myth: not that there are reports, but that either space is needed. When the parameter list is passed as an actual command line, BOINC provides padding (two spaces) after the executable name to guard the subsequent list. When the commands are read from a separate file, there's no evident need for a leading or trailing guard space, and I've just tested on a machine here that it makes no difference. What is important is that every separate element, both the commands (starting '-') and the numeric parameters, is separated by a space. Because of that, it can be helpful to put a space at the *end* of the file, so that if you come back later and paste an extra command on the end of the line, you don't forget the space at that point... "Every newly started task will use a different slot #: the first which is free." Once a task has been allocated a slot number at initial launch, it will retain that number throughout the run, no matter how many times it's paused and restarted: some non-volatile information, like elapsed time, progress and checkpoints, is preserved in the slot directory until the task has completed. Most times, the next task to start will be allocated the same slot number, but not always. If you want to inspect the data for a running or paused task, highlight it in BOINC Manager and click the 'Properties' command button: the working directory is among the properties, in the format 'slots/n'. |
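The spacing rule in this post (every switch and every numeric parameter separated by a space, with leading and trailing spaces making no difference) can be checked before saving a cmdline file. The parser below is a hypothetical stand-in written for illustration; it is not the app's own parser, and it assumes switches start with '-' and take at most one numeric value.

```python
def _is_number(token):
    """True if the token parses as a number (e.g. '192' or '-5')."""
    try:
        float(token)
    except ValueError:
        return False
    return True


def parse_switches(line):
    """Split a switch line on whitespace into {switch: value-or-None}.

    Raises ValueError when a token is neither a switch nor a numeric
    parameter, which is what happens when a space has been forgotten.
    """
    tokens = line.split()  # ignores leading, trailing and repeated spaces
    switches = {}
    i = 0
    while i < len(tokens):
        name = tokens[i]
        if not name.startswith("-") or _is_number(name):
            raise ValueError(f"expected a switch starting with '-', got {name!r}")
        value = None
        if i + 1 < len(tokens) and _is_number(tokens[i + 1]):
            value = tokens[i + 1]
            i += 1
        switches[name] = value
        i += 1
    return switches


if __name__ == "__main__":
    print(parse_switches(" -period_iterations_num 500 -sbs 192 "))
```

Under this sketch, the padded and unpadded forms of the line parse to the same result, while a pasted-on switch with no separating space (for example `-sbs 192-period_iterations_num 500`) is rejected, which is the mistake the trailing-space habit is meant to prevent.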
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
I don't think I've seen this message in an NV SoG r3528 result before: Triplet: peak=10.79628, time=85.92, period=3.185, d_freq=1420626577.79, chirp=-53.559, fft_len=128 Task is 5150526259: it restarted after the next task had finished with the GPU, and completed normally. stderr says 'Restarted at 32.05 percent', but it was much later in the run (beyond 80%) when it paused and drew attention to itself, and although I didn't watch this particular task restart, I've noticed before that when SoG tasks are paused for any reason, they resume showing a much earlier progress point for a second or two, then jump back almost to the pause point. Purely cosmetic, but it looks odd. |