Posts by Jeff Buck

1) Message boards : Number crunching : ubuntu Install (Message 1896151)
Posted 6 hours ago by Profile Jeff Buck
Post:
Well, I still cannot access any pull-down menus in BOINC Manager, which is definitely a problem. I installed the GPU driver, but it won't start any tasks since I can't change the compute preferences. Anyone have an idea why I'm not seeing the pull-down menus for BOINC Manager? Thanks
The menu is hidden in the title bar. You have to hover your mouse over the title bar to make it appear. If it still doesn't appear, open Settings > Appearance, click on the Behavior tab, and make sure that "in the window's title bar" is selected.
2) Message boards : Number crunching : Linux CUDA 'Special' App finally available, featuring Low CPU use (Message 1896053)
Posted 22 hours ago by Profile Jeff Buck
Post:
Have you tried using the GUI Fan Control yet?
EDIT: Scratch that last question since I see you answered it before I asked it. I guess I didn't read past "it still bombs". ;^)


. . I know I downloaded it somewhere but I cannot find it. Would you care to point me to the download again please?

Stephen

??
https://www.dropbox.com/s/qj6hipjed4zjajr/gpufancnv.7z?dl=0

EDIT: And if you have any questions/comments about that app, it would probably be best to post them over in the dedicated thread, NVIDIA GPU Fan Control using GUI in Linux, since the fan control really isn't specific to the "Special App".
3) Message boards : Number crunching : Linux CUDA 'Special' App finally available, featuring Low CPU use (Message 1896050)
Posted 22 hours ago by Profile Jeff Buck
Post:
When I updated to the 387.12 driver, I did have to go back through the coolbits tweak and reset the Nvidia X-server settings. But I still have Jeff's fan control app working just as before. I wouldn't think that the fan control script works any differently. Still the same mechanism under the covers.
One of the improvements in the GUI, besides it being, well, a GUI, is that it should work with both the older and newer drivers without any internal changes. Right up front it checks to see if "GPUTargetFanSpeed" is a valid attribute for the driver in use. If it gets an error, it falls back to "GPUTargetFanSpeed" and continues on its merry way.

Hi Jeff, I guess my comment about being the same under the covers was off the mark. Wasn't aware that older or newer drivers called the fan control different names. Glad to hear your app handles both cases. I never even noticed any difference other than my overclocks disappeared for the 2nd and 3rd cards until I went back through the scripts. Your app came right back up every time after a reboot as if nothing changed.
Glad to hear it! And I see that I had a typo in my message. I meant to say that it falls back to "GPUCurrentFanSpeed" for the older drivers. Sigh. Feels like I've got ADD this afternoon!
4) Message boards : Number crunching : Linux CUDA 'Special' App finally available, featuring Low CPU use (Message 1896042)
Posted 23 hours ago by Profile Jeff Buck
Post:
. . I replaced GPUCurrentFanSpeed with GPUTargetFanSpeed and it still bombs ... :( Maybe I will have to get the GUI version set up. Now where did I put that download?

Stephen

??
Without knowing what error message(s) you're getting, it's impossible to even attempt a diagnosis. Have you tried using the GUI Fan Control yet?

EDIT: Scratch that last question since I see you answered it before I asked it. I guess I didn't read past "it still bombs". ;^)
5) Message boards : Number crunching : Linux CUDA 'Special' App finally available, featuring Low CPU use (Message 1896036)
Posted 1 day ago by Profile Jeff Buck
Post:
When I updated to the 387.12 driver, I did have to go back through the coolbits tweak and reset the Nvidia X-server settings. But I still have Jeff's fan control app working just as before. I wouldn't think that the fan control script works any differently. Still the same mechanism under the covers.
One of the improvements in the GUI, besides it being, well, a GUI, is that it should work with both the older and newer drivers without any internal changes. Right up front it checks to see if "GPUTargetFanSpeed" is a valid attribute for the driver in use. If it gets an error, it falls back to "GPUTargetFanSpeed" and continues on its merry way.
6) Message boards : Number crunching : Linux CUDA 'Special' App finally available, featuring Low CPU use (Message 1896030)
Posted 1 day ago by Profile Jeff Buck
Post:
. . Since the fan control script doesn't work with the 384.90 driver
What error message are you getting?

Are you using "GPUTargetFanSpeed" (for newer drivers) or "GPUCurrentFanSpeed" (for older drivers)?
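The probe-and-fall-back behavior described in this exchange could be sketched roughly as follows. This is an assumption about the GUI app's logic, not its actual source; only the two nvidia-settings attribute names come from the thread, and the `pick_fan_attr` helper is hypothetical:

```python
# Rough sketch of the attribute fallback discussed above. NOT the
# fan-control app's real code; only the two attribute names
# ("GPUTargetFanSpeed" / "GPUCurrentFanSpeed") are from the thread.
import subprocess

def pick_fan_attr(query=None):
    """Return the fan-speed attribute name the installed driver accepts."""
    if query is None:
        # Default probe: ask nvidia-settings whether the attribute exists.
        def query(attr):
            return subprocess.run(
                ["nvidia-settings", "-q", f"[fan:0]/{attr}"],
                capture_output=True).returncode == 0
    if query("GPUTargetFanSpeed"):
        return "GPUTargetFanSpeed"    # newer drivers
    return "GPUCurrentFanSpeed"       # older drivers
```

On a machine with the NVIDIA driver installed, the chosen name would then be used in a call along the lines of `nvidia-settings -a "[fan:0]/<attr>=60"`, with `GPUFanControlState=1` set on the GPU first.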
7) Message boards : Number crunching : Linux CUDA 'Special' App finally available, featuring Low CPU use (Message 1895798)
Posted 2 days ago by Profile Jeff Buck
Post:
I'm going to post this Inconclusive for a couple of reasons. To begin with, it's the first I've seen of zi3xs3, which is apparently Petri's latest version of the Special App. It actually appears to report signals that, except for the very last one, seem to match SoG pretty closely, albeit with Spikes and Autocorrs being in a different order in several places.

It's that different order that I assume resulted in the last reported signal before the overflow being a Spike in SoG and an Autocorr in zi3xs3.

There is also a disagreement about Best Autocorr, but the odd thing here (at least to me) is that while zi3xs3 reports a Best Autocorr that matches one of the reported signals, the Best Autocorr reported by SoG is not found among the 3 reported signals. Hmmm.....

Anyway, one of my Windows hosts is assigned the tiebreaker, which should run with the same r3584 SoG app as the first host.

Workunit 2711811794 (blc05_2bit_guppi_57903_63524_HIP22812_0048.19015.409.17.26.200.vlar)
Task 6094658928 (S=27, A=3, P=0, T=0, G=0, BG=0) v8.22 (opencl_nvidia_SoG) windows_intelx86
Task 6095383686 (S=26, A=4, P=0, T=0, G=0, BG=0) x41p_zi3xs3, Cuda 9.00 special
8) Message boards : Number crunching : Linux CUDA 'Special' App finally available, featuring Low CPU use (Message 1895604)
Posted 3 days ago by Profile Jeff Buck
Post:
Are we still in pre-production testing for the CUDA 9.0 apps? Or are they usable in production now?

[bump]
The impression I got from Petri's original post was that any "pre-production testing" was something he expected each individual user to simply perform with offline runs on their own configurations. Then, it would be a judgment call for each user as to when, or whether, the switch-over to production could commence. So, basically, you have to make your own call as to how much offline testing is sufficient before going live. Not an ideal approach, but it is what it is. :^)
9) Message boards : Number crunching : Problem with SoG......waiting for GPU memory. (Message 1895211)
Posted 5 days ago by Profile Jeff Buck
Post:
I have another XP 32bit rig with a single 1GB stick.
And it is supporting 3 GPUs.
I assume that's the one with the 560s. I don't think those require the same memory mapping as the 7xx and 9xx cards. The rig that I ran into the problem with was very happy running 2 660s and 2 750Tis. However, it was probably running right at the memory limit and as soon as I upgraded one of the 660s to a 960 I started getting the "postponed" messages. Checking Resource Monitor, I found that the 960 required an extra 256MB "Hardware Reserved" chunk of the physical memory. So, I reduced the number of CPU tasks to free up some memory. That worked. After a while, though, I replaced the other 660 with a 750Ti and ran into exactly the same problem, another 256MB of "Hardware Reserved". No way around it that I could find, so I reduced my number of CPU tasks again so that I could keep the GPUs at max.
10) Message boards : Number crunching : Problem with SoG......waiting for GPU memory. (Message 1895193)
Posted 5 days ago by Profile Jeff Buck
Post:
If you have Resource Monitor in your version of Windows (type it into the Search box), it will give you a breakdown of the physical memory. The "Hardware Reserved" is what has bitten me, causing similar error messages, although that was in 32-bit Windows. I had to cut back on the number of concurrent tasks in order to squeak by.
11) Message boards : Number crunching : Linux CUDA 'Special' App finally available, featuring Low CPU use (Message 1895122)
Posted 6 days ago by Profile Jeff Buck
Post:
My comment wasn't intended to suggest that the problem didn't need to be fixed. I was just noting that I've seen it appear with the stock Cuda apps, in particular the Cuda50 running on my daily driver. And I was passing along your analysis that it wasn't just a processing order issue.

Ideally, Jason would probably be the one to try to track it down in the current Cuda codebase, but he has been absent for a while, so if it can be fixed in the Special App, I would expect that it could be ported back to the more widely used Cuda apps.

As I think I've expressed multiple times previously, just because a WU overflows doesn't mean that it's worthless. That 30 signal cutoff was based on storage considerations, not the value of the scientific data. The apps need to report consistently and let the scientists sort through the results and make any "noise bomb" determination. Anyway, what appears to be a noise bomb to one person might actually turn out to be an alien ABBA concert to another. ;^)
12) Message boards : Number crunching : Linux CUDA 'Special' App finally available, featuring Low CPU use (Message 1895069)
Posted 6 days ago by Profile Jeff Buck
Post:
This is a Typical Invalid Overflow, https://setiathome.berkeley.edu/workunit.php?wuid=2708379644
The way I remember it, the CUDA App looks for Triplets First. If a Task starts with many Triplets the Overflow result will be 30 Triplets.
If the App looks for Something Else First, such as the SoG App, then the results will most likely be 30 of whatever it is looking for, i.e. Not Triplets.
It actually may not be that simple. During an email exchange I had with Petri about a month ago regarding this problem, he said "I looked at my code and the pulses are checked before triplets. So it is not as easy a fix as I thought. I will have to debug why my code misses many pulses on noisy packets and then some on 'normal' data."

To complicate it further, it seems to be a problem that already exists in the older Cuda apps, as I noted previously, so it may be in some code that Petri's app actually inherited from the stock Cuda code. It just never surfaced until the 4-bit WUs started to flow.
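The effect of search order on an overflow result, as discussed above, can be shown with a toy sketch. This is hypothetical illustration code, not anything from the SETI@home apps; only the 30-signal cutoff is taken from the thread:

```python
# Toy illustration (NOT project code) of why detection order changes an
# overflow result: whichever signal type is checked first fills the
# report buffer. Only the 30-signal cutoff comes from the discussion.
RESULT_LIMIT = 30

def collect_signals(detections, search_order):
    """Report signals type by type until the overflow cutoff is reached."""
    reported = []
    for sig_type in search_order:
        for sig in detections.get(sig_type, []):
            reported.append((sig_type, sig))
            if len(reported) >= RESULT_LIMIT:
                return reported, True   # task overflows here
    return reported, False

# A noisy packet with plenty of both triplets and pulses:
noisy = {"triplet": range(40), "pulse": range(40)}
triplets_first, _ = collect_signals(noisy, ["triplet", "pulse"])
pulses_first, _ = collect_signals(noisy, ["pulse", "triplet"])
# Same data, but one order reports 30 triplets, the other 30 pulses.
```

The sketch also shows why the fix is not just a matter of reordering: two apps that find slightly different signal sets on noisy data will still diverge even with identical search orders.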
13) Message boards : Number crunching : Linux CUDA 'Special' App finally available, featuring Low CPU use (Message 1894764)
Posted 7 days ago by Profile Jeff Buck
Post:
Hmmm, where has the problem with the SoG Best Gaussian been thoroughly discussed other than this thread? I must have missed that. The Best Pulse problem with the CUDA App has been run into the ground,
Why does it need to be discussed elsewhere? If any topic was run into the ground, I'd say it was the Best Gaussian. I know; I was pretty deeply involved in that back in June, spending a lot of time combing through my Inconclusives and posting many, many examples. But it seems to me that the ultimate conclusion was that it might not be exclusive to SoG; that, in fact, there was some divergence in the code paths going back several years. As I recall, Jason had intended to dig into it some more, but he's been kind of disengaged since then. Perhaps when he drifts back in, the discussion can pick up. In the meantime, unless something changes, it seems rather pointless to keep posting additional examples.

And whether or not the Best Pulse problem with the CUDA app has been "run into the ground", the fact that a new version of the Special App has started showing up makes it eminently reasonable to take a close look at the results to see if anything has changed, been improved, or fixed. Simply because it's a few seconds faster doesn't mean we should all bow down, close our eyes and just give thanks. If a new SoG app had arrived, the same scrutiny should be applied. The difference there is that new SoG apps (and just about every other new app) hit Beta first, whereas the Special App just goes straight to Broadway. Hence, close scrutiny is even more important.
14) Message boards : Number crunching : Linux CUDA 'Special' App finally available, featuring Low CPU use (Message 1894718)
Posted 8 days ago by Profile Jeff Buck
Post:
As far as I know, the previously discovered Best Gaussian problem discovered with the Windows SoG App DOES cross validate, and STILL EXISTS. You don't seem very concerned about that problem, and it's actually more troublesome than an occasional race condition with the Best Pulse.
It's not that problems like that don't still concern me, it's just that it's been pretty thoroughly discussed and enough examples of the issue have already been posted. I try to keep an eye out for issues that appear in the latest versions of the apps, whether they're new problems, or continuing ones that I would hope would have been fixed in those newer versions. The two Best Pulse examples that I posted are the first I've seen with the Cuda 9.0 app and, since I'm not the one running that version, they only show up to me when one of my wingmen is running it and then that WU shows up in my Inconclusives list.
15) Message boards : Number crunching : Linux CUDA 'Special' App finally available, featuring Low CPU use (Message 1894708)
Posted 8 days ago by Profile Jeff Buck
Post:
Hi,

zi3t2 may report wrong pulses from time to time. It should not be used.
Petri
Does that apply to zi3t2b as well? I have that version running on 2 of my Linux boxes, but have zi3v running on the other one. The reason I haven't moved all of them to zi3v is that annoying problem with restarted tasks spewing out phantom spikes or triplets after the restart until the task overflows, resulting in the task getting marked Invalid. About 20% of my restarted tasks on the zi3v box end up that way, while in all the months that I've been running zi3t2b, I think I've only seen one single task behave that way.

EDIT: Meant to say 15%, or about 3 per week out of 20 restarted tasks.
16) Message boards : Number crunching : Linux CUDA 'Special' App finally available, featuring Low CPU use (Message 1894682)
Posted 8 days ago by Profile Jeff Buck
Post:
Do we have this task offline?
Here you go, Raistmer: WU2705262578
17) Message boards : Number crunching : Linux CUDA 'Special' App finally available, featuring Low CPU use (Message 1894571)
Posted 8 days ago by Profile Jeff Buck
Post:
Keep in mind this packet has no reportable pulses. The best non-reportable pulse is eye candy. Those "signals" are so faint that they are most probably noise, or so near the computational precision that any different summation order of floating-point values always gives a different result.

There is a reason they are not reported as found pulses.

Petri
But there is apparently also a reason why they are reported as Best Pulses and need to be validated accordingly. Once again, that's a decision that was made by the project administrators/scientists, and it's certainly not up to application developers to arbitrarily ignore whatever is the existing standard, simply to squeeze out a little more speed. Accuracy and consistency come first, speed comes second.
18) Message boards : Number crunching : Linux CUDA 'Special' App finally available, featuring Low CPU use (Message 1894547)
Posted 8 days ago by Profile Jeff Buck
Post:
Quite a significant difference in the Best Pulse on this WU.

Workunit 2705262578 (07ap07aa.16319.13160.7.34.221)
Task 6080947466 (S=1, A=0, P=0, T=9, G=0, BG=0) v8.20 (opencl_ati5_SoG_mac) x86_64-apple-darwin
Task 6080947467 (S=1, A=0, P=0, T=9, G=0, BG=0) x41p_zi3xs2, Cuda 9.00 special

One of my machines holds the tiebreaker.
So much for tiebreaking. My host showed yet another significantly different Best Pulse. The three apps and their reported Best Pulses are:

v8.20 (opencl_ati5_SoG_mac) x86_64-apple-darwin: peak=7.699861, time=103.2, period=0.5112, d_freq=1419657277.7, score=0.9625, chirp=11.364, fft_len=256
x41p_zi3xs2, Cuda 9.00 special: peak=0.751317, time=13.42, period=0.02444, d_freq=1419661865.23, score=0.7804, chirp=0, fft_len=8
x41p_zi3v, Cuda 8.00 special: peak=0.6058947, time=41.94, period=0.01732, d_freq=1419654541.02, score=0.8102, chirp=0, fft_len=8

The WU is now in the hands of a fourth host. Not good.
To finish this one off, the 4th host has reported, matched the 1st one, and everybody got validated in the end, even though both versions of the Special App appear to have missed the mark by quite a bit.

v8.22 (opencl_nvidia_SoG): peak=7.699859, time=103.2, period=0.5112, d_freq=1419657277.7, score=0.9625, chirp=11.364, fft_len=256

Keep in mind, this was not an overflow WU. This was a high AR Arecibo WU that ran to full term.
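For a rough sense of how far apart those three best-pulse reports are, a crude tolerance check can be applied. This is a hypothetical comparison, not the project's actual validator logic; the numbers are copied from the task output quoted above:

```python
# Hypothetical field-by-field comparison, NOT the SETI@home validator
# (whose matching rules and tolerances differ). Values are the
# best-pulse reports pasted above.
import math

best_pulses = {
    "SoG_mac": dict(peak=7.699861, time=103.2, period=0.5112,
                    d_freq=1419657277.7, chirp=11.364, fft_len=256),
    "zi3xs2":  dict(peak=0.751317, time=13.42, period=0.02444,
                    d_freq=1419661865.23, chirp=0.0, fft_len=8),
    "zi3v":    dict(peak=0.6058947, time=41.94, period=0.01732,
                    d_freq=1419654541.02, chirp=0.0, fft_len=8),
}

def roughly_match(a, b, rel_tol=0.01):
    """True when every numeric field agrees within 1% relative tolerance."""
    return all(math.isclose(a[k], b[k], rel_tol=rel_tol, abs_tol=1e-9)
               for k in a)

# The SoG report and either Special App report disagree on peak, time,
# period, chirp, and fft_len, so roughly_match returns False for them.
```

Even a tolerance this loose flags the SoG-vs-Special-App pairs as mismatches, which is consistent with the WU going to a fourth host.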
19) Message boards : Number crunching : Linux CUDA 'Special' App finally available, featuring Low CPU use (Message 1894537)
Posted 8 days ago by Profile Jeff Buck
Post:
On this one the CUDA 9 App was given canonical, Workunit 2705643325
canonical result 6081742892 : Task 6081742892 = Computer 6906726
Yeah, the differences seemed pretty minor to begin with. The x41p_zi3xs2 and the v8.08 (alt) ended up the closest, but the x41p_zi3t2b got credit, too. If I have time tomorrow, perhaps I'll make offline Windows CPU runs with both this WU and the other one.
20) Message boards : Number crunching : Linux CUDA 'Special' App finally available, featuring Low CPU use (Message 1894526)
Posted 8 days ago by Profile Jeff Buck
Post:
Quite a significant difference in the Best Pulse on this WU.

Workunit 2705262578 (07ap07aa.16319.13160.7.34.221)
Task 6080947466 (S=1, A=0, P=0, T=9, G=0, BG=0) v8.20 (opencl_ati5_SoG_mac) x86_64-apple-darwin
Task 6080947467 (S=1, A=0, P=0, T=9, G=0, BG=0) x41p_zi3xs2, Cuda 9.00 special

One of my machines holds the tiebreaker.
So much for tiebreaking. My host showed yet another significantly different Best Pulse. The three apps and their reported Best Pulses are:

v8.20 (opencl_ati5_SoG_mac) x86_64-apple-darwin: peak=7.699861, time=103.2, period=0.5112, d_freq=1419657277.7, score=0.9625, chirp=11.364, fft_len=256
x41p_zi3xs2, Cuda 9.00 special: peak=0.751317, time=13.42, period=0.02444, d_freq=1419661865.23, score=0.7804, chirp=0, fft_len=8
x41p_zi3v, Cuda 8.00 special: peak=0.6058947, time=41.94, period=0.01732, d_freq=1419654541.02, score=0.8102, chirp=0, fft_len=8

The WU is now in the hands of a fourth host. Not good.


©2017 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.