Linux CUDA 'Special' App finally available, featuring Low CPU use
Author | Message |
---|---|
petri33 Joined: 6 Jun 02 Posts: 1668 Credit: 623,086,772 RAC: 156 |
Another re-re-processing could be done. But I would really like to know why it happens in the first place. I could also just stop reporting any pulses at the exact same time/fft/..., and pretend they did not happen. But I do not want to. The bug is bugging me. Peak, Time, Period and Score + fft_len are always the same. Freq and chirp vary. It is the first pulsefind in the code (8k len), not the l2m version. My suspects are a memory overflow, a misbehaving pointer/memory management, overheating, bad VRAM, an uninitialized variable/mem area, an error in the chirp code/fft lib/my code, or else an alien trying to hide its existence.
Pulse: peak=41.66666, time=46.17, period=23.26, d_freq=2323286407.06, score=4.562, chirp=-83.442, fft_len=8k
Pulse: peak=41.66666, time=46.17, period=23.26, d_freq=2323286418.24, score=4.562, chirp=-83.442, fft_len=8k
Pulse: peak=41.66666, time=46.17, period=23.26, d_freq=2323286429.41, score=4.562, chirp=-83.442, fft_len=8k
Pulse: peak=41.66666, time=46.17, period=23.26, d_freq=2323286440.59, score=4.562, chirp=-83.442, fft_len=8k
Pulse: peak=41.66666, time=46.17, period=23.26, d_freq=2323286451.76, score=4.562, chirp=-83.442, fft_len=8k
Pulse: peak=41.66666, time=46.17, period=23.26, d_freq=2323286462.94, score=4.562, chirp=-83.442, fft_len=8k
Pulse: peak=41.66666, time=46.17, period=23.26, d_freq=2323286474.12, score=4.562, chirp=-83.442, fft_len=8k
Pulse: peak=41.66666, time=46.17, period=23.26, d_freq=2323286485.29, score=4.562, chirp=-83.442, fft_len=8k
Pulse: peak=41.66666, time=46.17, period=23.26, d_freq=2323286496.47, score=4.562, chirp=-83.442, fft_len=8k
Pulse: peak=41.66666, time=46.17, period=23.26, d_freq=2323286507.64, score=4.562, chirp=-83.442, fft_len=8k
Pulse: peak=41.66666, time=46.17, period=23.26, d_freq=2323286518.82, score=4.562, chirp=-83.442, fft_len=8k
Pulse: peak=41.66666, time=46.17, period=23.26, d_freq=2323286529.99, score=4.562, chirp=-83.442, fft_len=8k
Pulse: peak=41.66666, time=46.17, period=23.26, d_freq=2323286541.17, score=4.562, chirp=-83.442, fft_len=8k
Pulse: peak=41.66666, time=46.17, period=23.26, d_freq=2323286552.35, score=4.562, chirp=-83.442, fft_len=8k
Pulse: peak=41.66666, time=46.17, period=23.26, d_freq=2323286563.52, score=4.562, chirp=-83.442, fft_len=8k
Pulse: peak=41.66666, time=46.17, period=23.26, d_freq=2323286574.7, score=4.562, chirp=-83.442, fft_len=8k
To overcome Heisenbergs: "You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones |
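To put a number on the pattern in that log, the pulses can be grouped by every field except d_freq. The Python sketch below is a minimal, hypothetical helper (the function names are made up, not part of any SETI@home app) that assumes the `Pulse: key=value, ...` text format shown above. Fed the 16 lines from this post, it reports a single group of 16 pulses whose d_freq values step by about 11.18 Hz on average.

```python
import re
from collections import defaultdict


def parse_pulses(text):
    """Split a log on 'Pulse:' and parse each record's key=value pairs."""
    return [dict(re.findall(r"(\w+)=([^,\s]+)", chunk))
            for chunk in text.split("Pulse:")[1:]]


def frequency_runs(pulses, min_count=3):
    """Group pulses that are identical in every field except d_freq.

    Returns (shared_fields, count, mean_step_hz) for each group with at
    least min_count members; mean_step_hz is the average gap between the
    sorted d_freq values.
    """
    groups = defaultdict(list)
    for p in pulses:
        key = tuple(sorted((k, v) for k, v in p.items() if k != "d_freq"))
        groups[key].append(float(p["d_freq"]))
    runs = []
    for key, freqs in groups.items():
        if len(freqs) >= max(min_count, 2):
            freqs.sort()
            # Average of consecutive gaps telescopes to (last - first) / (n - 1).
            mean_step = (freqs[-1] - freqs[0]) / (len(freqs) - 1)
            runs.append((dict(key), len(freqs), mean_step))
    return runs


if __name__ == "__main__":
    log = """
    Pulse: peak=41.66666, time=46.17, period=23.26, d_freq=2323286407.06, score=4.562, chirp=-83.442, fft_len=8k
    Pulse: peak=41.66666, time=46.17, period=23.26, d_freq=2323286418.24, score=4.562, chirp=-83.442, fft_len=8k
    Pulse: peak=41.66666, time=46.17, period=23.26, d_freq=2323286429.41, score=4.562, chirp=-83.442, fft_len=8k
    """
    for fields, count, step in frequency_runs(parse_pulses(log)):
        print(count, "pulses, mean d_freq step %.2f Hz" % step)
```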
Jeff Buck Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0 |
Hmmm....that looks very different from the ones I was getting, although the "time" value is quite close. |
jason_gee Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
.. Agreed. The pulse race before was challenging to visualise and describe, but serialised reprocessing was one correct way to handle it. This other odd thing I don't have a similarly clear idea about yet. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
Jeff Buck Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0 |
Still don't know how to do a Cuda bench run, but I did run what I think is the Windows stock CPU app today, setiathome_8.00_windows_intelx86. The numbers in the results file (<peak_power>0.46856832504272</peak_power>, etc.) appear to match the opencl_ati_cat132 and r3330 results (allowing for the fact that I don't know how to convert that "time" value, and the "score" has a value of 0). The only way to tell for sure is to run the task with your CPU and compare the results. You should give that a try; you can run a CPU task in the benchmark app while running BOINC. Just reduce the CPU usage by one in BOINC and remove any apps from the APPS folder in the Benchmark package. The CPU app in the REF_APPS folder will search the WU folder and run any task it doesn't have results for. The Benchmark tool is here, KWSN Linux MB Bench v2.01.08. Extract KWSN-Bench-Linux-MBv7_v2.01.08.7z to your Home folder and run it from there. Okay, I tried running it with the Windows CPU app that I use here on my daily driver. It almost perfectly matches the v8.22 (opencl_ati_cat132) result. FWIW, today's run reminded me of why I haven't run stock CPU in a long time. It took about 6 hours and 45 minutes, versus 3 hours and 13 minutes for the r3330 I ran yesterday! |
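The "only run what isn't cached yet" behaviour described above comes down to a set difference between the workunit folder and the results folder. A minimal Python sketch, assuming a hypothetical naming rule in which the result for `name.wu` is `name.res`; the real KWSN bench script may name and place its files differently, and `pending_tasks` is a made-up helper:

```python
import tempfile
from pathlib import Path


def pending_tasks(wu_dir, result_dir, result_suffix=".res"):
    """List workunit files in wu_dir that have no result file yet.

    Hypothetical naming rule: the result for 'name.wu' is
    'name' + result_suffix inside result_dir.
    """
    done = {p.name[: -len(result_suffix)]
            for p in Path(result_dir).glob("*" + result_suffix)}
    return sorted(p for p in Path(wu_dir).iterdir()
                  if p.is_file() and p.stem not in done)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        wu, ref = Path(root, "WU"), Path(root, "ref_results")
        wu.mkdir()
        ref.mkdir()
        for name in ("a.wu", "b.wu"):
            (wu / name).write_text("workunit data")
        (ref / "a.res").write_text("cached reference result")
        print([p.name for p in pending_tasks(wu, ref)])  # ['b.wu']
```

Keeping the cached reference results is what lets a later run compare new apps against the CPU result without repeating a multi-hour CPU pass.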
jason_gee Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
Still don't know how to do a Cuda bench run, but I did run what I think is the Windows stock CPU app today, setiathome_8.00_windows_intelx86. The numbers in the results file (<peak_power>0.46856832504272</peak_power>, etc.) appear to match the opencl_ati_cat132 and r3330 results (allowing for the fact that I don't know how to convert that "time" value, and the "score" has a value of 0). Cheers! The observations will help narrow things down. Yep, win32 8.00 CPU isn't quick ;D. Cuda bench on Windows is just a matter of throwing the exe, two suitable cu DLLs, and optionally an mbcuda.cfg into the science_apps folder before running the bench. If you do get it working, luckily the CPU reference result should be cached from the prior run and skipped, so it just runs the other apps and compares them against that. Myself, I'll probably attempt GPU passthrough of the 780 &/or from the OSX+Linux host to a Win10 VM, scheduled for the weekend. If I do get that operational, I may script a rough automation to distribute and accumulate results from the 3 platforms, letting each OS have a batch of normal test and suspect tasks with various apps. If that works out as hoped, I'll enable some kind of facility to dump in suspects remotely for cross-platform matching, but that of course is further down the line. |
Jeff Buck Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0 |
Cuda bench on Windows is just a matter of throwing the exe, two suitable cu DLLs, and optionally an mbcuda.cfg into the science_apps folder before running the bench. If you do get it working, luckily the CPU reference result should be cached from the prior run and skipped, so it just runs any other app comparison against that. Welllll....I actually did try doing that last evening, though from the Reference folder rather than the Science_apps folder. Used Lunatics_x41zi_win32_cuda50.exe, cudart32_50_35.dll, cufft32_50_35.dll, and mbcuda.cfg. In short, everything that normally runs on this machine. That led to the most spectacular Windows meltdown I've ever seen! The script suspended BOINC just fine. But then.......the windows for all my open apps went haywire: rapidly cycling from one to another, then a black screen, then back to Windows but now with an XP theme, then a disappearing task bar, followed by an empty task bar, followed by one task bar button after another reappearing (still with the old XP look), and then a convincing imitation of a Cheshire cat as one window after another rapidly disappeared, followed by the task bar again, leaving only the desktop wallpaper. Several seconds later even that disappeared, but eventually the Windows welcome/login screen showed up, which simply allowed me to click the icon and start a new Windows session, with apparently no permanent harm done (other than the time it took me to re-launch the applications that had been running pre-meltdown). Sooooo.....I kinda figured I must've missed some key element there. |
Jeff Buck Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0 |
I've got several new Inconclusives this evening featuring Best Gaussians. None of them have reportable Gaussian signals, however.
Workunit 2585507075 (23no16ab.17660.365297.10.37.52)
Task 5830849142 (S=1, A=0, P=0, T=0, G=0, BG=5.342847) x41p_zi3v, Cuda 8.00 special
Task 5830849143 (S=1, A=0, P=0, T=0, G=0, BG=6.369327) v8.22 (opencl_nvidia_SoG) windows_intelx86
Workunit 2585604974 (24no16ab.21342.5384.7.34.159)
Task 5831052801 (S=6, A=0, P=8, T=0, G=0, BG=5.469572) x41p_zi3v, Cuda 8.00 special
Task 5831052802 (S=6, A=0, P=8, T=0, G=0, BG=4.944084) v8.22 (opencl_nvidia_SoG) windows_intelx86
Workunit 2586487892 (30oc16ab.3155.106788.15.42.241)
Task 5832884399 (S=19, A=0, P=1, T=0, G=0, BG=3.664569) v8.22 (opencl_nvidia_SoG) windows_intelx86
Task 5832884400 (S=19, A=0, P=1, T=0, G=0, BG=3.816209) x41p_zi3t2b, Cuda 8.00 special
Workunit 2586487990 (30oc16ab.3216.106890.16.43.244)
Task 5832884260 (S=1, A=0, P=0, T=2, G=0, BG=3.187616) x41p_zi3t2b, Cuda 8.00 special
Task 5832884261 (S=1, A=0, P=0, T=2, G=0, BG=3.505734) v8.22 (opencl_nvidia_SoG) windows_intelx86 |
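Every pair above agrees on the reportable signal counts and differs only in the Best Gaussian (BG) value, which is easy to check mechanically across a longer list. A minimal Python sketch, assuming the `(S=, A=, P=, T=, G=, BG=)` summary format used in this post; the function names are hypothetical and the 1% BG tolerance is an arbitrary illustrative value, not the validator's rule:

```python
import re

SUMMARY = re.compile(
    r"\(S=(\d+), A=(\d+), P=(\d+), T=(\d+), G=(\d+), BG=([\d.]+)\)")


def parse_summary(line):
    """Return ((S, A, P, T, G), BG) from a task summary line."""
    m = SUMMARY.search(line)
    if m is None:
        raise ValueError("no signal summary found in: " + line)
    *counts, bg = m.groups()
    return tuple(int(c) for c in counts), float(bg)


def classify_pair(line_a, line_b, bg_rel_tol=0.01):
    """Label why two tasks for the same workunit might be inconclusive.

    bg_rel_tol is an arbitrary illustrative tolerance, not the validator's.
    """
    counts_a, bg_a = parse_summary(line_a)
    counts_b, bg_b = parse_summary(line_b)
    if counts_a != counts_b:
        return "signal counts differ"
    if abs(bg_a - bg_b) > bg_rel_tol * max(bg_a, bg_b):
        return "counts match, best Gaussian differs"
    return "match within tolerance"


if __name__ == "__main__":
    cuda = "Task 5830849142 (S=1, A=0, P=0, T=0, G=0, BG=5.342847) x41p_zi3v, Cuda 8.00 special"
    sog = "Task 5830849143 (S=1, A=0, P=0, T=0, G=0, BG=6.369327) v8.22 (opencl_nvidia_SoG) windows_intelx86"
    print(classify_pair(cuda, sog))  # counts match, best Gaussian differs
```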
Jeff Buck Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0 |
So far, I've only noted one task with a problem using zi3v, and I'm pretty sure that's an isolated incident. Task 5828724704 originally was started on a GTX 750 Ti but, following a reboot, restarted on the GTX 960. Before the restart, it looks like it was running fine, but afterwards it went haywire, identifying 25 bogus Triplets with non-numeric peaks (i.e., "peak=-nan"). I imagine that it's just some sort of restart timing issue, though perhaps on a restart like that the memory usage spikes in some way. Only if it happens again will I really be concerned. Anyway, that task is currently in an Inconclusive state but I expect it to go Invalid once the tie-breaker reports in. Unfortunately, this was not an isolated incident. I've now had two more tasks, 5832726585 and 5832726581, which went haywire with zi3v following a restart. The first one originally ran on a GTX 750 Ti and restarted on the GTX 960, at which point it reported 15 bogus Triplets with "peak=-nan", similar to the one from Friday. The second one also started on a GTX 750 Ti but restarted on a different 750 Ti, this time quickly reporting 17 bogus Spikes. Both tasks seemed to be running fine before the shutdown and restart. |
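Until the restart problem is understood, results like these can at least be screened for the symptom. A minimal Python sketch that counts `peak=` values that are NaN, infinite or unparseable in a block of result text; it assumes the `key=value` signal format quoted in this thread, the sample lines in the demo are made up, and it only detects the damage rather than preventing it:

```python
import math
import re


def count_nonfinite_peaks(report_text):
    """Count 'peak=' values that are NaN, infinite or unparseable."""
    bad = 0
    for raw in re.findall(r"peak=([^,\s]+)", report_text):
        try:
            value = float(raw)  # Python accepts "nan", "-nan", "inf", ...
        except ValueError:
            bad += 1  # e.g. "-nan(ind)": just as suspicious as a plain NaN
            continue
        if not math.isfinite(value):
            bad += 1
    return bad


if __name__ == "__main__":
    # Made-up sample lines, for illustration only.
    sample = """
    Spike: peak=24.5, fft_len=128k
    Triplet: peak=-nan, fft_len=64
    Triplet: peak=-nan, fft_len=64
    """
    print(count_nonfinite_peaks(sample))  # 2
```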
jason_gee Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
Some definite headscratchers in the last few posts :D Will think about those while planning the attack. |
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
I've got several new Inconclusives this evening featuring Best Gaussians. None of them have reportable Gaussian signals, however. I just tested a Windows SoG app on my Mac against zi3v. Since the other machine didn't have any other Inconclusives, I decided to give it a try. My CPU says zi3v is correct on this task, http://setiathome.berkeley.edu/workunit.php?wuid=2586601005 So, since you are able to test those tasks with a Windows CPU, I'd say do that. One of the tasks I've been waiting on at Beta finally finished; it is basically the same as the task I just tested, https://setiweb.ssl.berkeley.edu/beta/workunit.php?wuid=9809103 The CPU app says the SoG app is wrong. I've been watching another task at Beta, https://setiweb.ssl.berkeley.edu/beta/workunit.php?wuid=9796907 Since it's still not validated, I downloaded it and am now testing it on my CPU, since it's the new CPU app destined for Main. If you look at it, you can see the Best Gaussian is different from the reported Gaussian, which is how my CPU and zi3v behave. Anyway, after this next test I'm not going to worry about Gaussians anymore, unless some change is made that needs to be investigated. |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Another re-re-processing could be done. But I really would like to know why it happens in the first place. Sure, it would be a workaround, not a fix, but it would allow moving on to the next stage faster.
No, that's not a good way to go at all. It would be just like manually introducing RFI into a particular area of parameter space. If there is a signal there, inconclusives will arise anyway. But the correct results will be swamped by GPU results because of better performance, not because of validity.
Please explain in more detail here. The first pulsefind is done at zero chirp and its length is 8, not 8k. What did you mean by "first" and "8k" here?
??sorry?
From all this, I would pursue the uninitialized variable/mem area theory further. The same value in a particular field could just be an attempt to interpret a NaN or 0xDEAD pattern.
So, not the very first, but at 8k. It's strange indeed, because I can't think of any peculiarity of 8k. 16k would mean a "square" matrix for gaussfit if I recall correctly (not quite square, but without scaling), but 8k and for Pulse... no idea right now. I would propose "heavy debugging" here, that is, printing the data arrays. If it's 0xDEAD indeed, you will notice it immediately. If not, the output should be compared with a version that works OK. And most important: can this be reproduced in an offline run? SETI apps news We're not gonna fight them. We're gonna transcend them. |
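Raistmer's "heavy debugging" suggestion can be made mechanical by poisoning a buffer with a recognisable bit pattern before the kernel runs and scanning for survivors afterwards. On the device, `cudaMemset(ptr, 0xFF, bytes)` is one way to do that, since every float32 then reads 0xFFFFFFFF, a NaN. The Python sketch below only simulates the idea on the host, with a hypothetical 0xDEADBEEF sentinel and made-up helper names; a legitimately written value with exactly those bits would be a false positive.

```python
import struct

SENTINEL = 0xDEADBEEF  # hypothetical poison pattern; any unlikely bit pattern works


def poisoned_buffer(n_floats):
    """Allocate n float32 slots, every one pre-filled with the sentinel bits."""
    return bytearray(struct.pack("<I", SENTINEL) * n_floats)


def write_float(buf, index, value):
    """Store one float32 (stands in for a kernel's store to its output)."""
    struct.pack_into("<f", buf, 4 * index, value)


def unwritten_slots(buf):
    """Indices whose raw bits still equal the sentinel, i.e. never written."""
    return [i for i, (bits,) in enumerate(struct.iter_unpack("<I", buf))
            if bits == SENTINEL]


def sentinel_as_float():
    """The value a reader would see if it consumed an unwritten slot."""
    return struct.unpack("<f", struct.pack("<I", SENTINEL))[0]


if __name__ == "__main__":
    buf = poisoned_buffer(8)
    for i in range(6):           # a buggy "kernel" that skips the last two slots
        write_float(buf, i, 0.5)
    print(unwritten_slots(buf))  # [6, 7]
    print(sentinel_as_float())   # a huge negative number, about -6.26e18
```

Comparing raw bits rather than float values keeps the check exact, which matters if the sentinel happens to be a NaN (NaN never compares equal to itself).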
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
The first one originally ran on a GTX 750 Ti and restarted on the GTX 960, at which point it reported 15 bogus Triplets with "peak=-nan", similar to the one from Friday. The second one also started on a GTX 750 Ti but restarted on a different 750 Ti, this time quickly reporting 17 bogus Spikes. Both tasks seemed to be running fine before the shutdown and restart. Damage in checkpointing. Most probably some re-initialization of GPU-side signal buffers is missing after restart. The "proper" fix would be to add those re-initializations, but if the app targets only really fast GPUs, an adequate solution would be to just skip checkpointing altogether. This adds overhead on restart (reprocessing from the beginning) but simplifies the code and avoids this issue completely (a positive side effect could be a small speedup on all non-checkpointing tasks). |
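The two options Raistmer describes can be sketched side by side. This is a hypothetical Python model, not the app's actual checkpoint code (which isn't shown in this thread): only host-side progress is saved, `signal_buffer` stands in for a GPU-side buffer that is not part of the checkpoint, and `use_checkpoint=False` corresponds to dropping checkpointing and reprocessing from the beginning.

```python
import json
import os
import tempfile


def save_checkpoint(path, next_chirp_index, best_score):
    """Persist only the host-side progress (hypothetical checkpoint format)."""
    with open(path, "w") as f:
        json.dump({"next_chirp_index": next_chirp_index,
                   "best_score": best_score}, f)


def resume(path, signal_buffer, use_checkpoint=True):
    """Restore progress, then re-initialize the reused signal buffer.

    signal_buffer stands in for a GPU-side buffer whose contents are not
    saved in the checkpoint. With use_checkpoint=False the task restarts
    from the beginning, the "skip checkpointing" alternative.
    """
    progress = {"next_chirp_index": 0, "best_score": 0.0}
    if use_checkpoint and os.path.exists(path):
        with open(path) as f:
            progress.update(json.load(f))
    # The step suspected to be missing: without it, whatever the buffer held
    # (stale or never-initialized data) leaks into the resumed search.
    signal_buffer[:] = [0.0] * len(signal_buffer)
    return progress


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        ckpt = os.path.join(d, "state.json")
        save_checkpoint(ckpt, next_chirp_index=42, best_score=3.5)
        stale = [float("nan")] * 4         # garbage left in the buffer
        print(resume(ckpt, stale), stale)  # progress restored, buffer zeroed
```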
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Sooooo.....I kinda figured I must've missed some key element there. Better to try from the science dir, or edit the script. The ref run passes the -verb option by default, and I'm not sure how the CUDA binary will react to it (or just try removing the additional switches from the beginning of the script). |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
I've got several new Inconclusives this evening featuring Best Gaussians. None of them have reportable Gaussian signals, however. That's important. SoG (actually, all OpenCL MB builds) has different processing for the case where a reportable Gaussian has already been found and for before such an event. So your run indicates the issue is on the no-reportable-Gaussian path. Another possibility is just a boundary precision issue. Please try to grab 1-2 tasks and reprocess them offline with SoG vs non-SoG binaries. If they disagree, report back and provide the task as a test case; that requires bugfixing. |
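As an illustration of how a "boundary precision issue" can arise: float32 addition is not associative, and a parallel GPU reduction usually adds terms in a different order than a sequential CPU loop. The toy Python example below uses made-up values and a hypothetical threshold, not data from any workunit, to show the same three terms summing to 0.0 in one order and 1.0 in another, landing on opposite sides of the threshold.

```python
import struct


def f32(x):
    """Round a Python float (a double) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def sum_f32(values):
    """Left-to-right float32 accumulation, rounding after every addition."""
    total = 0.0
    for v in values:
        total = f32(total + f32(v))
    return total


if __name__ == "__main__":
    threshold = 0.5                          # hypothetical detection threshold
    sequential = sum_f32([1e8, 1.0, -1e8])   # one summation order
    reordered = sum_f32([1e8, -1e8, 1.0])    # same terms, another order
    print(sequential, sequential > threshold)  # 0.0 False
    print(reordered, reordered > threshold)    # 1.0 True
```

At 1e8 the spacing between adjacent float32 values is 8, so adding 1.0 to 1e8 first is rounded away entirely; real differences are far smaller, but only need to straddle a threshold to change a result.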
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
BTW, can Petri's app run on such GPUs (mobile ones): NVIDIA GeForce 940MX and NVIDIA GeForce 820M? |
-= Vyper =- Joined: 5 Sep 99 Posts: 1652 Credit: 1,065,191,981 RAC: 2,537 |
BTW, can Petri's app run on such GPUs (mobile ones)? This is what I've found out: "The executable is version zi3t2b and it can be run on sm_35, 50, 52, and 61. (750,780,980,1080 and likes). With 1 Mb of GPU ram you need -unroll 1. Other can use -unroll autotune. Use -bs to reduce CPU usage. Set -pfb to 8, 16 or 32." https://en.wikipedia.org/wiki/CUDA EDIT: The 820M seems out of luck (CC 2.1 only), but the 940MX seems to work (CC 5.0). _________________________________________________________________________ Addicted to SETI crunching! Founder of GPU Users Group |
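Vyper's reasoning can be written down as a check. CUDA's binary-compatibility rule is that a cubin compiled for compute capability X.y runs only on devices of capability X.z with z ≥ y. Assuming the quoted sm_35/50/52/61 list is exactly the set of cubins in the binary and that no PTX is embedded for JIT compilation (neither is stated in the thread), a minimal Python sketch with a made-up `can_run` helper gives the same verdict for the two mobile GPUs:

```python
# Architectures named in the quoted notes: sm_35, sm_50, sm_52 and sm_61.
SHIPPED_CUBINS = [(3, 5), (5, 0), (5, 2), (6, 1)]


def can_run(device_cc, cubins=SHIPPED_CUBINS):
    """True if some shipped cubin is binary-compatible with device_cc.

    CUDA rule: a cubin for compute capability X.y runs on devices X.z with
    z >= y. Assumes no PTX is embedded for JIT compilation on other GPUs.
    """
    major, minor = device_cc
    return any(major == c_major and minor >= c_minor
               for c_major, c_minor in cubins)


if __name__ == "__main__":
    for name, cc in [("GeForce 820M", (2, 1)), ("GeForce 940MX", (5, 0))]:
        print(name, cc, can_run(cc))  # 820M -> False, 940MX -> True
```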
-= Vyper =- Joined: 5 Sep 99 Posts: 1652 Credit: 1,065,191,981 RAC: 2,537 |
How do we know that the CPU portion of the latest code isn't affected by sporadic errors when running with Hyperthreading enabled?! Has anybody run tests with and without HT on Skylake and Kaby Lake computers?! https://setiathome.berkeley.edu/forum_thread.php?id=81641 |
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
And the results are... My OSX CPU agrees with the CPU SETI@home v8 v8.06 (alt) windows_x86_64 here, https://setiweb.ssl.berkeley.edu/beta/workunit.php?wuid=9796907 Pretty close match; the important parts are:
8.06 (alt) windows
Gaussian: peak=2.961493, mean=0.5020308, ChiSq=1.415568, time=17.62, d_freq=1420573507.32, score=0.3145776, null_hyp=2.268677, chirp=-98.677, fft_len=16k
Best gaussian: peak=3.659641, mean=0.5301717, ChiSq=1.25771, time=67.95, d_freq=1420577155.89, score=0.794832, null_hyp=2.202944, chirp=-65.055, fft_len=16k
SSE4.1xjf OS X r 3344
Gaussian: peak=2.961488, mean=0.5020312, ChiSq=1.415569, time=17.62, d_freq=1420573507.32, score=0.3145256, null_hyp=2.268674, chirp=-98.677, fft_len=16k
Best gaussian: peak=3.659646, mean=0.5301709, ChiSq=1.257715, time=67.95, d_freq=1420577155.89, score=0.7949486, null_hyp=2.202953, chirp=-65.055, fft_len=16k
Of course this means the SoG App is wrong...again. Seems the SoGs have a propensity to report the reported Gaussian as Best. One of them is wrong. All those people testing these Apps at Beta and no one picked this up? Nevermind. |
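TBar's "pretty close match" can be quantified field by field. A minimal Python sketch, assuming the `key=value` line format quoted above (helper names hypothetical); it only measures the gaps and says nothing about the tolerance the project's validator actually applies. For the two Best gaussian lines, the largest relative difference is in score, about 1.5e-4, while every other field agrees to better than 1e-5.

```python
import re


def parse_signal(line):
    """Parse 'Name: key=value, ...' into a dict; numeric fields become floats."""
    fields = dict(re.findall(r"(\w+)=([^,\s]+)", line))
    return {k: (v if k == "fft_len" else float(v)) for k, v in fields.items()}


def relative_differences(a, b):
    """Per-field |a - b| / max(|a|, |b|), skipping the non-numeric fft_len."""
    diffs = {}
    for key, va in a.items():
        if key == "fft_len":
            continue
        vb = b[key]
        scale = max(abs(va), abs(vb))
        diffs[key] = 0.0 if scale == 0 else abs(va - vb) / scale
    return diffs


if __name__ == "__main__":
    windows = ("Best gaussian: peak=3.659641, mean=0.5301717, ChiSq=1.25771, time=67.95, "
               "d_freq=1420577155.89, score=0.794832, null_hyp=2.202944, chirp=-65.055, fft_len=16k")
    osx = ("Best gaussian: peak=3.659646, mean=0.5301709, ChiSq=1.257715, time=67.95, "
           "d_freq=1420577155.89, score=0.7949486, null_hyp=2.202953, chirp=-65.055, fft_len=16k")
    diffs = relative_differences(parse_signal(windows), parse_signal(osx))
    worst = max(diffs, key=diffs.get)
    print(worst, "%.2e" % diffs[worst])  # score 1.47e-04
```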
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
|
jason_gee Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
...All those people testing these Apps at Beta and no one picked this up? Nevermind. This far into the noise floor, there is no shame, only things to learn. Nobody's been here before. |
©2025 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.