Posts by Raistmer

1) Message boards : Number crunching : Vega Frontier Edition - MB Options Tuning (Message 1895473)
Posted 5 days ago by Profile Raistmer
Post:

Seems like the optimization on a guppi unit resulted in a degradation for Arecibo WUs. I will redo the optimization DOEs using both types of WUs. Let me know of any other recommendations.


Since the processing chain depends on AR, it's recommended to use the PG* set of tasks for benchmarking.
Maybe with a GUPPI VLAR task included as well.
2) Message boards : Number crunching : Linux CUDA 'Special' App finally available, featuring Low CPU use (Message 1895145)
Posted 7 days ago by Profile Raistmer
Post:
To find the best that is not reported is time consuming in a parallel world.

Yes, indeed. But it's still part of the algorithm.
Until the algorithm is changed, the best should be found correctly (some sort of reduction from best per CU to a single best could be used to reduce the slowdown from serialization).
Regarding overflows - yep, early versions of SoG had the same issue too. The more distributed the task computation is, the more signals one has to store to reorder them properly on reporting. At some point it will indeed become too costly. But if you are still doing one icfft per kernel call, the number of signals to keep should not be too huge.
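The "best per CU to single best" reduction mentioned above could look roughly like the sketch below. It is not code from any of the apps discussed here: the `Candidate` fields and the tie-break rule (on equal score, keep the candidate the serial code would have met first) are assumptions made for illustration.

```python
from dataclasses import dataclass
from functools import reduce
from typing import List, Optional


@dataclass(frozen=True)
class Candidate:
    score: float    # value used to rank candidates (hypothetical field)
    seq_index: int  # position at which a serial scan would visit it (hypothetical field)


def better(a: Candidate, b: Candidate) -> Candidate:
    """Keep the higher score; on a tie keep the one a serial scan sees first."""
    if b.score > a.score or (b.score == a.score and b.seq_index < a.seq_index):
        return b
    return a


def global_best(per_cu_chunks: List[List[Candidate]]) -> Optional[Candidate]:
    """Stage 1: each CU reduces its own chunk. Stage 2: reduce the per-CU bests."""
    per_cu_best = [reduce(better, chunk) for chunk in per_cu_chunks if chunk]
    if not per_cu_best:
        return None
    return reduce(better, per_cu_best)
```

Because `better` is associative and breaks ties by `seq_index`, the result does not depend on how the work was split across CUs, so only the small second stage has to be serialized.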
3) Message boards : Number crunching : Linux CUDA 'Special' App finally available, featuring Low CPU use (Message 1895065)
Posted 7 days ago by Profile Raistmer
Post:
Ever so often one of the Instant Overflows is given an Invalid.

Does wingman report overflow (different set but overflow) also?
4) Message boards : Number crunching : Linux CUDA 'Special' App finally available, featuring Low CPU use (Message 1895064)
Posted 7 days ago by Profile Raistmer
Post:
@TBar
Let's continue the discussion of the required Mac ATi app modifications on the beta site.
I posted some questions there; please respond there.
5) Message boards : Number crunching : Vega Frontier Edition - MB Options Tuning (Message 1895052)
Posted 7 days ago by Profile Raistmer
Post:
As I recall, several -tune lines can be provided, one for each kernel. But I'm not sure more than one is implemented for MB.
6) Message boards : Number crunching : Linux CUDA 'Special' App finally available, featuring Low CPU use (Message 1894836)
Posted 8 days ago by Profile Raistmer
Post:
When I tested it in Linux all I had to do was use the Non-SoG version to solve the Best Gaussian problem. If you look at the Apps page it shows Windows as the only platform that doesn't have a Non-SoG nVidia App. I would suggest you build a Non-SoG Windows App and see if it has the Gaussian problem, I'd wager it won't.

Using another app doesn't fix the bug. Any bug in any software could be "fixed" by abandoning the app ;)


You can see the Mac results at Beta, along with the posts in the Questions & Answers section,
http://setiweb.ssl.berkeley.edu/beta/setiathome_v8_x86_64-apple-darwin__opencl_ati5_mac.html
http://setiweb.ssl.berkeley.edu/beta/setiathome_v8_x86_64-apple-darwin__opencl_ati5_SoG_mac.html
SoG OS 17.0.0 : hosts_success 0.5000 : results_success 0.6091
Non SoG OS 17.0.0 : hosts_success 1.0000 : results_success 0.8854
This Host just changed to the Non SoG App after posting in Q & A, https://setiathome.berkeley.edu/results.php?hostid=8248108&offset=40

I'll take a detailed look, thanks.
7) Message boards : Number crunching : Linux CUDA 'Special' App finally available, featuring Low CPU use (Message 1894789)
Posted 8 days ago by Profile Raistmer
Post:

But, we know the source of the Gaussian problem, it could be fixed very simply.

Really? Send me the patch then, please. I'll add it to the repo from the old host.

BTW Raistmer, do you think you could help convince Eric to switch the Mac ATI App over to the non-SoG version? As more people update to the new OS the number of Failed tasks are going to be impressive.

I think it's possible. Just abandon some of the plan classes for Mac. Please post a few such hosts.
8) Message boards : Number crunching : "BOINC portable" for Windows hosts (Message 1894729)
Posted 9 days ago by Profile Raistmer
Post:
There's Tools -> Retry pending transfers in Manager and "--network_available" in boinccmd.

Thanks, but I don't use the GUI part at all for this endeavor, and editing XML once would be preferable to introducing boinccmd.
9) Message boards : Number crunching : Linux CUDA 'Special' App finally available, featuring Low CPU use (Message 1894727)
Posted 9 days ago by Profile Raistmer
Post:

To the pulse issue that is not a pulse (Not a reported one): Do not look at the peak. Look at the score. Score is used to determine if a pulse should be reported. The s2 sometimes misses one but that is a rare occasion.

If I recall correctly, the only signal type where the score depends on more than the peak is Gaussian.
For all the others, peak and score correlate monotonically.


Then, if it is said by the administration that a pulse should be reported then it will -- and they allow half of them to be wrong.

Very bad point of view and approach in general. For a result to be correct, ALL signals should match within ~1% tolerance. No "half to be wrong" at all! Where "half" appears is only in credit awards, to encourage users to continue participating, nothing more. Devs should never take this into account. We are not here to collect credits.

If the score is less than a given threshold then it is reported as best so far just to make the screen saver happy and to make educated guesses about a sequential app's inner workings.

Nope. It is there to check processing correctness even on relatively silent data sets.

There is no scientific meaning in those not reported but best anyway still pulses. They are there to prevent faking. One could say that no pulses were found without scanning through all possibilities. The best but not reported is a sanity check. If my app fails that sometimes it is not so big a deal.

So it does have meaning. Yes: in short, some sort of CRC for the processing pipeline.
And I'm working on it.

That's the good part. Either the processing logic for "best" in your app differs greatly from that for "reportable", or the issue with best could show itself on reportable ones too. So it's better to explore this more.



The bigger problem is that there are people running zi3t2 that is faster but does not sometimes report all true pulses. The t2 has a parallel only pulse search (it is fast) but it is not valid. The s2 is far much better. When it finds a suspect best or a true pulse it reverts back to sequential search. The t2 does not.

That's the usual issue with any open testing. Not all testers follow guidelines. And that's why we have all agreed so far that Windows builds should be postponed.

My SW does all the work needed. No faking. Everything is computed.

Nobody suspects anything else AFAIK. But your code, just like the stock one, is GPL and so available. And neither you nor anyone else can prevent it from being used maliciously. So as a baseline we should not neglect "CRC checking".

The problem is in (storing intermediate results on same PoT) the reporting, my lack of time during the weeks I have to go to the work and the day having only 24 hours in it during the weekends.

And that would perhaps require additional serialization. And any serialization is quite slow on big GPUs. That will become an issue when you face the need to replace "fast but sometimes wrong" with "rock-stable but a little slower". Especially taking into account that "half could be wrong" part and the obsession with credits among some participants.
10) Message boards : Number crunching : Linux CUDA 'Special' App finally available, featuring Low CPU use (Message 1894685)
Posted 9 days ago by Profile Raistmer
Post:
Do we have this task offline?
Here you go, Raistmer: WU2705262578

Got it, thanks.
I still have to restore the build environment to hunt any bugs in OpenCL, and I'm very limited on free time to set up a Linux host to help with bug hunting in Petri's app, but the test case is reserved for the future...
11) Message boards : Number crunching : "BOINC portable" for Windows hosts (Message 1894683)
Posted 9 days ago by Profile Raistmer
Post:
Depends if you have a fixed schedule for letting the offline hosts go online - although you mention the situation in your opening post, you don't explain the circumstances or give any idea of timing.

If you, perhaps, take them all home so they can be online 'through night', then set up a networking schedule to allow network use say half an hour after your normal connection setup, and to disallow network use half an hour before you dismantle, that should cover it.

Interesting idea, thanks.
Offline hosts are offline forever. The data directory is physically transferred onto another device for network communications.
Such a network schedule can be implemented in the global options override XML file, correct?
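As a rough illustration of the question above, such a window could be written out as an override fragment like this. It is a sketch under assumptions: the element names `net_start_hour` and `net_end_hour` are the network-hours fields as I understand BOINC's global preferences, and they should be checked against the documentation for the client version in use. The hours are example values only.

```python
import xml.etree.ElementTree as ET


def network_window_override(start_hour: float, end_hour: float) -> str:
    """Build a global preferences override fragment limiting network use to a window.

    Element names are assumed, not confirmed in this thread; verify before use.
    """
    root = ET.Element("global_preferences")
    ET.SubElement(root, "net_start_hour").text = f"{start_hour:.6f}"
    ET.SubElement(root, "net_end_hour").text = f"{end_hour:.6f}"
    return ET.tostring(root, encoding="unicode")


# Example: allow network use from 22:30 to 06:30 (illustrative times).
override_xml = network_window_override(22.5, 6.5)
```

The resulting string would be saved as the override file in the data directory of the host that goes online; the client has to re-read its preferences or be restarted for it to take effect.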
12) Message boards : Number crunching : "BOINC portable" for Windows hosts (Message 1894675)
Posted 9 days ago by Profile Raistmer
Post:

And if it's offline? The reporting requests will go into extended backoff, and take no time at all.

So, when the data directory is taken to the internet host, would that backoff remain? Or will re-launching boinc.exe on the internet host trigger an immediate connection attempt again?

Project communication deferrals are kept when restarting BOINC. I would guess that file transfers and other backoff times are retained as well.

Then cc_config will not help :/


If you don't want to use boinccmd -project url update
It looks like you could update <rsc_backoff_time> and <rsc_backoff_interval> in the client_state.xml for each project.
File transfer backoff timers would likely be similar. I imagine some kind of script could be run to reset all of the xml values.

That doesn't sound much easier than handling boinccmd. In both cases additional scripting is required, and at least I have your examples for boinccmd.
Well, keeping the internet host online through the night almost solves the issue.
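For comparison, the XML route suggested in the quote could be scripted roughly as follows. This is only a sketch: the two tag names come from the quoted post, while the surrounding layout of client_state.xml is an assumption (the code accepts either a plain value or a nested `value` child), and the sample document is invented. The client would have to be stopped while the file is edited.

```python
import xml.etree.ElementTree as ET

# Tag names taken from the quoted suggestion.
BACKOFF_TAGS = ("rsc_backoff_time", "rsc_backoff_interval")


def reset_backoffs(xml_text: str):
    """Zero every backoff element found; return (new_xml, number_of_elements_reset)."""
    root = ET.fromstring(xml_text)
    changed = 0
    for elem in root.iter():
        if elem.tag not in BACKOFF_TAGS:
            continue
        value = elem.find("value")  # assumed nested form: <name>…</name><value>…</value>
        target = value if value is not None else elem
        target.text = "0.000000"
        changed += 1
    return ET.tostring(root, encoding="unicode"), changed


# Invented sample, not a real client_state.xml.
SAMPLE = (
    "<client_state><project>"
    "<rsc_backoff_time><name>CPU</name><value>3600.000000</value></rsc_backoff_time>"
    "<rsc_backoff_interval>600.000000</rsc_backoff_interval>"
    "</project></client_state>"
)
new_xml, count = reset_backoffs(SAMPLE)
```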
13) Message boards : Number crunching : Linux CUDA 'Special' App finally available, featuring Low CPU use (Message 1894674)
Posted 9 days ago by Profile Raistmer
Post:
It would seem it's the Same old Race condition that's been present since the Unroll was added. Since the two CUDA Apps didn't Validate, the chance of Cross Validation is Low. It's been like this for a while and was labeled harmless by other developers some time ago. As far as I know, the previously discovered Best Gaussian problem discovered with the Windows SoG App DOES cross validate, and STILL EXISTS. You don't seem very concerned about that problem, and it's actually more troublesome than an occasional race condition with the Best Pulse.

It's not harmless.
If it's the same PulseFind issue as before, it's a true bug. 7.6 and 0.7 are too different to be explained by limited computational precision.
And there's no need to point to one bug to hide/diminish another. A bug is a bug anyway.
Do we have this task offline?

Quite a significant difference in the Best Pulse on this WU.

Workunit 2705262578 (07ap07aa.16319.13160.7.34.221)
Task 6080947466 (S=1, A=0, P=0, T=9, G=0, BG=0) v8.20 (opencl_ati5_SoG_mac) x86_64-apple-darwin
Task 6080947467 (S=1, A=0, P=0, T=9, G=0, BG=0) x41p_zi3xs2, Cuda 9.00 special

One of my machines holds the tiebreaker.
So much for tiebreaking. My host showed yet another significantly different Best Pulse. The three apps and their reported Best Pulses are:

v8.20 (opencl_ati5_SoG_mac) x86_64-apple-darwin: peak=7.699861, time=103.2, period=0.5112, d_freq=1419657277.7, score=0.9625, chirp=11.364, fft_len=256
x41p_zi3xs2, Cuda 9.00 special: peak=0.751317, time=13.42, period=0.02444, d_freq=1419661865.23, score=0.7804, chirp=0, fft_len=8
x41p_zi3v, Cuda 8.00 special: peak=0.6058947, time=41.94, period=0.01732, d_freq=1419654541.02, score=0.8102, chirp=0, fft_len=8

The WU is now in the hands of a fourth host. Not good.
To finish this one off, the 4th host has reported, matched the 1st one, and everybody got validated in the end, even though both versions of the Special App appear to have missed the mark by quite a bit.

v8.22 (opencl_nvidia_SoG): peak=7.699859, time=103.2, period=0.5112, d_freq=1419657277.7, score=0.9625, chirp=11.364, fft_len=256

Keep in mind, this was not an overflow WU. This was a high AR Arecibo WU that ran to full term.
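To put numbers on "too different to be explained by limited computational precision", the reported Best Pulse peaks above can be compared against the ~1% tolerance mentioned elsewhere in these posts. This is only a sketch of that single check: the real validator's exact rule is not given in this thread and it compares more than the peak.

```python
import math

# Best Pulse peaks exactly as reported in the quoted post.
reported_peaks = {
    "v8.20 (opencl_ati5_SoG_mac)": 7.699861,
    "x41p_zi3xs2, Cuda 9.00 special": 0.751317,
    "x41p_zi3v, Cuda 8.00 special": 0.6058947,
    "v8.22 (opencl_nvidia_SoG)": 7.699859,
}


def peaks_match(a: float, b: float, rel_tol: float = 0.01) -> bool:
    """True when the two peaks agree within rel_tol (1% here, an assumed threshold)."""
    return math.isclose(a, b, rel_tol=rel_tol)


reference = reported_peaks["v8.20 (opencl_ati5_SoG_mac)"]
agreement = {app: peaks_match(peak, reference) for app, peak in reported_peaks.items()}
```

With these values only the two SoG results agree with each other; both Special App peaks are off by roughly a factor of ten, and they do not agree with one another either.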

14) Message boards : Number crunching : "BOINC portable" for Windows hosts (Message 1894396)
Posted 11 days ago by Profile Raistmer
Post:

And if it's offline? The reporting requests will go into extended backoff, and take no time at all.

So, when the data directory is taken to the internet host, would that backoff remain? Or will re-launching boinc.exe on the internet host trigger an immediate connection attempt again?
15) Message boards : Number crunching : "BOINC portable" for Windows hosts (Message 1894337)
Posted 11 days ago by Profile Raistmer
Post:
Thanks, Jord and Richard.
What I meant about the behavior of an offline host is whether it incurs additional overhead for the boinc process to constantly attempt to connect and fail. Or not?
16) Message boards : Number crunching : "BOINC portable" for Windows hosts (Message 1894318)
Posted 11 days ago by Profile Raistmer
Post:
err - cc_config has an option for <report_results_immediately>. Why would that not do?

I use 7.6.33. And that option is marked as "new in 7.7".
17) Message boards : Number crunching : "BOINC portable" for Windows hosts (Message 1894317)
Posted 11 days ago by Profile Raistmer
Post:
err - cc_config has an option for <report_results_immediately>. Why would that not do?

How will they behave on an offline host?
18) Message boards : Number crunching : "BOINC portable" for Windows hosts (Message 1894162)
Posted 12 days ago by Profile Raistmer
Post:
I hoped to avoid using boinccmd (this would be much simpler - no need to specify/use particular ports). The issue with overcommitting the internet host was solved by the --start_delay 60000 switch. But after a few iterations I found that some instances were not reporting results even after being online for a few hours. Strange. I launch ~10 instances. After a few hours online some reported all results and downloaded new tasks, but some still have many result files in the project directory.

It seems a forced project update is required, and that in turn requires boinccmd to order boinc to communicate with the project :/

P.S. Now I do the upload/download directly from flash, which eliminates one copy. Updating offline hosts is still time-consuming though.
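A forced update across several instances could be scripted along these lines. It is a sketch under assumptions: each instance is taken to listen on its own GUI RPC port, and the port numbers, password and project URL below are placeholders rather than values from this setup. The code only builds the command lines; each list could then be handed to `subprocess.run`.

```python
from typing import List


def update_commands(ports: List[int], project_url: str, password: str) -> List[List[str]]:
    """Build one 'boinccmd ... --project URL update' command per client instance."""
    return [
        ["boinccmd", "--host", f"localhost:{port}", "--passwd", password,
         "--project", project_url, "update"]
        for port in ports
    ]


# Placeholder values for illustration only.
commands = update_commands([31417, 31418, 31419], "http://project.example/", "secret")
```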
19) Message boards : Number crunching : Anything relating to AstroPulse tasks (Message 1892594)
Posted 20 days ago by Profile Raistmer
Post:
Any updates on RX480 fixes testing?
20) Message boards : Number crunching : "BOINC portable" for Windows hosts (Message 1892175)
Posted 22 days ago by Profile Raistmer
Post:
And now my "portable boinc" experiment has a new bottleneck - the time to put data from the internet host onto the flash drive.
Estimated at ~16 minutes... That will be inconvenient for regular use.
But because I move the whole data directory, all executables/binary caches are moved back and forth too - quite a considerable share of the whole data transfer, actually.
Some optimisation should be possible here....




 
©2017 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.