Status report...
Author | Message |
---|---|
Joined: 15 Mar 05 Posts: 1547 Credit: 27,183,456 RAC: 0 |
Linux version 5.17 has been released. If you are having problems with beta coexisting with public when running an anonymous platform/app_info.xml application, that was a BOINC bug that has been fixed in the most recent alpha release. You can get it here. We've gotten quite a few bug fixes (most in the BOINC code), so we'll probably release 5.18 fairly quickly. Eric |
Joined: 14 Aug 06 Posts: 22 Credit: 190,000 RAC: 0 |
Hi Eric, thanks for the status update. I've got a question (rather, a favour) to ask: could you please update the tarballs at http://setiathome.berkeley.edu/~korpela/build/ as well? Your home dirs seem more current than CVS, which (yesterday, anyway) still had its most recent x86-related edit sometime in July. It would be really nice to have a more recent image of the build dirs (and not have to mirror the whole dir with wget or the like and suck up bandwidth). Thank you for your time, Simon. |
Joined: 16 Jun 05 Posts: 47 Credit: 147,346 RAC: 0 |
[quote]Linux version 5.17 has been released.[/quote] One of the most frequently asked questions is when we will get the new WUs. When will we get data from the multi-beam receiver? ... By the way, @Eric, thanks for your work here and for the information you share. Greetings from Germany (NRW), Ulli |
Joined: 14 Nov 05 Posts: 296 Credit: 13,874 RAC: 0 |
Eric, how is the work going on 5.18? After your last statement I kind of thought we would have seen it by now. I hope you have not run into any problems that are causing a delay. I'm just a little anxious to get back to work here, since I can't do any units at the moment on Win98SE. Steve 98SE XP2500+ @ 2.1 GHz Boinc v5.8.8 |
Joined: 25 Mar 06 Posts: 100 Credit: 61,559 RAC: 0 |
Hello? Is there anybody out there? |
Joined: 14 Jun 05 Posts: 200 Credit: 68,273 RAC: 0 |
[quote]Hello? Is there anybody out there?[/quote] Nope. LOL |
Joined: 15 Jun 05 Posts: 399 Credit: 16,571,350 RAC: 0 |
For those who are wondering about the status. Eric is now in a sort of "maniac" mode, optimizing the cruncher for several architectures. But he doesn't seem to be so much of a maniac as to use the GPU for calculation. He is writing hand-assembly code for SIMD. I'm not sure which is faster, his code or ICC's code in each function. Either way, new cruncher chooses faster routines, so the effort will be rewarded. As for Intel processors, I asked him to add a flag for new "CORE" processors with IPP. If you use cvs, you can see what's going on. Luckiest in the world. WMD = Weapon of Mass Distraction |
Joined: 16 Jun 05 Posts: 22 Credit: 12,583 RAC: 0 |
Eric, please don't forget Win9x; I can't crunch if the 'annoying little problem' is not fixed. |
Joined: 16 Jun 05 Posts: 2531 Credit: 1,074,556 RAC: 0 |
Thanks for the update, Tetsuji. Mike With each crime and every kindness we birth our future. |
Joined: 19 Jun 05 Posts: 42 Credit: 9,057 RAC: 0 |
[quote]I'm not sure which is faster, his code or ICC's code in each function. Either way, new cruncher chooses faster routines, so the effort will be rewarded. As for Intel processors, I asked him to add a flag for new "CORE" processors with IPP. If you use cvs, you can see what's going on.[/quote] Thanks for the update, Tetsuji. I have 3 Conroes running 24/7 at home and would like to give a "Core 2 Duo/SSE4" app a shot. Is there a link to download a compiled version for Windows? Thanks... |
Joined: 11 Sep 05 Posts: 51 Credit: 27,831 RAC: 0 |
[quote]For those who are wondering about the status.[/quote] I'm not sure if "maniacing" into asm is the right way to go, but hell, if he likes it, why not. Personally I'd go for intrinsics, which by the way will be more efficient, and even more so in combination with icc etc. Any update on when a final source or the "nightly tarballs" will show up again? |
Joined: 14 Aug 06 Posts: 22 Credit: 190,000 RAC: 0 |
[quote]I'm not sure which is faster, his code or ICC's code in each function. Either way, new cruncher chooses faster routines, so the effort will be rewarded. As for Intel processors, I asked him to add a flag for new "CORE" processors with IPP. If you use cvs, you can see what's going on.[/quote] Thanks Tetsuji :o) Honza, I've compiled several SSE4-optimized applications and tested them vs. SSE3 ones on Core 2 and Woodcrest (Core 2 Xeon) systems - they were identical in size and speed. SSE4 adds mostly integer SIMD operations, which do not make up a lot of the total processing time. Regards, Simon. |
Joined: 19 Jun 05 Posts: 42 Credit: 9,057 RAC: 0 |
[quote]Honza, I've compiled several SSE4-optimized applications and tested them vs. SSE3 ones on Core 2 and Woodcrest (Core 2 Xeon) systems - they were identical in size and speed. SSE4 adds mostly integer SIMD operations, which do not make up a lot of the total processing time.[/quote] Thanks for the answer. I thought SSE4 would be of no big benefit and that Core 2 would need low-level optimization à la akosf in order to benefit from the new architecture (instruction fusion etc.). |
Joined: 15 Mar 05 Posts: 1547 Credit: 27,183,456 RAC: 0 |
Hi all, Sorry again about the lack of news. I've been running around with my hair on fire as usual. The reason I created the assembly SSE versions of the power spectrum routine was that I was totally fried on everything else I was doing (primarily the multibeam splitter and the pointing correction code it requires) and I needed a diversion. If I do the same thing for too long, my brain freezes in that position. An entirely different problem can unstick it. Looking at the timings, it wasn't much of an improvement. I'm not surprised because the power spectrum calculation timing is dominated by memory access speeds. Even adding prefetch instructions didn't help. At some point I hope to get the Intel compiler and GCC 4, so things can be autovectorized for various processors. If anyone wants to write an SSE/SSE2/SSE3/SSSE3 matrix transpose and a chirp function, feel free (and please send it to me). Shouldn't be too difficult to base one on Alex Kan's chirp function. Also on the agenda is getting a function timer/validator for gaussian fitting and pulse finding so optimized versions can be used. If anyone wants to do these things, I'll gladly accept the help. Or if someone wants to develop the equivalent functions in 3DNow!, or VIS, or OpenGL shader language for that matter, it would be fine with me. Now that the code for timing and validating functions is in there, it should be fairly easy. I'm hoping recent BOINC changes have fixed some of the Windows 98 problems, but until I get a new version out, I'm uncertain. If anyone else wants to do a compile and give it out to some Win 98 testers, again, feel free. I wasn't aware that tarballs had stopped being generated. I'll try to fix it today. This week I have a couple of proposals to do, and I still need to verify and test the splitter mods. We have a couple of 500GB disks full of multi-beam data coming back from Arecibo tomorrow. So I probably won't get a release out this week.
[edit]BTW, welcome back Crunch3r.[/edit] Eric |
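
As an illustration of the "time and validate, then use the faster routine" idea described in the post above, here is a minimal sketch in C. It is not the SETI@home code: the names `power_fn`, `power_ref`, `power_unrolled`, `power_broken`, `validate`, `time_fn` and `choose_fastest`, the buffer size, and the tolerance are all assumptions made for the example. Each candidate is first checked against a reference routine, and only candidates that pass are timed.

```c
#include <math.h>
#include <stddef.h>
#include <time.h>

#define N_SAMPLES 4096  /* hypothetical maximum block size for this sketch */

/* Signature shared by all candidates: interleaved re/im in, power out. */
typedef void (*power_fn)(const float *iq, float *power, size_t n);

/* Reference routine: plain loop over interleaved (re, im) pairs. */
static void power_ref(const float *iq, float *power, size_t n) {
    for (size_t i = 0; i < n; i++) {
        float re = iq[2 * i], im = iq[2 * i + 1];
        power[i] = re * re + im * im;
    }
}

/* Candidate: same arithmetic, unrolled by two. */
static void power_unrolled(const float *iq, float *power, size_t n) {
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        power[i]     = iq[2 * i]     * iq[2 * i]     + iq[2 * i + 1] * iq[2 * i + 1];
        power[i + 1] = iq[2 * i + 2] * iq[2 * i + 2] + iq[2 * i + 3] * iq[2 * i + 3];
    }
    for (; i < n; i++)
        power[i] = iq[2 * i] * iq[2 * i] + iq[2 * i + 1] * iq[2 * i + 1];
}

/* Deliberately wrong candidate (drops the imaginary part); must be rejected. */
static void power_broken(const float *iq, float *power, size_t n) {
    for (size_t i = 0; i < n; i++)
        power[i] = iq[2 * i] * iq[2 * i];
}

/* Returns 1 if every output is within rel_tol (relative) of the reference. */
static int validate(power_fn cand, const float *iq, size_t n, float rel_tol) {
    static float want[N_SAMPLES], got[N_SAMPLES];
    if (n > N_SAMPLES) return 0;
    power_ref(iq, want, n);
    cand(iq, got, n);
    for (size_t i = 0; i < n; i++) {
        float scale = fabsf(want[i]) > 1e-30f ? fabsf(want[i]) : 1e-30f;
        if (fabsf(got[i] - want[i]) / scale > rel_tol) return 0;
    }
    return 1;
}

/* CPU time in seconds for reps calls; clock() stands in for a tick counter. */
static double time_fn(power_fn f, const float *iq, size_t n, int reps) {
    static float out[N_SAMPLES];
    static volatile float sink;  /* keeps the compiler from dropping the work */
    if (n == 0 || n > N_SAMPLES) return 0.0;
    clock_t start = clock();
    for (int r = 0; r < reps; r++) {
        f(iq, out, n);
        sink = out[n - 1];
    }
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Returns the fastest candidate that validates, or NULL if none does. */
static power_fn choose_fastest(const power_fn *cands, size_t ncand,
                               const float *iq, size_t n) {
    power_fn best = NULL;
    double best_t = 0.0;
    for (size_t c = 0; c < ncand; c++) {
        if (!validate(cands[c], iq, n, 1e-5f)) continue;
        double t = time_fn(cands[c], iq, n, 200);
        if (best == NULL || t < best_t) { best = cands[c]; best_t = t; }
    }
    return best;
}
```

Calling `choose_fastest` with an array such as `{ power_broken, power_unrolled, power_ref }` returns one of the two correct routines; which one wins depends on the machine, so no particular winner should be expected.
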
Joined: 12 Sep 06 Posts: 9 Credit: 0 RAC: 0 |
[quote]Looking at the timings, it wasn't much of an improvement. I'm not surprised because the power spectrum calculation timing is dominated by memory access speeds. Even adding prefetch instructions didn't help.[/quote] Converting FPU code to vectorized SSE can make it about twice as fast (depending on FSB, memory speed, etc). It's not CPU bound, generally. [quote]If anyone wants to write an SSE/SSE2/SSE3/SSSE3 matrix transpose and a chirp function, feel free (and please send it to me). Shouldn't be too difficult to base one on Alex Kan's chirp function.[/quote] Posting them over at Simon's KWSN site. Found a way not to need a separate buffer or a separate transpose function...same bin reordering. [quote]Also on the agenda is getting a function timer/validator for gaussian fitting and pulse finding so optimized versions can be used. If anyone wants to do these things, I'll gladly accept the help.[/quote] Have a new benchmark/validator source file. Uses CPU timer ticks. Currently works with find_pulse, getPeakPower, getChiSq, chirpData, f_sum_table, sumTables2 (subset of find_pulse). Detects CPU abilities (currently x86), only tests what can run. Easy to add additional functions to test. Josef Segur found a header file that has timer-tick reading versions for perhaps 10 different CPU types (PowerPC, SPARC, etc.) and compiler variations for GCC, IPP, and MSVCC versions...so CPU ticks should be fine. Regarding the current verify loop, accuracy += pow(diff, 2) will square (orig[i]-test[i]), but if both orig and test are small values (1e-7 for example) then the differences can be radically wrong and still not add much to the accuracy total. Suggest something like accuracy += abs(1-orig[i]/test[i]) |
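
The point about the verify loop can be made concrete with a small sketch in C. The function names `squared_error` and `relative_error` are hypothetical, not taken from the benchmark source; the first follows the `pow(diff, 2)` accumulation described in the post, the second follows the suggested `1 - orig[i]/test[i]` form. Two details are assumptions of this sketch: `fabs` is used because C's `abs` takes an `int`, and a zero `test[i]` is counted as a full (1.0) error unless `orig[i]` is zero too.

```c
#include <math.h>
#include <stddef.h>

/* Sum of squared absolute differences, as in the current verify loop. */
static double squared_error(const float *orig, const float *test, size_t n) {
    double acc = 0.0;
    for (size_t i = 0; i < n; i++) {
        double d = (double)orig[i] - (double)test[i];
        acc += d * d;
    }
    return acc;
}

/* Sum of relative deviations |1 - orig/test|, following the suggestion. */
static double relative_error(const float *orig, const float *test, size_t n) {
    double acc = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (test[i] == 0.0f) {
            /* Assumption of this sketch: avoid dividing by zero. */
            acc += (orig[i] == 0.0f) ? 0.0 : 1.0;
            continue;
        }
        acc += fabs(1.0 - (double)orig[i] / (double)test[i]);
    }
    return acc;
}
```

With `orig = {1e-7f, 2e-7f}` and `test = {2e-7f, 4e-7f}`, every test value is wrong by a factor of two, yet the squared-difference total is only about 5e-14, while the relative total is 1.0 (0.5 per element).
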
Joined: 15 Jun 05 Posts: 399 Credit: 16,571,350 RAC: 0 |
Hi all, PowerSpectrumCalculation can be drastically improved if the input arrays are in separate real/imag parts. As you notice, no shuffle instructions are required, so only 3 instructions calculate 4 spectra (mulps mulps addps), and it's amazingly fast. But as you see, the output of an fft is in real/imag/real/imag... format (or, for some, both input and output are in separate real/imag arrays), and the rearrangement takes time. I tried it, and overall performance was a bit worse... :( So I was looking for an fft function whose input is in real/imag/real/imag... format and whose output is in separate real/imag arrays, but I cannot find one. With such an fft, PowerSpectrum could be calculated very quickly. I think I know much of sse3, but unfortunately so far I cannot produce a faster PowerSpectrum with inline assembly than icc does. Ironically, icc is the best. I made an assembly version of v_GetPowerSpectrum() in the era of yaoscw-8.1 for Linux, but icc was faster. That's why I stick to the original v_GetPowerSpectrum with icc. I also tried this with hand-assembly, but icc beats me although it produces longer code, because it often makes functions inline (while it doesn't inline functions that contain inline assembly). ipp also provides GetPowerSpectrum, but icc produces a faster one. So I personally conclude that under most circumstances ICC is the fastest for Intel, ironically. The best way is inlining the function :) (yes, it works with v_GetPowerSpectrum) And I also tried to approximate sine/cosine with polynomials (up to 6th factor) http://setiathome.berkeley.edu/forum_thread.php?id=19865&nowrap=true#171295 but icc's math library beats it in speed and also in accuracy (as a matter of course!). But glibc's math library cannot beat it. I think I referred to http://www.weblearn.hs-bremen.de/risse/RST/docs/Parallel/03-041.pdf. but I cannot afford the time to do it again. regards -Tetsuji PS: Now I understand why you moved to devcpp on Window$. Luckiest in the world.
WMD = Weapon of Mass Distraction |
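
The "mulps mulps addps" observation in the post above can be sketched with SSE intrinsics in C. This is only an illustration under stated assumptions, not `v_GetPowerSpectrum` itself: the names `power_split` and `deinterleave` are hypothetical, unaligned loads are used for simplicity, and a scalar loop handles the tail (and the whole array on machines without SSE). The separate `deinterleave` step is the rearrangement the post reports as costing more than the split layout saves.

```c
#include <stddef.h>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define HAVE_SSE 1
#else
#define HAVE_SSE 0
#endif

/* Power spectrum from split arrays: power[i] = re[i]^2 + im[i]^2.
 * With split input, four results need two multiplies and one add,
 * and no shuffle instructions. */
static void power_split(const float *re, const float *im, float *power, size_t n) {
    size_t i = 0;
#if HAVE_SSE
    for (; i + 4 <= n; i += 4) {
        __m128 r = _mm_loadu_ps(re + i);            /* 4 real parts      */
        __m128 m = _mm_loadu_ps(im + i);            /* 4 imaginary parts */
        __m128 p = _mm_add_ps(_mm_mul_ps(r, r),     /* mulps             */
                              _mm_mul_ps(m, m));    /* mulps, then addps */
        _mm_storeu_ps(power + i, p);
    }
#endif
    for (; i < n; i++)                              /* scalar tail / fallback */
        power[i] = re[i] * re[i] + im[i] * im[i];
}

/* Rearranges interleaved FFT output (re, im, re, im, ...) into split arrays.
 * This extra pass over memory is the cost mentioned in the post. */
static void deinterleave(const float *iq, float *re, float *im, size_t n) {
    for (size_t i = 0; i < n; i++) {
        re[i] = iq[2 * i];
        im[i] = iq[2 * i + 1];
    }
}
```

For example, the interleaved input `{1,0, 2,1, 3,2, 4,3, 5,4}` deinterleaves to `re = {1,2,3,4,5}` and `im = {0,1,2,3,4}`, and `power_split` then yields `{1, 5, 13, 25, 41}`.
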
Joined: 12 Sep 06 Posts: 9 Credit: 0 RAC: 0 |
[quote]And I also tried to approximate sine/cosine with polynomials (up to 6th factor) http://setiathome.berkeley.edu/forum_thread.php?id=19865&nowrap=true#171295 but icc's math library beats it in speed and also in accuracy (as a matter of course!). But glibc's math library cannot beat it. I think I referred to http://www.weblearn.hs-bremen.de/risse/RST/docs/Parallel/03-041.pdf[/quote] Alex Kan's sse3 vectorized sin/cos approximation is the fastest chirp I've seen yet. Faster than the 32MB table. And accurate too... Funcname: cpu ticks / fast / accuracy; orig_ChirpData: 286898730 / x1.00 / 0; TrigArray: 60655122 / x4.73 / 1.4e+006; unrolled: 58550761 / x4.90 / 1.4e+006; aks_sse3_chirp: 41761096 / x6.87 / 8.9e-009 |
Joined: 14 Aug 06 Posts: 22 Credit: 190,000 RAC: 0 |
Hi, Eric and Tetsuji, you are of course both welcome to join in at http://lunatics.at. I had not specifically offered this to you only because I assumed your timetable is too swamped as it is (not wholly wrong there, from your posts). Kind regards, Simon. |
Joined: 3 Mar 06 Posts: 261 Credit: 223,125 RAC: 0 |
[quote]I wasn't aware that tarballs had stopped being generated. I'll try to fix it today.[/quote] What is a tarball? Mars 2019 Petition <-- Sign, please. |
Joined: 8 Sep 05 Posts: 82 Credit: 545,522 RAC: 0 |
I believe it's a Tape ARchive file (Unix); see http://en.wikipedia.org/wiki/Tar.gz |
©2025 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.