V8 online :) running SSSE3

Francois Piednoel
Joined: 14 Jun 00
Posts: 898
Credit: 5,969,361
RAC: 0
United States
Message 524770 - Posted: 1 Mar 2007, 1:04:04 UTC

So, I'll start with the regular SSSE3-optimized code you already have for a week, then I'll push mine and see :)
Vectorization is accomplished :)

We'll see!

Time to warm up the transistors!

who?

ID: 524770
zombie67 [MM]
Volunteer tester
Joined: 22 Apr 04
Posts: 758
Credit: 27,771,894
RAC: 0
United States
Message 524771 - Posted: 1 Mar 2007, 1:06:01 UTC

w00t!
Dublin, California
Team: SETI.USA
ID: 524771
KWSN - Chicken of Angnor
Volunteer developer
Volunteer tester
Joined: 9 Jul 99
Posts: 1199
Credit: 6,615,780
RAC: 0
Austria
Message 524831 - Posted: 1 Mar 2007, 3:14:29 UTC

Salut!

Which regular SSSE3 are you talking about, the recently released 2.2 apps?

Looking forward to seeing your results,
Simon.
Donate to SETI@Home via PayPal!

Optimized SETI@Home apps + Information
ID: 524831
Alex Kan
Volunteer developer

Joined: 4 Dec 03
Posts: 127
Credit: 29,269
RAC: 0
United States
Message 524888 - Posted: 1 Mar 2007, 6:27:26 UTC - in response to Message 524770.  

> So, I'll start with the regular SSSE3-optimized code you already have for a week, then I'll push mine and see :)
> Vectorization is accomplished :)

Why wait a week, unless the new code isn't ready? If the new code is faster, there's no reason to continue running the 2.2 apps once we can tell what kind of times it turns in on your system at different ARs. Do you have any preliminary estimates as to how your code compares to 2.2?

Also, you've mentioned "Merom alignment loads" in the past, so I'm curious how useful you found them for pulse-finding, considering that shifts have to be specified at compile-time. (Give me lvsl and vperm over PALIGNR any day. :P) For that matter, which SSSE3 instructions did you wind up finding useful?
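
To make the compile-time constraint concrete for other readers: PALIGNR takes its byte shift as an immediate, so a run-time offset needs one code path per possible shift, whereas AltiVec computes the permute pattern at run time. A minimal sketch of the difference (my own illustration, not code from the 2.2 apps or from who?'s build):

```c
/* PALIGNR (SSSE3, tmmintrin.h) takes its byte shift as an immediate,
 * so each offset is a separate instruction; a run-time offset forces
 * a branch over every possible immediate. Illustrative sketch only. */
#include <tmmintrin.h>

/* Fixed shift: fine, the offset is known at compile time. */
__m128i shift4(__m128i hi, __m128i lo)
{
    return _mm_alignr_epi8(hi, lo, 4);
}

/* Run-time shift: must enumerate the immediates. */
__m128i shift_n(__m128i hi, __m128i lo, int bytes)
{
    switch (bytes) {
    case 0:  return lo;
    case 4:  return _mm_alignr_epi8(hi, lo, 4);
    case 8:  return _mm_alignr_epi8(hi, lo, 8);
    default: return _mm_alignr_epi8(hi, lo, 12);
    }
}

/* AltiVec contrast (PowerPC, altivec.h), as comments only:
 *   vector unsigned char pat = vec_lvsl(0, p);  // pattern built at run time
 *   vector float v = vec_perm(lo, hi, pat);     // one vperm, any offset
 */
```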
ID: 524888
kittyman
Volunteer tester
Joined: 9 Jul 00
Posts: 51562
Credit: 1,018,363,574
RAC: 1,004
United States
Message 524906 - Posted: 1 Mar 2007, 8:38:30 UTC

Who?!! So nice to see you here again after such a long time away. If you would like to send me your code, I could do a quick test run on my X6800 or QX6700 using the benchmark application from Simon, comparing it against the latest 2.2b release on the test WUs we have been using to compare various optimization strategies. It would give us a very quick idea of what might be working better at the moment.
I am very curious what you may have come up with.
Simon can put you in touch with me, or let me know how I may contact you.
Once again, welcome back.
"Time is simply the mechanism that keeps everything from happening all at once."

ID: 524906
Pepo
Volunteer tester
Joined: 5 Aug 99
Posts: 308
Credit: 418,019
RAC: 0
Slovakia
Message 524951 - Posted: 1 Mar 2007, 14:27:51 UTC
Last modified: 1 Mar 2007, 14:29:04 UTC

Nice - a 4x speedup compared to an equally clocked 3 GHz P4 running the stock app.

But it would be better to see it paired with a Core 2 that is also running 2.2B during this warm-up phase.

Peter
ID: 524951
Francois Piednoel
Joined: 14 Jun 00
Posts: 898
Credit: 5,969,361
RAC: 0
United States
Message 524969 - Posted: 1 Mar 2007, 15:51:47 UTC - in response to Message 524906.  
Last modified: 1 Mar 2007, 16:28:12 UTC

> Who?!! So nice to see you here again after such a long time away. If you would like to send me your code, I could do a quick test run on my X6800 or QX6700 using the benchmark application from Simon, comparing it against the latest 2.2b release on the test WUs we have been using to compare various optimization strategies. It would give us a very quick idea of what might be working better at the moment.
> I am very curious what you may have come up with.
> Simon can put you in touch with me, or let me know how I may contact you.
> Once again, welcome back.


I'll keep my code in hand for some time, to tickle AMD ;-) hehehehe

Let's see if they have some good coders too; their 40% claim will collapse. They try to pretend they have a leap as good as Conroe, but from what I know that is not the case at all. I think they will still be more than 20% behind on SETI and 30% on Rosetta. If they had anything fast, they would be showing it by now. They will point to SPEC FP rate, and that's it; on the rest, bye bye...

[edited] I forgot to add: optimizing SETI is all about data locality in the L1, and all about SIMD. You thought Linpack was fast... Demo in a week...
I wait one week because I want to be able to claim the exact improvement, so I want my RAC to be stable before I start running it.
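
To make the L1-locality point concrete, a minimal sketch of the idea: crunch the data in blocks small enough that repeated passes stay cache-resident. The 8 KB block size and the power-sum pass are illustrative assumptions, not SETI@home code:

```c
/* Illustrative sketch: block the data to L1 size and use SSE over
 * each block. Not SETI@home code; the block size and the power-sum
 * pass are assumptions. Inputs assumed 16-byte aligned. */
#include <xmmintrin.h>
#include <stddef.h>

#define L1_BLOCK (8 * 1024 / sizeof(float))   /* ~8 KB of floats */

float blocked_power_sum(const float *re, const float *im, size_t n)
{
    __m128 acc = _mm_setzero_ps();
    for (size_t base = 0; base < n; base += L1_BLOCK) {
        size_t end = base + L1_BLOCK < n ? base + L1_BLOCK : n;
        /* This block now fits in L1, so any further passes over
         * [base, end) hit cache instead of main memory. */
        for (size_t i = base; i + 4 <= end; i += 4) {  /* tail omitted */
            __m128 r = _mm_load_ps(re + i);
            __m128 j = _mm_load_ps(im + i);
            acc = _mm_add_ps(acc,
                  _mm_add_ps(_mm_mul_ps(r, r), _mm_mul_ps(j, j)));
        }
    }
    float out[4];
    _mm_storeu_ps(out, acc);
    return out[0] + out[1] + out[2] + out[3];
}
```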

Who?
PS: This is my personal opinion; my employer is not responsible for it.

ID: 524969
kittyman
Volunteer tester
Joined: 9 Jul 00
Posts: 51562
Credit: 1,018,363,574
RAC: 1,004
United States
Message 524982 - Posted: 1 Mar 2007, 16:25:36 UTC - in response to Message 524969.  

> > Who?!! So nice to see you here again after such a long time away. If you would like to send me your code, I could do a quick test run on my X6800 or QX6700 using the benchmark application from Simon, comparing it against the latest 2.2b release on the test WUs we have been using to compare various optimization strategies. It would give us a very quick idea of what might be working better at the moment.
> > I am very curious what you may have come up with.
> > Simon can put you in touch with me, or let me know how I may contact you.
> > Once again, welcome back.
>
> I'll keep my code in hand for some time, to tickle AMD ;-) hehehehe
>
> Let's see if they have some good coders too; their 40% claim will collapse. They try to pretend they have a leap as good as Conroe, but from what I know that is not the case at all. I think they will still be more than 20% behind on SETI and 30% on Rosetta. If they had anything fast, they would be showing it by now. They will point to SPEC FP rate, and that's it; on the rest, bye bye...
>
> Who?
> PS: This is my personal opinion; my employer is not responsible for it.



No problem, Who?. Mebbe you could download the test package from Simon and run it on your own machine if you are not ready to release the code yet. You could then get a direct comparison between your code and the 2.2b Chicken app. I think you would have to get Simon to register you as a pre-release tester on his site to get the most current test package. Let me know the results if you do. You can post the test results without giving away any of your code, as that would reside only on your machine.
Happy crunching!
"Time is simply the mechanism that keeps everything from happening all at once."

ID: 524982
Alex Kan
Volunteer developer

Joined: 4 Dec 03
Posts: 127
Credit: 29,269
RAC: 0
United States
Message 525033 - Posted: 1 Mar 2007, 18:22:52 UTC - in response to Message 524969.  
Last modified: 1 Mar 2007, 18:23:44 UTC

> I wait one week because I want to be able to claim the exact improvement, so I want my RAC to be stable before I start running it.

High-performance machines like the ones you've attached to the project probably won't stabilize at their final RACs within one week. The scheduled (and unscheduled) outages and variance in AR tend to make even established RACs fluctuate anyway.

If you want to talk about exact improvement, you're better off making a table of WU times for different claimed credits, before and after your modifications.
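
For anyone wondering why one week is optimistic: as far as I know, BOINC computes RAC as an exponentially weighted average with a one-week half-life, so even a perfectly constant daily credit only reaches about 94% of its final value after four weeks. A small sketch of that arithmetic (the decay model is my reading of BOINC; the credit figure is hypothetical):

```c
/* Sketch of RAC convergence, assuming BOINC's exponential average
 * with a one-week half-life. The 2000/day credit is hypothetical. */
#include <math.h>
#include <stdio.h>

int main(void)
{
    const double half_life_days = 7.0;
    const double decay = exp(-log(2.0) / half_life_days); /* per day */
    double rac = 0.0, daily_credit = 2000.0;

    for (int day = 1; day <= 28; day++) {
        rac = rac * decay + daily_credit * (1.0 - decay);
        if (day % 7 == 0)
            printf("day %2d: RAC = %4.0f of %4.0f\n",
                   day, rac, daily_credit);
    }
    return 0;   /* prints ~1000, ~1500, ~1750, ~1875 */
}
```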
ID: 525033
kittyman
Volunteer tester
Joined: 9 Jul 00
Posts: 51562
Credit: 1,018,363,574
RAC: 1,004
United States
Message 525045 - Posted: 1 Mar 2007, 19:02:38 UTC - in response to Message 525033.  
Last modified: 1 Mar 2007, 19:02:59 UTC

> > I wait one week because I want to be able to claim the exact improvement, so I want my RAC to be stable before I start running it.
>
> High-performance machines like the ones you've attached to the project probably won't stabilize at their final RACs within one week. The scheduled (and unscheduled) outages and variance in AR tend to make even established RACs fluctuate anyway.
>
> If you want to talk about exact improvement, you're better off making a table of WU times for different claimed credits, before and after your modifications.


That's what Simon's test routine would do: allow testing of several different apps on the same test WUs, giving an apples-to-apples comparison of how the different apps perform on the exact same work.

"Time is simply the mechanism that keeps everything from happening all at once."

ID: 525045
Francois Piednoel
Joined: 14 Jun 00
Posts: 898
Credit: 5,969,361
RAC: 0
United States
Message 525059 - Posted: 1 Mar 2007, 19:42:33 UTC - in response to Message 525045.  

> > > I wait one week because I want to be able to claim the exact improvement, so I want my RAC to be stable before I start running it.
> >
> > High-performance machines like the ones you've attached to the project probably won't stabilize at their final RACs within one week. The scheduled (and unscheduled) outages and variance in AR tend to make even established RACs fluctuate anyway.
> >
> > If you want to talk about exact improvement, you're better off making a table of WU times for different claimed credits, before and after your modifications.
>
> That's what Simon's test routine would do: allow testing of several different apps on the same test WUs, giving an apples-to-apples comparison of how the different apps perform on the exact same work.


I used this test application, but I found that very often the app tells you that your application works when in fact it does not.
It tests speed, not functionality.

I'll wait for my RAC to get more stable; I believe only in RAC.

who?
ID: 525059
KWSN - Chicken of Angnor
Volunteer developer
Volunteer tester
Joined: 9 Jul 99
Posts: 1199
Credit: 6,615,780
RAC: 0
Austria
Message 525072 - Posted: 1 Mar 2007, 20:11:06 UTC
Last modified: 1 Mar 2007, 20:15:09 UTC

Francois,

what exactly are you talking about?

The released Auto-Installer really is only a very basic benchmark tool. It exists to make people's lives easier, not for the most accurate and reproducible results.

However, it will tell you with 100% accuracy whether an app works on your system or not; it does this by running each app and checking whether it runs at all, then whether it also produced a valid result. Yes, speed is tested, but result validity is, too.
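
In concept, that validity check is just a comparison of the app's output against a known-good reference within a tolerance. A rough sketch of the idea (the file names and one-value-per-line format are assumptions for illustration, not the installer's actual mechanism):

```c
/* Sketch of a result-validity check: compare each value in the
 * app's output against a reference within a tolerance. Format and
 * names are illustrative assumptions only. */
#include <math.h>
#include <stdio.h>

int result_matches(const char *got_path, const char *ref_path, double tol)
{
    FILE *got = fopen(got_path, "r"), *ref = fopen(ref_path, "r");
    double a, b;
    int ok = (got && ref);

    while (ok && fscanf(got, "%lf", &a) == 1) {
        if (fscanf(ref, "%lf", &b) != 1 || fabs(a - b) > tol)
            ok = 0;  /* missing or out-of-tolerance value */
    }
    /* A fuller check would also confirm both files end together. */
    if (got) fclose(got);
    if (ref) fclose(ref);
    return ok;
}
```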

So, if you're talking about that, I'd really suggest you use this benchmark package instead. It's made for real benchmarking. There are instructions enclosed.

I believe I've sent you links and suggestions for benchmarking before. Please let me ask you to either present some tangible results or even just a log file of what you describe - again, as asked of you before.

When you use the wrong tools, is it the tools' fault? When you do not know that you're using the wrong tools, is it then?

Honestly, I'm a bit surprised.

Also, you must realize that RAC is inherently unstable, and a pretty unreliable performance measurement. You may only believe in it, but that doesn't make it any more valid from either a performance or a scientific viewpoint.

Regards,
Simon.
Donate to SETI@Home via PayPal!

Optimized SETI@Home apps + Information
ID: 525072
ML1
Volunteer moderator
Volunteer tester

Joined: 25 Nov 01
Posts: 21790
Credit: 7,508,002
RAC: 20
United Kingdom
Message 525178 - Posted: 2 Mar 2007, 0:38:44 UTC - in response to Message 525072.  
Last modified: 2 Mar 2007, 0:39:11 UTC

> ...Also, you must realize that RAC is inherently unstable, and a pretty unreliable performance measurement. You may only believe in it, but that doesn't make it any more valid from either a performance or a scientific viewpoint.

Interesting emphasis there from "Who?".

Is this due to the primary emphasis being on whatever score is gained rather than chasing good science (and good programming)?

Who?

Regards,
Martin

See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
ID: 525178
Francois Piednoel
Joined: 14 Jun 00
Posts: 898
Credit: 5,969,361
RAC: 0
United States
Message 525184 - Posted: 2 Mar 2007, 0:49:13 UTC - in response to Message 525178.  

> > ...Also, you must realize that RAC is inherently unstable, and a pretty unreliable performance measurement. You may only believe in it, but that doesn't make it any more valid from either a performance or a scientific viewpoint.
>
> Interesting emphasis there from "Who?".
>
> Is this due to the primary emphasis being on whatever score is gained rather than chasing good science (and good programming)?
>
> Who?
>
> Regards,
> Martin


Wow! What was that???
When it is about good science and good testing, I am your man!
So, let me explain my statement a little:
SETI workloads vary quite a lot, and testing one set does not mean you have a good benchmark of the whole range of work. I am more for realistic tests.
I don't believe in synthetic tests; I like real-world stuff...
Memory tests and all that kind of junk I leave to the people who don't understand them...

who?
ID: 525184
Francois Piednoel
Joined: 14 Jun 00
Posts: 898
Credit: 5,969,361
RAC: 0
United States
Message 525187 - Posted: 2 Mar 2007, 0:57:06 UTC - in response to Message 525072.  

> Francois,
>
> what exactly are you talking about?
>
> The released Auto-Installer really is only a very basic benchmark tool. It exists to make people's lives easier, not for the most accurate and reproducible results.
>
> However, it will tell you with 100% accuracy whether an app works on your system or not; it does this by running each app and checking whether it runs at all, then whether it also produced a valid result. Yes, speed is tested, but result validity is, too.
>
> So, if you're talking about that, I'd really suggest you use this benchmark package instead. It's made for real benchmarking. There are instructions enclosed.
>
> I believe I've sent you links and suggestions for benchmarking before. Please let me ask you to either present some tangible results or even just a log file of what you describe - again, as asked of you before.
>
> When you use the wrong tools, is it the tools' fault? When you do not know that you're using the wrong tools, is it then?
>
> Honestly, I'm a bit surprised.
>
> Also, you must realize that RAC is inherently unstable, and a pretty unreliable performance measurement. You may only believe in it, but that doesn't make it any more valid from either a performance or a scientific viewpoint.
>
> Regards,
> Simon.


The previous version had some severe problems; I'll try this one and let you know.
But generally, nothing is better than a real workload. Synthetic benchmarks are easy to manipulate: look at the Sandra memory test... it is abused again and again by AMD, and as soon as CSI shows up, AMD will tell you it is a bad benchmark... hehehhe... as they said recently about game FPS measurements. In the case of FPS, I still believe it is a bad metric, but people use it.

The VTune profile of the SETI workloads is wild, depending on the unit and the month; what I develop will be immune to those variations, and will be fast.
I used more than 250 workloads to train the compiler and the baby HMM that does my tricks.

who?

ID: 525187
KWSN - Chicken of Angnor
Volunteer developer
Volunteer tester
Joined: 9 Jul 99
Posts: 1199
Credit: 6,615,780
RAC: 0
Austria
Message 525193 - Posted: 2 Mar 2007, 1:21:26 UTC
Last modified: 2 Mar 2007, 1:30:08 UTC

Okay,

let's try once more :o)

I'm not interested in your beliefs, Francois. I'm interested in your facts, methods and resulting data. Please share them.

As for using real-world data - included in the benchmark package I linked for you are several real-world WUs of varying angle ranges. Granted, there are only 8 included, but nobody's keeping you from expanding the WU cache.

So far, we've concentrated on source-level optimizations rather than profiling massive amounts of work processed. This has resulted in various new ways to process data vs. just trying to make old ways quicker.

Given that the 2.2 (and the 2.0) apps process data based on system-specific benchmarks, profiling them is no simple task. A different machine may not even exhibit the "bottleneck" your profile run may have identified because it simply runs different portions of the same code, uses a different amount/mix of resources, ...

An example of this would be the various new chirp functions, as well as pulse folding.
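
For readers unfamiliar with that design: the apps time candidate routines on the host itself and then dispatch through the winner, so the hot path genuinely differs from machine to machine. A rough sketch of the pattern follows; the function names and timing harness are illustrative, not the actual 2.2 source:

```c
/* Sketch of benchmark-driven dispatch: time each candidate once,
 * crunch with the fastest. Names are illustrative, not 2.2 source.
 * A real harness would repeat runs and use finer-grained timers. */
#include <time.h>

typedef void (*chirp_fn)(float *data, long n, double chirp_rate);

void chirp_scalar(float *d, long n, double r) { /* real work elided */ }
void chirp_sse(float *d, long n, double r)    { /* real work elided */ }
void chirp_ssse3(float *d, long n, double r)  { /* real work elided */ }

chirp_fn pick_fastest(float *scratch, long n)
{
    chirp_fn candidates[] = { chirp_scalar, chirp_sse, chirp_ssse3 };
    chirp_fn best = candidates[0];
    double best_time = 1e30;

    for (int i = 0; i < 3; i++) {
        clock_t t0 = clock();
        candidates[i](scratch, n, 0.1);
        double t = (double)(clock() - t0) / CLOCKS_PER_SEC;
        if (t < best_time) { best_time = t; best = candidates[i]; }
    }
    return best;  /* so the profiled "bottleneck" differs per host */
}
```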

Also, none of the people working on S@H optimization have found any SSSE3 instructions useful with regard to crunching speed. We've asked you repeatedly how you utilized it in your code; we're still interested, but no more informed (by you) than ever.

My main problem with this is that you seem quick to either belittle or misrepresent other people's efforts while only hinting at the grandness of your own and not supplying any tangible evidence.

One last time: back it up with numbers please, not with belief. Use reproducible methods, tell people about them. Let people form their own opinion.

Until then, I'm out.
Simon.
Donate to SETI@Home via PayPal!

Optimized SETI@Home apps + Information
ID: 525193
Francois Piednoel
Joined: 14 Jun 00
Posts: 898
Credit: 5,969,361
RAC: 0
United States
Message 525216 - Posted: 2 Mar 2007, 2:35:36 UTC - in response to Message 525193.  
Last modified: 2 Mar 2007, 2:39:21 UTC

> Okay,
>
> let's try once more :o)
>
> I'm not interested in your beliefs, Francois. I'm interested in your facts, methods and resulting data. Please share them.
>
> As for using real-world data - included in the benchmark package I linked for you are several real-world WUs of varying angle ranges. Granted, there are only 8 included, but nobody's keeping you from expanding the WU cache.
>
> So far, we've concentrated on source-level optimizations rather than profiling massive amounts of work processed. This has resulted in various new ways to process data vs. just trying to make old ways quicker.
>
> Given that the 2.2 (and the 2.0) apps process data based on system-specific benchmarks, profiling them is no simple task. A different machine may not even exhibit the "bottleneck" your profile run may have identified because it simply runs different portions of the same code, uses a different amount/mix of resources, ...
>
> An example of this would be the various new chirp functions, as well as pulse folding.
>
> Also, none of the people working on S@H optimization have found any SSSE3 instructions useful with regard to crunching speed. We've asked you repeatedly how you utilized it in your code; we're still interested, but no more informed (by you) than ever.
>
> My main problem with this is that you seem quick to either belittle or misrepresent other people's efforts while only hinting at the grandness of your own and not supplying any tangible evidence.
>
> One last time: back it up with numbers please, not with belief. Use reproducible methods, tell people about them. Let people form their own opinion.
>
> Until then, I'm out.
> Simon.


Heh! Don't get me wrong, I love what you are doing...
I just prefer my own way of measuring performance.
I will let my RAC get stable, then I'll push the software in.

who?
ID: 525216
Pappa
Volunteer tester
Joined: 9 Jan 00
Posts: 2562
Credit: 12,301,681
RAC: 0
United States
Message 525237 - Posted: 2 Mar 2007, 3:57:11 UTC
Last modified: 2 Mar 2007, 4:56:23 UTC

Sorry, FALSE ALARM!

I do not see any remarkable machine(s) connected to Seti under your account... What you imply is that you have an active machine that is working, not testing offline... If you are failing to answer Simon (and crew), who has Eric's ear for the most compatible Seti application, YOU ARE WRONG! To fail to use the most "accepted" benchmark, you are WRONG! Everything about this thread is WRONG!

So at this point, everything that you have stated points to a FRAUD! No points, no fame! Not to mention your RAC bears out that this thread is a FRAUD! Then I have to figure out why I should take away from donations to have to tell everyone this thread is a fraud. You live in the UK, so donations should be easy... If you truly work for Intel and have disregarded the plea for hardware donations, including the one for the DEAD CPU on Sideous... what does this tell everyone? CAN YOU SAY FRAUD!

Pappa

Edit:

If you are going to hide your name, please at least have the courtesy to change the avatar; it gives you away... It insults all those who have honest questions and advances!



Please consider a Donation to the Seti Project.

ID: 525237
Alex Kan
Volunteer developer

Joined: 4 Dec 03
Posts: 127
Credit: 29,269
RAC: 0
United States
Message 525263 - Posted: 2 Mar 2007, 5:19:08 UTC
Last modified: 2 Mar 2007, 5:30:18 UTC

who?: I should clarify my previous statement about making a table of WU times vs. claimed credit. What I meant was that you should do this for the actual WUs crunched by your machines, instead of using benchmark suites. I think we see eye-to-eye on the distinction between benchmark performance and real-world performance.

Where we don't see eye-to-eye, however, is the reliance on RAC to gauge performance, because of all the factors I mentioned in my previous post. To be honest, I don't know why you're choosing to measure the performance gain this way, instead of attaching machines with similar hardware configurations running two different apps. You seem to have no shortage of them still attached to Rosetta. Hopefully, they'll be running WUs of similar origins, so that should facilitate comparison of individual units. Besides, we mere mortals still don't have access to the processors in the 8-core machine you attached, so it's kind of hard to get a feel for what the RAC of that machine means, anyway.

You're not going to disappear without showing off any of your work again, are you?


Pappa: A little harsh, don't you think? Refusing to use the benchmark infrastructure that Simon's, Ben's and Josef's apps use is hardly grounds for accusing someone of fraud. I don't use that benchmarking system either, and I feel that benchmarking in general (whether it's comparing apps or functions) is not always able to characterize real-world performance. If you want to see this principle in action, consider the "new" chirp functions, which benchmark about the same as the array-based chirps but perform better in practice.

Low RAC isn't necessarily grounds, either. I haven't actually attached a machine or crunched a WU in months, but that doesn't stop me from actively developing the Mac app.

On the other hand, if weeks go by and no evidence of this app's existence ever materializes...I'd say that's grounds for accusations of fraud. :P


Simon: who? makes an interesting point about the benchmark's ability to verify correctness. Although it's been a long time since I last checked, I seem to remember seeing that some of the test WUs' signal reporting thresholds haven't been lowered, so the app doesn't report any signals besides the best-of signals. Kind of makes it hard to test correctness when you have only one signal of each type to test with.
ID: 525263
KWSN - Chicken of Angnor
Volunteer developer
Volunteer tester
Joined: 9 Jul 99
Posts: 1199
Credit: 6,615,780
RAC: 0
Austria
Message 525269 - Posted: 2 Mar 2007, 5:34:33 UTC
Last modified: 2 Mar 2007, 5:35:07 UTC

Alex,

that's a valid point. However, from what I understood, Francois meant the Auto-Installer tool (not the Knabench-derived .cmd script package).

The Auto-Installer tool is definitely not meant for meaningful benchmarking, only to make sure the app installed runs on the host it's intended for, and to try to make a useful suggestion, with reasonable accuracy in a small amount of time.

As for real-world vs. benchmark performance, we're all aware of the difference. Still, for reproducible results it seems the most useful approach.

I'm interested in possible avenues of improvement as far as performance testing goes - you know where to find me.

Regards,
Simon.
Donate to SETI@Home via PayPal!

Optimized SETI@Home apps + Information
ID: 525269