Strange Invalid MB Overflow tasks with truncated Stderr outputs...
Author | Message |
---|---|
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14649 Credit: 200,643,578 RAC: 874 |
- Use communication mechanisms for 'asking' and 'negotiating' with OS and applications, instead of 'commanding'. Issuing imperative orders on systems stressed by your own (boinc client) doing is likely to end in tears. In general, I think the same approach works for human beings, too. It's a useful philosophy to keep in mind as we move towards the implementation/incorporation/deployment/distribution phases of both the BOINC fixes that are being examined - CreditNew and API. |
Juha Send message Joined: 7 Mar 04 Posts: 388 Credit: 1,857,738 RAC: 0 |
PS - just called up the properties for stderr.txt for slot 0 - which is where my cuda apps tend to run. It's saying The Old New Thing: The apocryphal history of file system tunnelling |
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
PS - just called up the properties for stderr.txt for slot 0 - which is where my cuda apps tend to run. It's saying Now on top of that, you can throw in logical disk volume management layers, which can concatenate and reorder operations. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
Testing Day 7. Seems the truncated Stderr outputs are history. A recent run of the targeted short overflows with a spike count less than 30 has been completed without an instant invalid encountered. http://setiathome.berkeley.edu/results.php?hostid=6979629&offset=40&show_names=0&state=0&appid=11 Is that singing I hear? |
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
Cheers! Alright, probably got enough ammunition with respect to the current and long term pre-existing boincapi limitations. "We have the technology to rebuild him...". I'll factor all that in early in x42 (which all this poking around was actually planned as part of phase 1 consolidation). I will have to try to find GCC/Linux/Mac type equivalent procedures along the way, prior to presenting anything to Boinc for inclusion (or not). Back into my secret laboratory for a month or so. If you hear screams it's probably just me poking at Boinc code. In the meantime if anyone else experiences similar instant invalids, you can just point them to the workaround builds. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
Back into my secret laboratory for a month or so. If you hear screams it's probably just me poking at Boinc code. A scene from the movie Swordfish comes to mind; compile...compile...COMPILE... I will have to try to find GCC/Linux/Mac type equivalent procedures along the way, prior to presenting anything to Boinc for inclusion (or not). A new Mac version? You mean I might need to put the 250 back in the Mac for a while? I could do that, we have the ability... |
juan BFP Send message Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799 |
Back into my secret laboratory for a month or so. If you hear screams it's probably just me poking at Boinc code. LOL - Sorry but I can't lose the opportunity... Just imagine Jason working on x42 in his secret lab... Hope his beer stock is at max capacity. :) |
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
A new Mac version? You mean I might need to put the 250 back in the Mac for a while? I could do that, we have the ability... That's down the line, and I know Edward's wrestling with Cuda SDK library issues there. I'm trying to gradually tie up the platforms and find their weak points, so that in later x42 mixing processing nodes of different types will be feasible. Linux is messy but workable. To be of better use and to try to help get every platform in line somehow consistently, I'm considering getting hold of a refurbished nv equipped iMac. A bit pricey for me even for the old versions, but a gaping hole in my development lab at the moment. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
I never was a fan of all-in-ones. They reminded me of a hard to carry laptop. My sister had an iMac. The first time I pulled it apart to add memory I discovered it was a laptop...in a hard to carry case. I do have a very old G4 laptop ;-) It would be nice to have a Mac CUDA App that was close to the same speed as the Linux version. The one I tested for two days was rather disappointing in the speed department, otherwise, it ran fine. Happy Keyboards. |
Jeff Buck Send message Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0 |
Aaaarrrggghh! I just got my first "instant" Invalid on a CPU task (3353106284) with a truncated Stderr.
Name 11au13ab.9802.15790.438086664203.12.212.vlar_0
Workunit 1411606537
Created 25 Jan 2014, 22:33:45 UTC
Sent 26 Jan 2014, 2:35:17 UTC
Received 27 Jan 2014, 2:31:37 UTC
Server state Over
Outcome Success
Client state Done
Exit status 0 (0x0)
Computer ID 6915017
Report deadline 20 Mar 2014, 7:34:59 UTC
Run time 37,043.39
CPU time 28,236.88
Validate state Invalid
Credit 0.00
Application version SETI@home v7 Anonymous platform (CPU)
Stderr output
<core_client_version>7.2.33</core_client_version>
<![CDATA[
<stderr_txt>
setiathome_v7 7.00 DevC++/MinGW/g++ 4.5.2
libboinc: 7.1.0
Work Unit Info:
...............
WU true angle range is : 0.010816
Optimal function choices:
--------------------------------------------------------
name timing error
--------------------------------------------------------
v_BaseLineSmooth (no other)
v_vGetPowerSpectrumUnrolled 0.000779 0.00000
sse1_ChirpData_ak8h 0.036948 0.00000
v_vTranspose4x8ntw 0.023501 0.00000
BH SSE folding 0.005089 0.00000
</stderr_txt>
]]>
Wingman got results of:
Spike count: 21
Autocorr count: 0
Pulse count: 6
Triplet count: 3
Gaussian count: 0
so that fits with the pattern that we identified with the Cuda and ATI tasks. This happened on my old P4 laptop, 6915017, which works so very hard (10+ hours on this task before it overflowed) for every little bit of productivity it can scratch out! ;^) The irony is that, after running stock apps for 11 months, I just switched to Lunatics 2 days ago because there was a rare AP in the queue and I wanted to see if it would cut down the 4+ day run time (but it got a 30/30 overflow, so my little test was moot). I think for MB it's still actually running stock (Lunatics didn't seem to recognize SSE2), and I've seen Stderr truncation on the machine before, just never an instant Invalid.
Well, obviously this is a situation that Jason's Cuda efforts won't address, so I sure hope that the simple fix to the validator that Joe identified can be implemented! |
Josef W. Segur Send message Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0 |
... That's actually a Pentium M laptop. The add-on for the installer uses CPU detection code from the x264 codec, and the authors of that codec found that the implementation of SSE on Pentium M was slower for their purposes so specifically turned off the detection. That's why the installer only offered the stock app. You may want to do a manual upgrade to the Lunatics SSE2 app, it should be about 20% better. Well, obviously this is a situation that Jason's Cuda efforts won't address, so I sure hope that the simple fix to the validator that Joe identified can be implemented! I guess we've pretty well covered the various conditions which affect the issue, so I'll send Eric an email with the suggestion this week. Joe |
Wiggo Send message Joined: 24 Jan 00 Posts: 34744 Credit: 261,360,520 RAC: 489 |
The Pentium M is actually a Pentium III Tualatin with the front side bus of a P4 added to it. http://en.wikipedia.org/wiki/Pentium_M Cheers. |
Josef W. Segur Send message Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0 |
The Pentium M is actually a Pentium III Tualatin with the front side bus of a P4 added to it. Partially true, but the max L2 cache on a Tualatin was 512K, while the first generation Banias Pentium M's like my host 2818173 have 1M, and Jeff's is a Dothan which should have 2M. It's curious that BOINC 7.2.33 isn't showing the cache size. That cache also is more efficiently organized than a Pentium III. And of course the upgrade from SSE to SSE2 adds some useful instructions. Joe |
Wiggo Send message Joined: 24 Jan 00 Posts: 34744 Credit: 261,360,520 RAC: 489 |
Yes and the link that I supplied explains all that quite well. ;-) Cheers. |
Jeff Buck Send message Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0 |
That's actually a Pentium M laptop. The add-on for the installer uses CPU detection code from the x264 codec, and the authors of that codec found that the implementation of SSE on Pentium M was slower for their purposes so specifically turned off the detection. That's why the installer only offered the stock app. You may want to do a manual upgrade to the Lunatics SSE2 app, it should be about 20% better. Good to know. A 20% boost would be nice, might even get the RAC close to 200! ;^) I've gone ahead and done the manual SSE2 install and the machine's happily running that version now (at least for MB). ...and Jeff's is a Dothan which should have 2M. Don't know about the Dothan part, but the L2 cache is only 1M. Edit: Just checked with CPU-Z. It says it's a Banias. I guess we've pretty well covered the various conditions which affect the issue, so I'll send Eric an email with the suggestion this week. Joe Great! Sure hope he agrees to put it in soon. Thanks, Joe! |
Jeff Buck Send message Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0 |
Got an "instant" Invalid on my top cruncher (7057115) today. It's task 3357983534, which ran for over 15 minutes before it overflowed:
Name 25se13ab.32409.271337.438086664200.12.124_1
Workunit 1413925062
Created 28 Jan 2014, 10:33:45 UTC
Sent 28 Jan 2014, 14:25:16 UTC
Received 29 Jan 2014, 2:02:12 UTC
Server state Over
Outcome Success
Client state Done
Exit status 0 (0x0)
Computer ID 7057115
Report deadline 17 Mar 2014, 6:45:10 UTC
Run time 918.98
CPU time 186.05
Validate state Invalid
Credit 0.00
Application version SETI@home v7 v7.00 (cuda42)
Stderr output
<core_client_version>7.2.33</core_client_version>
<![CDATA[
<stderr_txt>
</stderr_txt>
]]>
Wingman got:
Spike count: 10
Autocorr count: 0
Pulse count: 0
Triplet count: 20
Gaussian count: 0
That's the 2nd one this month on that machine and 7th overall for January across all my rigs. |
Jeff Buck Send message Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0 |
And the hits just keep on coming, so I guess the fix isn't in yet. Today's is task 3369552268. That's already the third one this month on my host 6980751, which makes it look like the pace is picking up, although I don't know why that would be. I've also had another one on host 7057115 since my last post. |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
It appears William is looking for speed tests between the new build and the old build, http://www.arkayn.us/forum/index.php?topic=163.msg3944#msg3944 If someone could perform a little testing, it would be appreciated. I've been running the new build for quite a while, with no problems since using the new App. The Bench tools are here, Test Tools - MultiBeam |
arkayn Send message Joined: 14 May 99 Posts: 4438 Credit: 55,006,323 RAC: 0 |
It appears William is looking for Speed tests between the new build and the old build, http://www.arkayn.us/forum/index.php?topic=163.msg3944#msg3944 People will have to be an alpha tester in order to see that thread. |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
It appears William is looking for Speed tests between the new build and the old build, http://www.arkayn.us/forum/index.php?topic=163.msg3944#msg3944 So...What do you suggest? If someone wants to run the bench they should post the results here? There's really not much at that link. Just a couple people posting the Bench results. Is there any preference of which bench and WU to run? Special 'commit to disk' mode x41zc builds, http://jgopt.org/download.html |
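
As an illustration of what a "commit to disk" mode could involve, here is a minimal Python sketch. It rests on an assumption the thread does not confirm: that the workaround amounts to flushing the application's own buffer and then asking the operating system to commit its cache before the task exits. The function name `write_committed` and the file name used in the demo are hypothetical, and the real x41zc builds are CUDA applications, not Python.

```python
import os
import tempfile


def write_committed(path: str, text: str) -> None:
    """Append text to a file and request that it reach the disk.

    Hypothetical helper for illustration only; not taken from any
    BOINC or x41zc source.
    """
    with open(path, "a", encoding="utf-8") as f:
        f.write(text)
        f.flush()             # move data from Python's buffer to the OS
        os.fsync(f.fileno())  # ask the OS to commit its cache to the device


if __name__ == "__main__":
    # Demo: write two result lines to a throwaway stderr.txt and read them back.
    demo_dir = tempfile.mkdtemp()
    demo_path = os.path.join(demo_dir, "stderr.txt")
    write_committed(demo_path, "Spike count: 21\n")
    write_committed(demo_path, "Pulse count: 6\n")
    with open(demo_path, encoding="utf-8") as f:
        print(f.read(), end="")
```

The trade-off in a sketch like this is latency: every call waits for the OS to report the write as committed. It also only goes as far as the OS and storage stack honor the request, so layers that reorder operations, such as the volume management mentioned earlier in the thread, may still matter.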
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.