Strange Invalid MB Overflow tasks with truncated Stderr outputs...

Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1468201 - Posted: 24 Jan 2014, 14:12:12 UTC - in response to Message 1468167.  

- Use communication mechanisms for 'asking' and 'negotiating' with OS and applications, instead of 'commanding'. Issuing imperative orders on systems stressed by your own (boinc client) doing is likely to end in tears.
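To make the 'asking' versus 'commanding' idea concrete, here is a minimal Python sketch. It is not BOINC client code: the names `worker` and `ask_to_stop`, the in-memory log, and the two-second grace period are all hypothetical, chosen only for illustration. The controller requests a stop and waits, and the worker decides when its own output is complete.

```python
import io
import threading
import time


def worker(stop_requested: threading.Event, log: io.StringIO) -> None:
    """Hypothetical science app: works in small steps and polls for a stop request."""
    step = 0
    while not stop_requested.is_set():
        step += 1
        time.sleep(0.01)  # stand-in for one unit of real work
    # The worker, not the controller, writes its final output before exiting.
    log.write(f"stopped cleanly after {step} steps\n")


def ask_to_stop(thread: threading.Thread, stop_requested: threading.Event,
                grace_seconds: float = 2.0) -> bool:
    """Ask the worker to stop and wait; return True if it finished in time."""
    stop_requested.set()
    thread.join(timeout=grace_seconds)
    return not thread.is_alive()


stop = threading.Event()
log = io.StringIO()
t = threading.Thread(target=worker, args=(stop, log))
t.start()
time.sleep(0.05)                 # let the worker run briefly
finished = ask_to_stop(t, stop)  # True when the worker exited within the grace period
```

If `ask_to_stop` returns False, the controller still has the option to escalate, but only after the worker has had a fair chance to finish writing.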

In general, I think the same approach works for human beings, too. It's a useful philosophy to keep in mind as we move towards the implementation/incorporation/deployment/distribution phases of both the BOINC fixes that are being examined - CreditNew and API.
ID: 1468201
Juha
Volunteer tester

Joined: 7 Mar 04
Posts: 388
Credit: 1,857,738
RAC: 0
Finland
Message 1468237 - Posted: 24 Jan 2014, 14:58:40 UTC - in response to Message 1468129.  

PS - just called up the properties for stderr.txt for slot 0 - which is where my cuda apps tend to run. It's saying

Created: 11 December 2013 14:02:40
Modified: 24 January 2014 11:11:03
Accessed: 24 January 2014 11:10:55

I'm going to have to think about that for a bit!

The Old New Thing: The apocryphal history of file system tunnelling
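For anyone who wants to read those three timestamps without the Explorer properties dialog, here is a minimal Python sketch. The scratch file is only a stand-in for slots/0/stderr.txt, and `file_times` is a hypothetical helper. As I understand the linked article, NTFS may hand a recreated file the creation time of a same-named file deleted shortly before, which would explain a December creation date on a file that is rewritten for every task. Note that `st_ctime` means creation time only on Windows; on Linux it is the metadata change time.

```python
import os
import tempfile
from datetime import datetime, timezone


def file_times(path: str) -> dict:
    """Return created/modified/accessed times for path as UTC datetimes."""
    st = os.stat(path)
    # st_birthtime exists only on some platforms (Windows from Python 3.12);
    # otherwise fall back to st_ctime, which is creation time on Windows only.
    created = getattr(st, "st_birthtime", st.st_ctime)
    raw = {"created": created, "modified": st.st_mtime, "accessed": st.st_atime}
    return {name: datetime.fromtimestamp(ts, tz=timezone.utc)
            for name, ts in raw.items()}


# Delete and immediately recreate a file under the same name, then inspect it.
with tempfile.TemporaryDirectory() as scratch:
    demo_path = os.path.join(scratch, "stderr.txt")
    with open(demo_path, "w") as f:
        f.write("first run\n")
    os.remove(demo_path)
    with open(demo_path, "w") as f:
        f.write("second run\n")
    times = file_times(demo_path)
```

On an NTFS volume, running the delete-and-recreate step with a pause between the first write and the removal should show whether the creation time is carried over; on other file systems it will simply be the time of the second write.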
ID: 1468237
Profile jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1468247 - Posted: 24 Jan 2014, 15:18:14 UTC - in response to Message 1468237.  

PS - just called up the properties for stderr.txt for slot 0 - which is where my cuda apps tend to run. It's saying

Created: 11 December 2013 14:02:40
Modified: 24 January 2014 11:11:03
Accessed: 24 January 2014 11:10:55

I'm going to have to think about that for a bit!

The Old New Thing: The apocryphal history of file system tunnelling


Now on top of that, you can throw in logical disk volume management layers, which can concatenate and reorder operations.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1468247
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1469127 - Posted: 26 Jan 2014, 14:37:15 UTC
Last modified: 26 Jan 2014, 14:41:21 UTC

Testing Day 7. Seems the truncated Stderr outputs are history. A recent run of the targeted short overflows with a spike count less than 30 has been completed without an instant invalid encountered.
http://setiathome.berkeley.edu/results.php?hostid=6979629&offset=40&show_names=0&state=0&appid=11

Is that singing I hear?
ID: 1469127
Profile jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1469139 - Posted: 26 Jan 2014, 15:03:53 UTC - in response to Message 1469127.  
Last modified: 26 Jan 2014, 15:04:16 UTC

Cheers! Alright, probably got enough ammunition with respect to the current and long-term pre-existing boincapi limitations. "We have the technology to rebuild him...".

I'll factor all that in early in x42 (all this poking around was actually planned as part of phase 1 consolidation). I will have to try to find GCC/Linux/Mac type equivalent procedures along the way, prior to presenting anything to Boinc for inclusion (or not).

Back into my secret laboratory for a month or so. If you hear screams it's probably just me poking at Boinc code.

In the meantime if anyone else experiences similar instant invalids, you can just point them to the workaround builds.
ID: 1469139
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1469145 - Posted: 26 Jan 2014, 15:18:42 UTC - in response to Message 1469139.  

Back into my secret laboratory for a month or so. If you hear screams it's probably just me poking at Boinc code.

A scene from the movie Swordfish comes to mind; compile...compile...COMPILE...

I will have to try to find GCC/Linux/Mac type equivalent procedures along the way, prior to presenting anything to Boinc for inclusion (or not).

A new Mac version? You mean I might need to put the 250 back in the Mac for a while? I could do that, we have the ability...
ID: 1469145
juan BFP Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1469148 - Posted: 26 Jan 2014, 15:22:12 UTC - in response to Message 1469139.  
Last modified: 26 Jan 2014, 15:23:19 UTC

Back into my secret laboratory for a month or so. If you hear screams it's probably just me poking at Boinc code.

LOL - Sorry, but I can't lose the opportunity...

Just imagine Jason's working on x42 in his secret lab...



Hope his beer stock is at max capacity. :)
ID: 1469148
Profile jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1469151 - Posted: 26 Jan 2014, 15:26:32 UTC - in response to Message 1469145.  

A new Mac version? You mean I might need to put the 250 back in the Mac for a while? I could do that, we have the ability...


That's down the line, and I know Edward's wrestling with Cuda SDK library issues there. I'm trying to gradually tie up the platforms and find their weak points, so that in later x42 mixing processing nodes of different types will be feasible. Linux is messy but workable. To be of better use and to try to help get every platform in line consistently, I'm considering getting hold of a refurbished nv-equipped iMac. A bit pricey for me even for the old versions, but it's a gaping hole in my development lab at the moment.
ID: 1469151
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1469159 - Posted: 26 Jan 2014, 15:51:20 UTC - in response to Message 1469151.  

I never was a fan of all-in-ones. They reminded me of a hard to carry laptop. My sister had an iMac. The first time I pulled it apart to add memory I discovered it was a laptop...in a hard to carry case. I do have a very old G4 laptop ;-)

It would be nice to have a Mac CUDA App that was close to the same speed as the Linux version. The one I tested for two days was rather disappointing in the speed department, otherwise, it ran fine.

Happy Keyboards.
ID: 1469159
Profile Jeff Buck Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1469381 - Posted: 27 Jan 2014, 3:57:56 UTC

Aaaarrrggghh! I just got my first "instant" Invalid on a CPU task (3353106284) with a truncated Stderr.

Name	11au13ab.9802.15790.438086664203.12.212.vlar_0
Workunit	1411606537
Created	25 Jan 2014, 22:33:45 UTC
Sent	26 Jan 2014, 2:35:17 UTC
Received	27 Jan 2014, 2:31:37 UTC
Server state	Over
Outcome	Success
Client state	Done
Exit status	0 (0x0)
Computer ID	6915017
Report deadline	20 Mar 2014, 7:34:59 UTC
Run time	37,043.39
CPU time	28,236.88
Validate state	Invalid
Credit	0.00
Application version	SETI@home v7
Anonymous platform (CPU)
Stderr output

<core_client_version>7.2.33</core_client_version>
<![CDATA[
<stderr_txt>
setiathome_v7 7.00 DevC++/MinGW/g++ 4.5.2
libboinc: 7.1.0

Work Unit Info:
...............
WU true angle range is :  0.010816
Optimal function choices:
--------------------------------------------------------
                            name   timing   error
--------------------------------------------------------
                v_BaseLineSmooth (no other)
     v_vGetPowerSpectrumUnrolled 0.000779 0.00000 
             sse1_ChirpData_ak8h 0.036948 0.00000 
              v_vTranspose4x8ntw 0.023501 0.00000 
                  BH SSE folding 0.005089 0.00000 

</stderr_txt>
]]>

Wingman got results of:
Spike count:    21
Autocorr count: 0
Pulse count:    6
Triplet count:  3
Gaussian count: 0

so that fits with the pattern that we identified with the Cuda and ATI tasks.
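The arithmetic behind that pattern can be checked with a few lines of Python. This is only a sketch: the limit of 30 total signals is my assumption (it matches the wingman counts above, which add up to exactly 30, but it is not stated in this post), and `fits_pattern` is a hypothetical name for the "overflow with a spike count less than 30" condition.

```python
SIGNAL_LIMIT = 30  # assumed overflow threshold for MB tasks


def total_signals(counts: dict) -> int:
    """Sum of all reported signal counts for one result."""
    return sum(counts.values())


def fits_pattern(counts: dict) -> bool:
    """True when the task overflowed but spikes alone did not reach the limit."""
    return total_signals(counts) >= SIGNAL_LIMIT and counts["spike"] < SIGNAL_LIMIT


# Counts copied from the wingman's result above.
wingman = {"spike": 21, "autocorr": 0, "pulse": 6, "triplet": 3, "gaussian": 0}

print(total_signals(wingman), fits_pattern(wingman))
```

Here 21 + 0 + 6 + 3 + 0 = 30, so the result counts as an overflow while the spike count stays below 30.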

This happened on my old P4 laptop, 6915017, which works so very hard (10+ hours on this task before it overflowed) for every little bit of productivity it can scratch out! ;^) The irony is that, after running stock apps for 11 months, I just switched to Lunatics 2 days ago because there was a rare AP in the queue and I wanted to see if it would cut down the 4+ day run time (but it got a 30/30 overflow, so my little test was moot). I think for MB it's still actually running stock (Lunatics didn't seem to recognize SSE2), and I've seen Stderr truncation on the machine before, just never an instant Invalid.

Well, obviously this is a situation that Jason's Cuda efforts won't address, so I sure hope that the simple fix to the validator that Joe identified can be implemented!
ID: 1469381
Josef W. Segur
Volunteer developer
Volunteer tester

Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1469462 - Posted: 27 Jan 2014, 9:37:29 UTC - in response to Message 1469381.  

...
This happened on my old P4 laptop, 6915017, which works so very hard (10+ hours on this task before it overflowed) for every little bit of productivity it can scratch out! ;^) The irony is that, after running stock apps for 11 months, I just switched to Lunatics 2 days ago because there was a rare AP in the queue and I wanted to see if it would cut down the 4+ day run time (but it got a 30/30 overflow, so my little test was moot). I think for MB it's still actually running stock (Lunatics didn't seem to recognize SSE2), and I've seen Stderr truncation on the machine before, just never an instant Invalid.

That's actually a Pentium M laptop. The add-on for the installer uses CPU detection code from the x264 codec, and the authors of that codec found that the implementation of SSE on Pentium M was slower for their purposes so specifically turned off the detection. That's why the installer only offered the stock app. You may want to do a manual upgrade to the Lunatics SSE2 app; it should be about 20% better.

Well, obviously this is a situation that Jason's Cuda efforts won't address, so I sure hope that the simple fix to the validator that Joe identified can be implemented!

I guess we've pretty well covered the various conditions which affect the issue, so I'll send Eric an email with the suggestion this week.
                                                                  Joe
ID: 1469462
Profile Wiggo
Joined: 24 Jan 00
Posts: 34744
Credit: 261,360,520
RAC: 489
Australia
Message 1469463 - Posted: 27 Jan 2014, 9:45:59 UTC

The Pentium M is actually a Pentium III Tualatin with the front side bus of a P4 added to it.

http://en.wikipedia.org/wiki/Pentium_M

Cheers.
ID: 1469463
Josef W. Segur
Volunteer developer
Volunteer tester

Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1469477 - Posted: 27 Jan 2014, 10:35:05 UTC - in response to Message 1469463.  

The Pentium M is actually a Pentium III Tualatin with the front side bus of a P4 added to it.

http://en.wikipedia.org/wiki/Pentium_M

Cheers.

Partially true, but the max L2 cache on a Tualatin was 512K, while the first generation Banias Pentium M's like my host 2818173 have 1M, and Jeff's is a Dothan which should have 2M. It's curious that BOINC 7.2.33 isn't showing the cache size. That cache also is more efficiently organized than a Pentium III. And of course the upgrade from SSE to SSE2 adds some useful instructions.
                                                                   Joe
ID: 1469477
Profile Wiggo
Joined: 24 Jan 00
Posts: 34744
Credit: 261,360,520
RAC: 489
Australia
Message 1469484 - Posted: 27 Jan 2014, 11:01:14 UTC

Yes and the link that I supplied explains all that quite well. ;-)

Cheers.
ID: 1469484
Profile Jeff Buck Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1469626 - Posted: 27 Jan 2014, 18:40:57 UTC - in response to Message 1469462.  
Last modified: 27 Jan 2014, 18:55:24 UTC

That's actually a Pentium M laptop. The add-on for the installer uses CPU detection code from the x264 codec, and the authors of that codec found that the implementation of SSE on Pentium M was slower for their purposes so specifically turned off the detection. That's why the installer only offered the stock app. You may want to do a manual upgrade to the Lunatics SSE2 app; it should be about 20% better.

Good to know. A 20% boost would be nice, might even get the RAC close to 200! ;^)

I've gone ahead and done the manual SSE2 install and the machine's happily running that version now (at least for MB).

...and Jeff's is a Dothan which should have 2M.

Don't know about the Dothan part, but the L2 cache is only 1M.
Edit: Just checked with CPU-Z. It says it's a Banias.

I guess we've pretty well covered the various conditions which affect the issue, so I'll send Eric an email with the suggestion this week.
                                                                  Joe

Great! Sure hope he agrees to put it in soon. Thanks, Joe!
ID: 1469626
Profile Jeff Buck Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1470608 - Posted: 30 Jan 2014, 4:06:30 UTC

Got an "instant" Invalid on my top cruncher (7057115) today. It's task 3357983534, which ran for over 15 minutes before it overflowed:
Name	25se13ab.32409.271337.438086664200.12.124_1
Workunit	1413925062
Created	28 Jan 2014, 10:33:45 UTC
Sent	28 Jan 2014, 14:25:16 UTC
Received	29 Jan 2014, 2:02:12 UTC
Server state	Over
Outcome	Success
Client state	Done
Exit status	0 (0x0)
Computer ID	7057115
Report deadline	17 Mar 2014, 6:45:10 UTC
Run time	918.98
CPU time	186.05
Validate state	Invalid
Credit	0.00
Application version	SETI@home v7 v7.00 (cuda42)
Stderr output

<core_client_version>7.2.33</core_client_version>
<![CDATA[
<stderr_txt>

</stderr_txt>
]]>

Wingman got:
Spike count:    10
Autocorr count: 0
Pulse count:    0
Triplet count:  20
Gaussian count: 0

That's the 2nd one this month on that machine and 7th overall for January across all my rigs.
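A quick way to spot these empty or cut-off outputs when going through a task list is sketched below in Python. It assumes that a complete Stderr contains the five summary lines a wingman's result shows ("Spike count:" and so on); the function names are hypothetical, and this is not the validator change Joe identified.

```python
import re

EXPECTED_LABELS = ("Spike count", "Autocorr count", "Pulse count",
                   "Triplet count", "Gaussian count")


def stderr_body(stderr_output: str) -> str:
    """Return the text between <stderr_txt> and </stderr_txt>, or '' if absent."""
    match = re.search(r"<stderr_txt>(.*?)</stderr_txt>", stderr_output, re.DOTALL)
    return match.group(1) if match else ""


def looks_truncated(stderr_output: str) -> bool:
    """True if any of the summary count lines is missing from the stderr text."""
    body = stderr_body(stderr_output)
    return not all(f"{label}:" in body for label in EXPECTED_LABELS)


# The Stderr output reported for the task above.
empty_report = """<core_client_version>7.2.33</core_client_version>
<![CDATA[
<stderr_txt>

</stderr_txt>
]]>"""

print(looks_truncated(empty_report))
```

A report that stops partway through (for example after the "Optimal function choices" table) would be flagged the same way, since the count lines never appear.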
ID: 1470608
Profile Jeff Buck Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1474062 - Posted: 8 Feb 2014, 3:27:20 UTC

And the hits just keep on coming, so I guess the fix isn't in yet. Today's is task 3369552268. That's already the third one this month on my host 6980751, which makes it look like the pace is picking up, although I don't know why that would be. I've also had another one on host 7057115 since my last post.
ID: 1474062
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1481781 - Posted: 26 Feb 2014, 5:23:49 UTC
Last modified: 26 Feb 2014, 5:26:25 UTC

It appears William is looking for Speed tests between the new build and the old build, http://www.arkayn.us/forum/index.php?topic=163.msg3944#msg3944

If someone could perform a little testing, it would be appreciated. I've been running the new build for quite a while, no problems since using the new App.

The Bench tools are here, Test Tools - MultiBeam
ID: 1481781
Profile arkayn
Volunteer tester
Joined: 14 May 99
Posts: 4438
Credit: 55,006,323
RAC: 0
United States
Message 1481794 - Posted: 26 Feb 2014, 6:29:44 UTC - in response to Message 1481781.  

It appears William is looking for Speed tests between the new build and the old build, http://www.arkayn.us/forum/index.php?topic=163.msg3944#msg3944

If someone could perform a little testing, it would be appreciated. I've been running the new build for quite a while, no problems since using the new App.

The Bench tools are here, Test Tools - MultiBeam


People will have to be an alpha tester in order to see that thread.

ID: 1481794
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1481805 - Posted: 26 Feb 2014, 6:54:47 UTC - in response to Message 1481794.  
Last modified: 26 Feb 2014, 7:00:50 UTC

It appears William is looking for Speed tests between the new build and the old build, http://www.arkayn.us/forum/index.php?topic=163.msg3944#msg3944

If someone could perform a little testing, it would be appreciated. I've been running the new build for quite a while, no problems since using the new App.

The Bench tools are here, Test Tools - MultiBeam


People will have to be an alpha tester in order to see that thread.

So...What do you suggest? If someone wants to run the bench they should post the results here? There's really not much at that link. Just a couple people posting the Bench results. Is there any preference of which bench and WU to run?

Special 'commit to disk' mode x41zc builds, http://jgopt.org/download.html
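For readers wondering what "commit to disk" can mean in general, here is a minimal Python sketch. I don't know how the x41zc builds implement it, so treat this purely as an illustration of the idea: after each write, flush the program's own buffer and then ask the operating system to push its cached data to the storage device. The helper name `write_committed` and the scratch file are hypothetical.

```python
import os
import tempfile


def write_committed(path: str, text: str) -> None:
    """Append text and ask the OS to write it to the device before returning."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(text)
        f.flush()             # program buffer -> operating system
        os.fsync(f.fileno())  # operating system cache -> storage device


with tempfile.TemporaryDirectory() as scratch:
    log_path = os.path.join(scratch, "stderr.txt")
    write_committed(log_path, "Spike count:    21\n")
    with open(log_path, encoding="utf-8") as f:
        written = f.read()
```

The trade-off is speed: forcing every write out to the device is slower than letting the OS batch them, which is presumably why it is offered as a special mode rather than the default.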
ID: 1481805


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.