What's wrong with the forums?

Brian Silvers

Joined: 11 Jun 99
Posts: 1681
Credit: 492,052
RAC: 0
United States
Message 324000 - Posted: 2 Jun 2006, 23:18:11 UTC - in response to Message 323965.  
Last modified: 2 Jun 2006, 23:21:33 UTC

I don't think anyone is saying that there are no issues, just that it's hard to work on the facts with all the rhetoric flying around.


Yes, I know...from BOTH sides.


This one crashed. Speaking as a programmer, this is the only one here that really bothers me because the science application ought not to crash. Did it crash because it's overclocked, or is the hardware failing, or is there a conflict with other software -- or is it just a bug?


Well, as you have seen if you've read the computation error thread, I'm overclocked, but am turning in valid results all the time. Could it be that my overclocking is still flaky? Perhaps. I haven't done a super-intensive, long-duration check with Prime95, but SuperPi is stable. Another mitigating factor in my case is the fact that others are also having problems with the WU in question. The WU in my stats right now shows the same traits as this one for cya2... My crash mentioned a possible stack corruption...

StackWalk(): ERROR_INVALID_ADDRESS (487) - Possible stack corruption.

Dunno what is going on...


Edit: Forgot to comment on the following


One work unit doesn't tell us a whole lot about BOINC in general. We need to look at others -- especially those running standard apps on the latest standard version of BOINC.


That's why I held off mentioning something for a bit, considering I have a "non-standard" BOINC. The science app is standard, but the BOINC manager is not. I was going to install Crunch3r's optimized science app, but I wasn't really sure it was making a big difference, so I remained with the standard app. In my case though, people who appear to be running all standard stuff also had a problem...

ID: 324000
1mp0£173
Volunteer tester

Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 324015 - Posted: 2 Jun 2006, 23:27:25 UTC - in response to Message 324000.  


Well, as you have seen if you've read the computation error thread, I'm overclocked, but am turning in valid results all the time. Could it be that my overclocking is still flaky? Perhaps. I haven't done a super-intensive, long-duration check with Prime95, but SuperPi is stable. Another mitigating factor in my case is the fact that others are also having problems with the WU in question. The WU in my stats right now shows the same traits as this one for cya2... My crash mentioned a possible stack corruption...

StackWalk(): ERROR_INVALID_ADDRESS (487) - Possible stack corruption.

Dunno what is going on...

Stack corruption (and invalid addresses) can be software bugs, and they can be caused by memory that is almost reliable.

... you might have a few bits in RAM that are a tiny bit slow, so the problem does not show up often.

... or it could just plain be a fluke. I'd run Memtest86, and think about running one of the other tests as well.


ID: 324015
Brian Silvers

Joined: 11 Jun 99
Posts: 1681
Credit: 492,052
RAC: 0
United States
Message 324028 - Posted: 2 Jun 2006, 23:40:36 UTC - in response to Message 324015.  


Stack corruption (and invalid addresses) can be software bugs, and they can be caused by memory that is almost reliable.

... you might have a few bits in RAM that are a tiny bit slow, so the problem does not show up often.

... or it could just plain be a fluke. I'd run Memtest86, and think about running one of the other tests as well.



What's really going to tickle your brain is...perhaps it should've crashed given the circumstances, otherwise it would've gone on to "hang" and say it was "successful" while requesting 0.11 credit.

To give you an idea about what memory I'm using, it is OCZ PC4000EB Platinum, basically their top-of-the-line DDR1 memory (and likely their last enthusiast DDR1 effort). It is supposed to run DDR-500 at 3-3-2-8. I am running at 3-3-3-8 to loosen up the timing just a smidge...

I'll try some tests, but it is doubtful that the memory is bad. If it is anything, it is the heat being generated by the CPU and the on-die memory controller... I've been trying to decide on just which CPU cooler I want to get. I was after the Thermalright SI-120, but now I'm thinking about their new Ultra-90 or Ultra-120... I think right now I'm hitting 60C at times, which is near the upper end of the thermal limits of this processor...
ID: 324028
1mp0£173
Volunteer tester

Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 324040 - Posted: 2 Jun 2006, 23:50:41 UTC - in response to Message 324028.  
Last modified: 2 Jun 2006, 23:51:15 UTC


Stack corruption (and invalid addresses) can be software bugs, and they can be caused by memory that is almost reliable.

... you might have a few bits in RAM that are a tiny bit slow, so the problem does not show up often.

... or it could just plain be a fluke. I'd run Memtest86, and think about running one of the other tests as well.



What's really going to tickle your brain is...perhaps it should've crashed given the circumstances, otherwise it would've gone on to "hang" and say it was "successful" while requesting 0.11 credit.

To give you an idea about what memory I'm using, it is OCZ PC4000EB Platinum, basically their top-of-the-line DDR1 memory (and likely their last enthusiast DDR1 effort). It is supposed to run DDR-500 at 3-3-2-8. I am running at 3-3-3-8 to loosen up the timing just a smidge...

I'll try some tests, but it is doubtful that the memory is bad. If it is anything, it is the heat being generated by the CPU and the on-die memory controller... I've been trying to decide on just which CPU cooler I want to get. I was after the Thermalright SI-120, but now I'm thinking about their new Ultra-90 or Ultra-120... I think right now I'm hitting 60C at times, which is near the upper end of the thermal limits of this processor...

Well, "memory" is more than just the RAM chips. It includes the bus, the controller, etc.

It takes a little bit of time for a signal to go from 0 to 1, and as you overclock you move away from a stable flat signal, and on to the slope during the transition. Instead of 0's and 1's, you have .2's and .8's -- or worse.

So, yeah, the RAM chips could be at spec, and it'd still be a "memory error".
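
As a rough illustration of the point above (a toy model, not a measurement of any real memory bus): treat the 0-to-1 transition as an exponential rise and look at the level it has reached one clock period after it starts. The time constant `TAU_NS` and the clock speeds below are made-up numbers, chosen only to show the trend.

```python
import math

# Hypothetical rise time constant of the signal, in nanoseconds.
# This is an assumption for illustration, not a figure for any real hardware.
TAU_NS = 1.5


def sampled_level(clock_period_ns, tau_ns=TAU_NS):
    """Level (0..1) an idealized exponential 0 -> 1 transition has reached
    when it is sampled one clock period after the transition begins."""
    return 1.0 - math.exp(-clock_period_ns / tau_ns)


if __name__ == "__main__":
    # Illustrative clock speeds in MHz; a higher clock means a shorter period.
    for mhz in (200, 250, 300, 400):
        period_ns = 1000.0 / mhz
        level = sampled_level(period_ns)
        print(f"{mhz} MHz: period {period_ns:.2f} ns, sampled level {level:.2f}")
```

The shorter the clock period, the further the sampled value falls below a clean 1, which is the ".8 instead of 1" effect described above. Real signals are messier than a single exponential, so take this only as a picture of the trend.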

I haven't looked for other -9 (noisy) work units, but I wonder if they're being under-reported.

[edit]...and yes, temperature can be a big factor.[/edit]
ID: 324040
Steve MacKenzie
Volunteer tester
Joined: 2 Jan 00
Posts: 146
Credit: 6,504,803
RAC: 1
United States
Message 324783 - Posted: 3 Jun 2006, 9:58:25 UTC

Well .... I guess CYA is all set now.
Never did do the science of figuring out the issue.
Too many unresponsive or evasive answers to simple questions.

I guess I won't try that tack again.

So.....

See Ya CYA

Steve

PS: Ned. Thanks for the assist.


ID: 324783
Michael
Volunteer tester
Joined: 21 Aug 99
Posts: 4608
Credit: 7,427,891
RAC: 18
United States
Message 325007 - Posted: 3 Jun 2006, 15:30:08 UTC - in response to Message 323601.  


HANS WROTE:

As soon as you post something that might be controversial, your posts or the entire thread will disappear......



That only applies if the mod does not like you. He/she will ignore his friends and give them free rein.

100%, all the time.



ID: 325007
Es99
Volunteer tester
Joined: 23 Aug 05
Posts: 10874
Credit: 350,402
RAC: 0
Canada
Message 325127 - Posted: 3 Jun 2006, 17:38:44 UTC - in response to Message 325007.  


HANS WROTE:

As soon as you post something that might be controversial, your posts or the entire thread will disappear......



That only applies if the mod does not like you. He/she will ignore his friends and give them free rein.

100%, all the time.



If that's the case why have I had so many of my effing posts deleted? I'll show you a screenshot of my post list if you don't believe me. 100% is just a plain lie and you should apologise.
Reality Internet Personality
ID: 325127
Bymark
Joined: 30 Dec 04
Posts: 29
Credit: 700,896
RAC: 0
Finland
Message 325300 - Posted: 3 Jun 2006, 21:17:19 UTC - in response to Message 324783.  
Last modified: 3 Jun 2006, 21:22:07 UTC

Well .... I guess CYA is all set now.
Never did do the science of figuring out the issue.
Too many unresponsive or evasive answers to simple questions.

I guess I won't try that tack again.

So.....

See Ya CYA

Steve

PS: Ned. Thanks for the assist.



Yep, now doing http://predictor.scripps.edu/ !

Thanks anyway.......... cya2

ID: 325300