Not sure what's going on... New cruncher saga
Author | Message |
---|---|
rob smith Send message Joined: 7 Mar 03 Posts: 22149 Credit: 416,307,556 RAC: 380 |
OK, I've got my new cruncher working, sort of... It's an AMD 8350 (8 core) with 16GB of RAM sitting in an Asus Crosshair V with an Asus GTX 690 - so it should be "no slouch", but not necessarily the fastest kid on the block. First, after a lot of faffing around I got it to load Windoze7 - issues with the motherboard's BIOS needing to be updated. I let it run for a few hours just on its own, and all was sweetness and light, but the wrong GPU drivers loaded (that is if you count "none" as wrong). So I loaded the drivers off the disk, and let it run for a few hours more, again all was OK. So I loaded BOINC (version 7.0.28), and tried to download the current apps - they took so long I was trashing work because the apps weren't there, but there was a queue of work downloading (great when BOINC gets its bits in a twist....) Eventually the CPU app downloaded and a couple of tasks ran... So I grabbed a copy of Lunatics that I have lying around and loaded that - and now things start to resemble a pear... After a few minutes the whole lot stops. No excuses, no messing, it STOPS: display frozen and no response to the keyboard or mouse. So I think, hmm, let's update the GPU driver - so I download the latest from the Nvidia site (310.90) (I had to suspend all S@H processing to get it to download). Latest drivers, and away we go, for about 10 minutes and (later inspection showed) a number of "computation errors". Frozen again.... Reboot, read the notes that are posted here about what to do about errant 6xx GPUs. OK, get that, let's try setting the environment variable. Restart, suspend S@H again - big dump of updates from MS (again...), so let them through, reboot, and check the environment variable is set. Resume S@H, and a few minutes later nothing is happening, more WUs end in errors... Restart, clear up the mess from the last crash, and now download an older "good" version of the drivers. And, by now you should see the pattern - after a few minutes the machine stops responding. 
And I'm getting confused and frustrated.... (Hmm, just had a look at the errors, most appear to have come from the CPU not the GPU...) Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? |
Claggy Send message Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4 |
The only words of wisdom I can suggest are: Monitor the CPU temps, and try only CPU crunching, leaving the GPU usage suspended. If that's all O.K., perhaps your PSU isn't up to supplying the CPU and GPU at the same time. If it still seems unstable, perhaps your CPU cooler isn't up to it. Make sure you installed the AMD Optimised app, the Intel ones don't work on AMDs any more. Claggy |
rob smith Send message Joined: 7 Mar 03 Posts: 22149 Credit: 416,307,556 RAC: 380 |
Not thought about running GPU only. But I've had another look at the errors: a couple are from the GPU, but most are from the CPU. Here's the first bit of a typical stderr list: Stderr output <core_client_version>7.0.28</core_client_version> <![CDATA[ <message> - exit code -1073741819 (0xc0000005) </message> <stderr_txt> Windows optimized S@H Enhanced application by Alex Kan Version info: v8b2-SSE3x (AMD/Intel, Core 2-optimized v8-nographics) V5.13 by Alex Kan SSE3x Win32 Build 386 , Ported by : Jason G, Raistmer, JDWhale CPUID: AMD FX(tm)-8350 Eight-Core Processor Speed: 0 x 4207 MHz Cache: L1=64K L2=2048K Features: MMX SSE SSE2 SSE3 Work Unit Info: ............... Credit multiplier is : 2.85 WU true angle range is : 1.016984 Unhandled Exception Detected... - Unhandled Exception Record - Reason: Access Violation (0xc0000005) at address 0x004448B0 read attempt to address 0x439D6D20 Engaging BOINC Windows Runtime Debugger... Both CPU and GPU temps have been low (about 50-60C) so not a problem, but I've just noticed that the CPU is somewhat overclocked - I'll reset that and see what happens. Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? |
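The stderr excerpt above reports the same failure twice, once as the signed exit code -1073741819 and once as the hex value 0xc0000005 (the Windows access violation status). A minimal sketch of how the two relate, assuming the exit code is simply a 32-bit status value printed as a signed integer; the function name `exit_code_to_ntstatus` is made up for illustration and is not part of BOINC:

```python
def exit_code_to_ntstatus(exit_code: int) -> str:
    """Reinterpret a signed 32-bit exit code as an unsigned hex status value."""
    # Masking with 0xFFFFFFFF maps the negative two's-complement value
    # back onto the unsigned 32-bit range.
    return f"0x{exit_code & 0xFFFFFFFF:08x}"


print(exit_code_to_ntstatus(-1073741819))  # 0xc0000005
```

This only decodes the number. It says nothing about whether the access violation came from the application, the memory, or an unstable clock setting.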
rob smith Send message Joined: 7 Mar 03 Posts: 22149 Credit: 416,307,556 RAC: 380 |
After a coffee break - a quick update. Claggy, to confirm: I am running the "correct" AMD version of the Lunatics app, albeit a 32-bit not 64-bit one - I must have a dig and see if I can find a 64-bit copy somewhere for both my main crunchers.... Having removed the overclocking (not sure where that came from, but probably a result of the "fun" getting Windoze to install...), the beast looks to be behaving less badly (I won't say "well" until I've seen it rumble through a few more tasks without errors). Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? |
Mike Send message Joined: 17 Feb 01 Posts: 34250 Credit: 79,922,639 RAC: 80 |
What PSU are you using ? With each crime and every kindness we birth our future. |
cdemers Send message Joined: 18 May 99 Posts: 30 Credit: 17,235,002 RAC: 0 |
My system ran 'funny' too till I found the AMD scheduler patches for Windows 7. I have an AMD 8150, which now, after those patches and the latest drivers, has been running smoothly. Another good idea is to run memtest86 (or whatever your favorite memory test program is) if you're seeing things unstable. I had to send back one of the 4GB modules of my 16GB kit because it was bad. |
Tazz Send message Joined: 5 Oct 99 Posts: 137 Credit: 34,342,390 RAC: 0 |
Maybe check to see if the Turbo "feature" is enabled in the BIOS, and while you're there check on the power saving settings too. I had to manually set the clock multipliers and turn some other settings off on my 8150 because the CPU speed was jumping up and down all by itself - even under the full load of S@H. I couldn't find any concrete numbers for the max temp for the 8150 either, but 60 deg was the popular number. When crunching it was running around 61-63. I put a water cooler on it and now I start getting a little concerned when it gets up to 35 degrees. ;) </Tazz> |
Mike Send message Joined: 17 Feb 01 Posts: 34250 Credit: 79,922,639 RAC: 80 |
Depends on the motherboard you are using. Disable C1 and C3 as well as turbo. Check for CPU load line calibration. With each crime and every kindness we birth our future. |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
I know this Error. Here's what BOINC says about it; This error can be the result of a programming error in the code for: Since it's a new machine, running the MemTest wouldn't be a bad idea. However, sometimes even memory tests won't find bad memory. You have to run the machine a while and see if it's a general problem, not associated with a single program. Try not to do it on a Tuesday... |
Ex: "Socialist" Send message Joined: 12 Mar 12 Posts: 3433 Credit: 2,616,158 RAC: 2 |
I also want to say it looks like a RAM issue. But lots of different problems portray themselves with errors like that. #resist |
rob smith Send message Joined: 7 Mar 03 Posts: 22149 Credit: 416,307,556 RAC: 380 |
Thanks for all the advice and tips. It would appear to be a "speed related" issue with the CPU. Having removed the overclocking (which the kind man who built the machine put on for me - unrequested) it has run quite happily overnight. Next time it's down I'll run memtest again (always one of the first things I do on a new PC). Answering the question about the PSU - it's rated at 1500W so should be well within its limits with one GPU on board. Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? |
Mike Send message Joined: 17 Feb 01 Posts: 34250 Credit: 79,922,639 RAC: 80 |
Since you are using an Asus board I suggest adjusting CPU Load line calibration to 40%. Also set CPU phase control to extreme. I guess you have a proggy called AI suite. You can have a look there for those settings. With each crime and every kindness we birth our future. |
Claggy Send message Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4 |
That's not unusual, on my i7-2600K @4.7GHz it only shows the Stock speed: Windows optimized S@H Enhanced application by Alex Kan Claggy |
Claggy Send message Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4 |
Thanks for all the advice and tips. I'm sure a 1500W PSU is fine. Looking at your tasks, you'll want to upgrade to x41zc_Cuda5 as soon as possible: x41g_Cuda32 predates Keplers by some time, and you won't get optimum speed from either x41g or from a Cuda32 app. Grab the files from jgopt.org, unpack them into your project directory, then with Boinc shut down, run aimerge.cmd, check your app_info looks O.K., then restart Boinc. (But you'll also need Cuda5 drivers installed for that, otherwise try the x41zc_Cuda42 version instead.) Claggy |
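For the "check your app_info looks O.K." step, here is a rough sketch of one sanity check that could be run while Boinc is still shut down. It assumes the merged file is named app_info.xml, sits in the project directory next to the application files, and names those files in `<file_name>` elements, which is how the anonymous platform layout is usually described. The helper `missing_app_files` is hypothetical and not part of Boinc or the installer.

```python
import os
import xml.etree.ElementTree as ET


def missing_app_files(app_info_path: str) -> list[str]:
    """Return file names referenced by app_info.xml that are absent from its directory."""
    project_dir = os.path.dirname(os.path.abspath(app_info_path))
    # ET.parse raises xml.etree.ElementTree.ParseError if the merged XML is malformed.
    root = ET.parse(app_info_path).getroot()
    # Assumption: every referenced application file appears in a <file_name> element.
    names = {el.text.strip() for el in root.iter("file_name") if el.text}
    return sorted(
        name for name in names
        if not os.path.exists(os.path.join(project_dir, name))
    )
```

Calling `missing_app_files` with the path to app_info.xml returns an empty list when every referenced file is present. A parse error or a non-empty list would suggest the merge or the unpacking went wrong. It does not check that the app versions match the installed drivers.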
rob smith Send message Joined: 7 Mar 03 Posts: 22149 Credit: 416,307,556 RAC: 380 |
I'm letting it "bed in" with what I've got. I was about to download x41zc when they got pulled from the Lunatics site :-( (Has anyone got a prognosis on when they will be restarting distributing their wares again??) Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? |
Tim Send message Joined: 19 May 99 Posts: 211 Credit: 278,575,259 RAC: 0 |
I'm letting it "bed in" with what I've got. Don’t go to Lunatics. Claggy provided you the link to Jason's site. Go to downloads and… there it is. Tim |
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.