BSOD with Seti, tried reinstall of everything...
Author | Message |
---|---|
-BeNt- Send message Joined: 17 Oct 99 Posts: 1234 Credit: 10,116,112 RAC: 0 |
Another thing to keep in mind, and I know it sounds stupid: my anti-virus would not scan while someone was using the computer. I found out that after ~30 minutes of inactivity it would start doing hard drive scans. Once it hit the SETI file it would try to gain exclusive control of the file and cause the machine to BSOD instantly. So make sure you add your BOINC data folder to the exclusion list in your AV solution. Seems SETI@home and NOD32 don't care too much for each other. It boggled me for a while, as I could sit at the computer and it would crunch right along for hours. Get up and leave the house, come back: big fat BSOD. After poring over the mini dumps from the crashes, I finally nailed it down: the BSOD was being caused by a service my AV was using. Traveling through space at ~67,000mph! |
kittyman Send message Joined: 9 Jul 00 Posts: 51468 Credit: 1,018,363,574 RAC: 1,004 |
And here I thought that BSOD was a service automatically launched by Billy G. to bolster sales. "Freedom is just Chaos, with better lighting." Alan Dean Foster |
lupo Send message Joined: 29 Aug 10 Posts: 91 Credit: 4,736,407 RAC: 0 |
This is how strange this issue is: 1. I can run other CUDA tasks like DNET, max each of the three GPUs at 100% for HOURS and no BSOD. 2. I can enable SLI and play games for HOURS without the BSOD. It only seems to be the BOINC-related SETI stuff that BSODs the system, after anywhere from 30 minutes to a few hours. So strange. |
lupo Send message Joined: 29 Aug 10 Posts: 91 Credit: 4,736,407 RAC: 0 |
I tried posting this in another thread but it seems to have been lost: I want to try downgrading to an earlier version of BOINC but I don't want to lose all my WUs. Can someone please tell me how to save my WUs so I can run them under the earlier version of BOINC once I have it installed, or will this create problems? Can I just copy the SETI folder and move it back? |
Helli_retiered Send message Joined: 15 Dec 99 Posts: 707 Credit: 108,785,585 RAC: 0 |
I tried posting this in another thread but it seems to have been lost: This Thread? ;-) http://setiathome.berkeley.edu/forum_thread.php?id=62562 ..or is it this one? http://setiathome.berkeley.edu/forum_thread.php?id=62431 Helli A loooong time ago: First Credits after SETI@home Restart |
Jord Send message Joined: 9 Jun 99 Posts: 15184 Credit: 4,362,181 RAC: 3 |
Did you ever say what blue screen of death you get? As there are quite a few of them, it's not just one. http://msdn.microsoft.com/en-us/library/ms681381%28v=vs.85%29.aspx will show just a couple, to give you an idea. Many of them indicate a problem on your machine. If you want specific help, you'll have to tell (some of) us the specific BSOD message. Without it, it's like going to the garage with your car and saying there's a problem with it, while refusing to specify what the problem is, where it is located etc. |
soft^spirit Send message Joined: 18 May 99 Posts: 6497 Credit: 34,134,168 RAC: 0 |
I tried posting this in another thread but it seems to have been lost: I think Steve gave you the right answer to this problem, Lupo. Turn off your hyperthreading if you are crunching SETI. With other projects it seems to work, but SciManStev had the same issue until he turned off hyperthreading. Janice |
-BeNt- Send message Joined: 17 Oct 99 Posts: 1234 Credit: 10,116,112 RAC: 0 |
If you can post a link to your mini dump as well, we can break it down further than the generic error and tell you exactly what's going on. Traveling through space at ~67,000mph! |
lupo Send message Joined: 29 Aug 10 Posts: 91 Credit: 4,736,407 RAC: 0 |
I'll do one better: I had another group of guys working with me on this. I had posted the mini-dumps to their systems so they could look through them. If you want to see the dump I can find a hosting site and put it there but here's a link to all the info on the dump: http://www.sevenforums.com/crashes-debugging/133102-new-sudden-bsod-dxgmms1-sys.html#post1145587 |
-BeNt- Send message Joined: 17 Oct 99 Posts: 1234 Credit: 10,116,112 RAC: 0 |
Have you checked the temps on your video card? If you have already reinstalled the drivers (the issue causing your BSOD), then it could be the actual card getting unstable and causing it to fail. Another possibility is a weak power supply. I know all these things have already been stated in this thread, but do you happen to know what amperage the rail(s) in that power supply are rated for? One other thing to keep in mind: it could be a failing video card. Have you tried taking them out and testing one at a time? Traveling through space at ~67,000mph! |
lupo Send message Joined: 29 Aug 10 Posts: 91 Credit: 4,736,407 RAC: 0 |
I tried stress testing the GPUs again and ran DNET on the 3 GPUs for 18 hours straight without a single crash. I also tried running BOINC with my anti-virus turned off and it crashed within 20 minutes. I guess I could try disabling hyperthreading, but then I'm losing half of what my 980 can do :( I also tried disabling that C1 something setting that was suggested on the other forum and that didn't help. If I run BOINC with the GPUs disabled it can run all 12 threads of the processor constantly without a BSOD. Like I also said before, it's strange because my XPS system with the Q6600 and 2 GTX 470s runs for days on end without issue. |
-BeNt- Send message Joined: 17 Oct 99 Posts: 1234 Credit: 10,116,112 RAC: 0 |
I tried stress testing the GPU's again and ran DNET on the 3 GPU's for 18 hours straight without a single crash. I also tried running BOINC with my anti-virus turned off and it crashed within 20 minutes. Yeah, C1E is the dynamic throttling on the processor. Remember also that 12 threads are not the same as 12 cores; your processor only has 6 cores. Either way, though, I don't believe disabling HT will hurt or help you, because like you said yourself it runs fine with the GPUs disabled. It's something to do with one of your cards or the power supply. Since you ran stress tests on the cards and they didn't crash the system, I don't believe it is a power supply issue. I think the next logical step would be to test each card one at a time, running BOINC with GPUs enabled. See if any of them makes the machine crash by itself, as SETI induces much higher power and stress loads on a machine than gaming or most benchmarks. Good luck! Traveling through space at ~67,000mph! |
lupo Send message Joined: 29 Aug 10 Posts: 91 Credit: 4,736,407 RAC: 0 |
I can't see how it can be the cards. Google search distributed.net and see what it does. When it's running it's running all 3 cards at 99-100% and it NEVER crashes. It's doing almost 2 billion keys per sec. I also moved some of the cards to the XPS and there were no issues. I really think it's nothing with BOINC and this system. Such a strange problem. I configured SETI to run 2 WUs per card and it still only runs each card at 80-90%. Temps are stable at around 70°C. If I wanted to run each card individually, how do I configure that in the cc_config file? Oh, and the PSU is a top-end Antec 1200W job. I can check my APC UPS and see that it's only drawing like 600W when everything is maxed. |
SciManStev Send message Joined: 20 Jun 99 Posts: 6652 Credit: 121,090,076 RAC: 0 |
I tried stress testing the GPU's again and ran DNET on the 3 GPU's for 18 hours straight without a single crash. I also tried running BOINC with my anti-virus turned off and it crashed within 20 minutes. You are not losing what your 980 can do. Your system should be dominated by the GPUs. I have the same processor, and although it worked fine at Einstein with hyperthreading, it would not work with SETI, regardless of clock speeds. Turn off hyperthreading, and see what happens. It seems to be the combination of some 980 chips and multiple Fermis running SETI. Look at my RAC: hyperthreading is shut off, and I am running only two GPUs. Yours should outperform mine with 3 GPUs. It seems that in this situation, hyperthreading is slowing you down, not speeding you up. My system was crashing constantly until I turned off hyperthreading. Your CPU crunch times will decrease, and you can get more clock speed to make up for the lack of the 12 CPU threads. It is certainly worth an experiment. Steve Warning, addicted to SETI crunching! Crunching as a member of GPU Users Group. GPUUG Website |
lupo Send message Joined: 29 Aug 10 Posts: 91 Credit: 4,736,407 RAC: 0 |
I'll try turning off HT to see if it helps. For now I just let my old XPS run 4 threads on the CPU and 2 GTX 470s, and run SETI on all 12 threads on the new 980X CPU and use the 3 GPUs for DNET. (I can still hit about a billion keys per sec on the GPUs alone.) |
lupo Send message Joined: 29 Aug 10 Posts: 91 Credit: 4,736,407 RAC: 0 |
I tried stress testing the GPU's again and ran DNET on the 3 GPU's for 18 hours straight without a single crash. I also tried running BOINC with my anti-virus turned off and it crashed within 20 minutes. I can't believe it, but it looks like Steve was right. I disabled HT and I have been running SETI trouble-free without any BSODs. I am missing 6 threads of processing, but at least the CPU's 6 cores work with the 3 GTX 470s without crashing every 20 minutes to 3 hours, so it's a plus. Now, what would cause this? Is this a bug in the BOINC software? Do you think a later release will fix this? Thanks, Adam |
arkayn Send message Joined: 14 May 99 Posts: 4438 Credit: 55,006,323 RAC: 0 |
This thread has some interesting info in it on the subject. http://setiathome.berkeley.edu/forum_thread.php?id=62657 |
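
On lupo's earlier question about running each card individually through cc_config: nobody answered it in this thread, so here is a rough sketch rather than a confirmed recipe. It assumes the client honours an `<ignore_nvidia_dev>` option inside `<options>` in `cc_config.xml` (older 6.x clients are understood to have called it `<ignore_cuda_dev>`), that the file lives in the BOINC data directory, and that the three cards show up as devices 0, 1 and 2 in the client's startup messages. The helper name `build_cc_config` is made up for illustration. Check the option name against the documentation for your client version before relying on it.

```python
def build_cc_config(total_gpus, keep, option="ignore_nvidia_dev"):
    """Return cc_config.xml text that ignores every GPU except device `keep`.

    `option` is an assumption about the client version: newer clients are
    believed to use ignore_nvidia_dev, older 6.x clients ignore_cuda_dev.
    """
    if not 0 <= keep < total_gpus:
        raise ValueError("keep must be a valid device index")
    lines = ["<cc_config>", "  <options>"]
    for dev in range(total_gpus):
        if dev != keep:
            # One ignore entry per card that should sit idle for this test run.
            lines.append(f"    <{option}>{dev}</{option}>")
    lines += ["  </options>", "</cc_config>"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Three cards installed (assumed indices 0..2); test card 0 on its own.
    print(build_cc_config(total_gpus=3, keep=0))
```

Save the output as `cc_config.xml`, restart the client, let it crunch until it either crashes or survives well past the usual 20 minutes to a few hours, then repeat with `keep=1` and `keep=2`.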