BSOD with Seti, tried reinstall of everything...


-BeNt-
Joined: 17 Oct 99
Posts: 1234
Credit: 10,116,112
RAC: 0
United States
Message 1061162 - Posted: 30 Dec 2010, 1:37:37 UTC

Another thing to keep in mind, and I know it sounds stupid: my anti-virus would not scan while someone was using the computer. I found out that after ~30 minutes of inactivity it would start doing hard drive scans. Once it hit the Seti file it would try to gain exclusive control of the file and cause the machine to BSOD instantly. So make sure you add your BOINC data folder to the exclusion list in your AV solution. Seems Seti@Home and NOD32 don't care too much for each other.

It boggled me for a while, as I could sit at the computer and it would crunch right along for hours. Get up and leave the house, come back: big fat BSOD. After poring through the minidumps from the crashes I finally nailed it down: the BSOD was being caused by a service my AV was using.
____________
Traveling through space at ~67,000mph!

msattler
Volunteer tester
Joined: 9 Jul 00
Posts: 37311
Credit: 499,415,600
RAC: 509,746
United States
Message 1061163 - Posted: 30 Dec 2010, 1:40:14 UTC

And here I thought that BSOD was a service automatically launched by Billy G. to bolster sales.
____________
******************
Crunching Seti, loving all of God's kitties.

I have met a few friends in my life.
Most were cats.

Profile lupo
Joined: 29 Aug 10
Posts: 91
Credit: 4,736,407
RAC: 0
United States
Message 1061358 - Posted: 30 Dec 2010, 10:37:23 UTC

This is how strange this issue is:

1. I can run other CUDA tasks like DNET, max each of the three GPUs at 100% for HOURS, and no BSOD.

2. I can enable SLI and play games for HOURS without the BSOD.

It only seems to be BOINC-related SETI stuff that BSODs the system, after anywhere from 30 minutes to a few hours. So strange.

Profile lupo
Joined: 29 Aug 10
Posts: 91
Credit: 4,736,407
RAC: 0
United States
Message 1061363 - Posted: 30 Dec 2010, 10:41:48 UTC

I tried posting this in another thread but it seems to have been lost:

I want to try downgrading to an earlier version of BOINC but I don't want to lose all my WUs. Can someone please tell me how to save my WUs so I can run them under the earlier version of BOINC once I have it installed, or will this create problems?

Can I just copy the seti folder and move it back?

Profile Helli
Volunteer tester
Joined: 15 Dec 99
Posts: 697
Credit: 77,397,466
RAC: 74,643
Germany
Message 1061364 - Posted: 30 Dec 2010, 10:50:02 UTC - in response to Message 1061363.
Last modified: 30 Dec 2010, 11:16:21 UTC

I tried posting this in another thread but it seems to have been lost:
...


This Thread? ;-)

http://setiathome.berkeley.edu/forum_thread.php?id=62562

..or is it this one?

http://setiathome.berkeley.edu/forum_thread.php?id=62431

Helli
____________

Profile Ageless
Joined: 9 Jun 99
Posts: 12128
Credit: 2,519,827
RAC: 270
Netherlands
Message 1061391 - Posted: 30 Dec 2010, 12:34:54 UTC

Did you ever say which blue screen of death you get? There are quite a few of them, not just one. http://msdn.microsoft.com/en-us/library/ms681381%28v=vs.85%29.aspx will show just a couple, to give you an idea. Many of them indicate a problem on your machine.

If you want specific help, you'll have to tell (some of) us the specific BSOD message. Without it, it's like going to the garage with your car and saying there's a problem with it, while refusing to specify what the problem is, where it is located etc.
____________
Jord

Loving awareness is free.

Profile soft^spirit
Joined: 18 May 99
Posts: 6374
Credit: 28,216,480
RAC: 183
United States
Message 1061471 - Posted: 30 Dec 2010, 16:03:13 UTC - in response to Message 1061363.

I tried posting this in another thread but it seems to have been lost:

I want to try downgrading to an earlier version of BOINC but I don't want to lose all my WUs. Can someone please tell me how to save my WUs so I can run them under the earlier version of BOINC once I have it installed, or will this create problems?

Can I just copy the seti folder and move it back?



I think Steve gave you the right answer to this problem, Lupo. Turn off your hyperthreading if you are crunching SETI. With other projects it seems to work, but SciManStev had the same issue until he turned off hyperthreading.
____________

Janice

-BeNt-
Joined: 17 Oct 99
Posts: 1234
Credit: 10,116,112
RAC: 0
United States
Message 1061496 - Posted: 30 Dec 2010, 16:39:03 UTC

If you can post a link to your mini dump as well we can break it down further than the generic error and tell you exactly what's going on.
____________
Traveling through space at ~67,000mph!

Profile lupo
Joined: 29 Aug 10
Posts: 91
Credit: 4,736,407
RAC: 0
United States
Message 1061880 - Posted: 31 Dec 2010, 11:26:30 UTC

I'll do one better:

I had another group of guys working with me on this. I had posted the mini-dumps to their systems so they could look through them. If you want to see the dump I can find a hosting site and put it there but here's a link to all the info on the dump:


http://www.sevenforums.com/crashes-debugging/133102-new-sudden-bsod-dxgmms1-sys.html#post1145587

-BeNt-
Joined: 17 Oct 99
Posts: 1234
Credit: 10,116,112
RAC: 0
United States
Message 1061883 - Posted: 31 Dec 2010, 11:34:10 UTC

Have you checked the temps on your video card? If you have reinstalled the drivers (the thing causing your BSOD), then it could be the card itself becoming unstable and causing them to fail. Another possibility is a weak power supply. I know all these things have already been stated in this thread, but do you happen to know what amperage the rail(s) on that power supply are rated for? One other thing to keep in mind: it could be a failing video card. Have you tried taking them out and testing one at a time?
____________
Traveling through space at ~67,000mph!

Profile lupo
Joined: 29 Aug 10
Posts: 91
Credit: 4,736,407
RAC: 0
United States
Message 1061885 - Posted: 31 Dec 2010, 11:37:22 UTC

I tried stress testing the GPUs again and ran DNET on the 3 GPUs for 18 hours straight without a single crash. I also tried running BOINC with my anti-virus turned off and it crashed within 20 minutes.

I guess I could try disabling hyperthreading but then I'm losing half of what my 980 can do :(

I also tried disabling that C1 something setting that was suggested on the other forum and that didn't help.

If I run BOINC with the GPUs disabled it can run all 12 cores of the processor constantly without a BSOD.

Like I also said before, it's strange because my XPS system with the Q6600 and 2 GTX 470s runs for days on end without issue.

-BeNt-
Joined: 17 Oct 99
Posts: 1234
Credit: 10,116,112
RAC: 0
United States
Message 1061888 - Posted: 31 Dec 2010, 11:41:49 UTC - in response to Message 1061885.

I tried stress testing the GPUs again and ran DNET on the 3 GPUs for 18 hours straight without a single crash. I also tried running BOINC with my anti-virus turned off and it crashed within 20 minutes.

I guess I could try disabling hyperthreading but then I'm losing half of what my 980 can do :(

I also tried disabling that C1 something setting that was suggested on the other forum and that didn't help.

If I run BOINC with the GPUs disabled it can run all 12 cores of the processor constantly without a BSOD.

Like I also said before, it's strange because my XPS system with the Q6600 and 2 GTX 470s runs for days on end without issue.


Yeah, C1E is the dynamic throttling on the processor. Remember also that 12 threads are not the same as 12 cores; your processor only has 6 cores. Either way, I don't believe disabling HT will hurt or help you, because like you said yourself it runs fine with the GPUs disabled. It's something to do with one of your cards or the power supply. Since you ran stress tests on the cards and they didn't crash the system, I don't believe it's a power supply issue. I think the next logical step would be to test each card one at a time, running BOINC with GPUs enabled. See if any of them makes the machine crash by itself, as Seti induces much higher power and stress loads on a machine than gaming or most benchmarks. Good luck!
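
To make the threads-versus-cores point concrete, here is a minimal Python sketch. It assumes two hardware threads per physical core, which is how Hyper-Threading presents itself on this class of CPU; the function name is made up for illustration.

```python
def physical_cores(logical_threads, threads_per_core=2):
    """Convert the logical thread count the OS reports into physical cores.

    threads_per_core=2 is an assumption (Hyper-Threading enabled);
    pass 1 for a CPU with Hyper-Threading switched off.
    """
    if threads_per_core < 1 or logical_threads % threads_per_core != 0:
        raise ValueError("thread count must be a multiple of threads_per_core")
    return logical_threads // threads_per_core


if __name__ == "__main__":
    # The 12 "cores" BOINC shows with HT on are 12 threads on 6 cores.
    print(physical_cores(12))
    # With HT off, the same chip shows 6 threads on 6 cores.
    print(physical_cores(6, threads_per_core=1))
```

So turning HT off drops the number of tasks BOINC schedules on the CPU from 12 to 6, but the number of physical cores doing the work stays the same.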

____________
Traveling through space at ~67,000mph!

Profile lupo
Joined: 29 Aug 10
Posts: 91
Credit: 4,736,407
RAC: 0
United States
Message 1061898 - Posted: 31 Dec 2010, 12:22:20 UTC - in response to Message 1061888.
Last modified: 31 Dec 2010, 12:23:35 UTC

I can't see how it can be the cards. Google search distributed.net and see what it does. When it's running, it's running all 3 cards at 99-100% and it NEVER crashes. It's doing almost 2 billion keys per sec. I also moved some of the cards to the XPS and there were no issues.

I really think it's nothing with BOINC and this system. Such a strange problem. I configured SETI to run 2 WUs per card and it still only runs each card at 80-90%. Temps are stable at around 70 °C.

If I wanted to run each card individually, how do I configure that in the cc_config file?

Oh, and the PSU is a top-end Antec 1200 W job. I can check my APC UPS and see that it's only drawing about 600 W when everything is maxed.
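
On the cc_config question: a rough sketch, not an official recipe. It assumes the BOINC client in use honours the `<ignore_cuda_dev>` option in cc_config.xml (6.10-era clients; later clients renamed it `<ignore_nvidia_dev>`, so pass that as `tag` if needed), and that the three cards are numbered 0 to 2 the way the client lists them in its startup messages. The helper name `build_cc_config` is hypothetical. It generates a config that hides every GPU except the one under test:

```python
import xml.etree.ElementTree as ET


def build_cc_config(test_device, device_count, tag="ignore_cuda_dev"):
    """Return cc_config.xml text that tells BOINC to ignore every GPU
    except test_device.

    The tag name and zero-based device numbering are assumptions; check
    them against the client's own startup messages before relying on this.
    """
    if not 0 <= test_device < device_count:
        raise ValueError("test_device must be between 0 and device_count - 1")
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    for dev in range(device_count):
        if dev != test_device:
            # One ignore entry per GPU that should sit out this test run.
            ET.SubElement(options, tag).text = str(dev)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Test only device 0 on a three-card system.
    print(build_cc_config(0, 3))
```

Save the output as cc_config.xml in the BOINC data directory, restart the client, let it crunch, then repeat with the next device number to see whether any single card brings the machine down on its own.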

Profile SciManStev
Volunteer tester
Joined: 20 Jun 99
Posts: 4665
Credit: 77,241,149
RAC: 42,111
United States
Message 1061914 - Posted: 31 Dec 2010, 13:19:39 UTC - in response to Message 1061885.

I tried stress testing the GPUs again and ran DNET on the 3 GPUs for 18 hours straight without a single crash. I also tried running BOINC with my anti-virus turned off and it crashed within 20 minutes.

I guess I could try disabling hyperthreading but then I'm losing half of what my 980 can do :(

I also tried disabling that C1 something setting that was suggested on the other forum and that didn't help.

If I run BOINC with the GPUs disabled it can run all 12 cores of the processor constantly without a BSOD.

Like I also said before, it's strange because my XPS system with the Q6600 and 2 GTX 470s runs for days on end without issue.


You are not losing what your 980 can do. Your system should be dominated by the GPUs. I have the same processor, and although it worked fine at Einstein with hyperthreading, it would not work with SETI, regardless of clock speeds. Turn off hyperthreading, and see what happens. It seems to be the combination of some 980 chips and multiple Fermis running SETI. Look at my RAC: hyperthreading is shut off, and I am running only two GPUs. Yours should outperform mine with 3 GPUs. It seems that in this situation, hyperthreading is slowing you down, not speeding you up. My system was crashing constantly until I turned off hyperthreading. Your CPU crunch times will decrease, and you can get more clock speed to make up for the lack of the 12 CPU threads. It is certainly worth an experiment.

Steve
____________
Warning, addicted to SETI crunching!
Crunching as a member of GPU Users Group.
GPUUG Website

Profile lupo
Joined: 29 Aug 10
Posts: 91
Credit: 4,736,407
RAC: 0
United States
Message 1062249 - Posted: 1 Jan 2011, 2:31:15 UTC - in response to Message 1061914.

I'll try turning off HT to see if it helps. For now I just let my old XPS run 4 threads on the CPU plus the 2 GTX 470s, and run SETI on all 12 threads of the new 980X CPU while using its 3 GPUs for DNET. (I can still hit about a billion keys per sec on the GPUs alone.)

Profile lupo
Joined: 29 Aug 10
Posts: 91
Credit: 4,736,407
RAC: 0
United States
Message 1063545 - Posted: 5 Jan 2011, 1:23:08 UTC - in response to Message 1061914.

I tried stress testing the GPUs again and ran DNET on the 3 GPUs for 18 hours straight without a single crash. I also tried running BOINC with my anti-virus turned off and it crashed within 20 minutes.

I guess I could try disabling hyperthreading but then I'm losing half of what my 980 can do :(

I also tried disabling that C1 something setting that was suggested on the other forum and that didn't help.

If I run BOINC with the GPUs disabled it can run all 12 cores of the processor constantly without a BSOD.

Like I also said before, it's strange because my XPS system with the Q6600 and 2 GTX 470s runs for days on end without issue.


You are not losing what your 980 can do. Your system should be dominated by the GPUs. I have the same processor, and although it worked fine at Einstein with hyperthreading, it would not work with SETI, regardless of clock speeds. Turn off hyperthreading, and see what happens. It seems to be the combination of some 980 chips and multiple Fermis running SETI. Look at my RAC: hyperthreading is shut off, and I am running only two GPUs. Yours should outperform mine with 3 GPUs. It seems that in this situation, hyperthreading is slowing you down, not speeding you up. My system was crashing constantly until I turned off hyperthreading. Your CPU crunch times will decrease, and you can get more clock speed to make up for the lack of the 12 CPU threads. It is certainly worth an experiment.

Steve



I can't believe it, but it looks like Steve was right. I disabled HT and I have been running SETI trouble-free without any BSODs. I am missing 6 threads of processing, but at least the CPU's 6 cores work with the 3 GTX 470s without crashing every 20 minutes to 3 hours, so it's a plus.


Now, what would cause this? Is this a bug in the BOINC software? Do you think a later release will fix this?


Thanks,

Adam

Profile arkayn
Volunteer tester
Joined: 14 May 99
Posts: 3544
Credit: 46,161,047
RAC: 30,556
United States
Message 1063554 - Posted: 5 Jan 2011, 1:56:28 UTC - in response to Message 1063545.

This thread has some interesting info in it on the subject.
http://setiathome.berkeley.edu/forum_thread.php?id=62657
____________


Copyright © 2014 University of California