BSOD with Seti, tried reinstall of everything...

Message boards : Number crunching : BSOD with Seti, tried reinstall of everything...

-BeNt-
Joined: 17 Oct 99
Posts: 1234
Credit: 10,116,112
RAC: 0
United States
Message 1061162 - Posted: 30 Dec 2010, 1:37:37 UTC

Another thing to keep in mind, and I know it sounds stupid: my anti-virus would not scan while someone was using the computer. I found out that after ~30 minutes of inactivity it would start doing hard drive scans. Once it hit the SETI file, it would try to gain exclusive control of it and the machine would BSOD instantly. So make sure you add your BOINC data folder to the exclusion list in your AV solution. It seems SETI@home and NOD32 don't care too much for each other.

It boggled me for a while, as I could sit at the computer and it would crunch right along for hours, but if I got up and left the house, I'd come back to a big fat BSOD. After poring over the minidumps from the crashes, I finally nailed it down: the BSOD was being caused by a service my AV was using.
Traveling through space at ~67,000mph!
kittyman
Project Donor
Volunteer tester
Joined: 9 Jul 00
Posts: 45941
Credit: 815,390,475
RAC: 124,988
United States
Message 1061163 - Posted: 30 Dec 2010, 1:40:14 UTC

And here I thought that BSOD was a service automatically launched by Billy G. to bolster sales.
Always remember.....kitties are all Angels with fur.

Have made friends in this life.
Most were cats.
lupo
Joined: 29 Aug 10
Posts: 91
Credit: 4,736,407
RAC: 0
United States
Message 1061358 - Posted: 30 Dec 2010, 10:37:23 UTC

This is how strange this issue is:

1. I can run other CUDA tasks like DNET and max each of the three GPUs at 100% for HOURS with no BSOD.

2. I can enable SLI and play games for HOURS without a BSOD.

It only seems to be BOINC-related SETI work that BSODs the system, after anywhere from 30 minutes to a few hours. So strange.
lupo
Joined: 29 Aug 10
Posts: 91
Credit: 4,736,407
RAC: 0
United States
Message 1061363 - Posted: 30 Dec 2010, 10:41:48 UTC

I tried posting this in another thread but it seems to have been lost:

I want to try downgrading to an earlier version of BOINC, but I don't want to lose all my WUs. Can someone please tell me how to save my WUs so I can run them under the earlier version of BOINC once I have it installed, or will this create problems?

Can I just copy the SETI folder and move it back?
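For anyone with the same question, here is a minimal sketch of a safer backup than copying only the SETI folder. It assumes the default Windows Vista/7 data directory `C:\ProgramData\BOINC` and a backup destination in your home folder (both assumptions, adjust as needed), and it assumes BOINC has been fully exited first. It copies the whole data directory, because `client_state.xml` at the top level is where the client records the tasks it knows about. Whether an older client will read a newer client's state file cleanly is not something this thread confirms.

```python
import shutil
import time
from pathlib import Path

# Assumed locations (not from the thread): default BOINC data directory on
# Windows Vista/7 and a backup destination in the user's home folder.
DATA_DIR = Path(r"C:\ProgramData\BOINC")
BACKUP_ROOT = Path.home()


def backup_boinc_data(data_dir: Path, backup_root: Path) -> Path:
    """Copy the whole BOINC data directory to a timestamped folder.

    client_state.xml sits at the top level and records the tasks the client
    knows about, so copying only the projects/ folder would not be enough.
    Exit BOINC completely before running this so no files are in use.
    """
    if not (data_dir / "client_state.xml").is_file():
        raise FileNotFoundError(f"no client_state.xml in {data_dir}")
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = backup_root / f"BOINC-backup-{stamp}"
    shutil.copytree(data_dir, target)  # raises if target already exists
    return target


if __name__ == "__main__":
    try:
        print("Backed up to", backup_boinc_data(DATA_DIR, BACKUP_ROOT))
    except FileNotFoundError as err:
        print("Nothing copied:", err)
```

To restore, exit BOINC again and copy the backup folder's contents back over the data directory before starting the client.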

Helli_retiered
Volunteer tester
Joined: 15 Dec 99
Posts: 707
Credit: 108,785,585
RAC: 0
Germany
Message 1061364 - Posted: 30 Dec 2010, 10:50:02 UTC - in response to Message 1061363.  
Last modified: 30 Dec 2010, 11:16:21 UTC

I tried posting this in another thread but it seems to have been lost:
...


This Thread? ;-)

http://setiathome.berkeley.edu/forum_thread.php?id=62562

..or is it this one?

http://setiathome.berkeley.edu/forum_thread.php?id=62431

Helli
A loooong time ago: First Credits after SETI@home Restart
Ageless
Joined: 9 Jun 99
Posts: 13822
Credit: 3,269,733
RAC: 0
Netherlands
Message 1061391 - Posted: 30 Dec 2010, 12:34:54 UTC

Did you ever say which blue screen of death you get? There are quite a few of them, not just one. http://msdn.microsoft.com/en-us/library/ms681381%28v=vs.85%29.aspx shows just a couple, to give you an idea. Many of them indicate a problem on your machine.

If you want specific help, you'll have to tell (some of) us the specific BSOD message. Without it, it's like going to the garage with your car and saying there's a problem with it, while refusing to specify what the problem is, where it is located etc.
Jord

Ancient Astronaut Theorists suggest that in many ways, you can be considered an alien conspiracy!
soft^spirit
Joined: 18 May 99
Posts: 6438
Credit: 31,847,538
RAC: 6,683
United States
Message 1061471 - Posted: 30 Dec 2010, 16:03:13 UTC - in response to Message 1061363.  

I think Steve gave you the right answer to this problem, Lupo. Turn off your Hyper-Threading if you are crunching SETI. With other projects it seems to work, but SciManStev had the same issue until he turned off Hyper-Threading.

Janice
-BeNt-
Joined: 17 Oct 99
Posts: 1234
Credit: 10,116,112
RAC: 0
United States
Message 1061496 - Posted: 30 Dec 2010, 16:39:03 UTC

If you can post a link to your minidump as well, we can break it down further than the generic error and tell you exactly what's going on.
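If you are not sure where the minidumps end up, here is a small hypothetical helper that lists the newest ones so you know which files to upload. It assumes Windows writes small memory dumps to `%SystemRoot%\Minidump` (normally `C:\Windows\Minidump`) when "Small memory dump" is selected under Startup and Recovery; reading that folder may need an elevated prompt.

```python
import os
from pathlib import Path

# Assumed default location: %SystemRoot%\Minidump, normally C:\Windows\Minidump.
MINIDUMP_DIR = Path(os.environ.get("SystemRoot", r"C:\Windows")) / "Minidump"


def newest_minidumps(folder: Path, count: int = 3) -> list[Path]:
    """Return up to `count` .dmp files from `folder`, newest first."""
    if not folder.is_dir():
        return []
    dumps = [p for p in folder.glob("*.dmp") if p.is_file()]
    dumps.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return dumps[:count]


if __name__ == "__main__":
    found = newest_minidumps(MINIDUMP_DIR)
    if not found:
        print("No minidumps found in", MINIDUMP_DIR)
    for dump in found:
        print(dump.name, dump.stat().st_size, "bytes")
```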
lupo
Joined: 29 Aug 10
Posts: 91
Credit: 4,736,407
RAC: 0
United States
Message 1061880 - Posted: 31 Dec 2010, 11:26:30 UTC

I'll do one better:

I had another group of guys working with me on this. I had posted the minidumps on their forum so they could look through them. If you want to see the dump, I can find a hosting site and put it there, but here's a link to all the info on the dump:


http://www.sevenforums.com/crashes-debugging/133102-new-sudden-bsod-dxgmms1-sys.html#post1145587

-BeNt-
Joined: 17 Oct 99
Posts: 1234
Credit: 10,116,112
RAC: 0
United States
Message 1061883 - Posted: 31 Dec 2010, 11:34:10 UTC

Have you checked the temps on your video cards? If you have reinstalled the drivers (the issue named in your BSOD), then it could be the card itself becoming unstable and failing. Another possibility is a weak power supply. I know all these things have already been stated in this thread, but do you happen to know the amperage of the rail(s) on that power supply? One other thing to keep in mind: it could be a failing video card. Have you tried taking them out and testing them one at a time?
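To put rough numbers on the rail question, here is a minimal sketch of the I = P / V arithmetic. The wattages are assumptions for illustration, not measurements from this thread: about 215 W per GTX 470 (NVIDIA's rated board power) and 130 W for an i7-980X (Intel's TDP). It treats the whole load as 12 V, which slightly overstates the 12 V current.

```python
# Rough 12 V current estimate for a multi-GPU cruncher (illustrative only).
# Assumed figures, not measurements from this thread: ~215 W per GTX 470
# (rated board power) and 130 W for an i7-980X (TDP).
# Treating the whole load as 12 V slightly overstates the 12 V current.

RAIL_VOLTS = 12.0


def amps_at(watts: float, volts: float = RAIL_VOLTS) -> float:
    """Current needed to deliver `watts` at `volts` (I = P / V)."""
    return watts / volts


if __name__ == "__main__":
    gpus = 3 * 215.0  # three GTX 470s at rated board power (assumption)
    cpu = 130.0       # i7-980X TDP (assumption)
    print(f"GPUs alone: {amps_at(gpus):.1f} A at 12 V")
    print(f"GPUs + CPU: {amps_at(gpus + cpu):.1f} A at 12 V")
```

Compare those totals with the 12 V ratings printed on the PSU label. On a multi-rail unit each rail has its own limit, so how the PCIe connectors are spread across the rails matters as much as the total.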
lupo
Joined: 29 Aug 10
Posts: 91
Credit: 4,736,407
RAC: 0
United States
Message 1061885 - Posted: 31 Dec 2010, 11:37:22 UTC

I tried stress testing the GPUs again and ran DNET on the 3 GPUs for 18 hours straight without a single crash. I also tried running BOINC with my anti-virus turned off, and it crashed within 20 minutes.

I guess I could try disabling Hyper-Threading, but then I'm losing half of what my 980 can do :(

I also tried disabling that C1 something setting that was suggested on the other forum, and that didn't help.

If I run BOINC with the GPUs disabled, it can run all 12 cores of the processor constantly without a BSOD.

Like I said before, it's strange, because my XPS system with the Q6600 and 2 GTX 470s runs for days on end without issue.
-BeNt-
Joined: 17 Oct 99
Posts: 1234
Credit: 10,116,112
RAC: 0
United States
Message 1061888 - Posted: 31 Dec 2010, 11:41:49 UTC - in response to Message 1061885.  

Yeah, C1E is the dynamic throttling on the processor. Remember also that 12 threads are not the same as 12 cores; your processor only has 6 cores. Either way, though, I don't believe disabling HT will help or hurt you, because, like you said yourself, it runs fine with the GPUs disabled. It's something to do with one of your cards or the power supply. Since you ran stress tests on the cards and they didn't crash the system, I don't believe it's a power supply issue. I think the next logical step would be to test each card one at a time, running BOINC with GPUs enabled. See if any of them makes the machine crash by itself, as SETI puts much higher power and stress loads on a machine than gaming or most benchmarks. Good luck!
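The threads-versus-cores point can be checked on the machine itself. Python's `os.cpu_count()` reports logical processors, so a 980X shows 12 with Hyper-Threading on and 6 with it off. The sketch below is illustrative; the two-threads-per-core constant is an assumption that holds for Intel Hyper-Threading parts of that era, not something the OS reports here.

```python
import os

THREADS_PER_CORE_WITH_HT = 2  # assumption for Intel Hyper-Threading CPUs


def physical_cores(logical: int, ht_enabled: bool) -> int:
    """Estimate physical cores from a logical-processor count and HT state."""
    if not ht_enabled:
        return logical
    if logical % THREADS_PER_CORE_WITH_HT:
        raise ValueError("odd logical count; HT is probably not enabled")
    return logical // THREADS_PER_CORE_WITH_HT


if __name__ == "__main__":
    # os.cpu_count() counts logical processors (it may return None).
    print("logical processors on this machine:", os.cpu_count())
    print("980X with HT on, physical cores:", physical_cores(12, ht_enabled=True))
```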

lupo
Joined: 29 Aug 10
Posts: 91
Credit: 4,736,407
RAC: 0
United States
Message 1061898 - Posted: 31 Dec 2010, 12:22:20 UTC - in response to Message 1061888.  
Last modified: 31 Dec 2010, 12:23:35 UTC

I can't see how it can be the cards. Google distributed.net and see what it does. When it's running, it runs all 3 cards at 99-100% and it NEVER crashes. It's doing almost 2 billion keys per sec. I also moved some of the cards to the XPS and there were no issues.

I really think it's something with BOINC and this system. Such a strange problem. I configured SETI to run 2 WUs per card and it still only runs each card at 80-90%. Temps are stable at around 70 °C.

If I wanted to run each card individually, how would I configure that in the cc_config.xml file?
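One way to do the one-card-at-a-time test is to tell the client to ignore every GPU except one. This is a hypothetical helper that writes such a cc_config.xml. Assumptions: the devices are numbered 0, 1, 2 as the BOINC event log lists them at startup, and the 6.x clients of this period call the option `<ignore_cuda_dev>` (later clients use `<ignore_nvidia_dev>`), so check the client configuration documentation for your version before relying on it.

```python
import xml.etree.ElementTree as ET


def config_for_single_gpu(keep: int, total: int, tag: str = "ignore_cuda_dev") -> str:
    """Return cc_config.xml text that ignores every GPU except device `keep`.

    `tag` defaults to the option name assumed for 6.x clients; pass
    "ignore_nvidia_dev" for newer clients.
    """
    if not 0 <= keep < total:
        raise ValueError(f"keep must be between 0 and {total - 1}")
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    for dev in range(total):
        if dev != keep:
            ET.SubElement(options, tag).text = str(dev)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # One config per card, for three GPUs numbered 0..2 (assumption).
    for keep in range(3):
        print(f"keep GPU {keep}:", config_for_single_gpu(keep, total=3))
```

Save one of the outputs as cc_config.xml in the BOINC data directory (merging it into any existing file rather than overwriting), restart the client, and repeat with the next card.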

Oh, and the PSU is a top-end Antec 1200 W unit. I can check my APC UPS and see that it's only drawing about 600 W when everything is maxed.
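The UPS reading is AC draw at the wall, which includes the PSU's own losses and anything else plugged into the UPS, so the DC load on the supply is somewhat lower. A back-of-envelope sketch, where the 85 % efficiency is an assumed figure for illustration and not the rating of this particular Antec unit:

```python
# Estimate PSU loading from a wall (UPS) reading. The efficiency value is an
# assumption for illustration, not the rating of any specific power supply.


def dc_load(wall_watts: float, efficiency: float) -> float:
    """Estimate DC output power from AC input power."""
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    return wall_watts * efficiency


def load_fraction(wall_watts: float, efficiency: float, rated_watts: float) -> float:
    """Share of the PSU's rated DC output in use."""
    return dc_load(wall_watts, efficiency) / rated_watts


if __name__ == "__main__":
    wall, eff, rated = 600.0, 0.85, 1200.0  # 0.85 is an assumed efficiency
    print(f"estimated DC load: {dc_load(wall, eff):.0f} W")
    print(f"share of rating:   {load_fraction(wall, eff, rated):.0%}")
```

A total well under the rating fits the view that the PSU is not the culprit, though on a multi-rail supply the total alone says nothing about how the load is split across rails.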
SciManStev
Volunteer tester
Joined: 20 Jun 99
Posts: 5851
Credit: 105,983,732
RAC: 1,699
United States
Message 1061914 - Posted: 31 Dec 2010, 13:19:39 UTC - in response to Message 1061885.  

You are not losing what your 980 can do. Your system's output should be dominated by the GPUs. I have the same processor, and although it worked fine at Einstein with Hyper-Threading, it would not work with SETI, regardless of clock speeds. Turn off Hyper-Threading and see what happens. It seems to be the combination of some 980 chips and multiple Fermis running SETI. Look at my RAC: Hyper-Threading is shut off, and I am running only two GPUs. Yours should outperform mine with 3 GPUs. It seems that in this situation Hyper-Threading is slowing you down, not speeding you up. My system was crashing constantly until I turned it off. Your CPU crunch times will decrease, and you can get more clock speed to make up for not having 12 CPU threads. It is certainly worth an experiment.

Steve
Warning, addicted to SETI crunching!
Crunching as a member of GPU Users Group.
GPUUG Website
lupo
Joined: 29 Aug 10
Posts: 91
Credit: 4,736,407
RAC: 0
United States
Message 1062249 - Posted: 1 Jan 2011, 2:31:15 UTC - in response to Message 1061914.  

I'll try turning off HT to see if it helps. For now I just let my old XPS run 4 threads on the CPU and the 2 GTX 470s, run SETI on all 12 threads of the new 980X CPU, and use the 3 GPUs for DNET. (I can still hit about a billion keys per sec on the GPUs alone.)
lupo
Joined: 29 Aug 10
Posts: 91
Credit: 4,736,407
RAC: 0
United States
Message 1063545 - Posted: 5 Jan 2011, 1:23:08 UTC - in response to Message 1061914.  

I can't believe it, but it looks like Steve was right. I disabled HT and I have been running SETI trouble-free without any BSODs. I am missing 6 threads of processing, but at least the CPU's 6 cores work with the 3 GTX 470s without crashing every 20-3 hours, so it's a plus.


Now, what would cause this? Is this a bug in the BOINC software? Do you think a later release will fix this?


Thanks,

Adam

Profile arkaynProject Donor
Volunteer tester
Joined: 14 May 99
Posts: 4098
Credit: 51,576,341
RAC: 968
United States
Message 1063554 - Posted: 5 Jan 2011, 1:56:28 UTC - in response to Message 1063545.  

This thread has some interesting info in it on the subject.
http://setiathome.berkeley.edu/forum_thread.php?id=62657

©2016 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.