Help needed

Profile Bill G Special Project $75 donor
Joined: 1 Jun 01
Posts: 1282
Credit: 187,688,550
RAC: 182
United States
Message 1469981 - Posted: 28 Jan 2014, 21:35:06 UTC - in response to Message 1469629.  
Last modified: 28 Jan 2014, 21:44:57 UTC

[quote]BTW, the last time I saw an AMD processor behaving as yours, it was due to overclocking. The person claimed someone else built the computer for him and he didn't even know the CPU was overclocked. I assume sometimes mixing certain hardware may also result in clocks being different than assumed. My Intel Board will set the CPU clock to 3 GHz all by itself, even though the CPU is rated at 2.4 GHz. Other than that, maybe the CPU is just defective...[/quote]

This is a different CPU from before. Both seem to be giving the same errors.
(I was certain I had replied yesterday)

SETI@home classic workunits 4,019
SETI@home classic CPU time 34,348 hours
ID: 1469981 · Report as offensive
Profile Bill G Special Project $75 donor
Joined: 1 Jun 01
Posts: 1282
Credit: 187,688,550
RAC: 182
United States
Message 1469985 - Posted: 28 Jan 2014, 21:43:14 UTC - in response to Message 1469981.  

OK, what has happened now?????

I was running DPC, which was showing orange and saying there was a problem with some driver, when I lost contact with the computer. The mouse would not respond and then the keyboard only partially worked, so I was unable to do much. In my despair I decided to just reload the same image I had put on this SSD the day before. All went well, except that I now have a new computer with the totals from the old one, but all the work that is being done on it is not adding to the total work.
This: http://setiathome.berkeley.edu/results.php?hostid=7200897 shows that I have 101 tasks in progress and I do not. I have turned off New Tasks and will soon be down to no tasks in progress.
I really feel stupid at this point, especially because I cannot tell if I am still producing errors.

SETI@home classic workunits 4,019
SETI@home classic CPU time 34,348 hours
ID: 1469985 · Report as offensive
Profile Wiggo
Joined: 24 Jan 00
Posts: 34744
Credit: 261,360,520
RAC: 489
Australia
Message 1470010 - Posted: 28 Jan 2014, 22:18:57 UTC

Have you run Memtest86+ on that rig yet?

Have you tried a totally fresh install of the OS? The problem could be in the image that you are using.

Have you considered that the mobo may be your problem?

Cheers.
ID: 1470010 · Report as offensive
Profile Bill G Special Project $75 donor
Joined: 1 Jun 01
Posts: 1282
Credit: 187,688,550
RAC: 182
United States
Message 1470087 - Posted: 29 Jan 2014, 3:40:26 UTC - in response to Message 1470010.  

[quote]Have you run Memtest86+ on that rig yet?

Have you tried a totally fresh install of the OS? The problem could be in the image that you are using.

Have you considered that the mobo may be your problem?

Cheers.[/quote]

Yes, I ran memtest and it ran clean.

No, I have not done a new install of the OS as I do not have a disk.

I do not think it is the MB, as the problem started with the previous MB and processor, which I changed out at the end of last week. At this time this is a new HD, new MB, new CPU and a different GPU as well.

SETI@home classic workunits 4,019
SETI@home classic CPU time 34,348 hours
ID: 1470087 · Report as offensive
Profile Wiggo
Joined: 24 Jan 00
Posts: 34744
Credit: 261,360,520
RAC: 489
Australia
Message 1470088 - Posted: 29 Jan 2014, 3:47:55 UTC

So the only thing in common then is the OS image that you're using?

Cheers.
ID: 1470088 · Report as offensive
Profile jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1470097 - Posted: 29 Jan 2014, 4:32:20 UTC - in response to Message 1469985.  
Last modified: 29 Jan 2014, 4:37:52 UTC

[quote]OK, what has happened now?????

I was running DPC, which was showing orange and saying there was a problem with some driver, when I lost contact with the computer. The mouse would not respond and then the keyboard only partially worked, so I was unable to do much. In my despair I decided to just reload the same image I had put on this SSD the day before. All went well, except that I now have a new computer with the totals from the old one, but all the work that is being done on it is not adding to the total work.
This: http://setiathome.berkeley.edu/results.php?hostid=7200897 shows that I have 101 tasks in progress and I do not. I have turned off New Tasks and will soon be down to no tasks in progress.
I really feel stupid at this point, especially because I cannot tell if I am still producing errors.[/quote]


Hmmm, the orange is enough warning to figure out there is some system issue hiding there (as if we didn't know ;) ). Is it on Wi-Fi? If so, what card/chip? If not, is it the motherboard's network port? Which one? In each case the driver, motherboard chipset or other, could need an update. Some Wi-Fi adaptors need extra treatment, just as chipset, PCIe and HDD/SSD controller drivers sometimes need some finagling.

Side issue: I'm not willing to rule out RAM setting issues despite the solid memtest run. What's the rated voltage for the memory? Is it running with a profile, some AMD equivalent of Intel's XMP? I'm unfamiliar with the AMD setup and terminology for that, though electrically there should be some setting equivalent to Vtt, the controller voltage, which can be dicey to set properly with high-performance memory and can cause weird stability issues when left at defaults or auto.

If it's not RAM/memory-controller settings, and if it's not the network freaking out the system DPCs, then tracking down the specific device/driver can be a challenge, at which point the more detailed info from LatencyMon can be handy.

So it'll be a matter of listing what you've tested/tried so far, and filling in some information holes. The DPC issue and disconnect could well be either a clue, or just another symptom of an underlying issue.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1470097 · Report as offensive
Profile ausymark

Joined: 9 Aug 99
Posts: 95
Credit: 10,175,128
RAC: 0
Australia
Message 1470157 - Posted: 29 Jan 2014, 7:39:33 UTC - in response to Message 1469526.  

I would suggest you get an installable version of the OS that you are using, not an image. I have often seen Windows XYZ insist that driver X is for hardware Y just because of the image used and not having the OS installed fresh. My guess is you have the wrong driver for a piece of hardware, especially since you seem to have changed pretty much everything. Spend the $$ and get an installable version of your OS, or use Linux - it's free after all ;)

Cheers

Mark
ID: 1470157 · Report as offensive
Profile Wiggo
Joined: 24 Jan 00
Posts: 34744
Credit: 261,360,520
RAC: 489
Australia
Message 1470174 - Posted: 29 Jan 2014, 8:22:08 UTC - in response to Message 1470157.  

[quote]I would suggest you get an installable version of the OS that you are using, not an image. I have often seen Windows XYZ insist that driver X is for hardware Y just because of the image used and not having the OS installed fresh. My guess is you have the wrong driver for a piece of hardware, especially since you seem to have changed pretty much everything. Spend the $$ and get an installable version of your OS, or use Linux - it's free after all ;)

Cheers

Mark[/quote]

Exactly my thoughts as well with the image.

Cheers.
ID: 1470174 · Report as offensive
Profile Bill G Special Project $75 donor
Joined: 1 Jun 01
Posts: 1282
Credit: 187,688,550
RAC: 182
United States
Message 1470727 - Posted: 30 Jan 2014, 14:58:00 UTC - in response to Message 1470097.  
Last modified: 30 Jan 2014, 15:11:29 UTC

[quote]So it'll be a matter of listing what you've tested/tried so far, and filling in some information holes. The DPC issue and disconnect could well be either a clue, or just another symptom of an underlying issue.[/quote]
Here is what I get: https://skydrive.live.com/embed?cid=E0209B0281D305E1&resid=E0209B0281D305E1%211472&authkey=AOfri-p51OAPPNE

So far I have tried almost everything that I can think of in Device Manager, disabling everything that I can, and it has made no difference. The program certainly shows something wrong here.

SETI@home classic workunits 4,019
SETI@home classic CPU time 34,348 hours
ID: 1470727 · Report as offensive
Profile jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1470746 - Posted: 30 Jan 2014, 15:41:29 UTC - in response to Message 1470727.  
Last modified: 30 Jan 2014, 15:43:28 UTC

Yep. I'm inclined to read that along the lines of the other guys' suggestions: whether for image or hardware change reasons or whatever, things have gone wacko.

In the picture, the DirectX kernel taking its time is more or less to be expected. On the other hand, the 'highest reported DPC execution time' shows as tcpip.sys, which is part of Windows' networking. As a test, my suggestion is to disable the network adaptor and take another scan. My guess is it could then show clear. You need networking of course, but isolation is the goal here.
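
To make the isolation test a bit more concrete, here is a minimal Python sketch of comparing two scans, one taken with the network adaptor enabled and one with it disabled. Everything in it is an assumption for illustration: the readings are made up, the 1000 µs spike threshold is arbitrary, and the function names are hypothetical. It assumes the latency readings have been noted down by hand, not exported from any tool.

```python
from statistics import median

# Hypothetical threshold: readings above this many microseconds count as spikes.
SPIKE_THRESHOLD_US = 1000


def summarize(samples_us):
    """Return the median, maximum and spike count for one scan."""
    spikes = [s for s in samples_us if s > SPIKE_THRESHOLD_US]
    return {"median": median(samples_us), "max": max(samples_us), "spikes": len(spikes)}


def device_implicated(enabled_scan, disabled_scan):
    """True if spikes seen with the device enabled vanish when it is disabled."""
    before = summarize(enabled_scan)
    after = summarize(disabled_scan)
    return before["spikes"] > 0 and after["spikes"] == 0


# Made-up readings in microseconds, noted by hand from two scans.
adaptor_on = [120, 135, 2400, 140, 3100, 128]
adaptor_off = [118, 131, 125, 140, 122, 129]

print(summarize(adaptor_on))
print(device_implicated(adaptor_on, adaptor_off))
```

If the spikes remain with the adaptor disabled, the function returns False and the search moves on to the next device.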
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1470746 · Report as offensive
Profile Bill G Special Project $75 donor
Joined: 1 Jun 01
Posts: 1282
Credit: 187,688,550
RAC: 182
United States
Message 1471329 - Posted: 31 Jan 2014, 21:06:23 UTC - in response to Message 1470746.  

[quote]Yep. I'm inclined to read that along the lines of the other guys' suggestions: whether for image or hardware change reasons or whatever, things have gone wacko.

In the picture, the DirectX kernel taking its time is more or less to be expected. On the other hand, the 'highest reported DPC execution time' shows as tcpip.sys, which is part of Windows' networking. As a test, my suggestion is to disable the network adaptor and take another scan. My guess is it could then show clear. You need networking of course, but isolation is the goal here.[/quote]

Just an update here: I am now running Windows 8 on a fresh install, and of course I have a new computer in BOINC http://setiathome.berkeley.edu/results.php?hostid=7202896

I turned off the network adaptor and then tried a scan, and it was the same. I seem to be getting a lot of inconclusives, but they seem to go to valid after a third person does the WU. I seem to be getting credit even when I do not agree with the other two results.

My feeling at this point is that it is something in the MB, probably in the BIOS, that I am just missing. I have to keep at it and perhaps something will finally flatten the graph that I continue to get.

I hope for a resolution but, as I said, this is a new MB and the problem certainly seemed to follow to this one. It is a shame that three people have to process the WU when two should be enough.

SETI@home classic workunits 4,019
SETI@home classic CPU time 34,348 hours
ID: 1471329 · Report as offensive
Profile ausymark

Joined: 9 Aug 99
Posts: 95
Credit: 10,175,128
RAC: 0
Australia
Message 1471428 - Posted: 1 Feb 2014, 2:48:45 UTC - in response to Message 1471329.  
Last modified: 1 Feb 2014, 2:50:05 UTC

Hi Bill G

I doubt it's something in the BIOS, unless the system is overclocked, as generally all BIOS functions are taken over by the OS. Though if overclocking was failing, then the system would likely be having other issues like lockups/blue screens of death, random minor software/hardware error messages etc.

Cheers

Mark
ID: 1471428 · Report as offensive
Profile jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1471505 - Posted: 1 Feb 2014, 9:54:36 UTC - in response to Message 1471329.  
Last modified: 1 Feb 2014, 9:56:50 UTC

[quote][quote]Yep. I'm inclined to read that along the lines of the other guys' suggestions: whether for image or hardware change reasons or whatever, things have gone wacko.

In the picture, the DirectX kernel taking its time is more or less to be expected. On the other hand, the 'highest reported DPC execution time' shows as tcpip.sys, which is part of Windows' networking. As a test, my suggestion is to disable the network adaptor and take another scan. My guess is it could then show clear. You need networking of course, but isolation is the goal here.[/quote]

Just an update here: I am now running Windows 8 on a fresh install, and of course I have a new computer in BOINC http://setiathome.berkeley.edu/results.php?hostid=7202896

I turned off the network adaptor and then tried a scan, and it was the same. I seem to be getting a lot of inconclusives, but they seem to go to valid after a third person does the WU. I seem to be getting credit even when I do not agree with the other two results.

My feeling at this point is that it is something in the MB, probably in the BIOS, that I am just missing. I have to keep at it and perhaps something will finally flatten the graph that I continue to get.

I hope for a resolution but, as I said, this is a new MB and the problem certainly seemed to follow to this one. It is a shame that three people have to process the WU when two should be enough.[/quote]


As you've got a more or less vanilla Win8 install now (?), I'd be inclined to repeatedly run LatencyMon, dig into the added information available there to work out which devices are probably performing 'OK' and which might need looking at. If there's no consistency there on multiple runs, then the problem could well be lower down than software/drivers.

If nothing specific pops up, or only an odd mixture of suspects, it could be some deeper hardware/BIOS thing, but then you'd expect other failures too.
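
As a rough illustration of checking for consistency across repeated runs, here is a small Python sketch that keeps only the drivers flagged in every run. The run data is made up, the function name is hypothetical, and it assumes you have written down by hand which driver modules each run pointed at.

```python
from collections import Counter


def consistent_suspects(runs):
    """Return drivers that were flagged in every run, sorted by name."""
    counts = Counter(driver for run in runs for driver in set(run))
    return sorted(d for d, n in counts.items() if n == len(runs))


# Made-up example: driver modules flagged in three separate runs.
runs = [
    ["tcpip.sys", "dxgkrnl.sys", "storport.sys"],
    ["dxgkrnl.sys", "tcpip.sys"],
    ["tcpip.sys", "ndis.sys", "dxgkrnl.sys"],
]

print(consistent_suspects(runs))
```

An empty result over several runs would be the "no consistency" case, where the problem may sit lower down than software/drivers.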

Here's the rough sequence of events, as a comparison that probably won't resemble your situation, that brought my (nearly antique now) Core2Duo main Dev host back up to speed, after I had done a Win7 repair install to fix minor issues, and forgotten which devices needed special attention. I use this as an example, only because the i5/Z77 motherboard in the other room required no such intervention:

- Did a Win7 repair install to eliminate some minor issues that had developed, with System Restore not saving checkpoints and safe mode not working. Not 'necessary', as I have a RAID setup and never needed those features, but I wanted the peace of mind.
- Did all Windows updates after Win7 SP1, and updated all drivers to the latest manufacturer ones (manually checking that the individual Intel ones stuck).
- Went through and re-blackvipered the system to cut down on services I never use.
- Some circumstances (prior to any crunching) gave slow/laggy behaviour. I checked all hardware, BIOS etc.; no changes were apparent, but I checked all voltages, timings etc.
- Checked DPC latencies, which showed periodic spikes.
- Tracked down that Intel's RAID management software needed an update, which removed some spikes after a reboot. I found that through LatencyMon's detailed info pointing to a specific driver module.
- Relocated modified drivers for my Wi-Fi network adaptor, which was also implicated: disabling/re-enabling it made more DPC spikes disappear/reappear. Using the modified driver and tweaking the settings, I re-stumbled upon the same setting that had eliminated the DPC spikes years ago.

So in my case, it showed that driver quality can vary a lot, and that a small, innocuous-looking setting hiding somewhere can upset the apple cart. Only systematically going through the whole lot seemed to get the gremlins out.

Hopefully yours is something simpler than my dev host, but at least you seem to have some clues to follow.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1471505 · Report as offensive
Profile Bill G Special Project $75 donor
Joined: 1 Jun 01
Posts: 1282
Credit: 187,688,550
RAC: 182
United States
Message 1471584 - Posted: 1 Feb 2014, 14:41:33 UTC - in response to Message 1471505.  

Thanks Jason. I will keep at it. What the monitor did show was that you should not even run with "cool and quiet", which is the lowest performance setting in the BIOS. I am running at the middle one, which is not supposed to be overclocking (and is not, judging from the various graphs). You do have to pick one of the three performance settings. At times I wish I were an overclocker, as I would know more about all the settings offered.

SETI@home classic workunits 4,019
SETI@home classic CPU time 34,348 hours
ID: 1471584 · Report as offensive
Profile Bill G Special Project $75 donor
Joined: 1 Jun 01
Posts: 1282
Credit: 187,688,550
RAC: 182
United States
Message 1471586 - Posted: 1 Feb 2014, 14:44:28 UTC - in response to Message 1471428.  

[quote]Hi Bill G

I doubt it's something in the BIOS, unless the system is overclocked, as generally all BIOS functions are taken over by the OS. Though if overclocking was failing, then the system would likely be having other issues like lockups/blue screens of death, random minor software/hardware error messages etc.

Cheers

Mark[/quote]


Mark, I do not want to rule out anything at this point. Overclocking is all set up in the BIOS, so some setting may not be right. Setting everything to Default seems to enable some things which, when I look at them, say overclock to me. Oh, and of course Default does not eliminate the problem.

SETI@home classic workunits 4,019
SETI@home classic CPU time 34,348 hours
ID: 1471586 · Report as offensive
Profile Bill G Special Project $75 donor
Joined: 1 Jun 01
Posts: 1282
Credit: 187,688,550
RAC: 182
United States
Message 1474979 - Posted: 10 Feb 2014, 1:08:57 UTC - in response to Message 1471586.  
Last modified: 10 Feb 2014, 1:10:32 UTC

I am hoping that I am not jumping the gun here, but it looks like I might have found something with this computer.
I have set the mbcuda files to run at abovenormal priority and it seemed to make a difference. LatencyMon stopped telling me that there might be some dropouts. Since making that change my RAC has shot up by 1000. If this continues I will load optimized apps later this week and put in the second GPU (well, one at a time).

This is from the message: http://setiathome.berkeley.edu/forum_thread.php?id=73916&postid=1469371

I have been working on this computer http://setiathome.berkeley.edu/results.php?hostid=7202896 every day trying to eliminate everything ASUS runs, like temp monitoring, fan speed, etc. and nothing ever changed. This is the only thing that has seemed to have any effect. (I even tried some different memory with much different timings with no change)
I also wanted to add that there has been no change in DPC Latency Checker's graph.
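
For anyone wondering what an "abovenormal" priority amounts to on Windows, here is a hedged Python sketch. It maps priority names to the priority class values documented for the Windows `SetPriorityClass` call and, on Windows only, applies one to the current Python process. The function names are hypothetical, and this is not how the mbcuda application applies its own setting, which the thread does not describe.

```python
import ctypes
import sys

# Windows priority class values as documented for SetPriorityClass.
PRIORITY_CLASSES = {
    "idle": 0x00000040,
    "belownormal": 0x00004000,
    "normal": 0x00000020,
    "abovenormal": 0x00008000,
    "high": 0x00000080,
}


def priority_class(name):
    """Translate a priority name such as 'abovenormal' into its Windows value."""
    key = name.strip().lower().replace(" ", "").replace("_", "")
    if key not in PRIORITY_CLASSES:
        raise ValueError(f"unknown priority: {name!r}")
    return PRIORITY_CLASSES[key]


def raise_own_priority(name="abovenormal"):
    """Set the priority class of this process; returns False off Windows."""
    if sys.platform != "win32":
        return False
    from ctypes import wintypes  # imported here so other platforms never need it

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.GetCurrentProcess.restype = wintypes.HANDLE
    kernel32.SetPriorityClass.argtypes = [wintypes.HANDLE, wintypes.DWORD]
    kernel32.SetPriorityClass.restype = wintypes.BOOL
    handle = kernel32.GetCurrentProcess()
    return bool(kernel32.SetPriorityClass(handle, priority_class(name)))


print(hex(priority_class("Above Normal")))
```

The sketch only touches the process it runs in; changing another program's priority would need a handle to that process instead.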

SETI@home classic workunits 4,019
SETI@home classic CPU time 34,348 hours
ID: 1474979 · Report as offensive


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.