Bug in server affecting older BOINC clients with NVIDIA GPUs.

Interstel
Joined: 29 Nov 01
Posts: 23
Credit: 2,231,105
RAC: 0
United States
Message 1276399 - Posted: 27 Aug 2012, 19:39:56 UTC - in response to Message 1275383.  

... in the next few days I will be getting in a GeForce 295GTX dual GPU 1.7GB memory card.
I would get something better but that is the top end card supported by my motherboard.

Why do you think so?
Don't expect that your motherboard docs (.pdf) will include all the video cards it can accept.
(it will list only those video cards available at the time the motherboard was manufactured)

You can use a PCIe 2.0 video card in a PCIe 1.x motherboard.
(my PCIe 1.x motherboard was made in 2006; the PCIe 2.0 video card in it was made in 2011 (or 2012?))


It's one thing to put a PCIe 2.0 card in a PCIe 1.1 slot, as the card will switch to backward-compatible mode. But I doubt putting a PCIe 3.0 card in a 1.1 slot would be smart, as the bus differences would be more than I expect it to tolerate, plus there might be voltage issues as well.



Was I wrong in believing I had read that you could force BOINC through the resource share option to partition out the work in the percentage that you wanted?

BOINC is 'supposed' to respect resource share over the long term (weeks?).
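To make the resource share idea concrete: as far as I understand it, shares are relative weights, so each project's long-term target is its share divided by the sum of all shares. Here is a minimal sketch under that assumption. The project names and share values are made up for illustration, and `target_fractions` is a hypothetical helper, not anything in BOINC itself.

```python
def target_fractions(shares):
    """Turn relative resource shares into long-term target fractions.

    `shares` maps a project name to its resource share (any positive number).
    This only illustrates the arithmetic; it does not model BOINC's scheduler.
    """
    total = sum(shares.values())
    if total <= 0:
        raise ValueError("at least one project needs a positive share")
    return {name: share / total for name, share in shares.items()}


# Hypothetical setup: SETI@home at 300, one other project at the default 100.
example = target_fractions({"SETI@home": 300, "Other project": 100})
for name, fraction in example.items():
    print(f"{name}: {fraction:.0%} of processing time (long-term target)")
```

Short-term behaviour can still look like round robin; the fractions above are only what the client is supposed to converge to over a long period.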


Well, I look forward to that, as right now it seems to be running a round-robin affair and the amount of SETI work I have done has dropped dramatically on this machine.

James



Joined SETI@Home in 2001
Online since ArpNET days
First activity on Honeywell 1648
Series Mainframe in 1975 at age 12.
ID: 1276399
Boong
Joined: 25 Nov 08
Posts: 7
Credit: 16,943,808
RAC: 0
Australia
Message 1276636 - Posted: 28 Aug 2012, 11:29:37 UTC

Running BOINC client version 7.0.28 for windows_x86_64, using anonymous platform, AMD Athlon II X4 640 Processor, ATI Radeon HD 5800 Series, Catalyst Version 12.8

Please help, I keep getting computation errors on GPU WUs.

28/08/2012 7:46:08 PM | SETI@home | Aborting task 14my12ab.21457.20108.13.10.152_1: exceeded elapsed time limit 665.45 (1902.40G/2.86G)

My brother's computer is getting the same error. Two other computers in the house are OK, but they are running a different client version and Catalyst version.
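For anyone reading the log line above: the two figures in parentheses look like an operation bound for the task and the speed estimate for the application (both in units of 1e9), and the time limit is roughly the first divided by the second. That reading is an assumption based on the numbers themselves, so treat the sketch below as a way to check the arithmetic, not as a description of the client's code. `parse_limit` is a hypothetical helper.

```python
import re

# The message exactly as posted above.
LOG_LINE = (
    "28/08/2012 7:46:08 PM | SETI@home | Aborting task "
    "14my12ab.21457.20108.13.10.152_1: exceeded elapsed time limit "
    "665.45 (1902.40G/2.86G)"
)

PATTERN = re.compile(
    r"exceeded elapsed time limit ([\d.]+) \(([\d.]+)G/([\d.]+)G\)"
)


def parse_limit(line):
    """Return (reported_limit_s, bound_g, speed_g, bound_g / speed_g)."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError("line does not contain an elapsed-time-limit message")
    limit, bound_g, speed_g = (float(group) for group in match.groups())
    return limit, bound_g, speed_g, bound_g / speed_g


limit, bound_g, speed_g, ratio = parse_limit(LOG_LINE)
print(f"reported limit: {limit} s, bound/speed: {ratio:.2f} s")
```

The ratio comes out close to the reported limit; the small gap is consistent with the speed figure being rounded to two decimals in the log. A limit of about 11 minutes suggests the speed estimate for the GPU application is far too high for this host, so tasks get aborted before they can finish.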

Thanks

Andrew

Melbourne AU
ID: 1276636
Bernie Vine
Volunteer moderator
Volunteer tester
Joined: 26 May 99
Posts: 9954
Credit: 103,452,613
RAC: 328
United Kingdom
Message 1276676 - Posted: 28 Aug 2012, 13:29:17 UTC

To Boong.

You might want to post this in the Number Crunching forum, where there are lots of helpful people.

This is a specific thread about a server bug that affected older NVIDIA cards and has in fact been solved.
ID: 1276676
bill
Joined: 16 Jun 99
Posts: 861
Credit: 29,352,955
RAC: 0
United States
Message 1276809 - Posted: 28 Aug 2012, 22:25:06 UTC - in response to Message 1273611.  

That's probably true. But we'd need more than four part time employees. A budget of 1.5 million dollars a year would help a lot towards getting 97% uptime. We'd really like a time machine to allow us to go back and put more effort into convincing the State of California that saving money by forcing the people who understood the campus power system to take early retirement was a bad idea.

Unfortunately none of those things is going to happen, unless you happen to have a billionaire in your pocket. Uptime costs money, salaries cost money, network bandwidth costs money. We don't have money. It's amazing what you can do without money, but it's equally amazing what you can't do.


+10
ID: 1276809
motugirl
Joined: 9 Feb 02
Posts: 3
Credit: 1,714,751
RAC: 0
United States
Message 1276841 - Posted: 29 Aug 2012, 1:54:13 UTC - in response to Message 1271871.  

When my computer slowed to a crawl and my CPU wound up, I took it for repair and the IT guy said it was running multiple applications of SETI. He uninstalled it and I reinstalled the latest BOINC - he said to just shut down the extra applications if it happened again. It did and I did, while I was at my computer, but when I wasn't, my computer died and now won't boot. I keep getting a command screen that says "trying to load MS NVIDIA", then it stops and nothing happens. I can't even get to the recovery app, and I can't boot from the DVD drive. How do I fix this? I am not a computer geek. SETI isn't supposed to kill its hosts!!
ID: 1276841
BilBg
Volunteer tester
Joined: 27 May 07
Posts: 3720
Credit: 9,385,827
RAC: 0
Bulgaria
Message 1277092 - Posted: 29 Aug 2012, 14:44:54 UTC - in response to Message 1276841.  
Last modified: 29 Aug 2012, 15:33:53 UTC


You very probably didn't watch the temperatures (CPU, GPU).

There are many Temperature Monitoring Programs for this:
http://setiathome.berkeley.edu/forum_thread.php?id=59292
http://setiathome.berkeley.edu/forum_thread.php?id=62044

Any 'heavy' computing program (almost all of the projects running under BOINC, not only SETI) will heat the computer if it does not have adequate cooling.
Dust buildup in the heatsinks/fans (inside the computer) is the most common cause of poor cooling.

The high temperature probably killed your GPU (NVIDIA).

I just made a post about TThrottle here:
http://setiathome.berkeley.edu/forum_thread.php?id=68948&postid=1277084#1277084

But it has been listed on this site for a long time if you click PARTICIPATE -> Add-on software:
http://boinc.berkeley.edu/addons.php

(No program can revive your GPU if it is already burned, but if you see some picture on your monitor (you say "I keep getting a command screen") the GPU may still be alive.
You may try to boot an OS (e.g. Linux) directly from a LiveDVD to see whether the monitor/GPU works in graphics mode.

If you "can't boot from the DVD drive", the fault may be a different issue (e.g. a PSU problem: not enough electrical current to 'feed' CPU+GPU+HDD+DVD++).
Or you may have 2 unrelated problems: a GPU burned out by heat, and a DVD drive fault caused by a dirty/weak/old laser or a bad/scratched DVD disc.
)

P.S.
You can see a strange example of 'burn' here:
MSI Mainboard no good for BOINC; can/will fry at 100% use (admitted by manufacturer):
http://setiathome.berkeley.edu/forum_thread.php?id=69178


 


- ALF - "Find out what you don't do well ..... then don't do it!" :)
 
ID: 1277092
Interstel
Joined: 29 Nov 01
Posts: 23
Credit: 2,231,105
RAC: 0
United States
Message 1277276 - Posted: 29 Aug 2012, 20:21:43 UTC - in response to Message 1277092.  


You very probably didn't watch the temperatures (CPU, GPU).

There are many Temperature Monitoring Programs for this:
http://setiathome.berkeley.edu/forum_thread.php?id=59292
http://setiathome.berkeley.edu/forum_thread.php?id=62044

Any 'heavy' computing program (almost all of the projects running under BOINC, not only SETI) will heat the computer if it does not have adequate cooling.
Dust buildup in the heatsinks/fans (inside the computer) is the most common cause of poor cooling.

The high temperature probably killed your GPU (NVIDIA).

I just made a post about TThrottle here:
http://setiathome.berkeley.edu/forum_thread.php?id=68948&postid=1277084#1277084




I personally found out long ago that mounting several 120mm fans to push air at the CPUs and GPUs helps keep the temperature down. Otherwise my system would cook itself. Even with regular cleaning, I find the fans have to be replaced on a regular schedule, so I found a good eBay vendor that sells them at a reasonable price and always keep a supply of 120s and 50s around.

James

Joined SETI@Home in 2001
Online since ArpNET days
First activity on Honeywell 1648
Series Mainframe in 1975 at age 12.
ID: 1277276
motugirl
Joined: 9 Feb 02
Posts: 3
Credit: 1,714,751
RAC: 0
United States
Message 1277355 - Posted: 30 Aug 2012, 1:05:55 UTC - in response to Message 1277092.  

Well, uh, non-geeks don't monitor temps. In over a decade of running SETI, I've never had a problem like this (caused by running SETI). But thanks for the info.

Anyway, it will boot from the DVD but ONLY if I disconnect the HD power first. And I did have it set to boot from DVD first. So I can read the recovery wizard from DVD - does that mean the GPU is OK? And does that mean the HD is bad, the PS is bad, or the CPU is bad? I am really trying to save some info on my HD and I'm running out of acronyms. Don't want to reformat except as last resort. Ideas?
ID: 1277355
rob smith
Volunteer moderator
Volunteer tester
Joined: 7 Mar 03
Posts: 22149
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1277404 - Posted: 30 Aug 2012, 5:11:27 UTC - in response to Message 1277355.  

That sounds like your HD has a problem.
Nowt to do with S@H, BOINC, or any other "legitimate" software, just a plain old bit of hardware that is getting towards the end of its life.
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1277404
BilBg
Volunteer tester
Joined: 27 May 07
Posts: 3720
Credit: 9,385,827
RAC: 0
Bulgaria
Message 1277545 - Posted: 30 Aug 2012, 13:46:50 UTC - in response to Message 1277355.  


How 'big' is the PSU (read the label on it)?
I mean e.g.: what brand/manufacturer/model, 500 W, +5V max 30A, +12V max 20A ...

Which of these is the computer you are talking about (e.g. ID: 6715925)?
http://setiathome.berkeley.edu/hosts_user.php?sort=rpc_time&rev=0&show_all=1&userid=442383

No NVIDIA GPU is shown (which means SETI can't run on your GPU, so it can't have damaged it).


 


- ALF - "Find out what you don't do well ..... then don't do it!" :)
 
ID: 1277545
EldRick
Joined: 21 Jan 11
Posts: 4
Credit: 135,743
RAC: 0
United States
Message 1277582 - Posted: 30 Aug 2012, 15:42:16 UTC
Last modified: 30 Aug 2012, 15:42:55 UTC

Just to be clear, and possibly to save others a lot of fooling around with it: the OpenCL support is not available for Mac, even though other BOINC applications work nicely with the ATI GPUs.

The OpenCL support for SETI is, sadly, Windows-only, it would appear.
Sigh...
ID: 1277582
Eric Korpela
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 3 Apr 99
Posts: 1382
Credit: 54,506,847
RAC: 60
United States
Message 1277586 - Posted: 30 Aug 2012, 15:49:07 UTC - in response to Message 1277582.  
Last modified: 30 Aug 2012, 15:49:26 UTC

If you know a Mac developer that wants to do the port, all you need do is have him contact me.

Until that point, I'm one person with very limited time.
@SETIEric@qoto.org (Mastodon)

ID: 1277586
Julie
Joined: 15 May 12
Posts: 279
Credit: 126,042
RAC: 0
United States
Message 1277859 - Posted: 31 Aug 2012, 1:58:34 UTC
Last modified: 31 Aug 2012, 2:01:03 UTC

Hi :)
Prolly an old question, but...
What does 'Unable to connect to core client' mean?

edit: weird, I closed it and reopened it and now it works.
Yaay!
I has a MiniCity :)
http://en-ki-du.myminicity.com
ID: 1277859
motugirl
Joined: 9 Feb 02
Posts: 3
Credit: 1,714,751
RAC: 0
United States
Message 1277877 - Posted: 31 Aug 2012, 3:13:34 UTC - in response to Message 1277545.  

ID: 6017961. 1 yr old. PSU: Liteon PS-5251-08. 5v/25A, 12v/14A, 3.3v/18A, -12v/.8A, 5vsb/2A. DC pwr 250W. I don't think it's just coincidence that it died while I was having those SETI/Astropulse problems. Guess that's what the fine print on the agreement is for, eh? I just wish they had warned us sooner before my warranty expired 2 months ago. I'll bury it in the backyard and grieve for my lost data.
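For reference, the label figures above can be turned into watts per rail (volts times amps). This is only the arithmetic, using the numbers quoted in the post; `rail_watts` is a hypothetical helper and nothing here diagnoses the fault.

```python
# (volts, max amps) per rail, as read off the PSU label in the post above.
# The -12 V rail is entered as a magnitude because only the power matters here.
RAILS = {
    "+5V": (5.0, 25.0),
    "+12V": (12.0, 14.0),
    "+3.3V": (3.3, 18.0),
    "-12V": (12.0, 0.8),
    "+5Vsb": (5.0, 2.0),
}
COMBINED_RATING_W = 250.0  # "DC pwr 250W" on the label


def rail_watts(rails):
    """Maximum power per rail in watts: volts multiplied by amps."""
    return {name: volts * amps for name, (volts, amps) in rails.items()}


watts = rail_watts(RAILS)
for name, value in watts.items():
    print(f"{name}: {value:.1f} W max")
print(f"sum of rail maxima: {sum(watts.values()):.1f} W")
print(f"combined rating:    {COMBINED_RATING_W:.1f} W")
```

The per-rail maxima add up to more than the combined rating, which is usual for PSU labels: the rails cannot all deliver their maximum at the same time. The +12V figure is the one that matters most for the CPU and any add-in GPU.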
ID: 1277877
Eric Korpela
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 3 Apr 99
Posts: 1382
Credit: 54,506,847
RAC: 60
United States
Message 1277923 - Posted: 31 Aug 2012, 4:44:15 UTC - in response to Message 1277877.  

Given the results, it could be a bad fan or a clogged heat sink on the processor. Before you toss it or swap the PSU, look for dust, make sure the fans are turning, and check that the CPU heat sink is properly seated. Those plastic push pin style heat sink legs will sometimes let go.
@SETIEric@qoto.org (Mastodon)

ID: 1277923
kittyman
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1277928 - Posted: 31 Aug 2012, 5:04:58 UTC - in response to Message 1277859.  

Hi :)
Prolly an old question, but...
What does 'Unable to connect to core client' mean?

edit: weird, I closed it and reopened it and now it works.
Yaay!

Usually means that something is tying up the system and it can't sort things in a reasonable amount of time, so it errors out.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1277928
Vepide
Volunteer tester
Joined: 2 Feb 08
Posts: 6
Credit: 103,183
RAC: 0
United States
Message 1277977 - Posted: 31 Aug 2012, 7:33:33 UTC - in response to Message 1277877.  

From the Command Prompt in your Win7 DVD's Recovery Options, run:
bootrec.exe /fixmbr

Also run chkdsk c: /f (the /f switch tells it to fix file system errors).
If it reports that the volume is in use, answer Y to schedule the check for the next reboot, because it cannot dismount the volume while Windows is running.

If you're using two hard drives in RAID 0, you get RAID volume error(s) at POST, and you want to recover your RAID: have a spare drive with an updated Win7 installed, along with the Intel RST program, whose interface lets you clear the RAID error condition, i.e. reset the drives to the NORMAL status shown on the POST and RAID init screens. The Intel RST utility does not reset the RAID to non-RAID; it only clears the RAID error condition. All of your data remains safe, but you may still want to run chkdsk. Be prepared to swap your data cables between the RAID drives and the spare drive, or have a spare hard drive data cable for this procedure. You will also have to revisit your BIOS boot order settings while swapping drives. You should be able to work out each next step in this process as you go.
ID: 1277977
Vepide
Volunteer tester
Joined: 2 Feb 08
Posts: 6
Credit: 103,183
RAC: 0
United States
Message 1277980 - Posted: 31 Aug 2012, 7:48:48 UTC - in response to Message 1277977.  

Motugirl

You might also try a SYSTEM RESTORE from your DVD. That only works if System Restore was enabled, so that the procedure can find valid restore points on the drive. Pick the most recent one.

Google any further information on System Restore and read a variety of forum posts about it. You want the information that is corroborated across several sources. You can also practice the procedure once you know how far you can go into it before backing out. Once the machine is running again, you should immediately create a new restore point of your own. There is a maximum number of restore points and Windows Update will overwrite them, so keep an eye on your restore point status and view the available restore points from time to time. Otherwise you can end up with a big data backup and reinstall problem.
ID: 1277980
Paulie
Joined: 10 Nov 11
Posts: 9
Credit: 3,589,538
RAC: 6
United States
Message 1278438 - Posted: 31 Aug 2012, 20:11:06 UTC

GTX 550TI

Stopped receiving the multiple SETIs when the bug was found.

They started back again, almost a dozen at one time ...
ID: 1278438
Jord
Volunteer tester
Send message
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 1278443 - Posted: 31 Aug 2012, 20:26:57 UTC - in response to Message 1278438.  

GTX 550TI

Stopped receiving the multiple SETIs when the bug was found.

They started back again, almost a dozen at one time ...

This thread is not about receiving multiple tasks, as that's normal. BOINC is able to cache multiple (even hundreds of) tasks.

The thread is about older, pre-7.0 BOINC versions running multiple tasks (tens) at the same time on one GPU. That shouldn't happen, and it has already been fixed.
ID: 1278443

©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.