Discussion of Invalid Host Messaging


Ageless
Joined: 9 Jun 99
Posts: 12258
Credit: 2,545,476
RAC: 266
Netherlands
Message 1299572 - Posted: 28 Oct 2012, 0:20:37 UTC - in response to Message 1299565.
Last modified: 28 Oct 2012, 0:26:12 UTC

Good question, I just add new machines to the default profile. I have just started crunching for Einstein@Home and GPU crunching was on by default.

So probably yes.

The use of the GPU in the project preferences is on by default.
However, the default computing preference "Suspend GPU work while computer is in use?" is set to yes, so as not to interfere with the use of the computer. In that case, BOINC will only use the GPU when the computer is idle.
____________
Jord

Fighting for the correct use of the apostrophe, together with Weird Al Yankovic

Link
Joined: 18 Sep 03
Posts: 823
Credit: 1,544,690
RAC: 366
Germany
Message 1299733 - Posted: 28 Oct 2012, 12:06:42 UTC - in response to Message 1299086.

I got a rather unusual reply from 1 person via PM today, rob99999_2 ID 129322, the owner of Computer 6797265, which produces nothing but errors from its GT 650M but here's his reply.

i am not going to make a thread when i am not going to check it. I have downloaded ths lastest software and the lastest drivers so if there is any issues SETI needs to get their shit straight.


Not a very nice attitude IMO. :(

Cheers.


Well... rob99999_2 is right.

No, he is not. He is the owner/user of his computer and only he is responsible for what this computer is doing. Just as he has to watch that it's not sending out spam mails or participating in DDoS attacks, he has to watch what it is doing with the SETI WUs that it gets assigned.

Just like a car driver is responsible for his car: listen to it, watch carefully how it behaves, and if you suspect that something might be wrong, stop and call for help if you can't fix it yourself.
____________
.

Bernie Vine
Volunteer moderator
Volunteer tester
Joined: 26 May 99
Posts: 6805
Credit: 24,503,342
RAC: 26,880
United Kingdom
Message 1299757 - Posted: 28 Oct 2012, 13:21:09 UTC - in response to Message 1299733.

I got a rather unusual reply from 1 person via PM today, rob99999_2 ID 129322, the owner of Computer 6797265, which produces nothing but errors from its GT 650M but here's his reply.

i am not going to make a thread when i am not going to check it. I have downloaded ths lastest software and the lastest drivers so if there is any issues SETI needs to get their shit straight.


Not a very nice attitude IMO. :(

Cheers.


Well... rob99999_2 is right.

No, he is not. He is the owner/user of his computer and only he is responsible for what this computer is doing. Just as he has to watch that it's not sending out spam mails or participating in DDoS attacks, he has to watch what it is doing with the SETI WUs that it gets assigned.

Just like a car driver is responsible for his car: listen to it, watch carefully how it behaves, and if you suspect that something might be wrong, stop and call for help if you can't fix it yourself.


Nothing is wrong with his car, nothing is wrong with his engine, and he has it serviced correctly; however, the manufacturer has failed to tell him that there is a fault that means his engine is about to break down. Can the manufacturer fix it? No, you have to do it yourself. Or stop using the car!

____________


Today is life, the only life we're sure of. Make the most of today.

Josef W. Segur
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 4199
Credit: 1,029,305
RAC: 253
United States
Message 1299800 - Posted: 28 Oct 2012, 15:40:35 UTC

A car is a deadly weapon so there is a definite moral obligation to keep it well maintained, backed most places by legal requirements.

A system crunching SaH data without producing correct results is turning electrical energy into heat energy without advancing the scientific search we're trying to do. Because the user presumably intended to help the cause, sending a heads up message when there's good evidence the user hasn't noticed the problem makes sense.

The additional load on the servers caused by systems gone bad can't be separated from cases where the user has decided to stop crunching, etc. But there's an easy way to see how much overall waste there is. If there were no waste, the ratio of "Results waiting for db purging" to "Workunits waiting for db purging" would be exactly 2. In practice the MB ratio is usually between 2.1 and 2.2 indicating waste of 5 to 10 percent.
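
The arithmetic behind those percentages can be sketched in a few lines of Python. This is only an illustration: the function name and the sample counts are made up here, and the only real inputs are the two server status figures named above.

```python
def wasted_fraction(results_waiting, workunits_waiting, ideal_ratio=2.0):
    """Share of results beyond the ideal number per workunit.

    ideal_ratio=2.0 follows the statement above that the ratio would be
    exactly 2 if there were no waste.
    """
    if workunits_waiting <= 0:
        raise ValueError("workunits_waiting must be positive")
    ratio = results_waiting / workunits_waiting
    return ratio / ideal_ratio - 1.0


# Hypothetical counts, chosen only to reproduce the ratios 2.0, 2.1 and 2.2.
for results, workunits in ((2000, 1000), (2100, 1000), (2200, 1000)):
    waste = wasted_fraction(results, workunits)
    print(f"ratio {results / workunits:.2f}: about {waste:.0%} waste")
```

A ratio of 2.1 means 0.1 extra results per workunit on top of the 2 that were needed, which is 5 percent; 2.2 gives 10 percent.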

Joe

Link
Joined: 18 Sep 03
Posts: 823
Credit: 1,544,690
RAC: 366
Germany
Message 1299832 - Posted: 28 Oct 2012, 16:47:41 UTC - in response to Message 1299757.

I got a rather unusual reply from 1 person via PM today, rob99999_2 ID 129322, the owner of Computer 6797265, which produces nothing but errors from its GT 650M but here's his reply.

i am not going to make a thread when i am not going to check it. I have downloaded ths lastest software and the lastest drivers so if there is any issues SETI needs to get their shit straight.


Not a very nice attitude IMO. :(

Cheers.


Well... rob99999_2 is right.

No, he is not. He is the owner/user of his computer and only he is responsible for what this computer is doing. Just as he has to watch that it's not sending out spam mails or participating in DDoS attacks, he has to watch what it is doing with the SETI WUs that it gets assigned.

Just like a car driver is responsible for his car: listen to it, watch carefully how it behaves, and if you suspect that something might be wrong, stop and call for help if you can't fix it yourself.


Nothing is wrong with his car, nothing is wrong with his engine, and he has it serviced correctly; however, the manufacturer has failed to tell him that there is a fault that means his engine is about to break down. Can the manufacturer fix it? No, you have to do it yourself. Or stop using the car!

I know that car-computer comparisons are crap, but sometimes I don't have a better one. The point is: his computer fails, so he should be the first to notice it and see if he can fix it or ask for help.

I have a similar situation with Milkyway right now: my old ATI HD3850 can only run the older (not really supported anymore) CAL application, so I have to watch whether new batches of WUs are still compatible with it; if not, I have to stop crunching. That has already happened once: I had to stop crunching for about a month, then it worked again. Whether old or new hardware, you have to watch it; something might always not work as expected. Especially after any changes to the system, for example if you buy a new card or install new drivers, you first have to see that it actually works before you let it do its work without too much attention from your side. And I'm pretty sure that most of the owners of those 560Ti cards have skipped that part. Something like "set and forget" does not exist with computers anyway, even if many think so. In the best case it's "set, see that it works and hope it lasts for a while".
____________
.

Link
Joined: 18 Sep 03
Posts: 823
Credit: 1,544,690
RAC: 366
Germany
Message 1299879 - Posted: 28 Oct 2012, 18:42:31 UTC - in response to Message 1299800.

A car is a deadly weapon so there is a definite moral obligation to keep it well maintained, backed most places by legal requirements.

Not all issues with a car make it more dangerous. If it's leaking a drop of oil every now and then, it's still safe to drive but bad for the environment. And so are such hosts for the SETI environment: they waste bandwidth and eventually (if two such hosts validate against each other, like Fermi cards did before) even compromise the science.
____________
.

Bernie Vine
Volunteer moderator
Volunteer tester
Joined: 26 May 99
Posts: 6805
Credit: 24,503,342
RAC: 26,880
United Kingdom
Message 1299920 - Posted: 28 Oct 2012, 20:16:27 UTC

Everyone is correct, however it means SETI@Home is not set and forget. It should have instructions posted that explain the problems with GPU crunching and the need to check on a regular basis whether your results are valid. They should also warn people that if they are not prepared to do this, they could return invalid results, and that it is then best not to crunch using a GPU.

They will especially need to check when updating the GPU drivers, as this has introduced several bugs in the past. Also, before buying a new graphics card, please check on the forums to see if it will work with SETI@Home and/or the latest drivers! If you are unsure about any of this, please do not attach your computer to SETI@Home.

In real terms, that is what anyone crunching needs to be aware of.

Of course no one wants to post that on the front page, but something like that is needed. I am aware that updates are due, but who is to say that in 3 or 6 months' time a new GPU or driver won't start this whole thing off again?
____________


Today is life, the only life we're sure of. Make the most of today.

Link
Joined: 18 Sep 03
Posts: 823
Credit: 1,544,690
RAC: 366
Germany
Message 1299945 - Posted: 28 Oct 2012, 21:12:53 UTC - in response to Message 1299920.
Last modified: 28 Oct 2012, 21:40:33 UTC

GPU computing disabled by default and a red "READ THIS FIRST" link to a page with short info like what you posted would IMO be a good solution. CPU-only crunching might be "set and forget"; GPU crunching is not.

An alternative would be a better quota system, one which counts invalids as errors and which expects something better than 1 valid result out of 50. A 98% failure rate can't be "OK"; even 50% would IMHO be way too much, but should be good enough to start with.
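
To make the idea concrete, here is a minimal sketch of such a check. It is not how the BOINC scheduler's quota actually works; the class, the function names and the `min_results` cut-off are assumptions made up for illustration. Only the two numbers (1 valid out of 50, and 50% as a starting threshold) come from the post.

```python
from dataclasses import dataclass


@dataclass
class HostStats:
    """Hypothetical per-host tally of finished tasks."""
    valid: int
    invalid: int
    errored: int

    @property
    def total(self):
        return self.valid + self.invalid + self.errored


def failure_rate(stats):
    """Fraction of finished tasks that were invalid or errored."""
    if stats.total == 0:
        return 0.0
    # Invalids are counted exactly like errors, as the post suggests.
    return (stats.invalid + stats.errored) / stats.total


def should_restrict(stats, max_failure_rate=0.5, min_results=50):
    """True if the host has enough history and fails too often.

    min_results is an assumed guard so that a new host is not judged
    on a handful of tasks.
    """
    if stats.total < min_results:
        return False
    return failure_rate(stats) > max_failure_rate


# The "1 valid result out of 50" case from the post.
bad_host = HostStats(valid=1, invalid=40, errored=9)
print(f"{failure_rate(bad_host):.0%}", should_restrict(bad_host))
```

With a 50% threshold, a host returning 1 valid result out of 50 would be restricted, while a host with a few scattered invalids would not.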

I mean it's not just SETI. I also crunch for Milkyway and Collatz, and issues like that, i.e. with new hardware or drivers, occur on those projects as well every time nVidia or ATI comes up with something "revolutionary". Hence I don't see it as a fault of the project staff if their apps don't run properly on new hardware.
____________
.

Bernie Vine
Volunteer moderator
Volunteer tester
Joined: 26 May 99
Posts: 6805
Credit: 24,503,342
RAC: 26,880
United Kingdom
Message 1299961 - Posted: 28 Oct 2012, 22:01:07 UTC
Last modified: 28 Oct 2012, 22:02:37 UTC

Hence I don't see it as a fault of the project staff, if their apps don't run properly on a new hardware.


No, possibly not, but if GPU crunching is not "set and forget", both current and prospective users need to know; otherwise, as you say, we could end up with errors validating against each other, corrupting the science!

Users like rob99999_2 need to know what they are getting into.
____________


Today is life, the only life we're sure of. Make the most of today.

Murat Adas
Joined: 9 Aug 10
Posts: 1
Credit: 1,585,782
RAC: 0
Australia
Message 1300176 - Posted: 29 Oct 2012, 11:07:29 UTC - in response to Message 1297944.

After reading "NVidia driver problems which cause computation errors" by Richard Haselgrove, I've changed my advanced power settings. Below are the steps I used to accomplish this (Windows 7):
Right-clicked on the desktop and selected Personalize
Selected Screen Saver
Made sure I have None selected for the screen saver
Clicked on Change power settings
Clicked Change plan settings
Made sure Turn off the display and Put the computer to sleep are set to Never
Clicked on Change advanced power settings
Under Sleep - Allow hybrid sleep, I turned the setting to "Off" (default was On)

I hope this helps. If not, please let me know whether to roll back to a previous driver.

Thanks

Tron
Joined: 16 Aug 09
Posts: 180
Credit: 2,236,055
RAC: 0
United States
Message 1300272 - Posted: 29 Oct 2012, 17:13:47 UTC - in response to Message 1300176.

After reading "NVidia driver problems which cause computation errors" by Richard Haselgrove, I've changed my advanced power settings. Below are the steps I used to accomplish this (Windows 7):
Right-clicked on the desktop and selected Personalize
Selected Screen Saver
Made sure I have None selected for the screen saver
Clicked on Change power settings
Clicked Change plan settings
Made sure Turn off the display and Put the computer to sleep are set to Never
Clicked on Change advanced power settings
Under Sleep - Allow hybrid sleep, I turned the setting to "Off" (default was On)

I hope this helps. If not, please let me know whether to roll back to a previous driver.

Thanks


Thank you for posting. It will take a few days for the dust to settle on the invalids before you can see for sure whether the changes you made helped.
You can keep an eye on your finished tasks in the meantime. Watch for short run times; those tend to be the -9 error you are experiencing.

BilBg
Volunteer tester
Joined: 27 May 07
Posts: 2568
Credit: 5,860,862
RAC: 2,165
Bulgaria
Message 1300629 - Posted: 31 Oct 2012, 2:32:48 UTC - in response to Message 1300176.

After reading NVidia driver problems which cause computation errors by Richard Haselgrove
I've changed my advanced power settings ...

Read again:
1) Sleeping Monitor Bug
Drivers affected: 295.51 (BETA), 295.73 and 296.10

You use driver: 306.97 so what you did was not needed.

You also don't have a 'Kepler' GPU (GT 6xx and GTX 6xx), so the other (CUDA_GRID_SIZE_COMPAT) fix does not apply to you.
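
The two checks above can be written down as a small sketch. The function names are made up for illustration, the driver list is just the three versions quoted above, and matching Kepler cards by name is only the rough "GT 6xx and GTX 6xx" rule of thumb from this post, not a reliable hardware test.

```python
import re

# Driver versions named above as affected by the Sleeping Monitor Bug.
SLEEPING_MONITOR_BUG_DRIVERS = {"295.51", "295.73", "296.10"}

# Rough name match for "GT 6xx and GTX 6xx"; an assumption, not a hardware query.
KEPLER_NAME = re.compile(r"\bGTX? 6\d{2}")


def needs_monitor_workaround(driver_version):
    """True only for the driver versions listed as affected."""
    return driver_version.strip() in SLEEPING_MONITOR_BUG_DRIVERS


def looks_like_kepler(gpu_name):
    """True if the card name matches the GT 6xx / GTX 6xx pattern."""
    return KEPLER_NAME.search(gpu_name) is not None


print(needs_monitor_workaround("306.97"))
print(looks_like_kepler("GeForce GTX 560 Ti"))
```

For driver 306.97 on a GTX 560 Ti both checks come back negative, which is the point being made: neither workaround applies to that host.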

GTX 560 Ti problems are 'famous' and not related to 'Sleeping Monitor Bug' nor 'Kepler'

Read 'a few' threads about GTX 560 Ti problems:
http://www.google.com/#hl=en&q=560+Ti+problems+site:setiathome.berkeley.edu


____________



- ALF - "Find out what you don't do well ..... then don't do it!" :)

Thndr
Joined: 4 Mar 02
Posts: 18
Credit: 767,713
RAC: 0
United States
Message 1302137 - Posted: 4 Nov 2012, 16:32:29 UTC

I have told it not to use the GPU and set the power settings to never turn off the monitor. Let me know.

Claggy
Volunteer tester
Joined: 5 Jul 99
Posts: 4041
Credit: 32,692,387
RAC: 699
United Kingdom
Message 1302158 - Posted: 4 Nov 2012, 17:02:42 UTC - in response to Message 1302137.
Last modified: 4 Nov 2012, 17:04:59 UTC

I have told it not to use the GPU and set the power settings to never turn off the monitor. Let me know.


Setting power settings to never turn off your monitor won't help; you're not running 295.xx or 296.xx drivers.

Looking at your inconclusive/errored tasks, they are a mixture of CPU and GPU tasks.
All the ones I looked at say 'Restarted at 100.00 percent.', which is strange.
Then, looking at the stderr.txt results, multiple tasks have the same result, either:

<core_client_version>7.0.28</core_client_version>
<![CDATA[
<stderr_txt>
Restarted at 100.00 percent.

Flopcounter: 48049228806222.281000

Spike count: 1
Pulse count: 1
Triplet count: 8
Gaussian count: 0
called boinc_finish

</stderr_txt>
]]>

http://setiathome.berkeley.edu/result.php?resultid=2686739603

http://setiathome.berkeley.edu/result.php?resultid=2686762450

http://setiathome.berkeley.edu/result.php?resultid=2686762474

http://setiathome.berkeley.edu/result.php?resultid=2688750025

Or:

Spike count: 10
Pulse count: 0
Triplet count: 0
Gaussian count: 3
called boinc_finish

http://setiathome.berkeley.edu/result.php?resultid=2687146759

http://setiathome.berkeley.edu/result.php?resultid=2687140765

Or:

Spike count: 14
Pulse count: 5
Triplet count: 12
Gaussian count: 0

http://setiathome.berkeley.edu/result.php?resultid=2687140761

http://setiathome.berkeley.edu/result.php?resultid=2686771628

Looks like your slot directories aren't getting cleared for some reason,
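
Spotting this kind of repetition by hand is tedious, so here is a rough sketch of how it could be automated. It assumes you have already saved the stderr text of each result yourself; the function names and the sample texts are made up for illustration, and identical counts are only a hint, not proof, of stale slot directories.

```python
import re
from collections import defaultdict

SIGNALS = ("Spike", "Pulse", "Triplet", "Gaussian")
COUNT_RE = re.compile(r"^\s*(Spike|Pulse|Triplet|Gaussian) count:\s*(\d+)\s*$",
                      re.MULTILINE)


def signal_counts(stderr_text):
    """Return (spike, pulse, triplet, gaussian); None where a count is missing."""
    found = {name: int(value) for name, value in COUNT_RE.findall(stderr_text)}
    return tuple(found.get(name) for name in SIGNALS)


def identical_results(stderr_by_result):
    """Group result IDs whose four signal counts are exactly the same."""
    groups = defaultdict(list)
    for result_id, text in stderr_by_result.items():
        counts = signal_counts(text)
        if None in counts:
            continue  # incomplete output, nothing to compare
        groups[counts].append(result_id)
    return {counts: ids for counts, ids in groups.items() if len(ids) > 1}


# Made-up sample input in the same layout as the stderr excerpts above.
sample = {
    "a": "Spike count: 1\nPulse count: 1\nTriplet count: 8\nGaussian count: 0\n",
    "b": "Spike count: 1\nPulse count: 1\nTriplet count: 8\nGaussian count: 0\n",
    "c": "Spike count: 10\nPulse count: 0\nTriplet count: 0\nGaussian count: 3\n",
}
print(identical_results(sample))
```

Different workunits would normally not all report the same four counts, so any group with several result IDs is worth a closer look.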

Please post your Boinc startup messages from the Event Log, the first 30 lines will do.

Claggy

Claggy
Volunteer tester
Joined: 5 Jul 99
Posts: 4041
Credit: 32,692,387
RAC: 699
United Kingdom
Message 1303041 - Posted: 7 Nov 2012, 10:15:57 UTC - in response to Message 1302158.
Last modified: 7 Nov 2012, 10:16:40 UTC

Looks as if Thndr has fixed his problems with his slot directories, he's now fully completing 6.03, 6.10 (cuda_fermi), AstroPulse v6 v6.01 and AstroPulse v6 v6.04 (cuda_opencl_100) tasks,
although it would have been good if he had responded and told us what he found and did.

All tasks for computer 6432659

Claggy

Thndr
Joined: 4 Mar 02
Posts: 18
Credit: 767,713
RAC: 0
United States
Message 1305991 - Posted: 14 Nov 2012, 4:36:27 UTC - in response to Message 1303041.

Looks as if Thndr has fixed his problems with his slot directories, he's now fully completing 6.03, 6.10 (cuda_fermi), AstroPulse v6 v6.01 and AstroPulse v6 v6.04 (cuda_opencl_100) tasks,
although it would have been good if he had responded and told us what he found and did.

All tasks for computer 6432659

Claggy


Well.... to make a long story short, I scrapped the boinc software and started over... that and I changed power settings and reset the project and environment but!! I'm back to 6.10 errors again! I checked everything and gpu usage was turned back on?? how?? Clearly this is NOT just a driver problem.
____________

Thndr
Joined: 4 Mar 02
Posts: 18
Credit: 767,713
RAC: 0
United States
Message 1306124 - Posted: 14 Nov 2012, 16:45:07 UTC - in response to Message 1305991.

I have removed the boinc manager from my machine again. I will watch this thread for further developments. I can not see wasting my efforts and messing up data packets until there is a fix.
____________

Claggy
Volunteer tester
Joined: 5 Jul 99
Posts: 4041
Credit: 32,692,387
RAC: 699
United Kingdom
Message 1306191 - Posted: 14 Nov 2012, 20:06:32 UTC - in response to Message 1305991.
Last modified: 14 Nov 2012, 20:09:14 UTC

Looks as if Thndr has fixed his problems with his slot directories, he's now fully completing 6.03, 6.10 (cuda_fermi), AstroPulse v6 v6.01 and AstroPulse v6 v6.04 (cuda_opencl_100) tasks,
although it would have been good if he had responded and told us what he found and did.

All tasks for computer 6432659

Claggy


Well.... to make a long story short, I scrapped the boinc software and started over... that and I changed power settings and reset the project and environment but!! I'm back to 6.10 errors again! I checked everything and gpu usage was turned back on?? how?? Clearly this is NOT just a driver problem.

Which environment setting? If it's the one in the 'NVidia driver problems which cause computation errors' thread, please note that it is for 6xx Kepler GPUs only and is not required on your GTS 450,
and you also don't need to change power settings, as you're not running 295.xx or 296.xx drivers.

Uninstalling and reinstalling the BOINC software didn't help, as that only installs the program; the BOINC Data directory is left intact, and that is where your problem is.
Looking at your errored tasks still shows 'Restarted at 100.00 percent.' Did you go and empty all the slot directories? Did you delete them? Or did you not touch them?
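
If it helps, here is a small read-only sketch for checking what is left in the slot directories. It assumes the usual layout of a `slots` folder with numbered subfolders inside the BOINC Data directory; the function name is made up, and the path shown is only a placeholder, so substitute your own Data directory. It lists files and deletes nothing; run it with BOINC stopped.

```python
from pathlib import Path


def leftover_slot_files(data_dir):
    """Map each non-empty slot directory name to the file names inside it."""
    slots_root = Path(data_dir) / "slots"
    leftovers = {}
    if not slots_root.is_dir():
        return leftovers
    for slot in sorted(slots_root.iterdir()):
        if slot.is_dir():
            names = sorted(entry.name for entry in slot.iterdir())
            if names:
                leftovers[slot.name] = names
    return leftovers


# Placeholder path (an assumption); replace it with your own Data directory.
for slot, names in leftover_slot_files("path/to/BOINC/data").items():
    print(f"slot {slot}: {len(names)} file(s), e.g. {names[0]}")
```

With BOINC shut down and no tasks in progress, files still sitting in the slots would fit the 'Restarted at 100.00 percent.' symptom.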

Please post your Boinc startup messages from the Event Log, the first 20 to 30 lines will do (I've already asked you for it once before)

<core_client_version>7.0.28</core_client_version>
<![CDATA[
<message>
- exit code -1073741819 (0xc0000005)
</message>
<stderr_txt>
setiathome_CUDA: Found 1 CUDA device(s):
Device 1 : GeForce GTS 450
totalGlobalMem = 1073414144
sharedMemPerBlock = 49152
regsPerBlock = 32768
warpSize = 32
memPitch = 2147483647
maxThreadsPerBlock = 1024
clockRate = 1566000
totalConstMem = 65536
major = 2
minor = 1
textureAlignment = 512
deviceOverlap = 1
multiProcessorCount = 4
setiathome_CUDA: CUDA Device 1 specified, checking...
Device 1: GeForce GTS 450 is okay
SETI@home using CUDA accelerated device GeForce GTS 450
Restarted at 100.00 percent.

Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x0040173F read attempt to address 0x07EC7078

Engaging BOINC Windows Runtime Debugger...
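
As an aside on reading that output: the negative exit code and the hex value in brackets are the same 32-bit number, shown once as signed and once as unsigned. A short sketch of the conversion (the function name is made up for illustration):

```python
def exit_code_as_hex(exit_code):
    """Show a signed 32-bit Windows exit code as an unsigned hex status."""
    return f"0x{exit_code & 0xFFFFFFFF:08x}"


print(exit_code_as_hex(-1073741819))  # the exit code from the message above
```

This yields 0xc0000005, matching the Access Violation reported in the exception record.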


Claggy

Thndr
Joined: 4 Mar 02
Posts: 18
Credit: 767,713
RAC: 0
United States
Message 1306228 - Posted: 14 Nov 2012, 21:49:53 UTC - in response to Message 1306191.

Didn't delete anything.
____________

Thndr
Joined: 4 Mar 02
Posts: 18
Credit: 767,713
RAC: 0
United States
Message 1306230 - Posted: 14 Nov 2012, 21:53:33 UTC - in response to Message 1306228.

Didn't delete anything.


program is completely uninstalled.
____________


Copyright © 2014 University of California