Non-Server Strangeness & question

tbret
Volunteer tester
Joined: 28 May 99
Posts: 3380
Credit: 296,162,071
RAC: 40
United States
Message 1174707 - Posted: 30 Nov 2011, 18:19:01 UTC

O.k., I give up. Maybe someone else has an idea. I can only hope.

This computer:

http://setiathome.berkeley.edu/results.php?hostid=6200108

This is a "newer" build having been stuck together and fired-up October 13. I have changed processors in it since then (a week or ten days ago), but it isn't anything weird. It's a Phenom II x 6 1045 and a GTX 460 running Win 7 Home, 64-bit, loaded with BOINC 6.12.34 (like two others).

It's on the same wireless network and router as three of my other computers.

The problem:

This computer, uniquely, will not ask for and therefore does not receive enough tasks to keep it busy.

Before you ask, yes it is set to ask for all types of tasks and the cache days are set to 10. Yes, it does get CUDA work, but not very often (it doesn't ask very often).

I've sat and watched while it communicates and reports completed tasks and says "not requesting new tasks" when it has maybe five or six CPU tasks and no GPU tasks. It has accumulated 50 CPU tasks, and even 50 CPU and some GPU, but never anything approaching the "limited" maximum of 200 GPU and 50 per CPU.

By my figures, it should be able to ask for 250 CPU tasks and another 200 GPU tasks anyway, but it says "reporting 5 completed tasks, not requesting new work" over and over and over.

It must be me, right? Not that I can tell.

On November 28 I stuck my old GT 240 in a computer I've converted to 6.10.60 BOINC and Lunatic's unified 39 for a while. For some reason or another it has 400+ tasks Pending although it only has a dual core Athlon II and the GT 240 in it. That's the same router, same network, etc. In fact, it is sitting on top of the computer in question.

This computer has been worrying me stupid since I built it. It just never has asked for work consistently (but it does ask occasionally) and it's never been able to gather nuts for the winter. It took seemingly forever to crunch and validate its first 100 tasks.

I'm wondering, because I've never tried such a thing before, what happens if I install 6.10.60 over what's already there, then reinstall Lunatics? If I've shut it all down correctly, will it abandon whatever few tasks it has? Will it change anything in a meaningful way?

And why is only this one acting this way? That's the real mystery. "The luck of the draw" seems a poor hypothesis since this is consistent since October 13.

Thanks for any advice or direction.



ID: 1174707 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1174709 - Posted: 30 Nov 2011, 18:24:57 UTC - in response to Message 1174707.  

http://setiathome.berkeley.edu/results.php?hostid=6200108

Go to Projects tab, highlight SETI, click properties: report what you see.

I'd be interested in the priority values (scheduling and work fetch, CPU and GPU), and DCF at the bottom.
ID: 1174709 · Report as offensive
tbret
Volunteer tester
Joined: 28 May 99
Posts: 3380
Credit: 296,162,071
RAC: 40
United States
Message 1174710 - Posted: 30 Nov 2011, 18:31:36 UTC - in response to Message 1174709.  
Last modified: 30 Nov 2011, 18:32:02 UTC

http://setiathome.berkeley.edu/results.php?hostid=6200108

Go to Projects tab, highlight SETI, click properties: report what you see.

I'd be interested in the priority values (scheduling and work fetch, CPU and GPU), and DCF at the bottom.


Will do. It will be late tonight.

Edit - Thank you.
ID: 1174710 · Report as offensive
LadyL
Volunteer tester
Joined: 14 Sep 11
Posts: 1679
Credit: 5,230,097
RAC: 0
Message 1174713 - Posted: 30 Nov 2011, 18:44:35 UTC

If nothing special turns up there (and you have excluded having accidentally set a small local cache and no stuck downloads) it's time for the BOINC entrails oracle.

Do the following please:

Stick this into cc_config.xml

<cc_config>
<log_flags>
 <work_fetch_debug>0</work_fetch_debug>
</log_flags>
</cc_config>


Change it to 1, save, read the config file (from the BM Advanced menu), change it back to 0, save, and read the config file again. You should end up with exactly one work_fetch_debug output in the event log/stdoutdae.txt. You don't want it permanently on, as it produces one output per minute. Copy and post.

You may have to wait until I can look again. Reading wfd output is very specialised knowledge.
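
For anyone who would rather not hand-edit the file twice, the flag flip described above can be sketched in Python. This is only an illustration: the helper name `set_log_flag` is made up for this example, it works on the XML text rather than on a real BOINC data directory, and you still have to tell the client to re-read the config file from the Manager afterwards.

```python
import xml.etree.ElementTree as ET

# The snippet from the post, used here as sample input.
EXAMPLE = """<cc_config>
<log_flags>
 <work_fetch_debug>0</work_fetch_debug>
</log_flags>
</cc_config>"""


def set_log_flag(config_xml: str, flag: str, enabled: bool) -> str:
    """Return config_xml with <log_flags>/<flag> set to 1 or 0.

    Hypothetical helper for illustration; missing elements are created.
    """
    root = ET.fromstring(config_xml)
    if root.tag != "cc_config":
        raise ValueError("expected a <cc_config> root element")
    log_flags = root.find("log_flags")
    if log_flags is None:
        log_flags = ET.SubElement(root, "log_flags")
    node = log_flags.find(flag)
    if node is None:
        node = ET.SubElement(log_flags, flag)
    node.text = "1" if enabled else "0"
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Step 1: switch the flag on, then re-read the config file in the Manager.
    switched_on = set_log_flag(EXAMPLE, "work_fetch_debug", True)
    print(switched_on)
    # Step 2: switch it back off once one output has been captured.
    print(set_log_flag(switched_on, "work_fetch_debug", False))
```

Writing the result back to cc_config.xml is left out on purpose, since the location of the BOINC data directory varies between installations.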
ID: 1174713 · Report as offensive
Profile Gundolf Jahn

Joined: 19 Sep 00
Posts: 3184
Credit: 446,358
RAC: 0
Germany
Message 1174714 - Posted: 30 Nov 2011, 18:45:42 UTC - in response to Message 1174707.  

Before you ask, yes it is set to ask for all types of tasks and the cache days are set to 10.

Where - locally or online? Did you check/clear local preferences?

Gruß,
Gundolf
ID: 1174714 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1174717 - Posted: 30 Nov 2011, 18:49:07 UTC - in response to Message 1174713.  

Er, that would be

<work_fetch_debug>1</work_fetch_debug>
ID: 1174717 · Report as offensive
tbret
Volunteer tester
Joined: 28 May 99
Posts: 3380
Credit: 296,162,071
RAC: 40
United States
Message 1174719 - Posted: 30 Nov 2011, 19:06:48 UTC - in response to Message 1174714.  

Before you ask, yes it is set to ask for all types of tasks and the cache days are set to 10.

Where - locally or online? Did you check/clear local preferences?

Gruß,
Gundolf


I've messed-around with both. I'll bet Richard's onto the problem. Since I've never intervened in my other installations and they work, I didn't think to check this one. The DCF has probably gone completely goofy.
ID: 1174719 · Report as offensive
Profile HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1174725 - Posted: 30 Nov 2011, 19:18:30 UTC - in response to Message 1174719.  

Before you ask, yes it is set to ask for all types of tasks and the cache days are set to 10.

Where - locally or online? Did you check/clear local preferences?

Gruß,
Gundolf


I've messed-around with both. I'll bet Richard's onto the problem. Since I've never intervened in my other installations and they work, I didn't think to check this one. The DCF has probably gone completely goofy.

With connection issues I have seen the value for "While BOINC running, % of time host has an Internet connection" drop to such a low % that a machine would not request new work. Even with 0 in the queue & processing nothing.

You can also see the value locally in your client_state.xml file. In this section:
<time_stats>
    <on_frac>1.000000</on_frac>
    <connected_frac>1.000000</connected_frac>
    <active_frac>1.000000</active_frac>
    <gpu_active_frac>1.000000</gpu_active_frac>
    <last_update>1500000000.000000</last_update>
</time_stats>


If you are going to edit these values client side be sure you have stopped BOINC before doing so. Otherwise you can trash all the work you have in your queue. At least I have managed to do so in the past.
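
A read-only check is safer than editing. Here is a minimal Python sketch that pulls the `<time_stats>` values out of the XML text and lists any fraction that looks suspiciously low. The function names and the 0.5 cutoff are assumptions made for this example (no exact threshold is given above), and the real client_state.xml layout may differ between BOINC versions.

```python
import xml.etree.ElementTree as ET

# Sample section copied from the post above.
SAMPLE = """<time_stats>
    <on_frac>1.000000</on_frac>
    <connected_frac>1.000000</connected_frac>
    <active_frac>1.000000</active_frac>
    <gpu_active_frac>1.000000</gpu_active_frac>
    <last_update>1500000000.000000</last_update>
</time_stats>"""


def read_time_stats(state_xml: str) -> dict:
    """Return the numeric children of <time_stats> as a name -> float dict."""
    root = ET.fromstring(state_xml)
    # Accept either the bare section or a larger document containing it.
    stats = root if root.tag == "time_stats" else root.find(".//time_stats")
    if stats is None:
        raise ValueError("no <time_stats> section found")
    return {
        child.tag: float(child.text)
        for child in stats
        if child.text and child.text.strip()
    }


def low_fractions(stats: dict, threshold: float = 0.5) -> list:
    """List the *_frac entries below threshold (0.5 is an arbitrary guess)."""
    return [
        name
        for name, value in stats.items()
        if name.endswith("_frac") and value < threshold
    ]


if __name__ == "__main__":
    stats = read_time_stats(SAMPLE)
    print(stats)
    print("suspiciously low:", low_fractions(stats))
```

Since this only reads the text, it should be harmless to run against a copy of the file while BOINC is up.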
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group: http://tinyurl.com/8y46zvu
ID: 1174725 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1174728 - Posted: 30 Nov 2011, 19:20:59 UTC - in response to Message 1174719.  

Before you ask, yes it is set to ask for all types of tasks and the cache days are set to 10.

Where - locally or online? Did you check/clear local preferences?

Gruß,
Gundolf

I've messed-around with both. I'll bet Richard's onto the problem. Since I've never intervened in my other installations and they work, I didn't think to check this one. The DCF has probably gone completely goofy.

You can check that online, even if you're not in front of the computer, if it's reported recently. DCF is still reported and displayed in host details (to the logged-in owner, I can't see yours).

Work fetch drops to a trickle if DCF < 0.02: I have one at 0.056367, but nothing lower than that.
ID: 1174728 · Report as offensive
Horacio

Joined: 14 Jan 00
Posts: 536
Credit: 75,967,266
RAC: 0
Argentina
Message 1174749 - Posted: 30 Nov 2011, 20:25:00 UTC - in response to Message 1174707.  

I'm wondering, because I've never tried such a thing before, what happens if I install 6.10.60 over what's already there, then reinstall Lunatics? If I've shut it all down correctly, will it abandon whatever few tasks it has? Will it change anything in a meaningful way?


I've done it with only a small issue (the same on 2 rigs): when the installation finished, it told me that something was not registered, so I had to run the install again, selecting the "repair" option. After this second installation everything worked without losing anything, and there was no need to reinstall Lunatics. (I did the installations with BOINC, manager and client, completely off.)

There are some cosmetic differences between 6.10 and 6.12, like the order of the columns in the task lists, the lack of some options for the statistics graphs, the lack of the notices tab, and the event log being a tab instead of a separate window...

But the scheduling for file transfers is so much better (retry times and project backoffs on the order of minutes/hours instead of exponential centuries) that nothing else matters ... :D

ID: 1174749 · Report as offensive
tbret
Volunteer tester
Joined: 28 May 99
Posts: 3380
Credit: 296,162,071
RAC: 40
United States
Message 1174853 - Posted: 1 Dec 2011, 5:55:05 UTC - in response to Message 1174728.  



Work fetch drops to a trickle if DCF < 0.02: I have one at 0.056367, but nothing lower than that.


Mine's at 0.79081.


ID: 1174853 · Report as offensive
tbret
Volunteer tester
Joined: 28 May 99
Posts: 3380
Credit: 296,162,071
RAC: 40
United States
Message 1174855 - Posted: 1 Dec 2011, 5:57:06 UTC - in response to Message 1174725.  
Last modified: 1 Dec 2011, 6:01:30 UTC



With connection issues I have seen the value for "While BOINC running, % of time host has an Internet connection" drop to such a low % that a machine would not request new work. Even with 0 in the queue & processing nothing.



Sitting at:

% of time BOINC is running 99.7882 %

While BOINC running, % of time host has an Internet connection 99.9746 %
ID: 1174855 · Report as offensive
tbret
Volunteer tester
Joined: 28 May 99
Posts: 3380
Credit: 296,162,071
RAC: 40
United States
Message 1174857 - Posted: 1 Dec 2011, 6:00:54 UTC - in response to Message 1174749.  



I've done it with only a small issue (the same on 2 rigs): when the installation finished, it told me that something was not registered, so I had to run the install again, selecting the "repair" option. After this second installation everything worked without losing anything, and there was no need to reinstall Lunatics. (I did the installations with BOINC, manager and client, completely off.)

<snip>

But the scheduling for file transfers is so much better (retry times and project backoffs on the order of minutes/hours instead of exponential centuries) that nothing else matters ... :D


That may be a lot of the problem. I've watched the stupid thing go from a retry of minutes to 9+ hours, while I was looking at it. (yes, I have no life)

Punching "Retry Now" often gets an immediate download.

I think I hate 6.12.34.
ID: 1174857 · Report as offensive
LadyL
Volunteer tester
Joined: 14 Sep 11
Posts: 1679
Credit: 5,230,097
RAC: 0
Message 1174891 - Posted: 1 Dec 2011, 10:46:41 UTC - in response to Message 1174725.  

Before you ask, yes it is set to ask for all types of tasks and the cache days are set to 10.

Where - locally or online? Did you check/clear local preferences?

Gruß,
Gundolf


I've messed-around with both. I'll bet Richard's onto the problem. Since I've never intervened in my other installations and they work, I didn't think to check this one. The DCF has probably gone completely goofy.

With connection issues I have seen the value for "While BOINC running, % of time host has an Internet connection" drop to such a low % that a machine would not request new work. Even with 0 in the queue & processing nothing.

You can also see the value locally in your client_state.xml file. In this section:
<time_stats>
    <on_frac>1.000000</on_frac>
    <connected_frac>1.000000</connected_frac>
    <active_frac>1.000000</active_frac>
    <gpu_active_frac>1.000000</gpu_active_frac>
    <last_update>1500000000.000000</last_update>
</time_stats>


If you are going to edit these values client side be sure you have stopped BOINC before doing so. Otherwise you can trash all the work you have in your queue. At least I have managed to do so in the past.


AFAIK data from client_state.xml is read only at BOINC startup. After that, client_state.xml only gets written to. If you want to edit it, you need to shut down BOINC first.
ID: 1174891 · Report as offensive
LadyL
Volunteer tester
Joined: 14 Sep 11
Posts: 1679
Credit: 5,230,097
RAC: 0
Message 1174896 - Posted: 1 Dec 2011, 10:59:54 UTC - in response to Message 1174857.  
Last modified: 1 Dec 2011, 11:00:11 UTC

But the scheduling for file transfers is so much better (retry times and project backoffs on the order of minutes/hours instead of exponential centuries) that nothing else matters ... :D


That may be a lot of the problem. I've watched the stupid thing go from a retry of minutes to 9+ hours, while I was looking at it. (yes, I have no life)

Punching "Retry Now" often gets an immediate download.

I think I hate 6.12.34.


David thought long project backoffs were better than a project-specific 'no net' button. When the project is really down, that makes sense. In our usual 'can't get through' situation it's just bloody annoying. It got marginally better in 6.13, where the DLs keep running and a successful DL clears the project backoff again.
Whenever there is congestion, keeping an eye on the Transfers tab and clearing those excessive backoffs may make all the difference in keeping a big rig going. Of course, if you can't babysit BOINC...
ID: 1174896 · Report as offensive
Profile Ronald R CODNEY
Joined: 19 Nov 11
Posts: 87
Credit: 420,920
RAC: 0
United States
Message 1174897 - Posted: 1 Dec 2011, 11:27:25 UTC

Query: For a single cruncher (No accessible GPU)(4 processor), I've accumulated 10.5k WU's in 10 days. Is this decent?
ID: 1174897 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1174900 - Posted: 1 Dec 2011, 11:56:48 UTC - in response to Message 1174897.  

Query: For a single cruncher (No accessible GPU)(4 processor), I've accumulated 10.5k WU's in 10 days. Is this decent?

Welcome to SETI.

A small terminology correction. A WU, (otherwise known as a task, result, or job), is the search for ET that you see in BOINC Manager.

What you have 10.5K of are 'credits' or cobblestones. The science jobs - or WUs - are of differing lengths, and you earn between say 30 and 150 (in round numbers) credits per WU. And before you ask - no, credits are worth nothing, but they are a useful measure of progress.

You don't always get a credit immediately on completion of a WU - all work here is computed, or 'crunched', on at least two users' computers, and credit is only granted when the results agree. So you have quite a bit more credit in the bank, in the form of pending tasks (as I'd actually prefer to call them).

You've been having a bit of difficulty with downloading jobs - but then again, so have we all, recently. I'd keep an eye on your internet connection, if I were you.

As you settle in and get to know the folks here, you'll get advice and help on how to improve your production rate. But for the moment, I'd say 10.5K credits in 10 days is a decent start, yes. Best of luck with your next 10K, and many more after that.
ID: 1174900 · Report as offensive
Profile Ronald R CODNEY
Joined: 19 Nov 11
Posts: 87
Credit: 420,920
RAC: 0
United States
Message 1174935 - Posted: 1 Dec 2011, 13:40:41 UTC - in response to Message 1174900.  

Thanks Rich for the response. Understand about the WU/credit terminology, my bad there.
ID: 1174935 · Report as offensive
Horacio

Joined: 14 Jan 00
Posts: 536
Credit: 75,967,266
RAC: 0
Argentina
Message 1174965 - Posted: 1 Dec 2011, 17:01:40 UTC - in response to Message 1174896.  

But the scheduling for file transfers is so much better (retry times and project backoffs on the order of minutes/hours instead of exponential centuries) that nothing else matters ... :D


That may be a lot of the problem. I've watched the stupid thing go from a retry of minutes to 9+ hours, while I was looking at it. (yes, I have no life)

Punching "Retry Now" often gets an immediate download.

I think I hate 6.12.34.


David thought long project backoffs were better than a project-specific 'no net' button. When the project is really down, that makes sense. In our usual 'can't get through' situation it's just bloody annoying. It got marginally better in 6.13, where the DLs keep running and a successful DL clears the project backoff again.
Whenever there is congestion, keeping an eye on the Transfers tab and clearing those excessive backoffs may make all the difference in keeping a big rig going. Of course, if you can't babysit BOINC...


Incremental retry times and an overall project backoff are not a bad idea per se.
What is totally wrong is that the first time a file transfer fails, it gets a retry delay of several hours.
With BOINC 6.12, without babysitting it, I can't get work for SETI at all, because there is always a file delayed in retry mode, so the scheduler doesn't fetch new tasks, and of course the cache gets filled with tasks for other projects.

Retry times should increase slowly on the first tries, rise faster if there are consecutive failures without any data actually transferred in between, and drop to a minimum if some data was transferred since the last try...

This way, if a project is down it will reach longer delays much faster, while projects like SETI, with stressed bandwidth, will still be able to keep the clients fed...

OTOH, what is the point in being so worried about not stressing a project that is already down? If it is really down, it can't get worse... ;-)
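
To make the proposal concrete, here is a rough Python sketch of that retry policy. It is not how any BOINC version actually computes backoffs. The class name, the 60-second floor, the 4-hour ceiling, and the growth factors are all made-up numbers chosen only to show the shape of the idea.

```python
from dataclasses import dataclass

MIN_DELAY = 60.0         # seconds; hypothetical floor for a retry delay
MAX_DELAY = 4 * 3600.0   # seconds; hypothetical ceiling


@dataclass
class TransferBackoff:
    """Retry delay that grows slowly at first and faster on repeated dead failures."""

    delay: float = MIN_DELAY
    dead_failures: int = 0  # consecutive failures with no data transferred

    def on_failure(self, bytes_transferred: int) -> float:
        """Record a failed attempt and return the delay before the next retry."""
        if bytes_transferred > 0:
            # Some data got through, so the server is alive: back to the minimum.
            self.dead_failures = 0
            self.delay = MIN_DELAY
        else:
            self.dead_failures += 1
            # The growth factor itself rises with each dead failure: 1.5, 2.0, 2.5, ...
            factor = 1.0 + 0.5 * self.dead_failures
            self.delay = min(self.delay * factor, MAX_DELAY)
        return self.delay

    def on_success(self) -> None:
        """A completed transfer clears the backoff entirely."""
        self.dead_failures = 0
        self.delay = MIN_DELAY


if __name__ == "__main__":
    backoff = TransferBackoff()
    for attempt in range(1, 7):
        print(f"dead failure {attempt}: retry in {backoff.on_failure(0):.0f} s")
    print(f"partial transfer: retry in {backoff.on_failure(1024):.0f} s")
```

With these invented numbers, a project that is really down reaches the ceiling after six dead failures, while a congested one that still moves a few bytes keeps retrying every minute.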

ID: 1174965 · Report as offensive
kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1174999 - Posted: 1 Dec 2011, 18:46:45 UTC
Last modified: 1 Dec 2011, 18:48:55 UTC

Along with the backoffs, a real big PITA is Boinc's suppressing work requests when even a single download is in retry status. This is pure BS.
If there are 10 or more stuck downloads, I could perhaps see not asking for more work until they clear. But when downloads are going slowly and even 1 goes into retry seconds before the next scheduler request is made, it is crazy not to allow the host to ask for more work if it needs it.

And this goes for uploads as well....
Why the **** suppress requests for more work just because the upload server may be down????
If the host is processing work and getting downloads, what is the point in not allowing it to get more work just because it can't report for a bit????
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1174999 · Report as offensive


 