Calm a Llama Down (Feb 13 2008)

Message boards : Technical News : Calm a Llama Down (Feb 13 2008)
Profile Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1444
Credit: 957,058
RAC: 0
United States
Message 712228 - Posted: 13 Feb 2008, 23:54:49 UTC

I'm realizing the server status page is giving a slightly bogus picture of our current server setup, and it's actually too much work right now to fix the status script, so I'll just tell you now what the current situation is: our public web server is thinman, our scheduling server is ptolemy, our upload server is bruno, and our download server is bane. None of these currently has a redundant twin or a "hot" backup (but we have vader and maul all set up to be a replacement for any of the above if need be). More on that below. Our primary/secondary BOINC (mysql) database servers are jocelyn/sidious, and our primary/secondary SETI science (informix) database servers are thumper/bambi. Specs for all these are correctly noted on the status page. We have other systems employed for less interesting but important things, but that's basically the meat of it. If we could double the CPU/memory/disk space on everything we have we'd be set (for the time being).

Anyway.. things are looking better. Weekly outage recovery is still a little weird - I don't think our single download server (bane) can handle such crunch periods alone so we'll probably bring vader back into the fold for that. The other servers are super happy given the recent changes to reduce NFS traffic. I enacted some more such changes this morning. This tweaking, coupled with server ewen (where Eric does his Hydrogen work) crashing and hanging the network a bit, made for a slightly bumpy ride this morning. However, smoother seas, and perhaps running "update stats" on a couple of signal tables, made the assimilators much faster. We'll finally catch up on that queue in a couple hours I think. Due to the reduced dropped connections on the scheduling/upload servers it seems that the router got more cycles to spend on downloads, and we reached almost 70Mbps last night. Still need to get that new router going...

Other than that - more mail drudgery. As much as I like computers, I hate when perfectly good but nevertheless wonky solutions to small problems become the foundations for advanced development, thus amplifying the original wonky-ness.

Oh yeah - Eric sent some graphs around. Looks like the radar blanking code is working. Neat. Jeff's working that code into the splitter now so we can retest that small data file and compare results.

- Matt

-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude
ID: 712228 · Report as offensive
Profile Dr. C.E.T.I.
Joined: 29 Feb 00
Posts: 16019
Credit: 794,685
RAC: 0
United States
Message 712236 - Posted: 14 Feb 2008, 0:10:49 UTC


Thank You Kindly for that Information Matt . . .


BOINC Wiki . . .

Science Status Page . . .
ID: 712236 · Report as offensive
Profile [BAT]ptagi
Joined: 7 Mar 07
Posts: 4
Credit: 61,338
RAC: 0
Belgium
Message 712329 - Posted: 14 Feb 2008, 3:38:44 UTC
Last modified: 14 Feb 2008, 3:41:51 UTC

I'm facing problems uploading my finished work. It keeps telling me the project servers are unavailable.
ID: 712329 · Report as offensive
Profile Kenn Benoît-Hutchins
Volunteer tester
Joined: 24 Aug 99
Posts: 46
Credit: 18,091,320
RAC: 31
Canada
Message 712334 - Posted: 14 Feb 2008, 3:55:29 UTC

SETI@home Wed 13 Feb 19:43:09 2008 Sending scheduler request: Requested by user. Requesting 283755 seconds of work, reporting 98 completed tasks
SETI@home Wed 13 Feb 19:43:14 2008 Scheduler request succeeded: got 0 new tasks
SETI@home Wed 13 Feb 19:43:14 2008 Message from server: No work sent
SETI@home Wed 13 Feb 19:43:14 2008 Message from server: (reached daily quota of 2 results)

What does that mean? I have my boinc manager set to retain three days of work.

Kenn


Kenn

What is left unsaid is neither heard, nor heeded.
Ce qui est laissé inexprimé ni n'est entendu, ni est observé.
ID: 712334 · Report as offensive
DJStarfox

Joined: 23 May 01
Posts: 1066
Credit: 1,226,053
RAC: 2
United States
Message 712335 - Posted: 14 Feb 2008, 3:59:42 UTC - in response to Message 712228.  
Last modified: 14 Feb 2008, 3:59:57 UTC

Updating statistics and indexes is regular DBA maintenance, just like backups (though not as often). If updating one table helped this much, doing the rest of the DB should help significantly too. Hard to say over the internet what the best interval for maintenance should be. When it gets slow, the DBA just does it. :)

That said, to keep the users off my case, I usually run weekly or bi-weekly for hard-core production servers (e.g., Oracle, MSSQL). I run monthly for not so used databases.

Also, I found an empty splitter file 24ja07af (zero bytes) on the status page. Is this any concern? Is this what happens before the file is "filled up"?
ID: 712335 · Report as offensive
DJStarfox

Joined: 23 May 01
Posts: 1066
Credit: 1,226,053
RAC: 2
United States
Message 712336 - Posted: 14 Feb 2008, 4:01:58 UTC - in response to Message 712334.  
Last modified: 14 Feb 2008, 4:02:34 UTC

SETI@home Wed 13 Feb 19:43:09 2008 Sending scheduler request: Requested by user. Requesting 283755 seconds of work, reporting 98 completed tasks
SETI@home Wed 13 Feb 19:43:14 2008 Scheduler request succeeded: got 0 new tasks
SETI@home Wed 13 Feb 19:43:14 2008 Message from server: No work sent
SETI@home Wed 13 Feb 19:43:14 2008 Message from server: (reached daily quota of 2 results)

What does that mean? I have my boinc manager set to retain three days of work.
Kenn


Kenn:
This means your computer has gotten too much work but not returned enough successful work units. Make sure your WU are completed with success and your machine is not crashing or such. Over time, SETI servers will let you download more work with such validated successful WU (you have to wait for credit/validation).

If you need more help, please search the forums or ask your question in the "Number Crunching" forum.
ID: 712336 · Report as offensive
Profile Shane Meyer
Volunteer tester
Joined: 22 Jan 00
Posts: 126
Credit: 31,280,265
RAC: 42
Australia
Message 712379 - Posted: 14 Feb 2008, 6:02:03 UTC

Kenn,
Stop aborting WUs during downloading; just let them come through.
They will eventually!!
The same goes for detaching.
You need to complete some units for your download limit to be restored.
ID: 712379 · Report as offensive
Jesse Viviano

Joined: 27 Feb 00
Posts: 100
Credit: 3,949,583
RAC: 0
United States
Message 712385 - Posted: 14 Feb 2008, 6:19:10 UTC - in response to Message 712334.  

SETI@home Wed 13 Feb 19:43:09 2008 Sending scheduler request: Requested by user. Requesting 283755 seconds of work, reporting 98 completed tasks
SETI@home Wed 13 Feb 19:43:14 2008 Scheduler request succeeded: got 0 new tasks
SETI@home Wed 13 Feb 19:43:14 2008 Message from server: No work sent
SETI@home Wed 13 Feb 19:43:14 2008 Message from server: (reached daily quota of 2 results)

What does that mean? I have my boinc manager set to retain three days of work.

Kenn


Because your computer seems to have wasted a bunch of work units due to aborted downloads, your computer's quota was reduced by one for each result that was wasted in this manner. BOINC throttles computers that generate invalid results mostly as a safety measure, so that a computer that is flaky due to overclocking, overheating, or bad hygiene (Yes, computers can accumulate dust, so open them up and clean them out from time to time so they don't overheat) cannot cause too much damage to a project. This damage can cause good work units to be tossed out because BOINC tosses out work units that accumulate too many errors.

Each result successfully processed will double your quota until it reaches the administratively set maximum quota or goes beyond it, in which case it is forced back to just the maximum quota.
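
The quota rule described above can be sketched in a few lines of Python. This is only an illustration of the rule as stated in this post, not BOINC's actual scheduler code: the function names, the maximum quota of 100, and the floor of 1 are all assumptions made for the example.

```python
def update_quota(quota: int, succeeded: bool, max_quota: int, min_quota: int = 1) -> int:
    """Return a host's new daily quota after one result is reported.

    min_quota (a floor of 1) is an assumption for this sketch; the post
    above does not say how low the quota is allowed to go.
    """
    if succeeded:
        # Each successfully processed result doubles the quota; anything
        # beyond the administratively set maximum is forced back to it.
        return min(quota * 2, max_quota)
    # Each wasted or errored result reduces the quota by one.
    return max(quota - 1, min_quota)


def replay(quota: int, outcomes: list[bool], max_quota: int) -> int:
    """Apply update_quota once per reported result, in order."""
    for ok in outcomes:
        quota = update_quota(quota, ok, max_quota)
    return quota


# Hypothetical maximum of 100: starting from a quota of 2,
# five good results in a row give 4, 8, 16, 32, 64.
print(replay(2, [True] * 5, max_quota=100))  # 64
```

Under these assumptions, recovery is much faster than the decline: a host knocked down to a quota of 2 is back at the (hypothetical) maximum of 100 after six good results, since the sixth doubling would give 128 and is capped.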
ID: 712385 · Report as offensive
Profile Fred J. Verster
Volunteer tester
Joined: 21 Apr 04
Posts: 3252
Credit: 31,903,643
RAC: 0
Netherlands
Message 712390 - Posted: 14 Feb 2008, 7:07:43 UTC - in response to Message 712228.  
Last modified: 14 Feb 2008, 7:10:19 UTC


+++++++++++++++ Thanx Matt , ++++++++++++++++++++

for the update on the situation @ SETI. Quite tricky, having no, or @ least too LITTLE, backup equipment.
Is your NETWORK LIMIT a 'theoretical' 100Mbit/s?

Hope to be able to DONATE in the future
ID: 712390 · Report as offensive
Profile KWSN Ekky Ekky Ekky
Joined: 25 May 99
Posts: 944
Credit: 52,956,491
RAC: 67
United Kingdom
Message 712412 - Posted: 14 Feb 2008, 9:02:22 UTC - in response to Message 712385.  

Sadly I seem to be in the same position I was in a couple of weeks ago. Nothing is actually being downloaded to my work computer. It was still fine at home this a.m. but that is a much slower machine. I am being told here that access to the servers succeeded but nothing gets any further.

14/02/2008 08:52:02|SETI@home|Started download of 13ja07ae.26417.23385.10.7.33
14/02/2008 08:52:09||Access to reference site succeeded - project servers may be temporarily down.
14/02/2008 08:53:33||Project communication failed: attempting access to reference site
14/02/2008 08:53:33|SETI@home|Temporarily failed download of 24fe07ab.14574.2117.15.7.77: http error
14/02/2008 08:53:33|SETI@home|Backing off 1 min 0 sec on download of 24fe07ab.14574.2117.15.7.77
14/02/2008 08:53:39|SETI@home|Started download of 13ja07ae.7364.1299.11.7.204
14/02/2008 08:53:47||Access to reference site succeeded - project servers may be temporarily down.

Last time this happened, in my haste I detached from the project. When I tried again after the weekend, all went sailing through happily and has continued thus until last night. Therefore the problem does not seem to be at my end or in the intervening space between here and SETI. If the server page is showing a bogus picture, is there in reality a problem with Bane or is it something else? Meanwhile I shall be patient this time and not cause others problems by either aborting or detaching. Trouble is, I suspect this is why Seti gets deserters. My faith in the ultimate prize is tarnished but otherwise undiminished!


ID: 712412 · Report as offensive
Profile AndyW Project Donor
Volunteer tester
Joined: 23 Oct 02
Posts: 5862
Credit: 10,957,677
RAC: 18
United Kingdom
Message 712416 - Posted: 14 Feb 2008, 9:23:17 UTC

I have similar messages on all my machines this morning, so it looks like either a server or connectivity issue somewhere.
ID: 712416 · Report as offensive
Profile KWSN Ekky Ekky Ekky
Joined: 25 May 99
Posts: 944
Credit: 52,956,491
RAC: 67
United Kingdom
Message 712418 - Posted: 14 Feb 2008, 9:53:49 UTC - in response to Message 712416.  

Thank goodness it's not just me! The whole site seemed to be down for the best part of half an hour just now so I suspect the problems are in California.

I have similar messages on all my machines this morning, so it looks like either a server or connectivity issue somewhere.



ID: 712418 · Report as offensive
QSilver

Joined: 26 May 99
Posts: 232
Credit: 6,452,764
RAC: 0
United States
Message 712477 - Posted: 14 Feb 2008, 14:47:51 UTC - in response to Message 712418.  

Thank goodness it's not just me! The whole site seemed to be down for the best part of half an hour just now so I suspect the problems are in California.

I have similar messages on all my machines this morning, so it looks like either a server or connectivity issue somewhere.



Anyone who has upload problems, processing problems, etc. would be better served by reading the Number Crunching forum. Typically, widespread problems will get noticed very quickly by the inhabitants of that forum. They will also be able to quickly diagnose local problems that may only affect your rig/farm/set-up. For instance, there's an Upload problems?? thread that was updated about 2 hours ago (from this posting).

The threads in this forum are for the project managers to inform users of technical issues related to project administration. Issues related to uploads, downloads, and processing are best discussed and resolved in Number Crunching.

Just my $1/50.
QS
ID: 712477 · Report as offensive
Profile lostcub

Joined: 2 May 03
Posts: 2
Credit: 1,122,746
RAC: 0
United States
Message 712504 - Posted: 14 Feb 2008, 16:06:04 UTC - in response to Message 712477.  

(SIGH)...At least I know NOW I didn't do something wrong... LOL



ID: 712504 · Report as offensive
Speedy
Volunteer tester
Joined: 26 Jun 04
Posts: 1643
Credit: 12,921,799
RAC: 89
New Zealand
Message 712580 - Posted: 14 Feb 2008, 20:10:27 UTC - in response to Message 712334.  


What does that mean?

I've noticed that on the 14th of February you had over 20 tasks that had client errors. Have you changed something on your PC? Once you start returning completed work units your daily quota will start to increase again.
______
Speedy
ID: 712580 · Report as offensive
Profile Kenn Benoît-Hutchins
Volunteer tester
Joined: 24 Aug 99
Posts: 46
Credit: 18,091,320
RAC: 31
Canada
Message 712589 - Posted: 14 Feb 2008, 20:57:50 UTC - in response to Message 712580.  
Last modified: 14 Feb 2008, 21:00:48 UTC


What does that mean?

I've noticed that on the 14th of February you had over 20 tasks that had client errors. Have you changed something on your PC? Once you start returning completed work units your daily quota will start to increase again.
______
Speedy


I am now running properly. I uninstalled and reinstalled (after having tried reset unsuccessfully).

Everything seems to be operating properly now. I had an update for my operating system, and
that is the only thing that I can think of that may have fubarred the works. That update, though,
did not affect any other programmes.



Kenn
Kenn

What is left unsaid is neither heard, nor heeded.
Ce qui est laissé inexprimé ni n'est entendu, ni est observé.
ID: 712589 · Report as offensive
Speedy
Volunteer tester
Joined: 26 Jun 04
Posts: 1643
Credit: 12,921,799
RAC: 89
New Zealand
Message 712619 - Posted: 14 Feb 2008, 22:16:40 UTC - in response to Message 712589.  


I am now running properly. I uninstalled and reinstalled (after having tried reset unsuccessfully).

Everything seems to be operating properly now. I had an update for my operating system, and
that is the only thing that I can think of that may have fubarred the works. That update, though,
did not affect any other programmes.


Kenn

I'm pleased you got everything working.
______
Speedy
ID: 712619 · Report as offensive
Profile Kenn Benoît-Hutchins
Volunteer tester
Joined: 24 Aug 99
Posts: 46
Credit: 18,091,320
RAC: 31
Canada
Message 712768 - Posted: 15 Feb 2008, 4:41:05 UTC - in response to Message 712619.  
Last modified: 15 Feb 2008, 4:44:16 UTC


I'm pleased you got everything working.
______
Speedy


Message 712612

The above URL explains the probable cause of my problems.

Kenn
Kenn

What is left unsaid is neither heard, nor heeded.
Ce qui est laissé inexprimé ni n'est entendu, ni est observé.
ID: 712768 · Report as offensive



 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.