Problems...

kittyman · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51477
Credit: 1,018,363,574
RAC: 1,004
United States
Message 975910 - Posted: 5 Mar 2010, 9:33:17 UTC - in response to Message 975830.  

What cooling systems does the server have? Could even more cooling systems be used?


While I was gone, no one replied... There is an air conditioning system that feeds the server closet (yes, it is like a "Hall, Janitor's Closet" that houses the servers).

If no air is flowing (cool or otherwise), things go dead in a hurry.

Regards


One of the many overhead services, like the power to run them.
Supplied by Berkeley, out of 'their cut' of donations.

Not a small thing......I know what it costs to run AC 24/7.

"Time is simply the mechanism that keeps everything from happening all at once."

ID: 975910 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 975996 - Posted: 5 Mar 2010, 18:03:49 UTC

05/03/2010 20:59:48 SETI@home Reporting 3 completed tasks, requesting new tasks for GPU
05/03/2010 20:59:50 Project communication failed: attempting access to reference site
05/03/2010 20:59:52 Internet access OK - project servers may be temporarily down.
05/03/2010 20:59:53 SETI@home Scheduler request failed: Server returned nothing (no headers, no data)

Is SETI's internet connection still so overloaded that it leads to this server reply, or is something else wrong here?
ID: 975996 · Report as offensive
Fred W
Volunteer tester

Joined: 13 Jun 99
Posts: 2524
Credit: 11,954,210
RAC: 0
United Kingdom
Message 976001 - Posted: 5 Mar 2010, 18:23:03 UTC - in response to Message 975996.  

05/03/2010 20:59:48 SETI@home Reporting 3 completed tasks, requesting new tasks for GPU
05/03/2010 20:59:50 Project communication failed: attempting access to reference site
05/03/2010 20:59:52 Internet access OK - project servers may be temporarily down.
05/03/2010 20:59:53 SETI@home Scheduler request failed: Server returned nothing (no headers, no data)

Is SETI's internet connection still so overloaded that it leads to this server reply, or is something else wrong here?

This doesn't line up with traffic overload on the Cricket graphs. More likely problems accessing the database I would guess.

F.
ID: 976001 · Report as offensive
Profile Julie
Volunteer moderator
Volunteer tester
Joined: 28 Oct 09
Posts: 34060
Credit: 18,883,157
RAC: 18
Belgium
Message 976012 - Posted: 5 Mar 2010, 19:34:35 UTC

When I report WUs I get the same message, but when I hit the update button a second time it always works, and this happens on all my computers!
rOZZ
Music
Pictures
ID: 976012 · Report as offensive
Profile dnolan
Joined: 30 Aug 01
Posts: 1228
Credit: 47,779,411
RAC: 32
United States
Message 976013 - Posted: 5 Mar 2010, 19:35:18 UTC - in response to Message 976001.  

05/03/2010 20:59:48 SETI@home Reporting 3 completed tasks, requesting new tasks for GPU
05/03/2010 20:59:50 Project communication failed: attempting access to reference site
05/03/2010 20:59:52 Internet access OK - project servers may be temporarily down.
05/03/2010 20:59:53 SETI@home Scheduler request failed: Server returned nothing (no headers, no data)

Is SETI's internet connection still so overloaded that it leads to this server reply, or is something else wrong here?

This doesn't line up with traffic overload on the Cricket graphs. More likely problems accessing the database I would guess.

F.


Yeah, I think something else is going on. Last night I watched my CUDA machine request work ten times in a row, each request about a minute after the last, with 4 tasks sent each time. Then it got the above message about 4 times in a row, then got some more work and laid off the requests... The traffic at the time didn't look that heavy to me.

-Dave
ID: 976013 · Report as offensive
Profile Pappa
Volunteer tester
Joined: 9 Jan 00
Posts: 2562
Credit: 12,301,681
RAC: 0
United States
Message 976036 - Posted: 5 Mar 2010, 20:31:29 UTC

anakin

The scheduling server has an Apache web server that receives your file "sched_request_setiathome.berkeley.edu.xml". Those files vary in size from about 1K to several hundred K. Your BOINC client opens a connection to anakin at "http://setiboinc.ssl.berkeley.edu/sah_cgi/cgi", which has a maximum number of allowed connections in its config.

If it is already transferring requests from enough other computers to reach "Max Connections", you are not going to get a connection.
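
The connection cap described above can be pictured with a toy model. This is only a sketch: `MAX_CONNECTIONS`, `ConnectionPool` and its methods are hypothetical names, the cap of 3 is made up, and a real web server may queue or time out a connection rather than refuse it on the spot.

```python
import threading

MAX_CONNECTIONS = 3  # hypothetical cap; the real value on anakin is not stated in this thread


class ConnectionPool:
    """Toy model of a server that serves at most max_connections clients at once."""

    def __init__(self, max_connections):
        self._slots = threading.BoundedSemaphore(max_connections)

    def try_connect(self):
        # Non-blocking acquire: returns False at once when every slot is taken.
        return self._slots.acquire(blocking=False)

    def disconnect(self):
        # A client finished its request and frees its slot.
        self._slots.release()


pool = ConnectionPool(MAX_CONNECTIONS)
results = [pool.try_connect() for _ in range(5)]
print(results)             # the first three clients get a slot, the last two do not

pool.disconnect()          # one client finishes its upload
retry = pool.try_connect()
print(retry)               # a retry now finds a free slot
```

In this model a second attempt succeeds as soon as any other client has finished, which is one way a repeated update could go through without anything changing on the client.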

You of course know what is in the sched_request file:
Your account information.
The applications and versions that are on that particular machine.
A copy of the preferences (and venues that you have configured).
The contents of "stderr.txt" for any completed workunits.
A list of results in progress.
A list of workunits to be run (including deadlines and estimates of time to complete).
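
To get a feel for what dominates such a file, one could tally its top-level elements and total size. The sketch below assumes the file parses as well-formed XML, which may not hold for a real request file, and every tag name in `SAMPLE` is a made-up placeholder rather than the actual BOINC schema.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def summarize_request(xml_text):
    """Return (size in bytes, count of each direct child tag) for an XML document."""
    root = ET.fromstring(xml_text)
    counts = Counter(child.tag for child in root)
    return len(xml_text.encode("utf-8")), counts


# Hypothetical, heavily simplified stand-in for a scheduler request file.
# The tag names are placeholders chosen to mirror the list above.
SAMPLE = """<request>
  <account>example</account>
  <app_version>example app 1.0</app_version>
  <preferences>example</preferences>
  <completed_result><stderr>log text</stderr></completed_result>
  <in_progress>wu_1</in_progress>
  <in_progress>wu_2</in_progress>
</request>"""

size, counts = summarize_request(SAMPLE)
print(size, dict(counts))
```

On a real file, the same tally would show whether in-progress results or stderr text account for most of the size.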

So when the scheduler receives this from you, it then gets to decide: are you reporting completed work? If yes, it passes the information on to the validator (the transitioners are also on anakin).
Is your cache full, or do you need work (it counts up all those times and sees how many seconds of work you require)? If you need work, it checks the feeder (on anakin) for results ready to send and assigns them to that computer. It replies with "sched_reply_setiathome.berkeley.edu.xml", which defines where to go to get your assigned workunits. It can also tell your BOINC client to remove outdated files or workunits that have been canceled.
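
The decision steps above can be sketched roughly as follows. This is not the BOINC scheduler code: `Request`, `Reply`, `handle_request` and the list-based feeder and validator queue are all hypothetical stand-ins used only to show the order of the decisions.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    completed_results: list = field(default_factory=list)
    work_requested_seconds: float = 0.0


@dataclass
class Reply:
    acknowledged: list = field(default_factory=list)
    assigned: list = field(default_factory=list)


def handle_request(request, ready_to_send, validator_queue):
    """ready_to_send is a list of (result_name, estimated_seconds); it stands in for the feeder."""
    reply = Reply()
    # Step 1: reported work is passed on for validation.
    for name in request.completed_results:
        validator_queue.append(name)
        reply.acknowledged.append(name)
    # Step 2: hand out results until the requested seconds are covered
    # or the feeder has nothing left.
    remaining = request.work_requested_seconds
    while remaining > 0 and ready_to_send:
        name, estimate = ready_to_send.pop(0)
        reply.assigned.append(name)
        remaining -= estimate
    return reply


feeder = [("wu_a", 3600.0), ("wu_b", 3600.0), ("wu_c", 3600.0)]
validator = []
reply = handle_request(Request(["done_1"], 5000.0), feeder, validator)
print(reply.assigned, reply.acknowledged, validator)
```

Even in this toy form, every request touches both the validation side and the feeder, which fits the point that the machine has a lot to decide per connection.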

So it is a "busy" machine! It has a LOT to Decide.

Regards


Please consider a Donation to the Seti Project.

ID: 976036 · Report as offensive
JohnDK · Crowdfunding Project Donor · Special Project $250 donor
Volunteer tester
Joined: 28 May 00
Posts: 1222
Credit: 451,243,443
RAC: 1,127
Denmark
Message 976044 - Posted: 5 Mar 2010, 21:12:16 UTC

anakin

So it is a "busy" machine! It has a LOT to Decide.

The problem is just that it started after the big outage; before that there wasn't any major problem.
ID: 976044 · Report as offensive
Highlander
Joined: 5 Oct 99
Posts: 167
Credit: 37,987,668
RAC: 16
Germany
Message 976102 - Posted: 6 Mar 2010, 1:58:33 UTC - in response to Message 976044.  

anakin

So it is a "busy" machine! It has a LOT to Decide.

The problem is just that it started after the big outage; before that there wasn't any major problem.


I can fully agree with that.
Something definitely changed in the behaviour of the scheduler on 20 Feb 2010 around 18:28 UTC.
Is it possible to trace back the changes around this time (the last changes before this time)?
(Meanwhile I have upgraded from 6.10.18 to 6.10.36 -> no change on this specific point)
- Performance is not a simple linear function of the number of CPUs you throw at the problem. -
ID: 976102 · Report as offensive
Highlander
Joined: 5 Oct 99
Posts: 167
Credit: 37,987,668
RAC: 16
Germany
Message 976273 - Posted: 6 Mar 2010, 17:18:04 UTC

Seems you have found it!? The last time I got the (no headers, no data) failure was at 10:46 UTC; since then, no update request has shown this message.

Can others confirm?
- Performance is not a simple linear function of the number of CPUs you throw at the problem. -
ID: 976273 · Report as offensive
W-K 666 Project Donor
Volunteer tester

Joined: 18 May 99
Posts: 19321
Credit: 40,757,560
RAC: 67
United Kingdom
Message 976276 - Posted: 6 Mar 2010, 17:47:03 UTC - in response to Message 976273.  

Seems you have found it!? The last time I got the (no headers, no data) failure was at 10:46 UTC; since then, no update request has shown this message.

Can others confirm?

Still getting that "no headers, no data" msg at 17:33 UTC.
ID: 976276 · Report as offensive
Profile perryjay
Volunteer tester
Joined: 20 Aug 02
Posts: 3377
Credit: 20,676,751
RAC: 0
United States
Message 976282 - Posted: 6 Mar 2010, 18:22:30 UTC - in response to Message 976273.  

Also updated to 6.10.36. Still getting the message now and then.


PROUD MEMBER OF Team Starfire World BOINC
ID: 976282 · Report as offensive
Profile zoom3+1=4
Volunteer tester
Joined: 30 Nov 03
Posts: 66218
Credit: 55,293,173
RAC: 49
United States
Message 976294 - Posted: 6 Mar 2010, 18:57:14 UTC - in response to Message 976282.  

Also updated to 6.10.36. Still getting the message now and then.

I also get the same message and I'm using 6.10.36 too.

3/6/2010 10:41:04 AM SETI@home Scheduler request failed: Server returned nothing (no headers, no data)

I pretty much don't pay much attention to it though.
Savoir-Faire is everywhere!
The T1 Trust, T1 Class 4-4-4-4 #5550, America's First HST

ID: 976294 · Report as offensive
JohnDK · Crowdfunding Project Donor · Special Project $250 donor
Volunteer tester
Joined: 28 May 00
Posts: 1222
Credit: 451,243,443
RAC: 1,127
Denmark
Message 976301 - Posted: 6 Mar 2010, 19:16:57 UTC

It often gets my attention, since I often have to do manual updates; BOINC often won't do it by itself.

06/03/2010 20:10:34 SETI@home update requested by user
06/03/2010 20:10:35 SETI@home Sending scheduler request: Requested by user.
06/03/2010 20:10:35 SETI@home Reporting 5 completed tasks, requesting new tasks for CPU
06/03/2010 20:10:37 Project communication failed: attempting access to reference site
06/03/2010 20:10:38 Internet access OK - project servers may be temporarily down.
06/03/2010 20:10:40 SETI@home Scheduler request failed: Server returned nothing (no headers, no data)
06/03/2010 20:14:48 SETI@home update requested by user
06/03/2010 20:14:48 SETI@home Sending scheduler request: Requested by user.
06/03/2010 20:14:48 SETI@home Reporting 5 completed tasks, requesting new tasks for CPU
06/03/2010 20:14:49 Project communication failed: attempting access to reference site
06/03/2010 20:14:51 Internet access OK - project servers may be temporarily down.
06/03/2010 20:14:53 SETI@home Scheduler request failed: Server returned nothing (no headers, no data)
ID: 976301 · Report as offensive
Dave

Joined: 29 Mar 02
Posts: 778
Credit: 25,001,396
RAC: 0
United Kingdom
Message 976305 - Posted: 6 Mar 2010, 19:34:57 UTC - in response to Message 976294.  

I pretty much don't pay much attention to it though.


Same here. All working so far.
ID: 976305 · Report as offensive
Profile perryjay
Volunteer tester
Joined: 20 Aug 02
Posts: 3377
Credit: 20,676,751
RAC: 0
United States
Message 976308 - Posted: 6 Mar 2010, 19:53:16 UTC - in response to Message 976301.  

I've found that if I just let it go it will go through when it wants to.


PROUD MEMBER OF Team Starfire World BOINC
ID: 976308 · Report as offensive
JohnDK · Crowdfunding Project Donor · Special Project $250 donor
Volunteer tester
Joined: 28 May 00
Posts: 1222
Credit: 451,243,443
RAC: 1,127
Denmark
Message 976320 - Posted: 6 Mar 2010, 20:54:21 UTC - in response to Message 976308.  
Last modified: 6 Mar 2010, 20:59:46 UTC

*Cut*
ID: 976320 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 976363 - Posted: 7 Mar 2010, 0:32:04 UTC
Last modified: 7 Mar 2010, 0:32:56 UTC

Still needed to ask twice...

07/03/2010 03:28:46 SETI@home update requested by user
07/03/2010 03:28:50 SETI@home Sending scheduler request: Requested by user.
07/03/2010 03:28:50 SETI@home Reporting 20 completed tasks, not requesting new tasks
07/03/2010 03:28:52 Project communication failed: attempting access to reference site
07/03/2010 03:28:54 Internet access OK - project servers may be temporarily down.
07/03/2010 03:28:55 SETI@home Scheduler request failed: Server returned nothing (no headers, no data)
07/03/2010 03:28:56 SETI@home update requested by user
07/03/2010 03:29:00 SETI@home Sending scheduler request: Requested by user.
07/03/2010 03:29:00 SETI@home Reporting 20 completed tasks, not requesting new tasks
07/03/2010 03:29:10 SETI@home Scheduler request completed
ID: 976363 · Report as offensive
Profile Pappa
Volunteer tester
Joined: 9 Jan 00
Posts: 2562
Credit: 12,301,681
RAC: 0
United States
Message 976380 - Posted: 7 Mar 2010, 2:06:32 UTC

I have been in and out of my house and on the road for the last 6 plus days. Other than the other evening, when I had the chance to install 6.10.36 on my various machines and spend a few hours "Alpha" testing, BOINC and SETI have been running in AUTO! I have not seen a large backlog of "results" ready to report. As I just sat down again, my SETI machines are happy... Other projects have a few issues that I need to look at.

That said, the message is a "GENERIC message reported by the BOINC client that there WAS an ISSUE contacting the scheduler." It does not report what actually happened during the process of contact/acknowledgement and handling of your "sched_request_setiathome.berkeley.edu.xml".

A user with a 10 day cache has a "sched_request_setiathome.berkeley.edu.xml" that is probably somewhere close to or above half a megabyte. That will tie up the scheduler for quite a FEW seconds while the file is being UPLOADED (with everything else going on {uploads/downloads/other scheduler attempts}). For the record, I have not asked how many active scheduler sockets (listeners) are available. I do suspect that the number is fairly large; even so, an actual connection "could time out," and then you would also get the same message from the BOINC client: "SETI@home Scheduler request failed: Server returned nothing (no headers, no data)"
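
As a rough back-of-the-envelope check on "quite a few seconds", the upload time for a half-megabyte request can be estimated from the client's uplink speed. The uplink speeds below are assumptions chosen for illustration, and the estimate ignores protocol overhead, latency and server load.

```python
def upload_seconds(file_bytes, uplink_bits_per_second):
    """Idealized transfer time: payload bits divided by line rate."""
    return file_bytes * 8 / uplink_bits_per_second


HALF_MEGABYTE = 512 * 1024  # bytes

# Assumed home uplink speeds in kbit/s; not taken from the thread.
for kbps in (128, 512, 2048):
    seconds = upload_seconds(HALF_MEGABYTE, kbps * 1000)
    print(f"{kbps} kbit/s uplink: about {seconds:.1f} s")
```

On a slow uplink the connection slot is held for tens of seconds per request, so a handful of large requests arriving together could plausibly keep several listeners busy at once.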

Once again, if "anakin" is processing even 10 "power users" with a 10 day cache (over several thousand workunits and results) while communicating with the transitioner and the feeder queue to send work, it is going to be working as fast as it can matching things up. If it drops a connection, that is not really a big deal. Users with older BOINC clients are probably going to have the libcurl issues where you "might" need to restart the BOINC client.

Lastly, and I continue to read this: "if you push the button and it succeeds," then I would suspect the BOINC client more than I would the server. The only thing that happened is that the user pushed a button on the BOINC client. The server has not changed (other than handling the requests it gets).

Yes, sometimes I think I really have been doing this Too Long.

Regards


Please consider a Donation to the Seti Project.

ID: 976380 · Report as offensive
W-K 666 Project Donor
Volunteer tester

Joined: 18 May 99
Posts: 19321
Credit: 40,757,560
RAC: 67
United Kingdom
Message 976461 - Posted: 7 Mar 2010, 5:34:55 UTC - in response to Message 976380.  

Al,
I think it has been observed and reported because in recent weeks it has been a common occurrence. From the last outage until about 18:00 UTC yesterday, it was the response to at least 60% of my attempts to contact Berkeley on the two computers I control.
At times, for periods of several hours, usually when the USA is awake, it was the only response. Looking through my logs, the most successful time for me to connect was between 05:00 UTC (midnight NY) and 10:00 UTC.

Andy
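
A failure rate per hour of the day, like the one described above, could be pulled from a client log with a short script. This sketch assumes the `dd/mm/yyyy hh:mm:ss` line format shown in the logs earlier in this thread; logs using a 12-hour `AM/PM` format would need a different pattern, and the hours are the client's local time, not UTC. The function name is hypothetical.

```python
import re
from collections import defaultdict

# Matches only the outcome lines, e.g.
# "07/03/2010 03:28:55 SETI@home Scheduler request failed: ..."
LINE = re.compile(
    r"^\d{2}/\d{2}/\d{4} (\d{2}):\d{2}:\d{2} .*Scheduler request (failed|completed)"
)


def failure_rate_by_hour(lines):
    """Map hour of day (log-local time) -> (failed, total) scheduler requests."""
    stats = defaultdict(lambda: [0, 0])
    for line in lines:
        match = LINE.match(line)
        if not match:
            continue
        hour = int(match.group(1))
        stats[hour][1] += 1
        if match.group(2) == "failed":
            stats[hour][0] += 1
    return {hour: tuple(counts) for hour, counts in stats.items()}


LOG = """07/03/2010 03:28:50 SETI@home Sending scheduler request: Requested by user.
07/03/2010 03:28:55 SETI@home Scheduler request failed: Server returned nothing (no headers, no data)
07/03/2010 03:29:00 SETI@home Sending scheduler request: Requested by user.
07/03/2010 03:29:10 SETI@home Scheduler request completed""".splitlines()

rates = failure_rate_by_hour(LOG)
print(rates)
```

Comparing such per-hour counts across several users' logs would show whether the failures really cluster in the hours when the USA is awake.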
ID: 976461 · Report as offensive
Profile Pappa
Volunteer tester
Joined: 9 Jan 00
Posts: 2562
Credit: 12,301,681
RAC: 0
United States
Message 976550 - Posted: 7 Mar 2010, 17:54:18 UTC - in response to Message 976461.  

Thank You Andy

After looking at my logs I see a couple of connection attempts that failed, but no reason as to why (2:00am PST on the 6th of Mar). With the current "shorty storm" this machine has been asking for work about every ten minutes for the last 18+ hours (successfully). My "sched_request_setiathome.berkeley.edu.xml" is at 170K at the moment (492 WUs, or 5000 lines in the file).

Although it will make my logs a bit chatty, I have enabled
<http_debug>1</http_debug>
in my cc_config.xml file. That will give the best indication of what might be happening. If someone can cut and paste the failed scheduler conversation, that will help.
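
For anyone who would rather script the change than edit the file by hand, here is a minimal sketch that switches the flag on in the text of a cc_config.xml. It assumes the flag lives inside a `<log_flags>` element under the `<cc_config>` root, which is my understanding of the BOINC client configuration layout and worth checking against the client documentation; the function name is hypothetical. It works on a string only, so writing the result back to the file and having the client re-read its config are left to the user.

```python
import xml.etree.ElementTree as ET


def enable_http_debug(config_xml):
    """Return the config text with <http_debug>1</http_debug> set under <log_flags>."""
    root = ET.fromstring(config_xml)
    log_flags = root.find("log_flags")
    if log_flags is None:
        # Assumed location of logging flags; create it when missing.
        log_flags = ET.SubElement(root, "log_flags")
    flag = log_flags.find("http_debug")
    if flag is None:
        flag = ET.SubElement(log_flags, "http_debug")
    flag.text = "1"
    return ET.tostring(root, encoding="unicode")


updated = enable_http_debug("<cc_config><log_flags></log_flags></cc_config>")
print(updated)
```

Any other flags already present in the file are left untouched, since only the `http_debug` element is created or modified.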

Regards



Please consider a Donation to the Seti Project.

ID: 976550 · Report as offensive


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.