Something wrong somewhere?


dahls
Joined: 24 Oct 04
Posts: 122
Credit: 25,086,466
RAC: 22,923
Norway
Message 994287 - Posted: 5 May 2010, 21:48:24 UTC - in response to Message 994254.

Jørn, can you visit the scheduler URL http://setiboinc.ssl.berkeley.edu/sah_cgi/cgi from the machine in question?

Normal answer looks like:

<scheduler_reply>
<scheduler_version>611</scheduler_version>
<master_url>http://setiathome.berkeley.edu/</master_url>
<request_delay>11.000000</request_delay>
<message priority="low">Error in request message: no start tag </message>
<project_name>SETI@home</project_name>
</scheduler_reply>



It does look exactly like this.


In my opinion it seems like there is something wrong on the server side.

To determine this, look at the server's reply after the upload attempt (file "sched_reply_setiathome.berkeley.edu.xml").
A normal acknowledgement for a completed task looks like:

... <result_ack> <name>11dc06ag.26523.16023.4.10.112_0</name> </result_ack> ...

Sorry, I have never seen a rejecting server reply, so I can't tell you what one looks like.


No, it does say that a 500 Internal Server Error has occurred:

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>500 Internal Server Error</title>
</head><body>
<h1>Internal Server Error</h1>
<p>The server encountered an internal error or
misconfiguration and was unable to complete
your request.</p>
<p>Please contact the server administrator,
boincadm@ssl.berkeley.edu and inform them of the time the error occurred,
and anything you might have done that may have
caused the error.</p>
<p>More information about this error may be available
in the server error log.</p>
<hr>
<address>Apache/2.2.14 (Fedora) Server at setiboinc.ssl.berkeley.edu Port 80</address>
</body></html>



Should I delete everything on the triple core machine and then reinstall BOINC again?

IMHO, with upload problems this severe, which make crunching on this machine useless, that seems like a reasonable step...


I tried but got an error that a library was missing (see the other message posted earlier today).

And since there IS an error somewhere I will delete everything and start all over again.
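
For anyone repeating the check above, here is a minimal Python sketch that tells the two quoted responses apart: the normal <scheduler_reply> XML and the Apache error page. The function name classify_scheduler_reply is made up for illustration and is not part of BOINC; it only inspects text that has already been fetched from the scheduler URL.

```python
import re
import xml.etree.ElementTree as ET


def classify_scheduler_reply(body: str) -> str:
    """Classify a response body from the scheduler URL (illustrative helper)."""
    text = body.strip()
    # An HTML error page, like the 500 page quoted above, carries a <title>.
    title = re.search(r"<title>(.*?)</title>", text, re.IGNORECASE | re.DOTALL)
    if text.lower().startswith("<!doctype html") or title:
        return "http_error: " + (title.group(1).strip() if title else "no title")
    # Otherwise expect well-formed XML with <scheduler_reply> as the root element.
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        return "unknown"
    return "scheduler_reply" if root.tag == "scheduler_reply" else "unknown"


if __name__ == "__main__":
    sample = "<scheduler_reply> <project_name>SETI@home</project_name> </scheduler_reply>"
    print(classify_scheduler_reply(sample))
```

A "scheduler_reply" result only says the scheduler answered in the expected format; it says nothing about whether a particular upload or report was accepted.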
____________

dahls
Joined: 24 Oct 04
Posts: 122
Credit: 25,086,466
RAC: 22,923
Norway
Message 994289 - Posted: 5 May 2010, 21:57:46 UTC - in response to Message 994287.

The old directory has now been removed (renamed to something else, just in case someone wants to examine the data) and boinc_6.10.17_x86_64-pc-linux-gnu.sh was installed. A few minutes later it had downloaded several work sets, and everything seems to be working fine.

So my conclusion is that some kind of error has occurred somewhere.

Sorry about all those work sets that were lost... :(
____________

dahls
Joined: 24 Oct 04
Posts: 122
Credit: 25,086,466
RAC: 22,923
Norway
Message 994377 - Posted: 6 May 2010, 9:05:48 UTC - in response to Message 992227.

After nearly 12 hours running a new installation of BOINC on the triple-core machine that was not able to report completed tasks, it seems like everything is working:


06-May-2010 08:45:08 [SETI@home] Scheduler request completed: got 1 new tasks
06-May-2010 08:45:10 [SETI@home] Started download of 10fe07ag.7614.2118.4.10.51
06-May-2010 08:45:15 [SETI@home] Finished download of 10fe07ag.7614.2118.4.10.51
06-May-2010 08:48:21 [SETI@home] Computation for task 12dc06ag.10935.6616.13.10.209_1 finished
06-May-2010 08:48:21 [SETI@home] Starting 28no06ag.17662.2117.12.10.90_0
06-May-2010 08:48:21 [SETI@home] Starting task 28no06ag.17662.2117.12.10.90_0 using setiathome_enhanced version 528
06-May-2010 08:48:23 [SETI@home] Started upload of 12dc06ag.10935.6616.13.10.209_1_0
06-May-2010 08:48:30 [SETI@home] Finished upload of 12dc06ag.10935.6616.13.10.209_1_0
06-May-2010 08:58:18 [SETI@home] Computation for task 12dc06ag.10935.6616.13.10.194_1 finished
06-May-2010 08:58:18 [SETI@home] Starting 28no06ag.17662.2117.12.10.84_0
06-May-2010 08:58:18 [SETI@home] Starting task 28no06ag.17662.2117.12.10.84_0 using setiathome_enhanced version 528
06-May-2010 08:58:20 [SETI@home] Started upload of 12dc06ag.10935.6616.13.10.194_1_0
06-May-2010 08:58:20 [SETI@home] Sending scheduler request: To fetch work.
06-May-2010 08:58:20 [SETI@home] Reporting 1 completed tasks, requesting new tasks
06-May-2010 08:58:25 [SETI@home] Finished upload of 12dc06ag.10935.6616.13.10.194_1_0
06-May-2010 08:58:25 [SETI@home] Scheduler request completed: got 1 new tasks
06-May-2010 08:58:27 [SETI@home] Started download of 29no06ag.17537.1301.5.10.10
06-May-2010 08:58:32 [SETI@home] Finished download of 29no06ag.17537.1301.5.10.10
06-May-2010 08:58:40 [SETI@home] Sending scheduler request: To fetch work.
06-May-2010 08:58:40 [SETI@home] Reporting 1 completed tasks, requesting new tasks
06-May-2010 08:58:45 [SETI@home] Scheduler request completed: got 1 new tasks
06-May-2010 08:58:47 [SETI@home] Started download of 29no06ag.17537.1301.5.10.143
06-May-2010 08:58:52 [SETI@home] Finished download of 29no06ag.17537.1301.5.10.143
06-May-2010 08:58:56 [SETI@home] Sending scheduler request: To fetch work.
06-May-2010 08:58:56 [SETI@home] Requesting new tasks
06-May-2010 08:59:01 [SETI@home] Scheduler request completed: got 0 new tasks
06-May-2010 08:59:01 [SETI@home] Message from server: (Project has no jobs available)
06-May-2010 08:59:17 [SETI@home] Sending scheduler request: To fetch work.
06-May-2010 08:59:17 [SETI@home] Requesting new tasks
06-May-2010 08:59:22 [SETI@home] Scheduler request completed: got 0 new tasks
06-May-2010 08:59:22 [SETI@home] Message from server: (Project has no jobs available)
06-May-2010 09:00:37 [SETI@home] Sending scheduler request: To fetch work.
06-May-2010 09:00:37 [SETI@home] Requesting new tasks
06-May-2010 09:00:42 [SETI@home] Scheduler request completed: got 1 new tasks
06-May-2010 09:00:44 [SETI@home] Started download of 30no06ad.11437.25021.7.10.26
06-May-2010 09:00:49 [SETI@home] Finished download of 30no06ad.11437.25021.7.10.26


Just one question - I still have the old directory with the data that the server refused to accept. Could this be of interest to someone who wants to find out what caused this?

If not, I'm going to delete it.
____________

ignorance is no excuse
Joined: 4 Oct 00
Posts: 9529
Credit: 44,433,274
RAC: 0
Korea, North
Message 994433 - Posted: 6 May 2010, 16:03:54 UTC - in response to Message 994377.

I don't think you'll be able to use the old data files anymore. I've got a Mandriva box that forgets about the old files when I install a new version of BOINC over the old one. Maybe one of the Linux gurus has a solution.
____________
In a rich man's house there is no place to spit but his face.
Diogenes Of Sinope

End terrorism by building a school

dahls
Joined: 24 Oct 04
Posts: 122
Credit: 25,086,466
RAC: 22,923
Norway
Message 998944 - Posted: 26 May 2010, 16:24:30 UTC - in response to Message 992227.

Another machine has failed to upload completed work: http://setiathome.berkeley.edu/show_host_detail.php?hostid=5306094

I have tried to do a '/boinccmd --project http://setiathome.berkeley.edu/ update', but nothing happens. It just won't communicate.

This is a dual core AMD running Fedora Core 12.

Guess I have to do what I did with the triple core - delete BOINC and reinstall and reattach to the project again :(
____________

Gundolf Jahn
Joined: 19 Sep 00
Posts: 3184
Credit: 356,897
RAC: 19
Germany
Message 998953 - Posted: 26 May 2010, 16:59:03 UTC - in response to Message 998944.

The appropriate boinccmd command would have been
--file_transfer URL filename retry

What about (error) messages in the stdout file?
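
As a rough sketch, the two boinccmd invocations mentioned in this thread can be assembled as argument lists in Python. The helper names are made up for illustration, the file name is one of the upload names from the log earlier in the thread, and the sketch assumes boinccmd is on the PATH and is run from a place where it can authenticate to the client.

```python
import shutil
import subprocess

PROJECT_URL = "http://setiathome.berkeley.edu/"


def retry_transfer_cmd(project_url, filename, boinccmd="boinccmd"):
    """Argument list for: boinccmd --file_transfer URL filename retry"""
    return [boinccmd, "--file_transfer", project_url, filename, "retry"]


def project_update_cmd(project_url, boinccmd="boinccmd"):
    """Argument list for: boinccmd --project URL update"""
    return [boinccmd, "--project", project_url, "update"]


def run_if_available(cmd):
    """Run the command only if the executable exists; return its output or None."""
    if shutil.which(cmd[0]) is None:
        return None
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.stdout + result.stderr


if __name__ == "__main__":
    # Only print the commands here; call run_if_available(cmd) to execute one.
    print(" ".join(retry_transfer_cmd(PROJECT_URL, "12dc06ag.10935.6616.13.10.194_1_0")))
    print(" ".join(project_update_cmd(PROJECT_URL)))
```

The first form retries a single pending file transfer, while the second asks the client to contact the project scheduler, so they address different stages of the upload-and-report cycle.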

Regards,
Gundolf
____________
Computers aren't everything in life. (Just kidding)

SETI@home classic workunits 3,758
SETI@home classic CPU time 66,520 hours

dahls
Joined: 24 Oct 04
Posts: 122
Credit: 25,086,466
RAC: 22,923
Norway
Message 998985 - Posted: 26 May 2010, 20:11:08 UTC - in response to Message 998953.
Last modified: 26 May 2010, 20:16:27 UTC

Your command and the one I tried both give the same messages:

26-May-2010 21:53:00 [SETI@home] Sending scheduler request: To report completed tasks.
26-May-2010 21:53:00 [SETI@home] Reporting 66 completed tasks, requesting new tasks
26-May-2010 21:58:11 [---] Project communication failed: attempting access to reference site
26-May-2010 21:58:11 [SETI@home] Scheduler request failed: Timeout was reached
26-May-2010 21:58:14 [---] Internet access OK - project servers may be temporarily down.
26-May-2010 21:59:12 [SETI@home] Sending scheduler request: To report completed tasks.
26-May-2010 21:59:12 [SETI@home] Reporting 66 completed tasks, requesting new tasks
26-May-2010 22:04:22 [---] Project communication failed: attempting access to reference site
26-May-2010 22:04:22 [SETI@home] Scheduler request failed: Timeout was reached
26-May-2010 22:04:25 [---] Internet access OK - project servers may be temporarily down.
26-May-2010 22:05:23 [SETI@home] Sending scheduler request: To report completed tasks.
26-May-2010 22:05:23 [SETI@home] Reporting 66 completed tasks, requesting new tasks


However, the SETI database says that this machine hasn't been in contact with the server since May 22nd. Seven other machines are working fine and have no problems with uploads and downloads, while this one does not seem to work.

This is more or less the same behaviour the other machine showed when it failed to upload its results. First it gets the messages shown above, but it never gets in contact with the server. When I deleted the BOINC directory and reinstalled BOINC, it began to download work and it has worked OK since then.

There MUST be something wrong somewhere in the communication between the client and the SETI server!

All traffic goes through a firewall. Looking at the log files, it seems that this "problem" host and the others are talking to the same hosts. So if there is a problem, it must be on the server side.

The host list at http://setiathome.berkeley.edu/hosts_user.php?userid=7992989 shows that all machines communicate fine except the one that currently does not work.
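
One thing the log excerpt above does show is how long each attempt hangs before it gives up. A small Python sketch, assuming the log format shown in this post (the function name request_durations is made up for illustration), measures the time between each "Sending scheduler request" line and the following "Scheduler request failed" line:

```python
import re
from datetime import datetime

# Matches lines such as: 26-May-2010 21:53:00 [SETI@home] Sending scheduler request: ...
LINE_RE = re.compile(r"^(\d{2}-[A-Za-z]{3}-\d{4} \d{2}:\d{2}:\d{2}) \[(.*?)\] (.*)$")


def request_durations(log_text):
    """Seconds from each scheduler request to the failure that follows it."""
    durations = []
    started = None
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        # %b expects English month abbreviations (the default C locale).
        stamp = datetime.strptime(match.group(1), "%d-%b-%Y %H:%M:%S")
        message = match.group(3)
        if message.startswith("Sending scheduler request"):
            started = stamp
        elif message.startswith("Scheduler request failed") and started is not None:
            durations.append(int((stamp - started).total_seconds()))
            started = None
    return durations


LOG = """\
26-May-2010 21:53:00 [SETI@home] Sending scheduler request: To report completed tasks.
26-May-2010 21:53:00 [SETI@home] Reporting 66 completed tasks, requesting new tasks
26-May-2010 21:58:11 [---] Project communication failed: attempting access to reference site
26-May-2010 21:58:11 [SETI@home] Scheduler request failed: Timeout was reached
26-May-2010 21:59:12 [SETI@home] Sending scheduler request: To report completed tasks.
26-May-2010 22:04:22 [SETI@home] Scheduler request failed: Timeout was reached
"""

if __name__ == "__main__":
    print(request_durations(LOG))  # prints [311, 310]
```

Both attempts run for a little over five minutes before "Timeout was reached", so the request is not refused outright; it stalls until the client gives up.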
____________

Gundolf Jahn
Joined: 19 Sep 00
Posts: 3184
Credit: 356,897
RAC: 19
Germany
Message 998989 - Posted: 26 May 2010, 20:48:58 UTC - in response to Message 998985.

26-May-2010 21:58:11 [---] Project communication failed: attempting access to reference site
26-May-2010 21:58:11 [SETI@home] Scheduler request failed: Timeout was reached

However, the SETI database says that this machine hasn't been in contact with the server since May 22nd.

And it's obviously right in saying so ;-) since all connection attempts time out.

All traffic goes through a firewall. Looking at the log files, it seems that this "problem" host and the others are talking to the same hosts. So if there is a problem, it must be on the server side.

That's a strange conclusion. I would think it can't be the server side, because all other connections (not only yours) succeed.

You should turn on some logging flags, using a cc_config.xml file, namely <file_xfer_debug>, <http_debug> and <http_xfer_debug>.
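
A minimal Python sketch that generates such a cc_config.xml with the three flags named above switched on. The helper name build_cc_config is made up for illustration; the layout (a <log_flags> section inside <cc_config>, each flag set to 1) follows the usual BOINC client configuration format, but check it against the documentation for your client version.

```python
import xml.etree.ElementTree as ET

DEBUG_FLAGS = ("file_xfer_debug", "http_debug", "http_xfer_debug")


def build_cc_config(flags=DEBUG_FLAGS):
    """Return the text of a cc_config.xml that enables the given log flags."""
    lines = ["<cc_config>", "  <log_flags>"]
    lines += [f"    <{flag}>1</{flag}>" for flag in flags]
    lines += ["  </log_flags>", "</cc_config>"]
    text = "\n".join(lines) + "\n"
    ET.fromstring(text)  # raises ParseError if the result is not well-formed
    return text


if __name__ == "__main__":
    # Prints the file content; redirect it to cc_config.xml if it looks right.
    print(build_cc_config(), end="")
```

The file belongs in the BOINC data directory, and the client should pick it up after a restart; the extra detail then shows up in the same message log as the lines quoted above.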

Regards,
Gundolf

dahls
Joined: 24 Oct 04
Posts: 122
Credit: 25,086,466
RAC: 22,923
Norway
Message 998999 - Posted: 26 May 2010, 22:43:44 UTC - in response to Message 998989.

How do you explain that several other machines, all of which hide behind the same official IP, are communicating OK with the server while this one is not?

The firewall (a Check Point FireWall-1) shows that all of them are communicating with the same IP.

Something IS wrong with how SETI clients communicate with the server.

I'll let the client try until tomorrow. If it has not been able to upload the work sets, I'll do the same as with the other machine - stop BOINC, rename the directory and reinstall. I'm quite sure it will start working again.
____________

dahls
Joined: 24 Oct 04
Posts: 122
Credit: 25,086,466
RAC: 22,923
Norway
Message 999095 - Posted: 27 May 2010, 9:43:20 UTC - in response to Message 998999.

Today I reinstalled BOINC and everything is now working again.

Strange that nobody who is working with the code is interested in finding out why the communication between a client and the server seems to stall.

I still have a copy of the old directory with its files, in case someone does care. :)
____________

Ageless
Joined: 9 Jun 99
Posts: 12259
Credit: 2,554,771
RAC: 811
Netherlands
Message 999096 - Posted: 27 May 2010, 9:47:08 UTC - in response to Message 999095.

Strange that nobody who is working with the code is interested in finding out why the communication between a client and the server seems to stall.

Your current logs don't help as you didn't run with the debug flags that Gundolf posted. Those flags give in-depth information on what's happening.

But even then, computers are finicky. Programs that may have worked for ages can one day stop doing so, or work differently than before. One bit read wrong and you're in trouble. That has nothing to do with the code.
____________
Jord

Fighting for the correct use of the apostrophe, together with Weird Al Yankovic

dahls
Joined: 24 Oct 04
Posts: 122
Credit: 25,086,466
RAC: 22,923
Norway
Message 999113 - Posted: 27 May 2010, 12:22:25 UTC - in response to Message 999096.

I'll keep this in mind the next time it happens.

It may be a coincidence, but after running BOINC for years, this problem has occurred twice within a few months on two different machines, both running Fedora Core 12.
____________


Copyright © 2014 University of California