Connection Error

Questions and Answers : Windows : Connection Error

CElliott
Volunteer tester
Joined: 19 Jul 99
Posts: 178
Credit: 79,285,961
RAC: 0
United States
Message 1182383 - Posted: 2 Jan 2012, 12:25:50 UTC

Two or three times a week, the Boinc tray icon displays a red dot next to the Boinc logo and a balloon with text something like "Connection error, incorrect password." When I click on the Boinc icon to bring up BoincMgr and click on Select Computer, all I have to do is enter the computer name, and the correct password appears as well. Then I click OK and Boinc proceeds normally. But many hours of processing may be lost. This happens on all my computers, presently three. How can I stop this connection error from happening?
ID: 1182383

Gundolf Jahn
Joined: 19 Sep 00
Posts: 3184
Credit: 446,358
RAC: 0
Germany
Message 1182388 - Posted: 2 Jan 2012, 13:06:36 UTC - in response to Message 1182383.  

> But many hours of processing may be lost.

No, with the symptoms you describe, nothing is lost. Only the BOINC Manager (the GUI) is disconnected from the client, which controls the project applications. So you just can't see that it's still working (except with Windows Task Manager).

> This happens on all my computers, presently three.

The account you posted from has no active computers attached. The last contact from any of your computers on this account was on 5 Oct 2011, 16:48:21 UTC.

Did you accidentally create a new account when adding (attaching) the newest hosts?

Regards,
Gundolf
Computers aren't everything in life. (Just kidding)

SETI@home classic workunits 3,758
SETI@home classic CPU time 66,520 hours
ID: 1182388

CElliott
Volunteer tester
Joined: 19 Jul 99
Posts: 178
Credit: 79,285,961
RAC: 0
United States
Message 1182597 - Posted: 3 Jan 2012, 14:23:06 UTC - in response to Message 1182388.  

> > But many hours of processing may be lost.
>
> No, with the symptoms you describe, nothing is lost. Only the BOINC Manager (the GUI) is disconnected from the client, which controls the project applications. So you just can't see that it's still working (except with Windows Task Manager).

This is true for a while, but then boinc.exe exits (for lack of a heartbeat?), and the system stops processing WUs. For example, that is what happened here:

01-Jan-2012 05:18:16 [SETI@home Beta Test] Starting task 12jl11aa.26172.24610.15.14.175_0 using setiathome_v7 version 697
01-Jan-2012 05:33:04 [---] Starting BOINC client version 6.10.60 for windows_intelx86


The system restarted at 5:33:00 because I have a program that checks every system on the network every 5 minutes and, after three failed tries, restarts any remote system it cannot reach with a Boinc RPC.

The problem appears on my main computer, which I use for word processing during the day. I cannot afford to restart this machine when Boinc quits at night, because the backup program would then exit in mid-backup, which screws up everything. I am thinking of writing a program that tries an RPC to Boinc every so often and, if the RPC fails, finds the PIDs for Boinc.exe and BoincMgr.exe (if they exist), terminates them, and then restarts BoincMgr.
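The watchdog idea above could be sketched roughly as follows. This is only an illustration in Python, not the program mentioned in this post. The names `client_reachable` and `Watchdog` are made up for the example. The probe merely opens a TCP connection to the client's default GUI RPC port (31416) instead of issuing a real RPC, and the restart step is left as a callback, because how to terminate and relaunch Boinc.exe and BoincMgr.exe depends on the installation.

```python
import socket
from typing import Callable

GUI_RPC_PORT = 31416  # default BOINC GUI RPC port; adjust if the client uses another one


def client_reachable(host: str, port: int = GUI_RPC_PORT, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to the GUI RPC port succeeds.

    Assumption: a listening port is treated as "client alive". This is
    weaker than a real RPC, which would also prove the client answers.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


class Watchdog:
    """Call `restart` after `max_failures` consecutive failed probes."""

    def __init__(
        self,
        probe: Callable[[], bool],
        restart: Callable[[], None],
        max_failures: int = 3,
    ) -> None:
        self.probe = probe
        self.restart = restart
        self.max_failures = max_failures
        self.failures = 0

    def check(self) -> bool:
        """Run one probe; return True if a restart was triggered."""
        if self.probe():
            self.failures = 0  # any success resets the counter
            return False
        self.failures += 1
        if self.failures >= self.max_failures:
            self.failures = 0
            self.restart()  # hypothetical hook: kill and relaunch BOINC only
            return True
        return False
```

Calling `check()` from a scheduled loop every few minutes, with `probe=lambda: client_reachable("localhost")`, would reproduce the three-tries rule. Keeping the restart action a callback means only BOINC gets restarted, not the whole machine.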

> > This happens on all my computers, presently three.
>
> The account you posted from has no active computers attached. The last contact from any of your computers on this account was on 5 Oct 2011, 16:48:21 UTC.

I actually am processing WUs on the Beta system at present. You may be able to see the account here http://setiweb.ssl.berkeley.edu/beta/show_user.php?userid=735. I asked the question on this forum because the answers are often quicker.

> Did you accidentally create a new account when adding (attaching) the newest hosts?
>
> Regards,
> Gundolf


ID: 1182597

BilBg
Volunteer tester
Joined: 27 May 07
Posts: 3720
Credit: 9,385,827
RAC: 0
Bulgaria
Message 1182697 - Posted: 4 Jan 2012, 3:25:18 UTC - in response to Message 1182597.  

> This is true for a while, but then boinc.exe exits (for lack of a heartbeat?), and the system stops processing WUs. For example, that is what happened here:
>
> 01-Jan-2012 05:18:16 [SETI@home Beta Test] Starting task 12jl11aa.26172.24610.15.14.175_0 using setiathome_v7 version 697
> 01-Jan-2012 05:33:04 [---] Starting BOINC client version 6.10.60 for windows_intelx86
>
> The system restarted at 5:33:00 because I have a program that checks every system on the network every 5 minutes and, after three failed tries, restarts any remote system it cannot reach with a Boinc RPC.


If boinc.exe exits normally, you should see some lines that say Exit/Shutdown before "Starting BOINC client":

08-Dec-2011 04:24:53 [SETI@home] Computation for task 22oc11ac.9371.53591.11.10.90_1 finished
08-Dec-2011 04:24:53 [SETI@home] Starting 24oc11ad.27923.20522.15.10.215_1
08-Dec-2011 04:24:53 [SETI@home] Starting task 24oc11ad.27923.20522.15.10.215_1 using setiathome_enhanced version 528
08-Dec-2011 04:24:56 [SETI@home] Started upload of 22oc11ac.9371.53591.11.10.90_1_0
08-Dec-2011 04:25:09 [SETI@home] Finished upload of 22oc11ac.9371.53591.11.10.90_1_0
09-Dec-2011 04:25:13 [SETI@home] Sending scheduler request: To report completed tasks.
09-Dec-2011 04:25:13 [SETI@home] Reporting 1 completed tasks, not requesting new tasks
09-Dec-2011 04:25:30 [SETI@home] Scheduler request completed
[12/09/11 13:19:30] TRACE [-1323943]: ***** Win9x Monitor System Shutdown/Logoff Event Detected *****

09-Dec-2011 13:19:30 [---] Exit requested by user
09-Dec-2011 14:19:17 [---] Starting BOINC client version 6.6.38 for windows_intelx86
09-Dec-2011 14:19:17 [---] log flags: task, file_xfer, sched_ops, benchmark_debug
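That check can be automated. Below is a rough Python sketch that flags every "Starting BOINC client" line that is not preceded by an exit or shutdown line. The function name is hypothetical, and the marker strings are simply taken from the excerpts in this thread ("Exit requested" and "Shutdown"); other client versions may word these messages differently.

```python
from typing import List

START_MARKER = "Starting BOINC client"
# Assumption: these substrings identify an orderly exit in the log.
EXIT_MARKERS = ("Exit requested", "Shutdown")


def unannounced_starts(lines: List[str]) -> List[str]:
    """Return start-up lines with no exit/shutdown line before them.

    "Before" means since the previous start-up line, or since the
    beginning of the given lines for the first start-up.
    """
    suspicious = []
    exit_seen = False
    for line in lines:
        if START_MARKER in line:
            if not exit_seen:
                suspicious.append(line.strip())
            exit_seen = False  # begin tracking the next run
        elif any(marker in line for marker in EXIT_MARKERS):
            exit_seen = True
    return suspicious
```

A start-up line reported by this function only means the log shows no orderly exit before it; a hung or power-cycled machine would look the same as a crashed client.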


Is it possible that your entire remote system was hung (the OS unresponsive, nothing to do with BOINC directly)?


 


- ALF - "Find out what you don't do well ..... then don't do it!" :)
 
ID: 1182697

CElliott
Volunteer tester
Joined: 19 Jul 99
Posts: 178
Credit: 79,285,961
RAC: 0
United States
Message 1182908 - Posted: 5 Jan 2012, 2:40:34 UTC - in response to Message 1182697.  

There is no "Exit." Boinc.exe exits, BoincMgr cannot connect to it, so the latter posts a tray balloon saying connection error, incorrect password, or similar. But the password is already there and correct, which can be demonstrated by going into BoincMgr.exe, clicking on Advanced/Select Computer, and entering the computer name; the password appears as if by magic.

I think the problem might be a timing issue caused by the following: The Beta site was not accepting requests or downloading WUs over the entire weekend of the New Year and had been like that since the Tuesday outage. When it suddenly came on about noon EST on 01/02/12, I returned about 3,000 WUs in a few minutes. That implies that the client_state.xml files were huge because of all those voluminous <stderr_txt> sections. Boinc.exe had a problem reading or writing that long file and quit. In fact, I might have caused the problem: A program of mine looks for a change in size of the job_log_XYZ.txt file; when it sees one, it reads the client_state.xml file over the network to see what happened, and puts started and finished results in a database. Normally, that is not a problem, but with a very long (> 50 MB) client_state.xml file, my program might just be locking it long enough for Boinc.exe to have trouble writing to it, so Boinc.exe quits. I write this, not because confession is good for the soul, but because I only see this problem occasionally, and when I do, it happens several times a week. Perhaps it only happens when I have a large number of WUs in inventory and there is an outage.
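If the reader really is the culprit, one possible mitigation is to hold client_state.xml open as briefly as possible: read it into memory in a single call, close it, and parse the in-memory copy afterwards. The Python sketch below only illustrates that idea. The function names are hypothetical, and whether this would actually avoid the crash is not established anywhere in this thread.

```python
import time
from pathlib import Path
from typing import Optional


def size_changed(path: str, last_size: int) -> bool:
    """True if the file's current size differs from `last_size`."""
    return Path(path).stat().st_size != last_size


def read_snapshot(path: str, retries: int = 3, delay: float = 0.5) -> Optional[bytes]:
    """Read the whole file in one call and return its bytes.

    The handle is opened and closed inside read_bytes(), so the file
    is held only for the duration of the copy. Returns None if every
    attempt fails (for example, because the file is missing or busy).
    """
    for attempt in range(retries):
        try:
            return Path(path).read_bytes()
        except OSError:
            if attempt < retries - 1:
                time.sleep(delay)  # back off briefly before retrying
    return None
```

Parsing would then work on the returned bytes, so a slow parse or a slow database insert never overlaps with the time the file is open.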

It would help if Boinc.exe and BoincMgr.exe wrote messages to the System Event Log when they encounter a problem, or a significant event. This is what NTPD.exe does, and it too runs on both Windows and Linux. NTP.ORG has developed its own API for this (complete with levels of importance, so one can specify the logging detail), which then channels the message to the correct logging API depending on the O/S it is running on.
ID: 1182908

Gundolf Jahn
Joined: 19 Sep 00
Posts: 3184
Credit: 446,358
RAC: 0
Germany
Message 1182941 - Posted: 5 Jan 2012, 8:24:11 UTC - in response to Message 1182908.  

> There is no "Exit." Boinc.exe exits

That's a contradiction! ;-)

> When it suddenly came on about noon EST on 01/02/12, I returned about 3,000 WUs in a few minutes. That implies that the client_state.xml files were huge because of all those voluminous <stderr_txt> sections.

In those cases, the <max_tasks_reported> option in cc_config.xml might help.
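As an illustration, here is a small Python sketch that builds a minimal cc_config.xml body containing that option. Placing the element under `<options>`, the example limit of 100, and the helper name are assumptions made for this example; check the client configuration documentation for your BOINC version before relying on it, and merge the element into any existing cc_config.xml instead of overwriting the file.

```python
import xml.etree.ElementTree as ET


def cc_config_with_report_limit(limit: int) -> str:
    """Build a minimal cc_config.xml body that sets <max_tasks_reported>.

    Assumption: the option lives inside the <options> section.
    """
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    ET.SubElement(options, "max_tasks_reported").text = str(limit)
    return ET.tostring(root, encoding="unicode")


print(cc_config_with_report_limit(100))
```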

The number of tasks might also be the cause for the manager's disconnecting from the client. Do you have "Show all tasks" selected in the Tasks tab?

> It would help if Boinc.exe and BoincMgr.exe wrote messages to the System Event Log when they encounter a problem, or a significant event.

They don't write to the system event log but to the BOINC event log (formerly known as Messages tab), which is kept in the file stdoutdae.txt in your BOINC data directory.

Regards,
Gundolf
ID: 1182941

Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 1182953 - Posted: 5 Jan 2012, 11:38:30 UTC - in response to Message 1182908.  
Last modified: 5 Jan 2012, 11:39:13 UTC

> It would help if Boinc.exe and BoincMgr.exe wrote messages to the System Event Log when they encounter a problem, or a significant event.

When Boinc.exe crashes, it will attempt to write its debug output to stderrdae.txt.
When Boincmgr.exe has problems, it will write reports of them to stderrgui.txt.

These files can be found in your BOINC Data directory, where client_state.xml lives. On most Windows versions this directory is hidden. You can check in your BOINC start-up messages where your data directory lives. But normally, the default places for it can be found in this FAQ.

Only when the whole shebang really crashes will something be written to Windows Event Viewer->Applications.

If Boinc.exe exits but leaves Boincmgr.exe running, the reason should be found in the stderrdae.txt file.
ID: 1182953

CElliott
Volunteer tester
Joined: 19 Jul 99
Posts: 178
Credit: 79,285,961
RAC: 0
United States
Message 1183050 - Posted: 5 Jan 2012, 22:10:33 UTC - in response to Message 1182953.  

I found stderrdae.txt; it is fairly large. In every case, it was trying to write the state file when it failed.
ID: 1183050

Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 1183052 - Posted: 5 Jan 2012, 22:19:05 UTC - in response to Message 1183050.  

Post the appropriate trace lines. But probably best not here, as that is too much info for this thread. Use an external site, such as http://pastebin.com/ and leave the link to your crash material.
ID: 1183052

CElliott
Volunteer tester
Joined: 19 Jul 99
Posts: 178
Credit: 79,285,961
RAC: 0
United States
Message 1183190 - Posted: 6 Jan 2012, 13:26:16 UTC - in response to Message 1183052.  
Last modified: 6 Jan 2012, 13:28:27 UTC

Thank you for replying. It crashed again last night. The Boinc tray icon was displaying the connection error when I came down to work this morning. Here is when my Java program read the client_state.xml file on computer "Nat":

1/6/2012 3:09:42 AM Start processing Nat
1/6/2012 3:09:44 AM Start processing Phillip

The dump begins at "Dump Timestamp : 01/06/12 03:10:15." That does not prove that my program that reads the state file is interfering with Boinc.exe, but the two events are too close together not to be gravely suspicious. I put the entire crash dump on http://www.pastebin.com/. My user name is CELLiott and the Paste Bin is named BoincCrashDump. I can access it; please tell me if you cannot.
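For reference, the gap between the two timestamps above can be computed with a few lines of Python. This is only a sketch: the helper name is made up, and it assumes both timestamps are local times taken from clocks that agree with each other.

```python
from datetime import datetime


def seconds_between(reader_stamp: str, dump_stamp: str) -> float:
    """Seconds from the reader's log entry to the crash dump timestamp."""
    # The two logs use different formats, so each gets its own pattern.
    started = datetime.strptime(reader_stamp, "%m/%d/%Y %I:%M:%S %p")
    dumped = datetime.strptime(dump_stamp, "%m/%d/%y %H:%M:%S")
    return (dumped - started).total_seconds()


print(seconds_between("1/6/2012 3:09:42 AM", "01/06/12 03:10:15"))  # 33.0
```

So the dump starts 33 seconds after the reader began processing "Nat", which is suggestive but, as noted, not proof.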

BTW, the client_state.xml file is only 3,666,837 bytes long now. The file length may have nothing to do with the error.
ID: 1183190

Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 1183495 - Posted: 7 Jan 2012, 15:29:45 UTC - in response to Message 1183190.  

I just noticed, you're on 6.10.58; any reason why you aren't using a 6.12?
I will forward the log to the developers, but I doubt that they will do anything with it, unless you manage to have the same problem on 6.12.34 or later.
ID: 1183495

CElliott
Volunteer tester
Joined: 19 Jul 99
Posts: 178
Credit: 79,285,961
RAC: 0
United States
Message 1183852 - Posted: 8 Jan 2012, 20:08:24 UTC - in response to Message 1183495.  

I don't use 6.12 because I hate the extra steps one has to go thru to see the messages. I don't see any reason for the change, unless it is impossible to have more than 6 tabs across the top.

On the one machine I have that runs 6.12, there is nothing in stderrdae.txt since 2007, so perhaps the problem is fixed in 6.12.
ID: 1183852



 