No schedulers responded.

Keith Kennedy
Joined: 28 May 99
Posts: 149
Credit: 244,165
RAC: 0
United States
Message 126595 - Posted: 23 Jun 2005, 1:51:47 UTC

I agree that it's most likely the workload of upgrading everybody to 4.18 that is backlogging the server. I'm just starting to get both downloads and uploads now, after several hours of nothing getting through either way.
ID: 126595
ralic
Volunteer tester
Joined: 6 Jan 00
Posts: 308
Credit: 274,230
RAC: 0
Message 126716 - Posted: 23 Jun 2005, 8:31:13 UTC - in response to Message 126401.  

22/06/2005 10:49:00 PM|SETI@home|Scheduler request to http://setiboinc.ssl.berkeley.edu/sah_cgi/cgi failed with a return value of 500


An HTTP 500 error indicates an internal server error. I usually see these errors when the server is overloaded. (The 4.18 release is the likely culprit at this point in time.)

BTW: Which version of the client are you using that responds with this message? I've wanted this level of info for some time... Saves me having to capture network packets to see the scheduling server error code.

TIA
ID: 126716
The Gas Giant
Volunteer tester
Joined: 22 Nov 01
Posts: 1904
Credit: 2,646,654
RAC: 0
Australia
Message 126737 - Posted: 23 Jun 2005, 10:24:46 UTC - in response to Message 126716.  

22/06/2005 10:49:00 PM|SETI@home|Scheduler request to http://setiboinc.ssl.berkeley.edu/sah_cgi/cgi failed with a return value of 500


An HTTP 500 error indicates an internal server error. I usually see these errors when the server is overloaded. (The 4.18 release is the likely culprit at this point in time.)

BTW: Which version of the client are you using that responds with this message? I've wanted this level of info for some time... Saves me having to capture network packets to see the scheduling server error code.

TIA


Optimised BOINC 4.45 (boinc-445-sse2) with optimised seti app.

I have tried the standard 4.45 as well. I didn't get the return value of 500 with that, just the standard failed message, so I have gone back to the optimised BOINC and am still getting the standard failed message. Other projects are not affected.

I'm beginning to suspect that it's my ISP not resolving the URL http://setiboinc.ssl.berkeley.edu/sah_cgi/cgi correctly... is there a way to confirm this?

Thanks for all your replies folks, I'm sure I'll get there in the end.

Live long and crunch.

Paul.
ID: 126737
The Gas Giant
Volunteer tester
Joined: 22 Nov 01
Posts: 1904
Credit: 2,646,654
RAC: 0
Australia
Message 126745 - Posted: 23 Jun 2005, 10:40:53 UTC

I just used a proxy and uploaded no problem. Hmmmm....

ID: 126745
ralic
Volunteer tester
Joined: 6 Jan 00
Posts: 308
Credit: 274,230
RAC: 0
Message 126760 - Posted: 23 Jun 2005, 11:15:56 UTC - in response to Message 126737.  

I'm beginning to suspect that it's my ISP not resolving the URL http://setiboinc.ssl.berkeley.edu/sah_cgi/cgi correctly... is there a way to confirm this?


Well, this is a tough one.
With the Net being as dynamic as it is, it's difficult to troubleshoot, simply because a name resolved through DNS at one point in time could well resolve to something else a millisecond later.

Unless it's cached, but then the cache could be updated between resolutions. Also, the cached answer could be on your system, at your ISP, at the secondary DNS configured by you and/or your ISP, at a secondary that your ISP uses, and so on.

On a Windows box, you could try using 'nslookup' to walk the DNS tree and see if you get a different IP resolution for setiboinc.ssl.berkeley.edu.

Of course, the DNS tree branches, and can do so dynamically, meaning the result you get now may not be the result that you get 5 minutes from now.

Querying the primary DNS for ssl, setiboinc.ssl.berkeley.edu resolves to 66.28.250.124, so all DNS queries should resolve to that same address. If you find one that doesn't, then it could be the cause of your problem.

Of course, I'm no Net DNS expert, so I write this under correction since I could also be wrong, but it's the way that I understand it.

Personally, I still suspect that the server is overloaded and you just got lucky when the proxy hit it. ;-)
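
The comparison described above can be roughed out in Python. This is only a sketch under stated assumptions: it asks whatever resolver the operating system is configured to use (it does not walk the DNS tree the way a series of 'nslookup' queries against specific servers would), and the expected address is simply the one quoted in this post, which may no longer be current. The function names are made up for illustration; `socket.getaddrinfo` is the only library call involved.

```python
import socket

HOSTNAME = "setiboinc.ssl.berkeley.edu"
EXPECTED_IP = "66.28.250.124"  # address quoted in this post; may be out of date


def resolve_ipv4(hostname):
    """Return the sorted, de-duplicated IPv4 addresses from the local resolver."""
    infos = socket.getaddrinfo(hostname, 80, socket.AF_INET, socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


def unexpected_addresses(resolved, expected):
    """Return every resolved address that differs from the expected one."""
    return [ip for ip in resolved if ip != expected]


def report(hostname=HOSTNAME, expected=EXPECTED_IP):
    """Resolve the host once and print whether the answer matches."""
    try:
        resolved = resolve_ipv4(hostname)
    except socket.gaierror as err:
        print(f"could not resolve {hostname}: {err}")
        return
    odd = unexpected_addresses(resolved, expected)
    print(f"{hostname} -> {resolved}")
    print("matches the expected address" if not odd else f"unexpected: {odd}")
```

Calling `report()` a few times, with and without the proxy, would show whether the local resolver ever hands back a different address. A single matching answer does not rule out a stale cache elsewhere in the chain.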
ID: 126760
The Gas Giant
Volunteer tester
Joined: 22 Nov 01
Posts: 1904
Credit: 2,646,654
RAC: 0
Australia
Message 127013 - Posted: 23 Jun 2005, 22:09:12 UTC

Well no problem for 10 minutes...back to No Scheduler Responded....

Live long and crunch.

Paul
(S@H1 8888)
And proud of it!
ID: 127013
ralic
Volunteer tester
Joined: 6 Jan 00
Posts: 308
Credit: 274,230
RAC: 0
Message 127261 - Posted: 24 Jun 2005, 9:36:34 UTC - in response to Message 127013.  
Last modified: 24 Jun 2005, 9:45:02 UTC

Well no problem for 10 minutes...back to No Scheduler Responded....

It should be getting better, but only time will tell.
Below is a snippet from one of my network logs for 3 scheduler requests this morning. It shows that the server hostname, server IP and URL are identical; however, the server response (the ServerResponse field) varies, 7 seconds later.
200 (OK)
500 (Internal Server Error)

2005-06-24 10:57:34 Hostname: [setiboinc.ssl.berkeley.edu] IP: [66.28.250.124:80] URL: [/sah_cgi/cgi] ServerResponse: [200]
2005-06-24 10:57:41 Hostname: [setiboinc.ssl.berkeley.edu] IP: [66.28.250.124:80] URL: [/sah_cgi/cgi] ServerResponse: [500]
2005-06-24 10:58:56 Hostname: [setiboinc.ssl.berkeley.edu] IP: [66.28.250.124:80] URL: [/sah_cgi/cgi] ServerResponse: [200]

The 500 value corresponds to a "no schedulers responded" in the BOINC message log.
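
For anyone keeping a similar log, the tally can be automated. The sketch below assumes the exact line layout of the three entries above; the regular expression and the name `tally_responses` are illustrative and not part of any BOINC tool. It groups entries by endpoint and counts each status code, using the standard library's `http.HTTPStatus` for the reason phrases.

```python
import re
from collections import Counter
from http import HTTPStatus

# Assumed layout, copied from the three log lines quoted in this post.
LOG_LINE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"Hostname: \[(?P<host>[^\]]*)\] "
    r"IP: \[(?P<ip>[^\]]*)\] "
    r"URL: \[(?P<url>[^\]]*)\] "
    r"ServerResponse: \[(?P<status>\d{3})\]$"
)

SAMPLE = """\
2005-06-24 10:57:34 Hostname: [setiboinc.ssl.berkeley.edu] IP: [66.28.250.124:80] URL: [/sah_cgi/cgi] ServerResponse: [200]
2005-06-24 10:57:41 Hostname: [setiboinc.ssl.berkeley.edu] IP: [66.28.250.124:80] URL: [/sah_cgi/cgi] ServerResponse: [500]
2005-06-24 10:58:56 Hostname: [setiboinc.ssl.berkeley.edu] IP: [66.28.250.124:80] URL: [/sah_cgi/cgi] ServerResponse: [200]
"""


def tally_responses(text):
    """Map each (host, ip, url) endpoint to a Counter of its status codes."""
    tallies = {}
    for line in text.splitlines():
        match = LOG_LINE.match(line.strip())
        if match is None:
            continue  # skip anything that is not in the assumed layout
        endpoint = (match["host"], match["ip"], match["url"])
        tallies.setdefault(endpoint, Counter())[int(match["status"])] += 1
    return tallies


for endpoint, statuses in tally_responses(SAMPLE).items():
    for code, count in sorted(statuses.items()):
        print(endpoint, code, HTTPStatus(code).phrase, count)
```

A single endpoint returning a mix of 200 and 500 points at the server side rather than at name resolution, since the address never changes between the requests.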
ID: 127261
The Gas Giant
Volunteer tester
Joined: 22 Nov 01
Posts: 1904
Credit: 2,646,654
RAC: 0
Australia
Message 127271 - Posted: 24 Jun 2005, 10:39:41 UTC

Well another day another proxy. This time from here -> http://www.aliveproxy.com/purchase/ and clicking on the free subscription, going to the forums and finding the latest free list (I hate to imagine what this listing is really used for!). It works right now, I just wonder how long the latest proxy setting will work for.

Live long and crunch.

Paul
(S@H1 8888)
And proud of it!
ID: 127271
Astro
Volunteer tester
Joined: 16 Apr 02
Posts: 8026
Credit: 600,015
RAC: 0
Message 127303 - Posted: 24 Jun 2005, 12:36:45 UTC - in response to Message 127271.  

Well another day another proxy. This time from here -> http://www.aliveproxy.com/purchase/ and clicking on the free subscription, going to the forums and finding the latest free list (I hate to imagine what this listing is really used for!). It works right now, I just wonder how long the latest proxy setting will work for.

Live long and crunch.

Paul, what changed from when it was working until it stopped? Does your IP or DNS change? Have you monitored for that? I have a wireless network at home and one puter is on the ragged edge of losing signal, and I have had problems from it leaving and re-entering the network.

tony
ID: 127303
Paul D. Buck
Volunteer tester
Joined: 19 Jul 00
Posts: 3898
Credit: 1,158,042
RAC: 0
United States
Message 127346 - Posted: 24 Jun 2005, 14:53:54 UTC

If you are getting the 500 code, Rom is interested in the server request and reply XML files... I would be interested in them too (along with the logs) ... I mean, this is the type of problem I would like to document even if we are stumped for the moment ....
ID: 127346
The Gas Giant
Volunteer tester
Joined: 22 Nov 01
Posts: 1904
Credit: 2,646,654
RAC: 0
Australia
Message 127502 - Posted: 24 Jun 2005, 23:08:10 UTC

I'm not on wireless. I am on a single port ADSL USB modem.

I've not seen the 500 message again. If I see it again I will forward the particular files to both Paul D Buck and Rom Walton.

I updated my work laptop from home with no problems whatsoever.

With the current proxy I am only seeing intermittent No Scheduler Responded messages.

BOINC is now connecting to SETI satisfactorily. I have never had a problem connecting to Predictor.

Interesting problem all round. I will try connecting without a proxy later and see how it goes.....busy busy busy.

Paul.
ID: 127502
Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 127504 - Posted: 24 Jun 2005, 23:16:50 UTC

You can check for the error in the stderrdae.txt file in your BOINC main directory, Paul. Just open it with Notepad and copy the relevant parts out.

You can then check the stdoutdae.txt at the same timestamps for more messages.
ID: 127504
Kajunfisher
Volunteer tester
Joined: 29 Mar 05
Posts: 1407
Credit: 126,476
RAC: 0
United States
Message 127525 - Posted: 25 Jun 2005, 0:40:32 UTC

Rom had a post in the "Unhandled Exceptions for 4.45" thread requesting that people with the problem install "Symbols":

Using the information from this page:
http://boinc.berkeley.edu/debug_win.php

This might help solve part of the problem, or at least identify where the problem is.
ID: 127525
Paul D. Buck
Volunteer tester
Joined: 19 Jul 00
Posts: 3898
Credit: 1,158,042
RAC: 0
United States
Message 127712 - Posted: 25 Jun 2005, 5:18:42 UTC - in response to Message 127504.  

You can check for the error in the stderrdae.txt file in your BOINC main directory, Paul. Just open it with Notepad and copy the relevant parts out.

You can then check the stdoutdae.txt at the same timestamps for more messages.


*I* would like to look at the logs for other reasons. But for Rom's purposes, and mine as an interested bystander, he needs the scheduler request/reply messages so he can try to see what is happening at the message level ...
ID: 127712
Urs Echternacht
Volunteer tester
Joined: 15 May 99
Posts: 692
Credit: 135,197,781
RAC: 211
Germany
Message 128143 - Posted: 26 Jun 2005, 13:35:58 UTC

Hi,

I gave CC4.46 a try in the hope of getting to the bottom of this problem. Here are the first messages I got in boincmgr when starting CC4.46:

26/06/2005 14:52:08||Starting BOINC client version 4.46 for windows_intelx86
26/06/2005 14:52:08||Data directory: C:\Programme\BOINC
26/06/2005 14:52:08||Invalid account file: account_setup
26/06/2005 14:52:08|SETI@home|Found app_info.xml; using anonymous platform
26/06/2005 14:52:08||Version Change Detected (4.45 -> 4.46); running CPU benchmarks
26/06/2005 14:52:08|SETI@home|Computer ID: 997552; location: home; project prefs: default
26/06/2005 14:52:08||General prefs: from SETI@home (last modified 2005-06-19 16:59:44)
26/06/2005 14:52:08||General prefs: no separate prefs for home; using your defaults
26/06/2005 14:52:20||Remote control allowed
26/06/2005 14:52:22||Running CPU benchmarks
26/06/2005 14:53:20||Benchmark results:
26/06/2005 14:53:20|| Number of CPUs: 1
26/06/2005 14:53:20|| 1262 double precision MIPS (Whetstone) per CPU
26/06/2005 14:53:20|| 2268 integer MIPS (Dhrystone) per CPU
26/06/2005 14:53:20||Finished CPU benchmarks
26/06/2005 14:53:20||Resuming computation and network activity
26/06/2005 14:53:20||request_reschedule_cpus: Resuming activities
26/06/2005 14:53:20|SETI@home|Fetching master file
26/06/2005 14:53:23|SETI@home|Master page download succeeded
26/06/2005 14:53:24|SETI@home|Sending scheduler request to http://setiboinc.ssl.berkeley.edu/sah_cgi/cgi
26/06/2005 14:53:24|SETI@home|Requesting 345600 seconds of work, returning 32 results
26/06/2005 14:53:33|SETI@home|Scheduler request to http://setiboinc.ssl.berkeley.edu/sah_cgi/cgi failed with a return value of 500
26/06/2005 14:53:33|SETI@home|No schedulers responded
26/06/2005 14:53:34|SETI@home|Deferring communication with project for 58 seconds
26/06/2005 14:53:34||Insufficient work; requesting more
26/06/2005 14:54:33|SETI@home|Sending scheduler request to http://setiboinc.ssl.berkeley.edu/sah_cgi/cgi
26/06/2005 14:54:33|SETI@home|Requesting 345600 seconds of work, returning 32 results
26/06/2005 14:54:40|SETI@home|Scheduler request to http://setiboinc.ssl.berkeley.edu/sah_cgi/cgi failed with a return value of 500
26/06/2005 14:54:40|SETI@home|No schedulers responded
26/06/2005 14:54:41|SETI@home|Deferring communication with project for 58 seconds
26/06/2005 14:55:40|SETI@home|Sending scheduler request to http://setiboinc.ssl.berkeley.edu/sah_cgi/cgi
26/06/2005 14:55:40|SETI@home|Requesting 345600 seconds of work, returning 32 results
26/06/2005 14:55:54|SETI@home|Scheduler request to http://setiboinc.ssl.berkeley.edu/sah_cgi/cgi failed with a return value of 500
26/06/2005 14:55:54|SETI@home|No schedulers responded
26/06/2005 14:55:55|SETI@home|Deferring communication with project for 58 seconds
...
26/06/2005 15:09:58|SETI@home|Scheduler request to http://setiboinc.ssl.berkeley.edu/sah_cgi/cgi failed with a return value of 500
26/06/2005 15:09:58|SETI@home|No schedulers responded
26/06/2005 15:09:59|SETI@home|Deferring communication with project for 43 minutes and 4 seconds

and so on.
I haven't seen the 500-error with CC4.43 or CC4.45 before, so this is new to me.
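
Pulling just the failures out of a long message log like this is easy to script. The following is a sketch that assumes the messages have been saved as text in the same pipe-separated `timestamp|project|message` form shown above; the function name and regular expression are illustrative and not part of BOINC. It also accepts the plain "failed" wording with no return value, which other client versions in this thread reported.

```python
import re

# Matches both "... failed with a return value of 500" and a bare "... failed".
FAILED = re.compile(
    r"Scheduler request to (?P<url>\S+) failed"
    r"(?: with a return value of (?P<code>\d+))?$"
)

SAMPLE = """\
26/06/2005 14:53:24|SETI@home|Sending scheduler request to http://setiboinc.ssl.berkeley.edu/sah_cgi/cgi
26/06/2005 14:53:33|SETI@home|Scheduler request to http://setiboinc.ssl.berkeley.edu/sah_cgi/cgi failed with a return value of 500
26/06/2005 14:53:33|SETI@home|No schedulers responded
26/06/2005 14:53:34|SETI@home|Deferring communication with project for 58 seconds
26/06/2005 14:54:40|SETI@home|Scheduler request to http://setiboinc.ssl.berkeley.edu/sah_cgi/cgi failed with a return value of 500
"""


def scheduler_failures(text):
    """Return (timestamp, project, url, code) for each failed scheduler request.

    The timestamp is kept as the original string; code is None when the
    client logged no return value.
    """
    failures = []
    for line in text.splitlines():
        parts = line.split("|", 2)
        if len(parts) != 3:
            continue  # not a timestamp|project|message line
        stamp, project, message = parts
        match = FAILED.search(message.strip())
        if match is None:
            continue
        code = int(match["code"]) if match["code"] else None
        failures.append((stamp, project, match["url"], code))
    return failures


for failure in scheduler_failures(SAMPLE):
    print(failure)
```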

Here is what stdoutgui.txt says:

14:53:05: Error: Cannot find active dialup connection: eine ungültige Strukturgröße wurde entdeckt.
14:54:05: Error: Cannot find active dialup connection: eine ungültige Strukturgröße wurde entdeckt.
...
15:15:05: Error: Cannot find active dialup connection: eine ungültige Strukturgröße wurde entdeckt.

An error message in English with a German explanation??? Luckily I'm a native German speaker; the translation should be: 'an invalid structure size was detected.'

If you need more information, please ask.


_\|/_
U r s
ID: 128143
Urs Echternacht
Volunteer tester
Joined: 15 May 99
Posts: 692
Credit: 135,197,781
RAC: 211
Germany
Message 128145 - Posted: 26 Jun 2005, 13:47:36 UTC - in response to Message 127346.  

If you are getting the 500 code, Rom is interested in the server request and reply XML files... I would be interested in them too (along with the logs) ... I mean, this is the type of problem I would like to document even if we are stumped for the moment ....


Hi Paul D. Buck,

do you mean sched_request_....xml and sched_reply_....xml ?


_\|/_
U r s
ID: 128145
Paul D. Buck
Volunteer tester
Joined: 19 Jul 00
Posts: 3898
Credit: 1,158,042
RAC: 0
United States
Message 128226 - Posted: 26 Jun 2005, 16:21:25 UTC - in response to Message 128145.  
Last modified: 26 Jun 2005, 16:24:26 UTC

If you are getting the 500 code, Rom is interested in the server request and reply XML files... I would be interested in them too (along with the logs) ... I mean, this is the type of problem I would like to document even if we are stumped for the moment ....


Hi Paul D. Buck,

do you mean sched_request_....xml and sched_reply_....xml ?


yes ...

And I would still like the logs ... that is the TXT files, zipped up ... the sched XML files have to be grabbed right after the communication session before they get overwritten ...
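
Since the files get overwritten on the next exchange, copying them aside can be scripted. This is a minimal sketch, assuming the files sit directly in the BOINC data directory and are named with the `sched_request_` and `sched_reply_` prefixes mentioned in this thread; the function name, the backup location and the folder naming are made up for illustration.

```python
import shutil
import time
from pathlib import Path


def snapshot_scheduler_files(data_dir, backup_root):
    """Copy sched_request_*.xml and sched_reply_*.xml into a timestamped folder.

    Returns the destination paths, or an empty list if nothing matched.
    """
    data_dir = Path(data_dir)
    sources = sorted(data_dir.glob("sched_request_*.xml"))
    sources += sorted(data_dir.glob("sched_reply_*.xml"))
    if not sources:
        return []
    # One folder per snapshot so earlier captures are never overwritten.
    target = Path(backup_root) / time.strftime("sched_%Y%m%d_%H%M%S")
    target.mkdir(parents=True, exist_ok=True)
    # copy2 keeps the modification time, which helps when matching log entries.
    return [Path(shutil.copy2(source, target)) for source in sources]
```

Run right after a failed scheduler request, for example `snapshot_scheduler_files(data_dir, backup_root)` with your own two folders, this leaves a dated copy that can be zipped up together with the TXT logs.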

p.d.buck@comcast.net for me ...
ID: 128226
Urs Echternacht
Volunteer tester
Joined: 15 May 99
Posts: 692
Credit: 135,197,781
RAC: 211
Germany
Message 128386 - Posted: 26 Jun 2005, 19:33:16 UTC - in response to Message 128226.  


yes ...

And I would still like the logs ... that is the TXT files, zipped up ... the sched XML files have to be grabbed right after the communication session before they get overwritten ...

p.d.buck@comcast.net for me ...


Lucky me, I did capture them right away. Anyway, still no response from the scheduler, even with CC4.46. So my two problematic hosts are out of work until this problem is solved, or the deadline...

I zipped them all together and will send them soon.

_\|/_
U r s
ID: 128386
Urs Echternacht
Volunteer tester
Joined: 15 May 99
Posts: 692
Credit: 135,197,781
RAC: 211
Germany
Message 128557 - Posted: 26 Jun 2005, 23:06:38 UTC
Last modified: 26 Jun 2005, 23:15:45 UTC

Here is what I got in stdout after a few unsuccessful tries and restarts of BOINC CC4.45 with windbg.exe (the symbol files were not found/accepted):

27/06/2005 00:46:35||Starting BOINC client version 4.45 for windows_intelx86
27/06/2005 00:46:35||Data directory: E:BOINC
27/06/2005 00:46:35||Invalid account file: account_setup
27/06/2005 00:46:35|SETI@home|Found app_info.xml; using anonymous platform
27/06/2005 00:46:35|SETI@home|Computer ID: 8255; location: ; project prefs: default
27/06/2005 00:46:35||General prefs: from SETI@home (last modified 2005-06-19 16:59:44)
27/06/2005 00:46:35||General prefs: using your defaults
27/06/2005 00:46:40||Remote control allowed
27/06/2005 00:46:40|SETI@home|Resuming computation for result 18au03aa.8872.9345.579834.249_0 using setiathome version 4.11
27/06/2005 00:46:40|SETI@home|Resuming computation for result 18au03aa.8872.9393.242328.2_1 using setiathome version 4.11
27/06/2005 00:46:40|SETI@home|Sending scheduler request to http://setiboinc.ssl.berkeley.edu/sah_cgi/cgi
27/06/2005 00:46:40|SETI@home|Requesting 0 seconds of work, returning 36 results
27/06/2005 00:46:40||Using earliest-deadline-first scheduling because computer is overcommitted.
27/06/2005 00:46:48|SETI@home|Scheduler request to http://setiboinc.ssl.berkeley.edu/sah_cgi/cgi failed
27/06/2005 00:46:48|SETI@home|No schedulers responded
27/06/2005 00:46:49|SETI@home|Deferring communication with project for 58 seconds
27/06/2005 00:46:49||May run out of work in 4.00 days; requesting more
27/06/2005 00:47:48|SETI@home|Sending scheduler request to http://setiboinc.ssl.berkeley.edu/sah_cgi/cgi
27/06/2005 00:47:48|SETI@home|Requesting 0 seconds of work, returning 36 results
27/06/2005 00:48:00|SETI@home|Scheduler request to http://setiboinc.ssl.berkeley.edu/sah_cgi/cgi failed
27/06/2005 00:48:00|SETI@home|No schedulers responded
27/06/2005 00:48:01|SETI@home|Deferring communication with project for 58 seconds
27/06/2005 00:49:00|SETI@home|Sending scheduler request to http://setiboinc.ssl.berkeley.edu/sah_cgi/cgi
27/06/2005 00:49:00|SETI@home|Requesting 0 seconds of work, returning 36 results
27/06/2005 00:49:06|SETI@home|Scheduler request to http://setiboinc.ssl.berkeley.edu/sah_cgi/cgi failed
27/06/2005 00:49:06|SETI@home|No schedulers responded
27/06/2005 00:49:07|SETI@home|Deferring communication with project for 58 seconds
27/06/2005 00:50:06|SETI@home|Sending scheduler request to http://setiboinc.ssl.berkeley.edu/sah_cgi/cgi
27/06/2005 00:50:06|SETI@home|Requesting 0 seconds of work, returning 36 results
27/06/2005 00:50:13|SETI@home|Scheduler request to http://setiboinc.ssl.berkeley.edu/sah_cgi/cgi succeeded
27/06/2005 00:52:37||request_reschedule_cpus: project op
27/06/2005 00:52:38|SETI@home|Sending scheduler request to http://setiboinc.ssl.berkeley.edu/sah_cgi/cgi
27/06/2005 00:52:38|SETI@home|Requesting 0 seconds of work, returning 0 results
27/06/2005 00:52:40|SETI@home|Scheduler request to http://setiboinc.ssl.berkeley.edu/sah_cgi/cgi succeeded

The last benchmark session was 'funny', too. The result was:

2005-06-26 22:42:59 [---] Benchmark results:
2005-06-26 22:42:59 [---] Number of CPUs: 2
2005-06-26 22:42:59 [---] 4350 double precision MIPS (Whetstone) per CPU
2005-06-26 22:42:59 [---] 4438 integer MIPS (Dhrystone) per CPU
2005-06-26 22:42:59 [---] Finished CPU benchmarks

resulting in a much too low completion time (1h46min59sec) even with the optimized sah4.11-client. My normal times are about 4hrs and the benchmark is still unpredictable on this dual PIII.

edit: finally this host started to request and download new WUs:
27/06/2005 01:10:38|SETI@home|Requesting 678274 seconds of work, returning 0 results
Two crunching, one to go.

_\|/_
U r s
ID: 128557
Astro
Volunteer tester
Joined: 16 Apr 02
Posts: 8026
Credit: 600,015
RAC: 0
Message 128563 - Posted: 26 Jun 2005, 23:13:26 UTC - in response to Message 128557.  
Last modified: 26 Jun 2005, 23:22:03 UTC

27/06/2005 00:46:35||Data directory: E:BOINC
27/06/2005 00:46:35||Invalid account file: account_setup
27/06/2005 00:46:35|SETI@home|Found app_info.xml; using anonymous platform
27/06/2005 00:46:35|SETI@home|Computer ID: 8255; location: ; project prefs: default
27/06/2005 00:46:35||General prefs: from SETI@home (last modified 2005-06-19 16:59:44)
27/06/2005 00:46:35||General prefs: using your defaults

(1) Check your account preferences to make sure the "Default" connect-to value is not "0" or really low. Some have mentioned these values changing after yesterday's DB fix. [edit] I see from your log it's set to 4 days, so you can ignore this part and focus on the next problem. [end edit]

Also, you have an "Invalid account file: account_setup" message, whatever that means.

tony

Rom told me that WinDbg is only useful when you experience a crash, so that might not be of much use.
ID: 128563