New work from projects coming soon?
Author | Message |
---|---|
agcarver Joined: 14 May 99 Posts: 21 Credit: 150,823 RAC: 0 |
I saw the notes about the system failures; I'm just wondering how long it'll be before new work starts arriving. Four systems that I'm running BOINC on have complained of scheduler timeouts or no work for a little over a month now. Just to be sure, I've already reset the projects, and still no luck acquiring new work. |
OzzFan Joined: 9 Apr 02 Posts: 15691 Credit: 84,761,841 RAC: 28 |
The server problems have not lasted for a month. Are you sure there's not a different issue at play here? |
agcarver Joined: 14 May 99 Posts: 21 Credit: 150,823 RAC: 0 |
OzzFan wrote:
The server problems have not lasted for a month. Are you sure there's not a different issue at play here?

Yes, I'm quite sure, but I can double-check everything. Two clients (both Solaris 10) are running BOINC 5.10.17 and have been doing so for months on end. Two other clients are Debian Linux running 5.4.11 and 5.8.16, also running for months without incident. I can reach the main Berkeley pages with wget from all four machines with no problem. If I try to connect to the scheduler with wget (http://setiboinc.ssl.berkeley.edu/sah_cgi/cgi according to the XML file), it resolves the IP just fine but otherwise just sits there doing nothing. Nothing has changed on any of the four systems, not even any kind of library updates.

Actually, my wget test to the scheduler finally returned after a few minutes. The reply was:

<scheduler_reply>
<scheduler_version>603</scheduler_version>
<master_url>http://setiathome.berkeley.edu/</master_url>
<request_delay>11.000000</request_delay>
<message priority="low">Error in request message: no start tag </message>
<project_name>SETI@home</project_name>
</scheduler_reply> |
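A reply like the one above can also be inspected programmatically. The following is a minimal Python sketch, assuming the reply text has already been fetched (for example with wget, as in the post). The helper name `summarize_scheduler_reply` is hypothetical and not part of BOINC, and it reads only the fields visible in the quoted reply. The "no start tag" message is most likely just a consequence of probing the scheduler with a plain request that carries no request body, so a well-formed reply mainly shows that the scheduler was reachable, if slow.

```python
import xml.etree.ElementTree as ET

# Reply text copied from the post above (one element per line).
REPLY = """<scheduler_reply>
<scheduler_version>603</scheduler_version>
<master_url>http://setiathome.berkeley.edu/</master_url>
<request_delay>11.000000</request_delay>
<message priority="low">Error in request message: no start tag </message>
<project_name>SETI@home</project_name>
</scheduler_reply>"""


def summarize_scheduler_reply(text):
    """Hypothetical helper: extract the fields seen in the quoted reply."""
    root = ET.fromstring(text)
    if root.tag != "scheduler_reply":
        raise ValueError("not a scheduler reply: <%s>" % root.tag)
    return {
        "project": root.findtext("project_name"),
        "scheduler_version": root.findtext("scheduler_version"),
        # Seconds the scheduler asks the client to wait before retrying.
        "request_delay_s": float(root.findtext("request_delay", "0")),
        # Each message as (priority attribute, stripped text).
        "messages": [
            (m.get("priority"), (m.text or "").strip())
            for m in root.findall("message")
        ],
    }


if __name__ == "__main__":
    print(summarize_scheduler_reply(REPLY))
```

Raising on an unexpected root tag keeps an HTML error page or a truncated download from being mistaken for a scheduler reply.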
Dotsch Joined: 9 Jun 99 Posts: 2422 Credit: 919,393 RAC: 0 |
Have you logged the output from the BOINC client? Could you please check whether your Solaris systems made a work request recently? What happens if you stop the BOINC client and restart it with "./boinc_client -update_prefs http://setiathome.berkeley.edu"? Could you please post the complete messages from the startup of the BOINC client with the -update.. option? |
agcarver Joined: 14 May 99 Posts: 21 Credit: 150,823 RAC: 0 |
Dotsch wrote:
Have you logged the output from the BOINC client? Could you please check whether your Solaris systems made a work request recently?

Yes, I do log all the messages. All four machines do make workunit requests for some number of seconds' worth of data (the average size for each of the machines), and I get either a timeout, a deferment of some number of minutes and seconds, or a project communication failed message. Sometimes there's also an "Access to reference site succeeded - project servers may be temporarily down". They've been making requests every few minutes for a month.

As for using -update_prefs, I get the following (similar across machines; this is just one of them):

2008-07-31 10:17:25 [---] Starting BOINC client version 5.8.16 for i686-pc-linux-gnu
2008-07-31 10:17:25 [---] log flags: task, file_xfer, sched_ops
2008-07-31 10:17:25 [---] Libraries: libcurl/7.16.0 OpenSSL/0.9.8d zlib/1.2.3
2008-07-31 10:17:25 [---] Data directory: /home/agcarver/BOINC
2008-07-31 10:17:25 [---] Processor: 1 AuthenticAMD AMD Athlon(tm) processor [Family 6 Model 4 Stepping 2][fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr syscall mmxext 3dnowext 3dnow up]
2008-07-31 10:17:25 [---] Memory: 473.01 MB physical, 392.17 MB virtual
2008-07-31 10:17:25 [---] Disk: 7.32 GB total, 4.12 GB free
2008-07-31 10:17:25 [SETI@home] URL: http://setiathome.berkeley.edu/; Computer ID: 4418837; location: home; project prefs: default
2008-07-31 10:17:25 [---] General prefs: from SETI@home (last modified 2007-07-08 16:51:41)
2008-07-31 10:17:25 [---] Host location: home
2008-07-31 10:17:25 [---] General prefs: no separate prefs for home; using your defaults

Then it just sits there, since this particular machine was instructed to wait over three hours before making another attempt at contacting the scheduler. |
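When comparing logs from several machines, it can help to split each line into its timestamp, bracketed tag, and message. The following is a minimal Python sketch, assuming every line follows the "date time [tag] text" layout seen in the excerpt above. The names `parse_log` and `count_by_project` are hypothetical helpers, not part of BOINC, and the sample is a subset of the lines from the post.

```python
import re
from collections import Counter

# Assumed layout, taken from the excerpt: "YYYY-MM-DD HH:MM:SS [tag] text".
LINE_RE = re.compile(
    r"^(?P<stamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"\[(?P<project>[^\]]*)\] (?P<text>.*)$"
)

# A few lines copied from the post above.
SAMPLE = """\
2008-07-31 10:17:25 [---] Starting BOINC client version 5.8.16 for i686-pc-linux-gnu
2008-07-31 10:17:25 [---] Data directory: /home/agcarver/BOINC
2008-07-31 10:17:25 [SETI@home] URL: http://setiathome.berkeley.edu/; Computer ID: 4418837; location: home; project prefs: default
2008-07-31 10:17:25 [---] Host location: home
"""


def parse_log(text):
    """Return (timestamp, tag, message) tuples; skip lines that do not match."""
    entries = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            entries.append((m.group("stamp"), m.group("project"), m.group("text")))
    return entries


def count_by_project(entries):
    """Count messages per bracketed tag, e.g. '---' or 'SETI@home'."""
    return Counter(project for _, project, _ in entries)


if __name__ == "__main__":
    for stamp, project, text in parse_log(SAMPLE):
        print(stamp, project, text[:40])
```

Filtering the parsed entries by tag or by message text (for instance, lines containing "Project communication failed") then gives a quick view of how often each machine failed to reach the scheduler.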
Gundolf Jahn Joined: 19 Sep 00 Posts: 3184 Credit: 446,358 RAC: 0 |
agcarver wrote:
Yes, I do log all the messages. All four machines do make workunit requests for some number of seconds' worth of data (the average size for each of the machines), and I get either a timeout, a deferment of some number of minutes and seconds, or a project communication failed message. Sometimes there's also an "Access to reference site succeeded - project servers may be temporarily down". They've been making requests every few minutes for a month...

I think those messages might be worth posting here too. |
agcarver Joined: 14 May 99 Posts: 21 Credit: 150,823 RAC: 0 |
agcarver wrote:
Yes, I do log all the messages. All four machines do make workunit requests for some number of seconds' worth of data (the average size for each of the machines), and I get either a timeout, a deferment of some number of minutes and seconds, or a project communication failed message. Sometimes there's also an "Access to reference site succeeded - project servers may be temporarily down". They've been making requests every few minutes for a month...

Actually, all I ever got was "Project communication failed" with nothing further. It was just a one-line message in the logs.

I've since finally gotten everything restarted. I had to issue the -update_prefs command multiple times, then do several resets on the projects, go back to -update_prefs, then reset a few more times before things got unstuck.

Is there any kind of throttling going on at the servers based on IP address? The four affected machines all sit behind a NAT router, so all four end up coming from the same (fixed) IP address. |
OzzFan Joined: 9 Apr 02 Posts: 15691 Credit: 84,761,841 RAC: 28 |
agcarver wrote:
Is there any kind of throttling going on at the servers based on IP address? The four affected machines all sit behind a NAT router, so all four end up coming from the same (fixed) IP address.

No. The only thing the servers ever do based on IP address is deal with IPs that check stats too frequently (such as user-created stat sites) and begin to put too much load on the servers or bandwidth. The sending and receiving of workunits is never throttled. |