New work from projects coming soon?



Questions and Answers : Unix/Linux : New work from projects coming soon?

agcarver
Joined: 14 May 99
Posts: 21
Credit: 150,823
RAC: 0
United States
Message 790065 - Posted: 30 Jul 2008, 21:09:29 UTC

I saw the notes about the system failures; I'm just wondering how long it'll be before new work starts arriving. Four systems that I'm running BOINC on have complained of scheduler timeouts or no work for a little over a month now. Just to be sure, I've already reset the projects, and still no luck acquiring new work.

OzzFan
Volunteer tester
Joined: 9 Apr 02
Posts: 13625
Credit: 31,045,437
RAC: 20,945
United States
Message 790070 - Posted: 30 Jul 2008, 21:23:16 UTC

The server problems have not lasted for a month. Are you sure there's not a different issue at play here?

agcarver
Joined: 14 May 99
Posts: 21
Credit: 150,823
RAC: 0
United States
Message 790085 - Posted: 30 Jul 2008, 22:00:32 UTC - in response to Message 790070.

The server problems have not lasted for a month. Are you sure there's not a different issue at play here?



Yes, I'm quite sure, but I can double-check everything. Two clients (both Solaris 10) are running BOINC 5.10.17 and have been doing so for months on end. The other two clients are Debian Linux machines running 5.4.11 and 5.8.16, which have also been running for months without incident.

I can reach the main Berkeley pages with wget from all four machines with no problem. If I try to connect to the scheduler with wget (http://setiboinc.ssl.berkeley.edu/sah_cgi/cgi according to the XML file), it resolves the IP just fine but it otherwise just sits there doing nothing.

Nothing has changed on any of the four systems, not even any kind of library updates.

Actually, my wget test to the scheduler finally returned after a few minutes. The reply was:

<scheduler_reply>
<scheduler_version>603</scheduler_version>
<master_url>http://setiathome.berkeley.edu/</master_url>
<request_delay>11.000000</request_delay>
<message priority="low">Error in request message: no start tag </message>
<project_name>SETI@home</project_name>
</scheduler_reply>
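
For anyone repeating this test, here is a minimal Python sketch that pulls the useful fields out of a reply like the one above. The function name `summarize_reply` and the returned dictionary keys are made up for illustration and are not part of BOINC. Note that the reply itself shows the scheduler was reachable and answering, only slowly. The "no start tag" error is plausibly just the result of a plain wget fetch carrying no scheduler request body, but that reading is an assumption.

```python
import xml.etree.ElementTree as ET

# Reply text copied from the post above.
REPLY = """<scheduler_reply>
<scheduler_version>603</scheduler_version>
<master_url>http://setiathome.berkeley.edu/</master_url>
<request_delay>11.000000</request_delay>
<message priority="low">Error in request message: no start tag </message>
<project_name>SETI@home</project_name>
</scheduler_reply>"""


def summarize_reply(xml_text):
    """Return the fields of a scheduler reply that help with troubleshooting.

    Hypothetical helper: the key names below are chosen for this example.
    """
    root = ET.fromstring(xml_text)
    message = root.find("message")
    has_text = message is not None and message.text
    return {
        "project": root.findtext("project_name"),
        # Seconds the scheduler asks the client to wait before retrying.
        "request_delay_s": float(root.findtext("request_delay", default="0")),
        "message": message.text.strip() if has_text else None,
        "priority": message.get("priority") if message is not None else None,
    }


if __name__ == "__main__":
    for key, value in summarize_reply(REPLY).items():
        print(f"{key}: {value}")
```

The same function can be pointed at a reply saved to a file with wget's -O option, which makes it easy to compare what each of the four machines gets back.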


Dotsch
Volunteer tester
Joined: 9 Jun 99
Posts: 2422
Credit: 847,804
RAC: 3
Germany
Message 790279 - Posted: 31 Jul 2008, 6:40:27 UTC

Have you logged the output from the BOINC client? Could you please check whether your Solaris systems made a work request recently?
What happens if you stop the BOINC client and restart it with "./boinc_client -update_prefs http://setiathome.berkeley.edu"? Could you please post the complete startup messages from the BOINC client when it is run with the -update_prefs option?

agcarver
Joined: 14 May 99
Posts: 21
Credit: 150,823
RAC: 0
United States
Message 790412 - Posted: 31 Jul 2008, 14:26:28 UTC - in response to Message 790279.

Have you logged the output from the BOINC client? Could you please check whether your Solaris systems made a work request recently?
What happens if you stop the BOINC client and restart it with "./boinc_client -update_prefs http://setiathome.berkeley.edu"? Could you please post the complete startup messages from the BOINC client when it is run with the -update_prefs option?



Yes, I do log all the messages. All four machines do make workunit requests for some number of seconds' worth of data (the average size for each of the machines), and I get either a timeout, a deferment of some number of minutes and seconds, or a "Project communication failed" message. Sometimes there's also an "Access to reference site succeeded - project servers may be temporarily down". They've been making requests every few minutes for a month.


As for using -update_prefs, I get (similar across machines, this is just one of them):
2008-07-31 10:17:25 [---] Starting BOINC client version 5.8.16 for i686-pc-linux-gnu
2008-07-31 10:17:25 [---] log flags: task, file_xfer, sched_ops
2008-07-31 10:17:25 [---] Libraries: libcurl/7.16.0 OpenSSL/0.9.8d zlib/1.2.3
2008-07-31 10:17:25 [---] Data directory: /home/agcarver/BOINC
2008-07-31 10:17:25 [---] Processor: 1 AuthenticAMD AMD Athlon(tm) processor [Family 6 Model 4 Stepping 2][fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr syscall mmxext 3dnowext 3dnow up]
2008-07-31 10:17:25 [---] Memory: 473.01 MB physical, 392.17 MB virtual
2008-07-31 10:17:25 [---] Disk: 7.32 GB total, 4.12 GB free
2008-07-31 10:17:25 [SETI@home] URL: http://setiathome.berkeley.edu/; Computer ID: 4418837; location: home; project prefs: default
2008-07-31 10:17:25 [---] General prefs: from SETI@home (last modified 2007-07-08 16:51:41)
2008-07-31 10:17:25 [---] Host location: home
2008-07-31 10:17:25 [---] General prefs: no separate prefs for home; using your defaults

Then it just sits there since this particular machine was instructed to wait over three hours before making another attempt at contacting the scheduler.
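
For comparing logs across several machines, here is a rough Python sketch that splits client log lines like the ones above into timestamp, project, and message. The line layout is inferred only from the excerpt in this post, and the names `parse_line` and `messages_for` are invented for the example, so other client versions may need a different pattern.

```python
import re
from datetime import datetime

# Layout assumed from the excerpt above: "YYYY-MM-DD HH:MM:SS [project] text".
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[(.*?)\] (.*)$")

# Two lines copied from the log in this post.
SAMPLE = [
    "2008-07-31 10:17:25 [---] Starting BOINC client version 5.8.16 "
    "for i686-pc-linux-gnu",
    "2008-07-31 10:17:25 [SETI@home] URL: http://setiathome.berkeley.edu/; "
    "Computer ID: 4418837; location: home; project prefs: default",
]


def parse_line(line):
    """Split one log line into (timestamp, project, message), or return None."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    stamp, project, message = match.groups()
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S"), project, message


def messages_for(lines, project):
    """Return the parsed entries that belong to the given project tag."""
    parsed = (parse_line(line) for line in lines)
    return [entry for entry in parsed if entry is not None and entry[1] == project]


if __name__ == "__main__":
    for stamp, project, message in messages_for(SAMPLE, "SETI@home"):
        print(stamp.isoformat(), project, message)
```

Filtering on the project tag makes it easier to see how often each machine actually reached the scheduler and what it was told each time.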

Gundolf Jahn
Joined: 19 Sep 00
Posts: 3184
Credit: 359,338
RAC: 33
Germany
Message 790483 - Posted: 31 Jul 2008, 17:15:47 UTC - in response to Message 790412.
Last modified: 31 Jul 2008, 17:16:09 UTC

Yes, I do log all the messages. All four machines do make workunit requests for some number of seconds' worth of data (the average size for each of the machines), and I get either a timeout, a deferment of some number of minutes and seconds, or a "Project communication failed" message. Sometimes there's also an "Access to reference site succeeded - project servers may be temporarily down". They've been making requests every few minutes for a month...

I think those messages might be worth posting here too.

agcarver
Joined: 14 May 99
Posts: 21
Credit: 150,823
RAC: 0
United States
Message 790493 - Posted: 31 Jul 2008, 17:44:51 UTC - in response to Message 790483.

Yes, I do log all the messages. All four machines do make workunit requests for some number of seconds' worth of data (the average size for each of the machines), and I get either a timeout, a deferment of some number of minutes and seconds, or a "Project communication failed" message. Sometimes there's also an "Access to reference site succeeded - project servers may be temporarily down". They've been making requests every few minutes for a month...

I think those messages might be worth posting here too.



Actually, all I ever got was "Project communication failed" with nothing further. It was just a one-line message in the logs.

I've finally gotten everything restarted. I had to issue the -update_prefs command multiple times, then do several resets on the projects, go back to -update_prefs, then reset a few more times before things got unstuck.

Is there any kind of throttling going on at the servers based on IP address? The four affected machines all sit behind a NAT router, so all four end up coming from the same (fixed) IP address.

OzzFan
Volunteer tester
Joined: 9 Apr 02
Posts: 13625
Credit: 31,045,437
RAC: 20,945
United States
Message 790539 - Posted: 31 Jul 2008, 19:57:23 UTC - in response to Message 790493.

Is there any kind of throttling going on at the servers based on IP address? The four affected machines all sit behind a NAT router, so all four end up coming from the same (fixed) IP address.


No. The only time the servers act on an IP address is when that IP checks stats too frequently (such as for user-created stats sites) and begins to put too much load on the servers or bandwidth.

The sending and receiving of workunits is never throttled.


Copyright © 2014 University of California