Current download problem prohibits also other projects downloads

Message boards : Number crunching : Current download problem prohibits also other projects downloads
Harri Liljeroos
Joined: 29 May 99
Posts: 3989
Credit: 85,281,665
RAC: 126
Finland
Message 457562 - Posted: 14 Nov 2006, 10:51:16 UTC

Hi,

for some curious reason, my hosts did not download new WUs from other projects although the CPUs were already running idle. BoincView shows that there is a work buffer for sah even when the download is not successful.

I had to suspend sah to download new work from Einstein. This happened with Boinc 5.4.11 and 5.3.12 (truXoft). Update did not help.

Harri
ID: 457562 · Report as offensive
Calculator
Volunteer tester

Joined: 30 Sep 06
Posts: 62
Credit: 69,529
RAC: 0
Germany
Message 457565 - Posted: 14 Nov 2006, 11:16:04 UTC

Same with me.
I guess it is because the state is "downloading" and BOINC thinks the work is going to arrive soon, or something like that.
ID: 457565 · Report as offensive
mikey
Volunteer tester
Joined: 17 Dec 99
Posts: 4215
Credit: 3,474,603
RAC: 0
United States
Message 457566 - Posted: 14 Nov 2006, 11:17:02 UTC - in response to Message 457562.  

Hi,

for some curious reason, my hosts did not download new WUs from other projects although the CPUs were already running idle. BoincView shows that there is a work buffer for sah even when the download is not successful.

I had to suspend sah to download new work from Einstein. This happened with Boinc 5.4.11 and 5.3.12 (truXoft). Update did not help.

Harri

You are probably running into the old deficit issue, where you owe Seti time and, until that is satisfied, no other project can download work. It is set up to satisfy your multi-project settings, the ones where you say "I want my computer to crunch x amount of time for this project and y for that project". If the project that you have set up for x (Seti in this case) has fallen behind, then your other projects go down because Seti is lacking the time you owe it.

ID: 457566 · Report as offensive
Astro
Volunteer tester
Joined: 16 Apr 02
Posts: 8026
Credit: 600,015
RAC: 0
Message 457568 - Posted: 14 Nov 2006, 11:20:40 UTC
Last modified: 14 Nov 2006, 11:24:33 UTC

It's my guess that his client has requested X seconds of work from seti and the order was filled, but just hasn't reached him yet, so the computer thinks it has X seconds on hand when in fact it doesn't. The scheduler has already taken that into account and isn't requesting other work. At least, I think this is correct.

So when the download is finally completed, he would have X seconds on hand, and if the code were changed to see the non-existent download and actually get work from elsewhere, then if the outage was short, the host would be overcommitted and risk missing deadlines.
tony

ID: 457568 · Report as offensive
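Astro's guess above can be pictured with a small model of how a client might decide whether to ask for more work. This is only a sketch under stated assumptions, not BOINC's actual work-fetch code: the `Task` class, the `work_shortfall` function, the 40,000-second estimates and the one-day buffer are all hypothetical. It shows that if tasks stuck in the "downloading" state are counted as work on hand, the shortfall is zero and no other project is asked for work.

```python
from dataclasses import dataclass


@dataclass
class Task:
    project: str
    est_seconds: float  # estimated CPU time still needed (made-up values)
    downloaded: bool    # False while the input file is still "downloading"


def work_shortfall(tasks, buffer_seconds, count_pending_downloads=True):
    """Seconds of extra work the client would request (hypothetical model)."""
    on_hand = sum(
        t.est_seconds
        for t in tasks
        if t.downloaded or count_pending_downloads
    )
    # Never request a negative amount of work.
    return max(0.0, buffer_seconds - on_hand)


# Three work units assigned by the server, none of which has arrived yet.
tasks = [Task("SETI@home", 40_000.0, downloaded=False) for _ in range(3)]
buffer_seconds = 86_400.0  # one-day work buffer, an assumed preference

print(work_shortfall(tasks, buffer_seconds))
print(work_shortfall(tasks, buffer_seconds, count_pending_downloads=False))
```

The second call illustrates the trade-off Astro describes: ignoring the pending downloads would fetch a full buffer from elsewhere, and the host could end up overcommitted once the original downloads complete.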
CoolBlue87GT
Joined: 27 Dec 03
Posts: 59
Credit: 53,580
RAC: 0
United States
Message 457582 - Posted: 14 Nov 2006, 11:59:13 UTC

Same problem here. I now have three work units with status "downloading". Here's part of the messages. Any help would be nice.

11/14/2006 5:39:11 AM||Starting BOINC client version 5.4.11 for windows_intelx86
11/14/2006 5:39:11 AM||libcurl/7.15.3 OpenSSL/0.9.8a zlib/1.2.3
11/14/2006 5:39:11 AM||Data directory: F:\program files\BOINC
11/14/2006 5:39:11 AM||Processor: 1 AuthenticAMD mobile AMD Athlon(tm) XP2200+
11/14/2006 5:39:11 AM||Memory: 446.48 MB physical, 885.71 MB virtual
11/14/2006 5:39:11 AM||Disk: 55.89 GB total, 45.16 GB free
11/14/2006 5:39:11 AM|SETI@home|URL: http://setiathome.berkeley.edu/; Computer ID: 2847434; location: home; project prefs: default
11/14/2006 5:39:11 AM||No general preferences found - using BOINC defaults
11/14/2006 5:39:11 AM||Local control only allowed
11/14/2006 5:39:11 AM||Listening on port 31416
11/14/2006 5:40:16 AM|SETI@home|Sending scheduler request to http://setiboinc.ssl.berkeley.edu/sah_cgi/cgi
11/14/2006 5:49:04 AM|SETI@home|Scheduler request succeeded
11/14/2006 5:49:06 AM|SETI@home|Started download of file 10jn03aa.8062.16754.436050.3.20
11/14/2006 5:49:07 AM|SETI@home|Incomplete read of less than 5KB for 10jn03aa.8062.16754.436050.3.20 - truncating
11/14/2006 5:49:07 AM|SETI@home|Temporarily failed download of 10jn03aa.8062.16754.436050.3.20: Error 403
11/14/2006 5:49:07 AM|SETI@home|Backing off 1 minutes and 0 seconds on download of file 10jn03aa.8062.16754.436050.3.20
11/14/2006 5:50:08 AM|SETI@home|Started download of file 10jn03aa.8062.16754.436050.3.20
11/14/2006 5:50:09 AM|SETI@home|Incomplete read of less than 5KB for 10jn03aa.8062.16754.436050.3.20 - truncating
11/14/2006 5:50:09 AM|SETI@home|Temporarily failed download of 10jn03aa.8062.16754.436050.3.20: Error 403
ID: 457582 · Report as offensive
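For anyone comparing logs across several hosts, the message lines above have a simple `timestamp|project|message` layout, with an empty project field for client-wide messages. The following is a minimal sketch of tallying failed downloads from such text. The helper names `parse_line` and `count_download_failures` are made up for illustration, and the sample lines are copied from the log above.

```python
from collections import Counter

SAMPLE = """\
11/14/2006 5:49:06 AM|SETI@home|Started download of file 10jn03aa.8062.16754.436050.3.20
11/14/2006 5:49:07 AM|SETI@home|Temporarily failed download of 10jn03aa.8062.16754.436050.3.20: Error 403
11/14/2006 5:49:07 AM|SETI@home|Backing off 1 minutes and 0 seconds on download of file 10jn03aa.8062.16754.436050.3.20
11/14/2006 5:50:09 AM|SETI@home|Temporarily failed download of 10jn03aa.8062.16754.436050.3.20: Error 403
"""

FAILED_PREFIX = "Temporarily failed download of "


def parse_line(line):
    """Split 'timestamp|project|message' into a 3-tuple; project may be empty."""
    timestamp, project, message = line.split("|", 2)
    return timestamp, project, message


def count_download_failures(log_text):
    """Count failed downloads per (file name, error text)."""
    failures = Counter()
    for line in log_text.splitlines():
        if line.count("|") < 2:
            continue  # not a message-log line
        _, _, message = parse_line(line)
        if message.startswith(FAILED_PREFIX):
            # The error text follows the last ': ' in the message.
            name, _, error = message[len(FAILED_PREFIX):].rpartition(": ")
            failures[(name, error)] += 1
    return failures


failures = count_download_failures(SAMPLE)
for (name, error), n in failures.items():
    print(f"{name}: {error} x{n}")
```

A repeated "Error 403" for the same file, as here, points at the server refusing the request rather than at the local network connection.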
ML1
Volunteer moderator
Volunteer tester

Joined: 25 Nov 01
Posts: 20147
Credit: 7,508,002
RAC: 20
United Kingdom
Message 457609 - Posted: 14 Nov 2006, 12:43:42 UTC - in response to Message 457568.  

Formerly mmciastro. Name and avatar changed for a change

tony

Phew!

Good.

So the forums aren't completely broken!

:-)
Martin
See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
ID: 457609 · Report as offensive
Harri Liljeroos
Joined: 29 May 99
Posts: 3989
Credit: 85,281,665
RAC: 126
Finland
Message 457621 - Posted: 14 Nov 2006, 12:58:46 UTC - in response to Message 457568.  

It's my guess that his client has requested X seconds of work from seti and the order was filled, but just hasn't reached him yet, so the computer thinks it has X seconds on hand when in fact it doesn't. The scheduler has already taken that into account and isn't requesting other work. At least, I think this is correct.

So when the download is finally completed, he would have X seconds on hand, and if the code were changed to see the non-existent download and actually get work from elsewhere, then if the outage was short, the host would be overcommitted and risk missing deadlines.
tony


I think that's what is happening. This is one rare situation when micromanaging may be needed. As I mentioned, suspending seti for a minute allowed other projects to download some work and keep my hosts busy, at least for a while.

Harri
ID: 457621 · Report as offensive
Astro
Volunteer tester
Joined: 16 Apr 02
Posts: 8026
Credit: 600,015
RAC: 0
Message 457672 - Posted: 14 Nov 2006, 13:59:57 UTC - in response to Message 457621.  

This is one rare situation when micromanaging may be needed.
Harri

Indeed, you did exactly the right thing.

tony

ID: 457672 · Report as offensive
Peter Baker
Volunteer tester

Joined: 5 Nov 06
Posts: 2
Credit: 10,585
RAC: 0
United Kingdom
Message 457679 - Posted: 14 Nov 2006, 14:10:39 UTC
Last modified: 14 Nov 2006, 14:12:06 UTC

I am having the same issue as CoolBlue. At first I thought it was my systems playing up, but I have found that not to be the case, as all my machines can stream a radio station at 128 kbit/s without buffering. This made me wonder if it's a problem with the download server. All of my machines just keep failing to download, and on occasion they can't even connect to the server, even though they are still receiving the radio stream. Here is part of my log >>

14/11/2006 13:51:14|SETI@home|Started download of file 14jn03ab.20420.23936.778410.3.111
14/11/2006 13:51:17|SETI@home|Temporarily failed download of 14jn03ab.20420.23936.778410.3.111: http error
14/11/2006 13:51:17|SETI@home|Backing off 1 minutes and 21 seconds on download of file 14jn03ab.20420.23936.778410.3.111
14/11/2006 13:52:39|SETI@home|Started download of file 14jn03ab.20420.23936.778410.3.111
14/11/2006 13:52:41|SETI@home|Incomplete read of less than 5KB for 14jn03ab.20420.23936.778410.3.111 - truncating
14/11/2006 13:52:41|SETI@home|Temporarily failed download of 14jn03ab.20420.23936.778410.3.111: Error 403
14/11/2006 13:52:41|SETI@home|Backing off 1 minutes and 45 seconds on download of file 14jn03ab.20420.23936.778410.3.111

That's not all I get. I also get this pop-up on occasion >>

14/11/2006 13:39:08|SETI@home|Sending scheduler request to http://setiboinc.ssl.berkeley.edu/sah_cgi/cgi
14/11/2006 13:39:08|SETI@home|Reason: Requested by user
14/11/2006 13:39:08|SETI@home|Reporting 1 tasks
14/11/2006 13:39:13|SETI@home|Scheduler request failed: couldn't resolve host name
14/11/2006 13:39:13|SETI@home|Deferring scheduler requests for 1 minutes and 0 seconds

All 4 of my machines are having the exact same issue, and it's not my PC or my net connection that's causing the problem.

Plus I still have some tasks that still need reporting.
And just to clarify, I am only running SETI on all of my machines.

P.S. Still pretty new to all this stuff, but I am learning fast.

.::EDIT::. I know this isn't the topic to use, but it seems it may be related .::END-EDIT::.
ID: 457679 · Report as offensive
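The "Backing off 1 minutes and 21 seconds" and "1 minutes and 45 seconds" lines in the log above suggest that the client waits a randomized, growing interval between download retries. The thread does not state the formula BOINC uses, so the following is only a generic sketch of randomized exponential backoff. The function name `backoff_delay`, the 60-second floor and the 4-hour ceiling are assumptions chosen for illustration.

```python
import random


def backoff_delay(failures, min_delay=60.0, max_delay=4 * 3600.0, rng=random):
    """Seconds to wait after the given number of consecutive failures (1-based).

    Generic randomized exponential backoff, not BOINC's actual formula.
    """
    if failures < 1:
        raise ValueError("failures must be >= 1")
    # The upper bound doubles with every failure until it reaches max_delay.
    ceiling = min(max_delay, min_delay * 2 ** (failures - 1))
    # Pick a random delay between the floor and the current upper bound.
    return rng.uniform(min_delay, ceiling)


rng = random.Random(2006)  # seeded only so that repeated runs match
for n in range(1, 6):
    print(f"failure {n}: wait {backoff_delay(n, rng=rng):.0f} s")
```

Randomizing the delay spreads out retries from many hosts, so that a server coming back from an outage is not hit by every client at the same moment.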
Richard Haselgrove (Project Donor)
Volunteer tester

Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 457835 - Posted: 14 Nov 2006, 17:23:55 UTC - in response to Message 457621.  

It's my guess that his client has requested X seconds of work from seti and the order was filled, but just hasn't reached him yet, so the computer thinks it has X seconds on hand when in fact it doesn't. The scheduler has already taken that into account and isn't requesting other work. At least, I think this is correct.

So when the download is finally completed, he would have X seconds on hand, and if the code were changed to see the non-existent download and actually get work from elsewhere, then if the outage was short, the host would be overcommitted and risk missing deadlines.
tony


I think that's what is happening. This is one rare situation when micromanaging may be needed. As I mentioned, suspending seti for a minute allowed other projects to download some work and keep my hosts busy, at least for a while.

Harri

These outages make fascinating testbeds to observe BOINC behaviour under pressure.

I have exactly the opposite situation [NB I'm not regarding it as a problem - just an observation].

Rig does mostly SETI work, with Einstein as a low-share secondary to cover outages. I also run a 3 day cache, again to cover situations just like this one.

At some point during the outage, the rig got scheduled a VHAR with a 4-day deadline, so immediately (and by design) went into EDF. However, it hasn't yet downloaded the data for the VHAR, so the EDF has resulted in - several hours crunching for Einstein, and no work for SETI at all!

Ah well, such is life - LTD will sort it all out eventually.
ID: 457835 · Report as offensive
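Richard's observation can be pictured with a toy earliest-deadline-first (EDF) picker. This is a sketch under assumptions, not the BOINC scheduler: the `Task` class, the `pick_next` function, the task names and every deadline except the 4-day one are invented for illustration. The point it shows is that a task whose data has not been downloaded can trigger EDF mode without being runnable itself, so the CPU goes to whichever runnable task has the nearest deadline.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    project: str
    deadline_hours: float  # hours until the deadline (made-up values)
    runnable: bool         # False while the input data is not downloaded


def pick_next(tasks, edf_mode):
    """Pick the next task to run, or None if nothing is runnable."""
    runnable = [t for t in tasks if t.runnable]
    if not runnable:
        return None
    if edf_mode:
        # Earliest deadline first, but only among tasks that can actually run.
        return min(runnable, key=lambda t: t.deadline_hours)
    # Outside EDF mode this toy model simply keeps the existing queue order.
    return runnable[0]


tasks = [
    Task("vhar", "SETI@home", 96.0, runnable=False),       # 4-day deadline, data missing
    Task("seti_cached", "SETI@home", 600.0, runnable=True),
    Task("einstein_1", "Einstein@Home", 300.0, runnable=True),
]

chosen = pick_next(tasks, edf_mode=True)
print(chosen.name, chosen.project)
```

With these assumed deadlines the Einstein task is chosen, which matches the behaviour Richard reports, although the real reason on his rig may differ.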
KWSN Ekky Ekky Ekky
Joined: 25 May 99
Posts: 944
Credit: 52,956,491
RAC: 67
United Kingdom
Message 457844 - Posted: 14 Nov 2006, 17:32:40 UTC

Now will someone at SETI please let us know what is going on? As usual there is no mention of anything in the technical newsletters or announcements. Just that "sah_assimilator_nonenh kryten Not Running" is on the server status page. Is that the problem? Can no one get it sorted?

ID: 457844 · Report as offensive
Alinator
Volunteer tester

Joined: 19 Apr 05
Posts: 4178
Credit: 4,647,982
RAC: 0
United States
Message 457848 - Posted: 14 Nov 2006, 17:37:01 UTC - in response to Message 457566.  
Last modified: 14 Nov 2006, 17:38:29 UTC

Hi,

for some curious reason, my hosts did not download new WUs from other projects although the CPUs were already running idle. BoincView shows that there is a work buffer for sah even when the download is not successful.

I had to suspend sah to download new work from Einstein. This happened with Boinc 5.4.11 and 5.3.12 (truXoft). Update did not help.

Harri

You are probably running into the old deficit issue, where you owe Seti time and, until that is satisfied, no other project can download work. It is set up to satisfy your multi-project settings, the ones where you say "I want my computer to crunch x amount of time for this project and y for that project". If the project that you have set up for x (Seti in this case) has fallen behind, then your other projects go down because Seti is lacking the time you owe it.


That's not necessarily true. In this case, if SAH is owed time due to LTD, then BOINC won't DL work from the other projects until all the current work onboard is done or is not runnable, and then it will only DL one result at a time from the others.

Of course, manual intervention with a project suspend will work around the problem, but may cause other debt problems down the road.

My recommendation: if you still have work onboard, or run multiple projects, it's probably better to just ride this one out. BOINC will automatically make up the lost resource share time sooner or later.

Alinator
ID: 457848 · Report as offensive
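The long-term debt (LTD) idea Alinator refers to can be illustrated with a deliberately simplified bookkeeping model. This is not BOINC's real debt code: the function `update_debts`, the 90/10 resource shares and the 10-hour period are assumptions made up for this sketch. Each project is credited with the CPU time its share entitles it to and debited for the time it actually received, so a project that gets nothing during an outage builds up a positive debt that is repaid later.

```python
def update_debts(debts, shares, cpu_used):
    """Return new per-project debts after one accounting period (toy model).

    debts:    current debt per project, in hours
    shares:   resource share per project (any positive numbers)
    cpu_used: CPU hours each project actually received in this period
    """
    total_time = sum(cpu_used.values())
    total_share = sum(shares.values())
    new_debts = {}
    for project, debt in debts.items():
        # Time the project was entitled to according to its resource share.
        entitled = total_time * shares[project] / total_share
        new_debts[project] = debt + entitled - cpu_used.get(project, 0.0)
    return new_debts


debts = {"SETI@home": 0.0, "Einstein@Home": 0.0}
shares = {"SETI@home": 90, "Einstein@Home": 10}  # assumed shares
# During the outage all 10 CPU hours went to Einstein.
cpu_used = {"SETI@home": 0.0, "Einstein@Home": 10.0}

debts = update_debts(debts, shares, cpu_used)
print(debts)
```

In this model the debts always sum to zero, so the hours Einstein gained are exactly the hours SETI is owed, which is the sense in which the lost resource share time is made up sooner or later.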
kittyman (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 457852 - Posted: 14 Nov 2006, 17:38:45 UTC - in response to Message 457844.  

Now will someone at SETI please let us know what is going on? As usual there is no mention of anything in the technical newsletters or announcements. Just that "sah_assimilator_nonenh kryten Not Running" is on the server status page. Is that the problem? Can no one get it sorted?


If you check the other trouble threads currently running in this forum, you will find that most of us know what is going on. Seti has had a randomly occurring mount failure of some sort that prevents downloads. They will have to reboot the servers to correct it. I would expect it to happen fairly soon. The assimilator that is not running on the status page has nothing to do with it. I don't think it ever runs anymore, because it was for the old non-enhanced WUs, which we have not been crunching for some time.
Sit tight, they should have it sorted out soon......
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 457852 · Report as offensive
Bob Neville
Joined: 3 Jan 00
Posts: 35
Credit: 7,451,208
RAC: 0
United States
Message 457856 - Posted: 14 Nov 2006, 17:41:48 UTC

someone wake up the hamster!! :)


words are the symbols of mental experience
ID: 457856 · Report as offensive
KB7RZF
Volunteer tester
Joined: 15 Aug 99
Posts: 9549
Credit: 3,308,926
RAC: 2
United States
Message 457859 - Posted: 14 Nov 2006, 17:43:19 UTC - in response to Message 457844.  

Now will someone at SETI please let us know what is going on? As usual there is no mention of anything in the technical newsletters or announcements. Just that "sah_assimilator_nonenh kryten Not Running" is on the server status page. Is that the problem? Can no one get it sorted?

I'm sure they are well aware of the problem, and working on it now. Why should they take time away from fixing something just to post an update? When it's all said and done, they will post something in the tech news, just as they always do. Relax, take a breath. It's nothing you, I, or anyone else here has any control over.
ID: 457859 · Report as offensive
kittyman (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 457890 - Posted: 14 Nov 2006, 18:01:33 UTC

November 14, 2006
A configuration problem on our servers has caused workunit downloads to fail since yesterday afternoon. This has been fixed. However, we are bringing the whole project down for our regular Tuesday outage to back up our database. We should be back up in a few hours (22:00 UTC).
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 457890 · Report as offensive
mikey
Volunteer tester
Joined: 17 Dec 99
Posts: 4215
Credit: 3,474,603
RAC: 0
United States
Message 457953 - Posted: 14 Nov 2006, 22:24:17 UTC - in response to Message 457844.  

Now will someone at SETI please let us know what is going on? As usual there is no mention of anything in the technical newsletters or announcements. Just that "sah_assimilator_nonenh kryten Not Running" is on the server status page. Is that the problem? Can no one get it sorted?

I believe it happened in the middle of the night, when they were snoozing. Probably after being out all night with those coeds, you know how Master's Degree people are.

ID: 457953 · Report as offensive
1mp0£173
Volunteer tester

Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 457960 - Posted: 14 Nov 2006, 22:39:25 UTC - in response to Message 457621.  

It's my guess that his client has requested X seconds of work from seti and the order was filled, but just hasn't reached him yet, so the computer thinks it has X seconds on hand when in fact it doesn't. The scheduler has already taken that into account and isn't requesting other work. At least, I think this is correct.

So when the download is finally completed, he would have X seconds on hand, and if the code were changed to see the non-existent download and actually get work from elsewhere, then if the outage was short, the host would be overcommitted and risk missing deadlines.
tony


I think that's what is happening. This is one rare situation when micromanaging may be needed. As I mentioned, suspending seti for a minute allowed other projects to download some work and keep my hosts busy, at least for a while.

Harri

This is a known issue in the current release, where it does CPU scheduling system-wide. The current beta does scheduling on a per-core basis.
ID: 457960 · Report as offensive
Alinator
Volunteer tester

Joined: 19 Apr 05
Posts: 4178
Credit: 4,647,982
RAC: 0
United States
Message 457977 - Posted: 14 Nov 2006, 23:11:27 UTC - in response to Message 457953.  

Now will someone at SETI please let us know what is going on? As usual there is no mention of anything in the technical newsletters or announcements. Just that "sah_assimilator_nonenh kryten Not Running" is on the server status page. Is that the problem? Can no one get it sorted?

I believe it happened in the middle of the night, when they were snoozing. Probably after being out all night with those coeds, you know how Master's Degree people are.


Hmmm, I don't know. The INR-688 interface to Cogent "flatlined" yesterday around 3 PM Berkeley time, so it would seem they knew they were in trouble before they left yesterday (Also an afternoon time frame was mentioned in the news item).

Still, the extracurricular activity factor may have played a part in how long it was out. ;-)

Alinator
ID: 457977 · Report as offensive
keyboards
Volunteer tester
Joined: 14 Jul 00
Posts: 66
Credit: 492,766
RAC: 0
United States
Message 458008 - Posted: 14 Nov 2006, 23:42:56 UTC

From the front page:

November 14, 2006
A configuration problem on our servers has caused workunit downloads to fail since yesterday afternoon. This has been fixed. However, we are bringing the whole project down for our regular Tuesday outage to back up our database. We should be back up in a few hours (22:00 UTC).


Be patient, they are working on it.
!!Stupidity should be PAINFUL!!
ID: 458008 · Report as offensive


 