Eric's biannual post #6: You can tuna fish, but you can't tune a TCP


Message boards : SETI@home Staff Blog : Eric's biannual post #6: You can tuna fish, but you can't tune a TCP

Profile [SETI.USA] OneChicken
Joined: 3 Apr 04
Posts: 70
Credit: 906,887
RAC: 0
United States
Message 570506 - Posted: 18 May 2007, 17:57:07 UTC - in response to Message 570468.


I think I've tracked down this problem. There seems to be something related to the outage that has corrupted the file "sched_request_setiathome.berkeley.edu.xml"
in your BOINC directories. Restarting BOINC or deleting the file and restarting BOINC should fix that problem.


Eric: Any chance this can be fixed on Berkeley's end? I have some remote machines that I cannot get to.

____________


Proud member of SETI.USA

Profile Labbie
Joined: 19 Jun 06
Posts: 4083
Credit: 5,930,102
RAC: 0
United States
Message 570507 - Posted: 18 May 2007, 17:57:16 UTC - in response to Message 570500.

For me, the trick of renaming the app_info file has accomplished something.

I did get 3 WUs on one machine, but on the Results for Computer page, it also gave me 5 additional Ghosts. Note that the timestamps for the Ghosts are weird, while the good DLs are consistent with the time they downloaded.

Good DLs in Red

535624975 129448185 18 May 2007 16:56:38 UTC 5 Jun 2007 9:12:03 UTC In Progress Unknown New --- --- ---
535218668 128846780 17 May 2007 17:43:31 UTC 22 May 2007 1:53:31 UTC In Progress Unknown New --- --- ---
535051627 129476175 18 May 2007 16:56:38 UTC 31 May 2007 19:41:22 UTC In Progress Unknown New --- --- ---
535051564 129476162 18 May 2007 16:56:38 UTC 31 May 2007 19:41:22 UTC In Progress Unknown New --- --- ---

534945332 129442634 18 May 2007 11:47:27 UTC 22 May 2007 19:57:27 UTC In Progress Unknown New --- --- ---
534759659 129388850 18 May 2007 1:11:32 UTC 11 Jun 2007 23:44:31 UTC In Progress Unknown New --- --- ---
534758973 129388628 18 May 2007 1:10:06 UTC 6 Jun 2007 19:09:32 UTC In Progress Unknown New --- --- ---
534758708 129388540 18 May 2007 1:09:01 UTC 6 Jun 2007 19:08:27 UTC In Progress Unknown New --- --- ---

[EDIT]In fact, one of the Ghosts got a timestamp in the future, if I'm calculating GMT correctly from MDT[/EDIT]

The WU with the 17:43:31 UTC timestamp is dated yesterday (17 May 2007), so I don't think we've encountered ghostly time-travelling ETs just yet. Shame, really.


Yep, you are right, I misread the date.

____________

Calm Chaos Forum...Join Calm Chaos Now

Dominik S.
Joined: 4 Jun 03
Posts: 15
Credit: 4,346,294
RAC: 0
Poland
Message 570523 - Posted: 18 May 2007, 18:10:26 UTC - in response to Message 570468.



I think I've tracked down this problem. There seems to be something related to the outage that has corrupted the file "sched_request_setiathome.berkeley.edu.xml"
in your BOINC directories. Restarting BOINC or deleting the file and restarting BOINC should fix that problem.

Eric

No, it doesn't. I deleted the file "sched_request_setiathome.berkeley.edu.xml" and restarted BOINC, and now I have a new ghost WU
____________

Profile Clyde C. Phillips, III
Joined: 2 Aug 00
Posts: 1851
Credit: 5,955,047
RAC: 0
United States
Message 570536 - Posted: 18 May 2007, 18:24:49 UTC

I don't know whether it's ghosts or what but I haven't been able to get a single Seti unit for either of my computers for at least a couple days:

5/18/2007 8:07:07 AM||Project communication failed: attempting access to reference site
5/18/2007 8:07:09 AM||Access to reference site succeeded - project servers may be temporarily down.
5/18/2007 8:07:11 AM|SETI@home|Scheduler request failed: couldn't connect to server
5/18/2007 8:07:11 AM|SETI@home|Deferring scheduler requests for 1 minutes and 32 seconds
5/18/2007 8:08:46 AM|SETI@home|Sending scheduler request to http://setiboinc.ssl.berkeley.edu/sah_cgi/cgi
5/18/2007 8:08:46 AM|SETI@home|Reason: To fetch work
5/18/2007 8:08:46 AM|SETI@home|Requesting 345600 seconds of new work
5/18/2007 8:09:07 AM||Project communication failed: attempting access to reference site
5/18/2007 8:09:09 AM||Access to reference site succeeded - project servers may be temporarily down.
5/18/2007 8:09:11 AM|SETI@home|Scheduler request failed: couldn't connect to server
5/18/2007 8:09:11 AM|SETI@home|Deferring scheduler requests for 48 minutes and 26 seconds
5/18/2007 8:57:42 AM|SETI@home|Sending scheduler request to http://setiboinc.ssl.berkeley.edu/sah_cgi/cgi
5/18/2007 8:57:42 AM|SETI@home|Reason: To fetch work
5/18/2007 8:57:42 AM|SETI@home|Requesting 345600 seconds of new work
5/18/2007 8:57:43 AM||Project communication failed: attempting access to reference site
5/18/2007 8:57:44 AM||Access to reference site succeeded - project servers may be temporarily down.
5/18/2007 8:57:47 AM|SETI@home|Scheduler request failed: server returned nothing (no headers, no data)
5/18/2007 8:57:47 AM|SETI@home|Deferring scheduler requests for 2 hours, 2 minutes and 53 seconds
5/18/2007 11:00:42 AM|SETI@home|Sending scheduler request to http://setiboinc.ssl.berkeley.edu/sah_cgi/cgi
5/18/2007 11:00:42 AM|SETI@home|Reason: To fetch work
5/18/2007 11:00:42 AM|SETI@home|Requesting 345600 seconds of new work
5/18/2007 11:00:46 AM||Project communication failed: attempting access to reference site
5/18/2007 11:00:47 AM||Access to reference site succeeded - project servers may be temporarily down.
5/18/2007 11:00:47 AM|SETI@home|Scheduler request failed: server returned nothing (no headers, no data)
5/18/2007 11:00:47 AM|SETI@home|Deferring scheduler requests for 3 hours, 49 minutes and 19 seconds
5/18/2007 2:21:33 PM||Rescheduling CPU: application exited
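
The log above shows the client backing off for longer and longer between scheduler requests. As a rough illustration only, here is a minimal Python sketch that pulls the deferral length out of such lines. It assumes the `date time|project|message` layout seen in this log; the function name `deferral_seconds` is made up for this example and is not part of BOINC.

```python
import re

# Matches e.g. "Deferring scheduler requests for 2 hours, 2 minutes and 53 seconds"
DEFER_RE = re.compile(r"Deferring scheduler requests for (.+)$")
UNIT_SECONDS = {"hour": 3600, "minute": 60, "second": 1}


def deferral_seconds(line):
    """Return the deferral length in seconds, or None if the line is not a deferral.

    Assumes the pipe-separated layout from the log above:
    timestamp|project|message (hypothetical helper, not a BOINC API).
    """
    parts = line.strip().split("|", 2)
    if len(parts) != 3:
        return None
    match = DEFER_RE.search(parts[2])
    if match is None:
        return None
    total = 0
    # Sum every "<number> <unit>" pair found in the duration text.
    for amount, unit in re.findall(r"(\d+) (hour|minute|second)s?", match.group(1)):
        total += int(amount) * UNIT_SECONDS[unit]
    return total


if __name__ == "__main__":
    sample = (
        "5/18/2007 8:57:47 AM|SETI@home|"
        "Deferring scheduler requests for 2 hours, 2 minutes and 53 seconds"
    )
    print(deferral_seconds(sample))
```

Run over the whole log, this would make the growing retry intervals easy to see at a glance.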

____________

Richard Haselgrove
Volunteer tester
Joined: 4 Jul 99
Posts: 8374
Credit: 46,568,823
RAC: 14,320
United Kingdom
Message 570539 - Posted: 18 May 2007, 18:31:35 UTC - in response to Message 570523.



I think I've tracked down this problem. There seems to be something related to the outage that has corrupted the file "sched_request_setiathome.berkeley.edu.xml"
in your BOINC directories. Restarting BOINC or deleting the file and restarting BOINC should fix that problem.

Eric

No, it doesn't. I deleted the file "sched_request_setiathome.berkeley.edu.xml" and restarted BOINC, and now I have a new ghost WU

You have to read the full original post. Eric was tracking down an earlier, simpler problem relating to "Incomplete request received".

The ghost WU seems to relate to "HTTP internal server error" and the use of optimised apps. Try that workaround - it's been posted enough times already.

On the other hand, if you're getting ghost WUs without an app_info.xml file and an optimised app, that would be useful to know - please post again.

Dominik S.
Joined: 4 Jun 03
Posts: 15
Credit: 4,346,294
RAC: 0
Poland
Message 570543 - Posted: 18 May 2007, 18:43:55 UTC - in response to Message 570539.



Sorry, it's my fault.
I'm getting ghost WUs with app_info.xml, of course, but the trick of renaming it works.
Really sorry for misunderstanding the reply.
____________

Profile Y & J
Volunteer tester
Joined: 14 Nov 01
Posts: 15
Credit: 215,639
RAC: 0
United States
Message 570551 - Posted: 18 May 2007, 19:00:11 UTC - in response to Message 570539.
Last modified: 18 May 2007, 19:01:03 UTC

Thanks Richard
Fixed up both units.

No, it doesn't. I deleted the file "sched_request_setiathome.berkeley.edu.xml" and restarted BOINC, and now I have a new ghost WU
You have to read the full original post. Eric was tracking down an earlier, simpler problem relating to "Incomplete request received".


____________
SETI@home classic workunits = 5,906 with CPU time of 60,377 hours

gomeyer
Volunteer tester
Joined: 21 May 99
Posts: 488
Credit: 50,157,953
RAC: 0
United States
Message 570574 - Posted: 18 May 2007, 19:43:38 UTC - in response to Message 570477.
Last modified: 18 May 2007, 19:44:26 UTC

That worked for me also. The question I now have is, once we go back to using app_info.xml will that break communications again?


The answer is yes, restoring app_info.xml and restarting BOINC does indeed break it again. I guess that last step should be skipped unless you're sure you have enough work to last a while, then stop new work requests to prevent ghosts.

Rndmacts
Joined: 18 Aug 99
Posts: 4
Credit: 122,806
RAC: 0
Canada
Message 570578 - Posted: 18 May 2007, 19:53:14 UTC

I have been getting the same problems as everyone else, and I had closed and restarted BOINC several times with no relief. I finally rebooted the computer, and BOINC started, sent all finished work units, and downloaded new units. I didn't try the app_info.xml fix, just rebooted; everything seems fine now.
____________
Can a whisper be heard across the universe?

crazyrabbit1
Joined: 17 Sep 06
Posts: 35
Credit: 2,282,319
RAC: 0
Germany
Message 570581 - Posted: 18 May 2007, 19:56:24 UTC - in response to Message 570468.


SETI@home 17/05/2007 18:18:18 Message from server: Incomplete request received.


I think I've tracked down this problem. There seems to be something related to the outage that has corrupted the file "sched_request_setiathome.berkeley.edu.xml"
in your BOINC directories. Restarting BOINC or deleting the file and restarting BOINC should fix that problem.


SETI@home 17/05/2007 19:01:39 Scheduler request failed: HTTP internal server error
SETI@home 17/05/2007 19:06:49 Scheduler request failed: server returned nothing (no headers, no data)


This usually means the scheduler request timed out because we are still overwhelmed.

Eric


@Eric
On my side it does not seem to work. I deleted the file and restarted BOINC, and I just get the message "server returned nothing (no headers, no data)". Also, I get new ghost units after switching to the optimised app again.

In any case, I would like to thank you and the whole team for the hard work to get things back to normal.

Dominik S.
Joined: 4 Jun 03
Posts: 15
Credit: 4,346,294
RAC: 0
Poland
Message 570593 - Posted: 18 May 2007, 20:11:13 UTC - in response to Message 570581.


The problem with ghost units is a different one. It's probably associated with using the anonymous platform (you are using an optimised app and have app_info.xml).
____________

crazyrabbit1
Joined: 17 Sep 06
Posts: 35
Credit: 2,282,319
RAC: 0
Germany
Message 570608 - Posted: 18 May 2007, 20:36:19 UTC - in response to Message 570593.



I see no difference between the two problems. I get ghosts with the app from Lunatics and I get the message "no headers, no data" from the server. If I use the original app, I get work and no error messages. I think I will wait until things get better.

Eric Korpela
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 3 Apr 99
Posts: 1085
Credit: 8,267,436
RAC: 7,537
United States
Message 570612 - Posted: 18 May 2007, 20:46:56 UTC - in response to Message 570506.


Eric: Any chance this can be fixed on Berkeley's end? I have some remote machines that I cannot get to.


I haven't come up with a way yet. But I'm still thinking....

Eric
____________

Profile Teratoma [SETI.USA]
Joined: 30 Mar 00
Posts: 16
Credit: 2,200,914
RAC: 0
United States
Message 570631 - Posted: 18 May 2007, 21:35:09 UTC

All of these deleting files ideas are great. Restarting Boinc, good advice. I've done it about 6 or 7 times today.

The problem is that if you can't reach the project none of these fixes work.

I get a lot of "Scheduler request failed: server returned nothing (no headers, no data)"

And some "Scheduler request failed: HTTP internal server error"

But neither is consistent. Sometimes I can upload and sometimes I can report. I cannot get new work no matter what I do.

Now I get "Scheduler request failed: failed sending data to the peer"

So, if I can't reach the project 9 out of 10 attempts, and I cannot get work on that 1 attempt, what am I going to do? I suppose that when (not if) I run out of work, I can detach or uninstall BOINC and start over. However, with each detach or uninstall, the probability of me returning to this project keeps reducing.

I know everyone is working hard, but...it shouldn't be this difficult for us to participate. People will leave and some may never return.
____________

..

Profile Crunch3r
Volunteer tester
Joined: 15 Apr 99
Posts: 1540
Credit: 3,314,460
RAC: 0
Germany
Message 570637 - Posted: 18 May 2007, 21:40:41 UTC - in response to Message 570612.
Last modified: 18 May 2007, 21:41:40 UTC


Eric: Any chance this can be fixed on Berkeley's end? I have some remote machines that I cannot get to.


I haven't come up with a way yet. But I'm still thinking....

Eric


While we're talking about remote machines. :)

I've got the same problem too. Three of my machines are not accessible at the moment (neither by VPN nor anything else).

Is it possible to initiate a reset on those machines from the user account page?
(Like a reset sent from the project?)

And if so, could this be implemented?

____________

Join BOINC United now!
Auto eVB | Autoversicherung

zombie67 [MM]
Volunteer tester
Joined: 22 Apr 04
Posts: 753
Credit: 15,891,563
RAC: 195
United States
Message 570639 - Posted: 18 May 2007, 21:43:58 UTC - in response to Message 570608.

I see no difference between the two problems. I get ghosts with the app from Lunatics and I get the message "no headers, no data" from the server. If I use the original app, I get work and no error messages. I think I will wait until things get better.


Issue #1: SETI@home 17/05/2007 18:18:18 Message from server: Incomplete request received.

This error is caused by a corrupt "sched_request_setiathome.berkeley.edu.xml" file. Fixed by quitting/restarting BOINC, or quitting BOINC, deleting the file, restarting BOINC.


Issue #2: Cannot download new work & ghost results created. This can be fixed by renaming app_info.xml to something else. More detailed instructions here:

http://setiathome.berkeley.edu/forum_thread.php?id=39531&nowrap=true#570170


Issue #3: Other misc. error messages when trying to connect to S@H servers. Caused by heavily loaded servers. Ignore, will fix itself over time as everyone catches up.
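
For anyone who wants to script the first two fixes, here is a minimal Python sketch, not an official tool. It assumes BOINC has already been stopped, and that you pass in the directory holding the scheduler request file and the directory holding app_info.xml (both locations vary by install, so they are parameters here). The function names and the `.disabled` suffix are made up for this example.

```python
from pathlib import Path

# File name as quoted in Issue #1 above.
SCHED_REQUEST = "sched_request_setiathome.berkeley.edu.xml"


def remove_sched_request(boinc_dir):
    """Issue #1: delete the possibly corrupt scheduler request file.

    Returns True if a file was removed, False if there was nothing to remove.
    boinc_dir is whatever directory holds the file on your install (assumption).
    """
    target = Path(boinc_dir) / SCHED_REQUEST
    if target.is_file():
        target.unlink()
        return True
    return False


def set_app_info_enabled(app_dir, enabled):
    """Issue #2: rename app_info.xml out of the way, or put it back.

    enabled=False renames app_info.xml to app_info.xml.disabled
    (a hypothetical name; any other name would do).
    enabled=True restores it. Returns True if a rename happened.
    """
    active = Path(app_dir) / "app_info.xml"
    parked = Path(app_dir) / "app_info.xml.disabled"
    src, dst = (parked, active) if enabled else (active, parked)
    # Never overwrite an existing file at the destination.
    if src.is_file() and not dst.exists():
        src.rename(dst)
        return True
    return False
```

Restart BOINC after running either step. Note that, as reported earlier in the thread, restoring app_info.xml seems to bring the ghost problem back for now.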

____________
Dublin, CA

Richard Haselgrove
Volunteer tester
Joined: 4 Jul 99
Posts: 8374
Credit: 46,568,823
RAC: 14,320
United Kingdom
Message 570640 - Posted: 18 May 2007, 21:45:07 UTC - in response to Message 570631.
Last modified: 18 May 2007, 21:46:49 UTC

All of these deleting files ideas are great. Restarting Boinc, good advice. I've done it about 6 or 7 times today.

The problem is that if you can't reach the project none of these fixes work.

I get a lot of "Scheduler request failed: server returned nothing (no headers, no data)"

And some "Scheduler request failed: HTTP internal server error"

But neither is consistent. Sometimes I can upload and sometimes I can report. I cannot get new work no matter what I do.

Now I get "Scheduler request failed: failed sending data to the peer"

So, if I can't reach the project 9 out of 10 attempts, and I cannot get work on that 1 attempt, what am I going to do? I suppose that when (not if) I run out of work, I can detach or uninstall BOINC and start over. However, with each detach or uninstall, the probability of me returning to this project keeps reducing.

I know everyone is working hard, but...it shouldn't be this difficult for us to participate. People will leave and some may never return.

"Scheduler request failed: server returned nothing (no headers, no data)" - congestion
"Scheduler request failed: failed sending data to the peer" - congestion
"Scheduler request failed: HTTP internal server error" - you are running an optimised app, and the scheduler is broken.

[probably - your computers are hidden, which makes helpful troubleshooting next to impossible. But your signature banner tends to imply an optimiser]

Look at Number Crunching, and the 'Ghosts' thread - your solution is there.
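
The mapping above can be written down as a small lookup, sketched here in Python. It only encodes the likely causes suggested in this thread (including the "Incomplete request received" case Eric tracked down), so treat the results as hints and not a diagnosis. The names `LIKELY_CAUSE` and `likely_cause` are invented for this example.

```python
# Message fragment -> likely cause, as suggested in this thread (not an official list).
LIKELY_CAUSE = {
    "server returned nothing (no headers, no data)": "congestion",
    "failed sending data to the peer": "congestion",
    "HTTP internal server error": "optimised app with app_info.xml (see the Ghosts thread)",
    "Incomplete request received": "corrupt sched_request file",
}


def likely_cause(message):
    """Return the likely cause for a BOINC message line, or 'unknown'."""
    for fragment, cause in LIKELY_CAUSE.items():
        if fragment in message:
            return cause
    return "unknown"


if __name__ == "__main__":
    print(likely_cause("Scheduler request failed: HTTP internal server error"))
```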

OzzFan
Volunteer tester
Joined: 9 Apr 02
Posts: 13541
Credit: 29,298,314
RAC: 14,961
United States
Message 570641 - Posted: 18 May 2007, 21:45:33 UTC

Thanks Eric! I've fixed all my systems using optimized apps with your suggestion. I'm back up and crunching on all machines now. No errors and no failed connections.
____________

Profile Blurf
Volunteer tester
Joined: 2 Sep 06
Posts: 7397
Credit: 6,472,298
RAC: 5,268
United States
Message 570689 - Posted: 18 May 2007, 22:57:18 UTC

Eric-

With all your hard work lately (and Matt's before his vacation) and with all due respect, maybe you need to call in some outside help as this simply isn't getting resolved???

Has this issue outgrown your skills, so that the cavalry needs to be called in?

How can we get you more IMMEDIATE assistance (besides $ and hardware)?
____________


Profile Keith Myers
Volunteer tester
Joined: 29 Apr 01
Posts: 169
Credit: 62,517,132
RAC: 19,375
United States
Message 570704 - Posted: 18 May 2007, 23:23:25 UTC - in response to Message 570640.


I have tried all the fixes suggested in the "Ghosts" thread. I have tried the scheduler file deletion to no avail. I am still getting the "Scheduler request failed: server returned nothing (no headers, no data)" message and also the "Scheduler request failed: HTTP internal server error" message. I am running the only app we have for our operating system. It is not optimized.

I have started and stopped BOINC multiple times, requested more work, and updated the client, with no sign of any new work. The online status of my two workstations shows 125 WUs "In Progress", with no actual WU present on either of my workstations.

How do I fix this? What steps need to be taken so I can continue working for SETI?


SETI is the only project that we have a client application for. So doing work for other projects is impossible. I have been out of work now for two weeks.


Thanks in advance, Keith
____________


Copyright © 2014 University of California