Blip (Jun 21 2007)


Message boards : Technical News : Blip (Jun 21 2007)

Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1389
Credit: 74,079
RAC: 0
United States
Message 590095 - Posted: 21 Jun 2007, 23:27:47 UTC

At the end of the day yesterday, a simple cut-and-paste misinterpreted by a terminal window introduced an extra line feed into the /etc/exports file on our Network Appliance filer (which hosts our home accounts, web sites, /usr/local, etc.), and that rendered its root (/) mount read-only. Of course, you need read-write access to update the exports file. This was a bit of a conundrum, with the added pressure of "mount rot" quickly creeping through our network and slowing machines to a crawl (hence the minor outage which very few seemed to notice). This sent me, Jeff, and Eric into a fit of head scratching, with Eric finally discovering that, even though we couldn't re-export "/" on the simple filer command line, we could freshly export "/." with read-write access to a machine that hadn't quite hung up yet, and fix the offending file. After some reboots to clean the pipes we were back to normal.
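
A stray line feed like the one described above is the kind of thing a quick pre-flight check can catch before an edited exports file is installed. The following is a minimal sketch, not the procedure used at the time. It assumes that each entry in an exports-style file occupies exactly one line beginning with an absolute path and that "#" starts a comment; the real filer syntax may allow more than this. The function name find_suspect_lines, the paths, and the host names are all hypothetical.

```python
def find_suspect_lines(text):
    """Return (line_number, reason) pairs for lines that look broken.

    Assumption: every entry is a single line starting with '/', and
    '#' starts a comment. This is an illustrative rule, not a parser
    for any real exports syntax.
    """
    suspects = []
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped:
            # A blank line may be harmless, but it is worth a second look.
            suspects.append((number, "blank line"))
        elif stripped.startswith("#"):
            continue
        elif not stripped.startswith("/"):
            # Typical symptom of an entry split in two by a pasted line feed.
            suspects.append((number, "does not start with a path"))
    return suspects


if __name__ == "__main__":
    # Hypothetical entry whose option list was split by a stray line feed.
    broken = "/vol/vol0 -rw=hostA:hostB,\nroot=hostA\n/vol/home -rw\n"
    for number, reason in find_suspect_lines(broken):
        print(f"line {number}: {reason}")
```

Running a check like this against a copy of the file, before the live file is touched, leaves room to fix the paste while the mount is still writable.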

I think I fixed the weird "top computers" sorting problems. I believe somebody else made an update trying to optimize it during our recent database panic without realizing it broke the sort logic. Fair enough.

Other than that, Jeff and I worked to get the new server "bane" online. Yup, we continue to stick with the Darth naming convention for now. We made it a third public web server for a second there to test the plumbing, but took it back offline for now. We need to tighten some screws before making it a real production web server.

- Matt

____________
-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude

Byron Leigh Hatch @ team Carl Sagan
Volunteer tester
Joined: 5 Jul 99
Posts: 3501
Credit: 11,813,565
RAC: 457
Canada
Message 590152 - Posted: 22 Jun 2007, 1:17:02 UTC
Last modified: 22 Jun 2007, 1:17:28 UTC

Matt, thank you very much for the update!
Kind Regards
Byron

KB7RZF
Volunteer tester
Joined: 15 Aug 99
Posts: 9463
Credit: 3,033,624
RAC: 2,019
United States
Message 590186 - Posted: 22 Jun 2007, 3:23:16 UTC

Thanks for the update Matt. :-)

Jeremy

Dr. C.E.T.I.
Joined: 29 Feb 00
Posts: 15993
Credit: 690,597
RAC: 10
United States
Message 590215 - Posted: 22 Jun 2007, 4:17:23 UTC


One More for Berkeley . . . Thanks to Each of You . . . You shall be rewarded

____________
BOINC Wiki . . .

Science Status Page . . .

Bill Bryan
Joined: 14 May 99
Posts: 21
Credit: 3,195,305
RAC: 1,304
United States
Message 590216 - Posted: 22 Jun 2007, 4:18:20 UTC

While most of the time I have no idea what is being discussed here, I appreciate having the information made available. My hearty thanks to those who keep things up-and-running.

Stealth Eagle*
Volunteer tester
Joined: 7 Sep 00
Posts: 5971
Credit: 156,685
RAC: 0
United States
Message 590219 - Posted: 22 Jun 2007, 4:23:45 UTC

Matt, thank you for the continuing updates; they are most appreciated.
RK
____________
What you do today you will have to live with tonight

Pilot
Joined: 18 May 99
Posts: 534
Credit: 5,475,482
RAC: 0
Message 590364 - Posted: 22 Jun 2007, 14:01:49 UTC - in response to Message 590095.

Friday again.
The sort fix for top computers that you made indeed worked for a while, but it didn't stick.
Have a restful weekend and remember that your goals in life have more patience than you do.
It seems to be broken again
____________
When we finally figure it all out, all the rules will change and we can start all over again.

Sterling_Aug
Joined: 27 Sep 02
Posts: 54
Credit: 14,105,725
RAC: 0
United States
Message 590382 - Posted: 22 Jun 2007, 14:52:18 UTC - in response to Message 590364.


Yes, the blip is back! LOL


Kenn Hutchins
Volunteer tester
Joined: 24 Aug 99
Posts: 43
Credit: 6,875,893
RAC: 3,667
Canada
Message 591002 - Posted: 23 Jun 2007, 8:12:38 UTC - in response to Message 590095.

I have noticed since downloading 5.10.7 that I have quite a few 'aborted' WUs; seventy (70) of them, as a matter of interest. As well, when I do an update, I have noticed that two things happen. When the initial update is done there is an 'HTTP error'; then, seconds later, a second update is done successfully, as per the quoted text, save for those 'aborted by project'.

"Fri 22 Jun 23:01:15 2007|SETI@home|Sending scheduler request: To report completed tasks
Fri 22 Jun 23:01:15 2007|SETI@home|Reporting 15 tasks
Fri 22 Jun 23:01:20 2007|SETI@home|Scheduler request failed: HTTP file not found
Fri 22 Jun 23:01:20 2007|SETI@home|Sending scheduler request: To report completed tasks
Fri 22 Jun 23:01:20 2007|SETI@home|Reporting 15 tasks
Fri 22 Jun 23:01:25 2007|SETI@home|Scheduler RPC succeeded [server version 509]"
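
The pattern in the log above (a failed scheduler request followed by a successful retry a few seconds later) is easy to pick out mechanically when comparing machines. The following is a minimal sketch, assuming every line follows the timestamp|project|message layout shown in the quoted text and that timestamps use English day and month names. The function names parse_line and retry_delays are hypothetical and are not part of BOINC.

```python
from datetime import datetime

FAILED = "Scheduler request failed"
SUCCEEDED = "Scheduler RPC succeeded"


def parse_line(line):
    """Split 'timestamp|project|message' into its three parts."""
    stamp, project, message = line.strip().split("|", 2)
    when = datetime.strptime(stamp, "%a %d %b %H:%M:%S %Y")
    return when, project, message


def retry_delays(lines):
    """Seconds between each failed request and the next success."""
    delays = []
    failed_at = None
    for line in lines:
        when, _project, message = parse_line(line)
        if message.startswith(FAILED):
            # Remember only the first failure of a run of failures.
            if failed_at is None:
                failed_at = when
        elif message.startswith(SUCCEEDED) and failed_at is not None:
            delays.append((when - failed_at).total_seconds())
            failed_at = None
    return delays


if __name__ == "__main__":
    log = [
        "Fri 22 Jun 23:01:15 2007|SETI@home|Sending scheduler request: To report completed tasks",
        "Fri 22 Jun 23:01:15 2007|SETI@home|Reporting 15 tasks",
        "Fri 22 Jun 23:01:20 2007|SETI@home|Scheduler request failed: HTTP file not found",
        "Fri 22 Jun 23:01:20 2007|SETI@home|Sending scheduler request: To report completed tasks",
        "Fri 22 Jun 23:01:20 2007|SETI@home|Reporting 15 tasks",
        "Fri 22 Jun 23:01:25 2007|SETI@home|Scheduler RPC succeeded [server version 509]",
    ]
    print(retry_delays(log))
```

A list of short delays with no unmatched failures would be consistent with the retry always succeeding, as reported here; it says nothing about why the first attempt fails.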


Any ideas?

Kenn

____________
Kenn

What is left unsaid is neither heard, nor heeded.
Ce qui est laissé inexprimé ni n'est entendu, ni est observé.

WinterKnight
Volunteer tester
Joined: 18 May 99
Posts: 8503
Credit: 23,089,038
RAC: 15,742
United Kingdom
Message 591017 - Posted: 23 Jun 2007, 9:04:16 UTC - in response to Message 591002.

Why you get the HTTP error, I do not know. But assuming you are on always-on broadband, if you set the connection interval to 0 and use 'Maintain enough work for an additional x days' as your cache setting, the results will report immediately, saving the need to update.

The aborted results are because validation is already complete on that WU. To decrease the number of these that you get, you would have to decrease your cache. At 0.5 days I've only had one in the last 48 hrs.

Andy

gomeyer
Volunteer tester
Joined: 21 May 99
Posts: 488
Credit: 50,157,953
RAC: 0
United States
Message 591023 - Posted: 23 Jun 2007, 9:23:13 UTC - in response to Message 591017.


I also saw that HTTP error a little while ago on two machines. I thought it was because I was upgrading them to BOINC 5.10.7, but it seems to have stopped now. ???

Kenn Hutchins
Volunteer tester
Joined: 24 Aug 99
Posts: 43
Credit: 6,875,893
RAC: 3,667
Canada
Message 591327 - Posted: 23 Jun 2007, 19:58:39 UTC - in response to Message 591017.


Thanks, I'll amend my preferences


____________
Kenn

What is left unsaid is neither heard, nor heeded.
Ce qui est laissé inexprimé ni n'est entendu, ni est observé.

Pilot
Joined: 18 May 99
Posts: 534
Credit: 5,475,482
RAC: 0
Message 591364 - Posted: 23 Jun 2007, 21:32:08 UTC - in response to Message 591023.


I was getting that occasionally until I upgraded to 5.10.7.
I have not seen it since.
Curious, eh?
____________
When we finally figure it all out, all the rules will change and we can start all over again.

zoom314
Joined: 30 Nov 03
Posts: 45784
Credit: 36,410,508
RAC: 7,432
Message 591472 - Posted: 24 Jun 2007, 0:27:53 UTC - in response to Message 590095.

Bane, Eh?
Someone has been reading comic books at one time. ;) Another Villain, Ok.
http://en.wikipedia.org/wiki/Bane_(comics)

gomeyer
Volunteer tester
Joined: 21 May 99
Posts: 488
Credit: 50,157,953
RAC: 0
United States
Message 591529 - Posted: 24 Jun 2007, 2:41:58 UTC - in response to Message 591364.


Curious indeed. Exactly half of my machines are now getting it on every communication, but it ALWAYS works on the (immediate) retry. All but one are now on 5.10.7. All else is working so I guess now problem, for now.

KWSN THE Holy Hand Grenade!
Volunteer tester
Joined: 20 Dec 05
Posts: 1897
Credit: 9,170,635
RAC: 12,405
United States
Message 591927 - Posted: 24 Jun 2007, 16:13:45 UTC - in response to Message 591529.


I'm getting this same problem with both 5.8.15 and 5.4.11, so I think the problem is server-side, not client-side (i.e., Berkeley's the one with the problem).

gomeyer
Volunteer tester
Joined: 21 May 99
Posts: 488
Credit: 50,157,953
RAC: 0
United States
Message 591962 - Posted: 24 Jun 2007, 17:27:25 UTC - in response to Message 591927.


My thoughts as well.
BTW, I meant to say ". . . no problem, for now" in my previous post.

1mp0£173
Volunteer tester
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 592013 - Posted: 24 Jun 2007, 18:30:43 UTC - in response to Message 591472.

Wrong universe. http://en.wikipedia.org/wiki/Darth_Bane

Copyright © 2014 University of California