Blip (Jun 21 2007)

Message boards : Technical News : Blip (Jun 21 2007)
Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1444
Credit: 957,058
RAC: 0
United States
Message 590095 - Posted: 21 Jun 2007, 23:27:47 UTC

At the end of the day yesterday, a simple cut-and-paste that a terminal window misinterpreted introduced an extra line feed into the /etc/exports file on our Network Appliance filer (which hosts our home accounts, web sites, /usr/local, etc.), which rendered its root (/) mount read-only. Of course, you need read-write access to update the exports file. This was a bit of a conundrum, with the added pressure of "mount rot" quickly creeping through our network and slowing machines to a crawl (hence the minor outage, which very few seemed to notice). This sent me, Jeff, and Eric into a fit of head scratching, until Eric finally discovered that, even though we couldn't re-export "/" on the simple filer command line, we could freshly export "/." with read-write access to a machine that hadn't quite hung up yet and fix the offending file. After some reboots to clean the pipes, we were back to normal.
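
As an illustration of how a single stray line feed can break an exports file, here is a minimal sketch of a pre-install sanity check. It is not anything the SETI@home staff ran. It assumes every real entry starts with a path (a leading "/") and that comments start with "#"; the function name, the sample paths, and the "-rw=" options are hypothetical and only approximate real filer syntax.

```python
def find_suspect_lines(exports_text):
    """Return (line_number, text) pairs that do not look like a complete entry.

    Assumption: a valid entry begins with "/" and a comment begins with "#".
    Anything else (including an empty line) is reported, because a stray
    line feed usually shows up as a blank or orphaned fragment.
    """
    suspects = []
    for number, line in enumerate(exports_text.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith("#"):
            continue  # comments are harmless
        if not stripped.startswith("/"):
            suspects.append((number, line))
    return suspects


# Hypothetical sample data: the second entry of `bad` was split by a line feed.
good = "/vol/vol0 -rw=hostA\n/vol/home -rw=hostA:hostB\n"
bad = "/vol/vol0 -rw=hostA\n/vol/home -rw=hostA:\nhostB\n"

print(find_suspect_lines(good))
print(find_suspect_lines(bad))
```

Running a check like this on a copy of the file before putting it in place would flag the orphaned "hostB" fragment on line 3 of the second sample.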

I think I fixed the weird "top computers" sorting problems. I believe somebody else made an update trying to optimize it during our recent database panic without realizing it broke the sort logic. Fair enough.

Other than that, Jeff and I worked to get the new server "bane" online. Yup, we continue to stick with the Darth naming convention for now. We made it a third public web server for a second there to test the plumbing, but took it back offline for now. We need to tighten some screws before making it a real production web server.

- Matt

-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude
ID: 590095
Byron Leigh Hatch @ team Carl Sagan
Volunteer tester
Joined: 5 Jul 99
Posts: 4548
Credit: 35,667,570
RAC: 4
Canada
Message 590152 - Posted: 22 Jun 2007, 1:17:02 UTC
Last modified: 22 Jun 2007, 1:17:28 UTC

Matt, thank you very much for the update!
Kind Regards
Byron
ID: 590152
KB7RZF
Volunteer tester
Joined: 15 Aug 99
Posts: 9549
Credit: 3,308,926
RAC: 2
United States
Message 590186 - Posted: 22 Jun 2007, 3:23:16 UTC

Thanks for the update Matt. :-)

Jeremy
ID: 590186
Dr. C.E.T.I.
Joined: 29 Feb 00
Posts: 16019
Credit: 794,685
RAC: 0
United States
Message 590215 - Posted: 22 Jun 2007, 4:17:23 UTC


One More for Berkeley . . . Thanks to Each of You . . . You shall be rewarded

BOINC Wiki . . .

Science Status Page . . .
ID: 590215
Bill Bryan
Joined: 14 May 99
Posts: 21
Credit: 9,164,019
RAC: 11
United States
Message 590216 - Posted: 22 Jun 2007, 4:18:20 UTC

While most of the time I have no idea what is being discussed here, I appreciate having the information made available. My hearty thanks to those who keep things up-and-running.
ID: 590216
Stealth Eagle*
Volunteer tester
Joined: 7 Sep 00
Posts: 5971
Credit: 367,640
RAC: 0
United States
Message 590219 - Posted: 22 Jun 2007, 4:23:45 UTC

Matt, thank you for the continuing updates; they are most appreciated.
RK




What you do today you will have to live with tonight
ID: 590219
Pilot
Joined: 18 May 99
Posts: 534
Credit: 5,475,482
RAC: 0
Message 590364 - Posted: 22 Jun 2007, 14:01:49 UTC - in response to Message 590095.  

Friday again.
The sort fix for top computers that you made indeed worked for a while, but it didn't stick.
Have a restful weekend and remember that your goals in life have more patience than you do.
It seems to be broken again
When we finally figure it all out, all the rules will change and we can start all over again.
ID: 590364
Sterling_Aug
Joined: 27 Sep 02
Posts: 54
Credit: 14,105,725
RAC: 0
United States
Message 590382 - Posted: 22 Jun 2007, 14:52:18 UTC - in response to Message 590364.  


The sort fix for top computers that you made indeed worked for a while, but it didn't stick.
Have a restful weekend and remember that your goals in life have more patience than you do.
It seems to be broken again


Yes, the blip is back! LOL

ID: 590382
Kenn Benoît-Hutchins
Volunteer tester
Joined: 24 Aug 99
Posts: 46
Credit: 18,091,320
RAC: 31
Canada
Message 591002 - Posted: 23 Jun 2007, 8:12:38 UTC - in response to Message 590095.  

Since downloading 5.10.7, I have noticed quite a few 'aborted' WUs; seventy (70) of them, as a matter of interest. Also, when I do an update, two things happen: the initial update fails with an 'http error', then seconds later a second update succeeds, as per the quoted text, save for those 'aborted by project'.

"Fri 22 Jun 23:01:15 2007|SETI@home|Sending scheduler request: To report completed tasks
Fri 22 Jun 23:01:15 2007|SETI@home|Reporting 15 tasks
Fri 22 Jun 23:01:20 2007|SETI@home|Scheduler request failed: HTTP file not found
Fri 22 Jun 23:01:20 2007|SETI@home|Sending scheduler request: To report completed tasks
Fri 22 Jun 23:01:20 2007|SETI@home|Reporting 15 tasks
Fri 22 Jun 23:01:25 2007|SETI@home|Scheduler RPC succeeded [server version 509]"
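
The log above shows a pattern of one failed request followed by an immediate retry that succeeds. A minimal sketch of that retry pattern follows. It is not the BOINC client's actual code; `send_with_retry`, `flaky_request`, and the choice of `IOError` for the failure are hypothetical names and assumptions made only for illustration.

```python
import time


def send_with_retry(send_request, attempts=2, delay_seconds=0.0):
    """Call send_request() until it succeeds or the attempts run out.

    Assumption: a failed request raises IOError. The last error is
    re-raised if every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return send_request()
        except IOError as error:
            last_error = error
            if attempt < attempts:
                time.sleep(delay_seconds)  # optional pause before retrying
    raise last_error


# Hypothetical stand-in for a scheduler request: fails once, then succeeds.
calls = []


def flaky_request():
    calls.append(1)
    if len(calls) == 1:
        raise IOError("HTTP file not found")
    return "Scheduler RPC succeeded"


result = send_with_retry(flaky_request)
print(result, "after", len(calls), "attempts")
```

A retry like this hides a transient failure from the user but does not explain it; the first failure still has a cause somewhere.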


Any ideas?

Kenn

What is left unsaid is neither heard, nor heeded.
Ce qui est laissé inexprimé ni n'est entendu, ni est observé.
ID: 591002
W-K 666 Project Donor
Volunteer tester

Joined: 18 May 99
Posts: 19012
Credit: 40,757,560
RAC: 67
United Kingdom
Message 591017 - Posted: 23 Jun 2007, 9:04:16 UTC - in response to Message 591002.  

Since downloading 5.10.7, I have noticed quite a few 'aborted' WUs; seventy (70) of them, as a matter of interest. Also, when I do an update, two things happen: the initial update fails with an 'http error', then seconds later a second update succeeds, as per the quoted text, save for those 'aborted by project'.

"Fri 22 Jun 23:01:15 2007|SETI@home|Sending scheduler request: To report completed tasks
Fri 22 Jun 23:01:15 2007|SETI@home|Reporting 15 tasks
Fri 22 Jun 23:01:20 2007|SETI@home|Scheduler request failed: HTTP file not found
Fri 22 Jun 23:01:20 2007|SETI@home|Sending scheduler request: To report completed tasks
Fri 22 Jun 23:01:20 2007|SETI@home|Reporting 15 tasks
Fri 22 Jun 23:01:25 2007|SETI@home|Scheduler RPC succeeded [server version 509]"


Any ideas?

Kenn

Why you get the HTTP error, I do not know. But assuming you are on an always-on broadband connection, if you set the connection interval to 0 and use 'Maintain enough work for an additional x days' as your cache setting, the results will report immediately, saving the need to update.

The aborted results are because validation is already complete on that WU. To decrease the number of these that you get, you would have to decrease your cache. At 0.5 days I've only had one in the last 48 hrs.

Andy
ID: 591017
gomeyer
Volunteer tester

Joined: 21 May 99
Posts: 488
Credit: 50,370,425
RAC: 0
United States
Message 591023 - Posted: 23 Jun 2007, 9:23:13 UTC - in response to Message 591017.  


Why you get the HTTP error, I do not know. . . .
Andy

I also saw that HTTP error a little while ago on two machines. I thought it was because I was upgrading them to BOINC 5.10.7, but it seems to have stopped now. ???
ID: 591023
Kenn Benoît-Hutchins
Volunteer tester
Joined: 24 Aug 99
Posts: 46
Credit: 18,091,320
RAC: 31
Canada
Message 591327 - Posted: 23 Jun 2007, 19:58:39 UTC - in response to Message 591017.  


Why you get the HTTP error, I do not know. But assuming you are on an always-on broadband connection, if you set the connection interval to 0 and use 'Maintain enough work for an additional x days' as your cache setting, the results will report immediately, saving the need to update.

The aborted results are because validation is already complete on that WU. To decrease the number of these that you get, you would have to decrease your cache. At 0.5 days I've only had one in the last 48 hrs.

Andy


Thanks, I'll amend my preferences


Kenn

What is left unsaid is neither heard, nor heeded.
Ce qui est laissé inexprimé ni n'est entendu, ni est observé.
ID: 591327
Pilot
Joined: 18 May 99
Posts: 534
Credit: 5,475,482
RAC: 0
Message 591364 - Posted: 23 Jun 2007, 21:32:08 UTC - in response to Message 591023.  


Why you get the HTTP error, I do not know. . . .
Andy

I also saw that HTTP error a little while ago on two machines. I thought it was because I was upgrading them to BOINC 5.10.7, but it seems to have stopped now. ???

I was getting that occasionally until I upgraded to 5.10.7.
I have not seen it since.
Curious eh?
When we finally figure it all out, all the rules will change and we can start all over again.
ID: 591364
zoom3+1=4
Volunteer tester
Joined: 30 Nov 03
Posts: 65709
Credit: 55,293,173
RAC: 49
United States
Message 591472 - Posted: 24 Jun 2007, 0:27:53 UTC - in response to Message 590095.  

Bane, Eh?
Someone has been reading comic books at one time. ;) Another Villain, Ok.
http://en.wikipedia.org/wiki/Bane_(comics)
The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's
ID: 591472
gomeyer
Volunteer tester

Joined: 21 May 99
Posts: 488
Credit: 50,370,425
RAC: 0
United States
Message 591529 - Posted: 24 Jun 2007, 2:41:58 UTC - in response to Message 591364.  


Why you get the HTTP error, I do not know. . . .
Andy

I also saw that HTTP error a little while ago on two machines. I thought it was because I was upgrading them to BOINC 5.10.7, but it seems to have stopped now. ???

I was getting that occasionally until I upgraded to 5.10.7.
I have not seen it since.
Curious eh?

Curious indeed. Exactly half of my machines are now getting it on every communication, but it ALWAYS works on the (immediate) retry. All but one are now on 5.10.7. All else is working so I guess now problem, for now.
ID: 591529
KWSN THE Holy Hand Grenade!
Volunteer tester
Joined: 20 Dec 05
Posts: 3187
Credit: 57,163,290
RAC: 0
United States
Message 591927 - Posted: 24 Jun 2007, 16:13:45 UTC - in response to Message 591529.  


Why you get the HTTP error, I do not know. . . .
Andy

I also saw that HTTP error a little while ago on two machines. I thought it was because I was upgrading them to BOINC 5.10.7, but it seems to have stopped now. ???

I was getting that occasionally until I upgraded to 5.10.7.
I have not seen it since.
Curious eh?

Curious indeed. Exactly half of my machines are now getting it on every communication, but it ALWAYS works on the (immediate) retry. All but one are now on 5.10.7. All else is working so I guess now problem, for now.


I'm getting this same problem with both 5.8.15 and 5.4.11, so I think the problem is server-side, not client-side (i.e., Berkeley's the one with the problem).

Hello, from Albany, CA!...
ID: 591927
gomeyer
Volunteer tester

Joined: 21 May 99
Posts: 488
Credit: 50,370,425
RAC: 0
United States
Message 591962 - Posted: 24 Jun 2007, 17:27:25 UTC - in response to Message 591927.  


Why you get the HTTP error, I do not know. . . .
Andy

I also saw that HTTP error a little while ago on two machines. I thought it was because I was upgrading them to BOINC 5.10.7, but it seems to have stopped now. ???

I was getting that occasionally until I upgraded to 5.10.7.
I have not seen it since.
Curious eh?

Curious indeed. Exactly half of my machines are now getting it on every communication, but it ALWAYS works on the (immediate) retry. All but one are now on 5.10.7. All else is working so I guess now problem, for now.


I'm getting this same problem with both 5.8.15 and 5.4.11, so I think the problem is server-side, not client-side (i.e., Berkeley's the one with the problem).

My thoughts as well.
BTW, I meant to say ". . . no problem, for now" in my previous post.
ID: 591962
1mp0£173
Volunteer tester

Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 592013 - Posted: 24 Jun 2007, 18:30:43 UTC - in response to Message 591472.  

Bane, Eh?
Someone has been reading comic books at one time. ;) Another Villain, Ok.
http://en.wikipedia.org/wiki/Bane_(comics)

Wrong universe. http://en.wikipedia.org/wiki/Darth_Bane
ID: 592013

©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.