Marvin crashed


Eric Korpela
Project Donor
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 3 Apr 99
Posts: 1231
Credit: 19,566,080
RAC: 11,681
United States
Message 1616871 - Posted: 21 Dec 2014, 4:46:29 UTC

It appears that the root partition filled on marvin (rapidly) while I was AFK, for no reason that I am aware of, which caused it to crash. Nobody is at the colocation facility right now, so the Astropulse DB is down. I'll try to get remote access for a reboot, but chances are that marvin is down until Monday morning.
@SETIEric

ID: 1616871
Zalster
Project Donor
Volunteer tester
Joined: 27 May 99
Posts: 3813
Credit: 180,833,402
RAC: 596,809
United States
Message 1616874 - Posted: 21 Dec 2014, 5:05:40 UTC - in response to Message 1616871.  

Sorry to hear that.

Thanks for the update.
ID: 1616874
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 8560
Credit: 108,207,729
RAC: 74,576
Australia
Message 1616877 - Posted: 21 Dec 2014, 5:10:19 UTC - in response to Message 1616874.  
Last modified: 21 Dec 2014, 5:16:33 UTC

Many AP channels being split are also erroring out now.


EDIT- to add to that, MB splitter output has almost halved, so the ready-to-send buffer is falling & we should be out of work in the next 18-24 hours.
Grant
Darwin NT
ID: 1616877
Dimly Lit Lightbulb 😀
Volunteer tester
Joined: 30 Aug 08
Posts: 14751
Credit: 4,164,992
RAC: 14,442
United Kingdom
Message 1616882 - Posted: 21 Dec 2014, 5:34:09 UTC

Blimey, you folks are certainly having a run of bad luck :( Thanks for the update, Eric; fingers crossed it's an easy thing to figure out and solve.
ID: 1616882
Eric Korpela
Project Donor
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 3 Apr 99
Posts: 1231
Credit: 19,566,080
RAC: 11,681
United States
Message 1616889 - Posted: 21 Dec 2014, 5:52:45 UTC - in response to Message 1616882.  

It's probably easy to fix. The likely problem is getting support from the colocation facility on the weekend outside of working hours. We don't pay the additional fees for 24/7 support (primarily because they aren't small).
@SETIEric

ID: 1616889
Dimly Lit Lightbulb 😀
Volunteer tester
Joined: 30 Aug 08
Posts: 14751
Credit: 4,164,992
RAC: 14,442
United Kingdom
Message 1616892 - Posted: 21 Dec 2014, 6:16:09 UTC - in response to Message 1616889.  

I take it it's not OK to leave it as it is until Monday in its crashed state then? If none of the remote stuff works?
ID: 1616892
Brent Norman
Volunteer tester
Joined: 1 Dec 99
Posts: 1666
Credit: 65,760,029
RAC: 369,089
Canada
Message 1616894 - Posted: 21 Dec 2014, 6:30:26 UTC - in response to Message 1616877.  

Many AP channels being split are also erroring out now.


EDIT- to add to that, MB splitter output has almost halved, so the ready-to-send buffer is falling & we should be out of work in the next 18-24 hours.



That should be OK; most people have a 7-day buffer.
ID: 1616894
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 8560
Credit: 108,207,729
RAC: 74,576
Australia
Message 1616895 - Posted: 21 Dec 2014, 6:41:56 UTC - in response to Message 1616894.  

That should be OK; most people have a 7-day buffer.

Depends on what you mean by most people.
Mine will last 8-12 hours, except for my slower machine which will have about 4 days work.
Others will run out in a couple of hours, or less.
Grant
Darwin NT
ID: 1616895
Gary Charpentier
Crowdfunding Project Donor
Volunteer tester
Joined: 25 Dec 00
Posts: 20452
Credit: 28,349,306
RAC: 30,252
United States
Message 1616896 - Posted: 21 Dec 2014, 6:59:13 UTC - in response to Message 1616871.  

It appears that the root partition filled on marvin (rapidly) while I was AFK, for no reason that I am aware of, which caused it to crash. Nobody is at the colocation facility right now, so the Astropulse DB is down. I'll try to get remote access for a reboot, but chances are that marvin is down until Monday morning.

Just be sure it didn't fill because it was filling log files with error messages to overflowing.
ID: 1616896
Tutankhamon
Volunteer tester
Joined: 1 Nov 08
Posts: 6573
Credit: 40,718,807
RAC: 20,492
Sweden
Message 1616911 - Posted: 21 Dec 2014, 9:24:51 UTC - in response to Message 1616894.  
Last modified: 21 Dec 2014, 9:40:11 UTC

Many AP channels being split are also erroring out now.


EDIT- to add to that, MB splitter output has almost halved, so the ready-to-send buffer is falling & we should be out of work in the next 18-24 hours.



That should be OK; most people have a 7-day buffer.

Heh, even with a full cache of 100 my GPU will run dry in about 1 day. I only managed to get half of that before Marvin crashed again.

A 7 day cache is impossible to build up for me, with the max 100 WU limit...
My GPU will once again go silent, in just a couple of hours.

Note: I'm not whining, just stating the facts. It's not the end of the world, or my world, if I run out of WUs :-)
ID: 1616911
Ageless
Joined: 9 Jun 99
Posts: 14130
Credit: 3,415,967
RAC: 1,689
Netherlands
Message 1616916 - Posted: 21 Dec 2014, 9:37:13 UTC - in response to Message 1616871.  

while I was AFK

Wait, why would you even be AK (at keyboard) on a Sunday?
Jord

Ancient Astronaut Theorists suggest that in many ways, you can be considered an alien conspiracy!
ID: 1616916
Tutankhamon
Volunteer tester
Joined: 1 Nov 08
Posts: 6573
Credit: 40,718,807
RAC: 20,492
Sweden
Message 1616917 - Posted: 21 Dec 2014, 9:42:12 UTC - in response to Message 1616916.  
Last modified: 21 Dec 2014, 9:57:58 UTC

while I was AFK

Wait, why would you even be AK (at keyboard) on a Sunday?

Maybe he has itchy fingers and needs to scratch something :-)

Apart from that, I see no reason why Eric shouldn't just enjoy his weekends, and forget about the project.
ID: 1616917
Chris S
Crowdfunding Project Donor
Volunteer tester
Joined: 19 Nov 00
Posts: 39607
Credit: 30,310,928
RAC: 51,203
United Kingdom
Message 1616919 - Posted: 21 Dec 2014, 10:13:32 UTC

Many thanks for the heads up and explanation Eric.

I see no reason why Eric shouldn't just enjoy his weekends, and forget about the project.

I agree, but Eric is far too dedicated for that. In the past he has made round trips to the Lab from home on a Sunday, before they had remote boot power strips and the Co-Lo.

We don't pay the additional fees for 24/7 support (primarily because they aren't small).

Exactly, the money just isn't there for it; what little they have needs to be spent elsewhere. Right from the beginning in 1999, Seti never promised it would be a 24/7 project, but over the years it has virtually become that anyway, and people have got so used to it always being up, except for the Tuesday Outrage, that they just expect it now as a matter of course.

Blimey you folks are certainly having a run of bad luck :(

I know what you mean Zappy, but it isn't really bad luck; computers and software fall over for a reason. In this case Eric knows the cause, but not yet the reason why. But this is another database problem manifesting itself, after all the others. I asked Matt a few years ago if there was a maximum limit for an Informix database, and he said he couldn't foresee Seti getting to that stage. But we have had certain initial setup parameters exceeded already.

Hopefully Eric will find the cause and fix it so that it doesn't happen again. Let's all hope that 2015 has a bit more up-time than recently.
ID: 1616919
tullio
Volunteer moderator
Volunteer tester
Joined: 9 Apr 04
Posts: 6156
Credit: 1,457,473
RAC: 781
Italy
Message 1616922 - Posted: 21 Dec 2014, 10:44:55 UTC
Last modified: 21 Dec 2014, 10:45:13 UTC

I am running 7 BOINC projects, plus one non-BOINC project (CernVM_WebAPI). Most of them go on vacation for Xmas because developers and admins enjoy their vacations, but some of them (like Einstein@home) still give me work. So Merry Christmas to everybody.
Tullio
ID: 1616922
bluestar
Joined: 5 Sep 12
Posts: 2072
Credit: 1,908,947
RAC: 690
Message 1616932 - Posted: 21 Dec 2014, 12:00:15 UTC
Last modified: 21 Dec 2014, 12:03:55 UTC

It is nice to see that someone is bothering about this project at all.

Supposedly I too often end up here having other thoughts on my mind.

I paid a visit to Lunatics yesterday evening. Only had a quick look at their page.

Really I am under the impression that application development is a continuous process which never seems to end.

One may perhaps be asking whether or not such applications (including the special or proprietary ones), are able to detect a signal if it should be present.

For now we only are able to make assumptions on whether or not a signal ever was there by means of looking at the four result categories, as well as possible processing times of a given task, as well as autocorrelation, of course.

Back in 1977, Jerry R. Ehman probably was able to detect the Wow signal because the area was already known to be rich in radio sources. Whether or not the source behind this signal was stationary probably never will be known for sure, since it was detected in only one of the two horns in use at that time on the radio telescope belonging to the Ohio State University.

Sadly this facility is no more.

The whole thing was eventually torn down and was replaced by other facilities instead.

One part of history gone.

Instead being replaced by something else.
ID: 1616932
Bernie Vine
Volunteer moderator
Volunteer tester
Joined: 26 May 99
Posts: 8951
Credit: 48,798,604
RAC: 35,135
United Kingdom
Message 1616968 - Posted: 21 Dec 2014, 15:00:57 UTC

It is nice to see that someone is bothering about this project at all.


Over the last few weeks we have been kept up to date with all the problems. We also know that Matt and Jeff have been involved.

So please be a little more respectful.
"Proud to be born and bred in Croydon"
ID: 1616968
Chris S
Crowdfunding Project Donor
Volunteer tester
Joined: 19 Nov 00
Posts: 39607
Credit: 30,310,928
RAC: 51,203
United Kingdom
Message 1616975 - Posted: 21 Dec 2014, 16:29:04 UTC

Well said, Bernie; if you hadn't, I would have.

+100
Those are my principles, and if you don't like them ... well, I have others.
Groucho Marx 1895-1977

I also have mine, and if you don't like them ... tough, live with it.
Chris S 2017

I hate iPhones! Member of UCB Charter Hill Society
ID: 1616975
Eric Korpela
Project Donor
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 3 Apr 99
Posts: 1231
Credit: 19,566,080
RAC: 11,681
United States
Message 1616981 - Posted: 21 Dec 2014, 16:45:58 UTC - in response to Message 1616892.  

I take it it's not OK to leave it as it is until Monday in its crashed state then? If none of the remote stuff works?


It's mainly generation of astropulse work that will suffer. No permanent damage should result. But I'd like to have everything in working order before folks do leave for the holidays.
@SETIEric

ID: 1616981
Tutankhamon
Volunteer tester
Joined: 1 Nov 08
Posts: 6573
Credit: 40,718,807
RAC: 20,492
Sweden
Message 1616982 - Posted: 21 Dec 2014, 16:50:19 UTC - in response to Message 1616981.  
Last modified: 21 Dec 2014, 16:50:56 UTC

I take it it's not OK to leave it as it is until Monday in its crashed state then? If none of the remote stuff works?


It's mainly generation of astropulse work that will suffer. No permanent damage should result. But I'd like to have everything in working order before folks do leave for the holidays.

Don't spend an enormous amount of time on the AP stuff this close to the holidays. If MB works as it should, that would IMO be good enough for everyone.

Put AP into the freezer until after the holidays, if it doesn't behave without a lot of work put into the problem.
ID: 1616982
Richard Haselgrove
Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 11372
Credit: 98,854,673
RAC: 100,417
United Kingdom
Message 1616986 - Posted: 21 Dec 2014, 16:54:23 UTC - in response to Message 1616981.  

I take it it's not OK to leave it as it is until Monday in its crashed state then? If none of the remote stuff works?

It's mainly generation of astropulse work that will suffer. No permanent damage should result. But I'd like to have everything in working order before folks do leave for the holidays.

Could you ask them to take a look at Lando as well tomorrow, please? Lando's four MB splitters don't seem to have been pulling their weight since Marvin went down.
ID: 1616986


 
©2017 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.