Marvin crashed

Eric Korpela Project Donor
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 3 Apr 99
Posts: 1346
Credit: 46,147,172
RAC: 56,952
United States
Message 1616871 - Posted: 21 Dec 2014, 4:46:29 UTC

It appears that the root partition on marvin filled (rapidly) while I was AFK, for no reason that I am aware of, which caused it to crash. Nobody is at the colocation facility right now, so the Astropulse DB is down. I'll try to get remote access for a reboot, but chances are that marvin is down until Monday morning.
@SETIEric

ID: 1616871 · Report as offensive
Profile Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5180
Credit: 430,832,571
RAC: 275,218
United States
Message 1616874 - Posted: 21 Dec 2014, 5:05:40 UTC - in response to Message 1616871.  

Sorry to hear that.

Thanks for the update
ID: 1616874 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 11574
Credit: 170,967,548
RAC: 104,691
Australia
Message 1616877 - Posted: 21 Dec 2014, 5:10:19 UTC - in response to Message 1616874.  
Last modified: 21 Dec 2014, 5:16:33 UTC

Many AP channels being split are also erroring out now.


EDIT- to add to that, MB splitter output has almost halved, so the ready-to-send buffer is falling & we should be out of work in the next 18-24 hours.
Grant
Darwin NT
ID: 1616877 · Report as offensive
Profile Dimly Lit Lightbulb 😀
Volunteer tester
Joined: 30 Aug 08
Posts: 15356
Credit: 7,357,095
RAC: 1,056
United Kingdom
Message 1616882 - Posted: 21 Dec 2014, 5:34:09 UTC

Blimey, you folks are certainly having a run of bad luck :( Thanks for the update, Eric. Fingers crossed it's an easy thing to figure out and solve.

Member of the People Encouraging Niceness In Society club.

A vote for Godzilla, is a vote for cool explosions!
ID: 1616882 · Report as offensive
Eric Korpela Project Donor
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 3 Apr 99
Posts: 1346
Credit: 46,147,172
RAC: 56,952
United States
Message 1616889 - Posted: 21 Dec 2014, 5:52:45 UTC - in response to Message 1616882.  

It's probably easy to fix. The likely problem is getting support from the colocation facility on the weekend outside of working hours. We don't pay the additional fees for 24/7 support (primarily because they aren't small).
@SETIEric

ID: 1616889 · Report as offensive
Profile Dimly Lit Lightbulb 😀
Volunteer tester
Joined: 30 Aug 08
Posts: 15356
Credit: 7,357,095
RAC: 1,056
United Kingdom
Message 1616892 - Posted: 21 Dec 2014, 6:16:09 UTC - in response to Message 1616889.  

I take it it's not OK to leave it as it is until Monday in its crashed state then? If none of the remote stuff works?

Member of the People Encouraging Niceness In Society club.

A vote for Godzilla, is a vote for cool explosions!
ID: 1616892 · Report as offensive
Profile Brent Norman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 1 Dec 99
Posts: 2766
Credit: 549,857,348
RAC: 839,853
Canada
Message 1616894 - Posted: 21 Dec 2014, 6:30:26 UTC - in response to Message 1616877.  

Many AP channels being split are also erroring out now.


EDIT- to add to that, MB splitter output has almost halved, so the ready-to-send buffer is falling & we should be out of work in the next 18-24 hours.



That should be OK, most people have a 7 day buffer.
ID: 1616894 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 11574
Credit: 170,967,548
RAC: 104,691
Australia
Message 1616895 - Posted: 21 Dec 2014, 6:41:56 UTC - in response to Message 1616894.  

That should be OK, most people have a 7 day buffer.

Depends on what you mean by most people.
Mine will last 8-12 hours, except for my slower machine which will have about 4 days work.
Others will run out in a couple of hours, or less.
Grant
Darwin NT
ID: 1616895 · Report as offensive
Profile Gary Charpentier Crowdfunding Project Donor*Special Project $250 donor
Volunteer tester
Joined: 25 Dec 00
Posts: 25518
Credit: 49,428,664
RAC: 20,234
United States
Message 1616896 - Posted: 21 Dec 2014, 6:59:13 UTC - in response to Message 1616871.  

It appears that the root partition on marvin filled (rapidly) while I was AFK, for no reason that I am aware of, which caused it to crash. Nobody is at the colocation facility right now, so the Astropulse DB is down. I'll try to get remote access for a reboot, but chances are that marvin is down until Monday morning.

Just be sure it didn't fill because it was filling log files with error messages to overflowing.
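
A minimal sketch of how one might check for that once the machine is back up, assuming a Python interpreter is available on the host. The function names `usage_percent` and `largest_files` are made up for illustration, and nothing here is taken from the project's actual tooling; it only reports how full a filesystem is and which files under a directory are the largest.

```python
import os
import shutil


def usage_percent(path="/"):
    """Return the percentage of the filesystem containing `path` that is in use."""
    total, used, _free = shutil.disk_usage(path)
    return 100.0 * used / total


def largest_files(root, limit=10):
    """Return up to `limit` (size_in_bytes, path) pairs under `root`, largest first."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                # Skip symlinks so a link to a big file is not counted twice.
                if os.path.islink(full):
                    continue
                sizes.append((os.path.getsize(full), full))
            except OSError:
                # The file vanished or is unreadable; ignore it and move on.
                continue
    sizes.sort(reverse=True)
    return sizes[:limit]
```

Calling `usage_percent("/")` and then `largest_files("/var/log")` would be one way to see whether runaway logs are the culprit; `/var/log` is only a guess at where the logs live on that machine.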
ID: 1616896 · Report as offensive
Lin**? You're kidding me!!
Volunteer tester
Joined: 1 Nov 08
Posts: 7639
Credit: 47,680,418
RAC: 2,466
Sweden
Message 1616911 - Posted: 21 Dec 2014, 9:24:51 UTC - in response to Message 1616894.  
Last modified: 21 Dec 2014, 9:40:11 UTC

Many AP channels being split are also erroring out now.


EDIT- to add to that, MB splitter output has almost halved, so the ready-to-send buffer is falling & we should be out of work in the next 18-24 hours.



That should be OK, most people have a 7 day buffer.

Heh, even with a full cache of 100 my GPU will run dry in about 1 day. I only managed to get half of that before Marvin crashed again.

A 7 day cache is impossible to build up for me, with the max 100 WU limit...
My GPU will once again go silent, in just a couple of hours.

Note: I'm not whining, just stating the facts. It's not the end of the world, or my world, if I run out of WUs :-)
ID: 1616911 · Report as offensive
Profile Ageless
Volunteer tester
Joined: 9 Jun 99
Posts: 14869
Credit: 4,088,416
RAC: 860
Netherlands
Message 1616916 - Posted: 21 Dec 2014, 9:37:13 UTC - in response to Message 1616871.  

while I was AFK

Wait, why would you even be AK (at keyboard) on a Sunday?
Jord

According to Giorgo of the Ancient Astronaut Theorists I do not help with tech questions via private message. He's right: please use the forums for that.
ID: 1616916 · Report as offensive
Lin**? You're kidding me!!
Volunteer tester
Joined: 1 Nov 08
Posts: 7639
Credit: 47,680,418
RAC: 2,466
Sweden
Message 1616917 - Posted: 21 Dec 2014, 9:42:12 UTC - in response to Message 1616916.  
Last modified: 21 Dec 2014, 9:57:58 UTC

while I was AFK

Wait, why would you even be AK (at keyboard) on a Sunday?

Maybe he has itchy fingers and needs to scratch something :-)

Apart from that, I see no reason why Eric shouldn't just enjoy his weekends, and forget about the project.
ID: 1616917 · Report as offensive
Profile Gone with the wind Crowdfunding Project Donor*Special Project $75 donor
Volunteer tester

Joined: 19 Nov 00
Posts: 41576
Credit: 41,973,031
RAC: 34
Message 1616919 - Posted: 21 Dec 2014, 10:13:32 UTC

Many thanks for the heads up and explanation Eric.

I see no reason why Eric shouldn't just enjoy his weekends, and forget about the project.

I agree, but Eric is far too dedicated for that. In the past he has made round trips to the Lab from home on a Sunday, before they had remote boot power strips and the Co-Lo.

We don't pay the additional fees for 24/7 support (primarily because they aren't small).

Exactly, the money just isn't there for it, what little they have needs to be spent elsewhere. Right from the beginning in 1999, Seti never promised it would be a 24/7 project, but over the years it has virtually become that anyway, and people have got so used to it always being up, except for the Tuesday Outrage, that they just expect it now as a matter of course.

Blimey you folks are certainly having a run of bad luck :(

I know what you mean Zappy, but it isn't really bad luck, computers and software fall over for a reason. In this case Eric knows the cause, but not yet the reason why. But this is another database problem to manifest itself, after all the others. I asked Matt a few years ago if there was a maximum limit for an Informix database, he said he couldn't foresee Seti getting to that stage. But we have had certain initial setup parameters exceeded already.

Hopefully Eric will find the reason and fix it so that it doesn't happen again. Let's all hope that 2015 has a bit more up-time than recently.
ID: 1616919 · Report as offensive
Profile tullio
Volunteer tester

Joined: 9 Apr 04
Posts: 7674
Credit: 2,724,049
RAC: 1,979
Italy
Message 1616922 - Posted: 21 Dec 2014, 10:44:55 UTC
Last modified: 21 Dec 2014, 10:45:13 UTC

I am running 7 BOINC projects, plus one non-BOINC project (CernVM_WebAPI). Most of them go on vacation for Xmas because developers and admins enjoy their vacations, but some of them (like Einstein@home) still give me work. So Merry Christmas to everybody.
Tullio
ID: 1616922 · Report as offensive
Profile Bernie Vine
Volunteer moderator
Volunteer tester
Joined: 26 May 99
Posts: 9846
Credit: 83,500,335
RAC: 95,898
United Kingdom
Message 1616968 - Posted: 21 Dec 2014, 15:00:57 UTC

It is nice to see that someone is bothering about this project at all.


Over the last few weeks we have been kept up to date with all the problems. We also know that Matt and Jeff have been involved.

So please be a little more respectful.
ID: 1616968 · Report as offensive
Profile Gone with the wind Crowdfunding Project Donor*Special Project $75 donor
Volunteer tester

Joined: 19 Nov 00
Posts: 41576
Credit: 41,973,031
RAC: 34
Message 1616975 - Posted: 21 Dec 2014, 16:29:04 UTC

Well said, Bernie. If you hadn't, I would have.

+100
"none so blind as those who will not see"
John Heywood 1546

Don't drink water, that stuff rusts pipes!



You are making Proof out of Logic, by just being dubious!

(Bluestar)
ID: 1616975 · Report as offensive
Eric Korpela Project Donor
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 3 Apr 99
Posts: 1346
Credit: 46,147,172
RAC: 56,952
United States
Message 1616981 - Posted: 21 Dec 2014, 16:45:58 UTC - in response to Message 1616892.  

I take it it's not OK to leave it as it is until Monday in its crashed state then? If none of the remote stuff works?


It's mainly generation of astropulse work that will suffer. No permanent damage should result. But I'd like to have everything in working order before folks do leave for the holidays.
@SETIEric

ID: 1616981 · Report as offensive
Lin**? You're kidding me!!
Volunteer tester
Joined: 1 Nov 08
Posts: 7639
Credit: 47,680,418
RAC: 2,466
Sweden
Message 1616982 - Posted: 21 Dec 2014, 16:50:19 UTC - in response to Message 1616981.  
Last modified: 21 Dec 2014, 16:50:56 UTC

I take it it's not OK to leave it as it is until Monday in its crashed state then? If none of the remote stuff works?


It's mainly generation of astropulse work that will suffer. No permanent damage should result. But I'd like to have everything in working order before folks do leave for the holidays.

Don't spend an enormous amount of time on the AP stuff this close to the holidays. If MB works as it should, that would IMO be good enough for everyone.

Put AP into the freezer until after the holidays, if it doesn't behave without a lot of work put into the problem.
ID: 1616982 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 13133
Credit: 149,095,641
RAC: 174,898
United Kingdom
Message 1616986 - Posted: 21 Dec 2014, 16:54:23 UTC - in response to Message 1616981.  

I take it it's not OK to leave it as it is until Monday in its crashed state then? If none of the remote stuff works?

It's mainly generation of astropulse work that will suffer. No permanent damage should result. But I'd like to have everything in working order before folks do leave for the holidays.

Could you ask them to take a look at Lando as well tomorrow, please? Lando's four MB splitters don't seem to have been pulling their weight since Marvin went down.
ID: 1616986 · Report as offensive
©2019 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.