Marvin crashed

Eric Korpela Project Donor
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 3 Apr 99
Posts: 1378
Credit: 54,506,847
RAC: 60
United States
Message 1616871 - Posted: 21 Dec 2014, 4:46:29 UTC

It appears that the root partition on marvin filled up rapidly while I was AFK, for no reason that I am aware of, and that caused it to crash. Nobody is at the colocation facility right now, so the Astropulse DB is down. I'll try to get remote access for a reboot, but chances are that marvin will be down until Monday morning.
@SETIEric

ID: 1616871 · Report as offensive
Profile Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5512
Credit: 528,817,460
RAC: 242
United States
Message 1616874 - Posted: 21 Dec 2014, 5:05:40 UTC - in response to Message 1616871.  

Sorry to hear that.

Thanks for the update.
ID: 1616874 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13215
Credit: 208,696,464
RAC: 304
Australia
Message 1616877 - Posted: 21 Dec 2014, 5:10:19 UTC - in response to Message 1616874.  
Last modified: 21 Dec 2014, 5:16:33 UTC

Many AP channels being split are also erroring out now.


EDIT- to add to that, MB splitter output has almost halved, so the ready-to-send buffer is falling & we should be out of work in the next 18-24 hours.
Grant
Darwin NT
ID: 1616877 · Report as offensive
Profile Dimly Lit Lightbulb 😀
Volunteer tester
Joined: 30 Aug 08
Posts: 15399
Credit: 7,423,413
RAC: 1
United Kingdom
Message 1616882 - Posted: 21 Dec 2014, 5:34:09 UTC

Blimey, you folks are certainly having a run of bad luck :( Thanks for the update, Eric. Fingers crossed it's an easy thing to figure out and solve.

Member of the People Encouraging Niceness In Society club.

ID: 1616882 · Report as offensive
Eric Korpela Project Donor
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 3 Apr 99
Posts: 1378
Credit: 54,506,847
RAC: 60
United States
Message 1616889 - Posted: 21 Dec 2014, 5:52:45 UTC - in response to Message 1616882.  

It's probably easy to fix. The likely problem is getting support from the colocation facility on the weekend outside of working hours. We don't pay the additional fees for 24/7 support (primarily because they aren't small).
@SETIEric

ID: 1616889 · Report as offensive
Profile Dimly Lit Lightbulb 😀
Volunteer tester
Joined: 30 Aug 08
Posts: 15399
Credit: 7,423,413
RAC: 1
United Kingdom
Message 1616892 - Posted: 21 Dec 2014, 6:16:09 UTC - in response to Message 1616889.  

I take it it's not OK to leave it as it is until Monday in its crashed state then? If none of the remote stuff works?

Member of the People Encouraging Niceness In Society club.

ID: 1616892 · Report as offensive
Profile Brent Norman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1616894 - Posted: 21 Dec 2014, 6:30:26 UTC - in response to Message 1616877.  

Many AP channels being split are also erroring out now.


EDIT- to add to that, MB splitter output has almost halved, so the ready-to-send buffer is falling & we should be out of work in the next 18-24 hours.



That should be OK, most people have a 7 day buffer.
ID: 1616894 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13215
Credit: 208,696,464
RAC: 304
Australia
Message 1616895 - Posted: 21 Dec 2014, 6:41:56 UTC - in response to Message 1616894.  

That should be OK, most people have a 7 day buffer.

Depends on what you mean by most people.
Mine will last 8-12 hours, except for my slower machine which will have about 4 days work.
Others will run out in a couple of hours, or less.
Grant
Darwin NT
ID: 1616895 · Report as offensive
Profile Gary Charpentier Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 25 Dec 00
Posts: 28402
Credit: 53,134,872
RAC: 32
United States
Message 1616896 - Posted: 21 Dec 2014, 6:59:13 UTC - in response to Message 1616871.  

It appears that the root partition on marvin filled up rapidly while I was AFK, for no reason that I am aware of, and that caused it to crash. Nobody is at the colocation facility right now, so the Astropulse DB is down. I'll try to get remote access for a reboot, but chances are that marvin will be down until Monday morning.

Just be sure it didn't fill because it was filling log files with error messages to overflowing.
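
For anyone who wants to run that kind of check on their own machine, here is a minimal sketch in Python. It is an illustration only, not anything taken from marvin: the `/var/log` path and the two helper names are assumptions, and nothing in this thread says what marvin runs or where it keeps its logs.

```python
import os
import shutil


def disk_usage_percent(path="/"):
    """Return the percentage of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def largest_files(root, top_n=10):
    """Walk `root` and return the `top_n` largest regular files as (size, path)."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                if os.path.islink(full):
                    continue  # skip symlinks so targets are not counted twice
                sizes.append((os.path.getsize(full), full))
            except OSError:
                continue  # file vanished or is unreadable; ignore it
    sizes.sort(reverse=True)
    return sizes[:top_n]


if __name__ == "__main__":
    # "/var/log" is an assumed log location, used here only as an example.
    print(f"/ is {disk_usage_percent('/'):.1f}% full")
    for size, path in largest_files("/var/log"):
        print(f"{size / 1e6:10.1f} MB  {path}")
```

If one or two log files dominate the list, runaway error logging is a likely suspect; if not, the space went somewhere else.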
ID: 1616896 · Report as offensive
Grumpy Swede (Democratic Socialist)
Volunteer tester
Joined: 1 Nov 08
Posts: 8634
Credit: 49,849,242
RAC: 65
Sweden
Message 1616911 - Posted: 21 Dec 2014, 9:24:51 UTC - in response to Message 1616894.  
Last modified: 21 Dec 2014, 9:40:11 UTC

Many AP channels being split are also erroring out now.


EDIT- to add to that, MB splitter output has almost halved, so the ready-to-send buffer is falling & we should be out of work in the next 18-24 hours.



That should be OK, most people have a 7 day buffer.

Heh, even with a full cache of 100 my GPU will run dry in about 1 day. I only managed to get half of that before Marvin crashed again.

A 7 day cache is impossible to build up for me, with the max 100 WU limit...
My GPU will once again go silent, in just a couple of hours.

Note: I'm not whining, just stating the facts. It's not the end of the world, or my world, if I run out of WUs :-)
ID: 1616911 · Report as offensive
Profile Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15183
Credit: 4,362,181
RAC: 3
Netherlands
Message 1616916 - Posted: 21 Dec 2014, 9:37:13 UTC - in response to Message 1616871.  

while I was AFK

Wait, why would you even be AK (at keyboard) on a Sunday?
ID: 1616916 · Report as offensive
Grumpy Swede (Democratic Socialist)
Volunteer tester
Joined: 1 Nov 08
Posts: 8634
Credit: 49,849,242
RAC: 65
Sweden
Message 1616917 - Posted: 21 Dec 2014, 9:42:12 UTC - in response to Message 1616916.  
Last modified: 21 Dec 2014, 9:57:58 UTC

while I was AFK

Wait, why would you even be AK (at keyboard) on a Sunday?

Maybe he has itchy fingers and needs to scratch them on something :-)

Apart from that, I see no reasons why Eric shouldn't just enjoy his weekends, and forget about the project.
ID: 1616917 · Report as offensive
Profile tullio
Volunteer tester

Joined: 9 Apr 04
Posts: 8489
Credit: 2,930,782
RAC: 1
Italy
Message 1616922 - Posted: 21 Dec 2014, 10:44:55 UTC
Last modified: 21 Dec 2014, 10:45:13 UTC

I am running 7 BOINC projects, plus one non-BOINC project (CernVM_WebAPI). Most of them go on vacation for Xmas because developers and admins enjoy their vacations, but some of them (like Einstein@home) still give me work. So Merry Christmas to everybody.
Tullio
ID: 1616922 · Report as offensive
bluestar

Joined: 5 Sep 12
Posts: 4987
Credit: 2,084,789
RAC: 3
Message 1616932 - Posted: 21 Dec 2014, 12:00:15 UTC
Last modified: 21 Dec 2014, 12:03:55 UTC

It is nice to see that someone is bothering about this project at all.

Supposedly I too often end up here having other thoughts on my mind.

I paid a visit to Lunatics yesterday evening. Only had a quick look at their page.

Really I am under the impression that application development is a continuous process which never seems to end.

One might ask whether such applications (including the special or proprietary ones) are able to detect a signal if one is present.

For now, we can only make assumptions about whether a signal was ever there by looking at the four result categories, the processing times of a given task, and, of course, autocorrelation.

Back in 1977, Jerry R. Ehman was probably able to detect the Wow signal because the area was already known to be rich in radio sources. Whether the source behind this signal was stationary will probably never be known for sure, since it was detected in only one of the two horns in use at the time at the radio telescope belonging to the Ohio State University.

Sadly this facility is no more.

The whole thing was eventually torn down and replaced by other facilities.

One part of history gone, replaced by something else.
ID: 1616932 · Report as offensive
Profile Bernie Vine
Volunteer moderator
Volunteer tester
Joined: 26 May 99
Posts: 9947
Credit: 103,452,613
RAC: 328
United Kingdom
Message 1616968 - Posted: 21 Dec 2014, 15:00:57 UTC

It is nice to see that someone is bothering about this project at all.


Over the last few weeks we have been kept up to date with all the problems. We also know that Matt and Jeff have been involved.

So please be a little more respectful.
ID: 1616968 · Report as offensive
Eric Korpela Project Donor
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 3 Apr 99
Posts: 1378
Credit: 54,506,847
RAC: 60
United States
Message 1616981 - Posted: 21 Dec 2014, 16:45:58 UTC - in response to Message 1616892.  

I take it it's not OK to leave it as it is until Monday in its crashed state then? If none of the remote stuff works?


It's mainly generation of Astropulse work that will suffer. No permanent damage should result. But I'd like to have everything in working order before folks do leave for the holidays.
@SETIEric

ID: 1616981 · Report as offensive
Grumpy Swede (Democratic Socialist)
Volunteer tester
Joined: 1 Nov 08
Posts: 8634
Credit: 49,849,242
RAC: 65
Sweden
Message 1616982 - Posted: 21 Dec 2014, 16:50:19 UTC - in response to Message 1616981.  
Last modified: 21 Dec 2014, 16:50:56 UTC

I take it it's not OK to leave it as it is until Monday in its crashed state then? If none of the remote stuff works?


It's mainly generation of Astropulse work that will suffer. No permanent damage should result. But I'd like to have everything in working order before folks do leave for the holidays.

Don't spend an enormous amount of time on the AP stuff now, this close to the holidays. If MB works as it should, that would IMO be good enough for everyone.

Put AP in the freezer until after the holidays if getting it to behave would take a lot of work.
ID: 1616982 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14407
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1616986 - Posted: 21 Dec 2014, 16:54:23 UTC - in response to Message 1616981.  

I take it it's not OK to leave it as it is until Monday in its crashed state then? If none of the remote stuff works?

It's mainly generation of Astropulse work that will suffer. No permanent damage should result. But I'd like to have everything in working order before folks do leave for the holidays.

Could you ask them to take a look at Lando as well tomorrow, please? Lando's four MB splitters don't seem to have been pulling their weight since Marvin went down.
ID: 1616986 · Report as offensive
bluestar

Joined: 5 Sep 12
Posts: 4987
Credit: 2,084,789
RAC: 3
Message 1616995 - Posted: 21 Dec 2014, 17:19:05 UTC
Last modified: 21 Dec 2014, 17:23:32 UTC

You are of course correct when saying so.

I should not be using such language here.

Rather I should also say "that" instead of "which" - again my poor language skills.

I could also mention that I again may have made a new discovery regarding prime numbers, or at least factors.

Eventually quite a number of these factors, 100- to 1000-digit ones, should become available in the near future once I get this new collection put together and uploaded to the proper place (The Factor Database).

Also, there really is a marked contrast between the .vlar tasks and the CUDA-based SETI@home tasks when it comes to processing times. I like the tasks that carry out the Gaussian search better than the .vlar tasks, but apparently there may be a new batch of CPU tasks later that do exactly that, so I should not need to change my preferences back to allow CUDA tasks as well.

The Genefer tasks are also a fascinating subject, but running these tasks by means of CUDA is demanding and is straining both input and output as well as visible graphics on the screen.

Definitely there are both advantages and disadvantages in doing all of this.

Sitting at home like you may also be doing, I probably forget it is a Sunday today.

And in fact Christmas is coming up as well, only 3 or 4 days from now.

I wish you good luck in fixing Marvin, Eric!
ID: 1616995 · Report as offensive
Eric Korpela Project Donor
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 3 Apr 99
Posts: 1378
Credit: 54,506,847
RAC: 60
United States
Message 1617071 - Posted: 21 Dec 2014, 21:16:41 UTC - in response to Message 1616995.  

Marvin isn't finding a boot device on reboot. I'm not even sure it's seeing the RAID card at all (these remote interfaces to the boot screen aren't good at capturing things that happen quickly). Matt and I are meeting at the co-lo first thing tomorrow morning. I'm bringing a boot CD and what I think is a matching RAID card.
@SETIEric

ID: 1617071 · Report as offensive


 
©2021 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.