Panic Mode On (47) Server problems?

Message boards : Number crunching : Panic Mode On (47) Server problems?

arkayn
Volunteer tester
Joined: 14 May 99
Posts: 4438
Credit: 55,006,323
RAC: 0
United States
Message 1110061 - Posted: 26 May 2011, 17:54:46 UTC
Last modified: 26 May 2011, 17:56:02 UTC

We know that one of the servers is having difficulties
http://setiathome.berkeley.edu/forum_thread.php?id=64259

Please continue the venting.

ID: 1110061
Iona
Joined: 12 Jul 07
Posts: 790
Credit: 22,438,118
RAC: 0
United Kingdom
Message 1110116 - Posted: 26 May 2011, 20:36:49 UTC - in response to Message 1110114.  

Good for you. My cache will run out in a day or so (if I keep the PCs running, doing nothing else but S@H) and therefore, in addition to your requests, I will also demand a refund!



Don't take life too seriously, as you'll never come out of it alive!
ID: 1110116
Gary Charpentier (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 25 Dec 00
Posts: 31015
Credit: 53,134,872
RAC: 32
United States
Message 1110127 - Posted: 26 May 2011, 21:03:43 UTC - in response to Message 1110119.  

I understand Bruno decided to play nice.


ID: 1110127
perryjay
Volunteer tester
Joined: 20 Aug 02
Posts: 3377
Credit: 20,676,751
RAC: 0
United States
Message 1110150 - Posted: 26 May 2011, 22:14:20 UTC

Posted by Jeff Cobb over in Tech News.....


The folks at Overland really came through! They give us amazing support. One of their engineers logged into the server and worked his magic. The RAID and filesystem have come back to life. We'll let the RAID resync and do a final reboot to clear some flags and then restart work generation.



PROUD MEMBER OF Team Starfire World BOINC
ID: 1110150
KB7RZF
Volunteer tester
Joined: 15 Aug 99
Posts: 9549
Credit: 3,308,926
RAC: 2
United States
Message 1110177 - Posted: 27 May 2011, 0:35:21 UTC

I ran dry a day or 2 ago, but I wound up getting 2 resend WU's. LOL But, I've got plenty of other projects to crunch for. Hehehe
ID: 1110177
Miep
Volunteer moderator
Joined: 23 Jul 99
Posts: 2412
Credit: 351,996
RAC: 0
Message 1110302 - Posted: 27 May 2011, 7:44:34 UTC - in response to Message 1110177.  

I ran dry a day or 2 ago, but I wound up getting 2 resend WU's. LOL But, I've got plenty of other projects to crunch for. Hehehe


Yes, we can see that :D - is there something you DON'T crunch?
I seem to have another day or so, before I can test how well the backup project mechanism works in 6.12.28.
Carola
-------
I'm multilingual - I can misunderstand people in several languages!
ID: 1110302
Miep
Volunteer moderator
Joined: 23 Jul 99
Posts: 2412
Credit: 351,996
RAC: 0
Message 1110405 - Posted: 27 May 2011, 15:04:09 UTC - in response to Message 1110401.  

This is totally unacceptable. My caches will run out in 10 days, and I tell you people that if this problem isn't solved in the coming 25 years, I will leave this project forever.


Well well, the project is still not functioning as it should. I'm counting down now to when I will leave this project. Now it's only 24 years and 364 days left....



I wouldn't do that, if I was you, the counting down bit.
What with the trees in the forest, I'd be afraid what exactly I was counting down to...
Carola
-------
I'm multilingual - I can misunderstand people in several languages!
ID: 1110405
KB7RZF
Volunteer tester
Joined: 15 Aug 99
Posts: 9549
Credit: 3,308,926
RAC: 2
United States
Message 1110410 - Posted: 27 May 2011, 15:12:10 UTC - in response to Message 1110302.  

I ran dry a day or 2 ago, but I wound up getting 2 resend WU's. LOL But, I've got plenty of other projects to crunch for. Hehehe


Yes, we can see that :D - is there something you DON'T crunch?
I seem to have another day or so, before I can test how well the backup project mechanism works in 6.12.28.

LOL There are a few newer projects that have come out that I haven't attached to. All of the ones in my sig are projects I've crunched for during a team's project of the month, or just because they looked interesting. SETI will always be home, but I figured it's always nice to share. So, I shared. LOL
ID: 1110410
Gary Charpentier (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 25 Dec 00
Posts: 31015
Credit: 53,134,872
RAC: 32
United States
Message 1110420 - Posted: 27 May 2011, 15:25:19 UTC - in response to Message 1110401.  

This is totally unacceptable. My caches will run out in 10 days, and I tell you people that if this problem isn't solved in the coming 25 years, I will leave this project forever.


Well well, the project is still not functioning as it should. I'm counting down now to when I will leave this project. Now it's only 24 years and 364 days left....


When it is working again, do you suspend your count, or does it reset?

ID: 1110420
Donald L. Johnson
Joined: 5 Aug 02
Posts: 8240
Credit: 14,654,533
RAC: 20
United States
Message 1110424 - Posted: 27 May 2011, 15:31:59 UTC

One positive thing that's come out of this is that the file deleters and db purge are getting some catch-up time. Results and Work Units waiting for db purge are both under 100K and dropping.
Donald
Infernal Optimist / Submariner, retired
ID: 1110424
Donald L. Johnson
Joined: 5 Aug 02
Posts: 8240
Credit: 14,654,533
RAC: 20
United States
Message 1110426 - Posted: 27 May 2011, 15:34:47 UTC - in response to Message 1110423.  

This is totally unacceptable. My caches will run out in 10 days, and I tell you people that if this problem isn't solved in the coming 25 years, I will leave this project forever.


Well well, the project is still not functioning as it should. I'm counting down now to when I will leave this project. Now it's only 24 years and 364 days left....


I wouldn't do that, if I was you, the counting down bit.
What with the trees in the forest, I'd be afraid what exactly I was counting down to...


Nah, that's not a problem. I come from a family where the men live until they're close to 100 years old.

So, if the project keeps going, we will have to deal with you for another 40+ years?
Works for me.
Donald
Infernal Optimist / Submariner, retired
ID: 1110426
Jason Safoutin
Volunteer tester
Joined: 8 Sep 05
Posts: 1386
Credit: 200,389
RAC: 0
United States
Message 1110433 - Posted: 27 May 2011, 16:13:34 UTC - in response to Message 1110410.  

I ran dry a day or 2 ago, but I wound up getting 2 resend WU's. LOL But, I've got plenty of other projects to crunch for. Hehehe


Yes, we can see that :D - is there something you DON'T crunch?
I seem to have another day or so, before I can test how well the backup project mechanism works in 6.12.28.

LOL There are a few newer projects that have come out that I haven't attached to. All of the ones in my sig are projects I've crunched for during a team's project of the month, or just because they looked interesting. SETI will always be home, but I figured it's always nice to share. So, I shared. LOL


Agreed. SETI is my favorite project and will always be my top one. I used to participate in other projects, but never was too interested in any of them. The problems happen often, but that won't stop me from crunching here ever.

"By faith we understand that the universe was formed at God's command, so that what is seen was not made out of what was visible". Hebrews 11.3

ID: 1110433
Gary Charpentier (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 25 Dec 00
Posts: 31015
Credit: 53,134,872
RAC: 32
United States
Message 1110485 - Posted: 27 May 2011, 19:04:51 UTC - in response to Message 1110440.  

So, if the project keeps going, we will have to deal with you for another 40+ years?



Aaaarggh! :-)))

Eric almost has that Fountain of Youth formula ET sent decoded ...

ID: 1110485
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 1110666 - Posted: 28 May 2011, 5:41:14 UTC

There's life on the cricket graph starting around 05:00 UTC. The upload server is still disabled, though.
Linux laptop:
record uptime: 1511d 20h 19m (ended when the power brick gave up)
ID: 1110666
tbret
Volunteer tester
Joined: 28 May 99
Posts: 3380
Credit: 296,162,071
RAC: 40
United States
Message 1110680 - Posted: 28 May 2011, 7:48:07 UTC

Instead of first asking a question that would expose how ignorant I am about this, let me get ahead of that and proclaim and admit that I am really ignorant about a lot of things.

One of the many is a RAID array. Now, it's hard to be as ignorant as I am, especially since I once configured a RAID using an Adaptec controller that I seem to remember paying more for than the laptop I'm using to type this post. But that was back in the days when expensive motherboards would have two VESA Local Bus slots on them.

Okay, so now that everyone's up-to-date on how out-of-date I am ---

Just how big are these RAID arrays SETI is using (in GBs)? That question would probably answer my next question which is "Why are they using them?" I can't imagine that our upload / download activity, confined by the pipe into the lab, would need nearly 900MB/s and if it does then I can't imagine how many physical drives there would have to be in the array to handle it for more than half of a day at a time.

Can someone give me a clue as to why you'd want to run a "striped" array (I understand redundancy) on this project in big, bold, conceptual strokes that even I can understand?

I was just transferring an ISO file via wireless at 11.5MB/s across my den (I know that's one big file as opposed to 5,000 22k files).

I'm thinking RAM makes more sense and so I want to know why I'm wrong; just as a sort-of "welcome to reality in the 21st century" lesson for me.
ID: 1110680
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13855
Credit: 208,696,464
RAC: 304
Australia
Message 1110694 - Posted: 28 May 2011, 8:44:28 UTC - in response to Message 1110680.  

Just how big are these RAID arrays SETI is using (in GBs)? That question would probably answer my next question which is "Why are they using them?"

RAID stands for Redundant Array of Inexpensive (or Independent) Disks.
Note the redundant.

RAID 0 is just for speed; there is no redundancy. If one disk dies, all data is lost.
RAID 1 is mirroring: one disk maintains a copy of another disk.

RAID 5 and RAID 6 are the ones that really matter: data is spread across multiple disks along with parity information. With RAID 5, if one disk dies, no data is lost; it can be rebuilt from the redundant data stored on the other disks in the array. With RAID 6, two disks in the array can die and still no data is lost.
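To make the RAID 5 idea concrete, here is a minimal sketch of single-parity reconstruction, assuming the textbook XOR scheme: the parity block is the byte-wise XOR of the data blocks in a stripe, so any one lost block equals the XOR of all the survivors. The function names are hypothetical, and real controllers also rotate parity across disks; RAID 6 adds a second, independently computed parity block (typically Reed-Solomon style), which this sketch does not cover.

```python
from __future__ import annotations

from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data_blocks: list[bytes]) -> list[bytes]:
    """Append a parity block to a stripe of data blocks (one block per disk)."""
    return data_blocks + [xor_blocks(data_blocks)]


def rebuild_missing(stripe: list[bytes | None]) -> bytes:
    """Recover the single missing block by XOR-ing every surviving block."""
    missing = [i for i, block in enumerate(stripe) if block is None]
    if len(missing) != 1:
        raise ValueError("single parity (RAID 5 style) recovers exactly one lost block")
    survivors = [block for block in stripe if block is not None]
    return xor_blocks(survivors)


if __name__ == "__main__":
    stripe = make_stripe([b"SETI", b"home", b"data"])  # 3 data disks + 1 parity disk
    lost = stripe[1]
    stripe[1] = None  # simulate one dead disk
    print(rebuild_missing(stripe) == lost)  # True
```

Losing a second block in the same stripe raises the error, which is exactly the case RAID 6's extra parity exists to handle.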
Grant
Darwin NT
ID: 1110694
Tim
Volunteer tester
Joined: 19 May 99
Posts: 211
Credit: 278,575,259
RAC: 0
Greece
Message 1110699 - Posted: 28 May 2011, 9:09:00 UTC
Last modified: 28 May 2011, 9:09:49 UTC

Imagine a hard disk failure right now on our own computers, with so many tasks waiting to upload :-)
ID: 1110699
Richard Haselgrove (Project Donor)
Volunteer tester
Joined: 4 Jul 99
Posts: 14679
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1110711 - Posted: 28 May 2011, 9:53:46 UTC - in response to Message 1110680.  

Just how big are these RAID arrays SETI is using (in GBs)?

Looking at the Server Status Page, the line for 'Results out in the field' right now shows 5,857,809 for SETI (Multibeam), and 159,686 (Astropulse).

Those are 'tasks' in modern terminology. Sometimes two people will be working on the same workunit, but often (more often for MB), one task will be complete and have been returned for validation, but the other will still be active.

For the sake of argument, let's say that represents 5 million MB workunits, and 100 thousand AP workunits. That data has to be held at SETI, on those RAID arrays, until the results are validated - that's so a replacement copy can be sent out if validation is inconclusive or a worker times out.

MB data files take 367 KiB of storage each, and AP data files take 8 MiB. Multiply that lot out, and I get the answer to be...

Two thousand five hundred gigabytes
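The multiplication above can be checked with a short sketch. It uses the post's own round numbers (5 million MB workunits at 367 KiB, 100 thousand AP workunits at 8 MiB) and assumes binary units (1 KiB = 1024 bytes); the function name is hypothetical.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def cached_storage_bytes(mb_workunits: int, ap_workunits: int,
                         mb_size: int = 367 * KIB, ap_size: int = 8 * MIB) -> int:
    """Bytes needed to keep one data file per outstanding workunit on the server."""
    return mb_workunits * mb_size + ap_workunits * ap_size


if __name__ == "__main__":
    total = cached_storage_bytes(5_000_000, 100_000)
    print(f"MB:    {5_000_000 * 367 * KIB / GIB:,.0f} GiB")  # MB:    1,750 GiB
    print(f"AP:    {100_000 * 8 * MIB / GIB:,.0f} GiB")      # AP:    781 GiB
    print(f"Total: {total / GIB:,.0f} GiB")                  # Total: 2,531 GiB
```

Roughly 2,531 GiB agrees with the "two thousand five hundred gigabytes" figure; about 70% of it is Multibeam data despite the much smaller file size, simply because there are fifty times as many MB workunits.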

When a RAID has to be rebuilt (as is going on at the moment), every single one of those bytes has to be read, and where appropriate written back to make the new redundant copy. If the RAID arrays held less data, the process would be quicker, and we could get back to work sooner. That's why I ask people not to hold so many tasks in their caches.
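The point about smaller caches meaning a quicker rebuild is a linear relationship: time is roughly bytes processed divided by sustained throughput. The sketch below shows only that proportionality. The 100 MB/s rate is a purely hypothetical figure, not the lab's actual rebuild speed, and the model ignores write traffic and competing server load.

```python
def rebuild_hours(bytes_to_process: int, throughput_mb_per_s: float) -> float:
    """Hours to process `bytes_to_process` at a sustained rate in decimal MB/s."""
    seconds = bytes_to_process / (throughput_mb_per_s * 1_000_000)
    return seconds / 3600


if __name__ == "__main__":
    cached = 2_717_900_800_000  # about 2,531 GiB: the estimate above, in bytes
    for label, size in (("full cache", cached), ("half the cache", cached // 2)):
        # 100 MB/s is a hypothetical throughput chosen only for illustration
        print(f"{label}: {rebuild_hours(size, 100):.1f} h")
    # full cache: 7.5 h
    # half the cache: 3.8 h
```

Whatever the real throughput is, halving the data held on the array halves the time in this model.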
ID: 1110711
Jason Safoutin
Volunteer tester
Joined: 8 Sep 05
Posts: 1386
Credit: 200,389
RAC: 0
United States
Message 1110723 - Posted: 28 May 2011, 11:09:51 UTC

I managed to get a cache of 59 WU's just now. This was the first time I'd tried to download work since the issues started. I noticed one took just a few seconds to crunch; I wonder if there will be many of those. Still not able to upload anything, though.
"By faith we understand that the universe was formed at God's command, so that what is seen was not made out of what was visible". Hebrews 11.3

ID: 1110723
Jason Safoutin
Volunteer tester
Joined: 8 Sep 05
Posts: 1386
Credit: 200,389
RAC: 0
United States
Message 1110737 - Posted: 28 May 2011, 12:10:12 UTC - in response to Message 1110727.  

As I understand it, the reason for using RAID configurations is that redundancy is built in, i.e. a backup. RAID disks are hot-swappable, so if a hard drive fails you simply take it out, throw it away and plug a new one in. Then the RAID array will copy whatever it needs to the new disk.

I don't have enough technical knowledge to know whether or not the Project is manipulating their data in the most efficient way, but I can appreciate, as Richard has shown, that the amount of data involved is very large.

Also, Seti staff are supposed to be project scientists first and foremost, and spend their time analysing the data we provide for them. But of necessity, they also have to take on a secondary role as server admins.

I think they do rather well all things considered.


Agree with you 100%.
"By faith we understand that the universe was formed at God's command, so that what is seen was not made out of what was visible". Hebrews 11.3

ID: 1110737

©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.