Bits (Oct 09 2007)

Message boards : Technical News : Bits (Oct 09 2007)

Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1444
Credit: 957,058
RAC: 0
United States
Message 656897 - Posted: 9 Oct 2007, 21:59:10 UTC

Today was the usual Tuesday outage, which went fine. Of course, we preceded this by having the project off all night to clean out various backlogged queues. It's at the point where if one part of the backend fails for long enough, the result table gets bloated and wreaks havoc on the whole system. But we were fully drained by this morning, and the database backup/compression went smoothly. We're catching up now.

Somebody asked what "db_purge.x86_64" is. To speed up draining the db_purge queue, we wanted to run that process on the system where the actual archives are being stored to disk. This was thumper, a 64-bit machine, so that meant compiling a 64-bit version of the purger. The suffix "x86_64" denotes that.

During the outage Jeff and I reconfigured the workunit volume on our Snap Appliance to be a grouped set of mirrors instead of one big RAID 5. The idea is that this will vastly help disk I/O. We'll start putting workunits back on this system in due time and monitor progress. We shall see how well this helps.
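
For anyone curious why a set of mirrors might beat one big RAID 5 for this kind of load, here is a rough back-of-the-envelope sketch. It uses the textbook small-random-write penalties (four disk operations per logical write for RAID 5, two for a mirror). The disk count and per-disk IOPS below are made-up illustration values, not the actual Snap Appliance configuration, and real controllers with caching will behave differently.

```python
# Textbook back-end disk operations needed for one small random write.
RAID5_WRITE_PENALTY = 4   # read old data, read old parity, write data, write parity
MIRROR_WRITE_PENALTY = 2  # write the block to both halves of the mirror


def effective_write_iops(n_disks, iops_per_disk, write_penalty):
    """Aggregate small-random-write IOPS the array can sustain."""
    return n_disks * iops_per_disk / write_penalty


if __name__ == "__main__":
    # Assumed values, for illustration only.
    n_disks = 8
    iops_per_disk = 100

    raid5 = effective_write_iops(n_disks, iops_per_disk, RAID5_WRITE_PENALTY)
    mirrors = effective_write_iops(n_disks, iops_per_disk, MIRROR_WRITE_PENALTY)
    print(f"RAID 5 : {raid5:.0f} write IOPS")
    print(f"Mirrors: {mirrors:.0f} write IOPS")
```

With these assumed numbers the mirrored layout sustains roughly twice the small random writes, at the cost of usable capacity (half the raw space instead of seven eighths).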

- Matt

-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude
ID: 656897
Pooh Bear 27
Volunteer tester
Joined: 14 Jul 03
Posts: 3224
Credit: 4,603,826
RAC: 0
United States
Message 656900 - Posted: 9 Oct 2007, 22:04:19 UTC

Matt,

So, did you play any gigs in Kiwiland? Or just pure vacation?
ID: 656900
Dr. C.E.T.I.
Joined: 29 Feb 00
Posts: 16019
Credit: 794,685
RAC: 0
United States
Message 656906 - Posted: 9 Oct 2007, 22:09:12 UTC


Matt, Nice Work @ Berkeley - and to You & Jeff - Keep it up . . .

ID: 656906
Neil Blaikie
Volunteer tester
Joined: 17 May 99
Posts: 143
Credit: 6,652,341
RAC: 0
Canada
Message 656910 - Posted: 9 Oct 2007, 22:13:04 UTC

Good to see you are back from vacation, hopefully nice and refreshed. It's a very nice part of the world and the perfect time to escape fall in the Northern Hemisphere. Keep up the good work and, as usual, thanks for keeping us informed.
ID: 656910
BenchZowner

Joined: 19 Jul 07
Posts: 2
Credit: 126,630
RAC: 0
United States
Message 657030 - Posted: 10 Oct 2007, 1:08:45 UTC

Thanks for the update, Matt, and I hope you had a nice time on your vacation.

One quick question: are you running any software RAID arrays on the project machines?
ID: 657030
KWSN THE Holy Hand Grenade!
Volunteer tester
Joined: 20 Dec 05
Posts: 3187
Credit: 57,163,290
RAC: 0
United States
Message 657034 - Posted: 10 Oct 2007, 1:15:50 UTC

Not a complaint, but...

When I went to get more WUs, some of them came over as "0 bytes" on the "Transfers" tab of the client... I've never noticed that before with active WUs!
.

Hello, from Albany, CA!...
ID: 657034
n7rfa
Volunteer tester
Joined: 13 Apr 04
Posts: 370
Credit: 9,058,599
RAC: 0
United States
Message 657045 - Posted: 10 Oct 2007, 1:31:02 UTC - in response to Message 657034.  

There's a whole slew of these bad WUs.

I just aborted all of them on two of my systems. (As soon as my wife gets off the other one, I'll do it there too.)

To abort them you have to find the zero-length WU files and then use the file names to find the matching tasks on the Tasks tab.

Unfortunately, this results in a new file being sent out to someone else.
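
The hunt for zero-length files can be scripted. This is a minimal sketch in Python. The directory you pass in is an assumption (point it at wherever your BOINC client keeps the SETI@home project files), and the function name is made up for illustration. It only lists the files; aborting the matching tasks still has to be done on the Tasks tab.

```python
import sys
from pathlib import Path


def zero_length_files(project_dir):
    """Return the sorted names of regular files in project_dir that are 0 bytes long."""
    root = Path(project_dir)
    return sorted(
        p.name
        for p in root.iterdir()
        if p.is_file() and p.stat().st_size == 0
    )


if __name__ == "__main__":
    # Directory comes from the command line; defaults to the current directory.
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    for name in zero_length_files(target):
        print(name)
```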
ID: 657045
Jesse Viviano

Joined: 27 Feb 00
Posts: 100
Credit: 3,949,583
RAC: 0
United States
Message 657135 - Posted: 10 Oct 2007, 3:20:44 UTC - in response to Message 657045.  

It would be better if you let your machine try to crunch them. It would help the admins troubleshoot the splitters if they see a bunch of errors stating "Compute error" and "SETI@home error -6 Bad workunit header" instead of "Aborted by user". If enough bad workunit header results, rather than aborts, kill the work unit, it would be easier for the admins to do a database search for which work units need to be resplit.
ID: 657135
n7rfa
Volunteer tester
Joined: 13 Apr 04
Posts: 370
Credit: 9,058,599
RAC: 0
United States
Message 657299 - Posted: 10 Oct 2007, 13:03:51 UTC - in response to Message 657135.  

I'll make it easier for them: 17mr07ab and 18mr07aa.
ID: 657299
Robert Ribbeck
Joined: 7 Jun 02
Posts: 644
Credit: 5,283,174
RAC: 0
United States
Message 657470 - Posted: 10 Oct 2007, 20:16:04 UTC

Noticed today that WUs are not being taken in order of their due dates.
Is that a SETI or BOINC problem?
ID: 657470
John McLeod VII
Volunteer developer
Volunteer tester
Joined: 15 Jul 99
Posts: 24806
Credit: 790,712
RAC: 0
United States
Message 657487 - Posted: 10 Oct 2007, 20:58:12 UTC - in response to Message 657470.  

BOINC runs the tasks from any particular project in First In First Out order unless there is a chance that this will miss a deadline. In that case it switches to Earliest Deadline First order. This is by design.
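
Roughly, the rule described above could be sketched like this. It is a toy model, not the actual BOINC client code: it assumes a single CPU, a single project, and exact runtime estimates, and the Task fields and function names are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    arrival: int      # order in which the task was received (lower = earlier)
    runtime: float    # estimated hours of work remaining
    deadline: float   # hours from now until the report deadline


def misses_deadline(order):
    """True if running the tasks back to back in this order makes any task late."""
    clock = 0.0
    for task in order:
        clock += task.runtime
        if clock > task.deadline:
            return True
    return False


def run_order(tasks):
    """FIFO by default; fall back to Earliest Deadline First if FIFO would be late."""
    fifo = sorted(tasks, key=lambda t: t.arrival)
    if not misses_deadline(fifo):
        return fifo
    return sorted(tasks, key=lambda t: t.deadline)


if __name__ == "__main__":
    tasks = [
        Task("A", arrival=0, runtime=5, deadline=100),
        Task("B", arrival=1, runtime=5, deadline=8),  # would finish at hour 10 under FIFO
    ]
    print([t.name for t in run_order(tasks)])
```

In this example task B arrived second but would miss its deadline behind A, so the sketch runs B first. That is the kind of out-of-order behavior the question describes.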


BOINC WIKI
ID: 657487
Clyde C. Phillips, III
Joined: 2 Aug 00
Posts: 1851
Credit: 5,955,047
RAC: 0
United States
Message 657915 - Posted: 11 Oct 2007, 18:38:27 UTC

I did see a few zero-crunch-length "Client Errors" workunits on one of my machines.
ID: 657915

©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.