The Geese Fly West (Dec 10 2007)

Message boards : Technical News : The Geese Fly West (Dec 10 2007)
Message board moderation

To post messages, you must log in.

AuthorMessage
Profile Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Avatar

Send message
Joined: 1 Mar 99
Posts: 1444
Credit: 957,058
RAC: 0
United States
Message 690450 - Posted: 10 Dec 2007, 23:26:47 UTC

We had another batch of "fast" workunits this weekend. No big deal, except we did run out of a ready-to-send queue for a while there. To help alleviate panic I added a couple items to the server status page for your (and our) diagnostic pleasure: count of results returned over the past hour, and their average "turnround" time (i.e. "wall" time between workunit download and its result upload). It seems the current "normal" average is about 60 hours, during the weekend we were as low as 30. It would be be more meaningful to have median instead of average (as there are always slow computers that turnaround mere seconds before the deadline, thus skewing the averages), but mysql doesn't have a "median" function and it's not really worth implementing one of our own - we have so many other fish to fry.

Our air conditioner tech was in today to wrap up work on fixing the current (and hopefully last) coolant leak. No real news there, except it was fun to see our temperatures shoot up 6 degrees Celsius within a few minutes as the air conditioner was temporarily turned off.

I'm about to start the latest donation drive. This will wreak havoc on a few of our isolated servers which are dedicated for such large mass mailings. Hopefully this will happen without incident - people are understandably sensitive about what they perceive as spam.

- Matt

-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude
ID: 690450 · Report as offensive
Profile Dr. C.E.T.I.
Avatar

Send message
Joined: 29 Feb 00
Posts: 16019
Credit: 794,685
RAC: 0
United States
Message 690457 - Posted: 10 Dec 2007, 23:34:55 UTC
Last modified: 10 Dec 2007, 23:38:54 UTC

. . . as usual - Thanks for the Update Matt

< and Best of Luck with the Donation Drive Sir!

ps - almost forgot - real nice: Sleepfall

- Anthony Phillips - 'The Geese And The Ghost'
BOINC Wiki . . .

Science Status Page . . .
ID: 690457 · Report as offensive
Profile KWSN THE Holy Hand Grenade!
Volunteer tester
Avatar

Send message
Joined: 20 Dec 05
Posts: 3187
Credit: 57,163,290
RAC: 0
United States
Message 690571 - Posted: 11 Dec 2007, 4:25:45 UTC

I really hate to keep harping on this, but the "client connection stats" is broken - and has been since at least Friday - possibly Thursday (I don't recall!)
.

Hello, from Albany, CA!...
ID: 690571 · Report as offensive
DJStarfox

Send message
Joined: 23 May 01
Posts: 1066
Credit: 1,226,053
RAC: 2
United States
Message 690585 - Posted: 11 Dec 2007, 4:48:15 UTC - in response to Message 690450.  

Median in mySQL? Easy:
SELECT x.val
FROM data x, data y
GROUP BY x.val
HAVING SUM(SIGN(1-SIGN(y.val-x.val))) = (COUNT(*))

But you should look at the query plans between the above and avg() before deciding to put it into production. Only works with MySQL > 5.0; I got the tip from here:
http://dev.mysql.com/doc/refman/5.0/en/group-by-functions.html
ID: 690585 · Report as offensive
Jesse Viviano

Send message
Joined: 27 Feb 00
Posts: 100
Credit: 3,949,583
RAC: 0
United States
Message 690591 - Posted: 11 Dec 2007, 5:27:20 UTC - in response to Message 690585.  

Median in mySQL? Easy:
SELECT x.val
FROM data x, data y
GROUP BY x.val
HAVING SUM(SIGN(1-SIGN(y.val-x.val))) = (COUNT(*))

But you should look at the query plans between the above and avg() before deciding to put it into production. Only works with MySQL > 5.0; I got the tip from here:
http://dev.mysql.com/doc/refman/5.0/en/group-by-functions.html

Next time if you want to include source code, you can use code tags, like what is shown on this page. The result produces this:
SELECT x.val 
FROM data x, data y
GROUP BY x.val
HAVING SUM(SIGN(1-SIGN(y.val-x.val))) = (COUNT(*))

It also seems that the code tags seem useful to draw tables using ASCII on these forums if you need to do that because the code tags' contents render in a monospace font.
ID: 690591 · Report as offensive
Profile ML1
Volunteer moderator
Volunteer tester

Send message
Joined: 25 Nov 01
Posts: 20147
Credit: 7,508,002
RAC: 20
United Kingdom
Message 690671 - Posted: 11 Dec 2007, 15:27:13 UTC - in response to Message 690591.  
Last modified: 11 Dec 2007, 15:27:46 UTC

...Only works with MySQL > 5.0; I got the tip from here:
http://dev.mysql.com/doc/refman/5.0/en/group-by-functions.html

Good quote but not accurate: Just one small but vital ommission...

It should read:

SELECT x.val 
FROM data x, data y
GROUP BY x.val
HAVING SUM(SIGN(1-SIGN(y.val-x.val))) = (COUNT(*)+1)/2


Further comment:

assumes (COUNT(*)+1)/2 returns an int, if not
then use INT((COUNT(*)+1)/2)
There are variations for the varied defs of
median too.



That lot could also be done on a unix command pipe using sort, head, wc, and bc/dc for those wishing a challenge for a minimalist solution!

Regards,
Martin
See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
ID: 690671 · Report as offensive
archae86

Send message
Joined: 31 Aug 99
Posts: 909
Credit: 1,582,816
RAC: 0
United States
Message 690727 - Posted: 11 Dec 2007, 23:33:40 UTC - in response to Message 690591.  

SELECT x.val 
FROM data x, data y
GROUP BY x.val
HAVING SUM(SIGN(1-SIGN(y.val-x.val))) = (COUNT(*))

It also seems that the code tags seem useful to draw tables using ASCII on these forums if you need to do that because the code tags' contents render in a monospace font.

Much thanks for pointing that out. They did not work the way one would want when last I tried them (probably between one and two years ago). Looks like someone fixed something.

ID: 690727 · Report as offensive
Profile Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Avatar

Send message
Joined: 1 Mar 99
Posts: 1444
Credit: 957,058
RAC: 0
United States
Message 690735 - Posted: 12 Dec 2007, 0:08:37 UTC
Last modified: 12 Dec 2007, 0:09:07 UTC

To clarify, I did see all these median functions/queries but (a) the current average query was already complex enough that wrapping this additional logic around it, however simple, was enough to make me move on to a more productive task and (b) I have, perhaps unfounded, fears that this would eat up more memory than an average query and that's enough to move on to something else.

My day is usaully a lot of doing constant triage, picking low hanging fruit, playing whack-a-mole, etc. so if anything relatively low priority that smacks of non-zero difficulty or raising non-zero concern it is immediately put to the bottom of the list.

- Matt
-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude
ID: 690735 · Report as offensive
Profile ML1
Volunteer moderator
Volunteer tester

Send message
Joined: 25 Nov 01
Posts: 20147
Credit: 7,508,002
RAC: 20
United Kingdom
Message 690740 - Posted: 12 Dec 2007, 0:43:13 UTC - in response to Message 690727.  

[...]
Much thanks for pointing that out. They did not work the way one would want when last I tried them (probably between one and two years ago). Looks like someone fixed something.

It would have been less confusing not to have reposted the incorrect code...

Good find on the code, but looks like we stay on something simpler for the time being...

Anyone game for the unix solution to keep the database bashing minimalist?

Happy crunchin',
Martin

See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
ID: 690740 · Report as offensive
Profile ML1
Volunteer moderator
Volunteer tester

Send message
Joined: 25 Nov 01
Posts: 20147
Credit: 7,508,002
RAC: 20
United Kingdom
Message 690741 - Posted: 12 Dec 2007, 0:46:43 UTC - in response to Message 690671.  

Reposting for clarity.

Also trying "pre" rather than "code" 'cos the "=" looks too much like "-" when using "code"!

...Only works with MySQL > 5.0; I got the tip from here:
http://dev.mysql.com/doc/refman/5.0/en/group-by-functions.html

Good quote but not accurate: Just one small but vital ommission...

It should read:

SELECT x.val 
FROM data x, data y
GROUP BY x.val
HAVING SUM(SIGN(1-SIGN(y.val-x.val))) = (COUNT(*)+1)/2


Further comment:

assumes (COUNT(*)+1)/2 returns an int, if not
then use INT((COUNT(*)+1)/2)
There are variations for the varied defs of
median too.



That lot could also be done on a unix command pipe using sort, head, wc, and bc/dc for those wishing a challenge for a minimalist solution!


Happy crunchin',
Martin

See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
ID: 690741 · Report as offensive
Profile Uioped1
Volunteer tester
Avatar

Send message
Joined: 17 Sep 03
Posts: 50
Credit: 1,179,926
RAC: 0
United States
Message 690892 - Posted: 12 Dec 2007, 17:45:59 UTC

I seem to recall that S@H uses InnoDB tables, so this probably won't work for them, but I had to post this for completeness sake:

SET @medianRow = 0;

SELECT @medianRow = INT(table_rows/2) + 1
FROM INFORMATION_SCHEMA.TABLES
WHERE table_schema = '<dbName>'
  AND table_name = '<statsTable>';

SELECT <stat>
FROM <statsTable>
ORDER BY <stat>
LIMIT @medianRow, @medianRow


A couple of notes:
A) This defines the median as the middle _result_ not the middle _value_
B) It's been a while since I used mysql, so I may have some syntax errors
C) As I alluded to above, innodb tables maintain the rowcount as an estimate only, thus the median will be only an estimate.
D) It might be possible to turn off statistics generation for tables, and if so S@H probably has...

Performance Explanation:
This statement may or may not be faster than the average aggregate, depending on the index structure of the table. The average aggregate must by definition touch every row in the table (At best an index scan) whereas the query outlined above must at worst perform a table scan and a sort. On the other hand the best case for the query above is one index lookup.
ID: 690892 · Report as offensive

Message boards : Technical News : The Geese Fly West (Dec 10 2007)


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.