XML stats going weird




Message boards : Number crunching : XML stats going weird

Toby
Volunteer tester
Joined: 26 Oct 00
Posts: 1005
Credit: 5,622,795
RAC: 2
United States
Message 5587 - Posted: 9 Jul 2004, 6:33:09 UTC

Is anyone else watching the XML stats? I am in the process of making a website to display them (see my profile URL), but I have noticed some strange things happening over the last couple of days. I only import users and teams with non-zero total credit, which GREATLY speeds things up all around. I just got the new XML files, and it seems that the total number of 'active' users has declined since yesterday, from 10,721 to 9,865. The day before, I believe I had over 12,000, but I wasn't watching for anything unusual, so I don't remember exactly. The downloaded files also shrank: the user file used to be 150 MB to download and a gig when unzipped; now it is 20 MB to download and 120 MB unzipped. The team file shrank as well, along with the number of teams with non-zero total credit. Anyone got any clues?

-Confused
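For anyone building a similar site, the "non-zero total credit" filter can be applied while the file is still being read, so the gig-sized unzipped user file never has to sit in memory. Below is a minimal Python sketch using the standard library's streaming `iterparse`; the element names (`<user>`, `<id>`, `<name>`, `<total_credit>`) are an assumption about the dump layout, not something confirmed in this thread.

```python
import gzip
import io
import xml.etree.ElementTree as ET


def iter_active_users(xml_stream):
    """Yield (id, name, total_credit) for each <user> with non-zero total credit.

    Assumed (hypothetical) layout:
    <users><user><id>..</id><name>..</name><total_credit>..</total_credit></user>...</users>
    """
    context = ET.iterparse(xml_stream, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event != "end" or elem.tag != "user":
            continue
        credit = float(elem.findtext("total_credit", default="0"))
        if credit > 0:
            yield int(elem.findtext("id")), elem.findtext("name"), credit
        # Drop the finished <user> elements so memory use stays flat.
        root.clear()


def count_active_users(source):
    """Count 'active' users in a gzipped dump; source is a path or binary file object."""
    with gzip.open(source, "rb") as f:
        return sum(1 for _ in iter_active_users(f))


if __name__ == "__main__":
    sample = (b"<users>"
              b"<user><id>1</id><name>a</name><total_credit>12.5</total_credit></user>"
              b"<user><id>2</id><name>b</name><total_credit>0</total_credit></user>"
              b"</users>")
    print(count_active_users(io.BytesIO(gzip.compress(sample))))  # prints 1
```

Clearing the root after each record is what keeps the parser from slowly rebuilding the whole 1 GB tree in memory.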

bfarrant
Joined: 4 Jun 99
Posts: 228
Credit: 36,710
RAC: 0
Canada
Message 5642 - Posted: 9 Jul 2004, 14:34:08 UTC

I know that there have been an awful lot of people with duplicate clients due to installation problems and the recent URL detach/re-attach debacle, and perhaps the file size was shrinking as people merged their duplicate machines together, but I can't see that accounting for a change in size as large as the one you are talking about.

BTW, where do you get the files from? I've been wanting to do that myself, but the only link I have found for the files won't let me in because I'm not authorised (it used to).


Christopher Hauber
Joined: 10 Feb 01
Posts: 196
Credit: 71,611
RAC: 0
United States
Message 5660 - Posted: 9 Jul 2004, 15:47:21 UTC

I don't know much about it, but there might be fewer ACTIVE users because a decent number of people have probably either quit or gone back to SETI Classic until most of the bugs have been worked out and/or they can consistently get workunits, and thus now have an inactive status.

Kirsten
Volunteer tester
Joined: 7 Jul 00
Posts: 190
Credit: 565,264
RAC: 0
Denmark
Message 5683 - Posted: 9 Jul 2004, 16:45:57 UTC - in response to Message 5642.


> BTW, where do you get the files from? I've been wanting to do that myself, but
> the only link I have found to get the files from won't let me in because I'm
> not authorised. (it used to)

I would like to know, too. I've been doing stats for the members of "my" team since November 2003, during the BOINC Beta Test days and at the start of SAH2.

I cannot figure out why we - apparently just "commoners" in the eyes of the developers - should receive this message when trying to get some stats files: "Forbidden. You don't have permission to access /sah/stats/ on this server."

Toby
Volunteer tester
Joined: 26 Oct 00
Posts: 1005
Credit: 5,622,795
RAC: 2
United States
Message 5820 - Posted: 9 Jul 2004, 21:59:16 UTC - in response to Message 5660.

> I know that there have been an awful lot of people with duplicate clients
> due to installation problems and the recent URL detach/re-attach debacle
> and that perhaps the file size was shrinking as a result of people
> merging their duplicate machines together - but I can't see that
> possibly accounting for the change in size you are talking about.

This would not affect the stats I'm looking at. I am only downloading the team and user files. The hosts file would be affected by people merging hosts, but I am not touching that one.

> I don't know much about it, but I would say that there might be less ACTIVE
> users because there are probably a decent number of people who have either
> quit, or gone back to SETI Classic until most of the bugs have been worked out
> and/or they can actually consistently get workunits, and thus now have an
> inactive status.

I suppose I phrased that incorrectly... my current definition of 'active' is "more than 0 credit". So even someone who crunched a few BOINC units and then went back to Classic would be 'active' on my site, as long as they had received some granted credit. This may change in the future, but that's how it is now.

I had a theory that maybe they shut everything down for maintenance while the stats files were being generated, but the fact that the root elements are closed correctly seems to rule this out. Unless maybe they shut down the database while the XML-generating script was still running: getting no more data from the database, the script thought it was done, closed the root tag and quit...
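That second theory also explains why a plain well-formedness check would not have caught the problem: a dump that simply stops early but still closes its root tag parses fine. Below is a minimal Python sketch of two sanity checks, assuming you keep the previous day's record count around. The first catches a file cut off mid-write; the second flags an unusually large day-over-day drop, such as 10,721 to 9,865. The 5% threshold is an arbitrary, hypothetical choice.

```python
import io
import xml.etree.ElementTree as ET


def dump_is_complete(xml_stream):
    """True if the XML parses to the end, i.e. the root element is closed.

    This catches a file cut off mid-write, but NOT a dump whose generator
    ran out of rows early and still closed the root tag cleanly.
    """
    try:
        for _, elem in ET.iterparse(xml_stream):
            elem.clear()  # keep memory low on very large files
    except ET.ParseError:
        return False
    return True


def suspicious_drop(previous, current, max_drop=0.05):
    """True if the record count fell by more than max_drop (a fraction).

    The 5% default is an arbitrary, hypothetical threshold.
    """
    if previous <= 0:
        return False
    return (previous - current) / previous > max_drop


if __name__ == "__main__":
    print(dump_is_complete(io.BytesIO(b"<users><user/></users>")))  # True
    print(dump_is_complete(io.BytesIO(b"<users><user/>")))          # False
    print(suspicious_drop(10721, 9865))  # True: roughly an 8% drop
```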

Toby
Volunteer tester
Joined: 26 Oct 00
Posts: 1005
Credit: 5,622,795
RAC: 2
United States
Message 6007 - Posted: 10 Jul 2004, 8:44:37 UTC

Well, looks like today's stats are back to normal. So, barring any other witnesses, I am going to assume that my last theory was correct and proclaim myself supreme ruler of... errr... wait - never mind :)

bfarrant
Joined: 4 Jun 99
Posts: 228
Credit: 36,710
RAC: 0
Canada
Message 6217 - Posted: 10 Jul 2004, 19:29:38 UTC - in response to Message 6007.

> Well, looks like today's stats are back to normal. So barring any other
> witnesses, I am going to assume that I was correct in my last theory and
> proclaim myself supreme ruler of... errr... wait - never mind :)
>
Well, c'mon, let's hear it. Man, give you KWSN'ers a shrubbery and you think you own the world. NI!

ColdRain
Joined: 28 Feb 02
Posts: 14
Credit: 1,092,701
RAC: 0
Belgium
Message 6225 - Posted: 10 Jul 2004, 20:14:41 UTC - in response to Message 5683.

> I cannot figure out why we - apparently just "commoners" in the eyes of the
> developers - should receive this message when trying to get some stats files:
> "Forbidden. You don't have permission to access /sah/stats/ on this server."

One cannot browse the directory, but the files are accessible ...

Dusty33
Volunteer tester
Joined: 16 Jul 01
Posts: 24
Credit: 3,368,100
RAC: 0
United States
Message 6391 - Posted: 11 Jul 2004, 6:30:33 UTC - in response to Message 6225.

> > I cannot figure out why we - apparently just "commoners" in the eyes of the
> > developers - should receive this message when trying to get some stats files:
> > "Forbidden. You don't have permission to access /sah/stats/ on this server."
>
> One cannot browse the directory, but the files are accessible ...
>

How... How do we get them? It's been asked many times, but what is the answer? Please, somebody tell us how, and what needs to be done, so that those of us who would like to make our own stat pages for our teams can make them work correctly.

Darren
Volunteer tester
Joined: 2 Jul 99
Posts: 259
Credit: 275,618
RAC: 0
United States
Message 6394 - Posted: 11 Jul 2004, 6:42:57 UTC - in response to Message 6391.


> How.... How do we get them???? It's been asked many times but what is the
> answer? Please somebody tell us or me how and what is needed to be done so
> that those of us that would like to make our own stat pages for our teams
> that work correctly.

If you know the name of the file you need you can get it directly - you just can't see the list of filenames to pick and choose from.

The only filenames I know are user_id.gz, host_id.gz and team_id.gz

And be forewarned - they are big files. The user_id.gz file is about 150MB gzipped and opens to about 1 GB.
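Since the files can be fetched directly by name, a download script only needs the full URL. Below is a minimal Python sketch, assuming the directory behind the 403 error is `http://setiweb.ssl.berkeley.edu/sah/stats/` (inferred from the `/sah/stats/` path and the stats.php host mentioned in this thread, and very likely no longer online). The file is streamed to disk in chunks, since user_id.gz alone is about 150 MB.

```python
import shutil
import urllib.parse
import urllib.request

# Hypothetical: inferred from the "/sah/stats/" path in the 403 message and the
# setiweb.ssl.berkeley.edu host mentioned in this thread; likely no longer online.
BASE_URL = "http://setiweb.ssl.berkeley.edu/sah/stats/"

# The file names mentioned in this thread (the directory listing itself was hidden).
STATS_FILES = ("user_id.gz", "host_id.gz", "team_id.gz")


def stats_url(filename, base=BASE_URL):
    """Build the direct URL for a stats file whose name you already know."""
    return urllib.parse.urljoin(base, filename)


def download(filename, dest, base=BASE_URL, chunk_size=1 << 20):
    """Stream a large stats file to disk in chunks (1 MiB by default)."""
    with urllib.request.urlopen(stats_url(filename, base)) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out, chunk_size)


if __name__ == "__main__":
    for name in STATS_FILES:
        print(stats_url(name))
```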



Dusty33
Volunteer tester
Joined: 16 Jul 01
Posts: 24
Credit: 3,368,100
RAC: 0
United States
Message 6397 - Posted: 11 Jul 2004, 7:01:46 UTC - in response to Message 6394.

>
> > How.... How do we get them???? It's been asked many times but what is the
> > answer? Please somebody tell us or me how and what is needed to be done so
> > that those of us that would like to make our own stat pages for our teams
> > that work correctly.
>
> If you know the name of the file you need you can get it directly - you just
> can't see the list of filenames to pick and choose from.
>
> The only filenames I know are user_id.gz, host_id.gz and team_id.gz
>
> And be forewarned - they are big files. The user_id.gz file is about 150MB
> gzipped and opens to about 1 GB.
>

Thank you very much.

Toby
Volunteer tester
Joined: 26 Oct 00
Posts: 1005
Credit: 5,622,795
RAC: 2
United States
Message 6400 - Posted: 11 Jul 2004, 7:12:16 UTC

> Well, c'mon, let's hear it. Man, give you KWSN'ers a shrubbery and
> you think you own the world. NI!
You mean we don't? Dang. I guess Lizard has been brainwashing us again!

I'm kind of wondering if Berkeley turned off the directory listing to make it just a little more difficult to download the files, since they probably don't want thousands of people downloading the stats every day...

In addition to the files Darren mentioned, there is also tables.xml, which gives you a UNIX timestamp of the last time the files were updated, so you don't have to download the whole 150 MB just to find out whether the stats have been updated. There is also db_dump.xml, which just describes the other files in XML. Those are the only ones I remember from the short time when the directory listing was still turned on.
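The tables.xml check can run before touching any of the big files. Below is a minimal Python sketch; it assumes the timestamp lives in an `<update_time>` element (a guess about the file's layout, not confirmed here) and that you store the timestamp from your last import yourself.

```python
import xml.etree.ElementTree as ET


def update_time(tables_xml):
    """Return the UNIX timestamp of the last stats update from tables.xml text.

    Assumes (hypothetically) the value sits in an <update_time> element.
    """
    root = ET.fromstring(tables_xml)
    value = root.findtext(".//update_time")  # look at any depth below the root
    if value is None:
        raise ValueError("no <update_time> element in tables.xml")
    return int(value)


def stats_updated(tables_xml, last_seen):
    """True if the dump is newer than the timestamp saved at our last import."""
    return update_time(tables_xml) > last_seen


if __name__ == "__main__":
    sample = "<tables><update_time>1089441600</update_time></tables>"
    print(stats_updated(sample, last_seen=1089400000))  # True: time to fetch the big files
```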

Now if I could just get a more efficient XML parser...

ColdRain
Joined: 28 Feb 02
Posts: 14
Credit: 1,092,701
RAC: 0
Belgium
Message 6477 - Posted: 11 Jul 2004, 12:04:30 UTC

Hmmm, I thought this was obvious ;-)
On http://setiweb.ssl.berkeley.edu/sah/stats.php there is a link to the stats format used (http://boinc.berkeley.edu/db_dump.php). The file describing the different files in the stats directory is db_dump.xml; it's right on that page, RTFM ;-ppp
So just download this XML file from the non-browsable stats folder, and you know which XML files are in it: user_id.gz, host_id.gz and team_id.gz.
The db_dump.php page also mentions a tables.xml (containing the UNIX timestamp of the last stats update), just like Toby said ;-)

RD
Joined: 20 May 99
Posts: 1
Credit: 68,334
RAC: 0
United Kingdom
Message 7149 - Posted: 12 Jul 2004, 20:32:53 UTC - in response to Message 6477.

> Hmmm, I thought this was obvious ;-)
> On http://setiweb.ssl.berkeley.edu/sah/stats.php there is a link to the used
> stats-format: http://boinc.berkeley.edu/db_dump.php The file describing the
> different files in the stats directory is db_dump.xml, it's right on that page,
> RTFM ;-ppp
> So just download this XML file from the non-browsable stats folder, and you
> know which XML files are in it: user_id.gz, host_id.gz and team_id.gz
> The db_dump.php page also mentions a tables.xml (containing the unix timestamp
> of last stats update), just like Toby said ;-)

Actually, it's not "obvious ;-)". Just getting to /sah/stats.php is a challenge, as it's not linked from the help pages, and the db_dump.xml file does not say what the file extension is either.
But thanks for the sarcasm, it was most helpful.

ColdRain
Joined: 28 Feb 02
Posts: 14
Credit: 1,092,701
RAC: 0
Belgium
Message 7316 - Posted: 13 Jul 2004, 7:49:53 UTC - in response to Message 7149.
Last modified: 13 Jul 2004, 7:52:06 UTC

>> Hmmm, I thought this was obvious ;-)
>> On http://setiweb.ssl.berkeley.edu/sah/stats.php there is a link to the
>> ... snip ...
>
> Actually it's not "obvious ;-)" just getting to /sah/stats.php is a challenge
> as it's not linked from the help pages and the db_dump.xml file does not say
> what the file extension is either.
> But thanks for the sarcasm, it was most helpful.

Actually, the Seti@Boinc home page has a link to "Statistics and leaderboards". The 4th link on that page is "Other statistics". Guess what: it links to http://setiweb.ssl.berkeley.edu/sah/stats.php. Two clicks. A challenge???

The db_dump.xml file says (I had to use [ and ] instead of angle brackets to post it here):
[output]
[compression]gzip[/compression]
[/output]

Of course, you need to know that the extension for gzipped files is .gz.
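Putting the two pieces together: the `<compression>` value in db_dump.xml tells you which extension to append to a file's base name. Below is a minimal Python sketch using the real angle-bracket form of the snippet above. Only the gzip to .gz mapping is confirmed in this thread, and the base name (e.g. `user_id`) is passed in by hand, because where db_dump.xml stores it isn't shown here.

```python
import xml.etree.ElementTree as ET

# Only gzip -> .gz is confirmed in this thread.
EXTENSIONS = {"gzip": ".gz"}


def output_filename(base_name, output_xml):
    """Append the extension implied by an <output> block's <compression> value.

    base_name (e.g. "user_id") is supplied by the caller; where db_dump.xml
    keeps it is not shown in this thread.
    """
    output = ET.fromstring(output_xml)
    compression = (output.findtext("compression") or "").strip()
    if not compression:
        return base_name  # assumption: no <compression> means an uncompressed file
    if compression not in EXTENSIONS:
        raise ValueError(f"unknown compression: {compression!r}")
    return base_name + EXTENSIONS[compression]


if __name__ == "__main__":
    snippet = "<output><compression>gzip</compression></output>"
    print(output_filename("user_id", snippet))  # user_id.gz
```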


ColdRain
Joined: 28 Feb 02
Posts: 14
Credit: 1,092,701
RAC: 0
Belgium
Message 7407 - Posted: 13 Jul 2004, 21:34:14 UTC

The stats folder is browseable again !
Better be quick than sorry ;-)


Copyright © 2014 University of California