Is there a way or a place to get the bulk data from: Statistics | Top Computers/Hosts?

Questions and Answers : Web site : Is there a way or a place to get the bulk data from: Statistics | Top Computers/Hosts?


Stubbles
Volunteer tester
Joined: 29 Nov 99
Posts: 358
Credit: 5,909,255
RAC: 0
Canada
Message 1786904 - Posted: 12 May 2016, 4:04:22 UTC

I was looking at the Top Computers in the Statistics section for GPUs like mine to see if I could figure out the RAC range I "could" get with my old rig and my new GTX 750 Ti.

The listing covers the top 10,000 PCs, but only 20 are listed per page.

I tried copying a few pages into a spreadsheet, but noticed that I was getting duplicate Computer IDs by the time I copied the next page.
At first I thought it was because the pages were being updated at that moment, so I waited a few hours, only to run into the same issue.

Is there a way to modify the URL to get a longer listing?
If not, is there another place where I could get the same bulk data as Statistics | Top Computers/Hosts, including the GPU details?

Cheers,
Rob :-}
ID: 1786904
Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 1786974 - Posted: 12 May 2016, 10:55:51 UTC - in response to Message 1786904.  

You can download the same statistics data that the statistics sites use from https://setiathome.berkeley.edu/stats/.
ID: 1786974
Stubbles
Volunteer tester
Joined: 29 Nov 99
Posts: 358
Credit: 5,909,255
RAC: 0
Canada
Message 1787092 - Posted: 12 May 2016, 20:10:40 UTC - in response to Message 1786974.  

Thanks Jord!

Just in case others are looking for the same info in the future: I noticed there are no instructions (ReadMe) on what to do with the files in the directory you provided.
I opened the compressed file users.gz and there is one file inside it with no file extension, so others might have no clue what format it is in.

I searched the forums and Google for ".berkeley.edu/stats/",
and the only relevant info I found was:

Your post from 2014: http://boinc.berkeley.edu/dev/forum_thread.php?id=8906
In it you provide a link to: http://boinc.berkeley.edu/trac/wiki/CreditStats

Personally, I don't have any experience dealing with XML files.
Are there other posts or links you could provide as to what an XML newbie should look into next?
FYI, I tried opening the "user" file mentioned above with MS XML Editor, but it freezes. My guess is that the 100 MB file is too big for a free desktop app provided by M$.

Cheers,
Rob :-}
ID: 1787092
Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 1787108 - Posted: 12 May 2016, 21:27:57 UTC - in response to Message 1787092.  
Last modified: 12 May 2016, 21:29:34 UTC

Most of the XML files in the BOINC environment aren't real XML files; they use an XML made specifically for BOINC. So you ought to be able to open these files in Windows Notepad, or better yet in the third-party Notepad++, because it honours the end-of-line characters, whereas Notepad ignores them and puts everything on one line.

Oh, and if Notepad++ can't open it either, there's always Total Commander's Lister, which can open anything no matter how big it is (I used it to open an 18 GB log file).
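Since these files appear to keep one element per line (an assumption drawn from the excerpts quoted later in this thread, not a documented guarantee), a plain text reader can pick fields out line by line without an XML library and without loading the whole file. A minimal C++17 sketch; `parse_field_line` is a hypothetical helper name:

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <utility>

// Hypothetical helper: splits a single line such as
//   "<p_vendor>AuthenticAMD</p_vendor>"
// into {"p_vendor", "AuthenticAMD"}.
// ASSUMPTION: one element per line, as in the BOINC files quoted in this
// thread. Record tags (<host_info>, </host_info>) and empty elements
// (<atirt_detected/>) yield std::nullopt. XML entities such as &amp; are
// returned as-is, not decoded.
std::optional<std::pair<std::string, std::string>>
parse_field_line(const std::string& line) {
    const auto open = line.find('<');
    if (open == std::string::npos) return std::nullopt;
    const auto open_end = line.find('>', open);
    if (open_end == std::string::npos) return std::nullopt;
    const std::string tag = line.substr(open + 1, open_end - open - 1);
    // Skip closing tags ("/tag") and self-closing ones ("tag/").
    if (tag.empty() || tag[0] == '/' || tag.back() == '/') return std::nullopt;
    const auto close = line.find("</" + tag + ">", open_end + 1);
    if (close == std::string::npos) return std::nullopt;  // not closed on this line
    return std::make_pair(tag, line.substr(open_end + 1, close - open_end - 1));
}
```

Because only one line is held in memory at a time, the same approach works on a multi-gigabyte file read with `std::getline`.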
ID: 1787108
Stubbles
Volunteer tester
Joined: 29 Nov 99
Posts: 358
Credit: 5,909,255
RAC: 0
Canada
Message 1789367 - Posted: 22 May 2016, 0:03:54 UTC - in response to Message 1787108.  
Last modified: 22 May 2016, 0:11:59 UTC

Thanks for letting me know about the Lister app; I'm impressed!!!

Here is where I am many days later:

The file host.gz is about 655 MB but when it is uncompressed, the host file is 4.3 GB!!!

I was able to import the last thousand or so records into Excel, but that only gives me the last computers to attach to S@H during the last few years.
Excel has a limit of slightly more than 1 million rows, so I tried to use MS Access ...which I hadn't used in many years!
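Instead of importing everything into Excel or Access, the dump can be streamed once and filtered down to the hosts with a particular GPU before it reaches a spreadsheet. A C++17 sketch, assuming (not verified here) that each record sits between `<host>` and `</host>` lines and carries `<id>`, `<userid>`, `<expavg_credit>` (taken to be RAC) and `<coprocs>` fields, one per line; `hosts_to_csv` and `field_of` are hypothetical names:

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <map>
#include <ostream>
#include <sstream>
#include <string>

// Extracts {tag, value} from a one-element line like "<id>42</id>".
// Returns false for anything else (record tags, empty elements).
static bool field_of(const std::string& line, std::string& tag, std::string& value) {
    const auto open = line.find('<');
    if (open == std::string::npos) return false;
    const auto open_end = line.find('>', open);
    if (open_end == std::string::npos) return false;
    tag = line.substr(open + 1, open_end - open - 1);
    if (tag.empty() || tag[0] == '/' || tag.back() == '/') return false;
    const auto close = line.find("</" + tag + ">", open_end + 1);
    if (close == std::string::npos) return false;
    value = line.substr(open_end + 1, close - open_end - 1);
    return true;
}

// Streams the host dump one line at a time (so a 4.3 GB file never has to
// fit in memory) and writes one CSV row per <host> record whose coprocs
// field contains gpu_filter (an empty filter keeps every host).
// ASSUMPTION: field names as described above. Returns rows written.
std::size_t hosts_to_csv(std::istream& in, std::ostream& out, const std::string& gpu_filter) {
    out << "id,userid,expavg_credit\n";
    std::map<std::string, std::string> rec;
    std::string line, tag, value;
    std::size_t rows = 0;
    while (std::getline(in, line)) {
        if (line.find("<host>") != std::string::npos) {
            rec.clear();  // start of a new record
        } else if (line.find("</host>") != std::string::npos) {
            if (rec["coprocs"].find(gpu_filter) != std::string::npos) {
                out << rec["id"] << ',' << rec["userid"] << ','
                    << rec["expavg_credit"] << '\n';
                ++rows;
            }
        } else if (field_of(line, tag, value)) {
            rec[tag] = value;  // remember this field for the current record
        }
    }
    return rows;
}
```

Opened with a `std::ifstream` on the uncompressed host file and a filter such as "GTX 750 Ti", the resulting CSV should be small enough for Excel.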

I've been able to import the last third of the 4.3 GB host file into Access. Unfortunately, the GPU details are stuck in the middle of the longest field, between the BOINC version and the VirtualBox version (when it is available).
Here's one of the longest entries in the "coprocs" field:
[BOINC|7.2.42][CUDA|GeForce GTX 660 Ti|1|2048MB|34043|101][CAL|AMD Radeon HD 7870/7950/7970/R9 280X series (Tahiti)|1|3072MB|1.4.1848|102][INTEL|Intel(R) HD Graphics 4000|1|1195MB||102][vbox|4.2.16]

Yes, that's 3 GPUs from 3 different vendors!!! See: http://setiathome.berkeley.edu/show_host_detail.php?hostid=6828567
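The coprocs value is a sequence of [ ... ] groups whose parts are separated by |, so splitting twice recovers the GPUs. A C++17 sketch; the struct and function names are hypothetical, and treating every group other than BOINC and vbox as a GPU whose first part is the device name is an assumption based on the example above (the meaning of the remaining numbers isn't documented in this thread):

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// One bracketed group, e.g. "[CUDA|GeForce GTX 660 Ti|1|2048MB|34043|101]"
// becomes kind = "CUDA", fields = {"GeForce GTX 660 Ti", "1", "2048MB", "34043", "101"}.
struct CoprocEntry {
    std::string kind;                 // BOINC, CUDA, CAL, INTEL, vbox, ...
    std::vector<std::string> fields;  // remaining '|'-separated parts
};

// Splits a coprocs string into its [ ... ] groups, then each group on '|'.
// Empty parts (such as the "||" in the INTEL group) are kept as "".
std::vector<CoprocEntry> parse_coprocs(const std::string& s) {
    std::vector<CoprocEntry> out;
    std::size_t pos = 0;
    while ((pos = s.find('[', pos)) != std::string::npos) {
        const std::size_t end = s.find(']', pos);
        if (end == std::string::npos) break;  // unterminated group: stop
        std::istringstream parts(s.substr(pos + 1, end - pos - 1));
        CoprocEntry e;
        std::string part;
        bool first = true;
        while (std::getline(parts, part, '|')) {
            if (first) { e.kind = part; first = false; }
            else e.fields.push_back(part);
        }
        out.push_back(e);
        pos = end + 1;
    }
    return out;
}

// ASSUMPTION: every group except BOINC and vbox is a GPU, name first.
std::vector<std::string> gpu_names(const std::string& coprocs) {
    std::vector<std::string> names;
    for (const auto& e : parse_coprocs(coprocs))
        if (e.kind != "BOINC" && e.kind != "vbox" && !e.fields.empty())
            names.push_back(e.fields[0]);
    return names;
}
```

Applied to the host above, `gpu_names` should return the CUDA, CAL and INTEL device names, which could then go into their own columns instead of one long field.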

So after toying with that for a bit too long, I refocused on the other fields for now to get familiar with a much newer version of MS Access.

The date and time fields are not what I have come across in the past.
Q1. Any idea where I can get info on converting something like: 1381254675.63566
All I have been able to figure out is that the period likely separates the date from the time ...but I am far from certain about that.

I have come across a few records that have inconsistencies, such as this one:
http://setiathome.berkeley.edu/show_host_detail.php?hostid=6081632
where the "p_vendor" field holds the data that should be in the "p_model" field.
Q2. Is there someone I should report these inconsistencies to?

Any additional related info, thread or people who might know something about these pseudo-XML files would be greatly appreciated.

Cheers,
Rob :-}
ID: 1789367
BilBg
Volunteer tester
Joined: 27 May 07
Posts: 3720
Credit: 9,385,827
RAC: 0
Bulgaria
Message 1789475 - Posted: 22 May 2016, 14:59:32 UTC - in response to Message 1789367.  

The date and time fields are not what I have come across in the past.
Q1. Any idea where I can get info on converting something like: 1381254675.63566

1381254675.63566 = 08 Oct 2013 17:51:15 GMT
http://www.onlineconversion.com/unix_time.htm
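The value is a Unix timestamp: seconds since 1970-01-01 00:00:00 UTC. The digits after the period are a fraction of a second, not a separate time field. A small C++ sketch of the same conversion the web page does (`unix_to_utc` is a hypothetical name):

```cpp
#include <cassert>
#include <cmath>
#include <ctime>
#include <iomanip>
#include <sstream>
#include <string>

// Converts a Unix timestamp (seconds since 1970-01-01 00:00:00 UTC,
// possibly with a fractional part) into "YYYY-MM-DD HH:MM:SS UTC".
std::string unix_to_utc(double ts) {
    // Keep whole seconds only; the digits after the period are sub-second.
    const std::time_t secs = static_cast<std::time_t>(std::floor(ts));
    const std::tm* p = std::gmtime(&secs);  // UTC broken-down time (shared static buffer)
    if (p == nullptr) return "out of range";
    const std::tm utc = *p;                 // copy before anything else calls gmtime
    std::ostringstream os;
    os << std::put_time(&utc, "%Y-%m-%d %H:%M:%S") << " UTC";
    return os.str();
}
```

For example, `unix_to_utc(1381254675.63566)` gives "2013-10-08 17:51:15 UTC". In Excel, =A2/86400+DATE(1970,1,1) with a date/time cell format gives the same UTC value (A2 being whichever cell holds the timestamp).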


I have come across a few records that have inconsistencies, such as this one:
http://setiathome.berkeley.edu/show_host_detail.php?hostid=6081632
where the "p_vendor" field holds the data that should be in the "p_model" field.
Q2. Is there someone I should report these inconsistencies to?

No need to report it: this computer is running a very, very old BOINC version (3.20.0).

All this info is detected by the BOINC client running on the computer and then reported to the server in sched_request_setiathome.berkeley.edu.xml (you can check this file in your BOINC Data directory):

<host_info>
.......
<p_ncpus>3</p_ncpus>
<p_vendor>AuthenticAMD</p_vendor>
<p_model>AMD Athlon(tm) II X3 455 Processor [Family 16 Model 5 Stepping 3]</p_model>
<p_features>fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 htt pni cx16 syscall nx lm svm sse4a osvw ibs skinit wdt page1gb rdtscp 3dnowext 3dnow</p_features>
<p_fpops>3350943396.226415</p_fpops>
<p_iops>7288347395.347965</p_iops>
.......
<os_name>Microsoft Windows XP</os_name>
<os_version>Professional x86 Edition, Service Pack 3, (05.01.2600.00)</os_version>
</host_info>

<coproc_ati>
   <count>1</count>
   <name>ATI Radeon HD 6570 (NI TURKS) {ASUS EAH6570/DI/1GD3(LP)}</name>
   <req_secs>144881.226448</req_secs>
   <req_instances>0.000000</req_instances>
   <estimated_delay>0.000000</estimated_delay>
   <target>18</target>
   <localRAM>1024</localRAM>
   <uncachedRemoteRAM>128</uncachedRemoteRAM>
   <cachedRemoteRAM>64</cachedRemoteRAM>
   <engineClock>800</engineClock>
   <memoryClock>1050</memoryClock>
   <wavefrontSize>64</wavefrontSize>
   <numberOfSIMD>6</numberOfSIMD>
   <doublePrecision>0</doublePrecision>
   <pitch_alignment>256</pitch_alignment>
   <surface_alignment>256</surface_alignment>
   <maxResource1DWidth>16384</maxResource1DWidth>
   <maxResource2DWidth>16384</maxResource2DWidth>
   <maxResource2DHeight>16384</maxResource2DHeight>
   <CALVersion>1.4.1646</CALVersion>
   <atirt_detected/>
</coproc_ati>



Any additional related info, thread or people who might know something about these pseudo-XML files would be greatly appreciated.

All the stats sites process the same files, so you could ask them.
 


- ALF - "Find out what you don't do well ..... then don't do it!" :)
 
ID: 1789475
Stubbles
Volunteer tester
Joined: 29 Nov 99
Posts: 358
Credit: 5,909,255
RAC: 0
Canada
Message 1790896 - Posted: 27 May 2016, 9:51:33 UTC - in response to Message 1789475.  

Thanks BilBg!

I have now been trying to track down the source of the "country" and "timezone" fields, and your last post helped a lot.
I went down that rabbit hole because I noticed that some users have many hosts in different time zones.
There are even 200+ users with hosts in at least 2 different time zones, with
userid = 10248531 (pool.gridcoin.co) having over 250 hosts in 14 different time zones!
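Finding users whose hosts report more than one timezone only needs each host's userid and timezone pair. A C++17 sketch, assuming those pairs have already been read out of the host dump; `HostTz` and `users_spanning_zones` are hypothetical names:

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <utility>
#include <vector>

// (userid, timezone in seconds) for one host, e.g. taken from the
// userid and timezone fields of each host record.
using HostTz = std::pair<long, int>;

// For every user whose hosts report at least min_zones distinct timezone
// values, returns the number of distinct values.
std::map<long, std::size_t> users_spanning_zones(const std::vector<HostTz>& hosts,
                                                 std::size_t min_zones) {
    std::map<long, std::set<int>> zones;  // userid -> distinct timezones seen
    for (const auto& [user, tz] : hosts) zones[user].insert(tz);
    std::map<long, std::size_t> result;
    for (const auto& [user, tzs] : zones)
        if (tzs.size() >= min_zones) result[user] = tzs.size();
    return result;
}
```

With `min_zones = 2`, the size of the returned map is the "users with hosts in at least 2 time zones" count.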

From what I can figure out, the "country" is set by the user in their account settings on the website http://setiathome.berkeley.edu/home.php
...and the "timezone" comes from the host's system clock (local standard time minus UTC), as per BOINC's HOST_INFO class:
class HOST_INFO {
public:
    int timezone;                 // local STANDARD time - UTC time (in seconds)
    // ... (other members omitted)
};

So, without the IP address, there is no way to accurately determine the geographical area where hosts are located, since both fields can easily be set and changed by the user.
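Because the field is local standard time minus UTC in seconds, turning it into a familiar offset is simple arithmetic: -18000 s / 3600 = -5 h, i.e. UTC-05:00. A C++ sketch; the function names and the table of Canada's six standard-time offsets are illustrative additions, and since the value comes from the host clock, a match only hints at a location:

```cpp
#include <cassert>
#include <cstdlib>
#include <iomanip>
#include <map>
#include <sstream>
#include <string>

// Formats HOST_INFO::timezone (local standard time minus UTC, in seconds)
// as a UTC offset, e.g. -18000 -> "UTC-05:00", -12600 -> "UTC-03:30".
std::string utc_offset(int tz_seconds) {
    const int mins = std::abs(tz_seconds) / 60;
    std::ostringstream os;
    os << "UTC" << (tz_seconds < 0 ? '-' : '+')
       << std::setw(2) << std::setfill('0') << mins / 60 << ':'
       << std::setw(2) << std::setfill('0') << mins % 60;
    return os.str();
}

// Hypothetical lookup of Canada's standard-time zones. The value comes from
// the host clock, so a match is a hint about location, not proof.
std::string canadian_zone(int tz_seconds) {
    static const std::map<int, std::string> zones = {
        {-12600, "Newfoundland"}, {-14400, "Atlantic"}, {-18000, "Eastern"},
        {-21600, "Central"},      {-25200, "Mountain"}, {-28800, "Pacific"}};
    const auto it = zones.find(tz_seconds);
    return it == zones.end() ? "not a Canadian zone" : it->second;
}
```

Combined with the "country" field (hosts of users who chose Canada), counting `canadian_zone` results would give a rough, self-reported split by zone.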

I was hoping to figure out where, across Canada's different time zones, the users and hosts are concentrated.
Any idea how I could figure that out with a high level of accuracy?
(Please note that I am not looking for a specific location, just a general area within a country's different time zones and/or provinces.)

Cheers,
Rob :-}
ID: 1790896
BilBg
Volunteer tester
Joined: 27 May 07
Posts: 3720
Credit: 9,385,827
RAC: 0
Bulgaria
Message 1790997 - Posted: 27 May 2016, 18:38:01 UTC - in response to Message 1790896.  

Not "high level of accuracy" but you may spot the most active areas on this animation:
http://setiathome.berkeley.edu/kiosk/

"By country" map:
(I doubt the stats sites can supply better accuracy, since IP addresses are private and projects keep them secret.)
http://boincstats.com/pl/stats/0/project/detail/country

("By time zone" may only work for "wide" (west-east) countries such as Russia, and not for small or "tall" (north-south) countries such as Chile.)

P.S.
The page at stats.free-dc doesn't work for me ("Loading Stats Data..." never finishes):
http://stats.free-dc.org/stats.php?page=countries&proj=sah
 


 
ID: 1790997
BilBg
Volunteer tester
Joined: 27 May 07
Posts: 3720
Credit: 9,385,827
RAC: 0
Bulgaria
Message 1791028 - Posted: 27 May 2016, 20:49:54 UTC - in response to Message 1790997.  

Not "high level of accuracy" but you may spot the most active areas on this animation:
http://setiathome.berkeley.edu/kiosk/

If you look at the page source (Ctrl+U):
view-source:http://setiathome.berkeley.edu/kiosk/

... at the bottom are these links:
http://setiathome.berkeley.edu/kiosk/js/in-locs.js
http://setiathome.berkeley.edu/kiosk/js/out-locs.js

They are lists of entries like:
... {"la":"46.81228","lo":"-71.21454","cc":"CA"}, {"la":"51.45625","lo":"-0.97113","cc":"GB"}, ...

I think they change every day, so you could save them and find a way to analyse them.
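Each entry is a small JSON-like object, so a regular expression can pull out the latitude, longitude and country code. A C++ sketch, assuming the keys always appear in the order la, lo, cc with quoted values, as in the excerpt above; `Loc`, `parse_locs` and `count_by_country` are hypothetical names:

```cpp
#include <cassert>
#include <map>
#include <regex>
#include <string>
#include <vector>

struct Loc { double lat; double lon; std::string cc; };

// Pulls every {"la":"..","lo":"..","cc":".."} object out of the text of
// in-locs.js / out-locs.js; the surrounding JavaScript is ignored.
// ASSUMPTION: key order and quoting exactly as in the excerpt above.
std::vector<Loc> parse_locs(const std::string& js) {
    static const std::regex entry(
        R"re(\{"la":"(-?[0-9.]+)","lo":"(-?[0-9.]+)","cc":"([A-Z]{2})"\})re");
    std::vector<Loc> out;
    for (std::sregex_iterator it(js.begin(), js.end(), entry), end; it != end; ++it) {
        const std::smatch& m = *it;
        out.push_back({std::stod(m[1].str()), std::stod(m[2].str()), m[3].str()});
    }
    return out;
}

// Counts locations per two-letter country code.
std::map<std::string, int> count_by_country(const std::vector<Loc>& locs) {
    std::map<std::string, int> counts;
    for (const auto& l : locs) ++counts[l.cc];
    return counts;
}
```

Keeping only cc == "CA" and sorting by lon gives a rough west-to-east spread of Canadian activity, which is about as fine-grained as a time-zone breakdown gets anyway.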
 


 
ID: 1791028



 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.