ntpckr questions

Message boards : Number crunching : ntpckr questions

merle van osdol

Joined: 23 Oct 02
Posts: 809
Credit: 1,980,117
RAC: 0
United States
Message 1574436 - Posted: 19 Sep 2014, 17:59:50 UTC

This is a quote from ntpckr FAQ:

In the past we didn't have the computing power to do this kind of analysis in "real time," i.e. as new data arrive. Instead there was a painful process of accumulating resources to analyze our entire, large data set in one pass - what we called "turning the crank." Finding pairs (or triplets, etc.) of similar signals in a database of billions of signals is quite difficult - imagine playing a game of Concentration with billions of cards. That's why it usually took years between each time we turned the crank, during which our hardware/tools evolved enough to require reinventing the wheel again and again. However the NTPCkr keeps up with incoming data close to real time, updating our list of most interesting candidates every day (or hour, or minute!).
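To make the "Concentration" analogy a bit more concrete: comparing every signal with every other one costs on the order of n² comparisons, which is hopeless for billions of signals. A common way around this is to bucket signals on a grid so each one is only compared with its neighbours. The Python sketch below is a minimal illustration under assumed inputs (a hypothetical `Signal` record with sky position in degrees and frequency in Hz, and hypothetical tolerances `pos_tol` and `freq_tol`). It is not how NTPCkr is actually implemented, and it ignores RA wrap-around at 360°.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Signal:
    sig_id: int   # hypothetical unique signal id
    ra: float     # right ascension, degrees (RA wrap-around at 360 is ignored here)
    dec: float    # declination, degrees
    freq: float   # detection frequency, Hz


def find_candidate_pairs(signals, pos_tol, freq_tol):
    """Return sorted (id, id) pairs whose RA, Dec and frequency each differ by at most the tolerance."""

    # Cells are exactly as wide as the tolerances, so any matching pair
    # must fall in the same cell or in an adjacent one.
    def cell(s):
        return (math.floor(s.ra / pos_tol),
                math.floor(s.dec / pos_tol),
                math.floor(s.freq / freq_tol))

    grid = defaultdict(list)
    for s in signals:
        grid[cell(s)].append(s)

    pairs = set()
    for s in signals:
        cx, cy, cz = cell(s)
        # Look only at the 3 x 3 x 3 block of neighbouring cells.
        for dx, dy, dz in product((-1, 0, 1), repeat=3):
            for t in grid.get((cx + dx, cy + dy, cz + dz), ()):
                if t.sig_id <= s.sig_id:
                    continue  # skip self-matches and count each pair once
                if (abs(s.ra - t.ra) <= pos_tol
                        and abs(s.dec - t.dec) <= pos_tol
                        and abs(s.freq - t.freq) <= freq_tol):
                    pairs.add((s.sig_id, t.sig_id))
    return sorted(pairs)
```

Because each grid cell is as wide as the matching tolerance, every signal is checked against 27 neighbouring cells rather than against the whole data set, which is what turns the "billions of cards" problem into something tractable.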

Is there a target date for when the next iteration of ntpckr will appear?
What does it depend on: talent, money, or advances in basic hardware/software?
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 1574453 - Posted: 19 Sep 2014, 18:25:31 UTC
Last modified: 19 Sep 2014, 18:27:59 UTC

My understanding is that they got a donated, spec-built server specifically for running ntpckr, but shortly after they fired that process up, it nearly crippled the science database, so it was turned off.

Then work (and more hardware donations) began on improving the database's I/O throughput. We now have database server hardware that can handle immense I/O loads, but it was then discovered that the I/O limitations lie in the software itself (or the drivers for the hardware), which needs tweaking and fine-tuning, or else alternatives have to be found, tested, and eventually deployed.

I believe this is the stage it is still at. I believe it was generally decided that the database software was not really designed for a database as large as ours, so we are in "uncharted territory," so to speak.

However, I have suggested before that if ntpckr is such a load on the database, why not put a copy of the weekly backup directly on the ntpckr machine and let it chew through that for however long it takes to get "caught up"? Then give it an incremental update of the database, which it would chew through in short order, and maybe one more incremental update after that. From there, it should be able to handle "near real time" with minimal impact.

But I don't know whether that's even a feasible option.
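The "bulk backup first, then incremental updates" idea can be sketched as a toy Python class. Everything here is hypothetical: the `Signal` record, the assumption that signal IDs only ever increase, and the grid-based matching. It only illustrates the pattern (match each new signal against everything already seen, and keep a high-water mark so rows that overlap between the backup and the first delta are skipped); it says nothing about whether this would be workable against the real science database.

```python
import math
from collections import defaultdict
from itertools import product
from typing import NamedTuple


class Signal(NamedTuple):
    sig_id: int   # assumed to increase monotonically as signals arrive
    ra: float     # degrees
    dec: float    # degrees
    freq: float   # Hz


class IncrementalMatcher:
    """Hypothetical matcher: bulk-load a backup snapshot, then apply incremental deltas."""

    def __init__(self, pos_tol, freq_tol):
        self.pos_tol, self.freq_tol = pos_tol, freq_tol
        self.grid = defaultdict(list)   # grid cell -> signals already ingested
        self.watermark = -1             # highest sig_id ingested so far
        self.pairs = []                 # (older_id, newer_id) matches found

    def _cell(self, s):
        # Cells as wide as the tolerances: matches sit in the same or an adjacent cell.
        return (math.floor(s.ra / self.pos_tol),
                math.floor(s.dec / self.pos_tol),
                math.floor(s.freq / self.freq_tol))

    def ingest(self, batch):
        """Match each not-yet-seen signal against all earlier ones; return how many were added."""
        added = 0
        for s in sorted(batch, key=lambda sig: sig.sig_id):
            if s.sig_id <= self.watermark:
                continue  # already covered by the snapshot or an earlier delta
            cx, cy, cz = self._cell(s)
            for dx, dy, dz in product((-1, 0, 1), repeat=3):
                for t in self.grid.get((cx + dx, cy + dy, cz + dz), ()):
                    if (abs(s.ra - t.ra) <= self.pos_tol
                            and abs(s.dec - t.dec) <= self.pos_tol
                            and abs(s.freq - t.freq) <= self.freq_tol):
                        self.pairs.append((t.sig_id, s.sig_id))
            self.grid[self._cell(s)].append(s)
            self.watermark = s.sig_id
            added += 1
        return added
```

For example, ingesting a two-signal snapshot and then a delta that repeats one snapshot row plus one new nearby signal adds only the new signal and records a single pair. The expensive part is the first call on the full backup; each later delta only costs work proportional to the new signals and their neighbours.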
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)
HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1574465 - Posted: 19 Sep 2014, 18:43:43 UTC - in response to Message 1574453.  

I think Matt mentioned they were looking into splitting the DB into smaller chunks so that they could actually work with it.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group (http://tinyurl.com/8y46zvu)
merle van osdol

Joined: 23 Oct 02
Posts: 809
Credit: 1,980,117
RAC: 0
United States
Message 1574545 - Posted: 19 Sep 2014, 20:18:38 UTC - in response to Message 1574465.  

I, of course, am not an expert in these matters, BUT it sure sounds like this problem is not at all insurmountable, even in the short term. Well, somebody has to do the work. I can't, we can't, but somebody is responsible to an awfully large force of people and machines who are doing their part. There must be some way out of this logjam. Can we at least have a dialogue about it? Yes, I am new here and I don't know much about all this, BUT maybe you need a new voice to ask these questions? Just my 2 pennies, and they ain't worth much, that I'll concede.

I shouldn't even be sending this and I know it, but can't help it. Sorry.
Dimly Lit Lightbulb 😀
Volunteer tester
Joined: 30 Aug 08
Posts: 15399
Credit: 7,423,413
RAC: 1
United Kingdom
Message 1574579 - Posted: 19 Sep 2014, 21:13:50 UTC - in response to Message 1574545.  

I'd say it's not insurmountable, but there are limits: manpower, time and, as stated above, technological issues that crop up along the way. I have no doubt the folks in the lab are chipping away at things; it just takes time. Imagine if you were building a new cruncher but had only ten minutes each day to work on it. It'd take a while to get it finished :)

Member of the People Encouraging Niceness In Society club.

Cosmic_Ocean
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 1574624 - Posted: 19 Sep 2014, 21:39:31 UTC

Well, since you are new here and don't know much about the history, in short: S@H started as a project that was only funded/planned/scheduled to last a few short years. Once the initial funding/grant ran out, it has continued ever since solely on donations, either from any of us on the forum sending $5 their way or, for instance, from the GPU Users Group (GPUUG) pooling resources to buy all the parts and assemble a complete server (or have one custom-built through HP so it has a real warranty) to donate so the project can keep going.

In short, the three employees are basically part-time volunteers. My memory may be a little fuzzy, but the last I heard, there are three main people who effectively share the paycheck of just one part-time employee. So of course, along the way, there have been improvements that require less human intervention, specifically physical intervention (network KVMs, being able to "telnet" in and kick things, and, in extreme circumstances, networked power strips that let them power cycle individual outlets to reboot a server remotely).

The move to the co-location facility resolved several issues. It allowed full use of the gigabit connection we've had for many years but could only use 100 Mbit of; it solved the problem of the barely adequate, not terribly reliable air-conditioning system "up the hill" in the lab; and, best of all, the co-lo is staffed 24/7/365, so as soon as a problem arises, somebody is there to respond and put in the order to get it fixed.

But the bottom line is, these guys have a job/task/duty log that would easily fill several months of 40-hour work weeks for all three of them, and they're not paid anywhere near enough for that. Often they don't have much time to spend on improvements, since the little time they DO have goes to fixing problems that have come up.

This whole distributed computing concept is essentially an ongoing experiment. New problems are always popping up, and those need fixing before forward progress can be made. Like Sparky said, imagine you were putting together a new cruncher and you had none of the parts, and you had to research everything as well. Now imagine that you can only work on it for 3-5 minutes every other day. Something that should take maybe an hour to research and order parts, 2 hours to assemble, and another 2 hours to install an OS and get the software going gets stretched out into a 6-month process. And by the time you have a finished product, half the hardware has been superseded by better components, so you basically have to start over if you still want the "bleeding edge" of what's available.

If you want really good insight into what's been going on and the problems that have been faced over the past few years, go over to the Technical News and start reading from about 2009. That'll give you a pretty good picture of the past 3-5 years.
merle van osdol

Joined: 23 Oct 02
Posts: 809
Credit: 1,980,117
RAC: 0
United States
Message 1574640 - Posted: 19 Sep 2014, 21:58:37 UTC - in response to Message 1574624.  

Thanks, guys, for giving of your time to respond to my questions. Now it is easy to understand 'how things are'. Also, thanks for not flaming me. I sort of expected it.

©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.