Panic Mode On (109) Server Problems?

Profile Jimbocous Project Donor
Volunteer tester
Joined: 1 Apr 13
Posts: 1856
Credit: 268,616,081
RAC: 1,349
United States
Message 1913140 - Posted: 15 Jan 2018, 6:49:57 UTC

34 channels left on the three remaining tapes. Could be getting dried up ...
ID: 1913140 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13835
Credit: 208,696,464
RAC: 304
Australia
Message 1913141 - Posted: 15 Jan 2018, 6:50:38 UTC - in response to Message 1913140.  
Last modified: 15 Jan 2018, 7:24:37 UTC

34 channels left on the three remaining tapes. Could be getting dried up ...

That'll take a load off of the servers.

Edit- now down to 24 on 1 file.
Edit- make that 16.
Edit- make that 0.
Last 6 are in progress.

No more till they load up some new files, hopefully tomorrow some time.
Grant
Darwin NT
ID: 1913141 · Report as offensive
Profile Jimbocous Project Donor
Volunteer tester
Joined: 1 Apr 13
Posts: 1856
Credit: 268,616,081
RAC: 1,349
United States
Message 1913158 - Posted: 15 Jan 2018, 10:19:47 UTC - in response to Message 1913141.  

No more till they load up some new files, hopefully tomorrow some time.

Soon, I hope. Already below freezing and snow and colder temps are in the forecast. May have to add some Einstein when I awake ...
ID: 1913158 · Report as offensive
Ghia
Joined: 7 Feb 17
Posts: 238
Credit: 28,911,438
RAC: 50
Norway
Message 1913162 - Posted: 15 Jan 2018, 10:38:51 UTC - in response to Message 1913158.  

No more till they load up some new files, hopefully tomorrow some time.

Soon, I hope. Already below freezing and snow and colder temps are in the forecast. May have to add some Einstein when I awake ...

Nothing coming down the pipe for hours now, "no tasks available".
Doesn't bode well for tomorrow's outrage.
Humans may rule the world...but bacteria run it...
ID: 1913162 · Report as offensive
Profile Jimbocous Project Donor
Volunteer tester
Joined: 1 Apr 13
Posts: 1856
Credit: 268,616,081
RAC: 1,349
United States
Message 1913168 - Posted: 15 Jan 2018, 10:55:31 UTC - in response to Message 1913162.  

No more till they load up some new files, hopefully tomorrow some time.

Soon, I hope. Already below freezing and snow and colder temps are in the forecast. May have to add some Einstein when I awake ...

Nothing coming down the pipe for hours now, "no tasks available".
Doesn't bode well for tomorrow's outrage.

Yeah, no tapes in queue to get split, so no work until some get loaded, apparently manually.
I might have to borrow a cat to stay warm :)
ID: 1913168 · Report as offensive
kittyman Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51477
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1913176 - Posted: 15 Jan 2018, 12:28:53 UTC

Eric tried to get things going last night by remote, but could not.
He said he will go at it again this morning after he confers with Jeff.

Meow.
"Time is simply the mechanism that keeps everything from happening all at once."

ID: 1913176 · Report as offensive
Al Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Joined: 3 Apr 99
Posts: 1682
Credit: 477,343,364
RAC: 482
United States
Message 1913204 - Posted: 15 Jan 2018, 15:50:15 UTC - in response to Message 1913139.  

If I/O contention is the issue, I suppose it would depend on what I/O is causing the bottleneck. The GBT splitters are all on Centurion, and I don't see any other processes that would contend with them there. However, they must be hitting the BOINC DB, probably on Oscar, and feeding the split files to the scheduler over on Synergy. The file deleters are over on Georgem and Bruno, so it wouldn't seem as if they would contend with the splitters anywhere but at the DB.

Keep in mind they are all dealing with the same data files on the one server, and the one database on the one server.
The rate at which work is being returned is also having an impact on things; when the received-last-hour figure falls back, the splitters & deleters are both able to do a bit more work. There's a lot of disk activity when just a single WU is sent out, and then again when its result is returned. With 145k WUs per hour being returned & sent out, that's one hell of a file server load, not to mention the database keeping track of everything.
These comments have led me to a possibly naive, but very basic question. Is BOINC truly scalable? My gut feeling, having never earned a single credit on any other project and thus not knowing their volumes, is that it must be. But if so, is the problem with how our work is configured and distributed, with how SETI was originally designed to do work, with hardware that just isn't up to the task, or with something else?

I believe (but don't quote me) that SETI was the original distributed computing project. Which is cool and all that, but that might carry some drawbacks as well. Being the originator of something (oh, I don't know, cell phones, "high speed" internet) sometimes locks you into a certain, usually quite expensive, set of circumstances, and when the 2nd gen of whatever comes along from competitors who take your thing and improve on it, you're often stuck with what you have, and it's very expensive to upgrade, especially if there has been a paradigm shift.

I am wondering if we are sort of in that situation right now? We were the first, blazed the trail and led the way. Those that followed us saw it was great, but also saw the shortcomings in how we initially did it, and made adjustments and improvements to theirs that we couldn't easily make, due to the time and treasure already invested, and possibly also due to trying to keep to the stated goal of supporting almost every device, old to new (not really, but you know what I mean). Might that inability to optimize be helping cause these headaches?

I honestly have no idea if what I proposed has any basis in reality or not, as I haven't been on the other side of things from the day I processed my first WU. I am just tossing out an idea as to what might be part of the reason we're dealing with DB issues and such. Is it hardware limitations? Software limitations? Inherent design limitations? I guess that in the scheme of all things computing, we really are pretty small fry. I mean, think of all the data that the NSA is processing every day. Yes, I know, look at their budget, all the hardware and personnel they can toss at any problem, I guess I'm just babbling about proof of concept, or maybe nothing at all, and just wanted to get some thoughts from others who know all of this stuff _Much_ more deeply than I will ever know.

ID: 1913204 · Report as offensive
kittyman Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51477
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1913205 - Posted: 15 Jan 2018, 15:57:09 UTC

I wonder if they could sort the database and move most of it into an archive database: all the work that is done and has nothing left in the field. That would leave a much more manageable database active and online. Then the weekly outage would just sort the active database and move whatever has been completed during the week to the archive.

I know this sounds too simple to be possible, but might that be possible? Some database queries would have to be rewritten to access the archived information.

I dunno, just spitballing.
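
For anyone who wants to picture the archive idea, here is a minimal sketch in Python using SQLite. Everything in it is an assumption made for illustration: the table names `workunit_active` and `workunit_archive`, the `state` column, and the 'done' marker are hypothetical and are not the real BOINC schema, and the project's actual database is not SQLite. It only shows the general technique of copying finished rows to an archive table and deleting them from the active one in a single transaction.

```python
import sqlite3


def make_demo_db():
    """Build a tiny in-memory database with hypothetical tables (not the BOINC schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE workunit_active  (id INTEGER PRIMARY KEY, name TEXT, state TEXT);
        CREATE TABLE workunit_archive (id INTEGER PRIMARY KEY, name TEXT, state TEXT);
        INSERT INTO workunit_active VALUES
            (1, 'wu_a', 'done'),
            (2, 'wu_b', 'in_progress'),
            (3, 'wu_c', 'done');
        """
    )
    return conn


def archive_completed(conn):
    """Move rows marked 'done' from the active table to the archive table.

    Returns the number of rows removed from the active table.
    """
    # "with conn" wraps both statements in one transaction: either the copy
    # and the delete are both committed, or both are rolled back on error.
    with conn:
        conn.execute(
            "INSERT INTO workunit_archive (id, name, state) "
            "SELECT id, name, state FROM workunit_active WHERE state = 'done'"
        )
        deleted = conn.execute(
            "DELETE FROM workunit_active WHERE state = 'done'"
        ).rowcount
    return deleted


if __name__ == "__main__":
    db = make_demo_db()
    print("rows archived:", archive_completed(db))
```

Any query that needs old results would then have to look in the archive table as well, which is the rewriting cost mentioned above. Whether this would actually relieve the load described in this thread is not something the sketch can show.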

Meow?
"Time is simply the mechanism that keeps everything from happening all at once."

ID: 1913205 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14674
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1913206 - Posted: 15 Jan 2018, 16:00:07 UTC

Tapes are beginning to appear, slowly. IIRC, there's quite a long pre-processing phase, which we don't see: as they emerge from that, they pop up automatically in the splitter queue.
ID: 1913206 · Report as offensive
kittyman Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51477
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1913207 - Posted: 15 Jan 2018, 16:03:06 UTC - in response to Message 1913206.  

Tapes are beginning to appear, slowly. IIRC, there's quite a long pre-processing phase, which we don't see: as they emerge from that, they pop up automatically in the splitter queue.

Well done, Eric and Jeff!
"Time is simply the mechanism that keeps everything from happening all at once."

ID: 1913207 · Report as offensive
Profile Chris904395093209d Project Donor
Volunteer tester

Joined: 1 Jan 01
Posts: 112
Credit: 29,923,129
RAC: 6
United States
Message 1913208 - Posted: 15 Jan 2018, 16:09:12 UTC

Looks like break time is over. Work is starting to slowly build up again.
~Chris

ID: 1913208 · Report as offensive
Profile betreger Project Donor
Joined: 29 Jun 99
Posts: 11408
Credit: 29,581,041
RAC: 66
United States
Message 1913215 - Posted: 15 Jan 2018, 16:47:36 UTC - in response to Message 1913205.  

I wonder if they could sort the database and move most of it into an archive database: all the work that is done and has nothing left in the field. That would leave a much more manageable database active and online. Then the weekly outage would just sort the active database and move whatever has been completed during the week to the archive.

I know this sounds too simple to be possible, but might that be possible? Some database queries would have to be rewritten to access the archived information.

I dunno, just spitballing.

Meow?

+1
ID: 1913215 · Report as offensive
Profile Piotr

Joined: 24 May 17
Posts: 18
Credit: 20,069,282
RAC: 41
Poland
Message 1913239 - Posted: 15 Jan 2018, 19:53:17 UTC

Hello,
Every time the WUs run out, my crunchers show the same behavior: the slower one with fewer cores, which still has a few WUs left to crunch, receives new WUs before the second cruncher (faster, with more cores), which has completely run out of CPU WUs, receives its new batch. Also, when it does receive work, it gets GPU WUs first (some of which it still has from the previous batch) and no CPU WUs. How should I configure it so that more CPU WUs are downloaded in advance instead of GPU WUs? Every outage I would need about 100 more CPU WUs to keep the faster machine working.
ID: 1913239 · Report as offensive
Profile petri33
Volunteer tester

Joined: 6 Jun 02
Posts: 1668
Credit: 623,086,772
RAC: 156
Finland
Message 1913240 - Posted: 15 Jan 2018, 20:01:30 UTC
Last modified: 15 Jan 2018, 20:02:09 UTC

Results ready to send : one ????
EDIT: Just changed to zero.
To overcome Heisenbergs:
"You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones
ID: 1913240 · Report as offensive
Cruncher-American Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor

Joined: 25 Mar 02
Posts: 1513
Credit: 370,893,186
RAC: 340
United States
Message 1913244 - Posted: 15 Jan 2018, 20:06:21 UTC - in response to Message 1913239.  

Hello,
Every time the WUs run out, my crunchers show the same behavior: the slower one with fewer cores, which still has a few WUs left to crunch, receives new WUs before the second cruncher (faster, with more cores), which has completely run out of CPU WUs, receives its new batch. Also, when it does receive work, it gets GPU WUs first (some of which it still has from the previous batch) and no CPU WUs. How should I configure it so that more CPU WUs are downloaded in advance instead of GPU WUs? Every outage I would need about 100 more CPU WUs to keep the faster machine working.


Generally, if you allow both CPU and GPU work, the servers determine which to send you. You can temporarily turn off fetching of GPU WUs in BOINC (in Advanced mode) under the Activity tab. Until you turn GPU fetch back on, you will get only CPU WUs.
ID: 1913244 · Report as offensive
Profile Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1913245 - Posted: 15 Jan 2018, 20:06:36 UTC

So would now be a good time to panic?? ;)

Just kidding... that's why I have backup projects. Giving beta a test, though I'm not sure what we are looking at there (only 1 machine and only 1 work unit per card). Einstein is getting a boost from this.
ID: 1913245 · Report as offensive
Profile petri33
Volunteer tester

Joined: 6 Jun 02
Posts: 1668
Credit: 623,086,772
RAC: 156
Finland
Message 1913261 - Posted: 15 Jan 2018, 21:43:25 UTC

My backup project is to improve the code and run local tests.
To overcome Heisenbergs:
"You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones
ID: 1913261 · Report as offensive
Profile Jeff Buck Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester

Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1913262 - Posted: 15 Jan 2018, 21:58:03 UTC - in response to Message 1913261.  

My backup project is to improve the code and run local tests.
Recovering some of those 5,300+ ghosts you've created wouldn't be a bad use of your time, either. Having those locked away when other people are starving for work is not particularly helpful.
ID: 1913262 · Report as offensive
Profile Stargate (SA)
Volunteer tester
Joined: 4 Mar 10
Posts: 1854
Credit: 2,258,721
RAC: 0
Australia
Message 1913263 - Posted: 15 Jan 2018, 22:14:27 UTC

How does one have so many errors and invalids?
ID: 1913263 · Report as offensive
Profile Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1913264 - Posted: 15 Jan 2018, 22:18:01 UTC - in response to Message 1913263.  

He is testing a CUDA alpha application, so there are bound to be some teething issues.
ID: 1913264 · Report as offensive


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.