Panic Mode On (116) Server Problems?

Profile Kissagogo27 Special Project $75 donor
Joined: 6 Nov 99
Posts: 716
Credit: 8,032,827
RAC: 62
France
Message 1989082 - Posted: 7 Apr 2019, 9:16:08 UTC

Carolyn error message

Warning: number_format() expects parameter 1 to be double, string given in /disks/carolyn/b/home/boincadm/projects/sah/html/seti_boinc_html/sah_status.php on line 606


bad numbers won't display with SetiathomeV8 column ..

blc22_2bit_guppi_58406_24925_HIP20215_0097 tape with a lot of errors ..
ID: 1989082
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13746
Credit: 208,696,464
RAC: 304
Australia
Message 1989084 - Posted: 7 Apr 2019, 9:26:48 UTC
Last modified: 7 Apr 2019, 9:33:27 UTC

Ready-to-send values are now more realistic; my system has just managed to get a response from the Scheduler and report work. And the forums have sped up considerably.

In progress values are still way down.

Edit-
Ready-to-send values have gone back to massive numbers again.


This is proving to be a rockier than usual recovery.
Grant
Darwin NT
ID: 1989084
Profile Kissagogo27 Special Project $75 donor
Joined: 6 Nov 99
Posts: 716
Credit: 8,032,827
RAC: 62
France
Message 1989086 - Posted: 7 Apr 2019, 9:28:09 UTC

Carolyn Error corrected ^^
ID: 1989086
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1989087 - Posted: 7 Apr 2019, 9:32:42 UTC - in response to Message 1989080.  

I've managed to contact the scheduler and report all completed work - very slow response. It's not issuing new work yet ("Project has no tasks available"), but we seem to be making progress down the usual slow recovery route.

. . Same here!

Stephen
ID: 1989087
Profile Siran d'Vel'nahr
Volunteer tester
Joined: 23 May 99
Posts: 7379
Credit: 44,181,323
RAC: 238
United States
Message 1989088 - Posted: 7 Apr 2019, 9:56:56 UTC
Last modified: 7 Apr 2019, 10:05:14 UTC

Greetings,

What gives:
4/7/2019 4:47:24 AM | SETI@home | Requesting new tasks for CPU and NVIDIA GPU
4/7/2019 4:47:24 AM | SETI@home | [sched_op] CPU work request: 2592000.00 seconds; 12.00 devices
4/7/2019 4:47:24 AM | SETI@home | [sched_op] NVIDIA GPU work request: 216000.00 seconds; 1.00 devices
4/7/2019 4:47:25 AM | SETI@home | Scheduler request completed: got 0 new tasks
4/7/2019 4:47:25 AM | SETI@home | [sched_op] Server version 709
4/7/2019 4:47:25 AM | SETI@home | Project has no tasks available
4/7/2019 4:47:25 AM | SETI@home | Project requested delay of 303 seconds
4/7/2019 4:47:25 AM | SETI@home | [sched_op] Deferring communication for 00:05:03
4/7/2019 4:47:25 AM | SETI@home | [sched_op] Reason: requested by project
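
The figures in the log above are at least consistent with each other, which a few lines of Python can confirm. This is only a sketch of the arithmetic: the helper name `format_deferral` is made up for illustration, and reading the per-device figure as a work-buffer length is an assumption, not something the log states.

```python
def format_deferral(seconds: int) -> str:
    """Render a delay in seconds as HH:MM:SS, the way the log line shows it."""
    hours, remainder = divmod(seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


# "Project requested delay of 303 seconds" vs. "Deferring communication for 00:05:03"
deferral = format_deferral(303)
print(deferral)

# Work requests from the log, in seconds, and the device counts beside them.
cpu_request_seconds, cpu_devices = 2_592_000.0, 12
gpu_request_seconds, gpu_devices = 216_000.0, 1

SECONDS_PER_DAY = 86_400
cpu_days_per_device = cpu_request_seconds / cpu_devices / SECONDS_PER_DAY
gpu_days_per_device = gpu_request_seconds / gpu_devices / SECONDS_PER_DAY
print(cpu_days_per_device, gpu_days_per_device)
```

Both requests come to the same amount of work per device, so the client is asking for what it should. The "no tasks available" reply is coming from the server side.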

SETI@home v8 has nearly 2 million WUs ready to send out, and 84 AP to send. I have 2 computers and a tablet that are bone dry, sitting doing nothing. I suppose they could use the breather. ;) My Pis are getting close to being empty as well.

Why is it that every freakin' weekend there has to be some crisis or another? The project "has no tasks available" yet the status shows dang near 2 million. I don't get it. :|

Have a great day! :)

Siran

[edit]
Ok, someone is getting WUs. the APs went from 84 to 60 ready to send. I'm not getting any and I'll bet most, if not all, here are not either. :|
[/edit]
CAPT Siran d'Vel'nahr - L L & P _\\//
Winders 11 OS? "What a piece of junk!" - L. Skywalker
"Logic is the cement of our civilization with which we ascend from chaos using reason as our guide." - T'Plana-hath
ID: 1989088
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14653
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1989095 - Posted: 7 Apr 2019, 10:20:03 UTC - in response to Message 1989088.  

The project "has no tasks available" yet the status shows dang near 2 million. I don't get it. :|
I think the tasks are all in the warehouse, but the elves haven't brought them round to the front desk yet.
ID: 1989095
Profile Tom M
Volunteer tester

Joined: 28 Nov 02
Posts: 5124
Credit: 276,046,078
RAC: 462
Message 1989096 - Posted: 7 Apr 2019, 10:22:30 UTC - in response to Message 1989095.  

The project "has no tasks available" yet the status shows dang near 2 million. I don't get it. :|
I think the tasks are all in the warehouse, but the elves haven't brought them round to the front desk yet.


I like the idea, but as a retired truck driver I think it's a matter of getting them to the shipping dock and loading the trailer :)
A proud member of the OFA (Old Farts Association).
ID: 1989096
Profile Tom M
Volunteer tester

Joined: 28 Nov 02
Posts: 5124
Credit: 276,046,078
RAC: 462
Message 1989101 - Posted: 7 Apr 2019, 10:39:24 UTC - in response to Message 1989079.  


1. Even the newest servers are 3-4 years old.
2. Both the Informix and MySQL software used for the databases are considerably overstretched compared to what they were designed for
3. Why is it necessary to keep 20 year old data current when nothing is done with it
4. What happened to Nitpicker and Nebula
---edit----
Why not start by making a master backup of everything on SSD's and stick them in a cupboard. Then start again with new clean databases. make a business case for upgrading or replacing servers. Why not call in some 3rd party consultants to give an overview of what the project is trying to achieve, how it is going about it, and whether the h/w and s/w used is up to the task. Also review the methodology employed, why do we need schedulers, feeders, splitters, transitioners, validators, assimilators etc etc. We might have done 20 years ago, is it still valid now? How do other Boinc projects operate?


I am pretty sure that I have seen commentary on what happened to "Nitpicker" and what is happening with "Nebula".
Nitpicker died when a fast enough server wasn't funded.
Nebula is being run on an offsite Supercomputer. The ideas about a methodology and software for extracting a reliable up/down vote continue to evolve.

It is entirely possible from a production POV that we don't really need "all" the scientific data "online". Moving all the "static" data offline into an "unloaded" format as really big files might be useful in that the Informix DB would not misbehave so often. The user/messages etc. data in the MySQL databases could also be archived. Based on "last time accessed", userids with all related records could be unloaded/moved offline. As a byproduct, this would reduce the # of Seti Members by weeding out people who haven't used their Seti IDs in a decade or more.

To answer your last paragraph, I would say we need a capital campaign aimed at redesigning the way Seti goes about it. The problem is I am not sure how much of what Seti does is unique to Seti and how much is unique to BOINC. In any case we are probably talking about millions of USD to hire the talent to do any kind of major redesign, let alone upgrade the operations hardware.

Tom
A proud member of the OFA (Old Farts Association).
ID: 1989101
Profile Tom M
Volunteer tester

Joined: 28 Nov 02
Posts: 5124
Credit: 276,046,078
RAC: 462
Message 1989104 - Posted: 7 Apr 2019, 10:49:24 UTC

Sun 07 Apr 2019 05:47:49 AM CDT | SETI@home | Scheduler request completed: got 0 new tasks
Sun 07 Apr 2019 05:47:49 AM CDT | SETI@home | [sched_op] Server version 709
Sun 07 Apr 2019 05:47:49 AM CDT | SETI@home | Project has no tasks available
Sun 07 Apr 2019 05:47:49 AM CDT | SETI@home | Project requested delay of 303 seconds

A proud member of the OFA (Old Farts Association).
ID: 1989104
rob smith Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer moderator
Volunteer tester

Joined: 7 Mar 03
Posts: 22220
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1989106 - Posted: 7 Apr 2019, 11:01:40 UTC - in response to Message 1989079.  

Chris, you demonstrate a degree of lack of understanding of the project situation and its finances in your last post, coupled with a high degree of negativity.

Let's take your four numbered points first:

1. Even the newest servers are 3-4 years old.

Then why not dig deep in your pockets and replace them, or at least get four new servers, and do a cascade so the newest are the main client facing ones, and the oldest are doing more mundane tasks.

2. Both the Informix and MySQL software used for the databases are considerably overstretched compared to what they were designed for

This statement is not correct as both are well within their limitations.

3. Why is it necessary to keep 20 year old data current when nothing is done with it

That's actually a fairly good point; however, some of that data is required for the smooth running of the project (user records, tape records and the like). Sadly, the way the data is structured is not as amenable as it should be to splitting and hiving as you suggest.

4. What happened to Nitpicker and Nebula

Nitpicker was formally abandoned when Nebula came along, but had been abandoned a good few years ago as it was found to be impossible on the then-current hardware. If you cared to read the Nebula thread you would see that work is being done, and there are steps being taken to scale up to a larger sample of the data collected over the last twenty years.

Why not start by making a master backup of everything on SSD's and stick them in a cupboard. Then start again with new clean databases. make a business case for upgrading or replacing servers. Why not call in some 3rd party consultants to give an overview of what the project is trying to achieve, how it is going about it, and whether the h/w and s/w used is up to the task. Also review the methodology employed, why do we need schedulers, feeders, splitters, transitioners, validators, assimilators etc etc. We might have done 20 years ago, is it still valid now? How do other Boinc projects operate?

I'll split this comment into a few sections.
SSDs in a cupboard - do you have the money to purchase them - given the amount of data the project has I would say a couple of million should do....
Third party consultants cost money, big money, very big money.
We do need all those processes, there are two ways of doing it, all in one big mass on some mega server, or do it on a distributed server mesh, which is less expensive, and easier to maintain. Even the likes of Amazon and Google use a number of servers to do different tasks.
Other projects are nowhere near as big, but, in the main, they have the same structure, albeit spread around their servers in different ways.

In the online world a small handful are getting all wound up over it, in the real world Seti is now seen as a 5 1/2 - 6 days/week project and we are closing fast to be below the 5% active users level. We've lost nearly 1000 users in the last few months. Stephen asks for a cure, well that is easier said than done. My view is that we simply cannot carry on with more of the same old. There is only so long you can put sticking plasters on a broken leg.


It's no good asking people here why people are leaving, or not joining the project. But I would hazard a guess at some of the reasons:
- Find the project isn't to their liking
- The lack of a screen-saver on the majority of GPUs
- They've just seen their power bill jump
- They read posts that say SETI is dead
- They realise that credits don't mean a thing
- They think that other projects' credits are worth more
- The "bitcoin bubble" is bursting, and they realise it's now costing more to do one than they are getting back
- They are put off by the cold attitude and incorrect information they are given by some people.
(These are based on help desk questions over the last few months)
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1989106
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14653
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1989109 - Posted: 7 Apr 2019, 11:18:03 UTC - in response to Message 1989101.  

Why not start by making a master backup of everything on SSD's and stick them in a cupboard. Then start again with new clean databases. make a business case for upgrading or replacing servers. Why not call in some 3rd party consultants to give an overview of what the project is trying to achieve, how it is going about it, and whether the h/w and s/w used is up to the task. Also review the methodology employed, why do we need schedulers, feeders, splitters, transitioners, validators, assimilators etc etc. We might have done 20 years ago, is it still valid now? How do other Boinc projects operate?
What that quote fails to take into account is that we have two quite different types of database, used for radically different purposes.

We have the 'Master Science Database' which holds 20 years of archived data, and is growing daily. This is the database which Nitpckr and Nebula would analyse: Nitpckr failed because the task was too huge to process in real time. Nebula is still under active development and periodic test. I'm in a position to get a bird's eye view of progress: the most recent change was three days ago

- change sky coverage logic so that a pointing is counted
    in all the pixels its beam overlaps, not just the center one
The other database is the one we see and I'm typing into right now: the BOINC database. This is the one which holds 3,361,245 tasks out in the field, and receives 256,108 tasks per hour (on current figures). That means the entire thing is re-written in about 13 hours (today), or every 36 hours in normal running. And it's the one which gives us the most trouble. "Sticking everything in a cupboard and starting again with a clean database" is what we do every Tuesday.
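
The turnover estimate above is easy to check from the two figures quoted. A minimal sketch follows. The variable names are illustrative, and the implied "normal running" return rate assumes the same number of tasks in the field, which the post does not state.

```python
# Figures quoted in the post above.
tasks_in_field = 3_361_245
tasks_returned_per_hour = 256_108  # current (recovery) rate

# Time for the whole BOINC task table to turn over at today's rate.
turnover_hours = tasks_in_field / tasks_returned_per_hour
print(f"Turnover at today's rate: {turnover_hours:.1f} hours")

# Assumption: the same number of tasks is in the field during normal running,
# when the post says turnover takes about 36 hours.
normal_turnover_hours = 36
implied_normal_rate = tasks_in_field / normal_turnover_hours
print(f"Implied normal return rate: {implied_normal_rate:,.0f} tasks/hour")
```

On those assumptions, the database is currently absorbing results at well over twice its usual rate, which fits the rocky recovery described earlier in the thread.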
ID: 1989109
Profile Bernie Vine
Volunteer moderator
Volunteer tester
Joined: 26 May 99
Posts: 9954
Credit: 103,452,613
RAC: 328
United Kingdom
Message 1989115 - Posted: 7 Apr 2019, 11:54:54 UTC
Last modified: 7 Apr 2019, 11:55:51 UTC

Something seems to be happening, one machine just downloaded 100 GPU tasks!!

Nothing yet for the other 2 and 7/8ths ;-)

Edit:
Linux box now downloading!!
ID: 1989115
Profile Siran d'Vel'nahr
Volunteer tester
Joined: 23 May 99
Posts: 7379
Credit: 44,181,323
RAC: 238
United States
Message 1989116 - Posted: 7 Apr 2019, 12:00:33 UTC - in response to Message 1989095.  

The project "has no tasks available" yet the status shows dang near 2 million. I don't get it. :|
I think the tasks are all in the warehouse, but the elves haven't brought them round to the front desk yet.

lol :)
CAPT Siran d'Vel'nahr - L L & P _\\//
Winders 11 OS? "What a piece of junk!" - L. Skywalker
"Logic is the cement of our civilization with which we ascend from chaos using reason as our guide." - T'Plana-hath
ID: 1989116
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1989120 - Posted: 7 Apr 2019, 12:27:37 UTC - in response to Message 1989088.  

Greetings,
[edit]
Ok, someone is getting WUs. the APs went from 84 to 60 ready to send. I'm not getting any and I'll bet most, if not all, here are not either. :|
[/edit]


. . It's that one guy/gal in 100,000, rather like winning the lottery. :)

Stephen

:)
ID: 1989120
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1989121 - Posted: 7 Apr 2019, 12:43:20 UTC - in response to Message 1989115.  

Something seems to be happening, one machine just downloaded 100 GPU tasks!!

Nothing yet for the other 2 and 7/8ths ;-)

Edit:
Linux box now downloading!!


. . Well Bernie you seem to be ahead of the curve ... still getting nada here :(

Stephen

:(
ID: 1989121
Profile Freewill Project Donor
Joined: 19 May 99
Posts: 766
Credit: 354,398,348
RAC: 11,693
United States
Message 1989122 - Posted: 7 Apr 2019, 13:23:48 UTC

Why does my slower machine always seem to get tasks first after these upsets? At least it gave me a chance to upgrade the wifi on the beast.
ID: 1989122
Profile Siran d'Vel'nahr
Volunteer tester
Joined: 23 May 99
Posts: 7379
Credit: 44,181,323
RAC: 238
United States
Message 1989123 - Posted: 7 Apr 2019, 13:24:28 UTC
Last modified: 7 Apr 2019, 13:26:48 UTC

Oh joy!!! One of my Pis just got a few WUs. None of the others or the PCs or tablet got any though. The Pi that got some wasn't even out of WUs. :|

Have a great day! :)

Siran

[edit]
My Winders PC got some GPU WUs just now. None of the rest... again.
[/edit]
CAPT Siran d'Vel'nahr - L L & P _\\//
Winders 11 OS? "What a piece of junk!" - L. Skywalker
"Logic is the cement of our civilization with which we ascend from chaos using reason as our guide." - T'Plana-hath
ID: 1989123
Profile Tom M
Volunteer tester

Joined: 28 Nov 02
Posts: 5124
Credit: 276,046,078
RAC: 462
Message 1989124 - Posted: 7 Apr 2019, 13:25:37 UTC

The machine I had that was completely dry now has 18 gpu tasks "in progress" :) Not going to last very long though.

Tom
A proud member of the OFA (Old Farts Association).
ID: 1989124
Boiler Paul

Joined: 4 May 00
Posts: 232
Credit: 4,965,771
RAC: 64
United States
Message 1989126 - Posted: 7 Apr 2019, 13:28:14 UTC

Well it's good that some are getting tasks, but alas I'm still getting "project has no tasks available"
ID: 1989126
Sixkid

Joined: 10 Jan 12
Posts: 17
Credit: 8,248,305
RAC: 21
Netherlands
Message 1989127 - Posted: 7 Apr 2019, 13:44:41 UTC

Hi, just wanted to post that I don't get any downloads for SOG, and while typing I received a few SOG files for my vidcard.
Yesterday all files were done and waiting to be uploaded for a very long time;
after a retry all of 'em got sent and I received a few CPU tasks.
Can someone pat the server gently and see if it has to burp?

Happy Weekend
ID: 1989127
 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.