Message boards :
Number crunching :
Panic Mode On (116) Server Problems?
Author | Message |
---|---|
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
For those interested about how much processing power is needed roughly to keep up with the incoming work a post dated 16 Jan 19 saying . . But that is the crunching that the whole project is presently completing in one day, not the data that GBT is producing in one day :). Since the daily output data from one GBT channel (blcnn) currently takes about a week to 10 days to get through and there are at least 24 channels being recorded, it would take about 170 days or more to complete, which translates to something like 200,000 plus 'Titan V or Gtx2080ti' cards at full tilt to clear in one day. That would be "real time" processing. So when we have at least 100,000 active volunteers, each running at least 2 of the above cards working 24/7/52, we will be able to keep up with the input data from GBT. Of course we will need the same again to cater for the data from Parkes when it is finally online and another smaller contingent to deal with the data that comes from Arecibo. When we actually do have the volunteer workforce to do all that work, where the heck do we find the servers to cope with it???? . . Sadly 'real time' processing is at this point in time nothing more than a pipe dream. Stephen :( |
Speedy Joined: 26 Jun 04 Posts: 1643 Credit: 12,921,799 RAC: 89 |
I am sure if Jeff, Eric or Matt could give us a list of parts required there would be people prepared to start a fundraiser to pull a system that could cope with the work volume |
Bernie Vine Joined: 26 May 99 Posts: 9954 Credit: 103,452,613 RAC: 328 |
That was easy. I just got 10 of my friends to stop crunching for SETI. Interesting reaction, mine was to convert my Linux box to Windows 10, just bought a new SSD and Win 10 Licence (I am letting my tasks run down rather than abort), and will spruce up my 2 old Dells, possibly with new MBs, SSDs and modern processors, during the summer so that I don't have to put the heating on next winter ;-) |
rob smith Joined: 7 Mar 03 Posts: 22199 Credit: 416,307,556 RAC: 380 |
Right, after a few hours to cool things down this thread is back to life. Now, remember, don't stray too far discussing server issues, and do obey all the forum rules. And above all else have fun. Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? |
Speedy Joined: 26 Jun 04 Posts: 1643 Credit: 12,921,799 RAC: 89 |
It is nice to see the higher return rate of over 127,000; this, I can only assume, means that the multibeam work is being returned. My pending validation is the highest it has been in a long time at 180. This is due to having lots of multibeam work, which forms a band between 3 and 5 minutes per task. |
Sleepy Joined: 21 May 99 Posts: 219 Credit: 98,947,784 RAC: 28,360 |
We are back! |
Tom M Joined: 28 Nov 02 Posts: 5124 Credit: 276,046,078 RAC: 462 |
We are back! HURRAY!!!! A proud member of the OFA (Old Farts Association). |
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
I'm seeing Stalled downloads on all machines. Of course, you can't download any new work with a stalled download... |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
Yes all my hosts have stalled downloads with accumulated 6 hours of elapsed time trying to download. So right around 3 AM my local time the servers couldn't access the requested work. I've had this problem for about a week now. Solution is to stop and restart BOINC on the affected hosts and the tasks come down finally at restart. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
Yes all my hosts have stalled downloads with accumulated 6 hours of elapsed time trying to download. So right around 3 AM my local time the servers couldn't access the requested work. I've had this problem for about a week now. Solution is to stop and restart BOINC on the affected hosts and the tasks come down finally at restart. I'd be interested to hear how you diagnosed cause and effect - what was stalled, and how did restarting BOINC clear it? FWIW, I've had a look around the systems here - seven machines, four projects, and nothing is stuck, either upload or download. None of the machines have had a BOINC restart since Patch Wednesday, except the single one I'm testing #3076 on. Oh, and one I moved to a different room last week. |
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
Well, earlier this morning all was well. It just began recently, and it does clear them by selecting all of them and abusing the retry button. However, it is reoccurring; Wed 24 Apr 2019 12:17:17 PM EDT | SETI@home | Reporting 5 completed tasks |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
So, which 'transient HTTP error' is it throwing this time? |
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
So, which 'transient HTTP error' is it throwing this time? . . That I cannot say, but I just noticed that it has happened here too at about 1:40 am AEST. It seems to have lasted for a short while but somehow self-corrected. Stephen ? ? |
Ian&Steve C. Joined: 28 Sep 99 Posts: 4267 Credit: 1,282,604,591 RAC: 6,640 |
looks like we're back (forums at least) Seti@Home classic workunits: 29,492 CPU time: 134,419 hours |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
Yes all my hosts have stalled downloads with accumulated 6 hours of elapsed time trying to download. So right around 3 AM my local time the servers couldn't access the requested work. I've had this problem for about a week now. Solution is to stop and restart BOINC on the affected hosts and the tasks come down finally at restart. I'd be interested to hear how you diagnosed cause and effect - what was stalled, and how did restarting BOINC clear it? FWIW, I've had a look around the systems here - seven machines, four projects, and nothing is stuck, either upload or download. None of the machines have had a BOINC restart since Patch Wednesday, except the single one I'm testing #3076 on. Oh, and one I moved to a different room last week. What I am describing is not the normal "stalled" download. The tasks are labelled "active" with no backoff and 0% progress. The elapsed timer continues to count and, by the time I notice them, is in the order of several hours, since they stalled during the night when I'm asleep. The fix is to stop BOINC, wait for the client to fully stop in the System Monitor and then restart BOINC. The host immediately finishes the "stuck" downloads as soon as the client initializes. Normally fewer than half a dozen are affected. The host is continuously downloading and uploading normally all during this time except for these "stuck" downloads. They are being ignored until BOINC is stopped and restarted. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
So, which 'transient HTTP error' is it throwing this time? I wonder if the cause was a problem with a bad hard drive in the Seti servers, which is probably the reason for the RAID rebuild. A bad hard drive could have been the reason why my requests for a downloaded task go unfulfilled from the servers until I ask for it again with a BOINC restart. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304 |
Looking at the Haveland graphs I see that the servers have been having conniptions again for the last couple of hours. I myself didn't have any stalled downloads, just the instant timeout resulting in daily backoffs variety. Much abuse of the retry button seems to be clearing the backlog, and has managed to result in 3 stalled downloads, where the Elapsed time counts down but nothing is actually happening; after about 2 min they eventually downloaded. Grant Darwin NT |
Wiggo Joined: 24 Jan 00 Posts: 34744 Credit: 261,360,520 RAC: 489 |
Well I haven't had any download backoffs for the last hour now so I guess that the problem earlier is somehow fixed. Cheers. |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304 |
Well I haven't had any download backoffs for the last hour now so I guess that the problem earlier is somehow fixed. Or at least resolved itself. The last couple of requests for work have downloaded without assistance. Download speeds are still slower than usual, but at least they are downloading. Grant Darwin NT |
Unixchick Joined: 5 Mar 12 Posts: 815 Credit: 2,361,516 RAC: 22 |
Could the download problems be caused by too many people trying to get WUs all at the same time? Too many connections at once? Is it usually after an outage like we had today and yesterday? |
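
For anyone who wants to check the real-time processing estimate Stephen posted earlier in this thread, here is a rough sketch of the arithmetic in Python. The inputs are the figures from his post (24 channels, 7 to 10 days per channel-day of data, 200,000 cards, 100,000 volunteers with 2 cards each). The function names are made up for illustration, and the "implied current capacity" step is an assumption about how the 200,000-card figure relates to the backlog; the post does not spell that step out.

```python
# Back-of-envelope check of the estimate quoted in the thread.
# All inputs are the poster's rough figures, not measured values.

CHANNELS = 24                   # "at least 24 channels being recorded"
DAYS_PER_CHANNEL = (7, 10)      # "about a week to 10 days" per channel-day of data
CARDS_FOR_REAL_TIME = 200_000   # poster's figure for clearing a day's data in a day
VOLUNTEERS = 100_000            # "at least 100,000 active volunteers"
CARDS_PER_VOLUNTEER = 2         # "each running at least 2 of the above cards"


def backlog_days(channels: int, days_per_channel: float) -> float:
    """Project-days needed to process one day of recorded data."""
    return channels * days_per_channel


def implied_current_cards(cards_needed: int, backlog: float) -> float:
    """Card-equivalents the project would be running today, ASSUMING the
    200,000-card figure is simply current capacity times the backlog
    ratio (hypothetical; not stated in the post)."""
    return cards_needed / backlog


low = backlog_days(CHANNELS, DAYS_PER_CHANNEL[0])
high = backlog_days(CHANNELS, DAYS_PER_CHANNEL[1])
fleet = VOLUNTEERS * CARDS_PER_VOLUNTEER

print(f"One day of GBT data: roughly {low:.0f} to {high:.0f} project-days")
print(f"Volunteer fleet in the post: {fleet:,} cards")
print(f"Implied current capacity: about "
      f"{implied_current_cards(CARDS_FOR_REAL_TIME, low):.0f} card-equivalents")
```

The low end of the range (24 × 7) is where the "about 170 days" in the post comes from; at 10 days per channel the backlog would be closer to 240 days, so the card count is probably on the optimistic side.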
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.