Panic Mode On (115) Server Problems?

Siran d'Vel'nahr
Volunteer tester
Joined: 23 May 99
Posts: 7379
Credit: 44,181,323
RAC: 238
United States
Message 1987092 - Posted: 25 Mar 2019, 18:35:22 UTC

Greetings,

I don't get it! WUs are at 100% progress and then go into resend mode. How the heck does THAT work? The only thing I can think of, which doesn't sound logical, is that the data was sent 100% from the host yet the recipient has not gotten it so the WU goes back into resend mode.

Have a nice day, if you can. ;)

Siran
CAPT Siran d'Vel'nahr - L L & P _\\//
Winders 11 OS? "What a piece of junk!" - L. Skywalker
"Logic is the cement of our civilization with which we ascend from chaos using reason as our guide." - T'Plana-hath
ID: 1987092
Cherokee150

Joined: 11 Nov 99
Posts: 192
Credit: 58,513,758
RAC: 74
United States
Message 1987098 - Posted: 25 Mar 2019, 19:11:02 UTC - in response to Message 1987092.  
Last modified: 25 Mar 2019, 19:19:09 UTC

Hi Siran.
From the technical side of the Internet, this most likely means either the receiver never sent the acknowledgement that the final packet in the file was received properly, or that the last packet was not received before the time limit was reached. Of course, there could be a number of reasons why the file was not received correctly or within the required time frame. That is most likely what someone is, or soon will be, researching at Berkeley.

I hope that they will find the problem quickly, and that the fix is easy to do.

I have found today that, if you have the time, going into the Transfers tab, selecting all the units trying to upload, and clicking the "Retry Now" button each time they have all backed off will get them completed within a few minutes of retrying. This only works when the cause of the failed uploads is related to certain types of technical problems. Unfortunately, many times the issue at hand during one of these events will preclude the success of this retry procedure. Today, though, we are in luck, i.e., the retry does work.
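For anyone wondering why hitting "Retry Now" speeds things up, here is a rough Python sketch. It assumes the client waits a randomized, exponentially growing delay after each failed transfer. The base delay, the cap, and the function names are made up for illustration and are not the client's real values.

```python
import random

def backoff_delays(failures, base=60.0, cap=4 * 3600.0, seed=1):
    """Delay (seconds) waited after each of `failures` consecutive failures."""
    rng = random.Random(seed)               # fixed seed so the sketch is repeatable
    delays = []
    for n in range(failures):
        ceiling = min(cap, base * 2 ** n)   # doubles each time, up to the cap
        delays.append(rng.uniform(ceiling / 2, ceiling))
    return delays

def waiting_time(failures, manual_retry=False):
    """Total time spent backed off across `failures` consecutive failures."""
    if manual_retry:                        # "Retry Now" skips every pending delay
        return 0.0
    return sum(backoff_delays(failures))

if __name__ == "__main__":
    for n in (1, 3, 6):
        print(n, "failures ->", round(waiting_time(n) / 60, 1), "min backed off")
```

Under these assumed numbers the idle time grows quickly with each failure, which is why retrying by hand clears the queue sooner. It also means more requests hitting a server that is already struggling.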

I hope this helps.
ID: 1987098
Siran d'Vel'nahr
Volunteer tester
Joined: 23 May 99
Posts: 7379
Credit: 44,181,323
RAC: 238
United States
Message 1987102 - Posted: 25 Mar 2019, 19:20:09 UTC - in response to Message 1987098.  

Hi Cherokee,

Yeah, that's about what I was getting at. I guess it was more logical than I thought. ;)

I do try the "Retry Now" and some do get through. I don't like to do it too often though since there are thousands of us trying to upload our finished work. :)

Have a great day! :)

Siran
CAPT Siran d'Vel'nahr - L L & P _\\//
Winders 11 OS? "What a piece of junk!" - L. Skywalker
"Logic is the cement of our civilization with which we ascend from chaos using reason as our guide." - T'Plana-hath
ID: 1987102
Unixchick · Project Donor
Joined: 5 Mar 12
Posts: 815
Credit: 2,361,516
RAC: 22
United States
Message 1987104 - Posted: 25 Mar 2019, 19:27:46 UTC

Splitting isn't happening properly either, from the looks of it. It is around lunchtime for the Seti crew. Does anyone know if they are aware there is an issue?
ID: 1987104
Jimbocous · Project Donor
Volunteer tester
Joined: 1 Apr 13
Posts: 1855
Credit: 268,616,081
RAC: 1,349
United States
Message 1987105 - Posted: 25 Mar 2019, 19:36:12 UTC - in response to Message 1987092.  

I don't get it! WUs are at 100% progress and then go into resend mode. How the heck does THAT work?


That last Ack/Nak to finish the transfer either didn't get sent or got eaten by a router. Quite a few also die in mid-stream. All it takes is to time out on retries looking for that last handshake.
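
A toy Python model of the above, in case it helps (not the real client code; the function and parameter names are invented for illustration). The sender counts progress by what it has sent, but the transfer only counts as finished once the receiver's final acknowledgement comes back.

```python
def upload(chunks, ack_arrives, max_retries=3):
    """Simulate one upload attempt.

    chunks      -- number of data chunks in the result file (must be > 0)
    ack_arrives -- booleans, one per wait for the final acknowledgement
    Returns (progress_percent, state).
    """
    sent = 0
    for _ in range(chunks):
        sent += 1                        # every data chunk leaves the host
    progress = 100 * sent // chunks      # what the progress column would show
    for got_ack in ack_arrives[:max_retries]:
        if got_ack:                      # final handshake arrived in time
            return progress, "done"
    return progress, "retry"             # 100% sent, but never confirmed

if __name__ == "__main__":
    print(upload(8, [False, False, False]))
    print(upload(8, [False, True]))
```

With every acknowledgement lost, the model reports 100% progress and still falls back to retry, which matches what people are seeing in the transfer list.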
ID: 1987105
Keith Myers · Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1987106 - Posted: 25 Mar 2019, 19:40:13 UTC - in response to Message 1987104.  

I haven't seen any notice or post that anyone has contacted staff. So they might not be aware of the issue if they haven't looked at the forums or the server stats.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1987106
betreger · Project Donor
Joined: 29 Jun 99
Posts: 11366
Credit: 29,581,041
RAC: 66
United States
Message 1987109 - Posted: 25 Mar 2019, 19:55:22 UTC
Last modified: 25 Mar 2019, 19:56:56 UTC

My avg. work done, as reported in BOINC Manager, has been stuck for several hours. This is not normal. Here on the web site it moves around as normal.
ID: 1987109
Kissagogo27 · Special Project $75 donor
Joined: 6 Nov 99
Posts: 716
Credit: 8,032,827
RAC: 62
France
Message 1987110 - Posted: 25 Mar 2019, 20:11:10 UTC - in response to Message 1986953.  

Try looking at who #28 is listed as belonging to, https://setiathome.berkeley.edu/top_hosts.php?sort_by=expavg_credit&offset=20
ID: 8682108
#28 Magne 171,989.52 8,889,326 7.8.3
GenuineIntel Intel(R) Core(TM) i7-4770K CPU @ 3.50GHz [Family 6 Model 60 Stepping 3] (8 processors)
NVIDIA GeForce GTX 1060 6GB (4095MB) driver: 418.43 OpenCL: 1.2
Linux Ubuntu Ubuntu 18.04.2 LTS [4.15.0-46-generic]

Now look at #8, https://setiathome.berkeley.edu/top_hosts.php
ID: 8690734
#8 Magne 363,269.95 18,080,802 7.4.44
GenuineIntel Intel(R) Core(TM) i7-4770K CPU @ 3.50GHz [Family 6 Model 60 Stepping 3] (8 processors)
[5] NVIDIA GeForce GTX 1060 6GB (4095MB) driver: 418.43
Linux 4.15.0-46-generic

Now note Magne only has One Linux Machine, https://setiathome.berkeley.edu/hosts_user.php?userid=118177
You should also note the RAC on his User page, Recent average credit 199,322.93
Most people would conclude it is the SAME machine.

You can also look at Juan's page and note the Credit; https://setiathome.berkeley.edu/show_user.php?userid=8606388
Total credit: 450,616,983
But when you look at his host, https://setiathome.berkeley.edu/hosts_user.php?userid=8606388
ID: 8662921: Total credit: 1,375,790,492
So, how does a Total of 450,616k translate to a Host of 1,375,790k?
Strange things around here...


Possibly by using more than one session of BOINC and then making a fusion (merge) between them?
ID: 1987110
kittyman · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51469
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1987111 - Posted: 25 Mar 2019, 20:11:11 UTC

Eric is working on the problems.

Meow!
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1987111
Speedy
Volunteer tester
Joined: 26 Jun 04
Posts: 1643
Credit: 12,921,799
RAC: 89
New Zealand
Message 1987115 - Posted: 25 Mar 2019, 20:56:17 UTC - in response to Message 1987111.  

Eric is working on the problems.

Meow!

Thanks for the information, Mark.
ID: 1987115
rcthardcore

Joined: 23 Nov 08
Posts: 48
Credit: 1,306,006
RAC: 0
United States
Message 1987116 - Posted: 25 Mar 2019, 21:01:31 UTC

Uploads are definitely very iffy today.
ID: 1987116
Stephen "Heretic" · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1987119 - Posted: 25 Mar 2019, 21:22:19 UTC - in response to Message 1987024.  

. . Aaahh! Here we go again, 24 hours (or so) to maintenance outage and the uploads are playing up .... :(

Just like a switch was flipped ...


. . Yep, one minute everything is fine, then no uploads without kicking them over and over ... :(

Stephen

:(
ID: 1987119
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1987120 - Posted: 25 Mar 2019, 21:22:31 UTC - in response to Message 1987110.  
Last modified: 25 Mar 2019, 21:36:56 UTC

Look at the User Stats. A user can't have a host with better stats than the user. Here are Magne's stats: https://boincstats.com/en/stats/-1/user/detail/204994/lastDays
The machine that was at #28 had OpenCL installed; the machine that appeared at #8 doesn't. Not having OpenCL is common on a new build, as the installer doesn't install OpenCL by default.
We know what happened to Juan. He broke his one computer's networking and had to reinstall the OS. When it came back online with the new system, the SETI server gave it a new ID with an extra one BILLION credits and over an extra one MILLION RAC. Juan wasn't doing anything out of the ordinary, certainly not 'fusioning'. It appears the exact same thing happened with Magne: his machine blinked out with a RAC of around 172k and blinked back in with a RAC of 363k, which is impossible for that machine and for his User Stats.
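
A quick sanity check in Python using only the figures quoted in this thread. The variable and function names are mine, and the rule being tested (a host's credit can't exceed its owner's total) is the one stated above, not something read out of the server code.

```python
user_total = 450_616_983        # Juan's user page, total credit
host_total = 1_375_790_492      # host 8662921, total credit

def host_exceeds_user(host_credit, user_credit):
    """A host's credit is part of its owner's total, so it should never be larger."""
    return host_credit > user_credit

excess = host_total - user_total          # credit the host shows beyond the user total
rac_ratio = 363_269.95 / 171_989.52       # Magne's #8 listing vs. the earlier #28 listing

if __name__ == "__main__":
    print("host exceeds user:", host_exceeds_user(host_total, user_total))
    print("excess credit:", f"{excess:,}")
    print("RAC ratio:", round(rac_ratio, 2))
```

The host total comes out well above the user total, and the RAC roughly doubles between the two listings of what looks like the same machine.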

ID: 1987120
Stephen "Heretic" · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1987121 - Posted: 25 Mar 2019, 21:27:53 UTC - in response to Message 1987085.  

I'm also getting a lot of my uploads stuck at 100% while others timeout and just go into retry loops.


. . They are the most frustrating, after kicking them and kicking them you see one get to 100%, then ... nada, it just sits there taunting you ... aaarrggghh!

Stephen

:(
ID: 1987121
rob smith · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer moderator
Volunteer tester
Joined: 7 Mar 03
Posts: 22260
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1987123 - Posted: 25 Mar 2019, 21:37:28 UTC

We've seen this sort of thing a few times when the database gets its bits in a twist, and that has been followed by a disk handing its notice in....
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1987123
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 1987142 - Posted: 25 Mar 2019, 23:46:56 UTC

welp, the 1mil system is out of work, with about 2800 uploads just waiting to go through.
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 1987142
Keith Myers · Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1987145 - Posted: 25 Mar 2019, 23:57:40 UTC - in response to Message 1987142.  

Are they all reported?
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1987145
Jimbocous · Project Donor
Volunteer tester
Joined: 1 Apr 13
Posts: 1855
Credit: 268,616,081
RAC: 1,349
United States
Message 1987147 - Posted: 26 Mar 2019, 0:08:32 UTC - in response to Message 1987145.  

Are they all reported?

Reporting happens after upload is complete, right? At least here :)
ID: 1987147
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 1987150 - Posted: 26 Mar 2019, 0:17:31 UTC - in response to Message 1987145.  
Last modified: 26 Mar 2019, 0:18:16 UTC

Nope. That’s the problem. They won’t upload (well they do eventually. Just a lot slower than normal). So I can’t report them. And with so many uploads pending, I haven’t been able to download new tasks all day. So I’ve run dry. Only the slower systems have been able to keep a low enough cache of pending uploads to still get downloads.
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 1987150
Keith Myers · Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1987152 - Posted: 26 Mar 2019, 0:20:29 UTC - in response to Message 1987150.  

I've been babysitting the farm all day. As long as a couple get uploaded, I can then do an update to report 100 tasks. Then I can get more work.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1987152


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.