The Server Issues / Outages Thread - Panic Mode On! (118)

TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 2029491 - Posted: 27 Jan 2020, 4:42:28 UTC - in response to Message 2029485.  
Last modified: 27 Jan 2020, 5:30:57 UTC

Stuck Uploads....ALL my machines have Many Uploads waiting Retry.

Something changed a few minutes ago. The machine I had placed in 'Suspend GPU' for over an hour was finally able to Upload the Pages of files the Server had been refusing to take. So, I resumed crunching and so far there are only a few Uploads waiting for Retry instead of Pages (Hundreds). The other machines look better as well. We'll see how long this lasts...

...20 Minutes later the Server is back to refusing Uploads. I suspended the GPUs again after the Retries exceeded a page in length.

Now it seems to be nearly normal again.
ID: 2029491
BetelgeuseFive Project Donor
Volunteer tester
Joined: 6 Jul 99
Posts: 158
Credit: 17,117,787
RAC: 19
Netherlands
Message 2029498 - Posted: 27 Jan 2020, 10:49:46 UTC

Are the validators missing stuff ?

https://setiathome.berkeley.edu/workunit.php?wuid=3847744137

Why is this still waiting for validation ?

Tom
ID: 2029498
rob smith Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer moderator
Volunteer tester
Joined: 7 Mar 03
Posts: 22815
Credit: 416,307,556
RAC: 380
United Kingdom
Message 2029499 - Posted: 27 Jan 2020, 10:55:08 UTC
Last modified: 27 Jan 2020, 10:58:06 UTC

This sort of thing happens from time to time. It appears to be more common when the servers have been having problems. It is normally cleared when the tasks reach their deadlines which triggers the validators to wake up and grant credit as appropriate.

However in the case you highlight the job will have to be sent out to another computer as it has not been validated - two of the three computers "disagree", the third has submitted a result that looks as if it should validate, but that needs to be confirmed by someone else.
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 2029499
BetelgeuseFive Project Donor
Volunteer tester
Joined: 6 Jul 99
Posts: 158
Credit: 17,117,787
RAC: 19
Netherlands
Message 2029503 - Posted: 27 Jan 2020, 11:14:29 UTC - in response to Message 2029499.  

This sort of thing happens from time to time. It appears to be more common when the servers have been having problems. It is normally cleared when the tasks reach their deadlines which triggers the validators to wake up and grant credit as appropriate.

However in the case you highlight the job will have to be sent out to another computer as it has not been validated - two of the three computers "disagree", the third has submitted a result that looks as if it should validate, but that needs to be confirmed by someone else.


I'm not sure you are correct. I think the two first results didn't match, but the third one has not yet been compared to the first two. If either the first or the second task is a match with the third the unit will be validated (and removed from the results table ...) and no new task needs to be sent.
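
A minimal sketch of the check I'm describing, assuming a quorum of two and a caller-supplied comparison function. The names `find_canonical` and `matches` are hypothetical; this is not the project's actual validator code, only an illustration of why a third result can settle the workunit without a fourth task being sent.

```python
def find_canonical(results, matches, quorum=2):
    """Return the index of the first result that agrees with enough others.

    results: list of returned results (any type).
    matches: hypothetical comparison function, matches(a, b) -> bool.
    quorum:  assumed number of agreeing results needed to validate.
    Returns None when no group of agreeing results reaches the quorum.
    """
    for i, candidate in enumerate(results):
        agreeing = sum(
            1
            for j, other in enumerate(results)
            if j != i and matches(candidate, other)
        )
        # The candidate itself counts towards the quorum.
        if agreeing + 1 >= quorum:
            return i
    return None


def close(a, b):
    # Toy comparison: two numeric results "match" if they are nearly equal.
    return abs(a - b) < 0.1


# Two results that disagree: nothing validates yet.
print(find_canonical([1.0, 5.0], close))
# A third result matching the second one is enough to validate.
print(find_canonical([1.0, 5.0, 5.02], close))
```

With the toy data, the first call finds no canonical result and the second picks the result at index 1, because the third result agrees with it.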

Tom
ID: 2029503
xpozd
Joined: 26 Jan 15
Posts: 88
Credit: 280,183
RAC: 1
Canada
Message 2029504 - Posted: 27 Jan 2020, 11:29:06 UTC

Almost a week now with no work units/tasks at all...
i see others say they get them.
yet all i see is this:

-Project communication failed: attempting access to reference site
-Internet access OK - project servers may be temporarily down.

should i be worried ?

  • win7starter
  • boinc: 7.14.2
  • boinc tasks: 1.78
  • Lunatics Win32 v0.44

ID: 2029504
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14690
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2029505 - Posted: 27 Jan 2020, 11:38:04 UTC - in response to Message 2029504.  

You're running a very old version of BOINC - v6.6.38 (over 10 years old!)

You need to run a newer version because of updated communications security protocols.
ID: 2029505
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 2029509 - Posted: 27 Jan 2020, 11:59:49 UTC - in response to Message 2029498.  
Last modified: 27 Jan 2020, 12:40:21 UTC

Are the validators missing stuff ?

https://setiathome.berkeley.edu/workunit.php?wuid=3847744137

Why is this still waiting for validation ?

Tom
This is pretty common, and has been happening forever. It happens on the Maintenance days when the Servers are shut down. To see more of them just go back to the Tuesdays of each week and look for them. If you were active during that period it's likely you have a few, some have more than others. It might be interesting to estimate just how many there are, since everyone gets a few every Tuesday.

I just checked one of My hosts for the Tuesday Jan 7th Maintenance. Looks like quite a few sitting there, wasting space, https://setiathome.berkeley.edu/workunit.php?wuid=3826315572, if that link is dead just go to that date in the list as there's plenty more where that one came from, https://setiathome.berkeley.edu/results.php?hostid=6813106&offset=23800
ID: 2029509
Ville Saari
Joined: 30 Nov 00
Posts: 1158
Credit: 49,177,052
RAC: 82,530
Finland
Message 2029524 - Posted: 27 Jan 2020, 16:30:36 UTC
Last modified: 27 Jan 2020, 16:43:24 UTC

People complaining about stuck uploads, what boinc version are you using? I haven't been suffering from this problem so can it be version dependent?

One curious thing I have observed is that when one of my uploads fails, the client announces a backoff of several minutes in the event log, but it actually retries in about half a minute, and almost always successfully.
ID: 2029524
JohnDK Crowdfunding Project Donor*Special Project $250 donor
Volunteer tester
Joined: 28 May 00
Posts: 1222
Credit: 451,243,443
RAC: 1,127
Denmark
Message 2029525 - Posted: 27 Jan 2020, 16:40:14 UTC

I have had stuck upload problems with both 7.14.2 & 7.16.3 for some weeks I guess.
ID: 2029525
Ville Saari
Joined: 30 Nov 00
Posts: 1158
Credit: 49,177,052
RAC: 82,530
Finland
Message 2029526 - Posted: 27 Jan 2020, 16:54:42 UTC - in response to Message 2029525.  

I have had stuck upload problems with both 7.14.2 & 7.16.3 for some weeks I guess.
My version is between your versions: 7.15.0, but not a release version. I downloaded the development source some time last autumn to add spoofing to it. The most visible change compared to the version I ran before it (which came from the Ubuntu repositories) is that boincmgr no longer scrolls the list to the end whenever the size of the list changes, which was extremely annoying (but it still resets the sorting columns, which is only slightly less annoying).

A bigger annoyance is that boincmgr forces its windows on top whenever they gain focus. But I'm not sure if this is a boinc bug or qt bug.
ID: 2029526
Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2029537 - Posted: 27 Jan 2020, 18:19:14 UTC - in response to Message 2029526.  

You'll have to pull from the 7.16 branch to get rid of the jumping tasks in the tasklist. Was fixed in #3064
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2029537
Ville Saari
Joined: 30 Nov 00
Posts: 1158
Credit: 49,177,052
RAC: 82,530
Finland
Message 2029538 - Posted: 27 Jan 2020, 18:19:29 UTC
Last modified: 27 Jan 2020, 18:20:44 UTC

I wonder whether the results corresponding to 'Workunits waiting for assimilation' on the SSP are still counted in 'Results returned and awaiting validation'. That would explain the mismatch between 'Results waiting for db purging', which is now really low, and the number of workunits of my hosts that the website lists in 'Valid' state, which is way higher than normal.

That would also explain part of 'Results returned and awaiting validation' being very high.
ID: 2029538
Ville Saari
Joined: 30 Nov 00
Posts: 1158
Credit: 49,177,052
RAC: 82,530
Finland
Message 2029539 - Posted: 27 Jan 2020, 18:30:34 UTC - in response to Message 2029537.  
Last modified: 27 Jan 2020, 19:08:44 UTC

You'll have to pull from the 7.16 branch to get rid of the jumping tasks in the tasklist. Was fixed in #3064
It is fixed in my version. It's not the official 7.15 release but some random point in the development between 7.15 and 7.16. The remaining problem affects the sorting order only. The viewport won't jump anywhere when new tasks arrive, but if I had sorted my list by some column, this sorting order will reset to default. Has 7.16 fixed this?

Looks like the jump fix was committed in April. I downloaded my source some time in September.

If some version fixes the super annoying 'window brings itself on top of other windows when it receives focus' bug, then please tell me immediately!

You can't observe that bug on Windows, Mac, or a Linux whose window manager tries to emulate Windows or Mac because on those systems the system itself does the same thing. The bug is a bug even on those environments because it's doing a redundant thing. But when you have a different environment, it becomes a really serious bug. Windows/Mac window management was originally designed for using only one application at a time in the eighties. Properly multiprocessing environments like Amiga and classic X11 kept the windows stacked like the user wanted so that using more than one application is possible without constant alt-tabbing. That's how my desktop works.

Unfortunately most Linux distributions these days use window managers that emulate Windows by default, because they want users migrating from Windows to find the environment familiar. With all its limitations.
ID: 2029539
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14690
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2029540 - Posted: 27 Jan 2020, 18:37:52 UTC - in response to Message 2029537.  

You'll have to pull from the 7.16 branch to get rid of the jumping tasks in the tasklist. Was fixed in #3064
That one was me. I didn't have a Linux machine at the time, so I was working blind. Nobody mentioned a sorting problem, so I didn't fix one. It may have been solved elsewhere - I've got a Linux machine now, so I'll try it later.
ID: 2029540
Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2029544 - Posted: 27 Jan 2020, 19:29:36 UTC

I'm trying to get the sorting order to change when new downloaded tasks arrive. Sorting order never changes for me. Whatever column I have changed the sorting order in, stays put. Maybe I don't understand your problem correctly. Running the 7.16.3 branch.

Just an FYI and some persuasion . . . you can get rid of the "finish file not found" errors if you update to the 7.16 branch also.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2029544
Ville Saari
Joined: 30 Nov 00
Posts: 1158
Credit: 49,177,052
RAC: 82,530
Finland
Message 2029545 - Posted: 27 Jan 2020, 19:29:48 UTC

Actually the sorting problem doesn't affect the primary sorting column. That stays selected. The problem is in sorting by multiple columns.

When I click one column, then another, I get a sort by the second clicked column, but rows that don't differ in that column are sorted by the first clicked column. I can sort by any number of columns this way. I think the manager doesn't support multi-column sorting explicitly; it is just a side effect of using a stable sort algorithm. However, this stability is lost when the list updates. The rows stay sorted by the primary column, but rows with identical primary column values go into some arbitrary order that doesn't match any column.
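
To illustrate the effect, here is a small sketch that relies on Python's `sorted` being a stable sort. The task rows and the "refreshed" order are made up, and this is not the manager's code, only a demonstration of how a stable sort gives multi-column ordering by accident and how re-sorting a reshuffled list by one column loses it.

```python
# Hypothetical task rows: (name, application, status)
tasks = [
    ("t1", "AP", "Running"),
    ("t2", "MB", "Ready"),
    ("t3", "AP", "Ready"),
    ("t4", "MB", "Running"),
]

# "Click" the application column first, then the status column.
by_app = sorted(tasks, key=lambda row: row[1])
by_status_then_app = sorted(by_app, key=lambda row: row[2])

# Rows with equal status keep their application order from the earlier sort.
print([row[0] for row in by_status_then_app])

# Simulate a list update that delivers the rows in some unrelated order,
# followed by a re-sort on the primary (status) column only.
refreshed = [tasks[3], tasks[1], tasks[0], tasks[2]]
after_refresh = sorted(refreshed, key=lambda row: row[2])

# The status order survives, but ties are no longer ordered by application.
print([row[0] for row in after_refresh])
```

The first print shows rows ordered by status with ties ordered by application; the second shows the same status grouping with the ties in the arbitrary order the refresh delivered.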

This could be fixed by maintaining a list of column indices and whenever some column is clicked for sorting, move that column's index to the front of the list. Then use all columns in the list order as sorting key. The result would be an explicitly supported multi-column sorting but using the same user interface as the current serendipitous multi-column sorting.
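
A rough sketch of that idea, assuming rows are plain tuples and every column sorts ascending. The class name `ColumnSortOrder` and the sample rows are hypothetical and none of this comes from the BOINC source:

```python
class ColumnSortOrder:
    """Remembers clicked columns, most recently clicked first."""

    def __init__(self):
        self.columns = []

    def click(self, column):
        # Move the clicked column's index to the front of the list.
        if column in self.columns:
            self.columns.remove(column)
        self.columns.insert(0, column)

    def sort(self, rows):
        # Use all remembered columns, in list order, as the sorting key.
        return sorted(rows, key=lambda row: tuple(row[c] for c in self.columns))


# Hypothetical task rows: (name, application, status)
rows = [
    ("t4", "MB", "Running"),
    ("t2", "MB", "Ready"),
    ("t1", "AP", "Running"),
    ("t3", "AP", "Ready"),
]

order = ColumnSortOrder()
order.click(1)  # application column clicked first
order.click(2)  # status column clicked second, so it becomes the primary key

# Re-sorting after any list update gives the same order every time.
print([row[0] for row in order.sort(rows)])
```

Because the full key is rebuilt on every sort, the result no longer depends on the order in which the rows arrive. Descending columns would need an extra per-column direction flag, which this sketch leaves out.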
ID: 2029545
Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2029546 - Posted: 27 Jan 2020, 19:37:27 UTC - in response to Message 2029545.  

Ahhh, gotcha. Yes, you are correct. It seems the Manager does not support multi-column sorting, just single. I just tried with time remaining, status and application. The AP tasks keep jumping around.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2029546
Ville Saari
Joined: 30 Nov 00
Posts: 1158
Credit: 49,177,052
RAC: 82,530
Finland
Message 2029547 - Posted: 27 Jan 2020, 19:39:11 UTC
Last modified: 27 Jan 2020, 19:45:35 UTC

I don't want to update my boinc client version without a good reason because updating involves quite a bit of effort. First I would have to port my spoofing hack to the new version, and then I would have to run my queue empty, because in my experience updating the boinc version carries a high risk of the new version deleting every task downloaded by the old version. Even the completed ones!

And whenever I intentionally run my queue empty, there's a guaranteed unplanned server outage exactly when the queue becomes empty!
ID: 2029547
Jimbocous Project Donor
Volunteer tester
Joined: 1 Apr 13
Posts: 1859
Credit: 268,616,081
RAC: 1,349
United States
Message 2029557 - Posted: 27 Jan 2020, 21:04:06 UTC - in response to Message 2029525.  

I have had stuck upload problems with both 7.14.2 & 7.16.3 for some weeks I guess.

Ditto
ID: 2029557
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 2029560 - Posted: 27 Jan 2020, 21:13:24 UTC

no real problems with stuck uploads here.

i see a handful of the uploads occasionally hitting a 3-5min backoff but they clear on the next retry so they don't stack up, and that's on the fastest systems on the project. doesn't seem to be a huge issue really. none of my systems had more than like 5 in this state.

also using 7.16.3. doubt it's at all related to the BOINC version.
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 2029560


 