Panic Mode On (101) Server Problems?

Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1740442 - Posted: 7 Nov 2015, 17:53:25 UTC - in response to Message 1740439.  

Keep arguing Semantics. We are watching you dance around an Obvious problem with the SETI Stats on a SETI webpage from a SETI Server.
Obviously if you don't return results you will Not receive a GF reading.
Likewise, the more results the higher that GF reading will be on that page.
It's all very interesting...

Yes, it is. And it can be even more interesting if you understand it.

Although it isn't publicly advertised, exactly the same generic BOINC code runs at Milkyway: http://milkyway.cs.rpi.edu/milkyway/gpu_list.php

They have Macs running NVidia cards, but not running ATI cards - exactly the opposite of the display here. So, who's running a sample NV card on Mac here, so we can look at the timings?
ID: 1740442
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1740445 - Posted: 7 Nov 2015, 18:03:34 UTC - in response to Message 1740442.  

Here is One returning results every few minutes, http://setiathome.berkeley.edu/results.php?hostid=7644315&offset=120
There are many, Many Mac Laptops with nVidia cards returning many more than that one host.
ID: 1740445
Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 1740452 - Posted: 7 Nov 2015, 18:18:26 UTC
Last modified: 7 Nov 2015, 18:19:03 UTC

http://www.primegrid.com/gpu_list.php shows both Nvidia and AMD GPUs on Macs for a variety of applications.
ID: 1740452
Claggy
Volunteer tester
Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1740454 - Posted: 7 Nov 2015, 18:22:12 UTC - in response to Message 1740442.  
Last modified: 7 Nov 2015, 19:16:24 UTC

They have Macs running NVidia cards, but not running ATI cards - exactly the opposite of the display here. So, who's running a sample NV card on Mac here, so we can look at the timings?

Computer 7297852 NVIDIA Quadro K5000/GeForce GT 640

Computer 7599680 NVIDIA GeForce GTX 770

Computer 6863747 NVIDIA GeForce GTX 680MX

Computer 7321524 NVIDIA GeForce GTX 680MX

Computer 6883710 NVIDIA GeForce GTX 660 Ti

Computer 7414871 NVIDIA GeForce GTX 680MX

Computer 7453012 NVIDIA GeForce GTX 675MX

Computer 7088836 NVIDIA GeForce GTX 680MX

Computer 7163398 NVIDIA GeForce GTX 780M

Computer 7719315 NVIDIA GeForce GTX 675MX

Computer 7773885 NVIDIA GeForce GTX 780M

Computer 3502292 NVIDIA GeForce GTX 680MX

Computer 6902460 NVIDIA GeForce GTX 675MX

Computer 7718750 NVIDIA GeForce GTX 680MX

Computer 7494607 NVIDIA GeForce GTX 780M

Computer 4726043 NVIDIA GeForce GTX 950

Computer 7183121 NVIDIA GeForce GTX 775M

Computer 7752206 NVIDIA GeForce GTX 680MX

Computer 7334464 NVIDIA GeForce GT 755M

Computer 7592123 NVIDIA GeForce GTX 970

Computer 7453810 NVIDIA GeForce GTX 675MX

Computer 7254833 NVIDIA GeForce GT 755M

Claggy
ID: 1740454
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1740459 - Posted: 7 Nov 2015, 18:44:42 UTC

I was beginning to get worried about the Oct 20, 2014, bugfix adding anonymous platform hosts to the lists (3136342575c42e7e3b2dc13fbb04feb54ae0c77d). Neither Milkyway nor PrimeGrid seems to have that, judging by the missing ", measured by average elapsed time of tasks," in the rubric.

Specifically, the first host we looked at - Jason's Xeon - is running anonymous platform: and the handler for anonymous platform ("if ($vendor == "cuda") {$av_ids .= "-3";}") uses "cuda" (only) as the vendor string. But that seems uniform throughout the code, and Claggy's first two examples are both running the stock app.

Back to the drawing board...
ID: 1740459
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1740460 - Posted: 7 Nov 2015, 18:58:23 UTC - in response to Message 1740459.  

I believe you will find most of the Mac nVidia results are coming from Laptops, running the Stock App, on the same Mobile GPUs. That means it shouldn't be very difficult for the Server to see Hundreds of results from the exact same nVidia mobile GPU. I have seen Jason's 780 displayed a few times, and when I had My 750Ti in my Mac it appeared only a couple of times, even though it was running 24/7 for weeks.
ID: 1740460
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1740461 - Posted: 7 Nov 2015, 19:00:04 UTC

David Anderson has found some Mac GPUs by increasing the result enumeration limit to 2000 per vendor. Evidently a low proportion of Macs are using their GPUs so far.
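
For anyone wondering why a per-vendor enumeration limit matters, here is a rough Python sketch. It is not the BOINC gpu_list.php code: the function name `gpu_models_seen`, the tuple layout and the sample data are all made up for illustration. The only thing taken from this thread is the idea of a cap on how many results are enumerated per vendor. If only the first N results for a vendor are looked at, a platform that returns a small share of that vendor's results can miss the cut entirely.

```python
from collections import defaultdict


def gpu_models_seen(results, limit_per_vendor):
    """Group GPU models by (os_name, vendor), looking at no more than
    `limit_per_vendor` results for each vendor, in the order given.

    `results` is an iterable of (os_name, vendor, model) tuples.
    This layout is hypothetical, not the real BOINC schema.
    """
    taken = defaultdict(int)   # results examined so far, per vendor
    seen = defaultdict(set)    # (os_name, vendor) -> set of GPU models
    for os_name, vendor, model in results:
        if taken[vendor] >= limit_per_vendor:
            continue  # cap reached for this vendor: result is never examined
        taken[vendor] += 1
        seen[(os_name, vendor)].add(model)
    return dict(seen)


# Hypothetical stream: one Mac result after every 99 Windows results,
# all from the same vendor.
sample = []
for i in range(2000):
    if i % 100 == 99:
        sample.append(("Mac", "nvidia", "GTX 680MX"))
    else:
        sample.append(("Windows", "nvidia", "GTX 970"))

low = gpu_models_seen(sample, 50)     # small cap
high = gpu_models_seen(sample, 2000)  # raised cap

print("Mac listed with cap 50:  ", ("Mac", "nvidia") in low)
print("Mac listed with cap 2000:", ("Mac", "nvidia") in high)
```

In this made-up stream the Mac hosts are present the whole time; whether they show up depends only on how far the enumeration is allowed to go.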
ID: 1740461
Claggy
Volunteer tester
Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1740466 - Posted: 7 Nov 2015, 19:20:37 UTC - in response to Message 1740461.  

David Anderson has found some Mac GPUs by increasing the result enumeration limit to 2000 per vendor. Evidently a low proportion of Macs are using their GPUs so far.

As well as some Windows and especially Linux GPUs.

Claggy
ID: 1740466
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1740467 - Posted: 7 Nov 2015, 19:29:55 UTC

Yes, finally some Mac nVidia GPUs. Thank you very much.
Now to see how long it lasts.
;-)
ID: 1740467
TimeLord04
Volunteer tester
Joined: 9 Mar 06
Posts: 21140
Credit: 33,933,039
RAC: 23
United States
Message 1740475 - Posted: 7 Nov 2015, 20:49:43 UTC - in response to Message 1740078.  

14 WUs, (processed between both my crunchers), have been marked Invalid. Most of them from Prometheus, my GTX-750 TI SC machine.

:-(


14? Is that all? I got 298 invalids over 3 machines since the mix-up happened. Mistakes happen, that's life!

Now up to 44 due to the Server issue... Plus, another 43 in Validation Inconclusive, ready to become more Invalids... :-(

Now it's Exeter in the "lead" of most Invalids. (GTX-760.)

Computer IDs:

7301566 - Exeter, GTX-760

7440554 - Prometheus, GTX-750 TI SC


TL
TimeLord04
Have TARDIS, will travel...
Come along K-9!
Join Calm Chaos
ID: 1740475
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13731
Credit: 208,696,464
RAC: 304
Australia
Message 1740483 - Posted: 7 Nov 2015, 21:47:30 UTC - in response to Message 1740475.  
Last modified: 7 Nov 2015, 22:11:37 UTC

Was the issue of random abandonment of caches ever resolved?
This host here appears to have suffered from the issue.

They received a cache full of work on 15th Oct, picking up 5-10 WUs every 5 minutes.
On 7th Nov at 17:15:25 UTC they received another WU (there don't appear to be any other communications between those dates), and at that same moment all existing WUs were abandoned.
On their machine, "Number of times client has contacted server" stands at 3, yet there were at least 6 contacts just to fill the cache on 15th Oct.



EDIT- and I've noticed a lot of errored out WUs from ARM based systems, most common Stderr output,

error: only position independent executables (PIE) are supported.

this being on hardware with the application version
SETI@home v7 v7.21 (armv7-vfpv3d16-nopie)

eg
ARM WU error
Grant
Darwin NT
ID: 1740483
Sutaru Tsureku
Volunteer tester
Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 1740541 - Posted: 8 Nov 2015, 7:34:12 UTC - in response to Message 1740227.  

rob smith wrote:
I'm not sure why you see "three results the same" - the stderr of the two results that validated look very similar, while the one that failed is very different.

IIRC, two AP results were from (MAC) GPUs.
My AP result was from the J1900 CPU.

The result:
single pulses: 0
repetitive pulses: 0
...was the same for all 3 results.

- - - - - - - - - -

I myself wrote:
ap_23jl11ac_B2_P1_00115_20150924_21168.wu
http://setiathome.berkeley.edu/workunit.php?wuid=1910752354

3 same results, but 1 of them invalid. Why?


TBar wrote:
I'd say it's the same thing mentioned in this post, http://setiathome.berkeley.edu/forum_thread.php?id=78410&postid=1739228#1739228
OK, WTH is this? http://setiathome.berkeley.edu/workunit.php?wuid=1922047397
How can you get an Invalid on a Blanking too much RFI? percent blanked: 100.00
I think I see a pattern here. I was sent a Resend, the other Host completed the task before I did.
So, I GET AN INVALID! For Successfully completing the Resend that was sent Me?
I remember this from a long time ago, Why has it resurfaced?

Yes, I was given an Invalid on a task where there wasn't any computation. Looking at the results it was clear I had been sent a Resend from a host that had timed out. Before I completed the task the host that had timed out reported the task and I was given an Impossible Invalid when I reported.

If you remember back when the APv6 tasks were cleared there was New Code entered to Clear All the Hanging APv6 tasks. Unfortunately what this New Code did was give EVERYONE who had been sent a Resend an Invalid for completing that Resend. That problem was quickly noticed and corrected, except people's APv6 Consecutive Valid Task scores were Destroyed, and will Never be restored.
It would Appear that 'New Code' has resurfaced and again My 10000 Consecutive Valid Task Score has been Destroyed, http://setiathome.berkeley.edu/host_app_versions.php?hostid=6796479
AstroPulse v6 (anonymous platform, ATI GPU)
Number of tasks completed: 22635
Max tasks per day: 9790
Number of tasks today: 0
Consecutive valid tasks: 0

AstroPulse v7 (anonymous platform, ATI GPU)
Number of tasks completed: 9323
Max tasks per day: 12121
Number of tasks today: 55
Consecutive valid tasks: 318

Yep, SAME host I received My resends from, http://setiathome.berkeley.edu/show_host_detail.php?hostid=7484988
Apparently he had a few APs Time Out, the APs were sent to others, then He reported the overdue APs and the People with the Resends were given an Invalid for their troubles.


It's the 2nd time I have noticed my PC getting an invalid for an AP result because of the problem described above.
I don't check the WU/results overview every day...


Are the admins already aware of this 'resend' problem, where a host that missed the deadline reports its result late anyway and the host that completed the resend is then given an invalid?
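
To spell out the sequence TBar describes, here is a toy Python model. It is not BOINC's validator or transitioner: the `Workunit` class, its `report` method, the quorum of 2 and the plain string "fingerprint" comparison are all assumptions made for illustration. It shows the outcome people in this thread would expect: once a canonical result exists, a resend reported afterwards is judged on whether it matches, not on the order in which it arrived.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Workunit:
    """Toy workunit: results match if their fingerprints are equal."""
    quorum: int = 2                      # assumed quorum size
    canonical: Optional[str] = None      # fingerprint of the agreed result
    pending: list = field(default_factory=list)
    outcome: dict = field(default_factory=dict)

    def report(self, host, fingerprint):
        if self.canonical is not None:
            # Quorum already reached: compare against the canonical result.
            ok = fingerprint == self.canonical
            self.outcome[host] = "valid" if ok else "invalid"
            return
        self.pending.append((host, fingerprint))
        self.outcome[host] = "pending"
        matching = [h for h, f in self.pending if f == fingerprint]
        if len(matching) >= self.quorum:
            # Enough agreeing results: fix the canonical result and
            # settle everything reported so far (a simplification).
            self.canonical = fingerprint
            for h, f in self.pending:
                self.outcome[h] = "valid" if f == fingerprint else "invalid"


wu = Workunit()
same = "single pulses: 0, repetitive pulses: 0"
wu.report("host_B", same)         # on-time result, waits for a wingman
# host_A misses the deadline here, so a resend goes out to host_C ...
wu.report("host_A", same)         # ... but host_A reports late anyway
wu.report("host_C_resend", same)  # the resend arrives after quorum
print(wu.outcome)
```

In this model the resend host ends up valid because its result matches. The invalids reported in this thread are the opposite outcome, which is why they look like a server-side problem rather than a computation error.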
ID: 1740541
TimeLord04
Volunteer tester
Joined: 9 Mar 06
Posts: 21140
Credit: 33,933,039
RAC: 23
United States
Message 1740709 - Posted: 9 Nov 2015, 0:45:35 UTC

[Rant On]

Today, I'm up to 53 Invalid, 36 Inconclusive. This is getting really, REALLY stupid!!!

These faulty units should just be purged from the system; especially if Eric and the rest know that these are faulty. Why are we being left to take the hit on these ludicrous units???

[Rant Off.]

On a more positive note; both my computers put enough work in to raise my RAC back to a more reasonable level. I hope that trend holds. :-)


TL
TimeLord04
Have TARDIS, will travel...
Come along K-9!
Join Calm Chaos
ID: 1740709
HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1740766 - Posted: 9 Nov 2015, 4:54:58 UTC - in response to Message 1740709.  

[Rant On]

Today, I'm up to 53 Invalid, 36 Inconclusive. This is getting really, REALLY stupid!!!

These faulty units should just be purged from the system; especially if Eric and the rest know that these are faulty. Why are we being left to take the hit on these ludicrous units???

[Rant Off.]

On a more positive note; both my computers put enough work in to raise my RAC back to a more reasonable level. I hope that trend holds. :-)

TL

They could be taking this opportunity to verify other systems in place are working correctly when there are known bad workunits in play.
As I understand it, trying to track down all of the bad workunits and flag them to be canceled by the server would be more work than just letting them run through the system.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group: http://tinyurl.com/8y46zvu
ID: 1740766
Gary Charpentier Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 25 Dec 00
Posts: 30639
Credit: 53,134,872
RAC: 32
United States
Message 1740770 - Posted: 9 Nov 2015, 5:22:20 UTC - in response to Message 1740766.  

[Rant On]

Today, I'm up to 53 Invalid, 36 Inconclusive. This is getting really, REALLY stupid!!!

These faulty units should just be purged from the system; especially if Eric and the rest know that these are faulty. Why are we being left to take the hit on these ludicrous units???

[Rant Off.]

On a more positive note; both my computers put enough work in to raise my RAC back to a more reasonable level. I hope that trend holds. :-)

TL

They could be taking this opportunity to verify other systems in place are working correctly when there are known bad workunits in play.
As I understand it, trying to track down all of the bad workunits and flag them to be canceled by the server would be more work than just letting them run through the system.

He may have to do that as he really should give credit for work done on known bad work units.
ID: 1740770
William
Volunteer tester
Joined: 14 Feb 13
Posts: 2037
Credit: 17,689,662
RAC: 0
Message 1740797 - Posted: 9 Nov 2015, 9:48:25 UTC - in response to Message 1740483.  

Was the issue of random abandonment of caches ever resolved?
This host here appears to have suffered from the issue.

They received a cache full of work on 15th Oct, picking up 5-10 WUs every 5 minutes.
On 7th Nov at 17:15:25 UTC they received another WU (there don't appear to be any other communications between those dates), and at that same moment all existing WUs were abandoned.
On their machine, "Number of times client has contacted server" stands at 3, yet there were at least 6 contacts just to fill the cache on 15th Oct.

Richard and I managed to find and eliminate one missing safety check, which could lead to tasks being abandoned though present on the host.
Are you in touch with the user? If he did a reattach, the tasks were rightly abandoned. It is only a problem if they are still crunching on the host but marked abandoned on the server.
These days, if you reattach you should get your old host ID back, but with the "times contacted server" count reset [and any tasks still in progress from before marked as abandoned].

EDIT- and I've noticed a lot of errored out WUs from ARM based systems, most common Stderr output,

error: only position independent executables (PIE) are supported.

this being on hardware with the application version
SETI@home v7 v7.21 (armv7-vfpv3d16-nopie)

eg
ARM WU error

WU is already purged. No idea if Eric has the time to look into it.
A person who won't read has no advantage over one who can't read. (Mark Twain)
ID: 1740797
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13731
Credit: 208,696,464
RAC: 304
Australia
Message 1740799 - Posted: 9 Nov 2015, 10:01:29 UTC - in response to Message 1740797.  

EDIT- and I've noticed a lot of errored out WUs from ARM based systems, most common Stderr output,

error: only position independent executables (PIE) are supported.

this being on hardware with the application version
SETI@home v7 v7.21 (armv7-vfpv3d16-nopie)

eg
ARM WU error

WU is already purged. No idea if Eric has the time to look into it.


I'll keep an eye out.
Grant
Darwin NT
ID: 1740799
Louis Loria II
Volunteer tester
Joined: 20 Oct 03
Posts: 259
Credit: 9,208,040
RAC: 24
United States
Message 1740816 - Posted: 9 Nov 2015, 11:44:21 UTC

So, it wasn't just me. Good to know. 142 invalids, 154 errors. Some of those errors were mine, playing with drivers. Otherwise, I hope it gets straightened out.

Good luck crunching folks!
ID: 1740816
Swibby Bear
Joined: 1 Aug 01
Posts: 246
Credit: 7,945,093
RAC: 0
United States
Message 1740988 - Posted: 10 Nov 2015, 0:56:21 UTC

FYI - Matt posted a big update under Technical News. Thanks, Matt. Now we're eager for your next writeup!
ID: 1740988
TimeLord04
Volunteer tester
Joined: 9 Mar 06
Posts: 21140
Credit: 33,933,039
RAC: 23
United States
Message 1740993 - Posted: 10 Nov 2015, 1:08:06 UTC

Up to 58 Invalids; down to 23 Inconclusives...

Yet, tomorrow is another Outrage day... What will that bring, next??? :-O


TL


P.S.

[Rant On:]

I'm with Gary; we should definitely be granted some credit for these faulty WUs... It's NOT the fault of our systems that they all error out! It's NOT our fault that time is wasted on them!

We spend the time, we should get credit!

[Rant Off.]
TimeLord04
Have TARDIS, will travel...
Come along K-9!
Join Calm Chaos
ID: 1740993


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.