Suddenly BOINC Decides to Abandon 71 APs...WTH?

Message boards : Number crunching : Suddenly BOINC Decides to Abandon 71 APs...WTH?

Profile William
Volunteer tester
Joined: 14 Feb 13
Posts: 2037
Credit: 17,689,662
RAC: 0
Message 1697258 - Posted: 1 Jul 2015, 7:41:37 UTC - in response to Message 1697255.  

What we need - and it might well be in your code - is a one-off "resend lost tasks" check triggered as part of the seqno/detach/authentication path we're exploring.

No, I didn't do that. I proposed it as the more sophisticated solution:
Assume 'backup' and run the 'resend_lost_tasks' logic against the client request (that does send 'other tasks' anyway).
Actually I said just check the 'other tasks' list against the server held list and mark those abandoned that are not present.
How old are people's backups? Weekly? Half a year? 'When I remember to do one'?
Does the client abort 'not started by deadline' on its own?
What about tasks you crunched ages ago? No need to redo them.
Dang, that's getting complicated.
A person who won't read has no advantage over one who can't read. (Mark Twain)
ID: 1697258 · Report as offensive
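William's simpler variant above (compare the client's 'other tasks' list with the server-held list) boils down to a set difference. A minimal sketch, assuming both lists are available as plain task-name strings; the function name and the types are hypothetical and are not taken from the BOINC source:

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: returns the tasks the server believes are in progress
// on a host but that the client did not list in its request.
std::vector<std::string> missing_from_client(
    const std::vector<std::string>& server_in_progress,
    const std::vector<std::string>& client_other_tasks) {
    // Index the client's list once so each lookup is cheap.
    const std::set<std::string> on_client(client_other_tasks.begin(),
                                          client_other_tasks.end());
    std::vector<std::string> missing;
    for (const std::string& name : server_in_progress) {
        if (on_client.count(name) == 0) {
            missing.push_back(name);  // candidate for "abandoned" or "resend"
        }
    }
    return missing;
}
```

Whether the returned tasks should be marked abandoned or resent is exactly the open question in the post; the sketch only identifies them.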
Profile jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1697259 - Posted: 1 Jul 2015, 7:43:03 UTC - in response to Message 1697255.  
Last modified: 1 Jul 2015, 7:43:37 UTC

my client trigger just leaves a reportable task off the server request every so often (reporting it in a later request). Haven't tried it lately, because I haven't had ghost tasks, so your guess is as good as mine as to whether that logic is intact.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1697259 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1697261 - Posted: 1 Jul 2015, 7:46:44 UTC - in response to Message 1697258.  

Does the client abort 'not started by deadline' on its own?

Yes.
ID: 1697261 · Report as offensive
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1697262 - Posted: 1 Jul 2015, 7:50:52 UTC - in response to Message 1697243.  

Alright, sounds reasonable, as it's quite possible they fiddled with the RPC timeouts, either making them shorter or breaking them.
For the sake of eliminating the easy stuff that we're all very certain there is no problem with... Is there any Mac equivalent to the Windows pathping or Linux mtr command?
Some example pinging your local router that looks like this would just confirm it's BOINC's/SETI's issue, for everyone to see:
C:\Users\Jason>pathping 192.168.0.1
Tracing route to 192.168.0.1 over a maximum of 30 hops
  0  Apollo [192.168.0.10]
  1  192.168.0.1
Computing statistics for 25 seconds...
            Source to Here   This Node/Link
Hop  RTT    Lost/Sent = Pct  Lost/Sent = Pct  Address
  0                                           Apollo [192.168.0.10]
                                0/ 100 =  0%   |
  1    0ms     0/ 100 =  0%     0/ 100 =  0%  192.168.0.1
Trace complete.
C:\Users\Jason>pathping setiathome.berkeley.edu
Tracing route to setiathome.berkeley.edu [169.229.217.150]
over a maximum of 30 hops:
  0  Apollo [192.168.0.10]
  1  192.168.0.1
  2     *        *        *
Computing statistics for 25 seconds...
            Source to Here   This Node/Link
Hop  RTT    Lost/Sent = Pct  Lost/Sent = Pct  Address
  0                                           Apollo [192.168.0.10]
                                0/ 100 =  0%   |
  1    0ms     0/ 100 =  0%     0/ 100 =  0%  192.168.0.1
Trace complete.

Okay, I get a really bad 100% loss with 169.229.217.150, so I tried the last hop according to traceroute:
17 ucb--oak-agg4-10g.cenic.net (137.164.50.31) 77.528 ms 78.321 ms 79.539 ms
PING 137.164.50.31 (137.164.50.31): 56 data bytes
64 bytes from 137.164.50.31: icmp_seq=0 ttl=55 time=80.274 ms
64 bytes from 137.164.50.31: icmp_seq=1 ttl=55 time=79.100 ms
64 bytes from 137.164.50.31: icmp_seq=2 ttl=55 time=81.445 ms
64 bytes from 137.164.50.31: icmp_seq=3 ttl=55 time=81.127 ms
64 bytes from 137.164.50.31: icmp_seq=4 ttl=55 time=80.898 ms
64 bytes from 137.164.50.31: icmp_seq=5 ttl=55 time=79.710 ms
64 bytes from 137.164.50.31: icmp_seq=6 ttl=55 time=79.273 ms
64 bytes from 137.164.50.31: icmp_seq=7 ttl=55 time=78.990 ms
64 bytes from 137.164.50.31: icmp_seq=8 ttl=55 time=80.793 ms
64 bytes from 137.164.50.31: icmp_seq=9 ttl=55 time=79.878 ms

--- 137.164.50.31 ping statistics ---
10 packets transmitted, 10 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 78.990/80.149/81.445/0.843 ms

Next-to-last hop works fine too:
16 dc-oak-agg4--svl-agg4-100ge.cenic.net (137.164.46.144) 77.928 ms 77.400 ms 76.975 ms
PING 137.164.46.144 (137.164.46.144): 56 data bytes
64 bytes from 137.164.46.144: icmp_seq=0 ttl=247 time=77.160 ms
64 bytes from 137.164.46.144: icmp_seq=1 ttl=247 time=79.483 ms
64 bytes from 137.164.46.144: icmp_seq=2 ttl=247 time=77.551 ms
64 bytes from 137.164.46.144: icmp_seq=3 ttl=247 time=79.791 ms
64 bytes from 137.164.46.144: icmp_seq=4 ttl=247 time=79.575 ms
64 bytes from 137.164.46.144: icmp_seq=5 ttl=247 time=78.412 ms
64 bytes from 137.164.46.144: icmp_seq=6 ttl=247 time=77.490 ms
64 bytes from 137.164.46.144: icmp_seq=7 ttl=247 time=77.817 ms
64 bytes from 137.164.46.144: icmp_seq=8 ttl=247 time=76.883 ms
64 bytes from 137.164.46.144: icmp_seq=9 ttl=247 time=78.700 ms

--- 137.164.46.144 ping statistics ---
10 packets transmitted, 10 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 76.883/78.286/79.791/1.009 ms

This one works too, 18 t2-3.inr-202-reccev.berkeley.edu (128.32.0.39)
PING 128.32.0.39 (128.32.0.39): 56 data bytes
64 bytes from 128.32.0.39: icmp_seq=0 ttl=246 time=80.550 ms
64 bytes from 128.32.0.39: icmp_seq=1 ttl=246 time=80.260 ms
64 bytes from 128.32.0.39: icmp_seq=2 ttl=246 time=82.059 ms
64 bytes from 128.32.0.39: icmp_seq=3 ttl=246 time=84.547 ms
64 bytes from 128.32.0.39: icmp_seq=4 ttl=246 time=83.935 ms
64 bytes from 128.32.0.39: icmp_seq=5 ttl=246 time=79.321 ms
64 bytes from 128.32.0.39: icmp_seq=6 ttl=246 time=78.513 ms
64 bytes from 128.32.0.39: icmp_seq=7 ttl=246 time=81.457 ms
64 bytes from 128.32.0.39: icmp_seq=8 ttl=246 time=80.215 ms
64 bytes from 128.32.0.39: icmp_seq=9 ttl=246 time=79.279 ms

--- 128.32.0.39 ping statistics ---
10 packets transmitted, 10 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 78.513/81.014/84.547/1.894 ms

This one works too,
19 e3-47.inr-310-ewdc.berkeley.edu (128.32.0.99) 76.926 ms 77.590 ms 77.689 ms
PING 128.32.0.99 (128.32.0.99): 56 data bytes
64 bytes from 128.32.0.99: icmp_seq=0 ttl=54 time=79.564 ms
64 bytes from 128.32.0.99: icmp_seq=1 ttl=54 time=80.873 ms
64 bytes from 128.32.0.99: icmp_seq=2 ttl=54 time=77.894 ms
64 bytes from 128.32.0.99: icmp_seq=3 ttl=54 time=79.462 ms
64 bytes from 128.32.0.99: icmp_seq=4 ttl=54 time=80.543 ms
64 bytes from 128.32.0.99: icmp_seq=5 ttl=54 time=81.589 ms
64 bytes from 128.32.0.99: icmp_seq=6 ttl=54 time=78.557 ms
64 bytes from 128.32.0.99: icmp_seq=7 ttl=54 time=79.633 ms
64 bytes from 128.32.0.99: icmp_seq=8 ttl=54 time=81.454 ms
64 bytes from 128.32.0.99: icmp_seq=9 ttl=54 time=80.530 ms

--- 128.32.0.99 ping statistics ---
10 packets transmitted, 10 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 77.894/80.010/81.589/1.145 ms

What's up here?
PING 169.229.217.150 (169.229.217.150): 56 data bytes
Request timeout for icmp_seq 0
Request timeout for icmp_seq 1
Request timeout for icmp_seq 2
Request timeout for icmp_seq 3
Request timeout for icmp_seq 4
Request timeout for icmp_seq 5
Request timeout for icmp_seq 6
Request timeout for icmp_seq 7
Request timeout for icmp_seq 8

--- 169.229.217.150 ping statistics ---
10 packets transmitted, 0 packets received, 100.0% packet loss

Bad Address? Is that an Aussie Addy?
ID: 1697262 · Report as offensive
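The summary lines in the ping output above can be reproduced from the individual round-trip times. A minimal sketch, assuming the stddev that ping prints is the population standard deviation (that assumption matches the figures posted for 137.164.50.31); the struct and function names are made up for illustration:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical summary record mirroring the "ping statistics" block.
struct PingSummary {
    int sent;
    int received;
    double loss_pct;
    double min_ms, avg_ms, max_ms, stddev_ms;
};

PingSummary summarize_ping(int sent, const std::vector<double>& rtts_ms) {
    assert(sent > 0 && rtts_ms.size() <= static_cast<std::size_t>(sent));
    PingSummary s{};  // all fields start at zero
    s.sent = sent;
    s.received = static_cast<int>(rtts_ms.size());
    s.loss_pct = 100.0 * (sent - s.received) / sent;
    if (rtts_ms.empty()) {
        return s;  // no replies at all, as with 169.229.217.150
    }
    double sum = 0.0;
    s.min_ms = s.max_ms = rtts_ms[0];
    for (double r : rtts_ms) {
        sum += r;
        if (r < s.min_ms) s.min_ms = r;
        if (r > s.max_ms) s.max_ms = r;
    }
    s.avg_ms = sum / rtts_ms.size();
    double squared = 0.0;
    for (double r : rtts_ms) {
        squared += (r - s.avg_ms) * (r - s.avg_ms);
    }
    s.stddev_ms = std::sqrt(squared / rtts_ms.size());  // population stddev
    return s;
}
```

A 100% loss figure from this kind of summary only says that no ICMP replies came back; it does not by itself say the host is down.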
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1697263 - Posted: 1 Jul 2015, 7:51:12 UTC - in response to Message 1697259.  

my client trigger just leaves a reportable task off the server request every so often (reporting it in a later request). Haven't tried it lately, because I haven't had ghost tasks, so your guess is as good as mine as to whether that logic is intact.

Are you referring to Claggy's resend trigger? I thought that relied on having a report rejected for duplication - in which case, the trigger would be skipping an ack in the server reply, so a reported task remained live in client_state.
ID: 1697263 · Report as offensive
Profile William
Volunteer tester
Joined: 14 Feb 13
Posts: 2037
Credit: 17,689,662
RAC: 0
Message 1697264 - Posted: 1 Jul 2015, 7:54:30 UTC - in response to Message 1697262.  

What's up here?
PING 169.229.217.150 (169.229.217.150): 56 data bytes
Request timeout for icmp_seq 0
Request timeout for icmp_seq 1
Request timeout for icmp_seq 2
Request timeout for icmp_seq 3
Request timeout for icmp_seq 4
Request timeout for icmp_seq 5
Request timeout for icmp_seq 6
Request timeout for icmp_seq 7
Request timeout for icmp_seq 8

--- 169.229.217.150 ping statistics ---
10 packets transmitted, 0 packets received, 100.0% packet loss

Bad Address? Is that an Aussie Addy?

Some servers don't react to pings, even if they work fine.
might be a bad apple or just an unresponsive apple.
A person who won't read has no advantage over one who can't read. (Mark Twain)
ID: 1697264 · Report as offensive
Profile jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1697265 - Posted: 1 Jul 2015, 7:59:25 UTC - in response to Message 1697263.  

my client trigger just leaves a reportable task off the server request every so often (reporting it in a later request). Haven't tried it lately, because I haven't had ghost tasks, so your guess is as good as mine as to whether that logic is intact.

Are you referring to Claggy's resend trigger? I thought that relied on having a report rejected for duplication - in which case, the trigger would be skipping an ack in the server reply, so a reported task remained live in client_state.


Not sure. I recall bouncing it back and forth with claggy and coding it not exactly as was described, after manual attempts at connection chopping didn't work. certainly wasn't that complex in implementation. Anyway, I made it, it's all MINE *muahahahaha*
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1697265 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1697266 - Posted: 1 Jul 2015, 8:01:51 UTC - in response to Message 1697264.  

169.229.217.150 is a fine and active address - probably just stealthed to frustrate denial of service attacks.

http://169.229.217.150/
http://whois.urih.com/record/169.229.217.150/
ID: 1697266 · Report as offensive
Claggy
Volunteer tester

Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1697267 - Posted: 1 Jul 2015, 8:02:42 UTC - in response to Message 1697265.  
Last modified: 1 Jul 2015, 8:49:21 UTC

my client trigger just leaves a reportable task off the server request every so often (reporting it in a later request). Haven't tried it lately, because I haven't had ghost tasks, so your guess is as good as mine as to whether that logic is intact.

Are you referring to Claggy's resend trigger? I thought that relied on having a report rejected for duplication - in which case, the trigger would be skipping an ack in the server reply, so a reported task remained live in client_state.


Not sure. I recall bouncing it back and forth with claggy and coding it not exactly as was described, after manual attempts at connection chopping didn't work. certainly wasn't that complex in implementation. Anyway, I made it, it's all MINE *muahahahaha*

I'll see if I can get one of my Parallellas to report a task twice, and see if a once-only 'resend lost tasks' is triggered. (Edit 2: This host will have a task ready for reporting in 7 hours; I've suspended network activity so it won't report itself.)

All tasks for computer 7506529

Edit: Change of plan, I'm doing this host first at SETI Beta as it had a task ready for reporting:

http://setiweb.ssl.berkeley.edu/beta/results.php?hostid=74138

I've backed up my CS files as well as my WUs, reported the 12jl12ab.27089.75183.438086664199.16.223_2 task and fetched three fresh WUs,
then backed up the CS and WUs again, before deleting them from the CS and restoring the 223_2 WU.

I suspended SETI Main so that Beta asked for work and reported the previously reported WU again; the WUs were resent (this doesn't prove that it still works, as I think 'resend lost tasks' is still active at SETI Beta):

Wed 01 Jul 2015 08:23:26 UTC | SETI@home Beta Test | [sched_op] Starting scheduler request
Wed 01 Jul 2015 08:23:26 UTC | SETI@home Beta Test | Sending scheduler request: To fetch work.
Wed 01 Jul 2015 08:23:26 UTC | SETI@home Beta Test | Reporting 1 completed tasks
Wed 01 Jul 2015 08:23:26 UTC | SETI@home Beta Test | Requesting new tasks for CPU
Wed 01 Jul 2015 08:23:26 UTC | SETI@home Beta Test | [sched_op] CPU work request: 259200.00 seconds; 3.00 devices
Wed 01 Jul 2015 08:23:29 UTC | SETI@home Beta Test | Scheduler request completed: got 3 new tasks
Wed 01 Jul 2015 08:23:29 UTC | SETI@home Beta Test | [sched_op] Server version 707
Wed 01 Jul 2015 08:23:29 UTC | SETI@home Beta Test | Resent lost task 12jl12ab.22861.28594.261993005067.16.74_1
Wed 01 Jul 2015 08:23:29 UTC | SETI@home Beta Test | Resent lost task 12jl12ab.22861.75183.438086664203.16.65_2
Wed 01 Jul 2015 08:23:29 UTC | SETI@home Beta Test | Resent lost task 12jl12ab.22861.75183.438086664203.16.66_0
Wed 01 Jul 2015 08:23:29 UTC | SETI@home Beta Test | Project requested delay of 7 seconds
Wed 01 Jul 2015 08:23:29 UTC | SETI@home Beta Test | [sched_op] estimated total CPU task duration: 264596 seconds
Wed 01 Jul 2015 08:23:29 UTC | SETI@home Beta Test | [sched_op] handle_scheduler_reply(): got ack for task 12jl12ab.27089.75183.438086664199.16.223_2
Wed 01 Jul 2015 08:23:29 UTC | SETI@home Beta Test | [sched_op] Deferring communication for 00:00:07
Wed 01 Jul 2015 08:23:29 UTC | SETI@home Beta Test | [sched_op] Reason: requested by project
Wed 01 Jul 2015 08:23:31 UTC | SETI@home Beta Test | File 12jl12ab.22861.28594.261993005067.16.74 exists already, skipping download
Wed 01 Jul 2015 08:23:31 UTC | SETI@home Beta Test | File 12jl12ab.22861.75183.438086664203.16.65 exists already, skipping download
Wed 01 Jul 2015 08:23:31 UTC | SETI@home Beta Test | File 12jl12ab.22861.75183.438086664203.16.66 exists already, skipping download
Wed 01 Jul 2015 08:23:31 UTC | SETI@home Beta Test | Starting task 12jl12ab.22861.28594.261993005067.16.74_1
Wed 01 Jul 2015 08:23:31 UTC | SETI@home Beta Test | Starting task 12jl12ab.22861.75183.438086664203.16.65_2
Wed 01 Jul 2015 08:23:31 UTC | SETI@home Beta Test | Starting task 12jl12ab.22861.75183.438086664203.16.66_0
Wed 01 Jul 2015 08:23:33 UTC | SETI@home | project resumed by user

Claggy
ID: 1697267 · Report as offensive
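To check an Event Log like the one above for resends without reading it line by line, the task names can be pulled out of the "Resent lost task" messages. A minimal sketch, assuming the log is already split into lines and that the message text is exactly as it appears in Claggy's log; the function name is hypothetical:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: collects the task names from "Resent lost task" lines.
std::vector<std::string> resent_lost_tasks(
    const std::vector<std::string>& log_lines) {
    const std::string marker = "Resent lost task ";
    std::vector<std::string> names;
    for (const std::string& line : log_lines) {
        const std::string::size_type pos = line.find(marker);
        if (pos == std::string::npos) {
            continue;  // some other message
        }
        std::string name = line.substr(pos + marker.size());
        // Drop trailing whitespace left over from copy and paste.
        const std::string::size_type last = name.find_last_not_of(" \t\r\n");
        name.erase(last == std::string::npos ? 0 : last + 1);
        if (!name.empty()) {
            names.push_back(name);
        }
    }
    return names;
}
```

Run against the log above, this would return the three task names ending in 74_1, 65_2 and 66_0.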
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1697268 - Posted: 1 Jul 2015, 8:06:41 UTC - in response to Message 1697264.  
Last modified: 1 Jul 2015, 8:15:00 UTC

What's up here?
PING 169.229.217.150 (169.229.217.150): 56 data bytes
Request timeout for icmp_seq 0
Request timeout for icmp_seq 1
Request timeout for icmp_seq 2
Request timeout for icmp_seq 3
Request timeout for icmp_seq 4
Request timeout for icmp_seq 5
Request timeout for icmp_seq 6
Request timeout for icmp_seq 7
Request timeout for icmp_seq 8

--- 169.229.217.150 ping statistics ---
10 packets transmitted, 0 packets received, 100.0% packet loss

Bad Address? Is that an Aussie Addy?

Some servers don't react to pings, even if they work fine.
might be a bad apple or just an unresponsive apple.

Yep, this works;
Lookup has started…

Trying "150.217.229.169.in-addr.arpa"
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 38688
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;150.217.229.169.in-addr.arpa. IN PTR

;; ANSWER SECTION:
150.217.229.169.in-addr.arpa. 86400 IN PTR muarae1.SSL.Berkeley.EDU.
ID: 1697268 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1697279 - Posted: 1 Jul 2015, 9:05:57 UTC - in response to Message 1697037.  

sched/handle_request.cpp#L387

has more consistent logic - it actually checks whether the client reported tasks before abandoning the lot.

but THAT is the code that should get called if we are looking at a detach/reattach where no tasks should be present. That's when you really want to mark tasks as abandoned: you detached (silently, so the server never gets told and the tasks idle out), you reattach, and the DB is cleaned up.
So he does a security check there but not when the rpc_seqno goes out of sync ?!

So what if

if ((g_request->allow_multiple_clients != 1)
    && (g_request->other_results.size() == 0)
) {
    mark_results_over(host);
}


i.e. the check for tasks on the host is put into line 426 that currently has no such security check?

Looks as if David meant to do that anyway:

4222d744e88163a4b02d269349c51cd22d6826ca

- Scheduler: in no-host-ID case, don't mark results as "detached"
if request contains any in-progress results

but he missed the line 426 case.
ID: 1697279 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1697282 - Posted: 1 Jul 2015, 9:46:37 UTC - in response to Message 1697279.  

He possibly missed line 426 because the allow_multiple_clients case is already handled by lines 417-419.

So line 426 only needs the (g_request->other_results.size() == 0) part of the test.
ID: 1697282 · Report as offensive
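The guard proposed in the two posts above can be written as a small standalone predicate, which makes the intended behaviour easy to check. This is only a sketch: `SchedulerRequest` is a hypothetical stand-in that borrows the two field names from the quoted snippet, the strings stand in for whatever the real list holds, and none of it is the actual scheduler code:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the scheduler request; only the two fields
// named in the quoted snippet are modelled.
struct SchedulerRequest {
    int allow_multiple_clients;
    std::vector<std::string> other_results;  // tasks the client says it holds
};

// True only when it is safe to mark the host's results as over: the client
// is not flagged as one of several clients AND it reports no tasks on hand.
bool should_mark_results_over(const SchedulerRequest& req) {
    return req.allow_multiple_clients != 1 && req.other_results.empty();
}
```

If the allow_multiple_clients case is already handled earlier, as the follow-up post suggests, the first half of the condition is redundant at that point and only the empty-list test matters.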
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1697302 - Posted: 1 Jul 2015, 12:03:46 UTC - in response to Message 1697282.  

I've submitted this workaround to David and Eric - waiting for a response when the Pacific coast wakes up.
ID: 1697302 · Report as offensive
Profile S@NL Etienne Dokkum
Volunteer tester
Joined: 11 Jun 99
Posts: 212
Credit: 43,822,095
RAC: 0
Netherlands
Message 1697343 - Posted: 1 Jul 2015, 14:50:37 UTC

Didn't read through the whole thread yet, but my PC decided to abandon the entire project after a reboot that was needed due to the heat ???!!?!

http://setiathome.berkeley.edu/results.php?userid=191736

Damn that those resends aren't up and running...

Not a happy camper right now :(((
ID: 1697343 · Report as offensive
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1697461 - Posted: 1 Jul 2015, 21:57:41 UTC

Look at that, it appears you can receive Stock CPU tasks at Beta with BOINC 7.2.33. I had never tried the Stock CPU tasks, it was the Stock GPU tasks that wanted 7.2.42.
So, we'll see how 7.2.33 works at Beta for a couple hours.
http://setiweb.ssl.berkeley.edu/beta/results.php?hostid=71141
ID: 1697461 · Report as offensive
Lionel

Joined: 25 Mar 00
Posts: 680
Credit: 563,640,304
RAC: 597
Australia
Message 1697489 - Posted: 1 Jul 2015, 23:12:25 UTC - in response to Message 1697461.  

Just had 273 APs abandoned due to too short a deadline. Shortest time to report was 12 minutes. Machine number 7248689.

This problem has been around for years. When is someone going to fix it...
ID: 1697489 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1697492 - Posted: 1 Jul 2015, 23:28:01 UTC - in response to Message 1697489.  

Just had 273 APs abandoned due to too short a deadline. Shortest time to report was 12 minutes. Machine number 7248689.

This problem has been around for years. When is someone going to fix it...

Do you mean Error tasks for computer 7248689?

I see no sign of a short deadline, let alone "abandoned due to...". All the tasks I've spot-checked had the normal 25-day deadlines for AP tasks.

Unless we're careful and precise in our error reporting - as I and others have spent the last six days trying to demonstrate - it's unlikely it'll ever get fixed.
ID: 1697492 · Report as offensive
Lionel

Joined: 25 Mar 00
Posts: 680
Credit: 563,640,304
RAC: 597
Australia
Message 1697579 - Posted: 2 Jul 2015, 3:28:00 UTC - in response to Message 1697492.  

Just had 273 APs abandoned due to too short a deadline. Shortest time to report was 12 minutes. Machine number 7248689.

This problem has been around for years. When is someone going to fix it...

Do you mean Error tasks for computer 7248689?

I see no sign of a short deadline, let alone "abandoned due to...". All the tasks I've spot-checked had the normal 25-day deadlines for AP tasks.

Unless we're careful and precise in our error reporting - as I and others have spent the last six days trying to demonstrate - it's unlikely it'll ever get fixed.


Richard, I don't know what you were looking at, but when I go to the Error tasks section it shows tasks that have very short deadlines. I have just taken a text grab of the first two (see below). The first one was sent 1 Jul 2015, 13:03:07 UTC and had a report deadline of 1 Jul 2015, 13:15:41 UTC. Status Abandoned. The second was sent 1 Jul 2015, 12:57:26 UTC with a report deadline of 1 Jul 2015, 13:15:41 UTC. Status Abandoned.

I did not see any tasks that have a 25 day deadline in the Error section (based on the quick scan that I did).

As for fixing the problem, this issue of incredibly short deadlines has been popping up for years now.


Task | Work unit | Sent | Time reported or deadline | Status | Run time (sec) | CPU time (sec) | Credit | Application
4244414700 | 1834746877 | 1 Jul 2015, 13:03:07 UTC | 1 Jul 2015, 13:15:41 UTC | Abandoned | 0.00 | 0.00 | --- | AstroPulse v7, Anonymous platform (NVIDIA GPU)
4244402595 | 1834727011 | 1 Jul 2015, 12:57:26 UTC | 1 Jul 2015, 13:15:41 UTC | Abandoned | 0.00 | 0.00 | --- | AstroPulse v7, Anonymous platform (NVIDIA GPU)
ID: 1697579 · Report as offensive
Profile Jeff Buck (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester

Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1697594 - Posted: 2 Jul 2015, 4:22:24 UTC - in response to Message 1697579.  

Well, I certainly can't speak for Richard, but if I look at the task detail page for that first task in your list, I see:

Report deadline 26 Jul 2015, 13:03:07 UTC

That would have been the original 25-day deadline. The workunit detail page is showing when the task was "reported", or in this case, abandoned. It looks like all your tasks were abandoned at the same time, so you might take a look in your Event Log around that time to see if you've had a timeout similar to what TBar reported in the first post in this thread. If you did, then that really is what they're busily trying to fix here (and may very well have, if their proposed solution passes muster).
ID: 1697594 · Report as offensive
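The arithmetic behind the explanation above is easy to verify. A small sketch using the two rows quoted earlier in the thread: both tasks were sent at different times but share the same "Time reported or deadline" value, and the gaps are minutes, nowhere near the 25-day deadline. The helper names and the constant are made up for illustration:

```cpp
#include <cassert>

// The 25-day AstroPulse deadline mentioned in the thread, in seconds.
const long kApDeadlineSeconds = 25L * 24L * 3600L;

// Seconds since midnight for an HH:MM:SS time of day.
int seconds_of_day(int h, int m, int s) {
    assert(h >= 0 && h < 24 && m >= 0 && m < 60 && s >= 0 && s < 60);
    return h * 3600 + m * 60 + s;
}

// Elapsed seconds between two times of day on the same date.
int same_day_gap(int sent_s, int reported_s) {
    assert(reported_s >= sent_s);
    return reported_s - sent_s;
}
```

With the posted times, 13:03:07 to 13:15:41 is 754 seconds (about 12 minutes) and 12:57:26 to 13:15:41 is 1095 seconds, against a deadline of 2,160,000 seconds. An identical end time for tasks sent at different moments is what one would expect if that column records a single bulk abandon event, not a per-task deadline.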
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1697616 - Posted: 2 Jul 2015, 5:25:50 UTC

Yes, definitely looks as though the deadline was good. You can find a copy of the Log in stdoutdae.txt. Look on either side of the Abandoned time, there should be a Failed Request before and a reference to an Unlogged Request a short while after.
ID: 1697616 · Report as offensive


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.