Normal for Astropulse units to slow down with time?

Questions and Answers : GPU applications : Normal for Astropulse units to slow down with time?

Thunder

Joined: 3 May 03
Posts: 65
Credit: 993,581
RAC: 0
United States
Message 1778772 - Posted: 13 Apr 2016, 17:39:24 UTC

I have one Linux machine with an NVIDIA GT 730 that's acting a bit odd with an Astropulse v7 task.

A while back it downloaded one with an estimated completion time of about 2.25 h. It started the task, and as I watched the first few percent I saw that it was completing about 1% every minute to minute and a half, so I figured the estimate was about right.

About 18 hours later, I had the chance to glance at that machine and was surprised to find it still working on the same task, with the rate slowed to about 1/1000th of a percent every 10 minutes. However, it was beyond 99%, so I figured I'd let it finish.

About 45 minutes later, I decided it wasn't progressing at all and ended up restarting BOINC. Sadly, I learned that apparently these tasks don't checkpoint, because it reverted to 0% done. :-(

Fast forward to today, and it has started that same task again. It's now at about 3.5 h (the estimate was 2.25 h) and has clearly slowed to the point where it's finishing only about 1% of the task every 10 minutes.
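The progress rates described above can be turned into whole-task estimates with a little arithmetic. This is a minimal sketch that assumes, hypothetically, each observed rate would have held constant for the rest of the task, which real tasks don't guarantee; `minutes_for` is an illustrative helper, not part of BOINC.

```python
# Sketch: extrapolate run time from an observed progress rate.
# Assumption (hypothetical): the rate stays constant for the remaining work.

def minutes_for(percent: float, percent_done: float, minutes_taken: float) -> float:
    """Minutes to complete `percent` at a rate of `percent_done` per `minutes_taken`."""
    return percent * minutes_taken / percent_done

# Initial rate: about 1% every 1 to 1.5 minutes, applied to the whole task (hours).
fast_low = minutes_for(100, 1, 1.0) / 60
fast_high = minutes_for(100, 1, 1.5) / 60
# Rate after the restart: about 1% every 10 minutes, whole task (hours).
slowed = minutes_for(100, 1, 10) / 60
# Near-stall rate: about 0.001% every 10 minutes, applied to the last 1% (days).
stalled_days = minutes_for(1, 0.001, 10) / 60 / 24

print(f"initial rate: {fast_low:.1f} to {fast_high:.1f} h for the whole task")
print(f"slowed rate:  {slowed:.1f} h for the whole task")
print(f"stall rate:   {stalled_days:.1f} days for the last 1%")
```

The initial rate works out to roughly 1.7 to 2.5 h, consistent with the 2.25 h estimate, while the slowed rate implies well over 16 h and the near-stall rate close to a week for the final percent.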

I can't tell much about what the GPU is doing other than by its temperature, and it's sitting only about 1 °C above what I would expect at idle, so it's clearly not doing much.

Is this just a bad task? Is it normal behavior?
ID: 1778772
Thunder

Joined: 3 May 03
Posts: 65
Credit: 993,581
RAC: 0
United States
Message 1779024 - Posted: 14 Apr 2016, 13:06:16 UTC - in response to Message 1778772.  

4/14/2016 8:01:10 AM Aborting task ap_24jl10ac_B6_P1_00323_20160408_21551.wu_0: exceeded elapsed time limit 82718.30 (18210521.14G/220.15G)

This was the message I got after 23 h 15 min of run time, with the task at 99.995% complete. :-P
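The numbers in the abort message read like a simple division: a per-task bound on floating-point operations divided by the client's estimated speed for the app, both printed in units of 10^9 (G). A minimal sketch, assuming that reading of the message (the function name is illustrative, not a BOINC API), reproduces the limit:

```python
# Sketch: reproduce the arithmetic in the BOINC abort message
#   "exceeded elapsed time limit 82718.30 (18210521.14G/220.15G)"
# Assumption: limit = task's operation bound / estimated operations per second,
# both values printed in units of 1e9 (G).

def elapsed_time_limit(fpops_bound_g: float, speed_g: float) -> float:
    """Elapsed time limit in seconds: operations bound / operations per second."""
    return fpops_bound_g / speed_g

limit_s = elapsed_time_limit(18210521.14, 220.15)
estimate_s = 2.25 * 3600  # the task's original estimate of about 2.25 h

print(f"limit: {limit_s:.0f} s = {limit_s / 3600:.1f} h")
print(f"limit / estimate: {limit_s / estimate_s:.1f}x")
```

The division gives about 82,719 s, within a second of the logged 82718.30; the small gap is consistent with the speed being rounded to two decimals in the message. That is roughly 23 hours, about ten times the original 2.25 h estimate, in line with how long the task ran before BOINC aborted it.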

Between the two attempts, that's about 41 hours of processing time that could have gone to something useful.

In the absence of any response, I guess I'm just turning off Astropulse work.
ID: 1779024
Jord
Volunteer tester

Joined: 9 Jun 99
Posts: 15183
Credit: 4,362,181
RAC: 3
Netherlands
Message 1779040 - Posted: 14 Apr 2016, 15:28:33 UTC - in response to Message 1778772.  

"About 45 minutes later, I decided it wasn't progressing at all and ended up restarting BOINC."

Have you tried restarting the computer, to clear the GPU's memory in case something was stuck there?

"Sadly, I learned that apparently those don't checkpoint because it reverted to 0% done. :-("

Then something was definitely wrong with your computer: although Astropulse tasks don't checkpoint as often as Multibeam tasks do, they do checkpoint, something like 10 times or so per task.

Of course, some APs don't run well on every computer, but they normally fail in other ways; they don't tend to tie up the GPU for many hours before crashing.

But next time, try rebooting the computer first. You never know what may have been stuck until you try. :)
ID: 1779040
Thunder

Joined: 3 May 03
Posts: 65
Credit: 993,581
RAC: 0
United States
Message 1782771 - Posted: 27 Apr 2016, 1:57:31 UTC

Well, the good news is that another AP task finally floated down to that machine and it finished completely normally.

I'm going to guess the previous one just had "issues".
ID: 1782771

©2022 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.