Bug "inside long FFA"?

Message boards : Number crunching : Bug "inside long FFA"?

Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14152
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1595080 - Posted: 31 Oct 2014, 19:58:10 UTC - in response to message 1595058.

> There's a fix for that being pushed through testing,
> Joe

Test results for these instances added to the test matrix.
ID: 1595080
Josef W. Segur
Volunteer developer
Volunteer tester

Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1595058 - Posted: 31 Oct 2014, 19:33:07 UTC - in response to message 1594988.

That "ERROR: some exception inside long FFA..." message is what appears when the app runs out of memory while trying to handle thousands of repetitive pulses above threshold. There's a fix for that being pushed through testing; meanwhile, the way to go is to reduce -ffa_block and -ffa_block_fetch, which reduces how many signals have to be handled simultaneously.
Joe
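Joe's workaround amounts to editing two values in the app's command line. As a minimal sketch, assuming the options are kept as a plain space-separated string like the one Oddbjornik posted, the edit could be scripted as below. `scale_ffa_blocks` and the scaling factor are hypothetical, written for illustration only; they are not part of the Lunatics installer or the AstroPulse app, and the helper knows nothing about which values the app actually accepts.

```python
import shlex

# Option names as they appear in the command line quoted in this thread.
FFA_OPTIONS = ("-ffa_block", "-ffa_block_fetch")


def scale_ffa_blocks(cmdline: str, factor: float) -> str:
    """Hypothetical helper: return cmdline with the values of -ffa_block and
    -ffa_block_fetch multiplied by factor (truncated to an integer).

    It only rewrites the option string; it does not know which values the
    AstroPulse app actually accepts.
    """
    if not 0 < factor <= 1:
        raise ValueError("factor must be in (0, 1] so the blocks shrink")
    tokens = shlex.split(cmdline)
    result = []
    i = 0
    while i < len(tokens):
        token = tokens[i]
        result.append(token)
        if token in FFA_OPTIONS and i + 1 < len(tokens):
            # Replace the numeric value that follows the option name.
            new_value = max(1, int(int(tokens[i + 1]) * factor))
            result.append(str(new_value))
            i += 2
        else:
            i += 1
    return " ".join(result)


if __name__ == "__main__":
    original = ("-use_sleep -hp -unroll 16 -oclFFT_plan 256 16 256 "
                "-ffa_block 16384 -ffa_block_fetch 8192 "
                "-tune 1 64 8 1 -tune 2 64 8 1")
    print(scale_ffa_blocks(original, 0.75))
```

With a factor of 0.75 the two values become 12288 and 6144, the same numbers Mike suggested, while -unroll and the other options are left untouched. Whether a given reduction is enough to keep a particular task's memory use in check is something only a test run can show.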
ID: 1595058
JohnDK Crowdfunding Project Donor* · Special Project $250 donor
Volunteer tester

Joined: 28 May 00
Posts: 1200
Credit: 451,243,443
RAC: 1,127
Denmark
Message 1595033 - Posted: 31 Oct 2014, 18:44:17 UTC

I had a WU like that last night. I don't know how many restarts it had, but I decided to abort it...

http://setiathome.berkeley.edu/workunit.php?wuid=1627971982
ID: 1595033
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14152
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1595017 - Posted: 31 Oct 2014, 18:23:13 UTC - in response to message 1595010.
Last modified: 31 Oct 2014, 19:05:29 UTC

> That's my point exactly, Richard.
> I'm a software engineer myself, and when I see a recurring error like this, I know that a developer with a debugger can normally find out exactly what goes wrong.

I don't have a debugger, but I do have a GTX 670 (a close match), and I'm starting to run tests under bench conditions, beginning with the default cmdline parameters. One thing to watch out for will be abnormally high memory usage on these tasks.

Edit: memory usage seemed to settle at 244 MB after a couple of minutes. That's high, but not as high as we've seen under bug conditions.

Edit 2: memory consumption went over a gigabyte with the same command line, single instance. This does look like the bug that was already under investigation; next I'll run the bugfix version that is already under test.

So Mike's suggestion might serve as a temporary palliative while we wait for the bugfix to complete acceptance testing. Alternatively, leave the command line as it is if you're prepared to risk the same thing happening again, and supply additional test cases for the testing pool.
ID: 1595017
Mike Special Project $75 donor
Volunteer tester

Joined: 17 Feb 01
Posts: 32233
Credit: 79,922,639
RAC: 80
Germany
Message 1595013 - Posted: 31 Oct 2014, 18:20:29 UTC - in response to message 1595010.

> That's my point exactly, Richard.
> I'm a software engineer myself, and when I see a recurring error like this, I know that a developer with a debugger can normally find out exactly what goes wrong.


So I don't have to worry any longer.
With each crime and every kindness we birth our future.
ID: 1595013
Oddbjornik Crowdfunding Project Donor* · Special Project $75 donor · Special Project $250 donor
Volunteer tester

Joined: 15 May 99
Posts: 220
Credit: 349,610,548
RAC: 1,728
Norway
Message 1595010 - Posted: 31 Oct 2014, 18:11:45 UTC - in response to message 1595001.

That's my point exactly, Richard.
I'm a software engineer myself, and when I see a recurring error like this, I know that a developer with a debugger can normally find out exactly what goes wrong.
ID: 1595010
Mike Special Project $75 donor
Volunteer tester

Joined: 17 Feb 01
Posts: 32233
Credit: 79,922,639
RAC: 80
Germany
Message 1595008 - Posted: 31 Oct 2014, 18:08:45 UTC - in response to message 1595001.

> > Reduce -unroll to 12 and -ffa_block / -ffa_block_fetch to 12288 / 6144.
> > See if this helps.
>
> With 1061 valid AP v7 tasks so far, I think he knows how to drive the application. The question was why these two (and only these two, as I understand him) should have behaved differently.


Do you want to tell me how the app works?
With each crime and every kindness we birth our future.
ID: 1595008
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14152
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1595001 - Posted: 31 Oct 2014, 17:54:26 UTC - in response to message 1594994.

> Reduce -unroll to 12 and -ffa_block / -ffa_block_fetch to 12288 / 6144.
> See if this helps.

With 1061 valid AP v7 tasks so far, I think he knows how to drive the application. The question was why these two (and only these two, as I understand him) should have behaved differently.
ID: 1595001
Mike Special Project $75 donor
Volunteer tester

Joined: 17 Feb 01
Posts: 32233
Credit: 79,922,639
RAC: 80
Germany
Message 1594994 - Posted: 31 Oct 2014, 17:48:09 UTC

Reduce -unroll to 12 and -ffa_block / -ffa_block_fetch to 12288 / 6144.
See if this helps.
With each crime and every kindness we birth our future.
ID: 1594994
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14152
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1594990 - Posted: 31 Oct 2014, 17:42:00 UTC - in response to message 1594988.

ID: 1594990
Oddbjornik Crowdfunding Project Donor* · Special Project $75 donor · Special Project $250 donor
Volunteer tester

Joined: 15 May 99
Posts: 220
Credit: 349,610,548
RAC: 1,728
Norway
Message 1594988 - Posted: 31 Oct 2014, 17:34:59 UTC
Last modified: 31 Oct 2014, 17:41:51 UTC

I have two AP tasks that have been looping with repeated "exited with zero status but no 'finished' file."

Looking inside the result reports, they both say this:

ERROR: some exception inside long FFA, probably video-driver restart, restarting app...

The command line options are as follows:

-use_sleep -hp -unroll 16 -oclFFT_plan 256 16 256 -ffa_block 16384 -ffa_block_fetch 8192 -tune 1 64 8 1 -tune 2 64 8 1 


Since the GTX680 card these tasks have run on behaves quite well with all other tasks, yet still repeatedly fails to finish just these two, I think I may have stumbled upon a reproducible error.

Anyone want to investigate? Any additional info needed?

Edit: Lunatics 0.43, running three tasks on the card.
ID: 1594988



 
©2020 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.