AP v7.09 Testing

Jeff Buck
Volunteer tester

Joined: 11 Dec 14
Posts: 96
Credit: 1,240,941
RAC: 0
United States
Message 53935 - Posted: 31 Mar 2015, 4:49:14 UTC

Okay, I fired up my T7400 this evening to test v7.09. Since the box has 3 NVIDIA GPUs (GTX 660, GTX 670, GTX 780), I ran several tests in groups of 3 so that I'd get one on each GPU.

Initial run was simply using defaults. The v7.09 (r2745) application files seemed to download normally, including the empty device-specific configuration file. Tasks seemed to run normally, as follows:

GTX 660: 18810857
GTX 670: 18810865
GTX 780: 18810861

Another group used basic device-specific values for each of the 3 GPUs, as follows:
<device0>
   <unroll>16</unroll>
   <ffa_block>12288</ffa_block>
   <ffa_block_fetch>6144</ffa_block_fetch>
</device0>   
<device1>
   <unroll>12</unroll>
   <ffa_block>4096</ffa_block>
   <ffa_block_fetch>2048</ffa_block_fetch>
</device1>   
<device2>
   <unroll>14</unroll>
   <ffa_block>8192</ffa_block>
   <ffa_block_fetch>2048</ffa_block_fetch>
</device2>


Results look normal, with each GPU getting the correct overrides.
GTX 660: 18811272
GTX 670: 18811249
GTX 780: 18811263

So, that brings me to testing the device-specific config file when it has a typo in one of the tags, the issue I raised in my earlier message in the AP 7.06 thread. Recreating what was originally an inadvertent typo, I changed the second device end tag from the correct </device1> to </device0>, giving:
<device0>
   <unroll>16</unroll>
   <ffa_block>12288</ffa_block>
   <ffa_block_fetch>6144</ffa_block_fetch>
</device0>   
<device1>
   <unroll>12</unroll>
   <ffa_block>4096</ffa_block>
   <ffa_block_fetch>2048</ffa_block_fetch>
</device0>   
<device2>
   <unroll>14</unroll>
   <ffa_block>8192</ffa_block>
   <ffa_block_fetch>2048</ffa_block_fetch>
</device2>


Unfortunately, this still results in the same erroneous overrides being applied to the GTX 660 as I experienced originally with r2742 over on Seti Main, with no apparent error handling. Device 1 (GTX 660) initially appears to have the correct overrides applied, but then they are superseded by the overrides for Device 2 (GTX 670).

Here are the results:
GTX 660: 18810956 INCORRECT OVERRIDES
GTX 670: 18810948
GTX 780: 18810972

That's all I have time for tonight, but I'll try some more experiments tomorrow. After all, since the device-specific config file is going to rely on manual input, it needs to have some error handling capability. As I said in my earlier message:

I don't think the application should choke on malformed xml tags like this, but it seems to me that, at a minimum, it should identify that a problem has been encountered and write a Warning message to the Stderr. The same would apply if a correct corresponding end tag for the device wasn't found before the next device start tag was encountered.

It also seems to me that the application should do something other than simply continue to read override parameters from the file when a bad or missing end tag has been identified. After issuing the warning message, I think it would be reasonable to either go ahead and use whatever parameters have already been successfully parsed for the device, or else just revert to the defaults (either from the application or, if present, from the ap_cmdline).
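A minimal Python sketch of the handling proposed above. It is not the application's actual parser, whose source is not shown in this thread; the function name `parse_device_config`, the `warn` callback, and the warning texts are all hypothetical. It implements the stricter of the two suggested options: a device block whose start and end tags do not pair up is dropped, so that device falls back to its defaults.

```python
import re
import sys

DEVICE_TAG = re.compile(r"<(/?)device(\d+)>")
PARAM = re.compile(r"<(\w+)>\s*(\d+)\s*</\1>")

def _stderr_warn(msg):
    print("WARNING:", msg, file=sys.stderr)

def parse_device_config(text, warn=_stderr_warn):
    """Return {device_number: {param: int}}; blocks with unpaired tags are dropped."""
    configs = {}
    open_dev = None   # number of the device block currently open, if any
    body_start = 0    # offset where that block's body begins
    for m in DEVICE_TAG.finditer(text):
        closing, num = m.group(1) == "/", int(m.group(2))
        if not closing:
            if open_dev is not None:
                warn(f"<device{num}> found before </device{open_dev}>; "
                     f"device{open_dev} reverts to defaults")
            open_dev, body_start = num, m.end()
        elif open_dev is None:
            warn(f"</device{num}> has no matching start tag; ignored")
        elif num != open_dev:
            warn(f"</device{num}> closes <device{open_dev}>; "
                 f"device{open_dev} reverts to defaults")
            open_dev = None
        else:
            body = text[body_start:m.start()]
            configs[num] = {name: int(value) for name, value in PARAM.findall(body)}
            open_dev = None
    if open_dev is not None:
        warn(f"<device{open_dev}> is never closed; device{open_dev} reverts to defaults")
    return configs
```

With the typo file above, this sketch would return overrides for device0 and device2 only and emit one warning about the block opened by `<device1>`. Keeping the parameters already parsed for the bad block, the other option suggested, would be a small change at the two "reverts to defaults" branches.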
ID: 53935
Mike
Volunteer tester
Joined: 16 Jun 05
Posts: 2530
Credit: 1,074,556
RAC: 0
Germany
Message 53936 - Posted: 31 Mar 2015, 11:51:46 UTC
Last modified: 31 Mar 2015, 12:01:02 UTC


<device1>
<unroll>12</unroll>
<ffa_block>4096</ffa_block>
<ffa_block_fetch>2048</ffa_block_fetch>
</device0>


You put wrong device number at the end for device 1.

Needs to be </device1>

So the commands for device 1 are not finished, and it continues with the next command.
With each crime and every kindness we birth our future.
ID: 53936
Richard Haselgrove
Volunteer tester

Joined: 3 Jan 07
Posts: 1451
Credit: 3,266,428
RAC: 0
United Kingdom
Message 53938 - Posted: 31 Mar 2015, 14:58:26 UTC - in response to Message 53936.  

You put wrong device number at the end for device 1.

Yes, he said that it was a deliberate mistake (but a simple one - it's the sort of thing we've all done in a hurry). The idea was to test the application's error handling and error reporting.
ID: 53938
Cliff Harding
Volunteer tester
Joined: 10 Sep 10
Posts: 21
Credit: 852,516
RAC: 0
United States
Message 53942 - Posted: 31 Mar 2015, 21:41:27 UTC

Someone needs to feed the splitter. I'm attempting to test a GTX750Ti & a GTX660 that I pulled out of my parts bin, but need data to test.


I don't buy computers, I build them!!
ID: 53942
Jeff Buck
Volunteer tester

Joined: 11 Dec 14
Posts: 96
Credit: 1,240,941
RAC: 0
United States
Message 53943 - Posted: 1 Apr 2015, 4:21:22 UTC

This evening I ran 3 more groups of tests focusing on theoretical user-induced error conditions within the device-specific config file.

The first group is similar to my earlier test which introduced a typo into a device end tag. This time I simply omitted an end tag altogether, in this case the </device0> tag, as follows:

<device0>
   <unroll>16</unroll>
   <ffa_block>12288</ffa_block>
   <ffa_block_fetch>6144</ffa_block_fetch>
<device1>
   <unroll>12</unroll>
   <ffa_block>4096</ffa_block>
   <ffa_block_fetch>2048</ffa_block_fetch>
</device1>   
<device2>
   <unroll>14</unroll>
   <ffa_block>8192</ffa_block>
   <ffa_block_fetch>2048</ffa_block_fetch>
</device2>

The resulting behavior was essentially the same as the earlier test, with the application ignoring the presence of the new <device1> and <device2> start tags because it had not yet found an end tag for device0. Therefore, for the GTX 780 (device0) task, it first loaded the device0 config values, overlaid them with the device1 values, and then overlaid them again with the device2 values. The GTX 660 and GTX 670 were not affected by this condition.
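The observed behavior is consistent with a parser that scans from a device's start tag to that device's end tag and lets later values replace earlier ones. The Python sketch below models that hypothesis only; it is a guess at the logic, not the application's code, and `naive_overrides` is a made-up name.

```python
import re

PARAM = re.compile(r"<(\w+)>(\d+)</\1>")

def naive_overrides(text, device):
    """Collect parameters from <deviceN> up to </deviceN>, or to end of file
    when no matching end tag follows the start tag."""
    start = text.find(f"<device{device}>")
    if start < 0:
        return {}
    end = text.find(f"</device{device}>", start)
    body = text[start:end] if end >= 0 else text[start:]
    overrides = {}
    for name, value in PARAM.findall(body):
        overrides[name] = int(value)   # a later value silently replaces an earlier one
    return overrides
```

Run against the file with the missing `</device0>`, this model gives device0 the device2 values (14, 8192, 2048) while device1 and device2 keep their own, which matches the results reported here.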

Results:
GTX 780 (device0): 18811268
GTX 660 (device1): 18811264
GTX 670 (device2): 18811258

My second group was intended to test invalid (alpha) characters in each of the 3 configuration options I've been using. I spread them across the 3 devices, with each one getting a single bad value, as follows:

<device0>
   <unroll>1t</unroll>
   <ffa_block>12288</ffa_block>
   <ffa_block_fetch>6144</ffa_block_fetch>
</device0>   
<device1>
   <unroll>12</unroll>
   <ffa_block>4a96</ffa_block>
   <ffa_block_fetch>2048</ffa_block_fetch>
</device1>   
<device2>
   <unroll>14</unroll>
   <ffa_block>8192</ffa_block>
   <ffa_block_fetch>s048</ffa_block_fetch>
</device2>

The application did end up rejecting all 3 invalid values and substituting defaults. However, I don't think it initially rejected the bad values flat out, but rather first truncated the values at the bad character, then rejected the resulting truncated value as perhaps out-of-range. In any event, WARNING messages were generated in the Stderr and default values were substituted for the invalid ones. In the case of the ffa_block and ffa_block_fetch errors, it substituted default values for both if either one was invalid, which certainly seems sensible.
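The truncate-then-reject pattern described above is what a C `atoi()`-style conversion followed by a range check would produce. The Python sketch below imitates that combination. It is an assumption about the mechanism, not the application's code: `leading_int`, `checked_value`, the `valid` predicate, and the defaults passed in are all hypothetical, and the real valid ranges are not stated in this thread.

```python
import re
import sys

def leading_int(raw):
    """Mimic C atoi(): skip whitespace, read an optional sign and leading digits,
    stop at the first other character, and return 0 if no digits were found."""
    m = re.match(r"\s*([+-]?\d+)", raw)
    return int(m.group(1)) if m else 0

def checked_value(name, raw, default, valid):
    """Parse raw with leading_int(); warn and fall back to default if valid() rejects it."""
    value = leading_int(raw)
    if not valid(value):
        print(f"WARNING: {name}={raw!r} parsed as {value}; using default {default}",
              file=sys.stderr)
        return default
    return value

# The three bad values from the test file, as this sketch would read them:
#   leading_int("1t")   -> 1
#   leading_int("4a96") -> 4
#   leading_int("s048") -> 0
```

A stricter alternative would reject any value that is not made up entirely of digits (for example with `str.isdigit()`), so that a typo such as `4a96` is reported as malformed rather than as an out-of-range 4.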

Results:
GTX 780 (device0): 18811261
GTX 660 (device1): 18811257
GTX 670 (device2): 18811274

My third test was similar to the second, but instead of using invalid values, I simply omitted values completely (a different one for each device), while leaving the start and end tags for those values in place, as follows:

<device0>
   <unroll>16</unroll>
   <ffa_block></ffa_block>
   <ffa_block_fetch>6144</ffa_block_fetch>
</device0>   
<device1>
   <unroll>12</unroll>
   <ffa_block>4096</ffa_block>
   <ffa_block_fetch></ffa_block_fetch>
</device1>   
<device2>
   <unroll></unroll>
   <ffa_block>8192</ffa_block>
   <ffa_block_fetch>2048</ffa_block_fetch>
</device2>


No anomalies here. In this instance, the application appears to have treated each missing value as a zero value, which was then flagged as incorrect. WARNING messages were posted to the Stderr and defaults were substituted, the same as in my previous test.

Results:
GTX 780 (device0): 18811278
GTX 660 (device1): 18811280
GTX 670 (device2): 18811284

That's all for tonight.
ID: 53943
Jeff Buck
Volunteer tester

Joined: 11 Dec 14
Posts: 96
Credit: 1,240,941
RAC: 0
United States
Message 53963 - Posted: 3 Apr 2015, 4:40:31 UTC

And tonight I managed to get in three more groups of tests focused on potential user mistakes in the device-specific config file.

The first group focused on typos in the individual parameter end tags, either omitting a character or adding an extra character, so that the end tag no longer matched its corresponding start tag, as follows:

<device0>
   <unroll>16<unroll>
   <ffa_block>12288</ffa_block>
   <ffa_block_fetch>6144</ffa_block_fetch>
</device0>   
<device1>
   <unroll>12</unroll>
   <ffa_block>4096</ffa_block
   <ffa_block_fetch>2048</ffa_block_fetch>
</device1>   
<device2>
   <unroll>14</unroll>
   <ffa_block>8192</ffa_block>
   <ffa_block_fetch>2048</fffa_block_fetch>
</device2>

Unlike the problem caused by having a typo in a "device" end tag, these incorrect tags did not cause any problems and were apparently ignored by the application, since no warning message showed up in the Stderr. The tasks on all 3 GPUs ran with the desired device-specific values.

Results:
GTX 780 (device0): 18811286
GTX 660 (device1): 18829664
GTX 670 (device2): 18829672

Another group took the above test one step further, by completely omitting the end tag for one of the parameters specified for each device, as follows:

<device0>
   <unroll>16</unroll>
   <ffa_block>12288</ffa_block>
   <ffa_block_fetch>6144
</device0>   
<device1>
   <unroll>12
   <ffa_block>4096</ffa_block>
   <ffa_block_fetch>2048</ffa_block_fetch>
</device1>   
<device2>
   <unroll>14</unroll>
   <ffa_block>8192
   <ffa_block_fetch>2048</ffa_block_fetch>
</device2>

Again, unlike the problem caused by having a missing "device" end tag, these missing parameter end tags caused no ill effects. The tasks on all 3 GPUs ran with the desired device-specific values.
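One explanation that fits both of these results is that the application only looks for a parameter's start tag and reads the number that follows it, never checking the end tag. The Python sketch below illustrates that hypothesis; `read_param` is a made-up helper, not a function from the application.

```python
import re

def read_param(body, name):
    """Return the integer that follows <name>, or None if the start tag is absent.
    The parameter's end tag is never examined."""
    m = re.search(rf"<{re.escape(name)}>\s*(\d+)", body)
    return int(m.group(1)) if m else None
```

Under this model a mistyped or missing parameter end tag cannot change the value that is read, which would explain why no warning appeared. It also means such mistakes go unreported, so a stricter parser might at least log them.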

Results:
GTX 780 (device0): 18829654
GTX 660 (device1): 18829677
GTX 670 (device2): 18829801

The other test group this evening was intended to find out if the XML tags are case-sensitive. One or more letters were capitalized in the start tag for one of the parameters on each device, as follows:

<device0>
   <unroll>16</unroll>
   <FFA_block>12288</ffa_block>
   <ffa_block_fetch>6144</ffa_block_fetch>
</device0>   
<device1>
   <unroll>12</unroll>
   <ffa_block>4096</ffa_block>
   <FFA_block_fetch>2048</ffa_block_fetch>
</device1>   
<device2>
   <Unroll>14</unroll>
   <ffa_block>8192</ffa_block>
   <ffa_block_fetch>2048</ffa_block_fetch>
</device2>

Yep, they're case sensitive. There's no warning message in the Stderr that an invalid tag was found. The application just ignores that entire parameter. In the case of the ignored <FFA_block> parameter, the resulting use of a default value meant that the corresponding <ffa_block_fetch> from the config file was then flagged as invalid and a WARNING message was posted to the Stderr to that effect. In my test, the reverse situation, with a default value used for the ignored <FFA_block_fetch> parameter, didn't happen to cause a conflict with the entered <ffa_block> parameter, but I would guess that it could if other values had been entered.

Results:
GTX 780 (device0): 18829678
GTX 660 (device1): 18829667
GTX 670 (device2): 18829662

Is there a particular need for these tags to be case-sensitive? If not, I would think that allowing them to be case-independent could avoid the occasional headache down the line.
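As a rough illustration of the suggestion, the Python sketch below lowercases each tag name before looking it up and warns about any tag it does not recognize. Everything in it is hypothetical: the `read_params` name, the `warn` callback, and the `KNOWN` set, which lists only the three options used in these tests rather than the application's full set.

```python
import re
import sys

# Only the three options exercised in this thread; the real list may be longer.
KNOWN = {"unroll", "ffa_block", "ffa_block_fetch"}

def _stderr_warn(msg):
    print("WARNING:", msg, file=sys.stderr)

def read_params(body, warn=_stderr_warn):
    """Read <tag>number pairs from one device block, ignoring the case of the tag."""
    params = {}
    for tag, value in re.findall(r"<(\w+)>\s*(\d+)", body):
        name = tag.lower()
        if name in KNOWN:
            params[name] = int(value)
        else:
            warn(f"unrecognized tag <{tag}> ignored")
    return params
```

With this approach `<FFA_block>12288</ffa_block>` and `<Unroll>14</unroll>` would both be accepted, while a misspelled tag such as `<unrol>` would produce a warning instead of being dropped silently.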
ID: 53963



 