Power Tip: Schematron Validation

Organizations and communities of interest often have additional constraints on the data in their DDMS Resources, beyond the rules in the DDMS specification. DDMSence supports such rules through the ISO Schematron standard. Using a combination of a configurable XSLT engine and XOM, DDMSence can validate a Resource against a custom Schematron file (.sch) and return the results of validation as a list of ValidationMessages. The XSLT transformation makes use of Rick Jelliffe's mature implementation of ISO Schematron.

Creating a custom Schematron file is outside the scope of this documentation, but plenty of Schematron tutorials are available online, and I have also codified several complex rules from the DDMS Specification as examples in the Explorations section. There are two very simple examples in the /data/sample/schematron/ directory. The file testPublisherValueXslt1.sch examines the surname of a person designated as a publisher and fails if the surname is "Uri".

<iso:pattern title="Fixed Surname Value">
   <iso:rule context="//ddms:publisher/ddms:person/ddms:surname">
      <iso:report test="normalize-space(.) = 'Uri'">Members of the Uri family cannot be publishers.</iso:report>
   </iso:rule>
</iso:pattern>

Figure 1. The test from testPublisherValueXslt1.sch

The file testPositionValuesXslt2.sch forces any positions to match an exact location in Reston, Virginia. It makes use of the XPath 2.0 function tokenize(), so it must be handled with an XSLT2-compatible engine. DDMSence decides whether to use XSLT1 or XSLT2 based on the queryBinding attribute on the root element of your Schematron file. The supported values are xslt and xslt2, and xslt is the default if this attribute does not exist.

<iso:pattern id="FGM_Reston_Location">
   <iso:rule context="//gml:pos">
      <iso:let name="firstCoord" value="number(tokenize(text(), ' ')[1])"/>
      <iso:let name="secondCoord" value="number(tokenize(text(), ' ')[2])"/>
      <iso:assert test="$firstCoord = 38.95">The first coordinate in a gml:pos element must be 38.95 degrees.</iso:assert>
      <iso:assert test="$secondCoord = -77.36">The second coordinate in a gml:pos element must be -77.36 degrees.</iso:assert>
   </iso:rule>
</iso:pattern>

Figure 2. The test from testPositionValuesXslt2.sch
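The queryBinding lookup described above can be illustrated with a small, JDK-only sketch. This is not DDMSence's own code: the class and method names are hypothetical, and the sketch simply reads the attribute from the root element and falls back to xslt when it is absent. It does not check whether the value is one of the two supported ones.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

// Hypothetical helper, not part of DDMSence.
class QueryBindingSketch {

    /** Returns the queryBinding value of the root element, or "xslt" if it is missing. */
    static String detectQueryBinding(InputStream schematron) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Element root = factory.newDocumentBuilder().parse(schematron).getDocumentElement();
            // getAttribute returns an empty string when the attribute does not exist.
            String binding = root.getAttribute("queryBinding").trim();
            return binding.isEmpty() ? "xslt" : binding;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not read the Schematron file", e);
        }
    }

    /** Convenience wrapper for Schematron content held in a String. */
    static String detectQueryBindingInXml(String xml) {
        return detectQueryBinding(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) {
        String ns = "http://purl.oclc.org/dsdl/schematron";
        // Prints "xslt2", then "xslt".
        System.out.println(detectQueryBindingInXml("<iso:schema xmlns:iso='" + ns + "' queryBinding='xslt2'/>"));
        System.out.println(detectQueryBindingInXml("<iso:schema xmlns:iso='" + ns + "'/>"));
    }
}
```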

The following code sample will build a DDMS Resource from one of the sample XML files, and then validate it through Schematron:

File resourceFile = new File("data/sample/5.0-ddmsenceExample.xml");
File schFile = new File("data/sample/schematron/testPublisherValueXslt1.sch");

DDMSReader reader = new DDMSReader();
Resource resource = reader.getDDMSResource(resourceFile);
List<ValidationMessage> schematronMessages = resource.validateWithSchematron(schFile);
// Each message carries the XPath location of the offending node and the rule's text.
for (ValidationMessage message : schematronMessages) {
   System.out.println("Location: " + message.getLocator());
   System.out.println("Message: " + message.getText());
}

Figure 3. Sample code to validate 5.0-ddmsenceExample.xml with testPublisherValueXslt1.sch

Location: //*[local-name()='Resource' and namespace-uri()='urn:us:mil:ces:metadata:ddms:5']
   /*[local-name()='publisher' and namespace-uri()='urn:us:mil:ces:metadata:ddms:5']
   /*[local-name()='person' and namespace-uri()='urn:us:mil:ces:metadata:ddms:5']
   /*[local-name()='surname' and namespace-uri()='urn:us:mil:ces:metadata:ddms:5']
Message: Members of the Uri family cannot be publishers.

Figure 4. Output of the code from Figure 3

Schematron files are made up of a series of patterns and rules, which contain assertions and reports. The raw output of Schematron validation is a series of failed-assert and successful-report elements in Schematron Validation Report Language (SVRL). DDMSence converts this output into ValidationMessages with a locator value taken from the location attribute in SVRL. The type returned is "warning" for "successful-report" messages and "error" for "failed-assert" messages. Two points are important to note: 1) Schematron validation can only be performed on Resources which are already valid according to the DDMS specification, and 2) the results of Schematron validation will never invalidate the DDMSence object model. It is the responsibility of the Schematron user to react to any ValidationMessages.
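To make the SVRL mapping concrete, here is a minimal JDK-only sketch of the conversion described above. It is an illustration under assumptions, not DDMSence's implementation: the Message class is a hypothetical stand-in for ValidationMessage, and the sketch assumes the usual SVRL layout in which failed-assert and successful-report elements are children of the root and each carries a location attribute and a text child.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical sketch, not part of DDMSence.
class SvrlSketch {
    static final String SVRL_NS = "http://purl.oclc.org/dsdl/svrl";

    /** Hypothetical stand-in for a ValidationMessage. */
    static final class Message {
        final String type;
        final String text;
        final String locator;

        Message(String type, String text, String locator) {
            this.type = type;
            this.text = text;
            this.locator = locator;
        }
    }

    /** Maps failed-assert to "error" and successful-report to "warning", in document order. */
    static List<Message> toMessages(String svrl) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(svrl.getBytes(StandardCharsets.UTF_8)));
            List<Message> messages = new ArrayList<>();
            NodeList children = doc.getDocumentElement().getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node node = children.item(i);
                if (node.getNodeType() != Node.ELEMENT_NODE || !SVRL_NS.equals(node.getNamespaceURI())) {
                    continue;
                }
                String type;
                if ("failed-assert".equals(node.getLocalName())) {
                    type = "error";
                } else if ("successful-report".equals(node.getLocalName())) {
                    type = "warning";
                } else {
                    continue; // fired-rule, active-pattern, and so on carry no message
                }
                Element element = (Element) node;
                NodeList texts = element.getElementsByTagNameNS(SVRL_NS, "text");
                String text = texts.getLength() > 0 ? texts.item(0).getTextContent().trim() : "";
                messages.add(new Message(type, text, element.getAttribute("location")));
            }
            return messages;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse SVRL", e);
        }
    }

    public static void main(String[] args) {
        String svrl = "<svrl:schematron-output xmlns:svrl='" + SVRL_NS + "'>"
                + "<svrl:successful-report test='x' location='/a/b'>"
                + "<svrl:text>Members of the Uri family cannot be publishers.</svrl:text>"
                + "</svrl:successful-report></svrl:schematron-output>";
        for (Message m : toMessages(svrl)) {
            System.out.println(m.type + " at " + m.locator + ": " + m.text);
        }
    }
}
```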

Schematron files contain the XML namespaces of any elements you might traverse -- please make sure you use the correct namespaces for the version of DDMS you are employing. The sample files described above are written only for DDMS 5.0.

Validating with Intelligence Community Schematron Files

The Intelligence Community specifications (ISM, NTK, and VIRT) include Schematron files for validating logical constraints on the IC attributes. DDMSence does not include these files, but you can download the Public Release versions from the ODNI website, and your organization might have access to versions from higher classification levels as well. The top-level Schematron file for ISM, ISM/Schematron/ISM/ISM_XML.sch, is the orchestration point for each of the supporting files and the vocabularies needed for validation. A similar pattern can be seen in NTK and VIRT.

Here is an example which validates one of the sample DDMS metacards against the ISM.XML Schematron files. It assumes that the top-level file and all of the files and subdirectories it depends on have been copied into the working directory.

File schematronFile = new File("ISM_XML.sch");
Resource resource = new DDMSReader().getDDMSResource(new File("data/sample/4.1-ddmsenceExample.xml"));
List<ValidationMessage> messages = resource.validateWithSchematron(schematronFile);
for (ValidationMessage message : messages) {
   System.out.println("Location: " + message.getLocator());
   System.out.println("Message: " + message.getText());
}

Figure 5. Sample code to validate with ISM.XML Schematron Files

Running this code will not display any errors or warnings, but we can make the output more exciting by intentionally breaking a rule. One of the rules described in the DES states that ISM:ownerProducer token values must be in alphabetical order (ISM-ID-00100). If you edit this attribute on the root node of the DDMS resource file so the value is "USA AUS" and then run the code again, you should get the following output.

Location: //*:resource[namespace-uri()='urn:us:mil:ces:metadata:ddms:4'][1]
Message: [ISM-ID-00100][Error] If ISM-CAPCO-RESOURCE and attribute ownerProducer is specified, then each of its values must 
   be ordered in accordance with CVEnumISMOwnerProducer.xml. The following values are out of order [AUS] for [USA AUS]

Figure 6. Schematron output when intentionally flouting the rules
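The kind of check behind this message can be sketched in a few lines of plain Java. This is a deliberate simplification and not the ISM rule itself: the real rule orders values against CVEnumISMOwnerProducer.xml, whereas this hypothetical helper only compares each token with the one before it alphabetically, as in the description above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the ISM Schematron logic.
class TokenOrderSketch {

    /** Returns every token that sorts before the token preceding it. */
    static List<String> outOfOrder(String attributeValue) {
        List<String> offenders = new ArrayList<>();
        String trimmed = attributeValue.trim();
        if (trimmed.isEmpty()) {
            return offenders;
        }
        String[] tokens = trimmed.split("\\s+");
        for (int i = 1; i < tokens.length; i++) {
            if (tokens[i].compareTo(tokens[i - 1]) < 0) {
                offenders.add(tokens[i]);
            }
        }
        return offenders;
    }

    public static void main(String[] args) {
        System.out.println(outOfOrder("USA AUS")); // prints [AUS]
        System.out.println(outOfOrder("AUS USA")); // prints []
    }
}
```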

Be aware that a DDMS 5.0 assertion is not a complete record on its own -- it is intended for insertion into an IC Trusted Data Object. Because of this, using IC Schematron files on the assertion alone may not be successful. You will need to validate the entire TDO record with the Schematron files. This operation is outside the scope of what DDMSence offers.

Supported XSLT Engines

DDMSence comes bundled with Saxon Home Edition because it supports both XSLT1 and XSLT2 transformations. Support for alternate engines is provided through the xml.transform.TransformerFactory configurable property, which can be set to the class name of another processor. Please see the Power Tip on Configurable Properties for details on how to set this property. The table below lists the engines I have tested with. None of the engines listed will work with XSLT 2 Schema-Aware (SA) Schematron files.

Name and Version | Class Name | XSLT1 | XSLT2
Saxon HE |  | supported | supported
Xalan interpretive, v2.7.1 | org.apache.xalan.processor.TransformerFactoryImpl | supported | fails, doesn't support XSLT 2.0
Xalan XSLTC, v2.7.1 | org.apache.xalan.xsltc.trax.TransformerFactoryImpl | fails, SVRL transformation doesn't seem to occur properly | fails, doesn't support XSLT 2.0
Xalan XSLTC, bundled with Java 1.5 | com.sun.org.apache.xalan.internal.xsltc.trax.TransformerFactoryImpl | fails, Xalan bug treats XSLT warning as an error | fails, doesn't support XSLT 2.0
Xalan XSLTC, bundled with Java 1.6 | com.sun.org.apache.xalan.internal.xsltc.trax.TransformerFactoryImpl | supported | fails, doesn't support XSLT 2.0

Table 1. XSLT Engines for Schematron Validation
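If you want a quick indication of which XSLT version a candidate engine implements before configuring it, the following JDK-only sketch may help. It is independent of DDMSence and its names are hypothetical: it asks a TransformerFactory for the standard xsl:version system property by running a one-template stylesheet. Reporting 2.0 or higher suggests XSLT2 support, but it does not guarantee that the Schematron transformation itself will succeed with that engine.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical probe, not part of DDMSence.
class XsltVersionProbe {
    // A stylesheet whose only output is the processor's reported XSLT version.
    static final String PROBE =
            "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
            + "<xsl:output method='text'/>"
            + "<xsl:template match='/'>"
            + "<xsl:value-of select=\"system-property('xsl:version')\"/>"
            + "</xsl:template></xsl:stylesheet>";

    /** Runs the probe stylesheet and returns the version string the engine reports. */
    static String reportedVersion(TransformerFactory factory) {
        try {
            Transformer transformer = factory.newTransformer(new StreamSource(new StringReader(PROBE)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader("<probe/>")), new StreamResult(out));
            return out.toString().trim();
        } catch (Exception e) {
            throw new IllegalStateException("Probe transformation failed", e);
        }
    }

    /** True when the engine reports XSLT 2.0 or later. */
    static boolean reportsXslt2(TransformerFactory factory) {
        return Double.parseDouble(reportedVersion(factory)) >= 2.0;
    }

    public static void main(String[] args) {
        // Pass a factory class name as the first argument, or omit it for the platform default.
        TransformerFactory factory = args.length > 0
                ? TransformerFactory.newInstance(args[0], null)
                : TransformerFactory.newInstance();
        System.out.println(factory.getClass().getName() + " reports XSLT " + reportedVersion(factory));
    }
}
```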

Be aware that DDMSence also uses the Saxon engine for some utility library functions. Even if you choose an alternate XSLT engine for Schematron validation, you will still need to include the Saxon JAR file in your classpath.
