Programming
18 April 2012

Fixing SAXParser Error “The system cannot find the file specified” for DTD files

When parsing an XML file with the SAXParser class, you may get a FileNotFoundException for a .dtd file that the XML document refers to.

Example: We are parsing the file D:\homologene\build65\homologene.xml.

The first lines of the XML are:

<?xml version="1.0"?>
<!DOCTYPE HG-EntrySet PUBLIC "-//NCBI//HomoloGene/EN" "HomoloGene.dtd">
<HG-EntrySet>
  <HG-EntrySet_entries>
    <HG-Entry>
      <HG-Entry_hg-id>3</HG-Entry_hg-id>

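We see a DOCTYPE declaration that points to a DTD file. DTD stands for Document Type Definition, and it defines the allowed structure of the XML file. Because the system identifier "HomoloGene.dtd" is a relative reference, the parser resolves it against the location of the XML file and looks for the DTD in the same directory. The JDK's built-in SAX parser loads this external DTD by default, even when validation is switched off.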
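To check where the parser will look, you can resolve the relative system identifier against the document's URI with java.net.URI. This is a minimal sketch that mirrors the standard relative-URI rule the parser applies; the class and method names are illustrative, and writing the Windows path as the file URI file:/D:/... is an assumption.

```java
import java.net.URI;

public class DtdLocation {
    // Resolves a DOCTYPE system identifier against the XML document's URI.
    // A relative identifier such as "HomoloGene.dtd" ends up in the document's directory;
    // an absolute identifier (e.g. an http:// URL) is returned unchanged.
    public static URI resolveDtd(URI documentUri, String systemId) {
        return documentUri.resolve(systemId);
    }

    public static void main(String[] args) {
        URI xml = URI.create("file:/D:/homologene/build65/homologene.xml");
        System.out.println(resolveDtd(xml, "HomoloGene.dtd"));
        // prints file:/D:/homologene/build65/HomoloGene.dtd
    }
}
```

This matches the path in the exception below: the parser expects HomoloGene.dtd next to homologene.xml.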

When parsing, we get the following error:

java.io.FileNotFoundException: D:\homologene\build65\HomoloGene.dtd (The system cannot find the file specified)
  at java.io.FileInputStream.open(Native Method)
  at java.io.FileInputStream.<init>(Unknown Source)
  at java.io.FileInputStream.<init>(Unknown Source)
  at sun.net.www.protocol.file.FileURLConnection.connect(Unknown Source)
  at sun.net.www.protocol.file.FileURLConnection.getInputStream(Unknown Source)
  at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.setupCurrentEntity(Unknown Source)
  at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.startEntity(Unknown Source)
  at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.startDTDEntity(Unknown Source)
  at com.sun.org.apache.xerces.internal.impl.XMLDTDScannerImpl.setInputSource(Unknown Source)
  at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$DTDDriver.dispatch(Unknown Source)
  at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$DTDDriver.next(Unknown Source)
  at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$PrologDriver.next(Unknown Source)
  at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(Unknown Source)
  at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(Unknown Source)
  at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
  at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
  at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
  at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source)
  at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(Unknown Source)
  at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
  at loader.homologene.HomologeneParser.parseFile(HomologeneParser.java:95)
  at loader.homologene.HomologeneParser.parse(HomologeneParser.java:70)
  at loader.homologene.HomologeneMain.parse(HomologeneMain.java:33)

The error occurs because no HomoloGene.dtd file exists at that path.

To suppress this error and parse the XML without the DTD, we override resolveEntity() from the EntityResolver interface (which DefaultHandler implements) and return an empty input source in place of the DTD:

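import java.io.File;
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class ParseTest extends DefaultHandler
{
  public void parse() throws ParserConfigurationException, SAXException, IOException
  {
    SAXParserFactory saxParserFactory = SAXParserFactory.newInstance();
    SAXParser saxParser = saxParserFactory.newSAXParser();
    // Pass this handler so that the parser calls our resolveEntity() and other callbacks
    saxParser.parse(new File("D:\\homologene\\build65\\homologene.xml"), this);
  }

  // Called for external entities, including the DTD named in the DOCTYPE.
  // The empty InputSource stands in for the missing HomoloGene.dtd.
  @Override
  public InputSource resolveEntity(String publicId, String systemId)
  {
    return new InputSource(new StringReader(""));
  }

  // ...other handler functions...
}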
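To try the resolver without the HomoloGene download, the sketch below parses a small in-memory document with the same DOCTYPE. It is a variation on the class above, not the original code: resolveEntity() stubs out only system identifiers ending in ".dtd" and returns null for anything else, which tells the parser to resolve that entity normally. The class name, countEntries() and the base URI file:/no/such/dir/homologene.xml are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class StubDtdDemo extends DefaultHandler {
    private int entries = 0;

    // Replace only DTDs with empty content; null means "use the parser's default resolution".
    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        if (systemId != null && systemId.endsWith(".dtd")) {
            return new InputSource(new StringReader(""));
        }
        return null;
    }

    // Counts HG-Entry elements (qName is used because the parser is not namespace-aware).
    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("HG-Entry".equals(qName)) {
            entries++;
        }
    }

    // Parses XML text whose DOCTYPE names a DTD that does not exist.
    public static int countEntries(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        StubDtdDemo handler = new StubDtdDemo();
        InputSource source = new InputSource(new StringReader(xml));
        // Hypothetical base URI: the DTD location derived from it does not exist.
        source.setSystemId("file:/no/such/dir/homologene.xml");
        parser.parse(source, handler);
        return handler.entries;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\"?>\n"
            + "<!DOCTYPE HG-EntrySet PUBLIC \"-//NCBI//HomoloGene/EN\" \"HomoloGene.dtd\">\n"
            + "<HG-EntrySet><HG-EntrySet_entries>"
            + "<HG-Entry><HG-Entry_hg-id>3</HG-Entry_hg-id></HG-Entry>"
            + "</HG-EntrySet_entries></HG-EntrySet>";
        System.out.println(countEntries(xml)); // prints 1
    }
}
```

Limiting the stub to .dtd identifiers keeps any other external entities working instead of silently replacing them with empty content.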

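With the resolver in place, the file parses without errors. Keep in mind that the parser now works without the DTD: any entity declarations or default attribute values it would have supplied are missing. If you can obtain HomoloGene.dtd, placing it next to the XML file is the better fix, so use this workaround at your own risk.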
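If you never need the DTD, a different approach is to tell the parser not to load it at all. The JDK's built-in parser is based on Apache Xerces and recognizes the Xerces feature http://apache.org/xml/features/nonvalidating/load-external-dtd; other SAX implementations may not, in which case setFeature() throws. This is a sketch under that assumption; the class name, countEntries() and the base URI are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class NoExternalDtd {
    // Xerces feature name; honored by the JDK's built-in parser, possibly not by others.
    static final String LOAD_EXTERNAL_DTD =
        "http://apache.org/xml/features/nonvalidating/load-external-dtd";

    // Creates a non-validating parser that does not open the DTD named in the DOCTYPE.
    public static SAXParser newParser() throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(false); // the default, stated explicitly: the feature applies only when not validating
        factory.setFeature(LOAD_EXTERNAL_DTD, false);
        return factory.newSAXParser();
    }

    // Parses the XML text and returns the number of HG-Entry elements.
    public static int countEntries(String xml) throws Exception {
        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if ("HG-Entry".equals(qName)) {
                    count[0]++;
                }
            }
        };
        InputSource source = new InputSource(new StringReader(xml));
        source.setSystemId("file:/no/such/dir/homologene.xml"); // hypothetical base URI
        newParser().parse(source, handler);
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\"?>\n"
            + "<!DOCTYPE HG-EntrySet PUBLIC \"-//NCBI//HomoloGene/EN\" \"HomoloGene.dtd\">\n"
            + "<HG-EntrySet><HG-EntrySet_entries>"
            + "<HG-Entry><HG-Entry_hg-id>3</HG-Entry_hg-id></HG-Entry>"
            + "</HG-EntrySet_entries></HG-EntrySet>";
        System.out.println(countEntries(xml)); // prints 1
    }
}
```

Unlike the resolver, this needs no handler override, but it ties the code to Xerces-based parsers.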
