Check out the OpenSP toolkit, which has programs for handling SGML files. Perhaps your easiest option is to use the osx program to convert the input file to XML, after which you can use ordinary XML processing tools.
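If you want to drive the conversion from a script, here is a minimal Python sketch. It assumes osx is on your PATH, writes the converted XML to standard output, and exits with a non-zero status when the parse fails. The function names are made up for this illustration and are not part of OpenSP.

```python
import subprocess
import xml.etree.ElementTree as ET


def osx_command(*inputs: str) -> list[str]:
    """Build the osx argument list; the parser reads the files in the order given."""
    return ["osx", *inputs]


def sgml_to_xml(*inputs: str) -> str:
    """Run osx and return the XML it writes to standard output.

    Assumption: osx exits non-zero on parse errors, so check=True raises
    subprocess.CalledProcessError; the messages end up in the captured stderr.
    """
    result = subprocess.run(
        osx_command(*inputs), capture_output=True, text=True, check=True
    )
    return result.stdout


def parse_converted(xml_text: str) -> ET.Element:
    """Hand the converted document to a standard XML parser."""
    return ET.fromstring(xml_text)
```

Typical use would be something like `root = parse_converted(sgml_to_xml("filing.sgm"))`, where "filing.sgm" is a placeholder for your input file. Once you have an ElementTree, you can query it with the usual XML APIs.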
Some setup will be needed at first, because the OpenSP package does not include the EDGAR DTD or its SGML declaration (in the document you linked, this material begins on page 48, starting with <!SGML "ISO 8879-1986"). You will have to get them as text files and put them where the SP parser can find them.
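One way to let the parser find the files is an SGML Open catalog, which OpenSP reads from the SGML_CATALOG_FILES environment variable or the -c option. The sketch below writes a catalog with an SGMLDECL entry for the declaration and a DOCTYPE entry that maps the "submission" document type to your DTD file. This assumes your filings start with a doctype declaration such as <!DOCTYPE submission> that has no system identifier of its own. The file names are hypothetical. As far as I know, relative names in a catalog resolve against the catalog's own directory, which is why the catalog is written next to the two files.

```python
import os
from pathlib import Path


def catalog_text(decl_file: str, dtd_file: str, doctype: str = "submission") -> str:
    """Build catalog entries: SGMLDECL names the SGML declaration,
    DOCTYPE maps a document type name to the file holding its DTD."""
    return f'SGMLDECL "{decl_file}"\nDOCTYPE {doctype} "{dtd_file}"\n'


def write_catalog(directory: Path, decl_file: str, dtd_file: str) -> Path:
    """Write the catalog into the directory that holds the declaration and DTD."""
    path = directory / "catalog"
    path.write_text(catalog_text(decl_file, dtd_file))
    return path


def env_with_catalog(catalog: Path) -> dict[str, str]:
    """Return a copy of the environment with SGML_CATALOG_FILES set to the catalog."""
    env = dict(os.environ)
    env["SGML_CATALOG_FILES"] = str(catalog)
    return env
```

You would then pass the result of env_with_catalog(...) as the env argument of subprocess.run when calling osx. Passing the catalog path with -c works as well.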
UPDATE: This document seems to be a more recent version. However, a quick Google search did not turn up a version that can be processed directly by machine, so you may need to copy and paste from the PDF.
If you do this, some extraneous formatting will come along that you will need to remove. For example, there appear to be page-break markers labeled "C-1", "C-2", etc. They are not part of the SGML and must be deleted.
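A small cleanup pass can remove the page markers from the pasted text. This Python sketch assumes each marker sits alone on its own line, like "C-1" or "C-12". Lines that merely mention such a string among other text are left alone. Check the result by eye, because other PDF debris (running headers, footers) may follow different patterns.

```python
import re

# Assumption: a page marker is a line containing only "C-<number>",
# possibly padded with spaces/tabs and ending in \n or \r\n.
PAGE_MARKER = re.compile(r"^[ \t]*C-\d+[ \t\r]*$\n?", re.MULTILINE)


def strip_page_markers(text: str) -> str:
    """Delete whole lines that consist only of a PDF page marker such as "C-1"."""
    return PAGE_MARKER.sub("", text)
```

Run it over the copied declaration and DTD text before saving the files for the parser.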
You can either put the SGML declaration and the EDGAR DTD in that directory as separate files (in this case, the DTD file should contain only the inner part, i.e. what comes after <!DOCTYPE submission [ and before the matching ]> at the end), or you can create a "prolog" file containing both parts together, as they are (i.e. including <!DOCTYPE submission [ and ]>). Then run any program from the SP toolkit on the prolog and your SGML file: put both file names on the command line with the prolog first, so that the parser reads the two files in the correct order.

To understand what is happening, you need to know that an SGML parser needs three pieces of information: an SGML declaration that sets some environment and processing parameters, then a DTD that describes the structural constraints on the document, and finally the document itself.
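The prolog approach can be sketched in Python as follows. The sketch joins the SGML declaration and the DTD into one prolog, then builds a command line that puts the prolog before the document, giving the parser declaration, DTD and instance in that order. The helper names and the "submission" default are illustrative assumptions. "osx" can be replaced by any other SP tool.

```python
from pathlib import Path


def make_prolog(sgml_decl: str, dtd_subset: str, doctype: str = "submission") -> str:
    """Join the SGML declaration and the DTD into a single prolog.

    dtd_subset is the text that belongs between "<!DOCTYPE submission [" and "]>".
    """
    return (
        sgml_decl.rstrip() + "\n"
        + f"<!DOCTYPE {doctype} [\n"
        + dtd_subset.strip() + "\n"
        + "]>\n"
    )


def prolog_command(tool: str, prolog: Path, document: Path) -> list[str]:
    """Prolog first, document second, so the parser reads them in the right order."""
    return [tool, str(prolog), str(document)]
```

For example, write make_prolog(...) to a file such as "edgar.prolog" (a hypothetical name) and run subprocess.run(prolog_command("osx", Path("edgar.prolog"), Path("filing.sgm"))). Keeping the prolog in its own file means the same one can be reused for every filing.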