Suppose I have some kind of HTML code like this (generated from Markdown or Textile or something else):
<h1>A header</h1>
<p>Foo</p>
<h2>Another header</h2>
<p>More content</p>
<h2>Different header</h2>
<h1>Another toplevel header</h1>
<!-- and so on -->
How can I generate a table of contents for it using Python?
Use an HTML parser such as lxml or BeautifulSoup to find all the heading elements (h1, h2, and so on).
Here is an example using lxml and XPath.
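from lxml import etree

# Use the lenient HTML parser: the input is an HTML fragment, not well-formed XML
doc = etree.parse("test.xml", etree.HTMLParser())
# Select every heading element in document order
for node in doc.xpath('//h1|//h2|//h3|//h4|//h5'):
    print(node.tag, node.text)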
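To turn the list of headings into an actual table of contents, one option is to indent each entry by its heading level. The following is only a sketch: it swaps lxml for the standard library's `html.parser` so that it runs without extra dependencies, and the names `HeadingCollector` and `build_toc` are made up for this example. It also assumes that headings contain plain text and are not nested inside each other.

```python
from html.parser import HTMLParser

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class HeadingCollector(HTMLParser):
    """Collect a (level, text) pair for every h1-h6 element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.headings = []   # finished (level, text) pairs, in document order
        self._level = None   # level of the heading currently open, if any
        self._parts = []     # text fragments seen inside the open heading

    def handle_starttag(self, tag, attrs):
        if tag in HEADINGS:
            self._flush()    # tolerate a previous heading that was never closed
            self._level = int(tag[1])

    def handle_endtag(self, tag):
        if tag in HEADINGS:
            self._flush()

    def handle_data(self, data):
        # Only keep text that appears while a heading is open
        if self._level is not None:
            self._parts.append(data)

    def _flush(self):
        if self._level is not None:
            # Collapse runs of whitespace and trim the ends
            text = " ".join("".join(self._parts).split())
            self.headings.append((self._level, text))
        self._level = None
        self._parts = []

    def close(self):
        super().close()
        self._flush()        # emit a heading still open at end of input


def build_toc(html):
    """Return a plain-text outline, indented two spaces per heading level."""
    collector = HeadingCollector()
    collector.feed(html)
    collector.close()
    return "\n".join(
        "  " * (level - 1) + text for level, text in collector.headings
    )


if __name__ == "__main__":
    sample = (
        "<h1>A header</h1><p>Foo</p>"
        "<h2>Another header</h2><p>More content</p>"
        "<h2>Different header</h2>"
        "<h1>Another toplevel header</h1>"
    )
    print(build_toc(sample))
```

The same idea should carry over to the lxml version: read the level from `node.tag[1]` and indent `node.text` accordingly. To get links rather than plain text, you would also need to give each heading an `id` attribute, which this sketch does not do.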