Jsoup: get all header tags

I am trying to parse an HTML document with Jsoup to get all the header tags. I also need to group the header tags as [h1], [h2], etc. I tried:

hh = doc.select("h[0-6]"); 

but it gives me an empty array.

3 answers

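Your selector matches an h tag that has an attribute named "0-6". The square brackets are attribute-selector syntax here, not a regular expression, and since HTML has no h element, nothing matches. Instead, you can combine several selectors: hh = doc.select("h1, h2, h3, h4, h5, h6");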
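If you really do want a regular expression, one option is to apply it to each element's tag name yourself. The following is only a minimal sketch, assuming Jsoup is on the classpath; the class name HeaderSelectorDemo, its helper methods, and the sample markup are made up for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.util.ArrayList;
import java.util.List;

public class HeaderSelectorDemo {

    // Hypothetical sample markup, used only for this illustration.
    static final String SAMPLE_HTML =
        "<html><body>"
        + "<h1>Title</h1><p>Intro</p>"
        + "<h2>First</h2><h3>Detail</h3>"
        + "<h2>Second</h2>"
        + "</body></html>";

    // Comma-separated selector: matches any of the six heading tags.
    static Elements selectHeaders(Document doc) {
        return doc.select("h1, h2, h3, h4, h5, h6");
    }

    // Alternative: test each element's tag name against a real regular expression.
    static List<Element> selectHeadersByRegex(Document doc) {
        List<Element> headers = new ArrayList<>();
        for (Element el : doc.getAllElements()) {
            if (el.tagName().matches("h[1-6]")) {
                headers.add(el);
            }
        }
        return headers;
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(SAMPLE_HTML);
        // The selector from the question, read by Jsoup as tag h with attribute "0-6".
        System.out.println(doc.select("h[0-6]").size());
        System.out.println(selectHeaders(doc).size());
        System.out.println(selectHeadersByRegex(doc).size());
    }
}
```

Both helpers should return the same elements in document order. The comma-separated selector is shorter, so the regex variant is mainly useful if the set of tag names is not known in advance.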

Grouping: do you need one group with all h-tags plus a group for each of h1, h2, ..., or only a group for each of h1, h2, ...?

Here is an example of how you can do this:

    // Group of all h-Tags
    Elements hTags = doc.select("h1, h2, h3, h4, h5, h6");

    // Group of all h1-Tags
    Elements h1Tags = hTags.select("h1");

    // Group of all h2-Tags
    Elements h2Tags = hTags.select("h2");

    // ... etc.

If you only need a group for each h1, h2, ... tag, you can remove the first selector and replace hTags with doc in the remaining calls.
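Another way to get the per-tag groups is to select all headers once and bucket them by tag name in a map. This is a rough sketch rather than the only way to do it, assuming Jsoup is on the classpath; the class name HeaderGrouping and the method groupHeaderTexts are hypothetical names chosen for this example.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class HeaderGrouping {

    // Groups the text of every heading by its tag name ("h1" .. "h6") in one pass.
    static Map<String, List<String>> groupHeaderTexts(String html) {
        Document doc = Jsoup.parse(html);
        // TreeMap keeps the keys sorted: h1, h2, h3, ...
        Map<String, List<String>> groups = new TreeMap<>();
        for (Element header : doc.select("h1, h2, h3, h4, h5, h6")) {
            groups.computeIfAbsent(header.tagName(), k -> new ArrayList<>())
                  .add(header.text());
        }
        return groups;
    }

    public static void main(String[] args) {
        // Hypothetical sample markup, used only for this illustration.
        String html = "<h1>Title</h1><h2>First</h2><h3>Detail</h3><h2>Second</h2>";
        System.out.println(groupHeaderTexts(html));
    }
}
```

For the sample markup this prints {h1=[Title], h2=[First, Second], h3=[Detail]}. Heading levels that do not occur in the document simply have no entry in the map. If you need the elements rather than their text, store the Element objects in the lists instead.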


Use doc.select("h1, h2, h3, h4, h5, h6") to get all the header tags. Use doc.select("h1"), doc.select("h2"), and so on to get each tag separately. For more that you can do with select, see http://preciselyconcise.com/apis_and_installations/jsoup/j_selector.php


Here is a Scala version of the answer that uses Ammonite syntax to specify the Maven coordinates for Jsoup:

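    // Ammonite magic import: the Maven coordinates go in backticks
    import $ivy.`org.jsoup:jsoup:1.11.3`

    val html = scala.io.Source.fromURL("https://scalacourses.com").mkString
    val doc = org.jsoup.Jsoup.parse(html)

    // HTML defines heading levels h1 through h6 only
    doc.select("h1, h2, h3, h4, h5, h6").eachText()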