I am new to AI. I am working on an application that classifies text using machine learning. The application should classify the various parts of an HTML document. For example, most web pages have a header, menu, sidebar, footer, main content, and so on. I want to use a text classifier to label these parts of an HTML document and to identify the different types of forms on the page.
- It would be very helpful if someone could provide detailed recommendations on this.
- Examples of such applications will also be very useful.
I am looking for additional technical suggestions regarding code and implementation.
I can assign labels to HTML tag attributes such as class or id:
<div class="menu-1"> <div id="entry"> <div id="content"> <div id="footer"> <div id="comment-12"> <div id="comment-title">
For the first element, for example:
TrainClassifier (label: "Menu", value: "menu-1", attribute: "class", line position: "21%", tag: "div");
Inputs
- "menu-1" (attribute value)
- "class" (attribute name)
- "21" (tag position in line)
- "div" (tag name)
Output
- Menu (the label assigned by the classifier)
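To make the inputs above concrete, here is a minimal sketch of how such records could be pulled out of a page using only Python's standard `html.parser` module. The names `AttributeFeatureExtractor` and `extract_features` are hypothetical, and I am assuming that "position" means the line of the tag as a percentage of the total number of lines in the document.

```python
from html.parser import HTMLParser


class AttributeFeatureExtractor(HTMLParser):
    """Collects one record per class/id attribute found on a start tag."""

    def __init__(self, total_lines):
        super().__init__()
        self.total_lines = max(total_lines, 1)
        self.records = []

    def handle_starttag(self, tag, attrs):
        lineno, _ = self.getpos()  # 1-based line number of the current tag
        # Assumption: position = line of the tag as a percentage of all lines.
        position = round(100 * lineno / self.total_lines)
        for name, value in attrs:
            if name in ("class", "id") and value:
                self.records.append(
                    {"value": value, "attribute": name,
                     "position": position, "tag": tag}
                )


def extract_features(html):
    """Return a list of feature records for every class/id attribute in html."""
    parser = AttributeFeatureExtractor(html.count("\n") + 1)
    parser.feed(html)
    parser.close()
    return parser.records


sample = '<div class="menu-1">\n</div>\n<div id="footer">\n</div>'
for record in extract_features(sample):
    print(record)
```

Each record has the same four fields as the TrainClassifier call above, so a user-supplied label could be attached to it directly.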
Which neural network library can accept the inputs above and classify them into labels (for example, "Menu")?
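To show the kind of behaviour I mean, here is a rough sketch of a trainable classifier over those four inputs. It is not a neural network: it is a small hand-written naive Bayes classifier in plain Python, used only as a stand-in. The names `featurize`, `NaiveBayes`, `train` and `classify` are hypothetical, and splitting the attribute value into character trigrams and bucketing the position into quarters are my own assumptions.

```python
import math
from collections import Counter, defaultdict


def featurize(value, attribute, position, tag):
    """Turn one record into a list of string tokens."""
    text = value.lower()
    # Character trigrams let "menu-1" and "menu-2" share most tokens.
    trigrams = [text[i:i + 3] for i in range(max(len(text) - 2, 1))]
    tokens = ["v=" + gram for gram in trigrams]
    tokens.append("attr=" + attribute)
    tokens.append("tag=" + tag)
    # Assumption: position is a percentage; bucket it into quarters 0..3.
    tokens.append("pos=" + str(min(int(position) // 25, 3)))
    return tokens


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.label_counts = Counter()
        self.token_counts = defaultdict(Counter)
        self.vocabulary = set()

    def train(self, label, **record):
        tokens = featurize(**record)
        self.label_counts[label] += 1
        self.token_counts[label].update(tokens)
        self.vocabulary.update(tokens)

    def classify(self, **record):
        tokens = featurize(**record)
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            score = math.log(count / total)  # log prior
            denom = sum(self.token_counts[label].values()) + len(self.vocabulary)
            for token in tokens:
                score += math.log((self.token_counts[label][token] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


clf = NaiveBayes()
clf.train("Menu", value="menu-1", attribute="class", position=21, tag="div")
clf.train("Footer", value="footer", attribute="id", position=95, tag="div")
clf.train("Content", value="content", attribute="id", position=50, tag="div")
print(clf.classify(value="menu-2", attribute="class", position=18, tag="div"))
```

With these three training examples, the unseen value "menu-2" is classified as "Menu" because it shares most of its trigrams, its attribute name and its position bucket with "menu-1". What I am asking for is a library that does this kind of thing properly.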
Not all users can write regular expressions or XPath, so they need a simpler approach. It is therefore important that the software is intelligent: the user selects the required part of the HTML document using the web browser control and trains the software until it can do the work by itself.
However, I don't know how to build such a program using AI.
The AI I am looking for should be able to accept various kinds of input data and classify based on them. As I said, I am a newcomer to AI and know little about it.
It would help to get an answer to the question I actually asked, namely which library I should use and how to implement this. Answers suggesting XPath, regex, or other methods do not answer it. Too often one gets every suggestion except the one that is needed.