Using Talend to extract HTML search pages into .txt files based on input keywords: how can I parse this data afterwards and write it to MySQL?

To expand on the title: I currently have a two-step workflow.

1) I retrieve the HTML search result pages for each keyword specified in the input.txt file, e.g.:

SAP; 
Business Intelligence;

Talend retrieves these results and writes them as HTML to keywords_SAP.txt and keywords_Business Intelligence.txt. Attached is an image of the Talend job.
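The mapping from input keywords to output files described above can be sketched in plain Java. This is only an illustration, not what the Talend job actually generates: the class and method names are hypothetical, and it assumes the keywords in input.txt are separated by semicolons, as in the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: the names below are illustrative and not part of Talend.
final class KeywordFiles {

    // Splits semicolon-separated input into trimmed, non-empty keywords.
    static List<String> parseKeywords(String content) {
        List<String> keywords = new ArrayList<>();
        for (String part : content.split(";")) {
            String keyword = part.trim();
            if (!keyword.isEmpty()) {
                keywords.add(keyword);
            }
        }
        return keywords;
    }

    // Builds the output file name using the keywords_<keyword>.txt pattern from the question.
    static String fileNameFor(String keyword) {
        return "keywords_" + keyword + ".txt";
    }

    public static void main(String[] args) {
        // Sample content standing in for input.txt.
        String content = "SAP;\nBusiness Intelligence;\n";
        for (String keyword : parseKeywords(content)) {
            System.out.println(fileNameFor(keyword));
        }
    }
}
```

For the sample input this prints keywords_SAP.txt and keywords_Business Intelligence.txt, one per line.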

Talend workflow

2) I use Java code to read these files one by one, parse data from the DOM structure using the JSoup library, clean the data, and write it to the MySQL database.

How can I run this Java code from within Talend, given that it depends on external jars (mysql and jsoup.jar)?


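Yes, use the tLibraryLoad component to load the jar; the jar can be placed in <talendInstallDir>/lib/java.

Link it to the rest of the job with an OnSubjobOk or OnComponentOk trigger.

tLibraryLoad has to run before the component that uses the library.

To add imports in tJava or tJavaRow, use the "Advanced settings" tab, for example: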

import org.apache.commons.lang3.math.NumberUtils;

(this imports NumberUtils from Apache Commons).


Put tLibraryLoad first and connect it with OnSubjobOk to the tJava component.


Although this question is two years old and you may have already solved the problem, I recently did a similar mini-project that may help you. I used simple string manipulation instead of the JSoup library. There is also a related video with step-by-step instructions. Hope this helps.

Talend Webpage Analysis Project
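As a rough idea of what "simple string manipulation" could look like, here is a minimal sketch that pulls out the text between a known opening and closing marker using only indexOf and substring. It is not the code from the linked project: the class name, method name, and sample markup are all hypothetical, and this approach is far more fragile than a real HTML parser such as JSoup.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not taken from the linked project.
final class SimpleHtmlExtractor {

    // Returns the trimmed text between each occurrence of `open` and the next `close`.
    static List<String> extractBetween(String html, String open, String close) {
        List<String> results = new ArrayList<>();
        int from = 0;
        while (true) {
            int start = html.indexOf(open, from);
            if (start < 0) {
                break; // no further opening marker
            }
            start += open.length();
            int end = html.indexOf(close, start);
            if (end < 0) {
                break; // opening marker without a matching close
            }
            results.add(html.substring(start, end).trim());
            from = end + close.length();
        }
        return results;
    }

    public static void main(String[] args) {
        // Made-up markup; real search result pages will look different.
        String html = "<ul><li class=\"result\"> SAP Consultant </li>"
                + "<li class=\"result\">BI Analyst</li></ul>";
        System.out.println(extractBetween(html, "<li class=\"result\">", "</li>"));
    }
}
```

Because it matches exact marker strings, this only works when the page markup is stable; any change in attributes or whitespace inside the tag breaks the match.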
