I am looking for a way to extract / copy data from Word files to a database. Under our corporate procedures, customer meeting protocols are documented in MS Word files, mainly for reasons of history and inertia.
I want to be able to pull action items from these meeting protocols into a database so that we can access them from the web interface, turn them into tasks, and update them as they are completed.
What is the best way to do this:
- a VBA macro from inside Word to create a CSV and then load into a DB?
- VBA macro in Word with database connection (how to connect to MySQL from VBA?)
- Python script via win32com, then load into DB?
The last option is attractive to me, since the web interface is built with Django, but I have never used win32com or tried to script Word from Python.
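For reference, the win32com route might look roughly like the sketch below. It assumes pywin32 is installed and Word is available on the same machine. The function names, the table index, and the column list are placeholders made up for illustration, not part of any existing code.

```python
def read_table_cells(table, columns):
    """Return one tuple of raw cell texts per table row.

    `table` is expected to behave like a Word Table COM object, i.e. it
    offers table.Rows.Count and table.Cell(row, col).Range.Text (1-based).
    Note: Cell(row, col) can fail on tables with merged cells.
    """
    rows = []
    for n in range(1, table.Rows.Count + 1):
        rows.append(tuple(table.Cell(n, c).Range.Text for c in columns))
    return rows


def read_word_table(path, table_index, columns):
    """Open a document in Word via COM and read the chosen table columns."""
    # Imported lazily so read_table_cells can be used without pywin32.
    import win32com.client

    word = win32com.client.Dispatch("Word.Application")
    word.Visible = False
    try:
        doc = word.Documents.Open(path, ReadOnly=True)
        try:
            return read_table_cells(doc.Tables(table_index), columns)
        finally:
            doc.Close(False)  # False = do not save changes
    finally:
        word.Quit()
```

The returned tuples could then be handed to Django model code or written to a CSV file; the text comes back exactly as Word reports it, so it may still need cleaning.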
EDIT: I started extracting the text using VBA because it makes it easier to work with the Word object model. I have run into a problem: all the text is in tables, and when I pull the strings from the cells I want, a strange little character appears at the end of each string. My code looks like this:
    sFile = "D:\temp\output.txt"
    fnum = FreeFile
    Open sFile For Output As #fnum
    num_rows = Application.ActiveDocument.Tables(2).Rows.Count
    For n = 1 To num_rows
        Descr = Application.ActiveDocument.Tables(2).Cell(n, 2).Range.Text
        Assign = Application.ActiveDocument.Tables(2).Cell(n, 3).Range.Text
        Target = Application.ActiveDocument.Tables(2).Cell(n, 4).Range.Text
        If Target = "" Then
            ExportText = ""
        Else
            ExportText = Descr & Chr(44) & Assign & Chr(44) & _
                Target & Chr(13) & Chr(10)
            Print #fnum, ExportText
        End If
    Next n
    Close #fnum
What is this little control character? Is it some kind of character code that comes from Word?
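One likely explanation: the text of a Word table cell ends with an end-of-cell marker, Chr(13) followed by Chr(7), which many editors display as a small box. That would also mean a test against an empty string never matches an empty cell. Below is a minimal Python sketch of stripping the marker and writing proper CSV; the helper names are hypothetical, and it assumes each row arrives as a (description, assignee, target) tuple of raw cell texts.

```python
import csv
import io

# End-of-cell marker that Word appends to a cell's Range.Text:
# Chr(13) followed by Chr(7).
CELL_END = "\r\x07"


def clean_cell_text(text):
    """Drop the trailing end-of-cell marker and surrounding whitespace."""
    if text.endswith(CELL_END):
        text = text[:-len(CELL_END)]
    return text.strip()


def rows_to_csv(rows):
    """Build CSV text from (description, assignee, target) tuples.

    Rows whose target cell is empty after cleaning are skipped.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\r\n")
    for descr, assign, target in rows:
        descr, assign, target = map(clean_cell_text, (descr, assign, target))
        if not target:
            continue
        writer.writerow([descr, assign, target])
    return buf.getvalue()
```

Using the csv module rather than joining with commas by hand means a description that itself contains a comma is quoted instead of breaking the column layout.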