I have a small C# application that extracts text from a Microsoft Publisher file through the COM Interop API. This works fine, but I'm worried about sections that contain several styles. Potentially, each character in a word could have a different font, format, etc.
Do I need to compare the formatting character by character, or is there something that gives me the stretches of text that share the same style? It seems I can at least get the individual paragraphs?
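using System.Text;
using Microsoft.Office.Core;                       // MsoTriState
using Microsoft.Office.Interop.Publisher;          // TextRange

var text = new StringBuilder();
// pg: the current Page, from an enclosing loop over the document's pages
foreach (Microsoft.Office.Interop.Publisher.Shape shp in pg.Shapes)
{
    if (shp.HasTextFrame == MsoTriState.msoTrue)
    {
        TextRange all = shp.TextFrame.TextRange;  // fetch the COM object once
        text.Append(all.Text);
        // Words(start, length) is 1-based, hence i + 1
        for (int i = 0; i < all.WordsCount; i++)
        {
            TextRange word = all.Words(i + 1, 1);
            string wordText = word.Text;           // the word's formatting is on word.Font
        }
    }
}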
Or is there a generally better way to extract text from a Publisher file? Whatever the approach, I need to be able to write the translated text back with the same formatting, because this is for translation.
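One way to avoid comparing formatting character by character at translation time is to collapse consecutive characters that share the same formatting into "runs" once, then translate and write back run by run. The sketch below shows only that grouping step in plain C#. `CharFormat`, `StyleRun` and `StyleRuns.Group` are hypothetical names, and it assumes you have already read one formatting snapshot per character (for example from each character's `Font` via the Interop `TextRange`), which is not shown.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical per-character formatting snapshot. In a real Publisher tool these
// values would be read from the COM Font object; that step is not shown here.
public record CharFormat(string FontName, float Size, bool Bold, bool Italic);

// A run: a maximal stretch of consecutive characters sharing one CharFormat.
public record StyleRun(int Start, string Text, CharFormat Format);

public static class StyleRuns
{
    // Walks the characters once and starts a new run whenever the format changes.
    public static List<StyleRun> Group(string text, IReadOnlyList<CharFormat> formats)
    {
        if (text.Length != formats.Count)
            throw new ArgumentException("Need exactly one format per character.");

        var runs = new List<StyleRun>();
        int runStart = 0;
        for (int i = 1; i <= text.Length; i++)
        {
            // Close the current run at the end of the text or when the format differs.
            // Records compare by value, so != checks every field.
            if (i == text.Length || formats[i] != formats[runStart])
            {
                runs.Add(new StyleRun(runStart, text.Substring(runStart, i - runStart), formats[runStart]));
                runStart = i;
            }
        }
        return runs;
    }
}
```

For "Hello world" where only "world" is bold, this yields two runs: "Hello " starting at 0 and "world" starting at 6. Note that translating run by run can break sentences apart, so you may prefer to translate whole paragraphs and then map the runs back onto the translated text.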