Can OpenOffice count words from the console?

I have a small problem: I need to count words from the console in doc, docx, pptx, ppt, xls, xlsx, odt, pdf and similar files. Please don't suggest | wc -w or grep: they work only on plain text or console output and only split on spaces, whereas Japanese, Chinese, Arabic, Hindi and Hebrew text uses different separators, so the word count comes out wrong. I tried counting with these:

    pdftotext file.pdf - | wc -w
    /usr/local/bin/docx2txt.pl < file.docx | wc -w
    /usr/local/bin/pptx2txt.pl < file.pptx | wc -w
    antiword file.doc - | wc -w
    antiword file.word - | wc -w

In some cases Microsoft Word or OpenOffice reports 1000 words, while these counters return 10 or 300 words when the language is Japanese, Chinese, Hindi, etc. With Latin-script text I have no problem: the largest error is, in some cases, a count that is 3 lower, which is OK.

I tried converting with soffice/openoffice first and then running wc -w on the result, but I can't even get the conversion to work:

 soffice --headless --nofirststartwizard --accept=socket,host=127.0.0.1,port=8100; --convert-to pdf some.pdf /var/www/domains/vocabridge.com/devel/temp_files/23/0/东京_1000_words_Docx.docx 

OR

  openoffice.org --headless --convert-to ........ 

OR

 openoffice.org3 --invisible 

So if anyone knows a way to correctly count words or display document statistics from the console, using OpenOffice or anything else on Linux, please share it.

Thanks.

+5
5 answers

I found a solution by creating a service that converts incoming files to PDF:

    #!/bin/sh
    #
    # chkconfig: 345 99 01
    #
    # description: your script is a test service
    #

    # Once a second, convert every file dropped into the "in" directory to PDF,
    # write the result to "out", then delete the original.
    (while sleep 1; do
        ls pathwithfiles/in | while read file; do
            libreoffice --headless --convert-to pdf "pathwithfiles/in/$file" --outdir pathwithfiles/out
            rm "pathwithfiles/in/$file"
        done
    done) &
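The conversion step inside that loop can also be sketched in Python. This is only an illustration under stated assumptions: the function names `build_convert_command`, `expected_output_path` and `convert` are hypothetical, the binary name `libreoffice` is taken from the script above, and the sketch assumes the converter writes `<input stem>.<target>` into the output directory.

```python
import subprocess
from pathlib import Path


def build_convert_command(src, out_dir, target="pdf", binary="libreoffice"):
    """Return the argument list for a one-shot headless conversion.

    The flags mirror the service script above: --headless, --convert-to
    and --outdir. Passing a list avoids shell quoting problems with
    non-ASCII or space-containing file names.
    """
    return [binary, "--headless", "--convert-to", target,
            "--outdir", str(out_dir), str(src)]


def expected_output_path(src, out_dir, target="pdf"):
    """Assumed output location: same base name, new extension, in out_dir."""
    return Path(out_dir) / (Path(src).stem + "." + target)


def convert(src, out_dir, target="pdf", timeout=120):
    """Run the conversion; raises CalledProcessError or TimeoutExpired on failure."""
    subprocess.run(build_convert_command(src, out_dir, target),
                   check=True, timeout=timeout)
    return expected_output_path(src, out_dir, target)
```

Calling `convert("pathwithfiles/in/report.docx", "pathwithfiles/out")` would block until the conversion finishes, which removes the need to poll for the output file.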

Then the PHP script that I needed does the actual counting:

    $ext = pathinfo($absolute_file_path, PATHINFO_EXTENSION);
    if ($ext !== 'txt' && $ext !== 'pdf') {
        // Convert to pdf: drop a copy into the service's "in" directory
        // and wait until the converted file shows up in "out".
        $tb = time() . mt_rand();
        $tempfile = 'locationofpdfs/in/' . $tb . '.' . $ext;
        copy($absolute_file_path, $tempfile);
        $absolute_file_path = 'locationofpdfs/out/' . $tb . '.pdf';
        $ext = 'pdf';
        while (!is_file($absolute_file_path)) sleep(1);
    }
    if ($ext !== 'txt') {
        // Convert to txt
        $tempfile = tempnam(sys_get_temp_dir(), '');
        shell_exec('pdftotext ' . escapeshellarg($absolute_file_path) . ' ' . escapeshellarg($tempfile));
        $absolute_file_path = $tempfile;
        $ext = 'txt';
    }
    if ($ext === 'txt') {
        // Separators: whitespace and common punctuation
        $seq = '/[\s\.,;:!\? ]+/mu';
        $plain = file_get_contents($absolute_file_path);
        // Drop {{{...}}} blocks before counting
        $plain = preg_replace('#\{{{.*?\}}}#su', "", $plain);
        $str = preg_replace($seq, '', $plain);
        $chars = count(preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY));
        $words = count(preg_split($seq, $plain, -1, PREG_SPLIT_NO_EMPTY));
        if ($words === 0) return $chars;
        // Very long average "word" length suggests text without spaces
        // (e.g. Chinese or Japanese), so count characters instead.
        if ($chars / $words > 10) $words = $chars;
        return $words;
    }
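For readers who do not use PHP, the counting heuristic in the last branch can be rendered in Python roughly as follows. This is a sketch of the same idea, not the answerer's code: the function name `heuristic_word_count` is hypothetical, and the separator class is limited to whitespace and the punctuation visible in the PHP pattern.

```python
import re

# Whitespace plus the punctuation listed in the PHP pattern above.
SEPARATORS = re.compile(r"[\s.,;:!?]+")
# {{{...}}} blocks are removed before counting, as in the PHP version.
BRACED_BLOCK = re.compile(r"\{\{\{.*?\}\}\}", re.S)


def heuristic_word_count(plain):
    """Count words; fall back to characters for text without separators."""
    plain = BRACED_BLOCK.sub("", plain)
    # Number of code points left once all separators are removed.
    chars = len(SEPARATORS.sub("", plain))
    # Number of non-empty pieces between separators.
    words = len([piece for piece in SEPARATORS.split(plain) if piece])
    if words == 0:
        return chars
    # More than 10 characters per "word" on average: assume the text is
    # written without spaces and count characters instead.
    if chars / words > 10:
        return chars
    return words
```

Note the threshold: a short unspaced string of 10 characters or fewer is still reported as one word, so the heuristic only switches to character counting on longer runs.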
+1

If you have Microsoft Word (and Windows, obviously), you can write a VBA macro, or if you want to run directly from the command line, you can write a VBScript script with something like the following:

    Set wordApp = CreateObject("Word.Application")
    Set doc = ... ' open up a Word document using wordApp
    docWordCount = doc.Words.Count
    ' Rinse and repeat...

If you have OpenOffice.org/LibreOffice, you have similar (but more) options. If you want to stay in the office application and run the macro, you can probably do it. I don't know the StarBasic API well enough to tell you, but I can give you an alternative: create a Python script to get the number of words from the command line. Roughly speaking, you do the following:
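The steps themselves are not included in the text above. As a rough sketch of one way such a script could look (this is not the answerer's script): it assumes an office instance is already listening, for example started with `soffice --headless --accept="socket,host=localhost,port=2002;urp;"`, that the script runs under a Python that ships the `uno` module, and that the loaded document is a text document exposing a `WordCount` property. The names `uno_url` and `count_words` are hypothetical.

```python
import os


def uno_url(host="localhost", port=2002):
    """Build the connection string for a listening office instance."""
    return "uno:socket,host=%s,port=%d;urp;StarOffice.ComponentContext" % (host, port)


def count_words(path, host="localhost", port=2002):
    """Load a document hidden, read its WordCount property, close it."""
    # Imported lazily: these modules only exist in the office suite's Python.
    import uno
    from com.sun.star.beans import PropertyValue

    local_ctx = uno.getComponentContext()
    resolver = local_ctx.ServiceManager.createInstanceWithContext(
        "com.sun.star.bridge.UnoUrlResolver", local_ctx)
    ctx = resolver.resolve(uno_url(host, port))
    desktop = ctx.ServiceManager.createInstanceWithContext(
        "com.sun.star.frame.Desktop", ctx)

    hidden = PropertyValue()
    hidden.Name = "Hidden"
    hidden.Value = True
    file_url = uno.systemPathToFileUrl(os.path.abspath(path))
    doc = desktop.loadComponentFromURL(file_url, "_blank", 0, (hidden,))
    try:
        # Assumption: only text documents expose WordCount.
        return doc.WordCount
    finally:
        doc.close(True)
```

Because the count comes from the office suite itself, it should match what the Word Count dialog shows for the same file, which is the number the question is trying to reproduce.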

+2

I think this does what you are after.

    # Continuously updating word count
    import unohelper, uno, os, time
    from com.sun.star.i18n.WordType import WORD_COUNT
    from com.sun.star.i18n import Boundary
    from com.sun.star.lang import Locale
    from com.sun.star.awt import XTopWindowListener

    #socket = True
    socket = False
    localContext = uno.getComponentContext()
    if socket:
        resolver = localContext.ServiceManager.createInstanceWithContext('com.sun.star.bridge.UnoUrlResolver', localContext)
        ctx = resolver.resolve('uno:socket,host=localhost,port=2002;urp;StarOffice.ComponentContext')
    else:
        ctx = localContext
    smgr = ctx.ServiceManager
    desktop = smgr.createInstanceWithContext('com.sun.star.frame.Desktop', ctx)
    waittime = 1 # seconds

    def getWordCountGoal():
        doc = XSCRIPTCONTEXT.getDocument()
        retval = 0
        # Only if the field exists
        if doc.getTextFieldMasters().hasByName('com.sun.star.text.FieldMaster.User.WordCountGoal'):
            # Get the field
            wordcountgoal = doc.getTextFieldMasters().getByName('com.sun.star.text.FieldMaster.User.WordCountGoal')
            retval = wordcountgoal.Content
        return retval

    goal = getWordCountGoal()

    def setWordCountGoal(goal):
        doc = XSCRIPTCONTEXT.getDocument()
        if doc.getTextFieldMasters().hasByName('com.sun.star.text.FieldMaster.User.WordCountGoal'):
            wordcountgoal = doc.getTextFieldMasters().getByName('com.sun.star.text.FieldMaster.User.WordCountGoal')
            wordcountgoal.Content = goal
        # Refresh the field if inserted in the document from Insert > Fields >
        # Other... > Variables > Userdefined fields
        doc.TextFields.refresh()

    def printOut(txt):
        if socket:
            print(txt)
        else:
            model = desktop.getCurrentComponent()
            text = model.Text
            cursor = text.createTextCursorByRange(text.getEnd())
            text.insertString(cursor, txt + '\r', 0)

    def hotCount(st):
        '''Counts the number of words in a string.

        ARGUMENTS:
        str st: count the number of words in this string

        RETURNS:
        int: the number of words in st'''
        startpos = 0
        nextwd = Boundary()
        lc = Locale()
        lc.Language = 'en'
        numwords = 1
        mystartpos = 1
        brk = smgr.createInstanceWithContext('com.sun.star.i18n.BreakIterator', ctx)
        nextwd = brk.nextWord(st, startpos, lc, WORD_COUNT)
        while nextwd.startPos != nextwd.endPos:
            numwords += 1
            nw = nextwd.startPos
            nextwd = brk.nextWord(st, nw, lc, WORD_COUNT)
        return numwords

    def updateCount(wordCountModel, percentModel):
        '''Updates the GUI.

        Updates the word count and the percentage completed in the GUI. If some
        text of more than one word is selected (including in multiple selections
        by holding down the Ctrl/Cmd key), it updates the GUI based on the
        selection; if not, on the whole document.'''
        model = desktop.getCurrentComponent()
        try:
            if not model.supportsService('com.sun.star.text.TextDocument'):
                return
        except AttributeError:
            return
        sel = model.getCurrentSelection()
        try:
            selcount = sel.getCount()
        except AttributeError:
            return
        if selcount == 1 and sel.getByIndex(0).getString() == '':
            selcount = 0
        selwords = 0
        for nsel in range(selcount):
            thisrange = sel.getByIndex(nsel)
            atext = thisrange.getString()
            selwords += hotCount(atext)
        if selwords > 1:
            wc = selwords
        else:
            try:
                wc = model.WordCount
            except AttributeError:
                return
        wordCountModel.Label = str(wc)
        if goal != 0:
            pc_text = 100 * (wc / float(goal))
            #pc_text = '(%.2f percent)' % (100 * (wc / float(goal)))
            percentModel.ProgressValue = pc_text
        else:
            percentModel.ProgressValue = 0

    # This is the user interface bit. It looks more or less like this:
    ###############################
    # Word Count             _ ox #
    ###############################
    #            _____            #
    #     451 /  |500|            #
    #            -----            #
    # ___________________________ #
    # |##############           | #
    # --------------------------- #
    ###############################
    # The boxed `500' is the text entry box.

    class WindowClosingListener(unohelper.Base, XTopWindowListener):
        def __init__(self):
            global keepGoing
            keepGoing = True

        def windowClosing(self, e):
            global keepGoing
            keepGoing = False
            setWordCountGoal(goal)
            e.Source.setVisible(False)

    def addControl(controlType, dlgModel, x, y, width, height, label, name = None):
        control = dlgModel.createInstance(controlType)
        control.PositionX = x
        control.PositionY = y
        control.Width = width
        control.Height = height
        if controlType == 'com.sun.star.awt.UnoControlFixedTextModel':
            control.Label = label
        elif controlType == 'com.sun.star.awt.UnoControlEditModel':
            control.Text = label
        elif controlType == 'com.sun.star.awt.UnoControlProgressBarModel':
            control.ProgressValue = label
        if name:
            control.Name = name
            dlgModel.insertByName(name, control)
        else:
            control.Name = 'unnamed'
            dlgModel.insertByName('unnamed', control)
        return control

    def loopTheLoop(goalModel, wordCountModel, percentModel):
        global goal
        while keepGoing:
            try:
                goal = int(goalModel.Text)
            except:
                goal = 0
            updateCount(wordCountModel, percentModel)
            time.sleep(waittime)

    if not socket:
        import threading
        class UpdaterThread(threading.Thread):
            def __init__(self, goalModel, wordCountModel, percentModel):
                threading.Thread.__init__(self)
                self.goalModel = goalModel
                self.wordCountModel = wordCountModel
                self.percentModel = percentModel

            def run(self):
                loopTheLoop(self.goalModel, self.wordCountModel, self.percentModel)

    def wordCount(arg = None):
        '''Displays a continuously updating word count.'''
        dialogModel = smgr.createInstanceWithContext('com.sun.star.awt.UnoControlDialogModel', ctx)
        dialogModel.PositionX = XSCRIPTCONTEXT.getDocument().CurrentController.Frame.ContainerWindow.PosSize.Width / 2.2 - 105
        dialogModel.Width = 100
        dialogModel.Height = 30
        dialogModel.Title = 'Word Count'
        lblWc = addControl('com.sun.star.awt.UnoControlFixedTextModel', dialogModel, 6, 2, 25, 14, '', 'lblWc')
        lblWc.Align = 2 # Align right
        addControl('com.sun.star.awt.UnoControlFixedTextModel', dialogModel, 33, 2, 10, 14, ' / ')
        txtGoal = addControl('com.sun.star.awt.UnoControlEditModel', dialogModel, 45, 1, 25, 12, '', 'txtGoal')
        txtGoal.Text = goal
        #addControl('com.sun.star.awt.UnoControlFixedTextModel', dialogModel, 6, 25, 50, 14, '(percent)', 'lblPercent')
        ProgressBar = addControl('com.sun.star.awt.UnoControlProgressBarModel', dialogModel, 6, 15, 88, 10, '', 'lblPercent')
        ProgressBar.ProgressValueMin = 0
        ProgressBar.ProgressValueMax = 100
        #ProgressBar.Border = 2
        #ProgressBar.BorderColor = 255
        #ProgressBar.FillColor = 255
        #ProgressBar.BackgroundColor = 255
        addControl('com.sun.star.awt.UnoControlFixedTextModel', dialogModel, 124, 2, 12, 14, '', 'lblMinus')
        controlContainer = smgr.createInstanceWithContext('com.sun.star.awt.UnoControlDialog', ctx)
        controlContainer.setModel(dialogModel)
        controlContainer.addTopWindowListener(WindowClosingListener())
        controlContainer.setVisible(True)
        goalModel = controlContainer.getControl('txtGoal').getModel()
        wordCountModel = controlContainer.getControl('lblWc').getModel()
        percentModel = controlContainer.getControl('lblPercent').getModel()
        ProgressBar.ProgressValue = percentModel.ProgressValue
        if socket:
            loopTheLoop(goalModel, wordCountModel, percentModel)
        else:
            uthread = UpdaterThread(goalModel, wordCountModel, percentModel)
            uthread.start()

    keepGoing = True
    if socket:
        wordCount()
    else:
        g_exportedScripts = wordCount,

Link for more information

https://superuser.com/questions/529446/running-word-count-in-openoffice-writer

Hope this helps.

EDIT: I later found this as well:

http://forum.openoffice.org/en/forum/viewtopic.php?f=7&t=22555

0

wc understands Unicode and uses the system iswspace function to decide whether a Unicode character is whitespace. "The iswspace() function tests whether wc is a wide-character code representing a character of class space in the program's current locale." (In that quotation, wc is the function's argument, not the utility.) Thus, wc -w should be able to count words correctly if your locale (LC_CTYPE) is configured correctly.

wc source code

Manual page for iswspace function
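One caveat worth checking: counting by whitespace, however locale-aware, only separates words where the text actually contains a space character. The following Python sketch illustrates this. It is not wc itself: `str.split()` stands in for iswspace-based splitting, the function names are hypothetical, and the character ranges (Hiragana, Katakana and the main CJK ideograph blocks) are an assumption chosen for illustration, not an exhaustive list.

```python
import re

# Hiragana + Katakana (U+3040-U+30FF) and two CJK ideograph blocks.
# Assumption for illustration only; other scripts are not covered.
CJK_CHAR = re.compile(r"[\u3040-\u30ff\u3400-\u4dbf\u4e00-\u9fff]")


def whitespace_word_count(text):
    """Count runs separated by Unicode whitespace, similar in spirit to wc -w."""
    return len(text.split())


def approx_word_count(text):
    """Count each CJK character as one word and other text by whitespace."""
    cjk = len(CJK_CHAR.findall(text))
    # Replace CJK characters with spaces so they do not glue Latin words together.
    rest = CJK_CHAR.sub(" ", text)
    # Ignore tokens that are pure punctuation.
    other = sum(1 for tok in rest.split() if any(ch.isalnum() for ch in tok))
    return cjk + other
```

The ideographic space U+3000 is recognised as whitespace, so `whitespace_word_count("東京\u3000大阪")` gives 2. An unspaced sentence such as `"東京は日本の首都です"` gives 1 with whitespace splitting and 10 with the per-character approximation, which is closer to the kind of figure the question reports from office suites.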

0

Building on what @Yawar wrote, here are more explicit steps for counting words using MS Word from the console.

I also use the more accurate Range.ComputeStatistics(wdStatisticWords) value instead of the Words property. See here for more information: https://support.microsoft.com/en-za/help/291447/word-count-appears-inaccurate-when-you-use-the-vba-words-property

  1. Create a script called wc.vbs and paste the following into it:

     ' VBScript does not know Word's named constants, so define the one used below
     Const wdStatisticWords = 0
     Set word = CreateObject("Word.Application")
     word.Visible = False
     Set doc = word.Documents.Open("<replace with absolute path to your .docx/.pdf>")
     docWordCount = doc.Range.ComputeStatistics(wdStatisticWords)
     word.Quit
     Dim StdOut : Set StdOut = CreateObject("Scripting.FileSystemObject").GetStandardStream(1)
     WScript.Echo docWordCount & " words"
  2. Open PowerShell in the directory where you saved wc.vbs and run cscript .\wc.vbs; it will print the number of words :)

0
