I am using Apache POI to manipulate Microsoft Word .docx files, i.e. open a document that was originally created in Microsoft Word, modify it, and save it as a new document.
I notice that new paragraphs created by Apache POI do not have a revision save identifier, usually referred to as an RSID or rsidR. Word uses RSIDs to identify the changes made to a document during one editing session, i.e. between two saves. RSIDs are not required (users can turn them off in Microsoft Word if they want), but in practice almost everyone leaves them on, so almost every document is full of RSIDs. Read this excellent RSID explanation for more information.
In a document created by Microsoft Word, word/document.xml contains paragraphs like this:
<w:p w:rsidR="007809A1" w:rsidRDefault="007809A1" w:rsidP="00191825">
  <w:r>
    <w:t>Paragraph of text here.</w:t>
  </w:r>
</w:p>
However, the same paragraph created by POI looks like this in word/document.xml:
<w:p>
  <w:r>
    <w:t>Paragraph of text here.</w:t>
  </w:r>
</w:p>
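To check which paragraphs carry an RSID, fragments like the two above can be inspected with the JDK's built-in DOM parser. This is a minimal sketch under a few assumptions: each fragment is parsed on its own, so the w: namespace declaration (which normally sits on the root w:document element) is added by hand, and the class name RsidCheck is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

public class RsidCheck {
    // WordprocessingML main namespace bound to the w: prefix
    static final String W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Returns the w:rsidR value of a single w:p fragment, or null if it has none. */
    static String rsidR(String paragraphXml) {
        // The fragment lacks a namespace declaration, so add one to the w:p element
        String xml = paragraphXml.replaceFirst("<w:p", "<w:p xmlns:w=\"" + W_NS + "\"");
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Element p = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            // getAttributeNS returns "" rather than null when the attribute is absent
            String value = p.getAttributeNS(W_NS, "rsidR");
            return value.isEmpty() ? null : value;
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a well-formed w:p fragment", e);
        }
    }

    public static void main(String[] args) {
        String fromWord = "<w:p w:rsidR=\"007809A1\" w:rsidRDefault=\"007809A1\" w:rsidP=\"00191825\">"
                + "<w:r><w:t>Paragraph of text here.</w:t></w:r></w:p>";
        String fromPoi = "<w:p><w:r><w:t>Paragraph of text here.</w:t></w:r></w:p>";
        System.out.println("Word: " + rsidR(fromWord)); // Word: 007809A1
        System.out.println("POI:  " + rsidR(fromPoi));  // POI:  null
    }
}
```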
I found that I can make POI add an RSID to each paragraph with this code:
byte[] rsid = ???;
XWPFParagraph paragraph = document.createParagraph();
paragraph.getCTP().setRsidR(rsid);
paragraph.getCTP().setRsidRDefault(rsid);
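Since setRsidR takes a byte[] while document.xml shows the value as eight hex digits, a small conversion helper is handy. This sketch assumes, based on values like 007809A1 in the sample above, that an RSID is four bytes written as eight upper-case hex digits; the class and method names are hypothetical.

```java
public class RsidHex {
    /** Converts an 8-digit hex RSID such as "007809A1" into 4 bytes. */
    static byte[] toBytes(String hex) {
        if (!hex.matches("[0-9A-Fa-f]{8}")) {
            throw new IllegalArgumentException("RSID must be 8 hex digits: " + hex);
        }
        byte[] out = new byte[4];
        for (int i = 0; i < 4; i++) {
            // Each pair of hex digits is one byte, most significant byte first
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    /** Converts 4 bytes back to the upper-case hex form seen in document.xml. */
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            // Mask to 0..255 so negative bytes format as two hex digits
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] rsid = toBytes("007809A1");
        System.out.println(rsid.length);  // 4
        System.out.println(toHex(rsid));  // 007809A1
    }
}
```

The resulting array is what would be passed as rsid in the snippet above.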
However, I do not know how I should generate the RSID.
Does POI have a way to generate and/or track RSIDs? If not, is there a way to guarantee that an RSID I create does not conflict with the RSIDs already in the document?
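One possible workaround, not a POI feature: scan the raw XML for every RSID already present, then draw random 32-bit values until one is unused. This sketch assumes RSIDs are random 8-digit hex values, and that the caller supplies the XML text of word/document.xml and word/settings.xml (Word also lists the document's RSIDs there, in a w:rsids element). The regex approach, class name, and method names are all hypothetical, and the result is only collision-free with respect to the XML that was actually scanned.

```java
import java.security.SecureRandom;
import java.util.HashSet;
import java.util.Locale;
import java.util.Random;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RsidGenerator {
    // Matches attributes such as w:rsidR="..." and w:rsidRDefault="..." in document.xml,
    // and <w:rsidRoot w:val="..."/> or <w:rsid w:val="..."/> entries in settings.xml
    private static final Pattern RSID =
            Pattern.compile("(?:w:rsid\\w*=|<w:rsid(?:Root)?\\s+w:val=)\"([0-9A-Fa-f]{8})\"");

    /** Collects every RSID found in the given XML strings, normalised to upper case. */
    static Set<String> collectRsids(String... xmlParts) {
        Set<String> found = new HashSet<>();
        for (String xml : xmlParts) {
            Matcher m = RSID.matcher(xml);
            while (m.find()) {
                found.add(m.group(1).toUpperCase(Locale.ROOT));
            }
        }
        return found;
    }

    /** Returns a random RSID not in existing, and adds it to existing to reserve it. */
    static String newRsid(Set<String> existing, Random random) {
        while (true) {
            // %08X prints the int as 8 hex digits (two's complement for negatives)
            String candidate = String.format("%08X", random.nextInt());
            if (existing.add(candidate)) {
                return candidate;
            }
        }
    }

    public static void main(String[] args) {
        String documentXml = "<w:p w:rsidR=\"007809A1\" w:rsidRDefault=\"007809A1\" w:rsidP=\"00191825\">"
                + "<w:r><w:t>Paragraph of text here.</w:t></w:r></w:p>";
        Set<String> existing = collectRsids(documentXml);
        System.out.println(existing.size()); // 2
        String rsid = newRsid(existing, new SecureRandom());
        System.out.println("New RSID: " + rsid);
    }
}
```

Because an RSID identifies an editing session, presumably one new value would be generated per run and reused for every paragraph added in that run, rather than one per paragraph.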
java docx apache-poi
gutch