Extract OLE object (pdf) from Access DB

Question


We are updating / converting several old Access databases into MS-SQL. Many of these databases have OLE Object fields that store PDF files. I am looking for a way to extract these files and save them in our SQL database. I saw similar questions that answer how you can do this with image files (jpg, bmp, gif, etc.), but I have not found a way that works with PDF.

+5

c# ms-access oledb

Nate Jun 22 '09 at 20:54


2 answers

OLEtoDisk

"This version saves the entire contents of a table containing OLE objects to disk. It does NOT require the source application that served as the OLE server to insert the object. It supports all MS Office documents, PDF, and all images inserted by MS Photo Editor, MS Paint and Paint Shop Pro. It also supports extraction of the PACKAGE class, including the original file name. It contains a function to create a complete inventory of the OLE field, including the LINKED path and file names. It uses the structured storage API to read the actual contents of the field."

http://lebans.com/oletodisk.htm

+1

Tony Toews Jun 23 '09 at 1:10


Nate · Accepted Answer · 2009-06-23T18:34:21+0000

I finally got code that does what I want. The trick is to determine which part of the field is the OLE header and remove it. Here's what works for me (based on the code found here):

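// Requires: using System; using System.Text;
public static byte[] StripOleHeader(byte[] fileData)
{
    // Every PDF starts with "%PDF-" followed by a version number, so match
    // only the prefix instead of hard-coding "%PDF-1.3".
    byte[] marker = Encoding.ASCII.GetBytes("%PDF-");

    // Search the raw bytes. Decoding to a string first (e.g. with UTF-7)
    // can make character offsets differ from byte offsets.
    int startPos = -1;
    for (int i = 0; i <= fileData.Length - marker.Length && startPos == -1; i++)
    {
        int j = 0;
        while (j < marker.Length && fileData[i + j] == marker[j]) j++;
        if (j == marker.Length) startPos = i;
    }

    if (startPos == -1)
    {
        throw new Exception("Could not find PDF Header");
    }

    // Copy everything from the PDF header onward, dropping the OLE header.
    byte[] retByte = new byte[fileData.Length - startPos];
    Array.Copy(fileData, startPos, retByte, 0, retByte.Length);
    return retByte;
}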

Please note that this only works for PDF files.
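For anyone wiring this into a migration, here is a rough sketch of how the stripped bytes might be written out row by row. It is not part of the code I actually used: the class name OlePdfExporter, the method ExportPdfs, and the idColumn / oleColumn parameters are all made up for illustration, and it assumes you already have an IDataReader over the Access table (for example from OleDbCommand.ExecuteReader in System.Data.OleDb).

```csharp
using System;
using System.Data;
using System.IO;

public static class OlePdfExporter
{
    // Writes one .pdf file per row and returns the number of files written.
    // idColumn and oleColumn are hypothetical column names supplied by the caller;
    // stripHeader is whatever function removes the OLE header (e.g. StripOleHeader).
    public static int ExportPdfs(IDataReader reader, string idColumn, string oleColumn,
                                 string outputDir, Func<byte[], byte[]> stripHeader)
    {
        Directory.CreateDirectory(outputDir);
        int written = 0;

        while (reader.Read())
        {
            // Skip rows where the OLE field is NULL (DBNull) or not binary data.
            byte[] raw = reader[oleColumn] as byte[];
            if (raw == null) continue;

            byte[] pdf = stripHeader(raw);

            // Name the file after the row's id value, e.g. "42.pdf".
            string path = Path.Combine(outputDir, reader[idColumn] + ".pdf");
            File.WriteAllBytes(path, pdf);
            written++;
        }

        return written;
    }
}
```

You would call it with something like ExportPdfs(reader, "ID", "Document", outputDir, StripOleHeader), where "ID" and "Document" stand in for your own column names. Passing the header-stripping function as a parameter keeps the loop independent of the file type, and the same byte array could be sent to a binary column in SQL Server instead of being written to disk.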
