How to search for text in a PDF document with Quartz

I use Quartz to display a PDF. I need the indexes of the pages that contain my search text. Can anybody help? Thanks.

Solution: Here is sample code that extracts the text from a page and checks whether it contains the search string.

```objc
// PDFSearcher.h
#import <Foundation/Foundation.h>
#import <CoreGraphics/CoreGraphics.h>

@interface PDFSearcher : NSObject {
    CGPDFOperatorTableRef table;
    NSMutableString *currentData;
}
@property (nonatomic, retain) NSMutableString *currentData;
- (id)init;
- (BOOL)page:(CGPDFPageRef)inPage containsString:(NSString *)inSearchString;
@end

// PDFSearcher.m (manual retain/release, i.e. compiled without ARC)
#import "PDFSearcher.h"

@implementation PDFSearcher
@synthesize currentData;

// Appends the Unicode text of a PDF string object to the searcher's buffer.
static void appendPDFString(PDFSearcher *searcher, CGPDFStringRef string) {
    NSString *data = (NSString *)CGPDFStringCopyTextString(string);
    if (data) {
        [searcher.currentData appendString:data];
        [data release];
    }
}

// "TJ": the operand is an array mixing strings and numeric kerning values.
static void arrayCallback(CGPDFScannerRef inScanner, void *userInfo) {
    PDFSearcher *searcher = (PDFSearcher *)userInfo;
    CGPDFArrayRef array;
    if (!CGPDFScannerPopArray(inScanner, &array)) return;
    size_t count = CGPDFArrayGetCount(array);
    // Visit every element; CGPDFArrayGetString fails for the numbers.
    for (size_t n = 0; n < count; n++) {
        CGPDFStringRef string;
        if (CGPDFArrayGetString(array, n, &string)) {
            appendPDFString(searcher, string);
        }
    }
}

// "Tj", "'" and "\"": the string to show is the topmost operand.
static void stringCallback(CGPDFScannerRef inScanner, void *userInfo) {
    PDFSearcher *searcher = (PDFSearcher *)userInfo;
    CGPDFStringRef string;
    if (CGPDFScannerPopString(inScanner, &string)) {
        appendPDFString(searcher, string);
    }
}

- (id)init {
    if ((self = [super init])) {
        table = CGPDFOperatorTableCreate();
        CGPDFOperatorTableSetCallback(table, "TJ", arrayCallback);
        CGPDFOperatorTableSetCallback(table, "Tj", stringCallback);
        CGPDFOperatorTableSetCallback(table, "'", stringCallback);
        CGPDFOperatorTableSetCallback(table, "\"", stringCallback);
    }
    return self;
}

- (void)dealloc {
    CGPDFOperatorTableRelease(table);
    [currentData release];
    [super dealloc];
}

- (BOOL)page:(CGPDFPageRef)inPage containsString:(NSString *)inSearchString {
    [self setCurrentData:[NSMutableString string]];
    CGPDFContentStreamRef contentStream = CGPDFContentStreamCreateWithPage(inPage);
    CGPDFScannerRef scanner = CGPDFScannerCreate(contentStream, table, self);
    CGPDFScannerScan(scanner);
    CGPDFScannerRelease(scanner);
    CGPDFContentStreamRelease(contentStream);
    // Case-insensitive match against all text collected from the page.
    return [self.currentData rangeOfString:inSearchString
                                   options:NSCaseInsensitiveSearch].location != NSNotFound;
}
@end
```

Use CGPDFDocument, CGPDFPage and CGPDFScanner to scan each page's content and collect its text into an NSString. Then use NSString's search methods (such as rangeOfString:options:) to look for the text on that page. If it is found, store the corresponding page number in an array. Repeat this scan-and-search cycle for every page in the PDF.

Quartz has nothing built in for this. Quartz is designed for drawing; it neither knows nor cares about searching a PDF for string matches. You will need to use the Core Graphics PDF parsing functions (CGPDFScanner and related types) to pull out the text, search it for the string yourself, and then record the page on which it occurs.

If you use PDFDocument (from PDFKit) instead of CGPDFDocument, that API has text search methods such as findString:withOptions:.
