This is my first OpenCV program, so forgive me if I'm missing some basic computer vision concepts.
UPDATE: See the new code / new problem below, following sturkmen's answer.
As a project, I am working on "digitizing" a large set of images like the one attached. All images come from the same source. The ultimate goal is to pass the extracted text sections to Tesseract, the OCR library.
(Source code below.) I'll explain my current approach and then pose my questions.
My current approach is as follows:
Apply an inverse binary threshold
Dilate the image and find contours
Create a boundingRect from each contour, then filter by minimum and maximum size
This is working fine.

My desired end result is one boundingRect around each column; for the image provided, that would be seven of them.
The problem is that the tabular "mini-sections" in the image are not reliably detected (the best example is the one in the rightmost column, which does not get a boundingRect around it).
I can think of two possible solutions (so that this isn't an open-ended / opinion-based question), but if you know of a more efficient one, please share it!
1) Merge boundingRects that are vertical neighbors in order to capture whole columns. This could have problems with overlap along the edges of adjacent columns.
2) Find another way to manipulate the image before searching for contours. From my research, the run-length smoothing algorithm (RLSA) looks promising, though I have not tried it yet; a rough sketch of what I mean is below.
So my question is: which approach is best? Am I missing a better solution? I'm inexperienced in this area, so no suggestion is too basic.
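To make option 2 concrete, here is a minimal, untested sketch of the kind of preprocessing I have in mind. It only approximates run-length smoothing with a morphological close using a tall, narrow kernel, so that the lines of a column smear together before findContours runs; the file name "scan.png", the kernel height (51) and the size filter are my own guesses and would need tuning.

#include "opencv2/core.hpp"
#include "opencv2/imgproc.hpp"
#include "opencv2/imgcodecs.hpp"
#include <vector>
using namespace cv;
using namespace std;

int main()
{
    Mat image = imread("scan.png");   // hypothetical path
    Mat gray, bw;
    cvtColor(image, gray, COLOR_BGR2GRAY);
    threshold(gray, bw, 160, 255, THRESH_BINARY_INV);

    // Approximate vertical run-length smoothing with a morphological close:
    // white gaps shorter than the kernel height get filled, so the lines of a
    // column merge into one solid blob before contour detection.
    Mat smearKernel = getStructuringElement(MORPH_RECT, Size(1, 51)); // guessed size
    Mat smeared;
    morphologyEx(bw, smeared, MORPH_CLOSE, smearKernel);

    vector<vector<Point>> contours;
    findContours(smeared, contours, RETR_EXTERNAL, CHAIN_APPROX_SIMPLE);

    for (const auto& c : contours)
    {
        Rect r = boundingRect(c);
        if (r.height < 100 || r.width < 500) continue;   // same size filter as before
        rectangle(image, r, Scalar(255, 0, 255), 2);
    }
    imwrite("smeared_test.png", image);
    return 0;
}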
Thanks for reading!
#include "opencv2/core.hpp" #include "opencv2/highgui.hpp" #include "opencv2/imgproc.hpp" #include <iostream> #include <fstream> #include <sstream> #include <vector> using namespace cv; using namespace std; int main(int argc, char* argv[]) { Mat image = imread(path_to_file); Mat gray; cvtColor(image, gray, COLOR_BGR2GRAY); Mat fin; double thresh = threshold(gray, fin, 160, 255, THRESH_BINARY_INV); //size impacts dilation Mat kernel = getStructuringElement(MORPH_CROSS, Size(2, 4)); Mat dilated; dilate(fin, dilated, kernel, Point(-1,-1), 6); imwrite("testbw.png",dilated); Mat hierarchy; vector<vector<Point> >contours; findContours(dilated, contours, hierarchy, CV_RETR_TREE, CV_CHAIN_APPROX_NONE); //potentially sort by x for (const auto& c : contours) { // xy //columns 850 x 5400 Rect r = boundingRect(c); if (r.height > 3000 || r.width > 875) continue; if (r.height < 100 || r.width < 500) continue; rectangle(image, r, Scalar(255, 0, 255), 2); //made thicker } imwrite("test.png", image); waitKey(0); return 0;
}
Original Image: 
Updated Code
int main(int argc, char* argv[])
{
    Mat image = imread(path_to_file);
    Mat gray;
    cvtColor(image, gray, COLOR_BGR2GRAY);

    Mat fin;
    double thresh = threshold(gray, fin, 160, 255, THRESH_BINARY_INV);

    Mat kernel = getStructuringElement(MORPH_CROSS, Size(2, 4));
    Mat dilated;
    dilate(fin, dilated, kernel, Point(-1, -1), 6);

    vector<Vec4i> hierarchy;
    vector<vector<Point>> contours;
    findContours(dilated, contours, hierarchy, RETR_TREE, CHAIN_APPROX_NONE);

    vector<Rect> rects;
    Rect big_rect = Rect(image.cols / 2, image.rows / 2, 1, 1);
    for (const auto& c : contours) {
New result: 
New problem: there are many boundingRects around each column (you probably can't tell by looking at the picture). This is a problem because I want to take a sub-image of each column, e.g. Mat ROI = image(rects[i]), which would yield many more than the desired 7 images.
New question: how can I combine the many rectangles in each column into one? I looked at OpenCV's groupRectangles, but I could not get it to work. A rough sketch of the kind of merging I am after is below.
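To illustrate what I mean, here is a rough, untested sketch of per-column merging. The mergeByColumn helper and its "same column = overlapping x-range" test are my own assumptions, not something from OpenCV; it simply unions cv::Rects whose horizontal ranges overlap.

#include "opencv2/core.hpp"
#include <algorithm>
#include <vector>
using namespace cv;
using namespace std;

// Hypothetical helper: merge rects whose horizontal ranges overlap, so the
// fragments detected within one column collapse into a single Rect.
vector<Rect> mergeByColumn(vector<Rect> rects)
{
    // sort left-to-right so rects from the same column end up adjacent
    sort(rects.begin(), rects.end(),
         [](const Rect& a, const Rect& b) { return a.x < b.x; });

    vector<Rect> merged;
    for (const Rect& r : rects)
    {
        bool absorbed = false;
        for (Rect& m : merged)
        {
            // overlap in x is treated as "same column" (an assumption)
            int left  = max(m.x, r.x);
            int right = min(m.x + m.width, r.x + r.width);
            if (right - left > 0)
            {
                m = m | r;   // cv::Rect union
                absorbed = true;
                break;
            }
        }
        if (!absorbed) merged.push_back(r);
    }
    // a second pass may be needed if merged columns start to overlap each other
    return merged;
}

The idea would be to call vector<Rect> columns = mergeByColumn(rects); after the contour loop, then crop each column with Mat ROI = image(columns[i]) and pass it to Tesseract, but I don't know if this is the right approach.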