Algorithmic guidelines for breaking an array of strings in the most balanced way

I have an array of strings of arbitrary length (say 30-45) that I want to reformat to fit on a certain number of pages (e.g. 15).

I would like to distribute the lines between the pages as evenly as possible, so that all pages are as close as possible to each other with the full length of characters, regardless of the total number of lines on the page. I also need to keep the string order (so I cannot change the order of the array).

Are there any specific algorithms that you recommend to solve this problem? Or vague approaches that you take? Thanks!

+4
source share
4 answers

One way is to format the text using http://en.wikipedia.org/wiki/TeX - the line break algorithm is optimal and is based on dynamic programming. Unfortunately, its pagination algorithm is not optimal, although I expect it to be as good as you easily find.

If you can model each page as having a place for a fixed number of characters, then there really is a solution for dynamic programming. You need to find a way to put 14 page breaks in an optimal place. Work from left to right and in every possible place for page breaks work out a general uneven punishment for the best way to insert page breaks k-1 in the previous text, ending with a possible place for the kth page break, Do it with k = 1..14. You can use the information on the left that you previously calculated when developing a general fine in a new place.

When you get to the end of the text, you can use the calculations so far to work out a penalty for unevenness, in the best way to insert 14 page breaks to the left. If you saved the calculation records on the left, you can also decide where the right-most of the 14 page breaks should lie. You can go back and work where there should be a gap of the 13th page, and so on, until you find the place of all the page breaks.

+2
source

I would come up, these are two stages, first build an approximate solution, and then improve this solution.

First, scroll through the list of lines and assign each line in turn to the page with the largest remaining space. You might want to check if there is enough space to redistribute the last lines of a page to earlier pages, so reduce the number of pages needed.

Secondly, select the pages with the largest and least free space. Change the shorter line with one with the longer line from the other so that the space remaining on the two pages is closer. Repeat (make sure you don't end up in an endless loop) until you have enough reasonable balance on all pages.

This is an approximate solution, not an exact one, but it should be able to get a reasonable result pretty quickly.

+1
source

Isn't it as simple as dividing common characters into common pages and adding sentences until you get close to the target characters on the page? In the end, you will get an offer that will cover the pages if it can break in the middle. If most of this sentence matches the current page, place it and then set it aside for the next page.

chars_left = 0 chars_per_page = total_chars / total_pages for i = 0 .. total_pages chars_left += chars_per_page while (chars_left > 0) s = get_next_sentence if s.length/2 > chars_left then break page.add( s) chars_left -= s.length endwhile endfor 
0
source

You can use the algorithm to generate Euclidean rhythms . Euclidean rhythms are rhythms that are distributed as evenly as possible over several beats. So, if you have four strokes that you want to spread out into 10 positions, you will receive:

 ..xx.xx 

Now, if you have 10 lines and you want to split them into four pages, you would simply add a page break after each line marked with an x:

 string1 string2 string3 string4 string5 string6 string7 string8 string9 string10 

This way you achieve an almost constant number of lines per page, and shorter pages are also evenly distributed among all pages.

The algorithm is quite simple and based on the Euclidean algorithm for calculating gcd and can be implemented in several lines. Also, it should be quite fast even with a large number of pages and elements.

0
source

All Articles