How to find where the browser breaks a paragraph of text

Question

I need to add line breaks at the positions where the browser naturally wraps a line of text.

For instance:

This is a very long text \n that spans several lines in a paragraph.

This is a paragraph that the browser chose to break at the position of the \n.

I need to find this position and insert a BR.

Does anyone know of any JS libraries or functions that can do this?

The only solution I have found so far is to remove words from the paragraph one at a time and watch the clientHeight property to detect a change in the element's height. I don't have time to finish this and would like to find something that has already been tested.
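The height-watching idea can be sketched roughly as follows. This is a minimal sketch, not the asker's actual code: it adds words one at a time instead of removing them, and `measureHeight` is a hypothetical callback that returns the rendered height of a piece of text. In a browser it might set `textContent` on a hidden probe element styled like the paragraph and read `clientHeight`.

```javascript
// Returns the indices of the words that start a new line (the first word is not reported).
// measureHeight(text) is an assumed callback: it must return the rendered height of `text`.
function findBreaksByHeight(words, measureHeight) {
  const breaks = [];
  let previousHeight = null;
  for (let i = 0; i < words.length; i++) {
    // Measure the paragraph as it would look with the first i + 1 words.
    const height = measureHeight(words.slice(0, i + 1).join(' '));
    // A jump in height means word i did not fit and wrapped onto a new line.
    if (previousHeight !== null && height > previousHeight) {
      breaks.push(i);
    }
    previousHeight = height;
  }
  return breaks;
}

// Possible browser-side callback (probe is a hypothetical element styled like the paragraph):
// const measure = (text) => { probe.textContent = text; return probe.clientHeight; };
```

This re-measures the text once per word, so it triggers many layouts on long paragraphs, which is one reason a coordinate-based approach may be preferable.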

Edit: The reason I need to do this is that I need to convert HTML to PDF accurately. Acrobat renders text narrower than a browser does, so the text wraps at different positions. I need the same ragged edge and the same number of lines in the converted PDF.

Edit:

@dtsazza: Thanks for your reply. It is not impossible to create a layout editor that reproduces HTML almost exactly; I have written 99% of one ;)

The application I'm working on allows the user to create a product catalog by dragging and dropping tiles. Tiles are fixed-width, absolutely positioned divs containing images and text. All elements are styled so that the font size is fixed. My solution for finding the \n in a paragraph works about 80% of the time, and when it works for a given paragraph, the resulting PDF is so close to the on-screen version that the differences don't matter. Items have the same height (in pixels), images are replaced with high-resolution versions, and all bitmaps are replaced with SVGs generated on the server side.

The only slight difference between my HTML and the PDF is that Acrobat renders the text a little narrower, which results in a slightly shorter line length.

Diodeus's solution of adding spans and finding their coordinates is very good and should give me the location of the BR. Remember that the user will never see the HTML with BRs inserted - they are added only so that the PDF conversion produces a paragraph of exactly the same size.

There are many people who seem to think that this is impossible. I already have a working application that creates an extremely accurate HTML -> PDF conversion of our documents. I just need a better way to add the BRs, because my solution sometimes misses one. BTW, when it works, my paragraphs are the same height as the HTML equivalents that we produce.

If anyone is interested in the kind of documents I am converting, you can check this screen:

http://www.localsa.com.au/brochure/brochure.html

Edit: Thank you very much, Diodeus - your suggestion was spot on.

Solution: for my situation it was more appropriate to wrap the words in spans instead of the spaces.

var text = paragraphElement.innerHTML.replace(/ /g, '</span> <span>');

text = "<span>" + text + "</span>"; // wrap the first and last words.

This wraps every word in a span. Now I can query the document to get all the words, iterate over them, and compare their y positions. When the y position changes, add a BR.
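The two steps described above could look roughly like this. It is a sketch under assumptions, not the asker's actual code: it assumes the paragraph contains plain text without nested markup, the function names `wrapWordsInSpans` and `insertBreaks` are made up for illustration, and the `tops` array is assumed to hold one y coordinate per word, for example read from each span's `offsetTop` in the browser.

```javascript
// Wrap every whitespace-separated word in its own span so it can be measured.
function wrapWordsInSpans(text) {
  return text
    .trim()
    .split(/\s+/)
    .map((word) => '<span>' + word + '</span>')
    .join(' ');
}

// Rebuild the text, replacing the space before a word with <br />
// whenever that word's y coordinate differs from the previous word's.
// words[i] and tops[i] must describe the same word.
function insertBreaks(words, tops) {
  let html = '';
  for (let i = 0; i < words.length; i++) {
    if (i > 0) {
      html += tops[i] !== tops[i - 1] ? '<br />' : ' ';
    }
    html += words[i];
  }
  return html;
}
```

In the browser, `tops` might be collected with something like `Array.from(paragraphElement.querySelectorAll('span'), (s) => s.offsetTop)` after setting `innerHTML` to the wrapped text.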

This works flawlessly and gives me the results I need - Thank you!

+4

javascript html

Eli_s Jan 15 '09 at 14:36

6 answers

I don't think there will be a very clean solution for this, if there is one at all. The browser flows a paragraph to fit the available space, breaking lines where necessary. Note that if the user resizes the browser window, all paragraphs will be reflowed and will almost certainly change their break positions. If the user changes the text size on the page, paragraphs will be reflowed with different line breaks. If you (or some script on your page) change the size of another element on the page, this will change the amount of space available to the flowing paragraph and, again, produce different break points.

Also, changing the actual markup of your page to mimic what the browser already does for you (and does very well) seems like the wrong approach to whatever you are doing. What problem are you trying to solve here? There is probably a better way to achieve it.

Edit: OK, so you want the PDF to render just like the "screen version". Do you have one definitive screen version in mind - in terms of browser window size, user style sheets, font preferences, and adjusted font size? The critical thing about HTML is that it deliberately does not specify a particular layout. It simply describes what is on the page, what those things are, and where they are relative to each other.

I have seen several misguided attempts to create HTML that exactly replicates a print advertisement designed in something like a DTP application, which needs an absolute layout. Those efforts were doomed to failure because of the nature of HTML, and doing it the other way around (as you are trying to) will be even worse, because you don't even have a definitive starting point to work from.

Assuming this is all out of your hands and you still have to do it, my suggestion would be to abandon the idea of manipulating the HTML. Look at the PDF conversion software - if it is any good, it should give you some options for font kerning and similar settings. Playing with the details here should get you something that approximates the browser's font rendering and therefore breaks the lines in the same places.

Otherwise, all I can suggest is to take screenshots of the browser and analyze them with OCR to work out where the lines break (this does not require very accurate OCR, since you know what the raw text is anyway; you just need to count the spaces). Or perhaps simply embed the screenshot in the PDF if text search / selection is not a big deal.

Finally, doing it manually is probably the only way to do this job completely and reliably.

But really this is still the wrong approach, and any attempt to revisit the requirements would be better. Keep going one step further up the chain - why should the PDF have the same ragged edge as some arbitrary browser's rendering? Can you achieve that goal in a different (better) way?

+3

Andrzej Doyle Jan 15 '09 at 2:43

This sounds like a bad idea when you consider user-set font sizes, MS Windows accessibility mode, and hundreds of different mobile devices. Let the browser do its job - trying to get precise control over the rendering will only cause hours of frustration.

0

Mike Robinson Jan 15 '09 at 14:45

I do not think you can do this with any accuracy without embedding Gecko / WebKit / Trident or effectively recreating them.

0

annakata Jan 15 '09 at 2:51

Maybe an alternative: do all the line breaking yourself instead of relying on the browser. Put all the text in pre tags and add your own line breaks. Then at least you don't have to figure out where the browser has placed them.
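One way to do the breaking yourself is a simple greedy word wrap. The sketch below assumes a monospaced font (the default for pre tags), so that counting characters approximates line width; `wrapText` and `maxChars` are illustrative names, not part of any library.

```javascript
// Greedy word wrap: put as many words on each line as fit within maxChars.
// A single word longer than maxChars is kept whole on its own line.
function wrapText(text, maxChars) {
  const lines = [];
  let current = '';
  for (const word of text.trim().split(/\s+/)) {
    if (current === '') {
      current = word;
    } else if (current.length + 1 + word.length <= maxChars) {
      current += ' ' + word;
    } else {
      lines.push(current);
      current = word;
    }
  }
  if (current !== '') {
    lines.push(current);
  }
  return lines;
}
```

Joining the result with `'\n'` and placing it inside a pre element would give the same breaks wherever the text is rendered, at the cost of giving up the browser's proportional-font layout.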

0

Andrej Jan 15 '09 at 15:42

This is not possible because it runs counter to the fundamental difference between HTML and PDF.

HTML is rendered based on settings on the reader's side, i.e. preferred font size, screen resolution, browser window size / geometry, etc. These parameters are never known to the author and vary from reader to reader, because not everyone has the same technical equipment. A PDF is rendered based on parameters that the author prescribes and looks the same in every reader; this is especially useful if you want to print something on a given paper size. The two formats target completely different media, and what looks good in one does not necessarily look good in the other.

The only thing you can do is use similar styles for your web page and PDF.

-1

Svante Jan 15 '09 at 15:05

Diodeus - James MacFarlane · Accepted Answer · 2009-01-15T14:40:49+0000

I would suggest wrapping all the spaces in span tags and finding the coordinates of each span. When the Y value changes, you are on a new line.
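The coordinate comparison might be sketched like this. It is only an illustration of the idea: `lineStartIndices` and the `tolerance` parameter are assumptions added here (the tolerance absorbs sub-pixel differences), and the elements are assumed to have been collected beforehand, for example with `paragraph.querySelectorAll('span')`.

```javascript
// Given elements in document order, return the indices of those whose
// top coordinate differs from the previous element's, i.e. the first
// element found on each line. Index 0 is always reported.
function lineStartIndices(elements, tolerance = 1) {
  const starts = [];
  let previousTop = null;
  Array.from(elements).forEach((element, index) => {
    const top = element.getBoundingClientRect().top;
    if (previousTop === null || Math.abs(top - previousTop) > tolerance) {
      starts.push(index);
    }
    previousTop = top;
  });
  return starts;
}
```

Because all coordinates are read in one pass without modifying the DOM in between, the browser only needs to lay the paragraph out once.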
