PHP tidy weird behavior

I use the PHP tidy library to "clean and repair" some HTML coming from user input.

Everything works fine, except for one problem whose cause I can't figure out. My code is as follows:

$tidy = new tidy();
$tidy_options = array(
    'hide-comments' => true,
    'tidy-mark' => false,
    'indent' => false,
    'new-blocklevel-tags' => 'article,footer,header,hgroup,output,progress,section,video',
    'new-inline-tags' => 'audio,details,time,ruby,rt,rp',
    'drop-empty-paras' => false,
    'doctype' => '<!DOCTYPE HTML>',
    'sort-attributes' => 'none',
    'vertical-space' => false,
    'output-xhtml' => true,
    'wrap' => 180,
    'wrap-attributes' => false,
    'break-before-br' => false,
    'show-body-only' => true
);
$data = $tidy->repairString($data, $tidy_options, 'UTF8');
echo $data;

This works for all kinds of input, except when I try to use HTML to embed SWF files.
For example, with this input:

 <object data="http://the_swf_file_url" type="application/x-shockwave-flash" width="853" height="520"> <param name="movie" value="http://the_swf_file_url"> </object> 

but repairString removes everything from it and returns an empty string.
The strangest thing is that:
- If I enter some text along with the above, so that the input looks like Hello world<object...>...</object>, then it works fine.
- Or, if I specify 'show-body-only' => false, it also works fine.

Any clue why this is happening? Thanks in advance.

Edit: I tried pankar's suggestion of setting save-objects to true, but no luck...

+4
2 answers

The problem is that you are trying to process an HTML fragment.

When you do this, tidy infers the rest of the document. If you leave the configuration at its defaults and output a tidied document from just a piece of text, you will see DOCTYPE, html, head and body tags that you did not provide. Tidy inferred that these tags should be there.

The problem here is that the HTML specification regarding objects states that:

The OBJECT element may also appear in the content of the HEAD element.

When tidy infers the location of your fragment, it puts it in the first place where it is allowed to occur. That means tidy will put it in the head tag.

The reason show-body-only affects your output is that your fragment is not placed in the body.
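
If you want to check this placement yourself, something like the following sketch may help. It parses a fragment with tidy and reports whether a given tag ended up under head or body. The helper name locateTag is made up for illustration; parseString(), cleanRepair(), head(), body() and the tidyNode value property are part of the PHP tidy extension. Results may depend on the libtidy version installed, so treat the output as something to observe rather than rely on.

```php
<?php
// Hypothetical helper (not part of tidy): report which inferred section
// ("head" or "body") contains the first occurrence of $tagName.
function locateTag(string $fragment, string $tagName): string
{
    $tidy = new tidy();
    // Parse the bare fragment; tidy infers html/head/body around it.
    $tidy->parseString($fragment, ['output-xhtml' => true], 'utf8');
    $tidy->cleanRepair();

    $sections = ['head' => $tidy->head(), 'body' => $tidy->body()];
    foreach ($sections as $section => $node) {
        // tidyNode::$value holds the node's markup, including its children.
        if ($node !== null && stripos($node->value, '<' . $tagName) !== false) {
            return $section;
        }
    }
    return 'not found';
}

// Only run the demo when the tidy extension is available.
if (extension_loaded('tidy')) {
    $object = '<object data="http://the_swf_file_url" type="application/x-shockwave-flash">'
        . '<param name="movie" value="http://the_swf_file_url"></object>';
    echo locateTag($object, 'object'), "\n";                 // bare fragment
    echo locateTag('Hello world' . $object, 'object'), "\n"; // fragment with leading text
}
```

Comparing the two printed values shows whether leading text changes where tidy puts the object on your installation.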


However, when you add some text, it forces your fragment into the body tag. This is because text is not allowed in the head tag, so the inferred location of your fragment is the body.

In my opinion, the best option available to you is to insert each fragment into a template document and then parse it back out afterwards. You can do this quite easily with DOMDocument.
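
A minimal sketch of that template idea, assuming a bare html/head/body skeleton is good enough as the template. The helper names wrapFragment, extractBody and repairFragment are invented for this example; tidy::repairString() and the DOMDocument calls are the real APIs. The XML declaration prefix passed to loadHTML() is a commonly used workaround for UTF-8 input, not something the answer prescribes.

```php
<?php
// Hypothetical helper: wrap the fragment so it sits explicitly inside <body>.
function wrapFragment(string $fragment): string
{
    return '<html><head><title></title></head><body>' . $fragment . '</body></html>';
}

// Hypothetical helper: return the inner HTML of <body> from a full document.
function extractBody(string $html): string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // collect parse warnings quietly
    // Prefix is a common workaround so loadHTML() treats the input as UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }
    $inner = '';
    foreach ($body->childNodes as $child) {
        $inner .= $doc->saveHTML($child); // serialize each child of <body>
    }
    return $inner;
}

// Wrap, tidy the whole document, then parse the body back out.
function repairFragment(string $fragment, array $options): string
{
    $options['show-body-only'] = false; // keep the full document for DOMDocument
    $tidy = new tidy();
    $full = $tidy->repairString(wrapFragment($fragment), $options, 'utf8');
    return $full === false ? '' : extractBody($full);
}

// Only run the demo when the tidy extension is available.
if (extension_loaded('tidy')) {
    $fragment = '<object data="http://the_swf_file_url" type="application/x-shockwave-flash">'
        . '<param name="movie" value="http://the_swf_file_url"></object>';
    echo repairFragment($fragment, ['output-xhtml' => true, 'tidy-mark' => false]);
}
```

Because the fragment is already inside an explicit body, tidy has nothing to infer about its location. Note that DOMDocument serializes as HTML, so the result may differ slightly from tidy's XHTML output (for example in how empty elements are closed).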

A second solution would be to prepend a sentinel value that you can strip out again after outputting only the body.

i.e.

 ____MY_MAGIC_TOKEN____ <object ...></object> 

Then you can strip it out again afterwards.
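
Roughly, the sentinel approach could look like the sketch below. The token is the one shown above; the helper names addSentinel, stripSentinel and repairWithSentinel are made up for illustration, and the code assumes tidy leaves the token text itself untouched in its output.

```php
<?php
// Token taken from the example above; any string unlikely to occur in real input works.
const SENTINEL = '____MY_MAGIC_TOKEN____';

// Hypothetical helper: put the token in front so tidy sees leading text.
function addSentinel(string $fragment): string
{
    return SENTINEL . ' ' . $fragment;
}

// Hypothetical helper: remove the first occurrence of the token,
// then trim whitespace left at the start of the string.
function stripSentinel(string $html): string
{
    $pos = strpos($html, SENTINEL);
    if ($pos === false) {
        return $html; // token not present, leave the markup alone
    }
    return ltrim(substr($html, 0, $pos) . substr($html, $pos + strlen(SENTINEL)));
}

// Repair with show-body-only enabled, then strip the token again.
function repairWithSentinel(string $fragment, array $options): string
{
    $options['show-body-only'] = true;
    $tidy = new tidy();
    $repaired = $tidy->repairString(addSentinel($fragment), $options, 'utf8');
    return $repaired === false ? '' : stripSentinel($repaired);
}

// Only run the demo when the tidy extension is available.
if (extension_loaded('tidy')) {
    $fragment = '<object data="http://the_swf_file_url" type="application/x-shockwave-flash">'
        . '<param name="movie" value="http://the_swf_file_url"></object>';
    echo repairWithSentinel($fragment, ['output-xhtml' => true, 'tidy-mark' => false]);
}
```

This is less work than the template approach, but it depends on the token surviving tidy unchanged and never appearing in genuine user input.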

+6

Try setting the preserve-entities configuration option to true (it is false by default).

EDIT

Second (more thorough) thoughts: this is the expected behavior. By setting show-body-only to true, you tell tidy to output only the body of the processed XHTML document.

This option effectively discards everything in the document's <head>, and in this case tidy places the <object> element inside <head>. You can verify this by simply setting

$data = "<title>My Site</title>";

The output will again be empty.

Prefixing the <object> with text works simply because tidy then decides that this data must be handled as part of the page body and therefore outputs it.

Hope this helps more this time.

+3