Ignore HTML characters when searching for HTML / PHP content?

We store blocks of HTML content in MySQL (this is formatted text created in CKEditor, which adds inline CSS styles to format the text). The content is stored in the column "page_content".

We want the search function to let users search the text only. However, the search currently also matches the HTML markup, which we do not want. For example, if a user searches for "font", the results should not include pages that merely contain HTML <font> tags.

Is there a way to ignore HTML / CSS markup when searching HTML content stored in MySQL?

+7
2 answers

Have you considered setting up a separate table for these searches? Before MySQL 5.6, full-text indexes only work with MyISAM tables (InnoDB gained FULLTEXT support in 5.6), so on older versions you probably don't want to mix full-text search with your important data (unless, of course, you have a strange aversion to foreign keys and referential integrity).

The approach I have used in the past is basically this:

  • Set up a separate table with a simple structure (id, search_text).
    • id matches the id of the item being searched (for example, the page).
    • search_text is everything (the main text, title, author's name, etc.) that you want to be searchable, combined into one block of text.
  • Add a full-text index to the lookup table.
  • Change your database update process so that it builds the corresponding search_text string as plain text; this is where you strip out the HTML and possibly apply some other mappings (for example, expanding things like "A+" into a form that full-text search can find).
  • When you search, apply the same mappings to the search terms that you applied to the searchable data, and then query the lookup table for matches.
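The "build search_text as plain text" step above could be sketched roughly as follows. This is only an illustration in Python using the standard library's html.parser; the names strip_html, normalize, build_search_text and the MAPPINGS table are hypothetical, and the "A+" to "A plus" mapping is just an assumed example of the kind of mapping meant above. The same idea applies in PHP or any other language.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes and skips the contents of <script> and <style>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Join with spaces so text from adjacent blocks (<p>a</p><p>b</p>)
        # does not run together into one word.
        return " ".join(self._chunks)


def strip_html(html):
    """Return the visible text of an HTML fragment with whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return re.sub(r"\s+", " ", parser.text()).strip()


# Hypothetical mappings; apply the same ones to user queries at search time.
MAPPINGS = {"A+": "A plus"}


def normalize(text):
    """Apply the shared mappings to stored text or to a search query."""
    for raw, mapped in MAPPINGS.items():
        text = text.replace(raw, mapped)
    return text


def build_search_text(*parts):
    """Combine several fields (body, title, ...) into one plain-text block."""
    return normalize(" ".join(strip_html(p) for p in parts if p))
```

The result of build_search_text would be written to the search_text column whenever the page is saved, and normalize would be applied to the user's query before it is sent to the lookup table.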

This solves your HTML problem, lets you easily search more than just the HTML content, and lets you tune your search results by weighting the various components of the search text through repetition (for example, if you want the tags to count for more than the body of the text, just add the tags two or three times when building search_text).
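Weighting by repetition could look something like the sketch below. The function name weighted_search_text and the (text, weight) input shape are assumptions made for illustration; how strongly the repetition actually shifts ranking depends on MySQL's relevance calculation.

```python
def weighted_search_text(fields):
    """Build search_text from (plain_text, weight) pairs.

    A field with weight 3 is repeated three times, so its words occur
    more often in the indexed text than words from a weight-1 field.
    """
    parts = []
    for text, weight in fields:
        if text:
            parts.extend([text] * max(1, weight))
    return " ".join(parts)


# Example: tags count three times as much as the body.
example = weighted_search_text([
    ("body of the page", 1),
    ("mysql html", 3),
])
```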

One way or another, you need to process the text to remove or ignore the HTML. This approach lets you do that once, when the content is saved, rather than on every search.

+4

I assume you want to do the search in the database itself? (In that case, stripping the HTML tags would mean storing the content twice.)

Try exploring MySQL's full-text search in natural language mode.

http://dev.mysql.com/doc/refman/5.0/en/fulltext-natural-language.html
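A natural language full-text query might be built along these lines. This is a sketch only: the table name page_search and the columns id and search_text are hypothetical, a FULLTEXT index on the searched column is assumed, and the %s placeholder is the style used by common Python MySQL drivers. When MATCH ... AGAINST is given no modifier, MySQL uses natural language mode by default.

```python
def fulltext_query(term, table="page_search", column="search_text"):
    """Return (sql, params) for a natural language full-text search.

    The table and column names are hypothetical defaults and must come
    from trusted code, never from user input; only the search term is
    passed as a bound parameter.
    """
    sql = (
        f"SELECT id, MATCH({column}) AGAINST (%s) AS score "
        f"FROM {table} "
        f"WHERE MATCH({column}) AGAINST (%s) "
        f"ORDER BY score DESC"
    )
    # The term is bound twice: once for the score, once for the filter.
    return sql, (term, term)


sql, params = fulltext_query("font")
# With a driver cursor, this would be run as: cursor.execute(sql, params)
```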

+2
