I have been reading your discussion on this, and a more robust implementation might be worthwhile, especially considering diacritics support. Using one regex to fix all your problems may seem tempting, but the more complex it gets, the harder it will be to support or extend. To quote Jamie Zawinski:
Some people, faced with a problem, think: "I know, I will use regular expressions." Now they have two problems.
Since I have problems with iconv on my local machine, I used a simpler implementation instead. Feel free to use something more complex or more reliable if your situation requires it.
This solution uses a simple regular expression to split the text into runs of alphanumeric characters (also known as "words"). The \p{L}\p{M} part of the character class, together with the u modifier, makes the pattern treat Unicode letters and combining marks as word characters, so multibyte characters are included as well.
You can see this code working on IDEone.
<?php

function stripAccents($p_sSubject)
{
    $sSubject = (string) $p_sSubject;

    // Expand ligatures first: they map to two letters, which strtr() below cannot do.
    $sSubject = str_replace('æ', 'ae', $sSubject);
    $sSubject = str_replace('Æ', 'AE', $sSubject);

    // Convert to single-byte ISO-8859-1 so strtr() can map one character to one character.
    // mb_convert_encoding() stands in for utf8_decode(), which is deprecated as of PHP 8.2.
    $sSubject = strtr(
        mb_convert_encoding($sSubject, 'ISO-8859-1', 'UTF-8')
        , mb_convert_encoding('àáâãäåçèéêëìíîïñòóôõöøùúûüýÿÀÁÂÃÄÅÇÈÉÊËÌÍÎÏÑÒÓÔÕÖØÙÚÛÜÝ', 'ISO-8859-1', 'UTF-8')
        , 'aaaaaaceeeeiiiinoooooouuuuyyAAAAAACEEEEIIIINOOOOOOUUUUY'
    );

    return $sSubject;
}

function emphasiseWord($p_sSubject, $p_sSearchTerm)
{
    // Split into words and delimiters; PREG_SPLIT_DELIM_CAPTURE keeps the delimiters
    // so the original text can be rebuilt with implode() afterwards.
    // The limit is -1 (no limit); passing null is deprecated as of PHP 8.1.
    $aSubjects = preg_split('#([^a-z0-9\p{L}\p{M}]+)#iu', $p_sSubject, -1, PREG_SPLIT_DELIM_CAPTURE);

    foreach ($aSubjects as $t_iKey => $t_sSubject) {
        $sSubject = stripAccents($t_sSubject);

        // Match against the accent-stripped word, or against the original word.
        if (stripos($sSubject, $p_sSearchTerm) !== false
            || mb_stripos($t_sSubject, $p_sSearchTerm) !== false
        ) {
            $aSubjects[$t_iKey] = '<strong>' . $t_sSubject . '</strong>';
        }
    }

    $sSubject = implode('', $aSubjects);

    return $sSubject;
}

/////////////////////////////// Test \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
$aTest = array(
    'goo' => 'I love Google to make my searches, but I`m starting to worry about privacy.'
    , 'peo' => 'people, People, PEOPLE, peOple, people!, people., people?, "people, people" péo'
    , 'péo' => 'people, People, PEOPLE, peOple, people!, people., people?, "people, people" péo'
    , 'gen' => '"gente", "inteligente", "VAGENS", and "Gente" ...vocês da física que passam o dia protegendo...'
    , 'voce' => '...vocês da física que passam o dia protegendo...'
    , 'o' => 'Characters like æ,ø,å,Æ,Ø and Å are used in Denmark, Sweden and Norway'
    , 'ø' => 'Characters like æ,ø,å,Æ,Ø and Å are used in Denmark, Sweden and Norway'
    , 'ae' => 'Characters like æ,ø,å,Æ,Ø and Å are used in Denmark, Sweden and Norway'
    , 'Æ' => 'Characters like æ,ø,å,Æ,Ø and Å are used in Denmark, Sweden and Norway'
);

$sContent = '<dl>';

foreach ($aTest as $t_sSearchTerm => $t_sSubject) {
    $sContent .= '<dt>' . $t_sSearchTerm . '</dt><dd>' . emphasiseWord($t_sSubject, $t_sSearchTerm) . '</dd>';
}

$sContent .= '</dl>';

echo $sContent;
?>
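
To make the role of \p{L}\p{M} and the u modifier more concrete, here is a minimal sketch of the splitting step on its own. It reuses the pattern from emphasiseWord(); the sample string and the variable names ($pattern, $sample, $parts, $asciiOnly) are only illustrative, and it assumes the file is saved as UTF-8.

```php
<?php
// Same pattern as in emphasiseWord(): a delimiter is any run of characters
// that are neither ASCII letters/digits nor Unicode letters (\p{L}) or marks (\p{M}).
$pattern = '#([^a-z0-9\p{L}\p{M}]+)#iu';

// Illustrative sample text (assumes this file is saved as UTF-8).
$sample = 'vocês da física';

// PREG_SPLIT_DELIM_CAPTURE keeps the delimiters in the result,
// so the original string can be rebuilt with implode().
$parts = preg_split($pattern, $sample, -1, PREG_SPLIT_DELIM_CAPTURE);
// $parts is ['vocês', ' ', 'da', ' ', 'física']

echo implode('|', $parts), "\n";
echo implode('', $parts) === $sample ? "round-trip ok\n" : "round-trip failed\n";

// For comparison: without \p{L}\p{M} and the u modifier, the bytes of the
// accented letters count as delimiters, so the words are broken apart.
$asciiOnly = preg_split('#([^a-z0-9]+)#i', $sample, -1, PREG_SPLIT_DELIM_CAPTURE);
echo count($parts), ' parts vs ', count($asciiOnly), " parts\n";
```

Because the delimiters are captured rather than discarded, wrapping a matching word in <strong> tags and imploding the array leaves all punctuation and whitespace exactly where it was.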