PHP output showing small black diamonds with a question mark

I am writing a php program that extracts from a database source. Some of the varchars have quotes that appear as black diamonds with a question mark in them (, REPLACEMENT CHARACTER , I guess from Microsoft Word text).

How can I use php to remove these characters?

php encoding character-encoding
Nov 09 '08 at 0:21
21 answers

If you see this character (U+FFFD, "REPLACEMENT CHARACTER"), it usually means that the text itself is encoded in a single-byte encoding but is being interpreted as one of the Unicode encodings (UTF-8 or UTF-16).

If it were the other way around, it would (usually) look something like this: Ã¤ (instead of ä).

Probably the source encoding is ISO-8859-1, also known as Latin-1. You can check this without changing your script: browsers let you re-interpret a page in a different encoding (in Firefox, use "View" → "Character Encoding").

To make the browser use the correct encoding, add an HTTP header like this:

header("Content-Type: text/html; charset=ISO-8859-1"); 

or put the encoding in the meta tag:

 <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"> 

Alternatively, you can try reading from the database in a different encoding (preferably UTF-8), or convert the text with iconv().

Nov 09 '08 at 0:51

This is an encoding problem. It can go wrong at several levels, but most likely the strings in your database are UTF-8-encoded and you are serving them as ISO-8859-1, or vice versa.

The proper way to fix this problem is to get your character sets straight. The simplest strategy, since you are using PHP, is to use ISO-8859-1 throughout your application. To do this, you must make sure that:

  • All PHP source files are saved as iso-8859-1 (not to be confused with cp-1252).
  • Your web server is configured to serve files with charset=iso-8859-1.
  • Alternatively, you can override the web server settings from a PHP document using header().
  • Alternatively, you can insert a meta tag in the HTML that specifies the same thing, but this is not strictly necessary.
  • You can also specify the accept-charset attribute on your <form> elements.
  • Database tables are encoded as latin1.
  • The database connection between PHP and the database is set to latin1.
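The PHP side of this checklist might look roughly like the sketch below. The sample text is a made-up Latin-1 string. The database call is only shown as a comment because it needs a running server. The last step shows why mismatches matter: htmlspecialchars() returns an empty string when its input is not valid in the charset you pass it.

```php
<?php
// Sketch of the "latin1 everywhere" checklist on the PHP side.
ini_set('default_charset', 'ISO-8859-1');              // PHP's default charset
header('Content-Type: text/html; charset=ISO-8859-1'); // explicit header, before any output
// Database side (not run here, needs a server): $mysqli->set_charset('latin1');

// Made-up Latin-1 text: "Grüße & co" with one byte per letter.
$latin1 = "Gr\xFC\xDFe & co";

// Escape with the same charset the page uses.
$safe = htmlspecialchars($latin1, ENT_QUOTES, 'ISO-8859-1'); // "Gr\xFC\xDFe &amp; co"

// A mismatched charset is not harmless: for 'UTF-8' these bytes are invalid
// input, and htmlspecialchars() returns an empty string.
$mismatch = htmlspecialchars($latin1, ENT_QUOTES, 'UTF-8');  // ""

echo '<form method="post" accept-charset="ISO-8859-1">', $safe, '</form>', "\n";
```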

If you already have data in your database, be aware that it is probably messed up already. If you are not in production yet, just wipe it all and start over. Otherwise, you will have to do some data cleanup.

A note on meta tags, since everybody misunderstands what they are:

When the web server serves a file (an HTML document), it also sends some information that is not displayed directly in the browser. These are called HTTP headers. One of them is the Content-Type header, which specifies the file type (e.g. text/html) as well as the encoding (aka charset). Although most web servers send a Content-Type header with charset information, it is optional. If it is missing, the browser will interpret any meta tag with http-equiv="Content-Type". It is important to understand that the meta tag is only interpreted if the web server does not send the header. In practice, this means it is only used if the page is saved to disk and then opened from there.

This page has a very good explanation of these things.

Nov 09 '08 at 0:52

I also ran into this problem. So far I have come across three cases where it happened:

  1. substr()

    I used substr() on a UTF-8 string, which cut multibyte UTF-8 characters in half, so the cut characters could not be displayed correctly. Use mb_substr($utfstring, 0, 10, 'utf-8'); instead.

  2. htmlspecialchars()

    Another problem was using htmlspecialchars() on a UTF-8 string. The fix is to use: htmlspecialchars($utfstring, ENT_QUOTES, 'UTF-8');

  3. preg_replace()

    Finally, I found that preg_replace() can cause problems with UTF-8. For example, the code $string = preg_replace('/[^A-Za-z0-9ÄäÜüÖöß]/', ' ', $string); turned the UTF-8 string "F (×) = 2 × -3" into "F 2". The fix is to use mb_ereg_replace() instead.

I hope this additional information helps to get rid of such problems.

Feb 28 '13 at 14:35

As mentioned in earlier answers, this happens because your text was written to the database in ISO-8859-1 or some other encoding.

So you just need to convert the data to UTF-8 before outputting it.

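 $text = "string from database";
 $text = utf8_encode($text); // ISO-8859-1 -> UTF-8; deprecated since PHP 8.2, use mb_convert_encoding($text, 'UTF-8', 'ISO-8859-1')
 echo $text;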
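One risk of converting unconditionally is that text which is already UTF-8 gets encoded twice (é becomes Ã©). Below is a sketch of a guard, assuming mbstring. The helper name to_utf8() is hypothetical. Note that it is a heuristic: a Latin-1 string can, rarely, happen to be valid UTF-8.

```php
<?php
// Hypothetical helper, assuming mbstring: convert to UTF-8 only when the
// string is not already valid UTF-8, so UTF-8 text is not encoded twice.
function to_utf8(string $text, string $from = 'ISO-8859-1'): string
{
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text; // already valid UTF-8: leave it alone
    }
    // With $from = 'ISO-8859-1' this does what utf8_encode() did.
    return mb_convert_encoding($text, 'UTF-8', $from);
}

$fromLatin1Column = "Caf\xE9";     // "é" as a single ISO-8859-1 byte
$fromUtf8Column   = "Caf\xC3\xA9"; // "é" already encoded as UTF-8

echo to_utf8($fromLatin1Column), "\n";
echo to_utf8($fromUtf8Column), "\n";
```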
Aug 16 '15 at 16:28

To set your MySQL connection to UTF-8 (or latin1, depending on what you are using), you can do this:

 $con = mysql_connect("localhost","username","password"); mysql_set_charset('utf8',$con); 

or use this to check which encoding you are using:

 $con = mysql_connect("localhost","username","password"); $charset = mysql_client_encoding($con); echo "The current character set is: $charset\n"; 

Additional information here: http://php.net/manual/en/function.mysql-set-charset.php
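The mysql_* functions above were removed in PHP 7.0. Here is a sketch of the same two steps with mysqli. The helper names and connection details are hypothetical placeholders, so this only defines the functions and does not connect. The default 'utf8mb4' is my choice: MySQL's 'utf8' covers only characters of up to 3 bytes, while 'utf8mb4' covers all of Unicode.

```php
<?php
// Hypothetical helpers using mysqli; host, user, password and database are placeholders.
function connect_with_charset(string $host, string $user, string $pass, string $db, string $charset = 'utf8mb4'): mysqli
{
    mysqli_report(MYSQLI_REPORT_ERROR | MYSQLI_REPORT_STRICT); // throw mysqli_sql_exception on failure
    $con = new mysqli($host, $user, $pass, $db);
    $con->set_charset($charset); // same job as mysql_set_charset()
    return $con;
}

function current_charset(mysqli $con): string
{
    return $con->character_set_name(); // same job as mysql_client_encoding()
}

// Usage (needs a running MySQL server):
// $con = connect_with_charset('localhost', 'username', 'password', 'mydb', 'latin1');
// echo "The current character set is: " . current_charset($con) . "\n";
```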

Apr 05 '12 at 6:28

Based on your description of the problem, the data in your database is almost certainly encoded as Windows-1252, and your page is almost certainly being served as ISO-8859-1. These two character sets are identical except that Windows-1252 has 27 additional printable characters (in the 0x80 to 0x9F range, where ISO-8859-1 has control codes), including the left and right curly quotes.

Assuming my analysis is correct, the easiest solution is to serve your page as Windows-1252. This works because every printable character in ISO-8859-1 is also in Windows-1252. In PHP, you can change the encoding like this:

 header('Content-Type: text/html; charset=Windows-1252'); 

However, you really need to check which character encoding your HTML files and your database contents use, and try to be consistent, or convert correctly where that is not possible.
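If you would rather serve UTF-8 than Windows-1252, here is a sketch of the "convert correctly" option, assuming mbstring. The key is to name Windows-1252 as the source encoding. Treating the same bytes as ISO-8859-1 maps the Word curly quotes to invisible control characters instead of quotes.

```php
<?php
// Sketch, assuming mbstring: curly quotes as Windows-1252 (e.g. text pasted
// from Word) stores them, 0x93 = “ and 0x94 = ”.
$cp1252 = "\x93Hello\x94";

// Converting from Windows-1252 maps them to U+201C / U+201D.
$utf8 = mb_convert_encoding($cp1252, 'UTF-8', 'Windows-1252');

// Converting from ISO-8859-1 instead maps 0x93/0x94 to the invisible
// C1 control characters U+0093/U+0094, not to quotes.
$wrong = mb_convert_encoding($cp1252, 'UTF-8', 'ISO-8859-1');

header('Content-Type: text/html; charset=UTF-8');
echo $utf8, "\n";
```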

Nov 09 '08 at 1:19

I removed these characters from the string like this:

 ini_set('mbstring.substitute_character', "none"); $text= mb_convert_encoding($text, 'UTF-8', 'UTF-8'); 
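Here is the trick above as a self-contained sketch, assuming mbstring. It uses mb_substitute_character(), the function form of the mbstring.substitute_character setting. Invalid bytes are silently deleted, so the apostrophe in this made-up sample is lost rather than repaired. Converting from the real source encoding keeps it.

```php
<?php
// Sketch, assuming mbstring: drop invalid UTF-8 bytes instead of replacing them.
mb_substitute_character('none');

// "\x92" is a Windows-1252 apostrophe; on its own it is not valid UTF-8.
$dirty = "It\x92s fine";

// Re-encoding UTF-8 to UTF-8 drops every invalid byte.
$clean = mb_convert_encoding($dirty, 'UTF-8', 'UTF-8');

echo $clean, "\n"; // the apostrophe is gone
```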
Jul 29 '15 at 2:41

Try this, please:

mb_substr($description, 0, 490, "UTF-8");

Oct 06 '16 at 7:58

Wrap your variables in this function: utf8_encode($your_variable);

Jan 17 '17 at 11:16

This may be caused by a Unicode (or other character-set) mismatch. Try changing the encoding in your browser until the text looks normal. The question then becomes how to convert the contents of your database into the encoding you use for display. (It may be as simple as adding a charset=utf-8 declaration to your output.)

Nov 09 '08 at 0:26

What I ended up doing, after fixing my tables, was to back them up and change the settings to UTF-8. I then edited the dump file so that its character set entries read DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci.

Now I have no more character set problems, because both the database and the browser use UTF-8.

I figured out what caused this: it was the effect of the web page plus the browser on the database. On the Linux terminals (Ubuntu + Firefox), entries were written to the database in latin1, which is what the tables were set to. But on the Windows 10 terminals, the entries were encoded in UTF-8. I also noticed that Windows 10 had problems with latin1, so I decided to go with the flow and convert everything to UTF-8.

I realized this is a Windows 10 problem because it started when we began using Windows 10 terminals, so once again Microsoft bugs cause problems. I still don't know why the form encoding changes: the browser on Windows 10 reports the latin1 character set, but the data is submitted as UTF-8 and I get anomalous data. Linux + Firefox does not do this.
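The dump-file edit can be scripted. Here is a sketch with a hypothetical helper that rewrites the latin1 default declarations mysqldump writes (table options such as DEFAULT CHARSET=latin1, and DEFAULT CHARACTER SET latin1 in CREATE DATABASE). It assumes the data in the dump is already UTF-8 (check the SET NAMES line near the top of the file), so only the declarations need to change. It does not touch column-level CHARACTER SET clauses, and it would also rewrite matching text inside your data.

```php
<?php
// Hypothetical helper: switch latin1 default declarations in a dump to utf8.
function dump_latin1_to_utf8(string $sql): string
{
    return preg_replace(
        [
            '/DEFAULT CHARSET=latin1(?: COLLATE=latin1_\w+)?/',       // table options
            '/DEFAULT CHARACTER SET latin1(?: COLLATE latin1_\w+)?/', // CREATE DATABASE
        ],
        [
            'DEFAULT CHARSET=utf8 COLLATE=utf8_general_ci',
            'DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci',
        ],
        $sql
    );
}

// Usage: back up first, then rewrite a copy (file names are placeholders).
// file_put_contents('backup_utf8.sql', dump_latin1_to_utf8(file_get_contents('backup.sql')));
```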

Sep 07 '16 at 15:30

Just add these lines before sending the headers.

This makes .doc/.docx downloads come out in their exact format:

  if (ini_get('zlib.output_compression')) ini_set('zlib.output_compression', 'Off');
  if (ob_get_level() > 0) ob_clean(); // ob_clean() without an active buffer raises a notice
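Here is a sketch of how those lines might sit inside a complete download. The helper send_docx() is hypothetical, and $path and $downloadName are placeholders. The MIME type shown is the one for .docx; for old .doc files it is application/msword. The loop stops if a buffer cannot be removed, so it cannot spin forever.

```php
<?php
// Hypothetical helper: send a Word file without stray output corrupting it.
function send_docx(string $path, string $downloadName): void
{
    if (ini_get('zlib.output_compression')) {
        ini_set('zlib.output_compression', 'Off'); // as in the answer above
    }
    // Discard anything already buffered (whitespace, a BOM, notices) so it is
    // not written in front of the file's bytes.
    while (ob_get_level() > 0 && ob_end_clean()) {
        // keep going until no removable buffer is left
    }
    header('Content-Type: application/vnd.openxmlformats-officedocument.wordprocessingml.document');
    header('Content-Disposition: attachment; filename="' . basename($downloadName) . '"');
    header('Content-Length: ' . filesize($path));
    readfile($path);
}

// Usage: send_docx('/path/to/report.docx', 'report.docx'); exit;
```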
Mar 15 '17 at 5:13

This will help you. Put this in your <head>:

 <meta charset="iso-8859-1"> 
Oct 08 '17 at 17:21

You can also change the character set in your browser, for debugging purposes only.

Nov 09 '08 at 11:05

Using the same encoding (as suggested here) in both the database and the HTML did not work for me. So, since the output is generated as HTML, I decided to use &quot; (the HTML entity) or &#34; (the numeric reference for the Latin-1 quote character) in my database text wherever quotes were used. This fixed the issue and gave me a proper quotation mark. Oddly, before this change only some of the quotation marks and apostrophes were displayed incorrectly while others displayed fine; the entity, however, worked in every case.
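Instead of editing the stored text by hand, you can do the same at output time. Here is a sketch assuming mbstring and UTF-8 text. The helper name ascii_html() is hypothetical. The output is pure ASCII, so it displays the same whatever charset the page is served in.

```php
<?php
// Hypothetical helper: escape HTML specials, then turn every non-ASCII
// character into a numeric reference such as &#8220;.
function ascii_html(string $utf8Text): string
{
    $escaped = htmlspecialchars($utf8Text, ENT_QUOTES, 'UTF-8'); // ", ', &, <, >
    // Convmap is [start, end, offset, mask]: every code point from U+0080
    // upward is emitted unchanged as "&#N;".
    return mb_encode_numericentity($escaped, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8');
}

echo ascii_html("\u{201C}Hello\u{201D} & it\u{2019}s"), "\n";
// &#8220;Hello&#8221; &amp; it&#8217;s
```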

Jun 22 '14 at 15:12

I ran the "detect encoding" code after changing the collation in phpMyAdmin, and now it shows up as Latin_1.

But here is another data anomaly I ran into in my application, and how I fixed it:

I had just imported a table with mixed encodings (diamond question marks on some rows, all in the same column). I used utf8_decode(), which turns each character it cannot map into a plain question mark instead of a "diamond question mark", and then str_replace() to replace that question mark with a space. Here is my fix code:

  <?php
  include 'dbconnectfile.php'; // defines $db, my mysqli connection
  // inx is my auto-increment column; broke_column is the column I need to fix.
  // `Table` is a placeholder name; TABLE is a reserved word in MySQL, hence the backticks.
  $res = $db->query("SELECT inx, broke_column FROM `Table`");
  $update = $db->prepare("UPDATE `Table` SET broke_column = ? WHERE inx = ?");
  while ($data = $res->fetch_row()) {
      $id = $data[0];
      // utf8_decode() (deprecated since PHP 8.2) turns unmappable characters into "?";
      // note that str_replace() then also replaces any genuine question marks.
      $fixx = str_replace("?", " ", utf8_decode($data[1]));
      echo $id, " ", $fixx, "<br>"; // I echo the data because I like to see progress as it runs :)
      // A prepared statement keeps values containing quotes from breaking the query.
      $update->bind_param('si', $fixx, $id);
      $update->execute();
  }
  ?>
Sep 05 '16 at 22:26

This worked in my case:

 $text = utf8_decode($text);

It turns the black diamond symbol into a question mark, so you can then strip it:

 $text = str_replace('?', '', utf8_decode($text)); 
Jan 03 '17 at 20:03

For a global fix:

Instead of converting, encoding or decoding every piece of text, I prefer to leave the text as it is and change the PHP server settings instead:

  • Leave the diamonds as they are.
  • In the browser's View menu, select "Text Encoding" and find the encoding that makes the text display correctly.
  • Modify your php.ini and add:

    default_charset = "ISO-8859-1"

or, instead of ISO-8859-1, whichever encoding matches your text.

Mar 24 '17 at 13:42

When processing data retrieved from anywhere, you should use the multibyte functions with the mb_ prefix (mb_FUNC_NAME).

That helped me with the same problem.

Or you can look up the code of this character (U+FFFD) and use a regular expression to remove it.
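Both suggestions are shown in a short sketch, assuming mbstring and UTF-8 data. Note that preg_replace() with the /u modifier returns null if the subject contains invalid UTF-8. This regular expression therefore only removes U+FFFD characters that are actually stored in the text.

```php
<?php
// Sketch, assuming mbstring and UTF-8 data.
$text = "Grüße\u{FFFD}";

$bytes = strlen($text);             // counts bytes
$chars = mb_strlen($text, 'UTF-8'); // counts characters

// The regexp approach: U+FFFD written as \x{FFFD}; the /u modifier makes PCRE
// read both the pattern and the subject as UTF-8.
$clean = preg_replace('/\x{FFFD}/u', '', $text);

printf("%d bytes, %d characters, cleaned: %s\n", $bytes, $chars, $clean);
```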

Jun 07 '17 at 10:25

Just paste this code at the top of the page.

 <?php header("Content-Type: text/html; charset=ISO-8859-1"); ?> 
May 6 '19 at

Go to phpMyAdmin, select your database, and increase the length/value of this table field to 500 or 1000; this will solve your problem.

Aug 26 '17 at 16:31