I parse XML with simplexml_load_string() and use the data in it to update Active Directory (AD) objects via LDAP.
XML example (simplified):
<?xml version="1.0" encoding="UTF-8"?>
<users>
  <user>Bìlbö Bággįnš</user>
  <user>Gãńdåłf Thê Gręât</user>
  <user>Śām Wīšë</user>
</users>
First I run ldap_search() to find a single user, and then modify their attributes. Writing the above values directly into AD via LDAP results in garbled characters.
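For reference, the search-then-modify flow described above can be sketched roughly as follows. This is a minimal sketch, not the exact code in use: the helper name updateDisplayName(), the sAMAccountName filter, the displayName attribute, and the host and DN in the usage comment are all assumptions made for illustration.

```php
<?php
// Hypothetical helper: find exactly one user and replace one attribute.
// $conn is assumed to be an LDAP connection that is already bound.
function updateDisplayName($conn, $baseDn, $account, $newName)
{
    // ldap_escape() only exists in PHP >= 5.6, so fall back to the raw value.
    $safe = function_exists('ldap_escape')
        ? ldap_escape($account, '', LDAP_ESCAPE_FILTER)
        : $account;

    $result = ldap_search($conn, $baseDn, "(sAMAccountName=$safe)", ['displayName']);
    if ($result === false) {
        return false;
    }

    $entries = ldap_get_entries($conn, $result);
    if ($entries['count'] !== 1) {
        return false; // zero or several matches: do not guess
    }

    // The new value is passed through as-is, with no conversion applied.
    return ldap_mod_replace($conn, $entries[0]['dn'], ['displayName' => $newName]);
}

// Usage (placeholder host, credentials and DN):
// $conn = ldap_connect('ldap://ad.example.com');
// ldap_set_option($conn, LDAP_OPT_PROTOCOL_VERSION, 3); // LDAPv3 strings are UTF-8
// ldap_bind($conn, $bindUser, $bindPassword);
// updateDisplayName($conn, 'DC=example,DC=com', 'bbaggins', $nameFromXml);
```

Whether the connection here is actually set to protocol version 3 is worth checking, since that is one of the few places where the LDAP side could influence how string values are interpreted.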
For example: Bìlbö Bággįnš
I tried the following functions to no avail:
utf8_encode($str);
utf8_decode($str);
iconv("UTF-8", "ISO-8859-1//TRANSLIT", $str);
iconv("UTF-8", "ASCII//TRANSLIT", $str);
iconv("UTF-8", "T.61", $str);
Ideally, I do not want to do any of these string conversions. UTF-8 should be correct, right?
I also noticed the following: I printed the values to see how they come out. Running the script in the CLI shows the correct characters, but web browsers show the same garbling as AD.
What's happening? Should I be looking at something else, for example URL encoding? I hope this comes down to a simple mistake on my end.
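One possible explanation for the CLI/browser difference is that the bytes are fine and only the reader's assumed charset differs. The sketch below simulates that: it takes UTF-8 bytes and deliberately misreads them as ISO-8859-1. It assumes the mbstring extension is available; the Content-Type header is a hypothetical fix for the browser side only and says nothing about what AD does with the value.

```php
<?php
// "Bìlbö" written as explicit UTF-8 bytes, so the example does not depend
// on how this file itself is saved.
$name = "B\xC3\xAClb\xC3\xB6";

// Simulate a reader that wrongly assumes ISO-8859-1: every byte becomes
// its own character, which is then re-encoded as UTF-8 for display.
$misread = mb_convert_encoding($name, 'UTF-8', 'ISO-8859-1');

// Hypothetical browser-side fix: declare the charset explicitly.
if (PHP_SAPI !== 'cli' && !headers_sent()) {
    header('Content-Type: text/html; charset=UTF-8');
}

echo $name, "\n";     // intact UTF-8 bytes
echo $misread, "\n";  // the same bytes after being misread as ISO-8859-1
```

On a UTF-8 terminal the second line should come out as "BÃ¬lbÃ¶", the typical two-characters-per-accent pattern. If the values shown in AD follow that pattern, it would suggest the UTF-8 bytes are being interpreted as a single-byte encoding somewhere along the way.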
EDIT: I entered these characters using the AD admin GUI to see how they come out. When I read them back through LDAP, the browser displays the correct characters, while running the script in the CLI shows question marks instead of the accented characters. Passing one of these returned values to mb_detect_encoding() returns UTF-8.
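Since mb_detect_encoding() only makes a best guess, and both the terminal and the browser apply their own interpretation, a more reliable check is to look at the raw bytes. The sketch below assumes mbstring is loaded; describeBytes() is a hypothetical helper name, and the sample values are hand-written bytes rather than real LDAP results.

```php
<?php
// Hypothetical helper: report whether a value is well-formed UTF-8 and
// show its raw bytes, which do not depend on terminal or browser settings.
function describeBytes($value)
{
    return [
        'valid_utf8' => mb_check_encoding($value, 'UTF-8'),
        'hex'        => bin2hex($value),
    ];
}

$utf8   = "\xC3\xB6";          // "ö" encoded as UTF-8
$latin1 = "\xF6";              // "ö" encoded as ISO-8859-1
$double = "\xC3\x83\xC2\xB6";  // "ö" UTF-8-encoded twice: valid UTF-8, but wrong

foreach ([$utf8, $latin1, $double] as $value) {
    $info = describeBytes($value);
    echo $info['hex'], ' ', $info['valid_utf8'] ? 'valid' : 'invalid', "\n";
}
```

The third case is the reason for printing the hex dump as well: a double-encoded value still passes a UTF-8 validity check, so only the byte sequence shows that something went wrong. Comparing the hex of a value read from the XML with the hex of the same name read back from AD would show whether the two really carry the same bytes.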
I then modified the same object straight away: rather than writing a new string, I simply reversed the existing value and saved the object. This works fine - I see the correct (reversed) value in AD.
- Development: Mac OS X 10.7 Lion - PHP 5.4.3
- Production: Red Hat 6 - PHP 5.4.3
- AD Server: Windows 2003
UPDATE: After a few months, I still had not found an answer or solution to this problem. In the end, I went with replacing the characters with their unaccented equivalents (NOT perfect, I know).
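The replacement workaround can be sketched with an explicit lookup table. This is an illustration only, not necessarily the code that ended up in use: stripAccents() is a hypothetical name, the map covers just the characters from the XML sample above, and the file is assumed to be saved as UTF-8.

```php
<?php
// Hypothetical helper: replace accented characters with plain ASCII
// equivalents using a fixed map. Characters not in the map pass through.
function stripAccents($value)
{
    static $map = [
        'ì' => 'i', 'ö' => 'o', 'á' => 'a', 'į' => 'i', 'š' => 's',
        'ã' => 'a', 'ń' => 'n', 'å' => 'a', 'ł' => 'l', 'ê' => 'e',
        'ę' => 'e', 'â' => 'a', 'Ś' => 'S', 'ā' => 'a', 'ī' => 'i',
        'ë' => 'e',
    ];

    // strtr() with an array replaces whole keys, so multi-byte UTF-8
    // sequences are swapped as units rather than byte by byte.
    return strtr($value, $map);
}

echo stripAccents('Bìlbö Bággįnš'), "\n";
echo stripAccents('Gãńdåłf Thê Gręât'), "\n";
echo stripAccents('Śām Wīšë'), "\n";
```

A fixed map was chosen here over iconv() with //TRANSLIT because transliteration results depend on the platform and locale, which matters with a Mac development machine and a Red Hat server. The drawback is that the map has to be extended by hand for every new character.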