The process is quite complex and not entirely pretty. You need to look at the Title class found in includes/Title.php. You should start with the newFromText method, but the bulk of the logic is in the secureAndSplit method.
Please note that, as with much of MediaWiki, the code is not decoupled in the slightest. If you want to replicate it, you will need to extract the logic rather than simply reuse the class.
The logic looks something like this:
- Decode character references (e.g. &eacute;)
- Convert spaces to underscores
- Check whether the title carries a namespace or interwiki prefix
- Remove hash fragments (e.g. Apple#Name)
- Remove forbidden characters
- Forbid subdirectory links (e.g. ../directory/page)
- Forbid triple tilde sequences (~~~) (for some reason)
- Limit the size to 255 bytes
- Capitalise the first letter
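
If it helps, the steps above can be sketched roughly as follows. This is my own simplified Python illustration of the list, not MediaWiki's actual code: the names normalize_title, InvalidTitle, KNOWN_NAMESPACES and FORBIDDEN_CHARS are hypothetical, the namespace table and the forbidden character set are assumptions, and interwiki prefixes are not handled separately.

```python
import html
import re

# Assumption: a tiny, hypothetical namespace table; a real wiki defines its own.
KNOWN_NAMESPACES = {"talk", "user", "help", "category"}
# Assumption: the characters this sketch treats as forbidden in a title.
FORBIDDEN_CHARS = re.compile(r"[<>\[\]{}|]")
MAX_TITLE_BYTES = 255


class InvalidTitle(ValueError):
    """Raised when a title breaks one of the 'forbid' or size rules."""


def normalize_title(raw):
    """Return (namespace, title, fragment) following the steps listed above."""
    # 1. Decode character references such as &eacute;
    text = html.unescape(raw)
    # 2. Convert spaces to underscores (runs are collapsed, edges trimmed)
    text = re.sub(r"[ _]+", "_", text).strip("_")
    # 3. Split off a namespace prefix if it is one we know about
    namespace = ""
    prefix, sep, rest = text.partition(":")
    if sep and prefix.lower() in KNOWN_NAMESPACES:
        namespace = prefix.capitalize()
        text = rest.strip("_")
    # 4. Remove the hash fragment (Apple#Name -> Apple)
    text, _, fragment = text.partition("#")
    text = text.rstrip("_")
    # 5. Remove forbidden characters, as the list describes it
    text = FORBIDDEN_CHARS.sub("", text)
    # 6. Forbid subdirectory links: any path segment equal to "." or ".."
    if any(segment in (".", "..") for segment in text.split("/")):
        raise InvalidTitle("relative path segments are not allowed")
    # 7. Forbid triple tilde sequences
    if "~~~" in text:
        raise InvalidTitle("title contains ~~~")
    # 8. Limit the size to 255 bytes (measured here as UTF-8)
    if len(text.encode("utf-8")) > MAX_TITLE_BYTES:
        raise InvalidTitle("title is longer than 255 bytes")
    if not text:
        raise InvalidTitle("title is empty")
    # 9. Capitalise the first letter
    return namespace, text[0].upper() + text[1:], fragment


print(normalize_title("caf&eacute; au lait#History"))
```

With these assumptions the call above gives ('', 'Café_au_lait', 'History'), while something like "../directory/page" raises InvalidTitle. I chose to reject oversized titles rather than truncate them; that is a guess on my part, so check secureAndSplit for the exact behaviour.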
In addition, I believe I am right in saying that quotation marks do not need to be encoded by the original user; browsers can handle that transparently.
I hope this helps!
lonesomeday