Because some of the phrases you want to link contain other phrases you also want to link, and the generated links themselves contain these phrases, you need to do this in two phases if you want to avoid complex regular expressions:
Phase 1: Replace each phrase with a unique placeholder that will not match anything else:
- Replace longer phrases before shorter ones, so that you never replace only part of a phrase (for example, the "London" inside "London football events").
- You can store the phrases and the URLs they link to in a SortedDictionary, and provide an IComparer<string> that sorts the strings by length (longest first) and then alphabetically. Note that strings of the same length must still compare as different, or you cannot store them both in the dictionary.
- As each phrase is replaced, generate the link that will eventually replace it, and build a dictionary that maps placeholders to links.
- If you use string.Replace to replace the phrases, the match is case-sensitive, so you need to treat phrases that differ only in capitalization as different phrases; i.e. "Party sites in London" is different from "party sites in London", and each must have its own placeholder.
Phase 2: Replace all placeholder IDs with the generated links.
Here is the class for this:

    using System;
    using System.Collections.Generic;

    class TextLinker : IComparer<string>
    {
        private SortedDictionary<string, string> phrasesToUrls;

        public TextLinker()
        {
            // Pass self as IComparer to sort dictionary using Compare method.
            phrasesToUrls = new SortedDictionary<string, string>(this);
        }

        public void AddLink(string phrase, string URL)
        {
            phrasesToUrls.Add(phrase, URL);
        }

        public string Link(string text)
        {
            // Phase 1: replace phrases to be linked with unique placeholders.
            Dictionary<string, string> placeholdersToLinks =
                new Dictionary<string, string>();
            foreach (KeyValuePair<string, string> pair in phrasesToUrls)
            {
                // Replace phrases with placeholders.
                string placeholder = Guid.NewGuid().ToString();
                text = text.Replace(pair.Key, placeholder);

                // Create dictionary of links by placeholder.
                string link = string.Format(
                    "<a href=\"{0}\">{1}</a>", pair.Value, pair.Key);
                placeholdersToLinks.Add(placeholder, link);
            }

            // Phase 2: replace unique placeholders with links.
            foreach (KeyValuePair<string, string> pair in placeholdersToLinks)
            {
                text = text.Replace(pair.Key, pair.Value);
            }
            return text;
        }

        public int Compare(string x, string y)
        {
            // Longer strings sort first.
            if (x.Length > y.Length) return -1;
            if (x.Length < y.Length) return +1;

            // Equal length strings still need to be differentiated, otherwise
            // they will be treated as the same key by the dictionary. An ordinal
            // comparison returns 0 only for identical strings.
            return string.CompareOrdinal(x, y);
        }
    }

And here is an example of its use:

    string input = "London is a great city and have football events "
        + "in London but party sites in London are also good. London "
        + "football events are great along with London party sites. "
        + "Enjoy London!";

    TextLinker linker = new TextLinker();
    linker.AddLink(
        "Football events in London", "http://www.mysite/footbal-events/london");
    linker.AddLink(
        "football events in London", "http://www.mysite/footbal-events/london");
    linker.AddLink(
        "London football events", "http://www.mysite/footbal-events/london");
    linker.AddLink(
        "London", "http://www.mysite/london-events/london");
    linker.AddLink(
        "Party sites in London", "http://www.mysite/party-sites/london");
    linker.AddLink(
        "party sites in London", "http://www.mysite/party-sites/london");
    linker.AddLink(
        "London party sites", "http://www.mysite/party-sites/london");

    string output = linker.Link(input);

You can also overload the AddLink method to automatically generate the alternative capitalizations of each phrase.
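As a rough sketch of that idea, the helper below produces the first-letter capitalization variants of a phrase. The names PhraseVariants and FirstLetterCaseVariants are hypothetical and not part of the class above, and varying only the first letter is an assumption about which capitalizations matter; you may need other variants.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the original TextLinker class.
static class PhraseVariants
{
    // Returns the phrase as given, plus versions whose first character is
    // upper-cased and lower-cased. Duplicates are skipped, so a phrase that
    // starts with a digit or symbol yields a single entry.
    public static IEnumerable<string> FirstLetterCaseVariants(string phrase)
    {
        if (string.IsNullOrEmpty(phrase))
            yield break;

        var seen = new HashSet<string>(StringComparer.Ordinal);
        string rest = phrase.Substring(1);
        string[] candidates =
        {
            phrase,
            char.ToUpperInvariant(phrase[0]) + rest,
            char.ToLowerInvariant(phrase[0]) + rest
        };

        foreach (string candidate in candidates)
        {
            if (seen.Add(candidate))
                yield return candidate;
        }
    }
}
```

An AddLink overload could then loop over these variants and call the existing AddLink(phrase, URL) once per variant. Skipping duplicates matters here, because SortedDictionary.Add throws an ArgumentException when the same key is added twice.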