I am trying to write a sed script that finds every bare URL in a text file and replaces it with <a href=[URL]>[URL]</a>. By "bare" I mean a URL that is not already wrapped in an anchor tag.
My initial thought was to match URLs that are not preceded by " or >, and that are also not followed by < or ". However, I am having difficulty expressing "not preceded by" and "not followed by", because as far as I know sed does not support lookahead or lookbehind.
Input Example:
[Beginning of File]http://foo.bar arbitrary text
http://test.com other text
<a href="http://foobar.com">http://foobar.com</a>
Nearing end of file!!! http://yahoo.com[End of File]
An example of the desired result:
[Beginning of File]<a href="http://foo.bar">http://foo.bar</a> arbitrary text
<a href="http://test.com">http://test.com</a> other text
<a href="http://foobar.com">http://foobar.com</a>
Nearing end of file!!! <a href="http://yahoo.com">http://yahoo.com</a>[End of File]
Note that the third line is unchanged because its URL is already inside <a href>. The URLs on the first and second lines, on the other hand, are both wrapped. Finally, note that text that is not a URL is left unmodified.
Ultimately, I'm trying to do something like:
sed s/[^>"](http:\/\/[^\s]\+)/<a href="\1">\1<\/a>/g 2-7-2013
I started by verifying that the following would match and remove the URL:
sed 's/http:\/\/[^\s]\+//g'
Then I tried this, but it fails to match a URL that starts at the very beginning of the file/input:
sed 's/[^\>"]http:\/\/[^\s]\+//g'
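The failure can be reproduced on a single line. This is only a minimal sketch, assuming a sed that supports -E (GNU and BSD sed both do). It uses [^[:space:]] in place of [^\s], because inside a POSIX bracket expression \s is read as the two literal characters backslash and s rather than as whitespace. The variable names are just for illustration.

```shell
# The leading [^>"] has to consume one real character, so a URL in
# column 1 has nothing in front of it for that class to match.
line='http://foo.bar arbitrary text http://test.com'
result=$(printf '%s\n' "$line" | sed -E 's/[^>"]http:\/\/[^[:space:]]+//g')
printf '%s\n' "$result"
# prints: http://foo.bar arbitrary text
```

The first URL survives, and the second one is removed together with the space in front of it, since that space is part of the match.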
Is there a way around this in sed, either by emulating lookbehind/lookahead or by explicitly matching the beginning and end of the file?
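One possible direction, sketched here under assumptions rather than offered as a definitive solution: capture either the start of the line or a single permitted preceding character in a group, and put that group back in the replacement. The function name linkify is hypothetical, the sketch assumes sed -E, and the sample input leaves out the [Beginning of File] and [End of File] markers on the assumption that they are not literal text.

```shell
# Emulate "not preceded by > or a double quote": group 1 is either the
# start of the line or one allowed character, and it is written back
# unchanged. Group 2 is the URL itself, stopping at whitespace, < or ".
linkify() {
  sed -E 's/(^|[^>"])(http:\/\/[^[:space:]<"]+)/\1<a href="\2">\2<\/a>/g'
}

input='http://foo.bar arbitrary text
http://test.com other text
<a href="http://foobar.com">http://foobar.com</a>
Nearing end of file!!! http://yahoo.com'

output=$(printf '%s\n' "$input" | linkify)
printf '%s\n' "$output"
```

This only looks at the one character before each URL, so it would still wrap a URL that appears inside anchor text after other words (for example <a href="...">see http://...</a>), and it works line by line rather than on the file as a whole.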
regex regex-negation awk sed regex-lookarounds
merlin2011