Regular expression to strip subdomains down to the root domain in a list, using Notepad++ or gvim

Question


I have a list of URLs stored in a TXT file (I am using Windows 7).

The format of the URLs is as follows:

somesite1.com
somesite2.com
somesite3.com
sub1.somesite3.com
sub2.somesite3.com
sub3.somesite3.com
sub1.somesite3.net
sub1.somesite1.org

Notepad++ can do “find and replace with regular expressions,” and I'm sure gvim allows the use of regular expressions (although I'm not quite sure how to use them in gvim).

In any case, I do not know what to put in the search and replace fields so that it goes through the file and leaves me with only the root domains. Done correctly, this would turn the example list above into the following:

 somesite1.com
 somesite2.com
 somesite3.com
 somesite3.com
 somesite3.com
 somesite3.com
 somesite3.net
 somesite1.org

Can someone help me?

0

vim regex notepad++

Learning Jun 22 2018-11-11T00:


3 answers

Replace ^[^.]*\.(?=\w+\.\w+$) with <blank>

Broken down, this means:

^ = start of line
[^.]* = any number of characters that are not dots
\. = a literal dot
(?=\w+\.\w+$) = lookahead: from here to the end of the line there must be exactly one word, one dot, then one word
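To try the pattern outside an editor, here is a minimal Python sketch. It assumes Python's `re` module treats this particular pattern the same way Notepad++ does, and the function name `strip_one_subdomain` is made up for illustration. Each line is processed on its own so that `^` and `$` mean start and end of that line.

```python
import re

# The search pattern from the answer above, unchanged.
PATTERN = re.compile(r"^[^.]*\.(?=\w+\.\w+$)")

def strip_one_subdomain(line: str) -> str:
    """Remove the first label when exactly two word-labels follow it."""
    return PATTERN.sub("", line)

for host in ["somesite1.com", "sub1.somesite3.com", "a.b.somesite3.com"]:
    print(host, "->", strip_one_subdomain(host))
```

Note that the pattern removes only a single leading label: a line with two or more subdomain levels, such as a.b.somesite3.com, does not satisfy the lookahead and is left unchanged.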

EDITED - added a lookahead for another dot

EDITED AGAIN - changed the lookahead to require exactly one dot between the words

+1

Bohemian Jun 22 2018-11-11T00:


Replace the entire line with its last two dot-separated words.

 %s/^.*\.\(\w\+\.\w\+\)$/\1/g

Note that Vim requires a backslash before ( , ) and + , as in \+

UPDATE:

 %s/^.*\.\([0-9a-z\-]\+\.[0-9a-z\-]\+\)$/\1/g

is perhaps better, since it also accepts hyphens in the domain labels.
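The difference between the two commands shows up on hyphenated domains. The sketch below is a rough Python translation of the two Vim patterns (Python needs no backslashes before the parentheses and `+`); the names `WORD_ONLY`, `WITH_HYPHEN` and `root_domain` are made up for illustration, and the test host is an invented example.

```python
import re

# Python equivalents of the two Vim substitutions above.
WORD_ONLY = re.compile(r"^.*\.(\w+\.\w+)$")
WITH_HYPHEN = re.compile(r"^.*\.([0-9a-z-]+\.[0-9a-z-]+)$")

def root_domain(line: str, pattern: re.Pattern) -> str:
    """Replace the whole line with the captured last two labels."""
    return pattern.sub(r"\1", line)

host = "sub.my-site.com"
print(root_domain(host, WORD_ONLY))    # unchanged: \w does not match "-"
print(root_domain(host, WITH_HYPHEN))  # my-site.com
```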

0

mattn Jun 22 2018-11-11T00:


Chris Morgan · Accepted Answer · 2011-06-22 01:38

A couple of ways to do this in Vim (the trailing slashes are optional):

 :%s/^.\+\.\ze[^.]\+\.[^.]\+$//
 :%s/^.\+\.\([^.]\+\.[^.]\+\)$/\1/

See also :help /\ze , etc. \ze and \zs are Vim-specific and very useful. There are also lookbehind and lookahead assertions, which can be useful in both Vim and PCRE.

I believe Notepad++ uses PCRE; finding ^.+\.([^.]+\.[^.]+)$ and replacing it with \1 should work (though I don't use Notepad++ myself).
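For anyone who wants to check that find/replace pair without Notepad++, here is a small Python sketch run against the list from the question. It assumes Python's `re` module handles this pattern the same way; `root_domains` and `SAMPLE` are names made up for illustration. The lines are processed one at a time because `[^.]` would otherwise also match a newline and let a match run across two lines.

```python
import re

# Same find pattern and \1 replacement as suggested for Notepad++.
PATTERN = re.compile(r"^.+\.([^.]+\.[^.]+)$")

SAMPLE = [
    "somesite1.com", "somesite2.com", "somesite3.com",
    "sub1.somesite3.com", "sub2.somesite3.com", "sub3.somesite3.com",
    "sub1.somesite3.net", "sub1.somesite1.org",
]

def root_domains(lines: list[str]) -> list[str]:
    """Apply the substitution to each line independently."""
    return [PATTERN.sub(r"\1", line) for line in lines]

print("\n".join(root_domains(SAMPLE)))
```

Lines that contain only one dot do not match and pass through unchanged, so the result is the list the question asks for.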

Be aware that this will not work for country-code top-level domains that register names at the third level: example.com.au will be converted to com.au. And then there are some countries that allow registrations at the second or the third level according to particular rules. If you care about these cases, you will need more rules, and a full parser will be more accurate than a regular expression (although, as always, it would be possible with regular expressions).
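As a rough illustration of what "more rules" could look like, the Python sketch below keeps three labels instead of two when the last two labels form a known multi-part suffix. The suffix set here is a tiny hand-written assumption, not a real data source, and `registrable_domain` is a made-up name; a serious version would need a maintained list such as the Public Suffix List.

```python
# Hypothetical, deliberately tiny set of two-label suffixes (an assumption
# for illustration only; real-world coverage needs a maintained list).
MULTI_PART_SUFFIXES = {"com.au", "co.uk"}

def registrable_domain(host: str) -> str:
    """Return the last two labels, or three under a known two-label suffix."""
    labels = host.lower().split(".")
    keep = 3 if ".".join(labels[-2:]) in MULTI_PART_SUFFIXES else 2
    return ".".join(labels[-keep:])

for host in ["sub1.somesite3.com", "www.example.com.au", "example.com.au"]:
    print(host, "->", registrable_domain(host))
```

With this set, www.example.com.au and example.com.au both come out as example.com.au rather than com.au, while ordinary names are handled as before.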

