How do you extract IP addresses from files using regex in linux shell?

How to extract text part using regular expressions in linux shell? Suppose I have a file with an IP address on each line, but in a different position. What is the easiest way to extract these IP addresses using regular Unix command line tools?

+56
command-line linux unix bash regex
Jan 09 '09 at 13:05
19 answers

You can use grep to pull them out.

grep -o '[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}' file.txt 
+112
Jan 09 '09 at 13:11

Most of the examples here will also match 999.999.999.999, which is not a technically valid IP address.

The following will only match valid IP addresses (including network and broadcast addresses).

 grep -E -o '(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)' file.txt 

Omit -o if you want to see the whole line that matches.
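
One caveat is worth checking before relying on this: without word boundaries, grep -o can still carve a valid-looking address out of a longer run of digits. The sketch below assumes GNU grep (for \b support); extract_ipv4 and extract_ipv4_strict are hypothetical helper names, and the sample lines are made up.

```shell
# Octet alternation taken from the answer above (0-255, leading zeros allowed).
octet='(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)'
ipv4="${octet}\\.${octet}\\.${octet}\\.${octet}"

# Hypothetical helper: print every match found on stdin, one per line.
extract_ipv4() {
    grep -E -o "$ipv4"
}

# Same pattern wrapped in \b (a GNU grep extension), so a match cannot
# start or end in the middle of a longer run of digits.
extract_ipv4_strict() {
    grep -E -o "\\b${ipv4}\\b"
}

sample='up 192.168.0.1
bad 999.999.999.999
odd 256.1.1.1'

printf '%s\n' "$sample" | extract_ipv4          # also prints 56.1.1.1
printf '%s\n' "$sample" | extract_ipv4_strict   # prints only 192.168.0.1
```

The unbounded version rejects 999.999.999.999 entirely but still extracts 56.1.1.1 from 256.1.1.1, which is rarely what you want in a log.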

+41
Jan 09 '09 at 13:46

I usually start with grep to get the regex right.

 # [multiple failed attempts here]
 grep '[0-9]*\.[0-9]*\.[0-9]*\.[0-9]*' file                      # good?
 grep -E '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' file   # good enough

Then I try to convert it to sed to filter out the rest of the line. (After reading this thread, you and I will not do that any more: we will use grep -o instead.)

 sed -ne 's/.*\([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\).*/\1/p' # FAIL

This is where sed usually annoys me by not using the same regular expression syntax as everything else, so I turn to perl.

 $ perl -nle '/[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}/ and print $&' 

Perl is good to know anyway. If you have a little bit of CPAN installed, you can even make it more reliable at little cost:

 $ perl -MRegexp::Common=net -nE '/$RE{net}{IPV4}/ and say $&' file(s) 
+11
Jan 9 '09 at 13:35

This works great for me in access logs.

 cat access_log | egrep -o '([0-9]{1,3}\.){3}[0-9]{1,3}' 

Let me break it down piece by piece.

  • [0-9]{1,3} means one to three occurrences of the range given in []. In this case the range is 0-9, so it matches patterns such as 10 or 183.

  • It is followed by a '.'. We need to escape it, because . is a metacharacter with a special meaning in regular expressions (it matches any character).

So now we match patterns like '123.', '12.', and so on.

  • This pattern, including the '.', is repeated three times, so we enclose it in parentheses: ([0-9]{1,3}\.){3}

  • Finally, the pattern is repeated once more, this time without the '.', which is why it is kept separate from the repeated group: [0-9]{1,3}

If the IPs are at the beginning of each line, as in my case, use:

 egrep -o '^([0-9]{1,3}\.){3}[0-9]{1,3}' 

where '^' is the anchor that restricts the match to the beginning of the line.
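
To see what the anchor changes, here is a small sketch on a made-up log line whose query string contains a second address. The function names all_ips and client_ip are hypothetical; grep -E is used in place of egrep because newer GNU grep releases print an obsolescence warning for egrep.

```shell
# Hypothetical log line; the query string deliberately holds a second address.
line='203.0.113.7 - - [29/Aug/2014:18:08:00 +0000] "GET /?from=10.0.0.1 HTTP/1.1" 200 512'

# Unanchored: prints every dotted quad on the line.
all_ips() { grep -E -o '([0-9]{1,3}\.){3}[0-9]{1,3}'; }

# Anchored with ^: prints only the address that starts the line.
client_ip() { grep -E -o '^([0-9]{1,3}\.){3}[0-9]{1,3}'; }

printf '%s\n' "$line" | all_ips     # 203.0.113.7 and 10.0.0.1
printf '%s\n' "$line" | client_ip   # 203.0.113.7 only
```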

+11
Aug 29 '14 at 18:08

I wrote a little script to get a better view of my log files. It is nothing special, but it may help people who are learning perl. It does a DNS lookup on each IP address after extracting it.

+3
Jan 14 2018-11-11T00:

I wrote an informative blog article on this topic: How to extract IPv4 and IPv6 IP addresses from plain text using Regex.

The article is a detailed guide to the most common IP address patterns that need to be extracted and isolated from plain text using regular expressions.
The guide is based on the source code of the CodVerter IP Extractor tool, which handles extraction and detection of IP addresses.

If you want to validate and capture an IPv4 address, this pattern can do the job:

 \b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)[.]){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b 

or, to validate and capture an IPv4 address with a prefix length (slash notation):

 \b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)[.]){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)/[0-9]{1,2}\b 

or to capture a subnet mask:

 (255|254|252|248|240|224|192|128|0)[.](255|254|252|248|240|224|192|128|0)[.](255|254|252|248|240|224|192|128|0)[.](255|254|252|248|240|224|192|128|0) 

or, to filter out subnet mask addresses, use a regular expression with a negative lookahead:

 \b((?!(255|254|252|248|240|224|192|128|0)[.](255|254|252|248|240|224|192|128|0)[.](255|254|252|248|240|224|192|128|0)[.](255|254|252|248|240|224|192|128|0)))(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)[.]){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b 

To verify IPv6, you can follow the link to the article that I added at the top of this answer.
Here is an example that captures all the common patterns (taken from the CodVerter IP Extractor help example):

[image: CodVerter IP Extractor example capture]

If you want, you can test the IPv4 regex here.
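
These patterns rely on Perl-style constructs such as (?: ) and (?! ), so from the shell they need a PCRE-capable tool. A minimal sketch, assuming GNU grep built with -P support; extract_cidr is a hypothetical helper name, and the prefix length is placed after the final octet group so that it applies to every alternative.

```shell
# One octet, 0-255, as a non-capturing group.
octet='(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)'

# Hypothetical helper: print address/prefix pairs such as 10.0.0.0/8.
# -P (Perl-compatible regex) is a GNU grep option.
extract_cidr() {
    grep -P -o "\\b(?:${octet}[.]){3}${octet}/[0-9]{1,2}\\b"
}

printf '%s\n' 'route 10.0.0.0/8 via 192.168.1.1' 'net 192.168.1.0/24' | extract_cidr
```

Addresses without a slash, such as 192.168.1.1 in the first sample line, are skipped.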

+3
Jan 06 '19 at 20:06

You can use some shell helpers that I wrote: https://github.com/philpraxis/ipextract

They are included here for convenience:

 #!/bin/sh
 ipextract () {
   egrep --only-matching -E '(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)'
 }
 ipextractnet () {
   egrep --only-matching -E '(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)/[[:digit:]]+'
 }
 ipextracttcp () {
   egrep --only-matching -E '[[:digit:]]+/tcp'
 }
 ipextractudp () {
   egrep --only-matching -E '[[:digit:]]+/udp'
 }
 ipextractsctp () {
   egrep --only-matching -E '[[:digit:]]+/sctp'
 }
 ipextractfqdn () {
   egrep --only-matching -E '[a-zA-Z0-9]+[a-zA-Z0-9\-\.]*\.[a-zA-Z]{2,}'
 }

Load (source) it from the shell, assuming it is stored in a file named ipextract:

 $ . ipextract

Use them:

 $ ipextract < /etc/hosts
 127.0.0.1
 255.255.255.255
 $

For an example of real use:

 ipextractfqdn < /var/log/snort/alert | sort -u
 dmesg | ipextractudp
+2
Feb 22 '14 at 23:15

grep -E -o "([0-9]{1,3}[.]){3}[0-9]{1,3}"

+2
Feb 14 '15 at 20:29

You can use sed. But if you know perl, this may be simpler, and perl is more useful to know in the long run:

 perl -ne '/(\d+\.\d+\.\d+\.\d+)/ && print "$1\n"' < file 
+1
Jan 09 '09 at 13:14

For those who want a ready-made solution for getting IP addresses from an Apache log and counting how many times each IP address hit the website, use this line:

 grep -Eo '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' error.log | sort | uniq -c | sort -nr > occurences.txt 

This is a good way to ban hackers. Next you can:

  1. Delete the lines with fewer than 20 visits
  2. Using a regex, cut everything up to the single space so that only the IP addresses remain
  3. Using a regex, cut the last 1-3 digits of each IP address so that only network addresses remain
  4. Add deny from and a space at the beginning of each line
  5. Save the resulting file as .htaccess
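
The five steps can also be chained in one pipeline. This is a rough sketch rather than a finished tool: make_deny_list is a hypothetical function name, the threshold argument is an addition for testing, and the output uses the older deny from syntax named in the steps (newer Apache versions prefer Require directives).

```shell
# Hypothetical helper: read a log on stdin and print "deny from" lines for
# the networks of addresses seen at least $1 times (default 20).
make_deny_list() {
    min_hits=${1:-20}
    grep -Eo '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' |
        sort | uniq -c |
        awk -v min="$min_hits" '
            $1 >= min {                    # step 1: drop rarely seen addresses
                ip = $2                    # step 2: keep only the address
                sub(/\.[0-9]+$/, "", ip)   # step 3: strip the last octet
                print "deny from " ip      # step 4: prefix each line
            }' |
        sort -u                            # several hosts may share a network
}

# Step 5, left commented out so nothing is overwritten by accident:
# make_deny_list < error.log > .htaccess
```

Review the generated list before deploying it, since a whole /24 is blocked for every address that crosses the threshold.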
+1
Apr 08 '19 at 16:23

I would suggest perl. (\d+\.\d+\.\d+\.\d+) should probably do the trick.

EDIT: to make it look more like a complete program, you can do something like the following (not tested):

 #!/usr/bin/perl -w
 use strict;

 while (<>) {
     if (/(\d+\.\d+\.\d+\.\d+)/) {
         print "$1\n";
     }
 }

This handles a single IP address per line. If you have more than one IP address per line, you need to use the /g option. man perlretut gives you a more detailed tutorial on regular expressions.

0
Jan 09 '09 at 13:08

You can also use awk. Something like...

awk '{i = 1; if (NF > 0) do {if ($i ~ /regexp/) print $i; i++;} while (i <= NF);}' file

- It may need some cleaning up; this is just a quick and dirty answer to show basically how to do it with awk.
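
A tidied-up sketch of the same idea, with the regexp placeholder filled in by a simple dotted-quad test. The function name awk_fields_ipv4 is hypothetical, and [0-9]+ is used instead of {1,3} as an assumption about portability, since not every awk implementation supports interval expressions.

```shell
# Hypothetical helper: print every whitespace-separated field that consists
# entirely of four dot-separated numbers. Reads the named files, or stdin.
awk_fields_ipv4() {
    awk '{
        for (i = 1; i <= NF; i++)
            if ($i ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/)
                print $i
    }' "$@"
}

printf 'x 10.1.2.3 y\n1.2.3 8.8.8.8,\nlast 172.16.0.9\n' | awk_fields_ipv4
```

Because the test is applied to whole fields, an address glued to punctuation (8.8.8.8, in the sample) is not printed; grep -o handles that case better.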

0
Jan 09 '09 at 13:28

All previous answers have one or more problems. The accepted answer allows IP numbers such as 999.999.999.999. The currently second most upvoted answer requires zero-padding, such as 127.000.000.001 or 008.008.008.008 instead of 127.0.0.1 or 8.8.8.8. Apama's answer is almost correct, but that expression requires the IP number to be the only thing on the line, allows no leading or trailing space, and cannot pick an IP out of the middle of a line.

I think the correct regular expression can be found at http://www.regextester.com/22

So, if you want to extract all ip addresses from a file, use:

 grep -Eo "(([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.){3}([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])" file.txt 

If you do not want duplicates, use:

 grep -Eo "(([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.){3}([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])" file.txt | sort | uniq 

Please comment if there are still problems in this regex. It is easy to find a lot of incorrect regular expressions for this problem; I hope this one has no real issues.

0
Jul 28 '17 at 21:37

Everyone here is using really long regular expressions, but a proper understanding of POSIX regular expressions lets you use a short grep command like this to print IP addresses.

 grep -Eo "(([0-9]{1,3})\.){3}([0-9]{1,3})" 

Note: this does not reject invalid IP addresses, but it is very simple.

0
Mar 10 '18 at 8:53

I tried all the answers, but each of them had one or more problems. Here are some of them:

  1. Some detected 123.456.789.111 as a valid IP
  2. Some did not detect 127.0.00.1 as a valid IP
  3. Some did not detect IPs that start with a zero, like 08.8.8.8

Therefore, here I publish a regular expression that works on all of the above conditions.

Note: I extracted over 2 million IPs without any problems with the following regex.

 (?:(?:1\d\d|2[0-5][0-5]|2[0-4]\d|0?[1-9]\d|0?0?\d)\.){3}(?:1\d\d|2[0-5][0-5]|2[0-4]\d|0?[1-9]\d|0?0?\d) 
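
The pattern uses \d and (?: ), so from the shell it needs a PCRE-capable tool. A minimal sketch, assuming GNU grep built with -P support; extract_ipv4_pcre is a hypothetical helper name, and the sample lines are the three cases listed above.

```shell
# The pattern from the answer above, stored in a variable for readability.
re='(?:(?:1\d\d|2[0-5][0-5]|2[0-4]\d|0?[1-9]\d|0?0?\d)\.){3}(?:1\d\d|2[0-5][0-5]|2[0-4]\d|0?[1-9]\d|0?0?\d)'

# Hypothetical helper; -P (Perl-compatible regex) is a GNU grep option.
extract_ipv4_pcre() {
    grep -P -o "$re"
}

printf '%s\n' 'ok 127.0.00.1' 'ok 08.8.8.8' 'bad 123.456.789.111' | extract_ipv4_pcre
```

The first two sample addresses are printed and the third is not. The pattern has no word boundaries, so a partial match inside a longer digit run is still possible for other inputs.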
0
Mar 13 '18 at 8:08

I wanted to get only IP addresses that start with "10" from any file in the directory:

 grep -o -nr "10\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}" /var/www 
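
When matching a fixed first octet, it helps to spell it literally and to bound it: a bracket expression such as [10] matches one character that is either 1 or 0, and an unbounded 10 also matches the tail of 110. The sketch below assumes GNU grep for \b; extract_net10 is a hypothetical helper name that reads stdin.

```shell
# Hypothetical helper: print addresses whose first octet is exactly 10.
# \b is a GNU grep extension that keeps 110.x.x.x from matching.
extract_net10() {
    grep -E -o '\b10(\.[0-9]{1,3}){3}\b'
}

printf '%s\n' 'a 10.1.2.3 b' 'c 110.4.5.6' 'd 01.2.3.4' 'e 11.0.0.1' | extract_net10
```

Only 10.1.2.3 is printed for the sample input.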
0
Jul 13 '19 at 0:05

If you are not given a specific file and need to extract IP addresses, you have to search recursively. The grep command searches text or files for lines matching a given pattern and prints the matching lines.

grep -roE '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' | grep -oE '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}'

-r → We can search the entire directory tree, that is, the current directory and all levels of subdirectories. It stands for recursive search.

-o → Print only the matching part of each line

-E → Use extended regular expression

If we did not use the second grep command after the pipe, we would get each IP address together with the path of the file in which it was found.
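
An alternative to the second grep is the -h (--no-filename) option, which suppresses the path prefix directly. A small sketch, assuming GNU grep; extract_ipv4_tree is a hypothetical helper name that takes the directory to search as its argument.

```shell
# Hypothetical helper: recursively extract addresses below directory $1.
# -h (--no-filename) drops the "path:" prefix, so a single grep suffices.
extract_ipv4_tree() {
    grep -rhoE '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' "$1"
}

# Usage (the directory is only an example):
# extract_ipv4_tree /var/log | sort -u
```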

-1
Sep 13 '15 at 21:02
 cat ip_address.txt | grep '^[0-9]\{1,3\}[.][0-9]\{1,3\}[.][0-9]\{1,3\}[.][0-9]\{1,3\}[,].*$\|^.*[,][0-9]\{1,3\}[.][0-9]\{1,3\}[.][0-9]\{1,3\}[.][0-9]\{1,3\}[,].*$\|^.*[,][0-9]\{1,3\}[.][0-9]\{1,3\}[.][0-9]\{1,3\}[.][0-9]\{1,3\}$' 

Suppose the file is comma-separated and the IP address can appear at the beginning, at the end, or somewhere in the middle of a line.

The first alternative matches an IP address at the beginning of the line. The second alternative, after the \|, looks for an IP address in the middle of the line. Each number must be 1 to 3 digits long and the address must be delimited by commas, so false IPs such as 12345.12.34.1 are excluded.

The third regexp looks for the ip address at the end of the line
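
The three alternatives can be folded into one extended regular expression by making the delimiters themselves alternatives. This is a sketch under that assumption, not a drop-in equivalent: unlike the original, it also accepts a line that consists of nothing but an address. The name csv_lines_with_ipv4 is hypothetical.

```shell
# Hypothetical helper: print lines in which one whole comma-separated field
# is a dotted quad of 1-3 digit numbers. Reads the named files, or stdin.
csv_lines_with_ipv4() {
    grep -E '(^|,)([0-9]{1,3}\.){3}[0-9]{1,3}(,|$)' "$@"
}

printf '%s\n' '1.2.3.4,foo' 'foo,5.6.7.8,bar' 'foo,9.9.9.9' \
    'foo,12345.12.34.1,bar' 'nothing' | csv_lines_with_ipv4
```

The first three sample lines are printed; the line containing 12345.12.34.1 is not.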

-1
Dec 02 '15 at 12:25

For CentOS 6.3:

ifconfig eth0 | grep 'inet addr' | awk '{print $2}' | awk 'BEGIN {FS=":"} {print $2}'

-2
Mar 04 '13 at