Selecting specific columns from df -h output in Python

I am trying to create a simple script that will select specific columns from the output of the Unix df -h command. I can do this with awk, but how can it be done in Python?

Here is the df -h output:

 Filesystem                   Size  Used Avail Use% Mounted on
 /dev/mapper/vg_base-lv_root   28G  4.8G   22G  19% /
 tmpfs                        814M  176K  814M   1% /dev/shm
 /dev/sda1                    485M  120M  340M  27% /boot

I need something like:

Column 1:

 Filesystem
 /dev/mapper/vg_base-lv_root
 tmpfs
 /dev/sda1

Column 2:

 Size
 28G
 814M
 485M

You can use os.popen to run the command and capture its output, then splitlines and split to separate the lines and fields. Run df -Ph rather than df -h so that lines are not wrapped when a filesystem name is too long.

 import os

 df_output_lines = [s.split() for s in os.popen("df -Ph").read().splitlines()]

The result is a list of lists of strings, one inner list per line. To extract the first column, you can use [line[0] for line in df_output_lines] (note that columns are numbered from 0), and so on. You can use df_output_lines[1:] instead of df_output_lines to skip the header line.
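A minimal sketch of this column extraction, using sample df -Ph output embedded as a string so it runs anywhere; on a real system you would replace sample_output with os.popen("df -Ph").read():

```python
# Sample df -Ph output (an assumption standing in for a live call to df).
sample_output = """\
Filesystem                  Size  Used Avail Use% Mounted on
/dev/mapper/vg_base-lv_root  28G  4.8G   22G  19% /
tmpfs                       814M  176K  814M   1% /dev/shm
/dev/sda1                   485M  120M  340M  27% /boot
"""

# Split into lines, then each line into whitespace-separated fields.
df_output_lines = [s.split() for s in sample_output.splitlines()]

# Column 1 (index 0): the filesystem names, including the header...
column1 = [line[0] for line in df_output_lines]

# ...and the same column with the header line skipped.
filesystems = [line[0] for line in df_output_lines[1:]]

print(column1)
print(filesystems)
```
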

If you already have the df -h output stored in a file somewhere, you need to join the wrapped lines first.

 import re

 fixed_df_output = re.sub(r'\n\s+', ' ', raw_df_output.read())
 df_output_lines = [s.split() for s in fixed_df_output.splitlines()]

Note that this assumes that neither the filesystem name nor the mount point contains spaces. If they do (which is possible with some setups on some Unix variants), the output of df is almost impossible to parse, even with df -P. You can use os.statvfs to get information about a given filesystem (this is the Python interface to the C function that df calls internally for each filesystem), but there is no portable way to list filesystems.
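A short sketch of the os.statvfs route, querying a single mount point directly instead of parsing df output; "/" here is just an example mount point, and the field names come from the POSIX statvfs structure:

```python
import os

# Query one filesystem directly; no parsing of df output involved.
st = os.statvfs("/")

block_size = st.f_frsize                 # fundamental filesystem block size
total_bytes = st.f_blocks * block_size   # total size of the filesystem
free_bytes = st.f_bavail * block_size    # space available to non-root users
used_bytes = (st.f_blocks - st.f_bfree) * block_size

print("total: %d MiB" % (total_bytes // 2**20))
print("free:  %d MiB" % (free_bytes // 2**20))
```
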


Here is a complete example:

 import subprocess
 import re

 p = subprocess.Popen("df -h", stdout=subprocess.PIPE, shell=True)
 dfdata, _ = p.communicate()
 # Join "Mounted on" so the header splits into the same number of fields
 # as the data lines.
 dfdata = dfdata.decode().replace("Mounted on", "Mounted_on")
 columns = [list() for i in range(10)]
 for line in dfdata.split("\n"):
     line = re.sub(" +", " ", line)
     for i, l in enumerate(line.split(" ")):
         columns[i].append(l)
 print(columns[0])

Mount points are assumed to not contain spaces.

Below is a more complete (and more complicated) solution that does not hard-code the number of columns:

 import subprocess
 import re

 def yield_lines(data):
     for line in data.split("\n"):
         yield line

 def line_to_list(line):
     return re.sub(" +", " ", line).split()

 p = subprocess.Popen("df -h", stdout=subprocess.PIPE, shell=True)
 dfdata, _ = p.communicate()
 dfdata = dfdata.decode().replace("Mounted on", "Mounted_on")
 lines = yield_lines(dfdata)
 headers = line_to_list(next(lines))
 columns = [list() for i in range(len(headers))]
 for i, h in enumerate(headers):
     columns[i].append(h)
 for line in lines:
     for i, l in enumerate(line_to_list(line)):
         columns[i].append(l)
 print(columns[0])

Not an answer to the question, but I tried to solve the problem. :)

 from os import statvfs

 with open("/proc/mounts", "r") as mounts:
     split_mounts = [s.split() for s in mounts.read().splitlines()]

 print("{0:24} {1:24} {2:16} {3:16} {4:15} {5:13}".format(
     "FS", "Mountpoint", "Blocks", "Blocks Free", "Size", "Free"))

 for p in split_mounts:
     stat = statvfs(p[1])
     block_size = stat.f_bsize
     blocks_total = stat.f_blocks
     blocks_free = stat.f_bavail
     size_mb = float(blocks_total * block_size) / 1024 / 1024
     free_mb = float(blocks_free * block_size) / 1024 / 1024
     print("{0:24} {1:24} {2:16} {3:16} {4:10.2f}MiB {5:10.2f}MiB".format(
         p[0], p[1], blocks_total, blocks_free, size_mb, free_mb))

Do not use os.popen, as it is deprecated (http://docs.python.org/library/os#os.popen).

I put the output of df -h in a file, test.txt, and simply read from that file, but you can read it directly using the subprocess module. Assuming you can read every line of the df -h output, the following code will help:

 f = open('test.txt')
 lines = (line.strip() for line in f.readlines())
 f.close()
 splittedLines = (line.split() for line in lines)
 listOfColumnData = zip(*splittedLines)
 for eachColumn in listOfColumnData:
     print(eachColumn)

eachColumn will be the entire column you want, as a tuple, and you can simply iterate over it. If you need it, I can give code to read the output of df -h directly so you can remove the dependency on test.txt, but the subprocess documentation shows how to do this easily.
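A sketch of the same zip-transpose idea reading df output via subprocess instead of test.txt (using df -P so long filesystem names are not wrapped); this assumes df is on the PATH and that mount points contain no spaces:

```python
import subprocess

# Capture df output directly instead of reading it from a file.
output = subprocess.check_output(["df", "-Ph"], text=True)

# Strip and drop blank lines, split each line into fields, then
# transpose rows into columns with zip(*...).
lines = (line.strip() for line in output.splitlines() if line.strip())
splittedLines = (line.split() for line in lines)
listOfColumnData = list(zip(*splittedLines))

for eachColumn in listOfColumnData:
    print(eachColumn)
```

Note that zip truncates to the shortest row, so a mount point containing a space would silently shift and drop fields, the same caveat as the file-based version.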


I had a mount point with a space in it, which broke most of the examples here. This borrows heavily from @ZarrHai's answer, but puts the result in a dict.

 #!/usr/bin/python
 import subprocess
 import re
 from pprint import pprint

 DF_OPTIONS = "-laTh"  # remove h if you want bytes

 def yield_lines(data):
     for line in data.split("\n"):
         yield line

 def line_to_list(line):
     # Match a filesystem name that may contain spaces, then the
     # type, size, used, available, use% and mount point fields.
     pattern = re.compile(r"([\w\/\s\-\_]+)\s+(\w+)\s+([\d\.]+?[GKM]|\d+)"
                          r"\s+([\d\.]+[GKM]|\d+)\s+([\d\.]+[GKM]|\d+)\s+"
                          r"(\d+%)\s+(.*)")
     matches = pattern.search(line)
     if matches:
         return matches.groups()
     # Fall back to plain whitespace splitting (e.g. for the header).
     _line = re.sub(r" +", " ", line).split()
     return _line

 p = subprocess.Popen(["df", DF_OPTIONS], stdout=subprocess.PIPE)
 dfdata, _ = p.communicate()
 dfdata = dfdata.decode().replace("Mounted on", "Mounted_on")
 lines = yield_lines(dfdata)
 headers = line_to_list(next(lines))
 columns = [list() for i in range(len(headers))]
 for i, h in enumerate(headers):
     columns[i].append(h)
 grouped = {}
 for li, line in enumerate(lines):
     if not line:
         continue
     grouped[li] = {}
     for i, l in enumerate(line_to_list(line)):
         columns[i].append(l)
         key = headers[i].lower().replace("%", "")
         grouped[li][key] = l.strip()
 pprint(grouped)

It works:

 #!/usr/bin/python
 import os
 import re

 l = []
 p = os.popen('df -h')
 for line in p.readlines():
     l.append(re.split(r'\s{2,}', line.strip()))
 p.close()
 for subl in l:
     print(subl)

I found this to be an easy way to do it:

 df -h | awk '{print $1}' 
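If you still want the result in Python, the same awk pipeline can be driven from a script; a sketch that assumes df and awk are on the PATH (shell=True runs the string through the shell, so quote carefully):

```python
import subprocess

# Run the shell pipeline and capture the first column, one entry per line.
first_column = subprocess.check_output(
    "df -Ph | awk '{print $1}'", shell=True, text=True).splitlines()

print(first_column)
```
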