How to split a DOS path into its components in Python

I have a string variable that represents a dos path, for example:

var = "d:\stuff\morestuff\furtherdown\THEFILE.txt"

I want to split this line into:

[ "d", "stuff", "morestuff", "furtherdown", "THEFILE.txt" ]

I tried using split() and replace(), but they either only process the first backslash or insert hex numbers into the string.

I need to somehow convert this string variable to a raw string so that I can parse it.

What is the best way to do this?

I should also add that the contents of var, i.e. the path I'm trying to parse, is actually the return value of a command-line query. It is not path data that I generate myself. It is stored in a file, and the command-line tool is not going to escape the backslashes.

+114
python
Jul 02 '10 at 15:41
18 answers

I have been bitten many times by people writing their own path-fiddling functions and getting them wrong. Spaces, slashes, backslashes, colons: the possibilities for confusion are not endless, but mistakes are easily made anyway. So I'm a stickler for the use of os.path, and recommend it on that basis.

(However, the path to virtue is not the one most easily taken, and many people, on discovering this, are tempted to take a slippery path straight to damnation. They won't realise until one day everything falls to pieces, and they (or, more likely, somebody else) have to work out why everything has gone wrong, and it turns out somebody made a filename that mixes slashes and backslashes, and somebody suggests that the answer is "not to do that". Don't be any of these people. Except for the one who mixed up slashes and backslashes; you could be them if you like.)

You can get the drive and path + file like this:

    import os
    drive, path_and_file = os.path.splitdrive(path)

Get the path and the file:

    path, file = os.path.split(path_and_file)

Getting the individual folder names is not especially convenient, but it is the sort of honest, middling discomfort that heightens the pleasure of later finding something that actually works well:

    folders = []
    while 1:
        path, folder = os.path.split(path)
        if folder != "":
            folders.append(folder)
        else:
            if path != "":
                folders.append(path)
            break
    folders.reverse()

(This leaves a "\" at the start of folders if the path was originally absolute. You could drop a bit of code if you don't want that.)
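
The steps above can be folded into one helper. This is only a sketch: the name split_dos_path is hypothetical, and it uses ntpath (the Windows implementation behind os.path) so that it behaves the same on any platform.

```python
import ntpath  # Windows implementation of os.path, importable on any platform


def split_dos_path(path):
    """Return the drive (colon stripped) followed by each folder and the file name."""
    drive, rest = ntpath.splitdrive(path)
    parts = []
    while rest:
        head, tail = ntpath.split(rest)
        if head == rest:  # no progress: only a bare root such as "\\" is left
            break
        if tail:
            parts.append(tail)
        rest = head
    parts.reverse()
    prefix = [drive.rstrip(":")] if drive else []
    return prefix + parts


print(split_dos_path(r"d:\stuff\morestuff\furtherdown\THEFILE.txt"))
```

For the path in the question this should print the list the asker wanted, starting with 'd' and ending with 'THEFILE.txt'. The root "\" is dropped on purpose here; keep it if you need to reassemble the path later.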

+139
Jul 02 '10 at 17:01

I would do

    import os
    path = os.path.normpath(path)
    path.split(os.sep)

First, normalize the path string to the correct string for the OS. Then os.sep should be safe to use as a delimiter when splitting a string.
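
A minimal illustration of why normalizing first helps, assuming a path that mixes both kinds of slash. It calls ntpath directly (the module os.path resolves to on Windows) so the result does not depend on where it runs; the variable names are made up for the example.

```python
import ntpath  # Windows path rules, available on every platform

mixed = "d:/stuff\\morestuff/./furtherdown\\THEFILE.txt"

# normpath converts "/" to "\" and drops the redundant "." component.
normalized = ntpath.normpath(mixed)
components = normalized.split(ntpath.sep)

print(components)
```

On a non-Windows system, os.path.normpath follows POSIX rules and leaves backslashes alone, which is why the sketch names ntpath explicitly.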

+211
May 16 '13 at 19:00

You can simply use the most Pythonic approach (IMHO):

    import os
    your_path = r"d:\stuff\morestuff\furtherdown\THEFILE.txt"
    path_list = your_path.split(os.sep)
    print(path_list)

Which will give you (on Windows, where os.sep is '\\'):

    ['d:', 'stuff', 'morestuff', 'furtherdown', 'THEFILE.txt']

The key here is to use os.sep instead of '\\' or '/', as this makes it system independent.

To remove the colon from the drive letter (although I don't see any reason why you would want to do that), you can write:

    path_list[0] = path_list[0][0]
+72
Jan 15 '13 at 9:38

In Python >= 3.4 this has become much simpler. You can now use pathlib.Path.parts to get all the parts of a path.

Example:

    >>> from pathlib import Path
    >>> Path('C:/path/to/file.txt').parts
    ('C:\\', 'path', 'to', 'file.txt')
    >>> Path(r'C:\path\to\file.txt').parts
    ('C:\\', 'path', 'to', 'file.txt')

On a Windows installation of Python 3, this assumes that you are working with Windows paths, and on *nix it assumes that you are working with POSIX paths. This is usually what you want, but if it isn't, you can use the classes pathlib.PurePosixPath or pathlib.PureWindowsPath as needed:

    >>> from pathlib import PurePosixPath, PureWindowsPath
    >>> PurePosixPath('/path/to/file.txt').parts
    ('/', 'path', 'to', 'file.txt')
    >>> PureWindowsPath(r'C:\path\to\file.txt').parts
    ('C:\\', 'path', 'to', 'file.txt')
    >>> PureWindowsPath(r'\\host\share\path\to\file.txt').parts
    ('\\\\host\\share\\', 'path', 'to', 'file.txt')

Edit: There is also a backport to Python 2 available: pathlib2
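
To get exactly the list asked for in the question (drive letter without the colon or root), one possible sketch is shown below. It assumes an absolute path with a drive letter, so that the first entry of parts is the anchor "d:\"; the variable names are illustrative.

```python
from pathlib import PureWindowsPath

p = PureWindowsPath(r"d:\stuff\morestuff\furtherdown\THEFILE.txt")

# p.parts begins with the anchor "d:\\", while p.drive is just "d:".
components = [p.drive.rstrip(":")] + list(p.parts[1:])

print(components)
```

PureWindowsPath never touches the file system, so this also works when the path comes from another machine or from a file.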

+38
Aug 03 '16 at 10:45

The problem here starts with how you are creating the string in the first place.

    a = "d:\stuff\morestuff\furtherdown\THEFILE.txt"

Done this way, Python tries to treat \s, \m, \f and \T as escape sequences. In your case, \f is treated as a form feed (0x0C), while the other backslashes are handled correctly because they do not form recognised escapes. What you need to do is one of these:

    b = "d:\\stuff\\morestuff\\furtherdown\\THEFILE.txt"  # doubled backslashes
    c = r"d:\stuff\morestuff\furtherdown\THEFILE.txt"     # raw string, no doubling necessary

Then once you split either of these, you'll get the result you want.
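
A small check of the difference between the three spellings, using a short made-up literal rather than the full path:

```python
plain = "a\fb"     # \f is a recognised escape: a single form feed character
doubled = "a\\fb"  # an escaped backslash followed by the letter f
raw = r"a\fb"      # raw literal: the backslash is kept exactly as typed

print(len(plain), len(doubled), len(raw))
print(plain.split("\\"))  # nothing to split on, the backslash is gone
print(raw.split("\\"))    # splits into the two expected pieces
```

Only the literal in source code is affected. A string that arrives at run time, for example from a file or a command's output, already contains real backslashes and needs no conversion.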

+11
Jul 02 '10 at 16:15

For a more concise solution, consider the following:

    import os

    def split_path(p):
        a, b = os.path.split(p)
        return (split_path(a) if len(a) and len(b) else []) + [b]
+9
Feb 24 '13 at 10:43

I can't actually contribute a real answer to this one (as I came here hoping to find one myself), but to me the number of differing approaches and all the caveats mentioned is the surest indicator that Python's os.path module desperately needs this as a built-in function.

+4
May 15 '13 at 13:42

This works for me:

    >>> a = r"d:\stuff\morestuff\furtherdown\THEFILE.txt"
    >>> a.split("\\")
    ['d:', 'stuff', 'morestuff', 'furtherdown', 'THEFILE.txt']

Of course, you might also need to strip the colon from the first component, but keeping it makes it possible to reassemble the path.

The r modifier marks the string literal as "raw"; notice how the embedded backslashes are not doubled.

+3
Jul 02 '10 at 15:43

The stuff about mypath.split("\\") would be better expressed as mypath.split(os.sep). os.sep is the directory separator for your particular platform (e.g. \ for Windows, / for Unix, etc.), and the Python build knows which one to use. If you use os.sep, your code will be platform agnostic. (os.pathsep is a different constant: it separates the entries of a search path such as PATH, and is ; on Windows and : on Unix.)
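
For reference, here is a small check of what the similarly named constants hold. It reads them from ntpath and posixpath, the two modules that os takes sep and pathsep from, so it gives the same result on any platform. The variable names are only for illustration.

```python
import ntpath
import posixpath

print(repr(ntpath.sep), repr(ntpath.pathsep))        # values os.sep / os.pathsep have on Windows
print(repr(posixpath.sep), repr(posixpath.pathsep))  # values they have on POSIX systems

path = r"d:\stuff\morestuff\furtherdown\THEFILE.txt"
by_sep = path.split(ntpath.sep)          # splits between directories
by_pathsep = path.split(ntpath.pathsep)  # no ";" in the path, so nothing is split

print(by_sep)
print(by_pathsep)
```

Splitting on sep is what breaks a single path into its directories; pathsep is only useful for a list of paths such as the PATH environment variable.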

+1
Jul 02 2018-10-02T00:

Suppose you have a file filedata.txt with the content:

    d:\stuff\morestuff\furtherdown\THEFILE.txt
    d:\otherstuff\something\otherfile.txt

You can read and split the file paths:

    >>> for i in open("filedata.txt").readlines():
    ...     print(i.strip().split("\\"))
    ...
    ['d:', 'stuff', 'morestuff', 'furtherdown', 'THEFILE.txt']
    ['d:', 'otherstuff', 'something', 'otherfile.txt']
+1
Jul 12 '10 at 11:19

re.split() can help a little more than str.split():

    import re
    var = r"d:\stuff\morestuff\furtherdown\THEFILE.txt"
    re.split(r'[\\/]', var)
    # ['d:', 'stuff', 'morestuff', 'furtherdown', 'THEFILE.txt']

If you also want to support Linux and Mac paths, just add filter(None, result) so that it removes the unwanted '' entries from the split(), since those paths start with '/' or '//', for example '//mount/...' or '/var/tmp/':

    import re
    var = "/var/stuff/morestuff/furtherdown/THEFILE.txt"
    result = re.split(r'[\\/]', var)
    list(filter(None, result))
    # ['var', 'stuff', 'morestuff', 'furtherdown', 'THEFILE.txt']
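
The two steps can be wrapped into one function. A sketch, with the hypothetical name split_any_path; it splits on runs of either separator, so doubled slashes do not produce empty pieces in the middle, and it drops the empty pieces at the ends.

```python
import re


def split_any_path(path):
    # Split on one or more "/" or "\" characters and discard empty pieces.
    return [piece for piece in re.split(r"[\\/]+", path) if piece]


print(split_any_path(r"d:\stuff\morestuff\furtherdown\THEFILE.txt"))
print(split_any_path("//mount/var/tmp/"))
```

Note that this treats the path purely as text: it cannot tell a UNC host name or a root from an ordinary folder, which is the kind of detail os.path and pathlib take care of.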
+1
Oct 08 '13 at 18:09

A functional way, with a generator. The components come out last to first:

    import os

    def split(path):
        (drive, head) = os.path.splitdrive(path)
        # Stop at the root, or at "" so that relative paths terminate too.
        while head not in (os.sep, ""):
            (head, tail) = os.path.split(head)
            yield tail

In action:

    >>> print([x for x in split(os.path.normpath('/path/to/filename'))])
    ['filename', 'to', 'path']
+1
Nov 05 '15 at 8:55

You can recursively os.path.split the string:

    import os

    def parts(path):
        p, f = os.path.split(path)
        return parts(p) + [f] if f else [p]

Testing this against some path strings, and reassembling the path with os.path.join:

    >>> for path in [
    ...         r'd:\stuff\morestuff\furtherdown\THEFILE.txt',
    ...         '/path/to/file.txt',
    ...         'relative/path/to/file.txt',
    ...         r'C:\path\to\file.txt',
    ...         r'\\host\share\path\to\file.txt',
    ...         ]:
    ...     print parts(path), os.path.join(*parts(path))
    ...
    ['d:\\', 'stuff', 'morestuff', 'furtherdown', 'THEFILE.txt'] d:\stuff\morestuff\furtherdown\THEFILE.txt
    ['/', 'path', 'to', 'file.txt'] /path\to\file.txt
    ['', 'relative', 'path', 'to', 'file.txt'] relative\path\to\file.txt
    ['C:\\', 'path', 'to', 'file.txt'] C:\path\to\file.txt
    ['\\\\', 'host', 'share', 'path', 'to', 'file.txt'] \\host\share\path\to\file.txt

The first element of the list may need to be treated differently depending on how you want to deal with drive letters, UNC paths, and absolute and relative paths. Changing the last [p] to [os.path.splitdrive(p)] forces the issue by splitting the drive letter and directory root out into a tuple.

    import os

    def parts(path):
        p, f = os.path.split(path)
        return parts(p) + [f] if f else [os.path.splitdrive(p)]

    [('d:', '\\'), 'stuff', 'morestuff', 'furtherdown', 'THEFILE.txt']
    [('', '/'), 'path', 'to', 'file.txt']
    [('', ''), 'relative', 'path', 'to', 'file.txt']
    [('C:', '\\'), 'path', 'to', 'file.txt']
    [('', '\\\\'), 'host', 'share', 'path', 'to', 'file.txt']

Edit: I have realised that this answer is very similar to the one given above by user1556435. I'm leaving my answer up because the handling of the drive component of the path is different.

+1
Sep 22 '16 at 7:24

As others have explained, your problem stems from using \, which is the escape character in a string literal/constant. OTOH, if you had that file path string from another source (read from a file, the console, or returned by an os function), there would be no problem splitting on '\\'.

And as others have suggested, if you want to use \ in a literal in your program, you have to either double it (\\) or prefix the whole literal with r, like r'lite\ral' or r"lite\ral", to stop the parser from converting that \ and r to a CR (carriage return) character.

There is one more way, though: just don't use backslash path names in your code! Since the last century, Windows has recognized and worked fine with path names that use a forward slash / as the directory separator. Somehow not many people know that, but it works:

    >>> var = "d:/stuff/morestuff/furtherdown/THEFILE.txt"
    >>> var.split('/')
    ['d:', 'stuff', 'morestuff', 'furtherdown', 'THEFILE.txt']

This, by the way, will make your code work on Unix, Windows and Mac, because all of them accept / as a directory separator, even if you don't want to use the predefined constants of the os module.
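
One way to convince yourself that the two spellings name the same Windows path is to compare them with pathlib.PureWindowsPath, which parses both separators. A small sketch with illustrative variable names:

```python
from pathlib import PureWindowsPath

forward = PureWindowsPath("d:/stuff/morestuff/furtherdown/THEFILE.txt")
backward = PureWindowsPath(r"d:\stuff\morestuff\furtherdown\THEFILE.txt")

print(forward == backward)  # both literals parse to the same path
print(str(forward))         # rendered with backslashes
print(forward.as_posix())   # rendered with forward slashes
```

This only shows how Python parses the strings; whether a particular Windows program accepts forward slashes on its command line is a separate matter.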

0
Jul 08 2018-10-10T00: 00Z

I use the following. Since it uses the os.path.basename function, it doesn't add any slashes to the returned list. It also works with either kind of slash, i.e. Windows' \\ or Unix's /. Furthermore, it doesn't add the \\\\ that Windows uses for server paths :)

    import os

    def SplitPath(split_path):
        pathSplit_lst = []
        while os.path.basename(split_path):
            pathSplit_lst.append(os.path.basename(split_path))
            split_path = os.path.dirname(split_path)
        pathSplit_lst.reverse()
        return pathSplit_lst

So for '\\\\server\\folder1\\folder2\\folder3\\folder4'

you get

    ['server', 'folder1', 'folder2', 'folder3', 'folder4']

0
Apr 03 2018-12-12T00:

Actually, I'm not sure this fully answers the question, but I had fun writing this little function, which keeps a stack, sticks to os.path-based manipulations, and returns the list/stack of items.

    from os.path import split

    def components(path):
        ret = []
        while len(path) > 0:
            head, crust = split(path)
            if head == path:  # reached a root such as "/" that cannot be split further
                ret.insert(0, head)
                break
            path = head
            ret.insert(0, crust)
        return ret
0
Jul 03 '15 at 0:55

The line of code below can handle:

  1. C:/path/path
  2. C://path//path
  3. C:\path\path
  4. C:\\path\\path

    import re
    path = re.split(r'[/\\]+', path)

0
Apr 19 '19 at 12:46

Use ntpath.split().
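
A quick illustration of what that call returns. ntpath is the Windows flavour of os.path and can be imported on any platform; note that it splits off only the last component, so it would need to be applied repeatedly to get every part.

```python
import ntpath

head, tail = ntpath.split(r"d:\stuff\morestuff\furtherdown\THEFILE.txt")

print(head)  # everything before the last backslash
print(tail)  # the last component
```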

-1
Jul 02 '10 at 15:44


