Is there a smart way to combine overlapping paths in Python?

Say I have two path names, head and tail. They can overlap by any number of segments. If they do not overlap, I would like to join them normally. If they do, I would like to detect the common part and merge them accordingly. More specifically, if segment names are repeated, I would like to match the longest possible overlap. Example:

"/root/d1/d2/d1/d2" + "d2/d1/d2/file.txt" == "/root/d1/d2/d1/d2/file.txt" and not "/root/d1/d2/d1/d2/d1/d2/file.txt" 

Is there any ready-to-use library function for such a case, or should I implement one?

3 answers

You can use a list comprehension inside join:

    >>> p1 = "/root/d1/d2/d1/d2"
    >>> p2 = "d2/d1/d2/file.txt"
    >>> p1 + '/' + '/'.join([i for i in p2.split('/') if i not in p1.split('/')])
    '/root/d1/d2/d1/d2/file.txt'

Or, if the only difference is the base name of the second path, you can use os.path.basename to get the file name and append it to p1:

    >>> import os
    >>> p1 + '/' + os.path.basename(p2)
    '/root/d1/d2/d1/d2/file.txt'
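Note that the list comprehension drops every segment of p2 that appears anywhere in p1, not only the overlapping run at the boundary. A minimal sketch of that behavior, wrapping the same expression in a helper called filter_join (a name made up here for illustration):

```python
def filter_join(p1, p2):
    # Same expression as in the answer: keep only the segments of p2
    # that do not occur anywhere in p1.
    head_segments = p1.split('/')
    kept = [seg for seg in p2.split('/') if seg not in head_segments]
    return p1 + '/' + '/'.join(kept)


# Works for the example from the question.
print(filter_join("/root/d1/d2/d1/d2", "d2/d1/d2/file.txt"))

# Here "a" is not an overlap at the boundary, yet it is removed from the tail.
print(filter_join("/root/a/b", "c/a/file.txt"))
```

The second call gives /root/a/b/c/file.txt instead of /root/a/b/c/a/file.txt, so this shortcut is probably only safe when the tail shares no other segment names with the head.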

I suggest you use difflib.SequenceMatcher followed by get_matching_blocks:

    >>> import difflib
    >>> p1, p2 = "/root/d1/d2/d1/d2", "d2/d1/d2/file.txt"
    >>> sm = difflib.SequenceMatcher(None, p1, p2)
    >>> size = sm.get_matching_blocks()[0].size
    >>> path = p1 + p2[size:]
    >>> path
    '/root/d1/d2/d1/d2/file.txt'

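And a general solution, which accepts the two paths in either order:

    import difflib

    def join_overlapping_path(p1, p2):
        sm = difflib.SequenceMatcher(None, p1, p2)
        p1i, p2i, size = sm.get_matching_blocks()[0]
        # If the match starts at index 0 of neither path, there is no
        # head/tail overlap to merge (assumed intent of the original check).
        if p1i and p2i:
            return None
        # The path in which the match starts at index 0 is the tail.
        head, tail = (p1, p2) if p2i == 0 else (p2, p1)
        return head + tail[size:]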

Execution

    >>> join_overlapping_path(p1, p2)
    '/root/d1/d2/d1/d2/file.txt'
    >>> join_overlapping_path(p2, p1)
    '/root/d1/d2/d1/d2/file.txt'
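One caveat, as far as I can tell: the first matching block is not guaranteed to end exactly at the end of the head, so an unrelated common substring can be mistaken for an overlap. A sketch of an extra check follows; first_block and is_true_overlap are hypothetical helper names, not part of difflib:

```python
import difflib


def first_block(head, tail):
    """Return the first matching block between the two strings."""
    matcher = difflib.SequenceMatcher(None, head, tail, autojunk=False)
    return matcher.get_matching_blocks()[0]


def is_true_overlap(head, tail):
    """True only if the block ends `head` and starts `tail`."""
    a, b, size = first_block(head, tail)
    return size > 0 and b == 0 and a + size == len(head)


# The example from the question: the block covers "d2/d1/d2".
print(first_block("/root/d1/d2/d1/d2", "d2/d1/d2/file.txt"))
print(is_true_overlap("/root/d1/d2/d1/d2", "d2/d1/d2/file.txt"))

# "abc/" is common to both, but it is not at the end of the head.
print(is_true_overlap("/root/abc/x", "abc/file.txt"))
```

In the last case the plain p1 + p2[size:] approach would glue the strings together as /root/abc/xfile.txt, which is why checking the block position first seems worthwhile.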

I think this works:

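    p1 = "/root/d1/d2/d1/d2"
    p2 = "d2/d1/d2/file.txt"

    def find_joined_path(p1, p2):
        # Try each suffix of p1, longest first, against the start of p2.
        for i in range(len(p1)):
            if p1[i:] == p2[:len(p1) - i]:
                return p1[:i] + p2
        # Falls through and returns None when nothing overlaps.

    print(find_joined_path(p1, p2))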

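Note that this is a general solution that works for any two strings, so it may not be as well suited as a solution that only deals with file paths. It compares character by character, so it can also match part of a segment name: "/root/abc" and "bc/file.txt" would be merged into "/root/abc/file.txt".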
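For file paths specifically, the same idea can be applied to whole segments instead of characters. A rough sketch, assuming POSIX-style paths separated by "/"; the name join_overlapping is mine, and it falls back to posixpath.join when nothing overlaps:

```python
import posixpath


def join_overlapping(head, tail):
    """Merge two "/"-separated paths on their longest segment overlap."""
    head_parts = head.rstrip("/").split("/")
    tail_parts = tail.split("/")
    # Try the longest candidate overlap first, then shrink it.
    for size in range(min(len(head_parts), len(tail_parts)), 0, -1):
        if head_parts[-size:] == tail_parts[:size]:
            return "/".join(head_parts + tail_parts[size:])
    # No common segments at the boundary: join normally.
    return posixpath.join(head, tail)


print(join_overlapping("/root/d1/d2/d1/d2", "d2/d1/d2/file.txt"))
print(join_overlapping("/root/a", "b/c.txt"))
print(join_overlapping("/root/abc", "bc/file.txt"))
```

Because the comparison is done on lists of segments, "abc" and "bc" are never treated as overlapping, and the longest overlap wins when segment names repeat.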
