Python: access data in a package subdirectory

I am writing a Python package with modules that need to open data files in a ./data/ subdirectory. Right now I have the paths to the files hard-coded into my classes and functions. I would like to write more robust code that can access the subdirectory regardless of where it is installed on the user's system.

I have tried various methods, but so far without luck. It seems that most of the "current directory" commands return the directory of the system Python interpreter, not the directory of the module.

This seems like it should be a trivial, common problem, yet I can't figure it out. Part of the problem is that my data files are not .py files, so I cannot use import functions and the like.

Any suggestions?

My package directory now looks like this:

    /
    __init__.py
    module1.py
    module2.py
    data/
        data.txt

I am trying to access data.txt from module*.py

Thanks!

+65
python packages
Apr 22 '09 at 22:17
5 answers

You can use __file__ (underscore-underscore-file-underscore-underscore) to get the path to the package, for example:

    import os

    # Directory that contains this module
    this_dir, this_filename = os.path.split(__file__)
    DATA_PATH = os.path.join(this_dir, "data", "data.txt")
    with open(DATA_PATH) as f:
        print(f.read())
+24
Apr 22 '09 at 22:37

The standard way to do this is with the setuptools and pkg_resources packages.

You can lay out your package according to the following hierarchy and configure the package's setup script to declare its data resources, as described here:

http://docs.python.org/distutils/setupscript.html#installing-package-data
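
As a rough sketch of what that setup script might declare: the package_data argument of setup() maps a package name to glob patterns relative to that package's directory. The package name mypackage and the version below are hypothetical placeholders, not taken from the question.

```python
# Keyword arguments a setup.py could pass to setuptools.setup().
# "mypackage" and the version number are hypothetical placeholders.
SETUP_KWARGS = {
    "name": "mypackage",
    "version": "0.1",
    "packages": ["mypackage"],
    # Patterns are relative to the package directory, so this
    # picks up every file directly inside mypackage/data/.
    "package_data": {"mypackage": ["data/*"]},
}

# In setup.py itself you would then write:
#     from setuptools import setup
#     setup(**SETUP_KWARGS)
```

Keeping the arguments in a plain dictionary is only for illustration; most setup scripts pass them to setup() directly.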

You can then locate and use those files with pkg_resources, as described here:

http://peak.telecommunity.com/DevCenter/PkgResources#basic-resource-access

    import pkg_resources

    DATA_PATH = pkg_resources.resource_filename('<package name>', 'data/')
    DB_FILE = pkg_resources.resource_filename('<package name>', 'data/sqlite.db')
+95
Apr 08 2018-11-11T00:

Here is a solution that works today. Definitely use this API rather than reinventing all of these wheels.

When a true filesystem filename is needed, use resource_filename. Zipped eggs will be extracted to a cache directory:

    from pkg_resources import resource_filename, Requirement

    path_to_vik_logo = resource_filename(
        Requirement.parse("enb.portals"), "enb/portals/reports/VIK_logo.png")

resource_stream returns a readable file-like object for the specified resource; it may be an actual file, a StringIO, or some similar object. The stream is in "binary mode", in the sense that whatever bytes are in the resource will be read as-is.

    from pkg_resources import resource_stream, Requirement

    vik_logo_as_stream = resource_stream(
        Requirement.parse("enb.portals"), "enb/portals/reports/VIK_logo.png")
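
For comparison, Python 3.9 and later ship importlib.resources in the standard library, which covers similar ground. This is a different API from the pkg_resources calls used in this answer, so treat the following as a sketch of an alternative rather than an equivalent. The helper name read_package_text is made up for illustration.

```python
from importlib.resources import files


def read_package_text(package, *parts, encoding="utf-8"):
    """Read a text resource stored inside an importable package."""
    resource = files(package)
    for part in parts:
        # Traversable objects support "/" for descending into children.
        resource = resource / part
    return resource.read_text(encoding=encoding)
```

With a hypothetical package mypackage that contains data/data.txt, the call would look like read_package_text("mypackage", "data", "data.txt").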

Package discovery and resource access with pkg_resources

+11
Oct 09 '14 at 12:33

I think I tracked down an answer.

I create a data_path.py module, which I import into my other modules, containing:

    import os

    data_path = os.path.join(os.path.dirname(__file__), 'data')

And then I open all my files with

 open(os.path.join(data_path,'filename'), <param>) 
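
The same idea can be sketched with pathlib. The helper names data_dir_for and data_file_for are hypothetical; they take the module's file path as an argument, so a module would pass its own __file__.

```python
from pathlib import Path


def data_dir_for(module_file):
    """Return the data/ directory that sits next to the given module file."""
    # resolve() makes the path absolute, so a later change of the
    # working directory does not affect the result.
    return Path(module_file).resolve().parent / "data"


def data_file_for(module_file, filename):
    """Return the path of one file inside that data/ directory."""
    return data_dir_for(module_file) / filename
```

Inside module1.py this would be used as open(data_file_for(__file__, 'data.txt')).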
+6
Apr 22 '09 at 22:35

You need a name for the whole package; the directory tree you posted does not show that part. This worked for me:

    import pkg_resources

    print(pkg_resources.resource_filename(__name__, 'data/data.txt'))

Note that setuptools does not appear to resolve a bare filename against the packaged data files, so you have to include the data/ prefix no matter what. Resource names passed to pkg_resources use '/' as the separator on every platform, so there is no need for os.path.join('data', 'data.txt') here. I have not run into compatibility problems with hard-coded Unix-style directory separators.

+3
Dec 10 '15 at 9:59


