Setuptools: package data folder location

I am using setuptools to distribute my python package. Now I need to distribute additional data files.

From what I gathered from the setuptools documentation, I need to have data files inside the package directory. However, I would prefer my data files to be inside a subdirectory in the root directory.

What I would like to avoid:

    / #root
    |- src/
    |  |- mypackage/
    |  |  |- data/
    |  |  |  |- resource1
    |  |  |  |- [...]
    |  |  |- __init__.py
    |  |  |- [...]
    |- setup.py

What I would like to use instead:

    / #root
    |- data/
    |  |- resource1
    |  |- [...]
    |- src/
    |  |- mypackage/
    |  |  |- __init__.py
    |  |  |- [...]
    |- setup.py

I just don't feel comfortable having so many subdirectories if they are not essential. I cannot find the reason why I *have* to put the files inside the package directory. IMHO, it is also cumbersome to work with so many nested subdirectories. Or is there a good reason that justifies this restriction?

+74
python setuptools
Dec 23 '10 at 13:25
4 answers

Option 1: Install as package data

The main advantage of placing data files inside the root of your Python package is that you do not have to worry about where the files will live on a user's system, which may be Windows, Mac, Linux, some mobile platform, or inside an Egg. You can always find the data directory relative to your Python package root, no matter where or how it is installed.

For example, if I have a project layout like this:

    project/
        foo/
            __init__.py
            data/
                resource1/
                    foo.txt

You can add a function to __init__.py that returns the absolute path to a data file:

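    import os

    # Absolute path of the directory that contains this __init__.py
    _ROOT = os.path.abspath(os.path.dirname(__file__))

    def get_data(path):
        # Build the absolute path of a file inside the package's data/ directory
        return os.path.join(_ROOT, 'data', path)

    print(get_data('resource1/foo.txt'))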

Outputs:

 /Users/pat/project/foo/data/resource1/foo.txt 

After the project is installed as an Egg, the path to data will change, but the code does not need to be changed:

 /Users/pat/virtenv/foo/lib/python2.6/site-packages/foo-0.0.0-py2.6.egg/foo/data/resource1/foo.txt 
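To see that the lookup depends only on where the package itself lives, here is a minimal, self-contained sketch. It builds a throwaway copy of the foo package in a temporary directory, imports it, and resolves a data file through get_data. The temporary location, the file contents, and the helper names build_demo_package and load_demo_resource are assumptions made purely for illustration.

```python
import importlib
import os
import sys
import tempfile

# Source of the helper from the answer, written into a throwaway package.
INIT_SOURCE = '''
import os

_ROOT = os.path.abspath(os.path.dirname(__file__))

def get_data(path):
    return os.path.join(_ROOT, 'data', path)
'''


def build_demo_package(base_dir):
    """Create foo/__init__.py and foo/data/resource1/foo.txt under base_dir."""
    resource_dir = os.path.join(base_dir, 'foo', 'data', 'resource1')
    os.makedirs(resource_dir)
    with open(os.path.join(base_dir, 'foo', '__init__.py'), 'w') as f:
        f.write(INIT_SOURCE)
    with open(os.path.join(resource_dir, 'foo.txt'), 'w') as f:
        f.write('hello from package data\n')


def load_demo_resource():
    """Import the throwaway package and read its data file via get_data()."""
    with tempfile.TemporaryDirectory() as base_dir:
        build_demo_package(base_dir)
        sys.path.insert(0, base_dir)
        importlib.invalidate_caches()
        try:
            foo = importlib.import_module('foo')
            path = foo.get_data(os.path.join('resource1', 'foo.txt'))
            with open(path) as f:
                return path, f.read()
        finally:
            # Leave the interpreter as we found it.
            sys.path.remove(base_dir)
            sys.modules.pop('foo', None)


if __name__ == '__main__':
    path, text = load_demo_resource()
    print(path)
    print(text, end='')
```

The printed path changes on every run because the temporary directory does, yet the code inside the package never has to know about it.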



Option 2: Install to a fixed location

An alternative would be to place your data outside the Python package, and then either:

  • Pass the location of the data in via a configuration file or command-line arguments, or
  • Embed the location in your Python code.

This is much less desirable if you plan to distribute your project. If you really want to do this, you can install your data anywhere you like on the target system by passing in a list of tuples, each of which specifies the destination for a group of files:

    from setuptools import setup

    setup(
        ...
        data_files=[
            ('/var/data1', ['data/foo.txt']),
            ('/var/data2', ['data/bar.txt'])
        ]
    )
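With data installed to a fixed location like this, the code still has to be told where that location is. The following is only a sketch of the "configuration or command-line" route, not something setuptools provides: the --data-dir option, the FOO_DATA_DIR environment variable, and the function names are all hypothetical, and the default is simply taken from the data_files example above.

```python
import argparse
import os

# Hypothetical names, not part of setuptools: the --data-dir option, the
# FOO_DATA_DIR environment variable, and the default borrowed from the
# data_files example.
DEFAULT_DATA_DIR = '/var/data1'
ENV_VAR = 'FOO_DATA_DIR'


def resolve_data_dir(argv=None, environ=None):
    """Return the data directory: CLI option, then environment, then default."""
    environ = os.environ if environ is None else environ
    parser = argparse.ArgumentParser()
    parser.add_argument('--data-dir', default=None)
    # Ignore options this sketch does not know about.
    args, _unknown = parser.parse_known_args(argv)
    return args.data_dir or environ.get(ENV_VAR) or DEFAULT_DATA_DIR


def data_path(name, argv=None, environ=None):
    """Build the path to one data file inside the resolved directory."""
    return os.path.join(resolve_data_dir(argv, environ), name)
```

For example, data_path('foo.txt', ['--data-dir', '/srv/foo']) looks under /srv/foo, while calling it with an empty argument list and an empty environment falls back to /var/data1.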

Update: an example of a shell function to recursively grep Python files:

    atlas% function grep_py { find . -name '*.py' -exec grep -Hn $* {} \; }
    atlas% grep_py ": \["
    ./setup.py:9:    package_data={'foo': ['data/resource1/foo.txt']}
+97
Mar 24 '11 at 17:33

I think I found a good compromise that will allow you to maintain the following structure:

    / #root
    |- data/
    |  |- resource1
    |  |- [...]
    |- src/
    |  |- mypackage/
    |  |  |- __init__.py
    |  |  |- [...]
    |- setup.py

You should install the data as package_data to avoid the problems described in samplebias's answer, but to keep the file structure above you should add the following to your setup.py:

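    import os
    from setuptools import setup

    try:
        # Temporarily expose the top-level data/ directory inside the package
        os.symlink('../../data', 'src/mypackage/data')
        setup(
            ...
            package_data = {'mypackage': ['data/*']}
            ...
        )
    finally:
        # Remove the symlink again so the source tree stays clean
        os.unlink('src/mypackage/data')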

In this way, we create the appropriate structure "just in time" and keep the source tree organized.
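The create-then-clean-up pattern can also be wrapped in a small context manager. This is only a sketch: temporary_symlink and demo are hypothetical names, the layout is recreated in a temporary directory instead of a real project, and it assumes a platform where os.symlink is permitted (on Windows this may require extra privileges).

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def temporary_symlink(target, link_name):
    """Create link_name -> target, and remove the link again on exit."""
    os.symlink(target, link_name)
    try:
        yield link_name
    finally:
        os.unlink(link_name)


def demo():
    """Mimic the layout from the answer inside a temporary directory."""
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, 'data'))
        os.makedirs(os.path.join(root, 'src', 'mypackage'))
        with open(os.path.join(root, 'data', 'resource1'), 'w') as f:
            f.write('payload\n')

        link = os.path.join(root, 'src', 'mypackage', 'data')
        # A relative target is resolved from the directory holding the link.
        with temporary_symlink(os.path.join('..', '..', 'data'), link):
            visible = os.listdir(link)  # what a setup() call would see
        return visible, os.path.lexists(link)
```

Because the symlink is created before the try block, a failure to create it does not trigger an os.unlink on a link that was never made.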

To access such data files in your code, you simply use:

    from pkg_resources import resource_filename, Requirement

    data = resource_filename(Requirement.parse("main_package"), 'mypackage/data')

I still don't like having to specify "mypackage" in the code, since the data may have nothing to do with this module, but I think this is a good compromise.

+8
Oct 23 '14 at 17:31

I use setuptools to create my own OS packages, such as RPM and DEB. I use the following project layout:

    <project>/
        lib/ -> .../lib/pythonX/site-packages/
        bin/ -> .../bin/
        etc/ -> /etc/
        doc/
            man/ -> .../man/man1/
            share/ -> .../share/doc/<project>/

In my setup.py, the mapping is done as shown above. I find this layout ideal for Python. The released packages are relocatable, but by default they install under /usr/local/.

-1
Mar 30 '11 at 5:15

I think you can basically pass anything as the data_files argument to setup().

-3
Dec 23 '10 at 15:23


