Genfromtxt: how to disable caching

Question

I confirmed that the genfromtxt function (and those derived from it) silently caches the remote file it processes in the local directory, and uses the local copy in subsequent calls without checking whether the file has changed. Looking at the npyio.py source file, this seems to be because the DataSource object that handles the request is created without passing the corresponding parameter. Of course, it is easy to change the library sources to disable caching, but then I would have to repeat this after each update.
Is there any other solution (other than deleting the cache directory every time)?
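
To make the caching concrete, here is a small sketch of where DataSource would place the local copy of a remote file. The URL is a made-up placeholder and nothing is downloaded; abspath only computes the path the file would be stored under when the destination is the current directory.

```python
import os
from numpy.lib.npyio import DataSource

# Hypothetical URL, used only to show how the cache path is derived.
url = "http://www.example.com/data/values.csv"

# destpath="." mirrors the behaviour described above: files are cached
# under the current working directory.
ds = DataSource(".")

# abspath() does not touch the network; it returns the location the file
# has (or would have) locally after being opened through this DataSource.
local_copy = ds.abspath(url)
print(local_copy)
```

The printed path contains the host name and the URL path as subdirectories, which is the directory tree that has to be deleted by hand to force a fresh download.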

+4

python numpy

NameOfTheRose May 09 '15 at 11:38

2 answers

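There are really two questions here:

How can you avoid re-patching the library after every update?
What should genfromtxt do differently?

1. Monkey patching lets you change library code (functions, classes) at runtime from your own code, so nothing inside the installed package has to be edited (and nothing is lost when the package is updated).

For example, you can write your own wrapper around genfromtxt: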

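import numpy

# Keep a reference to the original; calling numpy.genfromtxt from inside the
# wrapper would recurse forever once the name has been reassigned.
_original_genfromtxt = numpy.genfromtxt

def patched_gen_from_text(*args, **kwargs):
    # Do something regarding caching
    return _original_genfromtxt(*args, **kwargs)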

Then assign it over numpy.genfromtxt:

import numpy 

numpy.genfromtxt = patched_gen_from_text

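2. What the wrapper should do (no caching at all? cache invalidation?) depends on your situation.

Detecting that the remote file has changed needs some cooperation from the server (for example, a checksum). If you control the server, it could expose an md5 of the file through some RPC call, and you would download again only when the checksum differs.

Otherwise, filecmp can compare a freshly downloaded copy with the cached one, although that means downloading the file anyway.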
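
A minimal sketch of those two checks, using only the standard library. The helper names (md5_of_file, cache_is_stale, same_contents) are made up for illustration, and how remote_md5 is obtained from the server is left out because it depends entirely on the setup.

```python
import filecmp
import hashlib
import os


def md5_of_file(path, chunk_size=65536):
    """Return the hex md5 digest of a local file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def cache_is_stale(cached_path, remote_md5):
    """True when the cached copy is missing or its checksum differs.

    remote_md5 is assumed to come from the server by some other channel.
    """
    if not os.path.exists(cached_path):
        return True
    return md5_of_file(cached_path) != remote_md5


def same_contents(path_a, path_b):
    """Compare two local files byte by byte."""
    # shallow=False forces a content comparison instead of trusting os.stat().
    return filecmp.cmp(path_a, path_b, shallow=False)
```

A patched genfromtxt could call cache_is_stale before deciding whether to remove the cached copy and let numpy download the file again.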

+3

Ami Tavory 09 '15 11:55

NameOfTheRose · Accepted Answer · 2015-05-15T09:40:18+0000

After studying the library source, I realized that the required behavior can be achieved by changing the default values of a small helper function called open in the numpy _datasource module. As suggested above, this is possible without changing the library source. Here is the code I came up with:

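import numpy
from numpy.lib._datasource import DataSource

# Library default: def open(path, mode='r', destpath=os.curdir)
# With destpath=None, DataSource downloads into a temporary directory
# instead of caching in the current directory.
def openm(path, mode='r', destpath=None):
    ds = DataSource(destpath)
    return ds.open(path, mode)

# Replace the module-level helper that genfromtxt calls.
numpy.lib._datasource.open = openm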

This must run before calling genfromtxt or the functions derived from it.
However, my research also showed that these functions are quite slow, and that on Windows a warning is issued when caching is disabled. The warning is not related to the override above; it seems to be related to how the mktemp function is implemented on Windows. In addition, the cached file and its associated temporary directories are not deleted.

One possible fix in numpy itself would be a cache option for genfromtxt.

In npyio.py, the definition of genfromtxt would change from

def genfromtxt(fname, dtype=float, comments='#', delimiter=None,
           skiprows=0, skip_header=0, skip_footer=0, converters=None,
           missing='', missing_values=None, filling_values=None,
           usecols=None, names=None,
           excludelist=None, deletechars=None, replace_space='_',
           autostrip=False, case_sensitive=True, defaultfmt="f%i",
           unpack=None, usemask=False, loose=True, invalid_raise=True):

to (note the added cache parameter)

def genfromtxt(fname, dtype=float, comments='#', delimiter=None,
           skiprows=0, skip_header=0, skip_footer=0, converters=None,
           missing='', missing_values=None, filling_values=None,
           usecols=None, names=None,
           excludelist=None, deletechars=None, replace_space='_',
           autostrip=False, case_sensitive=True, defaultfmt="f%i",
           unpack=None, usemask=False, loose=True, invalid_raise=True, cache=True):

and, where the file/url is opened, replace

if isinstance(fname, basestring):
    fhd = iter(np.lib._datasource.open(fname, 'rbU'))
    own_fhd = True

with

if isinstance(fname, basestring):
    ds = DataSource('.' if cache else None)
    fhd = iter(ds.open(fname, 'rbU' if cache else 'rbUD'))
    own_fhd = True

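The 'D' flag is Windows-specific (it has no effect on Linux); it marks the file as temporary, so that it is deleted when it is closed.
The same can be done with a wrapper, without touching the numpy source: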

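def mgenfromtxt(fname, cache=True, **karg):
    # Only URLs are affected; local files and file objects pass straight through.
    # basestring and the 'U'/'D' mode flags are Python 2 era.
    if not cache and isinstance(fname, basestring) and numpy.DataSource()._isurl(fname):
        ds = numpy.DataSource(None)          # None: download into a temporary directory
        fhd = iter(ds.open(fname, 'rbUD'))   # 'D': the Windows-specific flag noted above
        try:
            return numpy.genfromtxt(fhd, **karg)
        finally:
            fhd.close()                      # close even if parsing fails
            del ds
    else:
        return numpy.genfromtxt(fname, **karg)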
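
On Python 3, a similar effect can be sketched without DataSource at all: fetch the URL on every call and hand genfromtxt an in-memory file object. This swaps in urllib.request for numpy's datasource machinery, so it is a different technique from the wrapper above. The helper name genfromtxt_nocache and the UTF-8 default are assumptions made for illustration, not part of numpy.

```python
import io
import urllib.request

import numpy


def genfromtxt_nocache(url, encoding="utf-8", **kwargs):
    """Download url on every call and parse it from memory (hypothetical helper).

    Nothing is written to disk, so there is no cached copy to go stale.
    The encoding default is an assumption about the remote file.
    """
    with urllib.request.urlopen(url) as response:
        text = response.read().decode(encoding)
    # genfromtxt accepts a file-like object, so no local file is needed.
    return numpy.genfromtxt(io.StringIO(text), **kwargs)
```

The trade-off is that the whole file is held in memory and downloaded again on each call, which is exactly the behavior asked for here but may be slow for large files.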
