csv.readerwill not read the entire file into memory. It lazily iterates over the file line by line when you iterate over an object reader. That way you can just use reader, as usual, but breakwith your iteration after you read, but how many lines you want to read. You can see this in the C code used to implement the objectreader .
Initializer for the reader objecT:
static PyObject *
csv_reader(PyObject *module, PyObject *args, PyObject *keyword_args)
{
PyObject * iterator, * dialect = NULL;
ReaderObj * self = PyObject_GC_New(ReaderObj, &Reader_Type);
if (!self)
return NULL;
self->dialect = NULL;
self->fields = NULL;
self->input_iter = NULL;
self->field = NULL;
self->input_iter = PyObject_GetIter(iterator);
if (self->input_iter == NULL) {
PyErr_SetString(PyExc_TypeError,
"argument 1 must be an iterator");
Py_DECREF(self);
return NULL;
}
static PyObject *
Reader_iternext(ReaderObj *self)
{
PyObject *fields = NULL;
Py_UCS4 c;
Py_ssize_t pos, linelen;
unsigned int kind;
void *data;
PyObject *lineobj;
if (parse_reset(self) < 0)
return NULL;
do {
lineobj = PyIter_Next(self->input_iter);
if (lineobj == NULL) {
if (!PyErr_Occurred() && (self->field_len != 0 ||
self->state == IN_QUOTED_FIELD)) {
if (self->dialect->strict)
PyErr_SetString(_csvstate_global->error_obj,
"unexpected end of data");
else if (parse_save_field(self) >= 0)
break;
}
return NULL;
}
As you can see, it next(reader_object)calls next(file_object)internally. This way you iterate through the lines without reading all of this in memory.
source
share