Return a lazy iterator that depends on the data allocated inside the function

I am new to Rust and have been reading "The Rust Programming Language". In the "Error Handling" chapter there is a "case study" describing a program that reads data from a CSV file using the csv and rustc-serialize crates (with getopts to parse the command-line arguments).

The author writes a search function that steps through the rows of the CSV file using a csv::Reader object, collects into a vector those records whose "city" field matches the specified value, and returns it. I took a slightly different approach from the author, but this should not affect my question. My (working) function looks like this:

```rust
extern crate csv;
extern crate rustc_serialize;

use std::path::Path;
use std::fs::File;

fn search<P>(data_path: P, city: &str) -> Vec<DataRow>
    where P: AsRef<Path>
{
    let file = File::open(data_path).expect("Opening file failed!");
    let mut reader = csv::Reader::from_reader(file).has_headers(true);

    reader.decode()
        .map(|row| row.expect("Failed decoding row"))
        .filter(|row: &DataRow| row.city == city)
        .collect()
}
```

where DataRow is just a plain record type:

```rust
#[derive(Debug, RustcDecodable)]
struct DataRow {
    country: String,
    city: String,
    accent_city: String,
    region: String,
    population: Option<u64>,
    latitude: Option<f64>,
    longitude: Option<f64>,
}
```
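To experiment without the csv and rustc-serialize crates, the same eager search can be sketched with the standard library alone. This is a minimal sketch under stated assumptions: `Row` is a hypothetical two-column stand-in for DataRow, and `parse_row` is a hypothetical naive comma splitter (no quoting support) taking the place of csv's decoding.

```rust
use std::io::{BufRead, Cursor};

/// Hypothetical, trimmed-down stand-in for `DataRow` (two columns only).
#[derive(Debug, Clone, PartialEq)]
struct Row {
    country: String,
    city: String,
}

/// Hypothetical naive parser for one "country,city" line (no CSV quoting).
fn parse_row(line: &str) -> Row {
    let mut fields = line.split(',');
    Row {
        country: fields.next().unwrap_or("").to_string(),
        city: fields.next().unwrap_or("").to_string(),
    }
}

/// Eager version: reads every row and returns a Vec, like the question's `search`.
fn search<R: BufRead>(reader: R, city: &str) -> Vec<Row> {
    reader
        .lines()
        .skip(1) // skip the header row, similar to has_headers(true)
        .map(|line| parse_row(&line.expect("Failed reading line")))
        .filter(|row| row.city == city)
        .collect()
}
```

Everything is read and filtered before `search` returns, which is exactly what the exercise asks to avoid.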

Now the author leaves, as a dreaded "exercise for the reader", the problem of modifying this function to return an iterator instead of a vector (eliminating the call to collect). My question is: how can this be done at all, and what are the most concise and idiomatic ways to do it?


A naive attempt, which I believe has the correct type signature, is:

```rust
fn search_iter<'a, P>(data_path: P, city: &'a str) -> Box<Iterator<Item=DataRow> + 'a>
    where P: AsRef<Path>
{
    let file = File::open(data_path).expect("Opening file failed!");
    let mut reader = csv::Reader::from_reader(file).has_headers(true);

    Box::new(reader.decode()
        .map(|row| row.expect("Failed decoding row"))
        .filter(|row: &DataRow| row.city == city))
}
```

I am returning a trait object of type Box<Iterator<Item=DataRow> + 'a> so as not to expose the internal Filter type, and the lifetime 'a is introduced to avoid having to make a local clone of city. But this fails to compile because reader does not live long enough: it is allocated on the stack and is therefore dropped when the function returns.
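The underlying rule can be seen with the standard library alone. In this minimal sketch (modern Rust with `dyn`, so not for the 1.10 toolchain; `load_cities` is a hypothetical stand-in for reading the file), two things make the returned iterator compile: it owns the data it walks, because `into_iter()` moves the Vec into it, and the filter closure is a `move` closure, so it stores the `&'a str` itself instead of a reference to the local parameter `city`. The filter closure in the attempt above borrows `city` in the same way, so it would likely need `move` as well once the reader problem is solved.

```rust
/// Hypothetical stand-in for opening and decoding the file: returns owned rows.
fn load_cities() -> Vec<String> {
    vec!["paris".to_string(), "berlin".to_string(), "paris".to_string()]
}

fn search_iter<'a>(city: &'a str) -> Box<dyn Iterator<Item = String> + 'a> {
    // `rows` is a local; returning `rows.iter()` would borrow it past the return.
    let rows = load_cities();
    Box::new(
        rows.into_iter() // moves the Vec into the iterator: nothing left to borrow
            .filter(move |row| row.as_str() == city), // `move`: copy `city` in
    )
}
```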

I suppose this means that the reader has to be allocated on the heap (i.e. boxed) from the start, or somehow moved off the stack before the function returns. If I were returning a closure, this is exactly the problem a move closure would solve. But I don't know how to do something similar when I'm not returning a function. I tried defining a custom iterator type that holds the data it needs, but I couldn't get it to work, and it became uglier and more contrived (don't read too much into this code; I only include it to show the general direction of my attempts):

```rust
fn search_iter<'a, P>(data_path: P, city: &'a str) -> Box<Iterator<Item=DataRow> + 'a>
    where P: AsRef<Path>
{
    struct ResultIter<'a> {
        reader: csv::Reader<File>,
        wrapped_iterator: Option<Box<Iterator<Item=DataRow> + 'a>>,
    }

    impl<'a> Iterator for ResultIter<'a> {
        type Item = DataRow;

        fn next(&mut self) -> Option<DataRow> {
            self.wrapped_iterator.unwrap().next()
        }
    }

    let file = File::open(data_path).expect("Opening file failed!");

    // Incrementally initialise
    let mut result_iter = ResultIter {
        reader: csv::Reader::from_reader(file).has_headers(true),
        wrapped_iterator: None, // Uninitialised
    };
    result_iter.wrapped_iterator = Some(Box::new(result_iter.reader
        .decode()
        .map(|row| row.expect("Failed decoding row"))
        .filter(|&row: &DataRow| row.city == city)));

    Box::new(result_iter)
}
```

This question apparently deals with the same problem, but the author of the answer solves it by making the relevant data 'static, which I don't think is an option here.

I am using Rust 1.10.0, the current stable version from the Arch Linux rust package.

1 answer

The easiest way to convert the original function would be to simply wrap the iterator. However, doing so directly leads to problems, because you cannot return an object that refers to itself, and the result of decode refers to the Reader. Even if you could overcome that, an iterator cannot return references to itself.

One solution is to simply recreate the DecodedRecords iterator on every call to your new iterator's next:

```rust
fn search_iter<'a, P>(data_path: P, city: &'a str) -> MyIter<'a>
    where P: AsRef<Path>
{
    let file = File::open(data_path).expect("Opening file failed!");

    MyIter {
        reader: csv::Reader::from_reader(file).has_headers(true),
        city: city,
    }
}

struct MyIter<'a> {
    reader: csv::Reader<File>,
    city: &'a str,
}

impl<'a> Iterator for MyIter<'a> {
    type Item = DataRow;

    fn next(&mut self) -> Option<Self::Item> {
        let city = self.city;

        // Build a fresh decoding adapter on every call; it borrows the reader
        // only for the duration of this call.
        self.reader.decode()
            .map(|row| row.expect("Failed decoding row"))
            .filter(|row: &DataRow| row.city == city)
            .next()
    }
}
```
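The same pattern can be sketched with only the standard library. In this minimal sketch, `CityIter` is a hypothetical analogue of `MyIter`, a `BufRead` plays the role of csv::Reader, and rows are plain "country,city" lines; it uses modern syntax (the `'_` lifetime), so it targets a newer compiler than 1.10.

```rust
use std::io::{BufRead, Cursor};

/// Hypothetical analogue of `MyIter`: owns the reader plus the search key.
struct CityIter<'a, R> {
    reader: R,
    city: &'a str,
}

impl<'a, R: BufRead> Iterator for CityIter<'a, R> {
    type Item = String;

    fn next(&mut self) -> Option<String> {
        let city = self.city;
        // A new `Lines` adapter over `&mut R` is built on every call, like
        // `self.reader.decode()`; the reader itself remembers its position.
        (&mut self.reader)
            .lines()
            .map(|line| line.expect("Failed reading line"))
            .find(|line| line.split(',').nth(1) == Some(city))
    }
}

fn search_iter<R: BufRead>(reader: R, city: &str) -> CityIter<'_, R> {
    CityIter { reader, city }
}
```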

This may carry some overhead, depending on how decode is implemented. It could also "rewind" to the beginning of the input on every call; if you substituted a Vec for the csv::Reader, you would see exactly that. In this case, however, it works, because the reader keeps track of its own position.
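The rewinding problem is easy to reproduce. In the hypothetical sketch below, `RewindingIter` rebuilds its adapter with `self.rows.iter()` on every call, so the search restarts from the front of the Vec and yields the first match forever. `PositionedIter` instead stores a `vec::IntoIter`, which owns the data and remembers where it stopped.

```rust
/// Hypothetical counter-example: the "rebuild the adapter in next()" pattern
/// over a Vec. `iter()` always starts at index 0, so this never advances
/// and, with at least one match, never ends.
struct RewindingIter<'a> {
    rows: Vec<(&'static str, &'static str)>,
    city: &'a str,
}

impl<'a> Iterator for RewindingIter<'a> {
    type Item = (&'static str, &'static str);

    fn next(&mut self) -> Option<Self::Item> {
        let city = self.city;
        self.rows.iter().find(|row| row.1 == city).copied()
    }
}

/// One fix: keep the iterator itself, which tracks its own position.
struct PositionedIter<'a> {
    rows: std::vec::IntoIter<(&'static str, &'static str)>,
    city: &'a str,
}

impl<'a> Iterator for PositionedIter<'a> {
    type Item = (&'static str, &'static str);

    fn next(&mut self) -> Option<Self::Item> {
        let city = self.city;
        self.rows.find(|row| row.1 == city)
    }
}
```

Use `take(n)` when trying `RewindingIter`, since collecting it without a bound never terminates.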

Beyond that, I would normally open the file and create the csv::Reader outside of the function, pass in the DecodedRecords iterator, and transform it, returning a newtype, box, or type alias around the underlying iterator. I prefer this because the structure of the code then mirrors the lifetimes of the objects.
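A minimal sketch of that design, assuming modern Rust (`impl Trait` in return position, stable since 1.26): `search` only adapts an iterator it is given, while the caller owns the reader for as long as the iterator is in use. `count_in` is a hypothetical caller, and an in-memory `Cursor` stands in for the file.

```rust
use std::io::{BufRead, Cursor};

/// Adapts any row iterator; it never sees, opens, or owns the reader.
fn search<'a, I>(rows: I, city: &'a str) -> impl Iterator<Item = String> + 'a
where
    I: Iterator<Item = String> + 'a,
{
    rows.filter(move |row| row.split(',').nth(1) == Some(city))
}

/// Hypothetical caller: owns the reader, so the borrow in `rows` is obviously
/// valid for the whole time the returned iterator is consumed.
fn count_in(data: &str, city: &str) -> usize {
    let mut reader = Cursor::new(data);
    let rows = (&mut reader)
        .lines()
        .map(|line| line.expect("Failed reading line"));
    search(rows, city).count()
}
```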

I am a little surprised that csv::Reader does not implement IntoIterator, which would also solve the problem, because there would be no references involved.
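To illustrate why an owning IntoIterator would help, here is a sketch with a hypothetical `RowReader` type (not part of the csv crate) whose `into_iter` consumes the reader. Because the reader is moved into the returned iterator, `search_iter` can create it locally and still return a lazy boxed iterator. An in-memory `Cursor<String>` stands in for `File::open`, and the code uses modern `dyn` syntax.

```rust
use std::io::{BufRead, BufReader, Cursor, Lines, Read};

/// Hypothetical reader wrapper that can be turned into an iterator by value.
struct RowReader<R> {
    inner: BufReader<R>,
}

impl<R: Read> RowReader<R> {
    fn from_reader(rdr: R) -> Self {
        RowReader { inner: BufReader::new(rdr) }
    }
}

impl<R: Read> IntoIterator for RowReader<R> {
    type Item = std::io::Result<String>;
    type IntoIter = Lines<BufReader<R>>;

    // Consumes `self`: the returned iterator owns the underlying reader.
    fn into_iter(self) -> Self::IntoIter {
        self.inner.lines()
    }
}

fn search_iter<'a>(data: String, city: &'a str) -> Box<dyn Iterator<Item = String> + 'a> {
    // Local reader, but it is moved into the iterator rather than borrowed.
    let reader = RowReader::from_reader(Cursor::new(data));
    Box::new(
        reader
            .into_iter()
            .map(|line| line.expect("Failed reading line"))
            .filter(move |line| line.split(',').nth(1) == Some(city)),
    )
}
```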
