Iterators, Laziness, and Ownership

I'm getting started with Rust and am playing with the regex crate to create a lexer.

The lexer uses one large regular expression containing a bunch of named capture groups. I am trying to take the results of the regular expression and build a Vec<(&str, &str)> of capture name and capture value, but I keep running into lifetime problems with the values returned from the iterator when mapping and filtering over the results.

I think this has something to do with laziness and the fact that the iterator has not been consumed when it goes out of scope, but I'm not sure how to actually solve the problem.

    extern crate regex;

    use regex::Regex;

    fn main() {
        // Define a regular expression with a bunch of named capture groups
        let expr = "((?P<num>[0-9]+)|(?P<str>[a-zA-Z]+))";
        let text = "0ab123cd";
        let re = Regex::new(&expr).unwrap();

        let tokens: Vec<(&str, &str)> = re.captures_iter(text)
            .flat_map(|t| t.iter_named())
            .filter(|t| t.1.is_some())
            .map(|t| (t.0, t.1.unwrap()))
            .collect();

        for token in tokens {
            println!("{:?}", token);
        }
    }

Executing the above code results in the following error:

    $ cargo run
       Compiling hello_world v0.0.1 (file:///Users/dowling/projects/rust_hello_world)
    src/main.rs:14:23: 14:24 error: `t` does not live long enough
    src/main.rs:14         .flat_map(|t| t.iter_named())
                                         ^
    src/main.rs:17:19: 22:2 note: reference must be valid for the block suffix following statement 3 at 17:18...
    src/main.rs:17         .collect();
    src/main.rs:18
    src/main.rs:19     for token in tokens {
    src/main.rs:20         println!("{:?}", token);
    src/main.rs:21     }
    src/main.rs:22 }
    src/main.rs:14:23: 14:37 note: ...but borrowed value is only valid for the block at 14:22
    src/main.rs:14         .flat_map(|t| t.iter_named())
                                         ^~~~~~~~~~~~~~
    error: aborting due to previous error
    Could not compile `hello_world`.
1 answer

The limiting factor in your situation is the .iter_named() method:

    fn iter_named(&'t self) -> SubCapturesNamed<'t>

Note the &'t self: the lifetime of the output is tied to the lifetime of the Captures instance. This is because the names are stored in the Captures object, so any &str referring to them cannot outlive that object.

There is only one fix for this: you must keep the Captures instances alive:

    let captures = re.captures_iter(text).collect::<Vec<_>>();
    let tokens: Vec<(&str, &str)> = captures.iter()
        .flat_map(|t| t.iter_named())
        .filter(|t| t.1.is_some())
        .map(|t| (t.0, t.1.unwrap()))
        .collect();
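The same borrowing pattern can be reproduced without the regex crate, which may make the cause easier to see. The sketch below is only an illustration: Caps, caps(), captures_iter() and tokens() are hypothetical stand-ins invented here, not the regex API, and the hard-coded data merely mimics what matching "0ab123cd" would produce. What matters is that iter_named borrows from &self, so the owner has to outlive the collected &str values.

```rust
// Hypothetical stand-in for regex's `Captures`: it owns its group names and values.
struct Caps {
    groups: Vec<(String, Option<String>)>,
}

impl Caps {
    // Same shape as `iter_named(&'t self)`: every yielded `&str` borrows from `self`.
    fn iter_named(&self) -> impl Iterator<Item = (&str, Option<&str>)> + '_ {
        self.groups
            .iter()
            .map(|(name, value)| (name.as_str(), value.as_deref()))
    }
}

// Hypothetical helper that builds one match with a "num" and a "str" group.
fn caps(num: Option<&str>, word: Option<&str>) -> Caps {
    Caps {
        groups: vec![
            ("num".to_string(), num.map(String::from)),
            ("str".to_string(), word.map(String::from)),
        ],
    }
}

// Hypothetical stand-in for `captures_iter`: lazily yields owned `Caps` values.
fn captures_iter() -> impl Iterator<Item = Caps> {
    vec![
        caps(Some("0"), None),
        caps(None, Some("ab")),
        caps(Some("123"), None),
        caps(None, Some("cd")),
    ]
    .into_iter()
}

// The caller owns the `Caps` values, so the returned `&str`s stay valid
// for as long as the slice is borrowed.
fn tokens(captures: &[Caps]) -> Vec<(&str, &str)> {
    captures
        .iter()
        .flat_map(|c| c.iter_named())
        .filter_map(|(name, value)| value.map(|v| (name, v)))
        .collect()
}

fn main() {
    // This would not compile: each `c` is dropped when the closure returns,
    // while the iterator it produced still borrows from it.
    // let bad: Vec<_> = captures_iter().flat_map(|c| c.iter_named()).collect();

    // Keeping the owners alive in a Vec first makes the borrows valid.
    let captures: Vec<Caps> = captures_iter().collect();
    for token in tokens(&captures) {
        println!("{:?}", token);
    }
}
```

Laziness alone is not the culprit here: even an eager version would fail if the owning values were dropped before the borrowed strings were used. Collecting the owners first simply gives them a scope that outlives the tokens.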