I am using a Scanner with a custom delimiter, and I came across some strange behavior that I would like to understand.
I am using this program:
Scanner sc = new Scanner("Aller à : Navigation, rechercher");
sc.useDelimiter("\\s+|\\s*\\p{Punct}+\\s*");
String word = "";
while (sc.hasNext()) {
    word = sc.next();
    System.out.println(word);
}
Output (one token per line, with an empty token after "à"):
Aller
à

Navigation
rechercher
First, I don't understand why I get an empty token. The documentation says:
Depending on the type of delimiting pattern, empty tokens may be returned. For example, the pattern "\\s+" will return no empty tokens since it matches multiple instances of the delimiter. The delimiting pattern "\\s" could return empty tokens since it only passes one space at a time.
I am using \\s+, so why is an empty token returned?
Second, there is something else I would like to understand about the regex. If I swap the two alternatives in the delimiter:
sc.useDelimiter("\\s*\\p{Punct}+\\s*|\\s+");
The result is correct, and I get:
Aller
à
Navigation
rechercher
Why does it work with the alternatives in this order but not the other?
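To compare the two orderings, here is a minimal sketch that lists what each delimiter regex matches in the input, using a plain Matcher.find() loop. This only approximates how Scanner applies its delimiter internally, and the class name DelimiterProbe and the helper find are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DelimiterProbe {
    // Returns every substring of the input matched by the delimiter regex, in order.
    // Hypothetical helper: an approximation of Scanner's delimiter matching, not its actual code.
    static List<String> find(String input, String delimiterRegex) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(delimiterRegex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String input = "Aller à : Navigation, rechercher";
        String[] regexes = {
            "\\s+|\\s*\\p{Punct}+\\s*",   // original order
            "\\s*\\p{Punct}+\\s*|\\s+"    // alternatives swapped
        };
        for (String regex : regexes) {
            StringBuilder line = new StringBuilder(regex).append(" ->");
            for (String match : find(input, regex)) {
                // Quote each match so that whitespace is visible.
                line.append(" \"").append(match).append('"');
            }
            System.out.println(line);
        }
    }
}
```

With the original order, the space after "à" is matched on its own and the following ": " is a second, adjacent match. With the alternatives swapped, " : " is matched as a single delimiter.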
EDIT:
In this case:
Scanner sc = new Scanner("(23 ou 24 minutes pour les épisodes avec introduction) (approx.)1");
sc.useDelimiter("\\s*\\p{Punct}+\\s*|\\s+");
I still get an empty token between introduction and approx. Can this be avoided?
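One variant that might avoid the empty token is to treat any run of whitespace and punctuation characters as a single delimiter by using a character class. The sketch below assumes that this is the desired tokenization. It is not a confirmed fix, and the class name SingleRunDelimiter and the helper tokens are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class SingleRunDelimiter {
    // Collects all tokens the Scanner returns for the given input and delimiter regex.
    static List<String> tokens(String input, String delimiterRegex) {
        List<String> result = new ArrayList<>();
        try (Scanner sc = new Scanner(input)) {
            sc.useDelimiter(delimiterRegex);
            while (sc.hasNext()) {
                result.add(sc.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "(23 ou 24 minutes pour les épisodes avec introduction) (approx.)1";
        // Assumption: any mix of whitespace and punctuation counts as one delimiter,
        // so ") (" is consumed in a single match instead of two adjacent ones.
        for (String token : tokens(input, "[\\s\\p{Punct}]+")) {
            System.out.println(token);
        }
    }
}
```

With this delimiter, ") (" between introduction and approx is one match, so there are no two adjacent delimiters with an empty token between them.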