Reverse mapping of a map <string, int> to a vector <vector <string>>
Based on the previous question, I have a mapping of words and their score stored in map<string, int> . I want to change this, so that all words with the same count are combined together. My solution was to use vector<vector<string> > . The index of the first vector is a counter and the second vector is a collection of words.
After reading the answers to the previous question, here is what I tried to do:
vector<vector<string> > sorted_words; for (map<string, int>::const_iterator it = counters.begin(); it != counters.end(); ++it) { cout << "word:" << it->first << " count:" << it-> second << " vector size: " << sorted_words.size() << endl; if (sorted_words.size() - 1 > it->second && sorted_words[ it->second ].size() > 0) { cout << "Adding " << it->first << endl; sorted_words[ it->second ].push_back(it->first); } else { cout << "Creating " << it->first << endl; vector<string> v; v.push_back(it->first); sorted_words.resize( it->second + 1 ); sorted_words[it->second] = v; } } This results in segfault on the very first loop pass in the if statement.
What I'm trying to do is see if the size of the outer vector is such that my current value is inbounds, and if so, if I already created a nested vector. (I need to do this because the map can return in any order. For example, the very first element could be <"foo", 3>.)
If I'm going to do it in a way that is different from C ++, feel free to point it out.
For a space, you will most likely be better off using std::map<int, std::vector<string>> . The following rather trivial code (which could be improved by lower clipping all words and removing punctuation) demonstrates:
#include <iostream> #include <fstream> #include <sstream> #include <vector> #include <map> using namespace std; int main(int argc, char *argv[]) { if (argc < 2) return EXIT_FAILURE; // map of strings to counts. std::map<string, int> strs; ifstream inf(argv[1]); string str; while (inf >> str) ++strs[str]; // map of counts to strings, smallest to largest. std::map<int, std::vector<string>> vals; for (auto it : strs) vals[ it.second ].push_back(it.first); // report counts for each for (auto it : vals) { cout << "Count: " << it.first << ": "; std::copy(it.second.begin(), it.second.end(), ostream_iterator<string>(cout, " ")); cout << endl; } } Input example
I chose the Shakespearean monologue, “How Do You Like It,” which has some interesting attributes, as you will see in an instant:
All the world a stage, And all the men and women merely players: They have their exits and their entrances; And one man in his time plays many parts, His acts being seven ages. At first, the infant, Mewling and puking in the nurse arms. And then the whining school-boy, with his satchel And shining morning face, creeping like snail Unwillingly to school. And then the lover, Sighing like furnace, with a woeful ballad Made to his mistress' eyebrow. Then a soldier, Full of strange oaths and bearded like the pard, Jealous in honour, sudden and quick in quarrel, Seeking the bubble reputation Even in the cannon mouth. And then the justice, In fair round belly with good capon lined, With eyes severe and beard of formal cut, Full of wise saws and modern instances; And so he plays his part. The sixth age shifts Into the lean and slipper'd pantaloon, With spectacles on nose and pouch on side, His youthful hose, well saved, a world too wide For his shrunk shank; and his big manly voice, Turning again toward childish treble, pipes And whistles in his sound. Last scene of all, That ends this strange eventful history, Is second childishness and mere oblivion, Sans teeth, sans eyes, sans taste, sans everything. Output result
Count: 1: All At Even For In Into Is Jealous Last Made Mewling Sans Seeking Sighing That The Then They Turning Unwillingly acts again age ages. all all, arms. ballad beard bearded being belly big bubble cannon capon childish childishness creeping cut, ends entrances; eventful everything. exits eyebrow. eyes eyes, face, fair first, formal furnace, good have he history, honour, hose, infant, instances; justice, lean lined, lover, man manly many men mere merely mistress' modern morning mouth. nose nurse oaths oblivion, one pantaloon, pard, part. parts, pipes players: pouch puking quarrel, quick reputation round satchel saved, saws scene school-boy, school. second seven severe shank; shifts shining shrunk side, sixth slipper'd snail so soldier, sound. spectacles stage, sudden taste, teeth, this time too toward treble, voice, well whining whistles wide wise woeful women world world youthful Count: 2: Full His With on plays strange their to Count: 3: like sans then with Count: 4: a of Count: 6: in Count: 7: his Count: 8: And Count: 11: and the An interesting amount of unique phrases in the monologue, which are great. Almost the way he planned it. However, when capitalization and punctuation are taken into account, the numbers are significantly different. Fortunately, it was trivial to do this by changing only the first while loop:
while (inf >> str) { string alpha; for_each(str.begin(), str.end(), [](char& c){c=tolower(static_cast<unsigned char>(c));}); copy_if(str.begin(), str.end(), back_inserter(alpha), [](const char& c){return isalpha(static_cast<unsigned char>(c));}); ++strs[alpha]; } This gave us the following results:
Count: 1: acts again age ages arms at ballad beard bearded being belly big bubble cannons capon childish childishness creeping cut ends entrances even eventful everything exits eyebrow face fair first for formal furnace good have he history honour hose infant instances into is jealous justice last lean lined lover made man manly many men mere merely mewling mistress modern morning mouth nose nurses oaths oblivion one pantaloon pard part parts pipes players pouch puking quarrel quick reputation round satchel saved saws scene school schoolboy second seeking seven severe shank shifts shining shrunk side sighing sixth slipperd snail so soldier sound spectacles stage sudden taste teeth that they this time too toward treble turning unwillingly voice well whining whistles wide wise woeful women world worlds youthful Count: 2: eyes full on plays strange their to Count: 3: all like Count: 4: a of sans then Count: 5: with Count: 7: in Count: 9: his Count: 12: the Count: 19: and Still pretty impressive, Billy.
As an added bonus, due to the nature of the first card sorting, you get a list of words in alphabetical order. Woot for bonus features.
If you want to invert a map<K, V> , use map<V, vector<K>> . If you don't care about what the actual score is at this point, you can effectively build vector<vector<V>> by going from the intermediate map. For instance:
vector<vector<string>> invert(const map<string, int>& input) { map<int, vector<string>> inverse; for (const auto& pair : input) inverse[pair.second].push_back(pair.first); vector<vector<string>> result; for (auto& pair : inverse) result.push_back(move(pair.second)); return result; } Instead of having a vector<vector<string>> or an inverted map, for example map<int, vector> , you can use your existing map ( map<string, int> ) and just create vector<vector<int>> for collecting indices of all words having the same count. This will save you a lot of space. In addition, when changing the previous map<string, int> all you have to do is update the indexes in vector<vector<int>> , which will be faster than in the other two approaches.