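Here is a very MATLAB-y way to do this. I tried to define the variables clearly. Run each line and examine the results to understand how it works. The workhorse functions are unique, whose third output maps every word to its index in the list of distinct words, and hist, which counts how often each of those indices occurs.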
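% Assumes paragraph is a character vector holding the text to analyze.
% First produce a cell array of words to be analyzed.
% Turn tabs/newlines into plain spaces, then drop punctuation.
paragraph_cleaned_up_whitespace = regexprep(paragraph, '\s', ' ');
paragraph_cleaned_up = regexprep(paragraph_cleaned_up_whitespace, '[^a-zA-Z0-9 ]', '');
% strtrim prevents an empty '' "word" when the text starts or ends with whitespace.
words = regexp(strtrim(paragraph_cleaned_up), '\s+', 'split');

% Count each distinct word: j(k) is the index in unique_words of words{k}.
[unique_words, ~, j] = unique(words);
frequency_count = hist(j, 1:max(j));

% Order the words from most to least frequent.
[~, sorted_locations] = sort(frequency_count, 'descend');
words_sorted_by_frequency = unique_words(sorted_locations).';
frequency_of_those_words = frequency_count(sorted_locations).';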
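To see what unique and the counting step actually produce, here is a small worked example on a hypothetical sentence (the input is made up for illustration). It is a sketch that swaps hist for accumarray, which is a substitution and not the approach above; MathWorks now lists hist as not recommended, and accumarray(j(:), 1) performs the same job of counting how many times each index in j occurs.

```matlab
% Hypothetical sample input, chosen only for illustration.
paragraph = 'the cat and the hat and the bat';

words = regexp(strtrim(paragraph), '\s+', 'split');

% unique_words is sorted alphabetically: {'and','bat','cat','hat','the'}
% j maps each word back to its position in unique_words:
%   words: the cat and the hat and the bat
%   j:      5   3   1   5   4   1   5   2
[unique_words, ~, j] = unique(words);

% Substitute for hist: add 1 to bin j(k) for every word k.
counts = accumarray(j(:), 1).';   % [2 1 1 1 3]

% Most frequent first; ties keep their alphabetical order.
[sorted_counts, order] = sort(counts, 'descend');
top_words = unique_words(order);
```

For this sentence, top_words begins with 'the' (3 occurrences) and then 'and' (2), followed by the three words that occur once.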
Peter