Create a short readable string from a longer string

I have a requirement to contract a string such as ...

"Would you consider becoming a robot? You will be given a free annual oil change."

... to something much shorter but still humanly identifiable (it will need to be found in a selection list; my current solution has users enter an arbitrary title for the sole purpose of selection).

I would like to extract only the part of the string that forms the question (if possible), and then somehow reduce it to something like

WouldConsiderBecomingRobot

Are there any grammatical algorithms that can help me? I think there may be something that lets you pick out only the verbs and nouns.

Since this just acts as a key, it does not have to be perfect; I am not trying to trivialize the inherent complexity of the English language.

+4
5 answers

In the end I created the following extension method, which works surprisingly well. Thanks to Joe Blau for his excellent and effective suggestions:

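using System;
using System.Collections.Generic;
using System.Linq;

public static class StringExtensions
{
    // Contracts a sentence into a short PascalCase key built from its longest words.
    public static string Contract(this string e, int maxLength)
    {
        if (e == null) return e;

        // If there is a question, keep only the sentence that ends at the first '?'.
        string question = e;
        int questionMarkIndex = e.IndexOf('?');
        if (questionMarkIndex > 0)
        {
            int lastPeriodIndex = e.LastIndexOf('.', questionMarkIndex - 1);
            question = e.Substring(lastPeriodIndex + 1, questionMarkIndex - lastPeriodIndex - 1);
        }
        question = question.Trim();

        var punctuation = new[] { ",", ".", "!", "?", ";", ":", "/", "...", "...,", "-,", "(", ")", "{", "}", "[", "]", "'", "\"" };
        question = punctuation.Aggregate(question, (current, t) => current.Replace(t, ""));

        // Distinct() keeps a repeated word from being counted twice.
        List<string> words = question.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries).Distinct().ToList();
        var chosen = new HashSet<string>();
        string mash = string.Empty;

        // Add the longest remaining word (the last one on ties) until the key reaches maxLength.
        while (chosen.Count < words.Count && mash.Length < maxLength)
        {
            int maxWordLength = words.Where(w => !chosen.Contains(w)).Max(w => w.Length);
            chosen.Add(words.Last(w => !chosen.Contains(w) && w.Length == maxWordLength));
            // Rebuild the key with the chosen words in their original order.
            mash = string.Join("", words.Where(w => chosen.Contains(w)).Select(w => w.Capitalize()).ToArray());
        }
        return mash;
    }

    // Capitalize was not shown in the original post; this is an assumed implementation.
    private static string Capitalize(this string s)
    {
        return s.Length == 0 ? s : char.ToUpperInvariant(s[0]) + s.Substring(1);
    }
}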

With a maxLength of 15, this contracts:

  • This has no prereqs - write an essay ...: PrereqsWriteEssay
  • You've selected a car: YouveSelectedCar
+1

It may be too simplistic, but I would be tempted to start with a list of "filler words":

 var fillers = new[]{"you","I","am","the","a","are"}; 

Then extract everything before the question mark (using a regular expression, string mashing, whatever you fancy), giving you "Would you consider becoming a robot".

Then go through the string, removing each word that is considered a filler.

 var sentence = "Would you consider becoming a robot";
 // Split(' ') is used because Split(" ") does not compile on older .NET versions.
 var newSentence = String.Join("", sentence.Split(' ').Where(w => !fillers.Contains(w)).ToArray());
 // newSentence is "Wouldconsiderbecomingrobot".

Pascal-casing each word will give you your desired string - I will leave this as an exercise for the reader.
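
For completeness, here is a minimal sketch of how the steps above might fit together, including the Pascal-casing. The class name TitleKeys and the method ToKey are made up for illustration, the filler list is the one from this answer, and the case-insensitive comparison is my own assumption rather than something the question requires.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TitleKeys
{
    // Same filler list as above, compared case-insensitively (an assumption).
    private static readonly HashSet<string> Fillers = new HashSet<string>(
        new[] { "you", "I", "am", "the", "a", "are" },
        StringComparer.OrdinalIgnoreCase);

    // Hypothetical helper: builds a PascalCase key from the question part of the text.
    public static string ToKey(string text)
    {
        if (string.IsNullOrEmpty(text)) return string.Empty;

        // Keep only the part before the first question mark, if there is one.
        int q = text.IndexOf('?');
        string sentence = q >= 0 ? text.Substring(0, q) : text;

        // Split on whitespace, drop fillers, and upper-case each first letter.
        IEnumerable<string> words = sentence
            .Split((char[])null, StringSplitOptions.RemoveEmptyEntries)
            .Where(w => !Fillers.Contains(w))
            .Select(w => char.ToUpperInvariant(w[0]) + w.Substring(1));

        return string.Concat(words);
    }
}
```

Calling TitleKeys.ToKey on the example string from the question should give "WouldConsiderBecomingRobot". Note that this sketch does not strip punctuation inside the sentence, so commas and the like would need to be removed first.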

+4

Create a popular social networking site. When users want to join or post comments, ask them to solve a captcha. The captcha will consist of matching your shortened versions of long strings with their full versions. Your contraction algorithm will be based on a neural network or genetic algorithm that is trained on the captcha results.

You can also sell ads on the website.

+1

I do not think there is any algorithm that can determine whether each word of a string is a noun, an adjective or something else. The only solution would be to use a custom dictionary: just create a list of words that cannot be verbs or nouns (me, you, they, them, him, her, from, a, etc.).

Then you just need to keep all the words before the question mark that are not on the list.

This is just a workaround, and admittedly it is not perfect.

Hope this helps!

0

Welcome to the wonderful world of natural language processing. If you want to identify nouns and verbs, you will need a part-of-speech tagger.

0
