Understanding Bloodhound.tokenizers.obj.whitespace

Question

Understanding Bloodhound.tokenizers.obj.whitespace

That's it, I tried to apply Twitter typeahead and Bloodhound to my project based on some working sample, but I can not understand below code.

 datumTokenizer: Bloodhound.tokenizers.obj.whitespace('songs'), queryTokenizer: Bloodhound.tokenizers.whitespace,

The original code is as follows.

 var songlist = new Bloodhound({ datumTokenizer: Bloodhound.tokenizers.obj.whitespace('songs'), queryTokenizer: Bloodhound.tokenizers.whitespace, limit: 10, remote: '/api/demo/GetSongs?searchTterm=%QUERY' });

An official document just said:

datumTokenizer is a function with a signature (datum) that converts binding to an array of string tokens. Required.
queryTokenizer is a function with a signature (request) that converts the request into an array of string tokens. Required.

What does it mean? Can someone help me tell me more about this so that I better understand?

+8

jquery twitter twitter-typeahead bloodhound

Joe.wang Oct 28 '15 at 2:16

source share

2 answers

davew · Answer 1 · 2017-04-20T18:16:35+0000

I found some useful info here:

https://github.com/twitter/typeahead.js/blob/master/doc/migration/0.10.0.md#tokenization-methods-must-be-provided

The most common tokenization methods divide a given string into spaces or non-words. Bloodhound provides an implementation of these methods out of the box:
  // returns ['one', 'two', 'twenty-five'] Bloodhound.tokenizers.whitespace(' one two twenty-five'); // returns ['one', 'two', 'twenty', 'five'] Bloodhound.tokenizers.nonword(' one two twenty-five'); 
To tokenize a request, you probably want to use one of the above methods. To indicate data tokenization, you can do something more complex.
For data, sometimes you want tokens to be obscured from several properties. For example, if you were building a search engine for GitHub repositories, it would probably be wise to have tokens derived from the name, owner, and main language of the repo:
  var repos = [ { name: 'example', owner: 'John Doe', language: 'JavaScript' }, { name: 'another example', owner: 'Joe Doe', language: 'Scala' } ]; function customTokenizer(datum) { var nameTokens = Bloodhound.tokenizers.whitespace(datum.name); var ownerTokens = Bloodhound.tokenizers.whitespace(datum.owner); var languageTokens = Bloodhound.tokenizers.whitespace(datum.language); return nameTokens.concat(ownerTokens).concat(languageTokens); } 
There may also be a scenario in which you want to tokenize the binding to the backend. The best way to do this is to simply add the property to your data containing these tokens. Then you can provide a tokenizer that simply returns existing tokens:
  var sports = [ { value: 'football', tokens: ['football', 'pigskin'] }, { value: 'basketball', tokens: ['basketball', 'bball'] } ]; function customTokenizer(datum) { return datum.tokens; } 
There are many other ways you can use data tokenization, it really depends on what you are trying to do.

Unfortunately, it seems that this information is not easy to find from the main documentation.

sunix · Answer 2 · 2016-04-13T14:39:13+0000

This is a token used to split data or queries into an array of words to perform a search / match. datumTokenizer refers to your data, and queryTokenizer refers to the request made (usually the text is entered at the input).

If your data is an array of objects (for example, json), datumTokenizer allows you to specify in which field of your object you want to search. For example, if you want to search by name and code fields, you can enter something like Bloodhound.tokenizers.obj.whitespace(['name','code']) or provide a custom function.

Further information can be found at: https://github.com/twitter/typeahead.js/blob/master/doc/migration/0.10.0.md#tokenization-methods-must-be-provided

Understanding Bloodhound.tokenizers.obj.whitespace

More articles: