For a published book, this is pretty awful code! (You can download all the examples for the book here; the relevant file is chapter4/nn.py.)
- No docstring. What is this function supposed to do? From its name you can guess that it generates one of the nodes in the neural network's "hidden layer", but what roles do wordids and urls play?
- The database query uses string substitution and is therefore vulnerable to SQL injection attacks. This matters especially because the code relates to web search, so wordids probably come from a user query and may be untrusted. (Perhaps they are identifiers rather than raw words, in which case it is harmless in practice, but joining values directly into SQL is still a very bad habit.)
- It doesn't use the expressive power of the database: if all you want is to find out whether a key exists, you probably want SELECT EXISTS(...) rather than asking the database to send you a bunch of records that you will then ignore.
- The function does nothing if a record with createkey already exists. Is that a bug? Is it intentional? Who can say?
- The initial weight for words is scaled by the number of words, but the weight for URLs is a constant 0.1 (maybe there are always 10 URLs, but it would be better to scale by len(urls) here as well).
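To make the criticisms concrete, here is a sketch of how the function could be rewritten to address them: a parameterized query instead of string substitution, SELECT EXISTS(...) for the existence check, and weights scaled by len(urls). The table and column names (hiddennode, wordhidden, hiddenurl, create_key) are guesses at the book's schema, and sqlite3 stands in for whatever database layer the book actually uses.

```python
import sqlite3

def generatehiddennode(con, wordids, urls):
    """Create a hidden-layer node keyed by the sorted word ids, and link it
    to the given input words and output URLs with scaled initial weights.

    Hypothetical rewrite of the book's function; schema names are assumed.
    """
    # Skip overly specialized queries (the "mysterious line" discussed below).
    if len(wordids) > 3:
        return None
    createkey = '_'.join(str(wi) for wi in sorted(wordids))
    # Existence check: ask the database a yes/no question instead of
    # fetching rows we would ignore. The ? placeholder avoids SQL injection.
    cur = con.execute(
        'SELECT EXISTS(SELECT 1 FROM hiddennode WHERE create_key=?)',
        (createkey,))
    if cur.fetchone()[0]:
        return None  # node already exists; do nothing, as in the original
    cur = con.execute('INSERT INTO hiddennode (create_key) VALUES (?)',
                      (createkey,))
    hiddenid = cur.lastrowid
    # Scale initial weights by fan-out on both sides, rather than using a
    # constant 0.1 for the URL connections.
    for wordid in wordids:
        con.execute(
            'INSERT INTO wordhidden (fromid, toid, strength) VALUES (?,?,?)',
            (wordid, hiddenid, 1.0 / len(wordids)))
    for urlid in urls:
        con.execute(
            'INSERT INTO hiddenurl (fromid, toid, strength) VALUES (?,?,?)',
            (hiddenid, urlid, 1.0 / len(urls)))
    con.commit()
    return hiddenid
```

The point is not this exact code but the pattern: let the database answer the question you actually have, and never build SQL by pasting strings together.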
I could go on and on, but it's better not to.
In any case, to answer your question: it looks like this function adds a database record for a node in the hidden layer of a neural network. The network has, I think, words in the input layer and URLs in the output layer. The idea of the application is to train the neural network to find good search results (URLs) given the words in a query. See the trainquery function, which takes the arguments (wordids, urlids, selectedurl). Presumably (since there is no docstring, I have to guess) wordids are the words the user searched for, urlids are the URLs the search engine suggested, and selectedurl is the one the user actually selected. The idea is to train the network to better predict which URLs users will select, and to rank those URLs higher in future search results.
Thus, the mysterious line of code prevents the creation of hidden-layer nodes linked to more than three input-layer nodes. In the context of a search application this makes sense: there is no point in training the network on overly specialized queries, because such queries will not recur often enough for the training to pay off.
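Under that reading, the mysterious line is a simple guard at the top of the function, something like this (a hypothetical skeleton; only the guard is real, the rest is a placeholder):

```python
def generatehiddennode(wordids, urls):
    """Skeleton showing only the guard under discussion."""
    if len(wordids) > 3:
        return None          # query too specialized: skip node creation
    # ... otherwise create the hidden node and its connections ...
    return 'node-created'    # placeholder for the real database work
```

A guard like this caps the fan-in of any hidden node at three, which is what limits the network to queries general enough to be worth learning from.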