Assuming IPv4 addresses, the search space is 2^32. Each IP address needs no more than 1 bit (0 == not visited, 1 == visited). Ignoring storage overhead, that takes 512 MB (2^29 bytes) to store. So the simplest implementation is to allocate a 512 MB array (or a table with 2^29 rows each storing a byte, or 2^27 rows each storing a 32-bit integer, or 2^26 rows each storing a 64-bit integer, or ...).
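The arithmetic above can be checked with a few lines of Python. This is only an illustrative sketch; the names `total_bits`, `total_bytes`, and `rows_needed` are made up for the example.

```python
ADDRESS_BITS = 32                    # IPv4 address width

total_bits = 2 ** ADDRESS_BITS       # one bit per possible address
total_bytes = total_bits // 8        # 2**29 bytes

def rows_needed(cell_bits: int) -> int:
    """Rows required if every row stores `cell_bits` bits of the bitmap."""
    return total_bits // cell_bits

print(total_bytes // 2**20, "MiB")   # flat bitmap size in mebibytes
print(rows_needed(8), rows_needed(32), rows_needed(64))
```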
You can optimize this for a sparse population by turning it into a tree.
Define a "page" size of 2^x bits. You allocate storage one page at a time.
Divide the search space (2^32) by your page size. This gives the total number of pages needed to represent every possible address in the search space, that is, 2^(32-x).
Then, to determine if there is a hit in your hash, you first determine if the page is present, and if so, whether the corresponding bit is set on the page. To cache an address, you first determine if the page is present; if not, you will create it. Then you set the corresponding bit.
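As a rough in-memory illustration of the lookup and insert steps just described, here is a minimal sketch that keeps the pages in a dictionary. The class name `PagedBitSet` and its parameters are hypothetical, and it assumes a page holds at least 8 bits (x >= 3).

```python
import ipaddress

class PagedBitSet:
    """Sparse bitmap over a 32-bit space; pages are allocated on first write."""

    def __init__(self, page_bits_log2: int = 10):
        self.shift = page_bits_log2                    # x: page holds 2**x bits
        self.mask = (1 << page_bits_log2) - 1          # selects the bit within a page
        self.page_bytes = (1 << page_bits_log2) // 8   # bytes per page
        self.pages = {}                                # page index -> bytearray

    def _locate(self, addr: int):
        bit = addr & self.mask
        return addr >> self.shift, bit >> 3, 1 << (bit & 7)

    def contains(self, addr: int) -> bool:
        page_index, byte_offset, bit_mask = self._locate(addr)
        page = self.pages.get(page_index)
        if page is None:                               # page absent: never visited
            return False
        return (page[byte_offset] & bit_mask) != 0

    def add(self, addr: int) -> None:
        page_index, byte_offset, bit_mask = self._locate(addr)
        page = self.pages.get(page_index)
        if page is None:                               # create the page on demand
            page = bytearray(self.page_bytes)
            self.pages[page_index] = page
        page[byte_offset] |= bit_mask

visited = PagedBitSet()
visited.add(int(ipaddress.IPv4Address("192.0.2.1")))
```

Only pages that contain at least one visited address take up memory, which is where the saving for a sparse population comes from.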
This structure maps fairly easily onto a database table. You need only two columns: the page index and the binary array. When you allocate a page, you simply store a row in the table with the correct page index and an empty (all-zero) binary array.
For example, for a 1024-bit page size (at most 2^22 pages), you can structure the table as follows:
    CREATE TABLE VisitedIPs(
        PageIndex int NOT NULL PRIMARY KEY,
        PageData binary(128) NOT NULL
    )
To check whether you have visited an IP address, you would use code similar to this (pseudocode):
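    uint ip = address.To32Bit();
    // The upper 22 bits of the address select the page.
    string sql = "SELECT PageData " +
                 "FROM VisitedIPs " +
                 "WHERE PageIndex = " + (ip >> 10);
    // GetFromDB is assumed to return null when no such row exists.
    byte[] page = (byte[])GetFromDB(sql);
    bool hasVisited = false;
    if (page != null)
    {
        // The lower 10 bits select the bit: byte offset first, then bit within the byte.
        byte b = page[(ip & 0x3FF) >> 3];
        hasVisited = (b & (1 << (int)(ip & 7))) != 0;
    }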
Marking an IP address as visited is similar:
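    uint ip = address.To32Bit();
    string sql = "SELECT PageData " +
                 "FROM VisitedIPs " +
                 "WHERE PageIndex = " + (ip >> 10);
    // GetFromDB is assumed to return null when no such row exists.
    byte[] page = (byte[])GetFromDB(sql);
    bool isNewPage = (page == null);
    if (isNewPage)
        page = new byte[128];  // allocate an empty 1024-bit page
    // Set the bit for this address within the page.
    page[(ip & 0x3FF) >> 3] |= (byte)(1 << (int)(ip & 7));
    sql = isNewPage
        ? "INSERT INTO VisitedIPs (PageIndex, PageData) " +
          "VALUES (" + (ip >> 10) + ", @pageData)"
        : "UPDATE VisitedIPs " +
          "SET PageData = @pageData " +
          "WHERE PageIndex = " + (ip >> 10);
    ExecSQL(sql, new SqlParam("@pageData", page));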
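For a version that can actually be run, the same scheme could be sketched in Python against SQLite. Note the assumptions: the original does not name a database, so SQLite (via the standard `sqlite3` module) is swapped in here, `BLOB` stands in for `binary(128)`, and the function names `open_store`, `has_visited`, and `mark_visited` are hypothetical.

```python
import ipaddress
import sqlite3

PAGE_SHIFT = 10                        # 2**10 = 1024 bits per page
PAGE_BYTES = (1 << PAGE_SHIFT) // 8    # 128 bytes per page
BIT_MASK = (1 << PAGE_SHIFT) - 1       # 0x3FF

def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS VisitedIPs ("
        " PageIndex INTEGER NOT NULL PRIMARY KEY,"
        " PageData BLOB NOT NULL)"
    )
    return conn

def _load_page(conn, page_index):
    row = conn.execute(
        "SELECT PageData FROM VisitedIPs WHERE PageIndex = ?", (page_index,)
    ).fetchone()
    return None if row is None else bytearray(row[0])

def has_visited(conn, ip: str) -> bool:
    addr = int(ipaddress.IPv4Address(ip))
    page = _load_page(conn, addr >> PAGE_SHIFT)
    if page is None:                   # page never allocated
        return False
    bit = addr & BIT_MASK
    return (page[bit >> 3] & (1 << (bit & 7))) != 0

def mark_visited(conn, ip: str) -> None:
    addr = int(ipaddress.IPv4Address(ip))
    page_index = addr >> PAGE_SHIFT
    page = _load_page(conn, page_index)
    if page is None:                   # allocate an all-zero page on first use
        page = bytearray(PAGE_BYTES)
    bit = addr & BIT_MASK
    page[bit >> 3] |= 1 << (bit & 7)
    conn.execute(
        "INSERT OR REPLACE INTO VisitedIPs (PageIndex, PageData) VALUES (?, ?)",
        (page_index, bytes(page)),
    )
    conn.commit()

conn = open_store()
mark_visited(conn, "192.0.2.1")
```

The read-modify-write in `mark_visited` is not safe against concurrent writers to the same page; with several writers it would need to run inside a transaction that locks the row or the database.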