UPDATE: As the triptych pointed out, the Aho-Corasick algorithm seems very relevant for antivirus scanners. Here's what you need to read:
http://www.dais.unive.it/~calpar/AA07-08/aho-corasick.pdf
http://www.researchgate.net/publication/4276168_Generalized_Aho-Corasick_Algorithm_for_Signature_Based_Anti-Virus_Applications/file/d912f50bd440de76b0.pdf
http://jason.spashett.com/av/index.htm
Aho-Corasick-like algorithm for use in antivirus code
Below is my old answer. Its still relevant for the simple detection of malware, such as worms, that simply make copies of themselves:
I will just write some of my thoughts on how AVs can work. I do not know for sure. If someone considers incorrect information, please let me know.
There are many ways to detect AV threats of possible threats. One way is based on the discovery signature.
A signature is just a unique fingerprint of a file (it's just a sequence of bytes). As for computer science, it can be called a hash . One hash can take about 4/8/16 bytes. Assuming a size of 4 bytes (e.g. CRC32), about 67 million signatures can be stored in 256 MB .
All these hashes can be stored in the signature database. This database can be implemented using a balanced tree structure, so insert, delete and search operations can be performed in O(logn) time, which is quite fast even with large values โโof n (n is the number of records). Or, if a lot of memory is available, you can use a hash table that gives O(1) insert, delete and search. It can be faster when n grows larger and a good hashing method is used.
So, what does the antivirus do, roughly speaking, it calculates the hash of the file or only its critical sections (where malicious injections are possible) and looks for its signature database in it. As explained above, the search is very fast, which allows you to scan a huge number of files in a short period of time. If it is found, the file is classified as malicious.
Similarly, the database can be quickly updated, since insertion and deletion are also fast.
You can read these pages to get a deeper understanding.
What happens faster, hash search or binary search?
https://security.stackexchange.com/questions/379/what-are-rainbow-tables-and-how-are-they-used