What is the best way to identify duplicate credit card numbers without storing them?

I run a website on which we mark certain accounts as scammers, and "mark" their account and all credit cards used as bad. We do not store actual credit card values, but instead store the checksum algorithm / MD 5.

We face collisions all the time. What is the best way to store these values ​​- not reversible, but able to match future values.

I thought MD5 would be the best, but we have a debate here ...

+6
passwords encryption md5 credit-card
source share
11 answers

A cryptographically secure hash will work. (SHA512 or SHA256 will be ok)

However, I would use a rather secret salt that is not stored with the cards (to prevent any attack from the rainbow table).

PS:
Rainbow table attacks against credit cards can be especially effective as the overall text space is quite small due to the limited character set, fixed size and check digits.

PPS:
You cannot use random salt for each record because you can never really check for duplicates. Salts are used to prevent collisions, while we specifically look for collisions in this case.

+15
source share

It is not safe enough to use a good hash algorithm. If your list is stolen, your saved hashes can be used to get information about the work card. The actual schema space for credit card numbers is small enough for a specific attacker to pre-calculate many of the possible hashes in advance, and this may have other consequences for your system if there is intrusion or internal work.

I recommend that you use the salt and also calculate the second value that you want to add to the salt, based on a formula that includes each digit of the card number and the first salt value. This ensures that if you lose control of any part, you still have reasonable uniqueness that makes ownership of the list useless. The formula should not be heavily weighted in relation to the first 6 digits of the card (BIN number), although no trace of the formula should be stored in the same place as the salt or the final hash.

Consider the anatomy of a 16-digit credit card number:

6-digit BIN (bank identification number)
9-digit account number
1 checksum Moon

BIN lists are well known in the manufacturing industry and not so difficult to compile for those who have access to an illegal list of card numbers. The number of valid BINs is further reduced by the assigned space for each issuer.

Visa - starts at 4
American Express - Starts at 34/37
MasterCard - starts at 5
Discover / CUP - starts at 6
Diner Club - starts at 35
and etc.

Please note that some of the assigned BIN information in each issuer category is also sparse. If an attacker knows where most of your customers are located, this will significantly reduce uniqueness, since BIN information is assigned based on the bank. An attacker who already has an account issued by a small bank in a rich area can simply get an account and use BIN as a starting point on his own card.

The checksum digits are calculated using a well-known formula, so it is immediately discarded as a source of unique data.

Armed with several BIN target centers, the attacker must check 9 digits for each BIN set. This is 1 billion checksums and hash operations for each set. I have no guidelines, but I'm sure that 1 million Hash operations per minute is not unreasonable for MD5 or any SHA flavor on a sufficiently powerful machine. It takes less than one day to crack all matches under a given BIN.

Finally, you might consider saving a timestamp or visitor token (IP / subnet) with your hashes. It's nice to catch duplicate card numbers, but also consider the consequences of someone filling your system with dummy numbers. At some point, you need to make a decision on the compromise between the numbers of blocking cards, which, as you know, are invalid, and also give yourself a mechanism for identifying and eliminating improper use.

For example, a disgruntled employee can steal card information on his own, and then use his hash mechanism against you by inserting valid hashes into the black list of the card number to block repeated business. It is quite expensive to cancel if you just store the hash - everything is opaque when it was converted to a hash. With that in mind, give yourself a way to identify the source of the hash.

+4
source share

Perhaps you can save two different hashes of the card number. The probability that both hashes will lead to collisions is almost zero.

+4
source share

Use SHA1, hash collisions not yet found.

+3
source share

People who indicate that the hash is “broken” do not make sense, possibly spewing something that they heard, not understanding what it means. When people say that hashes are “broken”, they usually mean that it is easy to generate an alternative payload that calculates the same hash.

This "breaks" the hash, but only for the specific purpose of using a hash to validate the data is what it should be.

This is not important here, that is, someone manages to create an alternative data stream that occurs with a hash to the same value as one of the credit cards, has not achieved anything significant or useful from the point of view of the attack vector.

The risk with hashes here is that the problem space for credit card numbers is rather low, and the rainbow tables for them will be quite cheap and easily generated.

Adding salt will add a little protection against rainbow tables already formed for clean card numbers, but the extent to which it offers any real protection depends on how secret the salt will remain if you are at risk. If salt is open, then new rainbow tables can be generated cheaply and all that.

Given that salt should be available for the application to perform checks against the blacklist, there is a good chance that someone compromising the blacklist data can also get to the salt. If you have several servers, you can mitigate this to some extent by ensuring that salt and data are not in the same place, so exposing one server will not give anyone all the details they need. (Likewise, data and salt are not stored for backup on the same medium where someone can leave with one tape and get everything). Salt only adds some protection while it is secret (used in this type).

If you have the resources to do it safely, I think this is the route. If you get a significant amount of collisions with any reasonable hash function, you have to do many volumes. (Actually, I am very surprised that a collision will be a problem even then, any reasonable hash function should provide different results in a small problem space like this).

+3
source share

If you find collisions with MD5, why not use a better algorithm like SHA1 or SHA256 ?

+2
source share

MD5 SHOULD NOT go, as it is broken. Reply Bruce Schneier: "[w] e already knew that MD5 is a broken hash function" and that "no one should use MD5 anymore."

those. use SHA512 or SHA256, as someone already suggested.

+2
source share

As Henry (+1) already mentioned, the correct solution is to use a message authentication code, such as HMAC, with a secret key. This is exactly the "secret salt" that was mentioned earlier. (BTW. Salts are always open).

Use a standard design such as HMAC-SHA-256 (RFC2104, FIPS-198a), keep the key secret and save the results (authentication tags) in the database.

A larger digest size (256 bit) SHA-256 should prevent any collisions, SHA-256 is a pretty good hash function, and the chance of accidental collisions is 2 ^ -128, so if you encounter a collision in your system, please give me know!:)

+2
source share

As others have said, HMAC should be a way.

An HMAC-SHA-256 with the appropriate key should:

  • Avoid collisions.
  • Avoid getting a credit card number from a saved value.
  • Prevent execution by the same attacker calculation (on all possible credit card numbers to find the appropriate value).

But there is another very important thing:

Sincerely, you do not store credit card numbers. Even if you can be 100% sure that you are using the right encryption, you probably will not store credit card numbers anyway. What for? Firstly, because the key may be leaked.

This way you store hashes so that your credit card number cannot be recovered .... Right?

Well, if you use a simple hash, a simple rainbow table with hashes of all possible credit card numbers gives all the source data that you apparently did not store. Unfortunately. But you already knew that.

So, we try to do better. Let's say using individual salts is better, and using HMAC is the best approach we know.

Consider the following scenario:

  • Take the 16-digit card number.
  • The first 6 digits (bank identification number) are guessed by using several common BINs.
  • The last 4 digits are displayed in the masked card number, which you can save. (You may not have saved this, which helps.)
  • 1 rank is calculated (Luhn).

This results in 5 digits being rude. This is a meager 100,000 attempts.

If we used individual salts, the game is over. We can simply transfer each individual card number to an average of 50,000 attempts.

If we used HMAC, we were safe. But remember ... we prefer not to store the encrypted card numbers, because even with perfect encryption, the key may be leaked. Guess what. The HMAC key may be skipped the same way. Using the key, again, we can adjust each card number on average for 50,000 attempts. Thus, the leaked key gives us credit card numbers, just as if we saved the numbers of encrypted cards.

Thus, due to the low entropy of credit card numbers, storing hashes does not increase significant security compared to encrypted values (but PCI limits the key rotation requirements for encryption).

A bit of perspective:

Well, we assume that there is a missing key here. Extreme But then again, just like PCI, part of their arguments forbids you to store credit card numbers, so we should at least consider them.

True, I did not take into account many guesses to find the BIN. However, this should be a small constant. Or we can limit ourselves to one BIN.

Definitely a PCI auditor may be more forgiving than me.

Yes, if you do not store the masked card number, you are 10'000 points safer. It helps a lot. Use it to your advantage. However, if 50K attempts are feasible, 500M may also be feasible. This is not enough to make me consider the data safe in the context of a compromised key.

Output:

Use the HMAC-SHA-256. Understand the risk. Keep as little as possible. Protect your keys vigilantly. Spend a fortune on a hardware security module :-)

+2
source share

Using the strongest hash is usually good. Speed ​​is not the essence, and slowness actually works against anyone who is trying to change the brute forces of your hashed values.

I like the jacuzzi, personally - if you use PHP, check the supported algorithms for hash functions docs

Whirlpool returns a string of 128 characters, but you do not need to store it. The first 32 or 64 characters would be enough. You can also consider sha512 or sha284.

+1
source share

Do not worry about salts, just use HMAC. I know this is an abuse, but then you get a decent hash key so that you can prevent collisions and attacks of rainbow tables.

It's nice that even if the key is leaking, no one can decrypt it. The best thing that works for HMAC is brute force. Actually, the key here is salt, as mentioned earlier. It's nice that the algorithm is slightly better than a regular tanning bed made by most non-security programmers.

+1
source share

All Articles