Choosing a unique identifier in C for the embedded application

Question

Choosing a unique identifier in C for the embedded application

I'm currently trying to implement an algorithm to select a unique (16-bit) identifier. The challenge is to do this quickly, without using too much memory. The list of identifiers currently in use is determined by scanning an external flash device through an SPI sequence and, therefore, is a relatively slow process. Also, the algorithm will work on a micro-controller with a small size, so I can’t just just read all the entries in RAM and process them there.

Thoughts I have had so far:

Select a number, then scroll through the list and see if it is in use. Rinse and repeat. Suffers from quite slow (especially if there are a lot of files).
As above, but select a number using the pseudo random number generator with the corresponding seed. This has the advantage that it is less likely to have so many iterations.
Scanning the list and filling the array with all the elements found. Sort it and then it becomes trivial. This can use a huge amount of memory.
Use a huge (well, ridiculously huge) bitmask. Actually, no practical.
Accept that the life of the tool is such that it will be cast or "formatted" long before it has typed 65534 files for Flash, so just keep the highest value used so far in Flash or backup memory and keep incrementing. Honestly, this is likely to work well for this particular application.

At the moment, I am inclined to use either the second or the fifth, but I would be interested to know if anyone has any other thoughts. I would like that there is an algorithm similar in form to CRC, which can be used to process each number in turn and give a fair idea of the amount that was not there but I do not know how this can work.

+4

c algorithm embedded

Dral May 13, '09 at 8:07

source share

12 answers

If there is not enough RAM to implement a raster image large enough for 64 KB recordings, the number of scans via FLASH to search for an unused identifier can be reduced by using a smaller temporary raster image for each scan. A 16-bit raster map can record found identifiers in the range 0-255 on the first pass, 256-511 on the second scan, etc. At the end of each scan, if the bitmap has at least one unmarked bit made. I believe that this will work well in conjunction with using a random starting range.

On the other hand, if I had high confidence in option 5, I could just go with that.

+4

Lance richardson May 13, '09 at 12:06

source share

I assume that the FLASH device is not removed due to the mention of SPI, but IIRC SD cards have SPI access mode, so this may not be true.

If FLASH is persistent, and you have a reliable, non-volatile place to remember the last issued identifier, then this is probably what needs to be done. It is definitely fast and low memory at runtime. This should be easy to explain, implement and test.

If FLASH is removable, then perhaps using a pseudo-random number generator and conflict testing. Assuming your numbers are well distributed, your chance of a collision is easily predicted from your total usage. Just select a generator with a decently long repetition interval. It might be a good idea to make fun of it in a simulation as an acceptance test for a selected algorithm.

+1

RBerteig May 13, '09 at 8:16

source share

I am wondering why you are not just saving the last identifier and increasing it. Is there a reason why you hesitate? You do not give an answer to your question, just a general concern.

If you need the identifier to be somewhat random for security reasons, use a random number generator and save the current values of the generator register in flash memory. Thus, you can load them for the next identifier, which ensures that you get the full loop length without repeating if you carefully choose your algorithm.

[EDIT] Since you are facing conflicts, there should be some data where the collision can occur, for example, in file names or some of these. If the obvious approach (creating a file name and checking if it exists) is too slow, and you have huge spaces in the "distribution map", then generate a random identifier and check this. This should allow you to find an unused identifier with just a few iterations.

+1

Aaron digulla May 13, '09 at 8:37

source share

Use the Maximum Linear Feedback Shift Register and save the last value you passed. LFSR will, given a specific starting point (not counting zero), give you all the numbers in the sequence 1..2 ^ n in pseudo-random order. If you start with the kth element, you always get the same k + 1st element. The implementation is tiny:

if (IsEven(sequence)) { sequence /= 2; } else { sequence = (sequence / 2) ^ feedback; }

where the feedback is a bit pattern from the maximum feedback table for the number of bits that you want to generate. This means that to generate the next number, you read the last issued number, run it through the code above, and then use it.

Alternatively, why don't you just count and save the last number?

+1

plinth May 13, '09 at 17:58

source share

I will go with 2. Be careful how you choose a generator and a seed. All pseudo-numerical sequences repeat themselves after a certain number of iterations. Therefore, you need to check that it does not happen too soon.

0

kgiannakakis May 13, '09 at 8:12

source share

I would try option 3. Instead of storing a sorted array of values, I would save a sorted array of ranges.

0

mouviciel May 13, '09 at 8:22

source share

How much RAM do you have? It's a little hard to say, “embedded” can mean so much these days. :) A bitmap will require 8192 bytes at the time of generation and guarantee perfect results every time.

I also looked at some kind of "sparse" bitmap, but I don’t know the appropriate data structure, but maybe worth exploring.

0

unwind May 13, '09 at 11:47

source share

Store the number of cycles in a sequential identifier.
Run the identifier through MD5.
Use the lowest 16 bits.

The theory is that MD5 creates a different hash for each input. The least significant 16 bits should be as "random" as the whole hash.

0

Robert May 13, '09 at 14:44

source share

If you could use larger identifiers then 5 would be no problem.

0

starblue May 13, '09 at 16:21

source share

About requesting your CRC algorithm ...

It looks like you are looking for an algorithm that will run in a random list of less than 16-bit numbers less than 64 KB in size and generate a new 16-bit number that is not yet in the list; it is preferable to do this in one go through this list.

If the sequence in which the identifiers are freed has nothing to do with the sequence in which they are allocated (for example, I feel that this is your case), you can do nothing with the generation or distribution of identifiers to get your algorithm.

The best bet then seems 5 from your list.

If you are adventurous ...

and if you can re-specify your identifiers (that is, change the highlighted identifier to another unallocated identifier), you can run a defrag iteration once in a while to move all highlighted identifiers to lower values and find the highest free identification number for next distribution. This will help to remember the total number of distributions and releases made since the last such “defragmentation”. Highlight increment sequentially, starting at 0.

Thus, you only need to remember 3 unsigned short integers in memory. And yes, take the slightly expensive iteration of redistribution once in a while, based on their values.

0

nik May 13, '09 at 17:13

source share

Another option is to save the file of valid identifiers to a flash drive. Thus, you do not request all the possibilities every time. If you need random identifiers, then randomize the file. Store the offset to the last as the first number in the file. When you need it, delete the last one from the file, and when you free it, add it back to the file. With an offset and flash drive, this should be an almost constant time, regardless of the number of identifiers remaining. As a bonus, the offset at the beginning will tell you how many identifiers you have left if you need to know that at any time. The disadvantage is that you need to access the flash memory for each identifier (permanent, but still access) and how to handle the case of an error if the file is missing.

0

Caleb Huitt - cjhuitt May 13, '09 at 17:47

source share

Kevin peterson · Accepted Answer · 2009-05-13T08:29:03+0000

I think you have some options here, but another one to consider is the Bloom Filter . This has the potential for false positives (i.e., you can exclude an identifier that has already been used, although it was not), but it can allow you to choose the exact amount of space that you can devote to this data.

Choosing a unique identifier in C for the embedded application

More articles: