How to prevent “partial write” data corruption during power loss?

In an embedded environment (using an MSP430), I have seen some data corruption caused by partial writes to non-volatile memory. This appears to be caused by power loss during a write (to FRAM or data segments).

I validate the data stored in these locations with a CRC.

My question is: what is the correct way to prevent these "partial writes"? I have currently modified my code to write to two separate FRAM locations. That way, if one write is interrupted and leaves an invalid CRC, the other location should still be valid. Is this common practice? Do I need to implement this double-write behavior for every kind of non-volatile memory?

memory embedded msp430
4 answers

A simple solution is to maintain two versions of the data (on separate pages for flash memory): the current version and the previous version. Each version has a header containing a sequence number and a word that validates the sequence number, simply the one's complement of the sequence number, for example:

    ---------
    |  seq  |
    ---------
    | ~seq  |
    ---------
    |       |
    | data  |
    |       |
    ---------

Most importantly, when writing data, the seq and ~seq words are written last.

At startup, read the copy that has the highest valid sequence number, taking wrap-around into account (especially if the sequence words are short). When you write data, overwrite the oldest block and then verify it.
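A minimal sketch of this two-version scheme in C, simulated with a RAM array standing in for the two non-volatile regions. The names (`slot_t`, `nv_slot`, `store`, `newest_slot`), the 16-bit sequence width and the payload size are assumptions made for illustration. The step that invalidates the old header before the payload is overwritten is also an assumption: on flash the page erase would normally do this, but FRAM has no erase step.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DATA_LEN 8u   /* illustrative payload size */

/* One stored version: header words plus payload. */
typedef struct {
    uint16_t seq;      /* sequence number */
    uint16_t seq_inv;  /* one's complement of seq, validates the header */
    uint8_t  data[DATA_LEN];
} slot_t;

/* Stand-in for two non-volatile regions (separate pages on flash). */
static slot_t nv_slot[2];

static int slot_valid(const slot_t *s)
{
    return (uint16_t)~s->seq == s->seq_inv;
}

/* Wrap-safe "a is newer than b" (assumes two's-complement conversion). */
static int seq_newer(uint16_t a, uint16_t b)
{
    return (int16_t)(uint16_t)(a - b) > 0;
}

/* Index of the newest valid slot, or -1 if neither slot is valid. */
static int newest_slot(void)
{
    int v0 = slot_valid(&nv_slot[0]);
    int v1 = slot_valid(&nv_slot[1]);
    if (v0 && v1)
        return seq_newer(nv_slot[1].seq, nv_slot[0].seq) ? 1 : 0;
    if (v0) return 0;
    if (v1) return 1;
    return -1;
}

/* Overwrite the older (or invalid) slot; header words go in last. */
static void store(const uint8_t *data)
{
    assert(data != NULL);
    int cur = newest_slot();
    slot_t *dst = &nv_slot[(cur == 0) ? 1 : 0];
    uint16_t next = (cur < 0) ? 1u : (uint16_t)(nv_slot[cur].seq + 1u);

    dst->seq_inv = dst->seq;            /* assumed step: invalidate old header */
    memcpy(dst->data, data, DATA_LEN);  /* payload first */
    dst->seq = next;                    /* then seq ... */
    dst->seq_inv = (uint16_t)~next;     /* ... and ~seq last */
}
```

If power is lost anywhere before the final `seq_inv` write, the header of the slot being written does not validate, so `newest_slot()` falls back to the other copy.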

The solution you are already using is valid as long as the CRC is written last, but it lacks simplicity and imposes a CRC calculation overhead that may be unnecessary or undesirable.

With FRAM you need not worry about endurance, but it is a concern for flash memory and EEPROM. In that case I use a write-back caching method: the data is kept in RAM, and whenever it changes a timer is started, or restarted if it is already running. When the timer expires, the data is written. This prevents bursts of writes from thrashing the memory, and it is useful even on FRAM because it minimizes the software overhead of writing data.
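The write-back idea could be sketched roughly as follows. This is an illustration under assumptions, not the answerer's code: `cache_update`, `cache_tick`, `FLUSH_DELAY` and the RAM array standing in for non-volatile memory are all hypothetical names, and `cache_tick` is assumed to be called from a periodic timer.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DATA_LEN    8u
#define FLUSH_DELAY 50u   /* quiet ticks before committing; illustrative value */

static uint8_t  ram_copy[DATA_LEN];  /* working copy, modified freely */
static uint8_t  nv_copy[DATA_LEN];   /* stand-in for the non-volatile copy */
static uint16_t flush_timer;         /* 0 means the timer is not running */
static unsigned nv_write_count;      /* counts commits, for illustration only */

/* Change the data: update RAM only and start or restart the timer. */
static void cache_update(uint16_t offset, const uint8_t *src, uint16_t len)
{
    assert((uint32_t)offset + len <= DATA_LEN);
    memcpy(&ram_copy[offset], src, len);
    flush_timer = FLUSH_DELAY;
}

/* Call once per timer tick; commits when the timer expires. */
static void cache_tick(void)
{
    if (flush_timer != 0u && --flush_timer == 0u) {
        /* real code would do a protected non-volatile write here */
        memcpy(nv_copy, ram_copy, DATA_LEN);
        nv_write_count++;
    }
}
```

A burst of updates inside the delay window results in a single commit. The trade-off is that changes made within the last `FLUSH_DELAY` ticks before a power loss are not stored.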


Our engineering team uses two approaches to this problem: solve it in hardware and software!

First, a diode and capacitor arrangement provides several milliseconds of power during a brown-out. If we detect that external power has been lost, we prevent the code from starting any non-volatile writes.
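In software, the gating described above might look something like this sketch. Everything here is hypothetical: the flag `power_failing`, the hook `on_external_power_lost` (assumed to be called from whatever interrupt detects loss of external power) and the RAM array standing in for non-volatile memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

static volatile bool power_failing;  /* set once external power is lost */
static uint8_t nv_mem[16];           /* stand-in for non-volatile storage */

/* Hypothetical hook: call from the supply-monitor interrupt. */
static void on_external_power_lost(void)
{
    power_failing = true;
}

/* Start a write only while external power is present.
 * Returns false if the write was refused. */
static bool nv_write_guarded(uint16_t addr, const uint8_t *src, uint16_t len)
{
    assert((uint32_t)addr + len <= sizeof nv_mem);
    if (power_failing)
        return false;                 /* do not begin a new write */
    memcpy(&nv_mem[addr], src, len);  /* must finish within the hold-up time */
    return true;
}
```

The check only prevents new writes from starting, so the capacitor has to hold the supply up for at least as long as the longest single write takes.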

Second, our data is critical to operation and is updated often, and we do not want to wear out our non-volatile flash storage (it only supports so many writes). So we actually store the data 16 times in flash and protect each record with a CRC. At boot, we find the latest valid record and then begin our erase/write cycles.
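As a rough sketch of finding the latest valid record among 16 CRC-protected copies: the answer does not say how "latest" is determined, so this example assumes each record carries an increasing sequence counter. The record layout, the names `record_t`, `nv_log`, `find_latest` and `append`, and the choice of CRC-16/CCITT-FALSE are likewise assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NUM_RECORDS 16u
#define DATA_LEN    8u

typedef struct {
    uint32_t seq;             /* assumed: increasing write counter */
    uint8_t  data[DATA_LEN];
    uint16_t crc;             /* CRC over seq and data */
} record_t;

static record_t nv_log[NUM_RECORDS];  /* stand-in for the flash records */

/* CRC-16/CCITT-FALSE: polynomial 0x1021, initial value 0xFFFF. */
static uint16_t crc16(const uint8_t *p, size_t n)
{
    uint16_t crc = 0xFFFFu;
    while (n--) {
        crc ^= (uint16_t)((uint16_t)*p++ << 8);
        for (int i = 0; i < 8; i++)
            crc = (crc & 0x8000u) ? (uint16_t)((crc << 1) ^ 0x1021u)
                                  : (uint16_t)(crc << 1);
    }
    return crc;
}

static uint16_t record_crc(const record_t *r)
{
    return crc16((const uint8_t *)r, offsetof(record_t, crc));
}

/* Index of the valid record with the highest seq, or -1 if none. */
static int find_latest(void)
{
    int best = -1;
    for (unsigned i = 0; i < NUM_RECORDS; i++) {
        const record_t *r = &nv_log[i];
        if (record_crc(r) != r->crc)
            continue;                     /* torn or corrupted record */
        if (best < 0 || r->seq > nv_log[best].seq)
            best = (int)i;
    }
    return best;
}

/* Write the next record round-robin after the latest valid one. */
static void append(const uint8_t *data)
{
    assert(data != NULL);
    int last = find_latest();
    unsigned dst = (last < 0) ? 0u : ((unsigned)last + 1u) % NUM_RECORDS;
    record_t r;
    memset(&r, 0, sizeof r);
    r.seq = (last < 0) ? 1u : nv_log[last].seq + 1u;
    memcpy(r.data, data, DATA_LEN);
    r.crc = record_crc(&r);
    nv_log[dst] = r;   /* real flash: erase, then program the record */
}
```

Rotating through the slots spreads erase/write cycles across all 16 records, and an interrupted write damages only the slot being written.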

We have never seen data corruption since the introduction of our frankly paranoid system.

Update:

I should note that our flash is external to our processor, so the CRC also helps validate the data if there is a communication failure between the processor and the flash chip. In addition, if we encounter several failures in a row, the multiple copies protect against data loss.


We used something similar to Clifford's answer, but written in a single write operation. You need two copies of the data, and you alternate between them. Use an incrementing sequence number so that, in effect, one location holds the even sequence numbers and the other holds the odd ones.

Write the data like this (in one write command, if you can):

    ---------
    |  seq  |
    ---------
    |       |
    | data  |
    |       |
    ---------
    |  seq  |
    ---------

When you read it back, make sure that both sequence numbers match. If they do not, the data is invalid. At startup, read both locations and work out which one is more recent (taking sequence number wrap-around into account).
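A small sketch of this variant in C, again simulated in RAM. The names, the 8-bit sequence width and the payload size are assumptions. The parity check in `block_valid` is an addition that follows from the even/odd rule above. Note that in this simulation a zero-filled location 0 reads as a valid block with sequence number 0, so real storage would need a one-time initialization.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DATA_LEN 8u

typedef struct {
    uint8_t seq_head;        /* written first */
    uint8_t data[DATA_LEN];
    uint8_t seq_tail;        /* written last; must equal seq_head */
} block_t;

/* Stand-in for the two locations: even seq in [0], odd seq in [1]. */
static block_t nv_block[2];

static int block_valid(unsigned idx)
{
    const block_t *b = &nv_block[idx];
    return b->seq_head == b->seq_tail && (b->seq_head & 1u) == idx;
}

/* Wrap-safe "a is newer than b" (assumes two's-complement conversion). */
static int seq_newer(uint8_t a, uint8_t b)
{
    return (int8_t)(uint8_t)(a - b) > 0;
}

/* Index of the newest valid block, or -1 if neither is valid. */
static int newest_block(void)
{
    int v0 = block_valid(0u);
    int v1 = block_valid(1u);
    if (v0 && v1)
        return seq_newer(nv_block[1].seq_head, nv_block[0].seq_head) ? 1 : 0;
    if (v0) return 0;
    if (v1) return 1;
    return -1;
}

/* Write the next version into the location selected by seq parity. */
static void save(const uint8_t *data)
{
    assert(data != NULL);
    int cur = newest_block();
    uint8_t next = (cur < 0) ? 0u : (uint8_t)(nv_block[cur].seq_head + 1u);
    block_t *dst = &nv_block[next & 1u];

    dst->seq_head = next;               /* leading seq */
    memcpy(dst->data, data, DATA_LEN);
    dst->seq_tail = next;               /* trailing seq completes the block */
}
```

If a write stops partway, the leading and trailing sequence numbers of that block differ, so the block is rejected and the other location is used.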


Always store the data in some framed format, for example START_BYTE, total number of bytes to write, data, END_BYTE. Before writing to external or internal memory, always check the power monitor registers. If the data is corrupted in any way, the END byte will be corrupted as well, so the record will fail validation when the whole frame is checked. A simple checksum is not a good idea; use CRC16 instead if you want to include a CRC in your format.
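One possible reading of this framing, sketched in C. The marker values, the one-byte length field, the placement of the CRC just before the END byte, and the function names are all assumptions made for the example. The CRC used is CRC-16/CCITT-FALSE.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define START_BYTE     0xA5u  /* hypothetical marker values */
#define END_BYTE       0x5Au
#define MAX_DATA       32u
#define FRAME_OVERHEAD 5u     /* start + length + CRC16 (2 bytes) + end */

/* CRC-16/CCITT-FALSE: polynomial 0x1021, initial value 0xFFFF. */
static uint16_t crc16(const uint8_t *p, size_t n)
{
    uint16_t crc = 0xFFFFu;
    while (n--) {
        crc ^= (uint16_t)((uint16_t)*p++ << 8);
        for (int i = 0; i < 8; i++)
            crc = (crc & 0x8000u) ? (uint16_t)((crc << 1) ^ 0x1021u)
                                  : (uint16_t)(crc << 1);
    }
    return crc;
}

/* Layout: START | len | data[len] | crc_hi | crc_lo | END.
 * Returns the number of bytes placed in out. */
static size_t frame_encode(uint8_t *out, const uint8_t *data, uint8_t len)
{
    assert(len <= MAX_DATA);
    uint16_t crc = crc16(data, len);
    size_t i = 0;
    out[i++] = START_BYTE;
    out[i++] = len;
    memcpy(&out[i], data, len);
    i += len;
    out[i++] = (uint8_t)(crc >> 8);
    out[i++] = (uint8_t)(crc & 0xFFu);
    out[i++] = END_BYTE;              /* END goes in last */
    return i;
}

/* Copies the payload to out and returns its length, or -1 if invalid. */
static int frame_decode(const uint8_t *in, size_t in_len, uint8_t *out)
{
    if (in_len < FRAME_OVERHEAD || in[0] != START_BYTE)
        return -1;
    uint8_t len = in[1];
    if (len > MAX_DATA || (size_t)len + FRAME_OVERHEAD > in_len)
        return -1;
    if (in[len + 4u] != END_BYTE)
        return -1;
    uint16_t crc = (uint16_t)((in[len + 2u] << 8) | in[len + 3u]);
    if (crc != crc16(&in[2], len))
        return -1;
    memcpy(out, &in[2], len);
    return (int)len;
}
```

A record is accepted only if the markers, the length and the CRC all check out, so a write that stops before the END byte is rejected on the next read.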
