You can usually send the data out over a UART and record or display it in a terminal emulator such as TeraTerm (or HyperTerminal, if necessary).
If you implement a ring buffer and an ISR to service the UART, this will have minimal impact on your system's runtime behavior unless you exceed the port's throughput for extended periods. It probably has a lower system impact and is more deterministic than writing to EEPROM or Flash, especially if the UART has a FIFO or DMA support. Although the bandwidth is limited, it has the advantage of almost unlimited capacity, because the data is stored on the host rather than on the target.
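As a rough illustration of the ring buffer plus ISR approach, here is a minimal sketch in C. It is not tied to any particular chip: `uart_write_data_reg()` and `uart_tx_irq_enable()` are hypothetical stand-ins for your part's data register write and TX-empty interrupt enable, stubbed here so the logic can be tried on a PC. The names `log_write()`, `uart_tx_isr()` and the buffer size are likewise my own assumptions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LOG_BUF_SIZE 256u /* assumed size; must be a power of two */

static volatile uint8_t  log_buf[LOG_BUF_SIZE];
static volatile uint32_t log_head; /* advanced only by the producer */
static volatile uint32_t log_tail; /* advanced only by the ISR */

/* Hypothetical hardware hooks, simulated here for host testing.
 * On a real target these would touch the UART registers. */
static uint8_t sim_uart_out[1024];
static size_t  sim_uart_len;
static bool    sim_tx_irq_enabled;

static void uart_write_data_reg(uint8_t byte)
{
    if (sim_uart_len < sizeof sim_uart_out) {
        sim_uart_out[sim_uart_len++] = byte;
    }
}

static void uart_tx_irq_enable(bool on)
{
    sim_tx_irq_enabled = on;
}

/* Queue up to len bytes without blocking; returns the number accepted.
 * Bytes that do not fit are dropped, so the caller's timing is never
 * held up by the UART. */
size_t log_write(const char *data, size_t len)
{
    size_t n = 0;
    while (n < len) {
        uint32_t head = log_head;
        if (head - log_tail >= LOG_BUF_SIZE) {
            break; /* buffer full */
        }
        log_buf[head & (LOG_BUF_SIZE - 1u)] = (uint8_t)data[n++];
        log_head = head + 1u;
    }
    if (n > 0) {
        uart_tx_irq_enable(true); /* let the ISR start draining */
    }
    return n;
}

/* Call from the UART "transmit register empty" interrupt:
 * sends one byte per interrupt, and disables the interrupt
 * once the buffer has been drained. */
void uart_tx_isr(void)
{
    uint32_t tail = log_tail;
    if (tail == log_head) {
        uart_tx_irq_enable(false);
        return;
    }
    uart_write_data_reg(log_buf[tail & (LOG_BUF_SIZE - 1u)]);
    log_tail = tail + 1u;
}
```

With a single producer and a single ISR consumer, each index has exactly one writer, so no lock is needed on targets where a 32-bit access is atomic. If you call `log_write()` from more than one context, you would need to protect it, for example by briefly masking interrupts.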
Your chip may also have built-in debug features that, together with a host debugger, support arbitrary debug output or semihosting. This will also have minimal impact at runtime.
Clifford