How to list the first or last 10 lines from a file without unpacking it in Linux

I have a .bz2 file. I want to list the first or last 10 lines without unpacking it, because it is too big. I tried head -10 and tail -10 on it directly, but I only see gibberish. I also need to compare two compressed files to see whether they are similar. How can I do this without unpacking the files?

EDIT: by "similar" I mean identical (having the same content).

+4
2 answers

bzip2 is a block-based compression format, so in theory you could locate and decompress only the blocks you need. In practice that is difficult: for example, the last ten lines you want may span two or more compressed blocks.

As a practical answer to your first question, you can decompress the file as a stream. This is wasteful in that tail has to decompress the entire file (head stops the pipeline once it has its ten lines), but nothing is written to disk, so you won't run into storage capacity problems:

 bzcat file.bz2 | head -10
 bzcat file.bz2 | tail -10

If your distribution does not include bzcat (which would be a little unusual in my experience), bzcat is equivalent to bzip2 -d -c .
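As a minimal sketch, the two pipelines can be wrapped in small shell functions built on bzip2 -d -c, so they also work where bzcat is missing. The names bz_head and bz_tail are made up for illustration, and the optional second argument (line count, defaulting to 10) is an assumption of this sketch rather than part of the answer above.

```shell
#!/bin/sh
# Hypothetical helpers: print the first/last N lines of a .bz2 file
# without writing the decompressed data to disk.

# bz_head FILE [N]: first N lines (default 10).
bz_head() {
    # bzip2 -dc decompresses to stdout; head exits after N lines,
    # which ends the pipeline early.
    bzip2 -dc -- "$1" | head -n "${2:-10}"
}

# bz_tail FILE [N]: last N lines (default 10).
bz_tail() {
    # tail must read the whole stream before it can print the end.
    bzip2 -dc -- "$1" | tail -n "${2:-10}"
}
```

For example, bz_head file.bz2 prints the first ten lines and bz_tail file.bz2 25 prints the last twenty-five.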

However, if your ultimate goal is to compare two compressed files (which could have been compressed at different levels, so comparing the compressed files directly does not work), you can do this (assuming bash as your shell):

 cmp <(bzcat file1.bz2) <(bzcat file2.bz2) 

This will decompress both files and compare the uncompressed data byte by byte without saving any of the decompressed files anywhere.
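If the result is needed in a script, the exit status of cmp can be used directly. The following is a sketch under the same bash assumption; same_bz2_content is a hypothetical name, and cmp -s only suppresses the output so that the exit status (0 for identical, 1 for different) carries the answer.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if two .bz2 files decompress to the
# same bytes. Needs bash because of <( ... ) process substitution.
same_bz2_content() {
    # -s: print nothing, report the result through the exit status only.
    cmp -s <(bzip2 -dc -- "$1") <(bzip2 -dc -- "$2")
}

# Example use (file names are placeholders):
# if same_bz2_content file1.bz2 file2.bz2; then echo identical; else echo different; fi
```

Note that a corrupt archive simply ends its stream early, so it shows up as "different" rather than as a separate error.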

+8

The standard bunzip2 command cannot do this on its own. However, the man page says that bzip2 compresses data in blocks (900 KB of uncompressed data by default) and mentions bzip2recover , a program that splits a .bz2 file into its individual blocks so that each can be unpacked separately.

Using this knowledge, you could put together something that cuts the first and last 900 KB (or so) off the desired file and then uses bzip2recover to unpack them.
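A rough sketch of that idea for the head case follows. It is untested against very large archives and rests on several assumptions: peek_bz2_head is a made-up name, the 1000000-byte default cut size is a guess, head -c is available (it is on typical Linux systems), and bzip2recover writes its rec*chunk.bz2 output files next to its input, as the man page describes. Only blocks that lie completely inside the cut are recovered.

```shell
#!/bin/sh
# Hypothetical helper: peek_bz2_head FILE [LINES] [BYTES]
# Shows the first LINES lines by recovering only the blocks found in the
# first BYTES bytes of the compressed file. Runs in a subshell ( ... ) so
# the variables and the cleanup trap stay local to the call.
peek_bz2_head() (
    src=$1; lines=${2:-10}; bytes=${3:-1000000}
    tmp=$(mktemp -d) || exit 1
    trap 'rm -rf "$tmp"' EXIT

    # Copy only the leading part of the compressed file.
    head -c "$bytes" "$src" > "$tmp/chunk.bz2" || exit 1

    # Split the chunk into one small .bz2 file per complete block.
    bzip2recover "$tmp/chunk.bz2" >/dev/null 2>&1

    # Collect the recovered blocks; fail if none were found.
    set -- "$tmp"/rec*chunk.bz2
    [ -e "$1" ] || exit 1

    # Decompress the recovered blocks in order and keep the first lines.
    bzip2 -dc "$@" | head -n "$lines"
)
```

The tail case would be analogous with tail -c, with the caveat that the first recovered block will usually begin in the middle of a line. For most uses the streaming bzcat pipeline is simpler and less fragile.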

0
