How to read from gzip or read plain text in golang?

I am writing a small web app in golang that analyzes a file uploaded by the user. I would like to automatically detect whether the file is gzipped and create readers / scanners accordingly. One twist is that I cannot read the entire file into memory; I can only work with the stream. Here is what I have:

    func scannerFromFile(reader io.Reader) (*bufio.Scanner, error) {
        var scanner *bufio.Scanner

        // create a bufio.Reader so we can 'peek' at the first few bytes
        bReader := bufio.NewReader(reader)
        testBytes, err := bReader.Peek(64) // read a few bytes without consuming
        if err != nil {
            return nil, err
        }

        // Detect if the content is gzipped
        contentType := http.DetectContentType(testBytes)

        // If we detect gzip, then make a gzip reader, then wrap it in a scanner
        if strings.Contains(contentType, "x-gzip") {
            gzipReader, err := gzip.NewReader(bReader)
            if err != nil {
                return nil, err
            }
            scanner = bufio.NewScanner(gzipReader)
        } else {
            // Not gzipped, just make a scanner based on the reader
            scanner = bufio.NewScanner(bReader)
        }
        return scanner, nil
    }

This works fine for plain text, but gzipped data inflates incorrectly: after a few kilobytes I inevitably get garbled text. Is there an easier way? Any ideas why the decompression goes wrong after several thousand lines?

2 answers

You can tell whether a file is gzipped by checking whether its first 2 bytes are 0x1f8b (I found this information here).

In the comments, someone mentioned that you should check these bytes separately, so the first one is 0x1f and the second is 0x8b.

    testBytes, err := bReader.Peek(2) // read 2 bytes
    ....
    if testBytes[0] == 31 && testBytes[1] == 139 {
        // gzip
    } else {
        ...
    }

Hope this helps.


Thanks to everyone - it turns out that twotwotwo and thundercat were correct, and the stream was corrupted in a spot not related to the code I posted. Oddly enough, it looks like it's due to writing an HTTP response while continuing to read from the request stream. I am still investigating this, but it seems that the original question was erroneous.


Source: https://habr.com/ru/post/1212571/

