I am writing a small web app in Go, and part of it involves analyzing files uploaded by users. I would like to detect automatically whether an uploaded file is gzipped and create the readers/scanners accordingly. One twist is that I cannot read the entire file into memory; I can only work with the stream. Here is what I have:
```go
import (
	"bufio"
	"compress/gzip"
	"io"
	"net/http"
	"strings"
)

func scannerFromFile(reader io.Reader) (*bufio.Scanner, error) {
	var scanner *bufio.Scanner

	// Create a bufio.Reader so we can 'peek' at the first few bytes
	bReader := bufio.NewReader(reader)

	// Read a few bytes without consuming them
	testBytes, err := bReader.Peek(64)
	if err != nil {
		return nil, err
	}

	// Detect whether the content is gzipped
	contentType := http.DetectContentType(testBytes)

	// If we detect gzip, make a gzip reader, then wrap it in a scanner
	if strings.Contains(contentType, "x-gzip") {
		gzipReader, err := gzip.NewReader(bReader)
		if err != nil {
			return nil, err
		}
		scanner = bufio.NewScanner(gzipReader)
	} else {
		// Not gzipped, so just make a scanner from the buffered reader
		scanner = bufio.NewScanner(bReader)
	}

	return scanner, nil
}
```
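For context, this is roughly how I wire it up in the upload handler. This is a minimal sketch, not my exact code: the handler name, the "upload" form-field name, and analyzeLine are just placeholders (it also assumes the "log" import).

```go
func handleUpload(w http.ResponseWriter, r *http.Request) {
	// "upload" is an illustrative form-field name
	file, _, err := r.FormFile("upload")
	if err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	defer file.Close()

	scanner, err := scannerFromFile(file)
	if err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}

	// Stream line by line; the whole file is never held in memory
	for scanner.Scan() {
		analyzeLine(scanner.Text()) // analyzeLine is a stand-in for my analysis
	}
	if err := scanner.Err(); err != nil {
		log.Println("scan error:", err)
	}
}
```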
This works fine for plain text, but for gzipped data the decompression goes wrong: after a few kilobytes I inevitably get garbled text. Is there an easier way to do this? And any ideas why, after several thousand lines, the decompressed output becomes corrupted?
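In case the content-type sniffing is the culprit: would it be more reliable to check the gzip magic number directly? A minimal sketch of what I mean (isGzipped is a name I made up for illustration):

```go
// Sketch: detect gzip by peeking at the two magic bytes (0x1f 0x8b)
// that begin every gzip stream, instead of using http.DetectContentType.
func isGzipped(br *bufio.Reader) (bool, error) {
	magic, err := br.Peek(2)
	if err != nil {
		return false, err // e.g. the input is shorter than two bytes
	}
	return magic[0] == 0x1f && magic[1] == 0x8b, nil
}
```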