How to split a file into equal parts without breaking individual lines?

I was wondering whether it is possible to split a file into equal parts (edit: all equal except for the last) without breaking any lines. With the Unix split command, lines may be broken in half. Is there a way to, say, split a file into 5 equal parts while keeping each part made up of whole lines only (it is no problem if one of the files is slightly larger or smaller)? I know I could just calculate the number of lines, but I have to do this for a lot of files in a bash script. Many thanks!

+84
split unix bash shell
Oct. 14 '11 at 8:05
6 answers

If you mean an equal number of lines, split has an option for this:

 split --lines=75 

If you need to work out what that 75 should be for N equal parts, it's:

 lines_per_part = int((total_lines + N - 1) / N)

where the total number of lines can be obtained using wc -l.

See the following script for an example:

    #!/usr/bin/bash

    # Configuration stuff
    fspec=qq.c
    num_files=6

    # Work out lines per file.
    total_lines=$(wc -l <${fspec})
    ((lines_per_file = (total_lines + num_files - 1) / num_files))

    # Split the actual file, maintaining lines.
    split --lines=${lines_per_file} ${fspec} xyzzy.

    # Debug information
    echo "Total lines = ${total_lines}"
    echo "Lines per file = ${lines_per_file}"
    wc -l xyzzy.*

This outputs:

    Total lines = 70
    Lines per file = 12
      12 xyzzy.aa
      12 xyzzy.ab
      12 xyzzy.ac
      12 xyzzy.ad
      12 xyzzy.ae
      10 xyzzy.af
      70 total
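Since the question mentions doing this for many files, the same calculation can be wrapped in a function and called in a loop. This is only a minimal sketch and not part of the original answer: the function name split_equal_lines and the ".part." output prefix are illustrative assumptions, and the files to process are taken from the script's arguments.

```shell
#!/usr/bin/env bash

# split_equal_lines FILE N   (hypothetical helper name)
# Splits FILE into at most N pieces made of whole lines, written as
# FILE.part.aa, FILE.part.ab, ...
split_equal_lines() {
    local file=$1 parts=$2 total per_file
    total=$(wc -l < "$file")
    # Skip empty files: a line count of 0 is not valid for split -l.
    (( total > 0 )) || return 0
    # Ceiling division: every piece except the last gets per_file lines.
    (( per_file = (total + parts - 1) / parts ))
    split -l "$per_file" "$file" "$file.part."
}

# Process every file named on the command line, 5 parts each.
for f in "$@"; do
    [ -f "$f" ] || continue
    split_equal_lines "$f" 5
done
```

Because of the rounding up, a short file can end up with fewer than N pieces (7 lines in 5 parts gives 2 lines per piece and only 4 files).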



More recent versions of split allow you to specify a number of CHUNKS with the -n/--number option. You can therefore use something like:

 split --number=l/6 ${fspec} xyzzy. 

(that's ell-slash-six, meaning lines, not one-slash-six).

This will give you files of roughly equal size, with no lines split in the middle.

I mention that last point because it doesn't give you roughly the same number of lines in each file, but rather roughly the same number of characters.

So, if you have one 20-character line and nineteen 1-character lines (twenty lines in total) and split it into five files, you most likely won't get four lines in every file.
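The effect can be checked by building exactly that file and comparing line and byte counts per chunk. This is a sketch that assumes a GNU split with -n support; the names demo.txt and demo. are made up for illustration, and the exact distribution of lines may vary by version, so no output is shown.

```shell
#!/usr/bin/env bash

# Work in a scratch directory so nothing else is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# One 20-character line followed by nineteen 1-character lines.
{
    printf '%020d\n' 0
    for _ in $(seq 19); do printf 'b\n'; done
} > demo.txt

# Five chunks of roughly equal byte size, never breaking a line.
split --number=l/5 demo.txt demo.

# Show lines and bytes per chunk to see how they are balanced.
wc -l -c demo.a?
```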

+118
Oct 14 '11 at 8:10

A script isn't even necessary; split(1) supports the desired feature out of the box:

    split -l 75 auth.log auth.log.

The above command splits the file into chunks of 75 lines each and writes output files of the form auth.log.aa, auth.log.ab, ...

Running wc -l on the original file and on the output files gives:

      321 auth.log
       75 auth.log.aa
       75 auth.log.ab
       75 auth.log.ac
       75 auth.log.ad
       21 auth.log.ae
      642 total
+31
Apr 09 '13 at 9:56
split was updated in coreutils release 8.8 (announced December 22, 2010) with the --number option to generate a specific number of files. The option --number=l/n generates n files without splitting lines.

http://www.gnu.org/software/coreutils/manual/html_node/split-invocation.html#split-invocation
http://savannah.gnu.org/forum/forum.php?forum_id=6662
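Because --number only exists in newer releases, a script that has to run on mixed systems could check for it and otherwise fall back to a plain line count. The following is a rough sketch under stated assumptions: the helper name split_n_parts is hypothetical, and testing the --help text for "--number" is a heuristic of mine, not a documented interface.

```shell
#!/usr/bin/env bash

# split_n_parts FILE N PREFIX   (hypothetical helper name)
split_n_parts() {
    local file=$1 parts=$2 prefix=$3 total
    # Heuristic (an assumption): treat a mention of --number in the help
    # text as a sign that this split supports -n/--number.
    if split --help 2>/dev/null | grep -q -- '--number'; then
        # N chunks balanced by size, whole lines only.
        split --number="l/$parts" "$file" "$prefix"
    else
        # Fallback: equal line counts via ceiling division.
        total=$(wc -l < "$file")
        (( total > 0 )) || return 0
        split -l "$(( (total + parts - 1) / parts ))" "$file" "$prefix"
    fi
}
```

Usage would look like `split_n_parts auth.log 5 auth.log.`. Note that the two branches balance different things (bytes versus lines), so the resulting pieces are not identical across systems.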

+15
Jul 14 '14 at 19:26

I made a bash script that, given a number of parts as input, splits a file:

    #!/bin/sh

    parts_total="$2"
    input="$1"

    parts=$((parts_total))
    for i in $(seq 0 $((parts_total-2))); do
        lines=$(wc -l < "$input")
        # n is lines/parts rounded up: 1.3 to 2, 1.6 to 2, 1 to 1
        n=$(awk -v lines="$lines" -v parts="$parts" 'BEGIN {
            n = lines / parts;
            rounded = int(n);   # numeric, so the comparison below is numeric
            if (n > rounded) {
                print rounded + 1;
            } else {
                print rounded;
            }
        }')
        head -n "$n" "$input" > "split${i}"
        tail -n "$((lines-n))" "$input" > ".tmp${i}"
        input=".tmp${i}"
        parts=$((parts-1))
    done
    mv ".tmp$((parts_total-2))" "split$((parts_total-1))"
    rm -f .tmp*

I used the head and tail commands, storing the intermediate remainders in temporary files, to split the file.

    # 10 means 10 parts
    sh mysplitXparts.sh input_file 10

Or with awk, where 0.1 means 10% of the lines per part (10 parts) and 0.334 gives 3 parts:

    awk -v size=$(wc -l < input) -v perc=0.1 '{
        nfile = int(NR/(size*perc));
        if(nfile >= 1/perc){
            nfile--;
        }
        print > "split_"nfile
    }' input
+4
May 10 '15 at 17:10

A simple solution for a simple question:

 split -nl/5 your_file.txt 

No need for scripting here.

From the man page, CHUNKS may be:

 l/N split into N files without splitting lines 
+3
Aug 16 '17 at 3:54
    var dict = File.ReadLines("test.txt")
        .Where(line => !string.IsNullOrWhiteSpace(line))
        .Select(line => line.Split(new char[] { '=' }, 2, 0))
        .ToDictionary(parts => parts[0], parts => parts[1]);

or

    string line = "to=xxx@gmail.com=yyy@yahoo.co.in";
    string[] tokens = line.Split(new char[] { '=' }, 2, 0);

Answer: tokens[0] = to, tokens[1] = xxx@gmail.com=yyy@yahoo.co.in
+1
Oct. 20 '14 at 12:08


