The "head" command for aws s3 to view the contents of a file

On Linux, we usually use the head / tail commands to preview the contents of a file. This lets us look at part of the file (to check the format) rather than opening the entire file.

In the case of Amazon S3, there seem to be only ls, cp, mv, etc. commands. I want to know whether it is possible to view part of a file without downloading the entire file to my local machine using cp / GET.

+17
unix amazon-s3 amazon-web-services
7 answers

You can specify a range of bytes when retrieving data from S3 to get the first N bytes, the last N bytes, or anything in between. (This is also useful because it lets you download a file in parallel: just run multiple threads or processes, each of which fetches its own portion of the same file.)

I don't know which of the CLI tools support this directly, but a ranged GET does exactly what you want.

The AWS CLI tools ("aws s3 cp" to be precise) do not let you make a range request, but s3curl ( http://aws.amazon.com/code/128 ) should do the trick. (So does plain curl, for example with the --range option, but then you have to sign the request yourself.)
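If you would rather not sign requests by hand, one workaround (just a sketch, assuming your AWS CLI provides aws s3 presign and your credentials are configured; the bucket and key are placeholders) is to generate a presigned URL and let curl make the range request:

 # generate a temporary presigned GET URL, valid for 60 seconds
 url=$(aws s3 presign s3://mybucket/path/to/file.log --expires-in 60)
 # fetch only the first 1000 bytes
 curl -s --range 0-999 "$url"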

+6

One thing you can do is cp the object to stdout and then pipe it to head:

aws s3 cp s3://path/to/my/object - | head 

You get a broken-pipe error message at the end, but it works.
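If that error message bothers you, a small variation (just a sketch; it also hides any genuine download errors) is to silence stderr:

 aws s3 cp s3://path/to/my/object - 2>/dev/null | head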

+46

You can use the --range switch of the lower-level aws s3api get-object to return the first bytes of the S3 object. (AFAICT the higher-level aws s3 commands do not support this switch.)

/dev/stdout can be passed as the target file name if you just want to view the S3 object by piping it to head. Here is an example:

aws s3api get-object --bucket mybucket_name --key path/to/the/file.log --range bytes=0-10000 /dev/stdout | head
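As an aside, S3 also understands suffix ranges, so a tail-like lookup should work the same way. This is a sketch I have not verified (same placeholder bucket and key; it writes to a temporary file to keep the output clean) that should fetch roughly the last 10 KB:

 aws s3api get-object --bucket mybucket_name --key path/to/the/file.log --range bytes=-10000 last_part.log && tail last_part.log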

Finally, if you, like me, are dealing with compressed .gz files, the above method also works with zless, which lets you view the beginning of the decompressed file:

aws s3api get-object --bucket mybucket_name --key path/to/the/file.log.gz --range bytes=0-10000 /dev/stdout | zless

One tip with zless : if it doesn't work, try increasing the size of the range.
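If zless is inconvenient (for example inside a script), the same partial download can be decompressed with gzip -dc instead. This is just a sketch with the same placeholder bucket and key; gzip will complain because the stream is truncated, so its stderr is silenced:

 aws s3api get-object --bucket mybucket_name --key path/to/the/file.log.gz --range bytes=0-10000 /dev/stdout | gzip -dc 2>/dev/null | head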

+8

If you do not want to download the entire file, you can download part of it with the --range option of aws s3api get-object and then run the head command on that partial file.

Example:

 aws s3api get-object --bucket my_s3_bucket --key s3_folder/file.txt --range bytes=0-1000000 tmp_file.txt && head tmp_file.txt 

Explanation:

aws s3api get-object downloads the requested --range of the S3 file from the specified bucket and key into the specified output file. && runs the second command only if the first one succeeds. The second command prints the first 10 lines of the output file that was just created.
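If you do not want to keep the partial download around, you can remove it right after viewing it (same placeholder names as above):

 aws s3api get-object --bucket my_s3_bucket --key s3_folder/file.txt --range bytes=0-1000000 tmp_file.txt && head tmp_file.txt; rm -f tmp_file.txt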

+4

There is no such option; you can only get the whole object. You can make an HTTP HEAD request to view the object's metadata, but that is not what you are looking for.
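For completeness, the metadata lookup mentioned above can be done with aws s3api head-object, which returns the object's size, content type, ETag and so on, but no data (bucket and key are placeholders):

 aws s3api head-object --bucket mybucket --key path/to/file.txt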

0

One easy way to do this:

 aws s3api get-object --bucket bucket_name --key path/to/file.txt --range bytes=0-10000 /path/to/local/t3.txt && head -100 /path/to/local/t3.txt

For a .gz file you can do

 aws s3api get-object --bucket bucket_name --key path/to/file.gz --range bytes=0-10000 /path/to/local/t3.gz && zless /path/to/local/t3.gz

If that does not return enough data, increase the byte range.

0

If you use s3cmd, you can make s3cmd get write to stdout and pipe it to head as follows:

 s3cmd get s3://bucket/file | head 

If you want to view a gzip file, pipe it through gzip -d - and then into head:

 s3cmd get s3://bucket/file | gzip -d - | head 

If you get tired of typing this, add the following function to your ~/.bashrc:

 function s3head() {
   NUM_LINES=10
   while test $# -gt 0; do
     case $1 in
       -h|--help)
         echo "s3head [-n NUM] <S3_FILE_PATH>"
         return
         ;;
       -n)
         shift
         if test $# -gt 0; then
           export NUM_LINES=$1
         else
           echo "Number of lines not specified"
           return
         fi
         shift
         ;;
       *)
         break
         ;;
     esac
   done
   if [[ -z "$1" ]]; then
     echo "S3 file path is not specified"
     return
   fi
   s3cmd get $1 - | zcat -f | head -n $NUM_LINES
 }

Now source your ~/.bashrc file. Then, just by running s3head s3://bucket/file , you will get the first 10 lines of your file. If you want more lines, specify -n and the number of lines as follows:

 # Prints the first 14 lines of s3://bucket/file
 s3head -n 14 s3://bucket/file
0
