John1024 explained how to do it right; I would like to show why the original version does not work. The main problem is that `for` iterates over words, not lines. The file has two words on each line (a file name and a counter), so the loop runs two iterations per line. To see this, try:
for line in `hadoop fs -cat sample.txt`; do echo "$line"; done
... and it will print something like:
2015-03-04.01.Abhi_Ram.txt 10 2015-03-04.02.Abhi_Ram.txt 70
... which is not what you want at all. This construct also has other nasty quirks: for example, if the input file contains the word "*", it expands into the list of file names in the current directory.
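To see the globbing quirk concretely, here is a minimal sketch (the directory and file names are made up for the demo): an unquoted command substitution goes through both word splitting and pathname expansion, so a literal `*` in the data turns into a directory listing.

```shell
# Set up a scratch directory with a couple of files (hypothetical names)
dir=$(mktemp -d)
cd "$dir"
touch a.txt b.txt

# The data file contains only a literal asterisk
echo '*' > input.txt

# The unquoted $(cat ...) is glob-expanded by the shell,
# so the loop sees the file names, not the literal *
for line in $(cat input.txt); do
    echo "$line"
done
# prints: a.txt b.txt input.txt
```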
The `while read ...; done <file` approach is the right way to iterate over the lines of a file in a shell script. As a bonus, `read` can also split each line into fields without resorting to awk (here, `read filename count` does exactly that).
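A minimal sketch of that approach, using a local `sample.txt` with the same two-column format instead of the HDFS file (the file contents here are taken from the question's example):

```shell
# Create a sample input file in the question's format: name count
printf '%s\n' \
    '2015-03-04.01.Abhi_Ram.txt 10' \
    '2015-03-04.02.Abhi_Ram.txt 70' > sample.txt

# read splits each line on whitespace: the first word goes into
# filename, the rest into count -- one iteration per line, no awk
while read -r filename count; do
    echo "file: $filename  count: $count"
done < sample.txt
```

The `-r` flag keeps `read` from treating backslashes in file names specially, which is generally what you want when processing data files.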