Yuhang He's Blog

Some birds are not meant to be caged, their feathers are just too bright.

Shell scripts for everyday use

We deal with shell scripts all the time, so here I list some shell commands that we encounter every day.

  • ls lists only folders: ls -l | grep ^d;
  • ls lists only regular files: ls -l | grep ^-;
  • wc counts the characters in a file: wc -m file_name (wc -c counts bytes instead);
  • cut shows only selected columns of each line: cut -b list file_name. For example, cut -b 5-9 file_name prints bytes (columns) 5 through 9 of every line;
  • file shows the file type: file file_name;
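  • convert (ImageMagick) resizes a bunch of images, overwriting each one in place:
for name in /path/*.jpg; do
    # The ! forces exactly 100x100 pixels, ignoring the aspect ratio
    convert "$name" -resize '100x100!' "$name"
done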
  • wget downloads data:
DIR="$( cd "$(dirname "$0")" && pwd -P )"
cd "$DIR"
echo "Downloading ..."
wget --no-check-certificate source_link
echo "Done."

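Note that DIR holds the absolute, symlink-free path of the directory containing the script, not the directory you launch it from, so the data is downloaded next to the script. wget must be installed in advance, and --no-check-certificate skips HTTPS certificate checks, so use it only with trusted sources.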
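A quick way to see what DIR holds is the sketch below: it writes a throwaway script containing only the DIR line (with no spaces around =, as shell assignments require), runs it from a different working directory, and compares the result with the script's own folder. The temporary directories and the name show_dir.sh are made up for illustration, and nothing is downloaded.

```shell
# Sketch: DIR resolves to the folder holding the script, not the caller's
# working directory. show_dir.sh and the temp dirs are hypothetical.
script_dir=$(mktemp -d)
work_dir=$(mktemp -d)

# A throwaway script that only prints its own DIR value.
cat > "$script_dir/show_dir.sh" <<'EOF'
#!/bin/sh
DIR="$( cd "$(dirname "$0")" && pwd -P )"
echo "$DIR"
EOF

# Run it from an unrelated working directory; sh sets $0 to the script path.
reported=$(cd "$work_dir" && sh "$script_dir/show_dir.sh")

# pwd -P strips symlinks, so compare with the physical path of script_dir.
expected=$(cd "$script_dir" && pwd -P)

echo "DIR reported: $reported"
echo "expected:     $expected"

rm -rf "$script_dir" "$work_dir"
```

If the two lines match, a later cd "$DIR" always lands in the script's folder, wherever the script was started from.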

  • cat, tr and awk count how often each word occurs in a file:
cat file.txt | tr -d ',.:!"{}[]-' | tr -s '\n' ' ' > tmpFile
awk 'BEGIN{RS=" "} {++w[$0]} END{for (a in w) if (a != "") print a": "w[a]}' tmpFile | sort > result.txt

The first line copies the file to tmpFile after deleting punctuation marks (,.:!"{}[]- etc.) and replacing every newline with a space; -s also squeezes runs of spaces into one. The hyphen is placed last in the tr set so that tr reads it literally instead of as a character range. Then awk treats every space-separated word as one record, counts how often each word occurs, and the sorted counts are stored in result.txt.
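The same tr + awk idea can be wrapped in a small function so that it needs no temporary file. This is a minimal sketch: word_freq and the sample sentence are hypothetical, and the hyphen sits last in the tr set so that tr deletes it literally instead of reading it as a character range.

```shell
# Sketch of the tr + awk word counter as a reusable function.
# word_freq and the sample text below are hypothetical.
#   tr -d      : delete punctuation (hyphen last, so it is taken literally)
#   tr -s      : turn newlines into spaces and squeeze runs of spaces
#   awk RS=" " : treat every space-separated word as one record and count it
word_freq() {
    tr -d ',.:!"{}[]-' < "$1" | tr -s '\n' ' ' |
        awk 'BEGIN{RS=" "} {++w[$0]} END{for (a in w) if (a != "") print a": "w[a]}' |
        sort
}

sample=$(mktemp)
printf 'the cat, the hat.\nA cat!\n' > "$sample"
out=$(word_freq "$sample")
rm -f "$sample"
printf '%s\n' "$out"
```

Words that differ only in case (A and a) are counted separately; inserting tr '[:upper:]' '[:lower:]' before the awk step would merge them.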

  • find and cat merge N files into one file: find . -name "*.txt" -exec cat {} \; > test.tmp
  • wget and xargs download files in parallel: cat file_name | xargs -n 1 -P 50 wget
    Here file_name holds the URLs, one per line; -n 1 passes one URL to each wget call, and -P 50 runs at most 50 wget processes in parallel.
  • while and read process a file one line at a time:
while IFS= read -r line_tmp
do
   xxx
done < file_name

This is useful for processing each line separately. IFS= keeps leading and trailing blanks, and -r keeps backslashes literal. Note that a loop such as for line_tmp in $(cat file_name) would instead split the input on every space, tab and newline, not just on line breaks.

  • grep decides whether a file contains a specific string:
if grep -qF "$img" "$file_name"; then
   echo 'exist'
fi
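The while/read loop and the grep test combine naturally, for example to check which names from a list appear in another file. The sketch below assumes one name per line; report_names, names.txt and labels.txt are hypothetical, and grep -F is used so that the dot in a name like cat.jpg is matched literally rather than as a regex wildcard.

```shell
# Sketch: check a list of names against a file, one line at a time.
# report_names, names.txt and labels.txt are hypothetical.
report_names() {
    list_file=$1
    target_file=$2
    # IFS= keeps leading/trailing blanks; -r keeps backslashes literal.
    while IFS= read -r name; do
        # Skip blank lines in the list.
        [ -n "$name" ] || continue
        # -q: no output, only the exit status; -F: match name as a plain string.
        if grep -qF -- "$name" "$target_file"; then
            echo "$name: exist"
        else
            echo "$name: missing"
        fi
    done < "$list_file"
}

tmp=$(mktemp -d)
printf 'cat.jpg\ndog.jpg\n' > "$tmp/names.txt"
printf 'cat.jpg 0\nbird.jpg 1\n' > "$tmp/labels.txt"
out=$(report_names "$tmp/names.txt" "$tmp/labels.txt")
rm -rf "$tmp"
printf '%s\n' "$out"
```

Redirecting the file into the loop (done < "$list_file") instead of piping cat into it also keeps the loop in the current shell under bash, so variables assigned inside the loop are still set after it ends.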
  • find and perl search and replace a pattern in many files at once:
    1. only files in the current directory
      perl -i -pe 's:pattern_to_subs:pattern_subs:' *.txt
    2. files in the current directory and all its subdirectories (i ignores case, g replaces every match on a line)
      find . -name "*.txt" -print | xargs perl -pi -e 's:pattern_to_subs:pattern_subs:ig'
    3. regular files only, also safe for file names that contain spaces
      find . -type f -name "*.txt" -print0 | xargs -0 perl -pi -e 's:pattern_to_subs:pattern_subs:'
  • tar compresses a directory into an archive whose name carries a timestamp:
  tar zcvf file_dir-$(date +%Y%m%d-%H%M).tar.gz file_dir/
  • ls -1 | shuf -n 1 picks a random entry (file or directory) from the current directory.
  • cal | grep -E --color "\b$(date +%e | tr -d ' ')\b|$" shows the calendar with today highlighted.
  • awk '!($0 in a) {a[$0]; print}' file prints each line only once, removing duplicate lines while keeping the original order.