I have a while loop that reads a mail log file into an array so I can search through it and match up a mail flow. Unfortunately, the while loop is taking a long time to get through the file. It is a very large file, but there must be a faster way of doing this.

cat /home/maillog | grep "Nov 13" | grep "from=<xxxx@xxxx.com>" | awk '{print $6}' > /home/output_1

while read line; do awk -v line="$line" '$6 ~ line { print $0 }' /home/maillog >> /home/output_2 ; done < /home/output_1

Any ideas? Thanks in advance.

  • I think you can do all of that with awk alone. awk reads each line of the file and lets you filter, save variables, etc. I recently used awk on a file with more than 20k rows and it was very fast. That was my first awk script, and I found this site very useful: tutorialspoint.com/awk – Daniel Rodríguez Nov 19 at 8:58
  • You're starting a new instance of awk for each line; that must be slow. – choroba Nov 19 at 9:00
  • Do you really need the /home/output_1 file? Avoiding the disk usage between the two calls can help to improve performance. – Ôrel Nov 19 at 9:19
  • You read /home/maillog too many times, once per line of output_1; that is why it is so slow. Rework it to read the file only once or twice. – Ôrel Nov 19 at 9:36
  • Can you share an example of input and output? – Ôrel Nov 19 at 9:43

Let us analyse your script and try to explain why it is slow.

Let's first start with a micro-optimization of your first line. It won't speed things up; it is merely educational.

cat /home/maillog |grep "Nov 13" |grep "from=<xxxx@xxxx.com>" |awk '{print $6}' > /home/output_1 

In this line you make 4 calls to different binaries where a single one would do. For readability, you could keep this line as it is. However, here are the two main points:

  1. Useless use of cat. The program cat is mainly used to concatenate files. If you pass it just a single file, it is basically overkill, especially if all you want is to pipe it into grep.

    cat file | grep ... => grep ... file
  2. Multiple greps in combination with awk ... can be written as a single awk:

    awk '/Nov 13/ && /from=<xxxx@xxxx.com>/ {print $6}'

So the entire line can be written as:

awk '/Nov 13/ && /from=<xxxx@xxxx.com>/ {print $6}' /home/maillog > /home/output_1
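If you want to convince yourself the rewrite is equivalent, one quick sanity check (a sketch, assuming bash for the process substitution) is to diff the output of both variants:

diff <(cat /home/maillog | grep "Nov 13" | grep "from=<xxxx@xxxx.com>" | awk '{print $6}') \
     <(awk '/Nov 13/ && /from=<xxxx@xxxx.com>/ {print $6}' /home/maillog)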

The second part is where things get slow:

while read line; do awk -v line="$line" '$6 ~ line { print $0 }' /home/maillog >> /home/output_2 ; done < /home/output_1

Why is this slow? For every line you read from /home/output_1, you load the program awk into memory, open the file /home/maillog, process every line of it, and close /home/maillog again. At the same time, for every line you process, you open /home/output_2, move the file pointer to the end of the file, write to it, and close it again.
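A quick partial fix, just to illustrate the point: redirecting once for the whole loop instead of appending inside it means /home/output_2 is opened a single time. This is a minimal sketch of that idea; it still rereads /home/maillog once per line, so it remains slow:

while read -r line; do
    awk -v line="$line" '$6 ~ line' /home/maillog
done < /home/output_1 > /home/output_2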

The whole program can actually be done with a single awk:

awk '(NR==FNR) && /Nov 13/ && /from=<xxxx@xxxx.com>/ {a[$6];next}($6 in a)' /home/maillog /home/maillog > /home/output_2
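For clarity, here is the same command spelled out with comments. Note that /home/maillog is passed twice: NR==FNR only holds while awk reads the first copy, so the first block builds a lookup array and the second pass prints the matches:

awk '
    # First pass over /home/maillog: remember field 6 (the queue ID)
    # of every line matching the date and the sender.
    (NR==FNR) && /Nov 13/ && /from=<xxxx@xxxx.com>/ { a[$6]; next }
    # Second pass: print every line whose field 6 was remembered.
    ($6 in a)
' /home/maillog /home/maillog > /home/output_2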
  • Oh man, you are a genius, thanks for taking the time to explain it to me. It's working well now. – Julián Díaz Nov 21 at 4:39
