Viewing and modifying large text files (head, tail, cat and grep)
To see what fastq files look like, we can view part of one in the Terminal. As fastq files are quite big, we can look at just part of a file with the pager less or the editor nano.
Start by downloading a sample fastq file like this:
[pierre@albiorix ~]$ wget --no-check-certificate https://github.com/DeWitP/Bioinformatic_Pipelines/raw/master/RNA-Seq_materials/data/FR32_ATCACG_before.fastq.gz
Then unzip the compressed file with gunzip (part of the gzip program) by typing:
[pierre@albiorix ~]$ gunzip FR32_ATCACG_before.fastq.gz
Use less to view the fastq file. You can move down through the file with space, and quit back to the Terminal with q.
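As a small sketch of what the unzip step does, the commands below compress and decompress a made-up toy file. The file name toy.fastq and its contents are assumptions for illustration only, not the real data.

```shell
# Work in a scratch directory so no real data is touched.
cd "$(mktemp -d)"

# Toy file standing in for a fastq file (made-up content).
printf '@read1\nACGT\n+\nIIII\n' > toy.fastq

# gzip replaces toy.fastq with the compressed file toy.fastq.gz.
gzip toy.fastq
ls

# gunzip -c writes the decompressed text to the screen and leaves the .gz file in place.
first_line=$(gunzip -c toy.fastq.gz | head -n 1)
echo "$first_line"

# Plain gunzip restores toy.fastq and removes toy.fastq.gz.
gunzip toy.fastq.gz
ls
```

The -c form can be handy for a quick look at a large compressed file without unpacking it first.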
You can also view the top or bottom 10 lines of a file with the commands head and tail.
Use tail to view the bottom 10 lines of the fastq file you just downloaded.
Now, use the manual page for tail to find out how to view more than 10 lines, and use tail again to view the bottom 50 lines of the fastq file.
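If you get stuck, here is a minimal sketch of the line-count option on a made-up file of 100 numbered lines (toy_lines.txt is a hypothetical name). The same -n option works for both head and tail.

```shell
# Work in a scratch directory so no real data is touched.
cd "$(mktemp -d)"

# Toy file with the numbers 1 to 100, one per line (made-up data).
seq 1 100 > toy_lines.txt

# Without options, tail prints the last 10 lines.
tail toy_lines.txt

# The -n option sets how many lines to print; here the last 50 (lines 51 to 100).
last50=$(tail -n 50 toy_lines.txt)

# head takes the same option for the top of the file.
first3=$(head -n 3 toy_lines.txt)
echo "$first3"
```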
This prints the lines to the screen. However, we can redirect the output to a file with the > symbol, like this: [print command] > file.txt. The double angle bracket, >>, does the same thing but appends the output to the file rather than overwriting it.
Use > to write the last 50 lines of the fastq file to a new file called tail50.txt.
Open the new file in less with less tail50.txt.
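The difference between > and >> can be checked on a toy file. In this sketch the file names toy_lines.txt and toy_out.txt are made up for illustration.

```shell
# Work in a scratch directory so no real data is touched.
cd "$(mktemp -d)"

# Toy input: the numbers 1 to 100, one per line (made-up data).
seq 1 100 > toy_lines.txt

# > creates toy_out.txt (or overwrites it) with the last 5 lines.
tail -n 5 toy_lines.txt > toy_out.txt
after_write=$(wc -l < toy_out.txt | tr -d ' ')

# >> appends the first 2 lines to the end of the same file.
head -n 2 toy_lines.txt >> toy_out.txt
after_append=$(wc -l < toy_out.txt | tr -d ' ')

# Using > again discards the old content and starts over.
tail -n 5 toy_lines.txt > toy_out.txt
after_overwrite=$(wc -l < toy_out.txt | tr -d ' ')

echo "$after_write $after_append $after_overwrite"
```

Because > silently replaces an existing file, it is worth double-checking the output file name before pressing Enter.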
One of the most useful commands in the Terminal is grep. This command searches through a text for lines matching a given pattern and prints only the lines that contain it. Try using grep to pull out the sequence line of every read that contains ATTCC from the original fastq file:
[pierre@albiorix ~]$ grep "ATTCC" FR32_ATCACG_before.fastq
Used in combination with other commands, grep becomes even more versatile. To combine commands, we can use the “pipe”, |. It takes the output from one command and feeds it into another command.
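To see what grep returns, here is a small sketch on a made-up three-read file (toy.fastq and its reads are invented for illustration). It also uses the -c option, which makes grep print the number of matching lines instead of the lines themselves.

```shell
# Work in a scratch directory so no real data is touched.
cd "$(mktemp -d)"

# Toy fastq file with three made-up reads, four lines each.
cat > toy.fastq <<'EOF'
@read1
GGATTCCAAT
+
IIIIIIIIII
@read2
ACGTACGTAC
+
IIIIIIIIII
@read3
TTATTCCGGA
+
IIII@IIIII
EOF

# Print every line that contains ATTCC (the sequence lines of read1 and read3).
matches=$(grep "ATTCC" toy.fastq)
echo "$matches"

# Count the matching lines instead of printing them.
n_matches=$(grep -c "ATTCC" toy.fastq)
echo "$n_matches"
```

Note that grep prints only the matching line itself, not the identifier or quality line of the same read.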
Try using tail in combination with grep like this:
[pierre@albiorix ~]$ tail -n 50 FR32_ATCACG_before.fastq | grep "@"
This prints only those of the last 50 lines of the file that contain an @ symbol. These are mainly the identifier lines, although @ can also occur in quality lines.
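The pipe can be tried on a made-up three-read file (toy.fastq is an invented example). The sketch also shows one possible refinement, assumed here rather than taken from the exercise: anchoring the pattern with ^ so that only lines starting with @ match. This is still not foolproof, since a quality line may itself begin with @.

```shell
# Work in a scratch directory so no real data is touched.
cd "$(mktemp -d)"

# Toy fastq file with three made-up reads; the last quality line contains an @.
cat > toy.fastq <<'EOF'
@read1
GGATTCCAAT
+
IIIIIIIIII
@read2
ACGTACGTAC
+
IIIIIIIIII
@read3
TTATTCCGGA
+
IIII@IIIII
EOF

# Last 8 lines (read2 and read3), keeping lines with an @ anywhere:
# two identifier lines plus the quality line IIII@IIIII.
any_at=$(tail -n 8 toy.fastq | grep -c "@")

# Same pipe, but only lines that start with @: the two identifier lines.
leading_at=$(tail -n 8 toy.fastq | grep -c "^@")

echo "$any_at $leading_at"
```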
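The command cat prints a whole text file to the screen, or combines several files into one and prints the result. With the -b flag, it also numbers the non-blank lines, so the number on the last line tells you how many lines the file has.
Use cat -b to find out how many reads there are in one of the fastq files. Each read normally takes up four lines.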
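One way to turn the line numbers into a read count is sketched below on a made-up three-read file (toy.fastq is an invented example). It assumes every read occupies exactly four lines, which holds for fastq files whose sequences are not wrapped over several lines.

```shell
# Work in a scratch directory so no real data is touched.
cd "$(mktemp -d)"

# Toy fastq file with three made-up reads, four lines each.
cat > toy.fastq <<'EOF'
@read1
GGATTCCAAT
+
IIIIIIIIII
@read2
ACGTACGTAC
+
IIIIIIIIII
@read3
TTATTCCGGA
+
IIII@IIIII
EOF

# cat -b numbers the non-blank lines; the last output line carries the total.
cat -b toy.fastq | tail -n 1

# Keep only the number (first field) from that last line.
n_lines=$(cat -b toy.fastq | tail -n 1 | awk '{print $1}')

# Assumption: four lines per read, so divide the line count by 4.
n_reads=$(( n_lines / 4 ))
echo "$n_reads"
```

Piping into tail avoids scrolling through the whole numbered file just to read the final number.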