Getting Started

Read (or online here):

info sed 'Introduction'

And also read the few paragraphs before the description of command line options (or online here):

info sed 'Invoking sed'

So, the (most basic) way to invoke Sed is:

sed SCRIPT INPUTFILE...

SCRIPT is some sort of Sed command (or commands). They can be created “on the fly” on the command line, or written in a text file and then passed to Sed. In this basic approach, however, let’s concentrate on the command line script only.

INPUTFILE... means that you can pass one or more input files to Sed. The three dots in man and info pages mean “one or more of the previous item/atom.”

The Sed SCRIPT can often be a complex beast, but let’s exemplify the basic syntax of Sed’s invocation with a script that replaces the first foo on each line of a file with bar.

sed 's/foo/bar/' file.txt

The part 's/foo/bar/' means substitute (s) foo with bar. The / character is the delimiter, that is, the character that separates the text to be matched from the replacement text. In our example, foo is what we want to match, and bar is what we want to replace foo with. If a line of file.txt contains the string foo, the first foo on that line is replaced with bar. Whether or not a line contained foo and the substitution occurred, all lines are finally printed to stdout.
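As a quick check of the description above, here is a minimal sketch that runs the same script on three made-up lines. The sample text is an assumption for illustration only and is not part of file.txt.

```shell
# Three sample lines (made up for this sketch): one match, no match, two matches.
input='this is foo
nothing here
foo and foo'

# Lines without a match are printed unchanged; on the last line only the
# first foo is replaced.
output=$(printf '%s\n' "$input" | sed 's/foo/bar/')
printf '%s\n' "$output"
```

This should print this is bar, nothing here, and bar and foo, one per line.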

As for the INPUTFILE, there are several ways to feed an input file to Sed. Behold!

The most common case:

sed 's/foo/bar/' file.txt

This common case easily accepts more than one filename:

sed 's/foo/bar/' file1.txt file2.txt etc.txt
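To see what happens with more than one filename, the sketch below creates two throwaway files and passes both to Sed. The directory comes from mktemp and the file contents are made up for the example.

```shell
# Create two throwaway files in a temporary directory (contents are made up).
dir=$(mktemp -d)
printf 'foo one\n' > "$dir/file1.txt"
printf 'foo two\n' > "$dir/file2.txt"

# Sed processes the files in the order given and writes one combined stream.
output=$(sed 's/foo/bar/' "$dir/file1.txt" "$dir/file2.txt")
printf '%s\n' "$output"

rm -r "$dir"
```

The output is bar one followed by bar two, in the order the files were named.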

Using an explicit redirection:

sed 's/foo/bar/' < file.txt

Piping:

echo 'this is foo' | sed 's/foo/bar/'
printf %s 'this is foo' | sed 's/foo/bar/'
cat file.txt | sed 's/foo/bar/'

Redirecting and piping:

cat < file.txt | sed 's/foo/bar/'

Using a here string:

sed 's/foo/bar/' <<< 'this is foo'

Using a here document:

$ sed 's/foo/bar/' <<EOF
> this is foo
> EOF
this is bar

In this last example, the “$” symbol is Bash’s PS1, and the “>” symbol is Bash’s PS2. You don’t type those characters yourself. They are shown just to make it clear what the entire command looks like. Let’s describe how to type the last command step by step, just in case.

The shell is happily waiting for you to type a command, and you see the symbol “$”. You then type sed 's/foo/bar/' <<EOF and hit <Enter>. Now the prompt changes to a “>” to show you that the shell understood that the command is not finished and is waiting for you to type more. Then you enter this is foo followed by another <Enter>. Finally you type EOF again and hit <Enter> one last time. The shell then sees that you finished typing your command and runs it, showing this is bar.
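The input methods listed so far should all give the same result. The following sketch compares four of them in a script. The temporary file comes from mktemp and the variable names are made up for the example.

```shell
# A throwaway input file for the comparison (path generated by mktemp).
file=$(mktemp)
printf 'this is foo\n' > "$file"

from_arg=$(sed 's/foo/bar/' "$file")          # file name as argument
from_redirect=$(sed 's/foo/bar/' < "$file")   # explicit redirection
from_pipe=$(cat "$file" | sed 's/foo/bar/')   # piping
from_heredoc=$(sed 's/foo/bar/' <<EOF
this is foo
EOF
)

# Each variable should hold the same text.
printf '%s\n' "$from_arg" "$from_redirect" "$from_pipe" "$from_heredoc"
rm "$file"
```

In a script there are no PS1 or PS2 prompts, so the here document is written exactly as shown, with EOF alone on its own line.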

Read more about Here Documents and Here Strings here and here. They are also described in your shell’s man page. Make sure there are no spaces or tabs after the closing EOF delimiter, or the shell will not understand that you are done typing the command. As a rule of thumb, it is generally a good idea not to leave trailing spaces on any line, in any programming language. There are several reasons why that is a good thing, but that is not our topic here. Configure your editor to show trailing whitespace. I’ll show how to do it in Vim:

set listchars=trail:-
set list

Now, Vim will show a - for every stray trailing whitespace character at the end of lines (:help listchars and :help list).

# comment command

Read (online):

info sed Common\ Commands

Technically, # is also a command, and its use is the same as in programming languages: to describe and document what specific parts of the code do, add licensing information, and even curse when project requirements keep changing.

Then, beware of that #n thing at the very first line of a script, as mentioned in the info page.
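To illustrate both points, here is a small sketch of a script file that starts with #n and contains an ordinary comment. The file is created with mktemp, and the p command is used only to make the effect visible. When the first line of the script is exactly #n, Sed behaves as if -n had been given.

```shell
# Write a small script file; its name comes from mktemp, not from the text.
script=$(mktemp)
cat > "$script" <<'EOF'
#n
# Because the first line is exactly "#n", auto-print is switched off,
# so each line appears once (from p) instead of twice.
p
EOF

output=$(printf 'foo\nbar\n' | sed -f "$script")
printf '%s\n' "$output"
rm "$script"
```

If the first line were any other comment, each input line would be printed twice: once by p and once by the automatic print.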

q quit command

Read (online):

info sed Common\ Commands

One use of q is to do something only with the first occurrence of a pattern. If no addresses are specified, Sed will try to find a match in all lines of a file. That means that if you are trying to replace “Intro” with “INTRO” only the first time “Intro” appears in a file, one way to do it is to use q.

sed '/Intro/ { s/Intro/INTRO/; q }' file.txt

We haven’t talked about using regexes as addresses yet, nor about using { and } to group commands, but the above command says, “look for a line that contains ‘Intro’ (and this is a regex, not a string), then apply the following group of commands when that regex matches a line.” Inside the group we have the s/Intro/INTRO/ command, then the command separator ;, then the last command, the q. Keep in mind that q makes Sed exit right there, so any lines after the first match are never printed.
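The effect of q is easier to see on a small input. The sketch below uses three made-up lines in which the pattern appears twice, and assumes GNU sed (the one documented by the info pages referenced here), which accepts q followed directly by the closing brace.

```shell
# Sample text made up for this sketch: the pattern appears on lines 2 and 3.
input='Title
Intro one
Intro two'

# The first matching line is changed and printed, then q ends the run,
# so "Intro two" never reaches the output.
output=$(printf '%s\n' "$input" | sed '/Intro/ { s/Intro/INTRO/; q }')
printf '%s\n' "$output"
```

The output is Title followed by INTRO one. The third line is missing because Sed has already quit.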

d delete command

TODO

p print command

TODO

n command

From info sed 'Common Commands':

If auto-print is not disabled, print the pattern space, then, regardless, replace the pattern space with the next line of input. If there is no more input then sed exits without processing any more commands.

This is file.txt:

$ cat file.txt
foo
bar
jedi

Let’s understand what the following Sed command does:

sed 'n' < file.txt

Sed reads the first line of input, “foo”, and places it in the pattern space (PS). The command is n, so Sed prints the PS and replaces it with the next line, “bar”. There are no more commands, so the cycle ends and Sed automatically prints the PS (which is now “bar”). A new cycle starts and the last line of input, “jedi”, is read into PS. n is executed again, but there are no more input lines, so Sed prints the PS once more and exits. In the end, this command just prints

foo
bar
jedi

just like the cat command would do. As we see, the n command is not very useful on its own, but at least we now know how it works.

Let’s try an example that makes a little more sense. The n command is useful in situations where you want to deal with “every other line”, that is, you skip one line, and do something with the other. For example, you have this file where each country name is written in en_US on odd lines, and in pt_BR on even lines:

Japan
Japão
Brazil
Brasil
United States
Estados Unidos

You want to add “(pt_BR)” after each line where the name is indeed in pt_BR.

$ sed 'n; s/.*/& (pt_BR)/' < countries.txt
Japan
Japão (pt_BR)
Brazil
Brasil (pt_BR)
United States
Estados Unidos (pt_BR)

And if you want to add “en_US” to the respective lines as well, we do:

$ sed 's/.*/& (en_US)/; n; s/.*/& (pt_BR)/' < countries.txt
Japan (en_US)
Japão (pt_BR)
Brazil (en_US)
Brasil (pt_BR)
United States (en_US)
Estados Unidos (pt_BR)

Let’s explain sed 'n; s/.*/& (pt_BR)/' first.

Sed reads “Japan” into PS. The command is n, so, Sed just prints it and reads the next line of input. Note that Sed doesn’t run the s command on “Japan” (because of the n command). Now “Japão” is in PS (again because of n) and the s command is run, causing “Japão” to become “Japão (pt_BR)”, and PS is printed. There are no more commands, so, Sed just restarts the cycle again.

“Brazil” is read into PS (which automatically replaces whatever was there). n is the command to be run, which causes PS to be printed and the next line of input to be read into PS. PS now contains “Brasil”, which is the text that the s command operates on this time, turning “Brasil” into “Brasil (pt_BR)”. PS is once more printed.

The cycle starts once more for “United States”, which is simply printed. “Estados Unidos” is then read into PS, operated on by the s command, and printed.

Note that every time sed reads a line into the PS, it will run any commands on the contents of the PS and then automatically print the PS unless the -n command line option is used.
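To see how -n changes the picture, here is a minimal sketch that uses the same three lines as file.txt, supplied inline. The p command is used only to make one line visible.

```shell
# Same three lines as file.txt, supplied inline.
# With -n there is no auto-print, so n silently moves on to the next line
# and only the explicit p produces output.
output=$(printf 'foo\nbar\njedi\n' | sed -n 'n; p')
printf '%s\n' "$output"
```

Only bar is printed. “foo” is discarded by n without being printed, and when n runs on “jedi” there is no more input, so Sed exits before reaching p.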

As for 's/.*/& (en_US)/; n; s/.*/& (pt_BR)/', the process is very similar. First, “Japan” is read into PS. The command is s, which turns “Japan” into “Japan (en_US)”. Then n causes PS to be printed and the next line of input to be placed into PS (the old content of PS is gone by now). The current command is the second s, which replaces “Japão” with “Japão (pt_BR)”. PS is printed and the cycle starts again for “Brazil” and finally for “United States”.