Capturing Groups in Sed

2021-09-19

While migrating this blog from Jekyll to Hugo, I found that I needed to update all my old markdown posts: their title params used single quotes, which Hugo does not appreciate, and they had to be changed to double quotes.

For example, I have the following across multiple files (more than 200):

---
title: 'abcdefgh'
tags:
- new
- stuff
---

And I needed to replace title: 'abcdefgh' with title: "abcdefgh".

To do that, I’m going to use sed because it’s going to be terribly painful to do this manually.

Although I often use sed to replace full words, modifying the characters around some text is something that I haven’t had much practice with.

In most regex flavors, we will often use (.*) to capture whatever .* matches as a group, and $1 to reference that captured group.

In sed, it looks similar:

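\(.*\) defines a capturing group (the parentheses are escaped in sed's default basic regex syntax; with sed -E they are written as plain (.*))
\1 refers to the first captured group
\2 refers to the second captured group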
...
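
To check how the numbering works, here is a minimal sketch that swaps two words using two capturing groups. The input string is made up purely for illustration.

```shell
# Two escaped groups capture "hello" and "world"; the replacement
# emits them in reverse order via \2 and \1.
swapped=$(printf 'hello world\n' | sed 's/\(hello\) \(world\)/\2 \1/')
echo "$swapped"   # world hello
```

Groups are numbered by the order of their opening parentheses, left to right.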

So for the above scenario, we can quickly replace the single-quotes with just:

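$ sed -i "s,title: '\(.*\)',title: \"\1\",g" *.md

Here *.md is a placeholder for the files to edit; sed -i needs at least one file to modify in place.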

What this translates to is:

find a line containing title: followed by a pair of '' surrounding a capturing group \(.*\)
replace that match with title: followed by the placeholder \1 wrapped in ""
sed then fills in \1 with the text captured by \(.*\)

So now,

title: 'abcdefgh'
...

becomes

title: "abcdefgh"
...
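
Before letting -i rewrite a couple hundred files, it may be worth previewing the result on a single throwaway file. The following is a rough sketch of that idea, assuming a sample post created with mktemp (the file and its contents are hypothetical stand-ins for a real post). Without -i, sed prints to stdout and leaves the file untouched.

```shell
# Create a temporary sample post (hypothetical content).
sample=$(mktemp)
cat > "$sample" <<'EOF'
---
title: 'abcdefgh'
tags:
- new
- stuff
---
EOF

# Same substitution as above, but without -i: output goes to stdout.
result=$(sed "s,title: '\(.*\)',title: \"\1\",g" "$sample")
printf '%s\n' "$result"

# Clean up the temporary file.
rm -f "$sample"
```

Only the title line changes; every other line passes through as-is because the pattern does not match it.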

Note: in the above example, I am using , as the sed delimiter instead of the normal /.
Also, notice that we have \" just before and after \1. This is because we need to escape double quotes, since the whole sed expression is already wrapped in double quotes.
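
To illustrate that the delimiter is only a matter of taste, here is a small sketch comparing the , and / delimiters on a single sample line, plus a variant using -E (extended regex, available in GNU and BSD sed) where the group parentheses no longer need escaping. The variable names are only for illustration.

```shell
line="title: 'abcdefgh'"

# Comma as the delimiter, as in the command above.
with_comma=$(printf '%s\n' "$line" | sed "s,title: '\(.*\)',title: \"\1\",")

# The usual slash delimiter works equally well here, since the
# pattern and replacement contain no slashes.
with_slash=$(printf '%s\n' "$line" | sed "s/title: '\(.*\)'/title: \"\1\"/")

# With -E, the capturing group is written without backslashes.
with_ere=$(printf '%s\n' "$line" | sed -E "s,title: '(.*)',title: \"\1\",")

printf '%s\n' "$with_comma" "$with_slash" "$with_ere"
```

All three commands should print title: "abcdefgh". A non-slash delimiter mostly pays off when the pattern itself contains slashes, such as file paths or URLs.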