sed
sed is not sad, wish u happy~
序
sed ( Stream Editor ),学习好 Shell 的这些利器,得明白一个特点,那就是他们都是 Line-Oriented 的,每次读出和处理一行,想着你现在写了一个程序,每次从一个缓冲区获取一行,也就是说,直到读到 '\n'
为止,接着你开始对你读出来的数据进行处理。
基本功能
这里提一个插曲,公司要求 tab 全部为4个空格,但是编码的时候用4个空格,其实非常不灵活,所以我的想法就是之后再使用脚本转换,然后想当然地用了 sed,但是后来发现它是将源文件再生成,于是乎我的 Git 记录变成了删除了全部行,增加了全部行,其实只需要使用 -i 选项即可,后来我使用 vim 自带的 [range]%s/pattern/substitution/g 也解决了这个问题~
How it works
The next line is read from the input file and places it in the pattern space. If the end of file is found, and if there are additional files to read, the current file is closed, the next file is opened, and the first line of the new file is placed into the pattern space.
The line count is incremented by one. Opening a new file does not reset this number.
Each sed command is examined. If there is a restriction placed on the command, and the current line in the pattern space meets that restriction, the command is executed. Some commands, like "n" or "d" cause sed to go to the top of the loop. The "q" command causes sed to stop. Otherwise the next command is examined.
After all of the commands are examined, the pattern space is output unless sed has the optional "-n" argument.
sed 其实有非常多的功能,下面提一下它的格式
最开始接触的时候往往觉得 sed 指令非常复杂,其实不然,它一共就是三部分,第一部分,叫 address restriction, 第二部分就是 command, 然后是 参数,理解了这一点,才能理解那些一长串的指令。
If [addr]
is specified, the command X will be executed only on the matched lines. [addr]
can be a single line number, a regular expression, or a range of lines (see sed addresses). Additional [options]
are used for some sed
commands.
对于提到的 addr 有以下的俩种
行数
Pattern - /regexp/
其实最经常用到的是如下几种
commands: d -n p I a/i/c -e
还可以用空行来指范围
删除 # 开头的空行,都类似了,即删除 addr 所指代的行
下面来实现,grep 的功能, -n 即 -noprint
当然,grep 的递归查找还是很方便的~ -n
代表只会打印匹配了的那一行
-e 的原理就是,例子上面也有,不多解释
Copy the input line into the pattern space.
Apply the first sed command on the pattern space, if the address restriction is true.
Repeat with the next sed expression, again operating on the pattern space.
When the last operation is performed, write out the pattern space and read in the next line from the input file.
注意例子均来自 ref[1],这里主要做一个笔记,然后加上点自己的理解
substitute command
s 这是 sed 最出名的指令了,其实其中复杂还是在于正则表达式的使用,真正使用起来的方式非常的简单。
参考链接有一节,专门提到了这个 Addresses and Ranges of Text
Regular Expression
&
& 用来指代匹配的 pattern,跟 grep 是一样的,对于 find 指令,则是 { }
-r
-r 是 extended 的正则模式,支持一些集合,总之,发现不支持了,就得思考是不是因为不支持的原因,对于 grep 是 -E
\1 \2 to keep part of the pattern
这一点和 grep 也是一样的
flags or command
对于 s
指令的格式是这样的
‘s/regexp/replacement/flags’
下面对于 flag 的说明,来自 ref[2]
g
Apply the replacement to all matches to the regexp, not just the first.
number
Only replace the numberth match of the regexp.
interaction in
s
command Note: the POSIX standard does not specify what should happen when you mix theg
and number modifiers, and currently there is no widely agreed upon meaning acrosssed
implementations. For GNUsed
, the interaction is defined to be: ignore matches before the numberth, and then match and replace all matches from the numberth on.p
If the substitution was made, then print the new pattern space.
Note: when both the
p
ande
options are specified, the relative ordering of the two produces very different results. In general,ep
(evaluate then print) is what you want, but operating the other way round can be useful for debugging. For this reason, the current version of GNUsed
interprets specially the presence ofp
options both before and aftere
, printing the pattern space before and after evaluation, while in general flags for thes
command show their effect just once. This behavior, although documented, might change in future versions.w filename
If the substitution was made, then write out the result to the named file. As a GNU
sed
extension, two special values of filename are supported: /dev/stderr, which writes the result to the standard error, and /dev/stdout, which writes to the standard output.3e
This command allows one to pipe input from a shell command into pattern space. If a substitution was made, the command that is found in pattern space is executed and pattern space is replaced with its output. A trailing newline is suppressed; results are undefined if the command to be executed contains a NUL character. This is a GNU
sed
extension.Ii
The
I
modifier to regular-expression matching is a GNU extension which makessed
match regexp in a case-insensitive manner.Mm
The
M
modifier to regular-expression matching is a GNUsed
extension which directs GNUsed
to match the regular expression in multi-linemode. The modifier causes^
and$
to match respectively (in addition to the normal behavior) the empty string after a newline, and the empty string before a newline. There are special character sequences (\`
and\'
) which always match the beginning or the end of the buffer. In addition, the period character does not match a new-line character in multi-line mode.
值得注意的是,p 同时也是一条 command,g 也是,只是用的不多,下面提一些例子,来自 ref[1]
Cheat Sheet
ref[3] 建议 sed 用双引号,而不是单引号
转义
/ /
内的是原生字符串,照道理我们应该只需要 \\
即可,但是这里却非常奇怪,跟 Java
写这条语句一样了,需要 4
个 \
References
Last updated