元编程¶
类似 Lisp , Julia 自身的代码也是语言本身的数据结构。由于代码是由这门语言本身所构造和处理的对象所表示的,因此程序也可以转换并生成自身语言的代码。元编程的另一个功能是反射,它可以在程序运行时动态展现程序本身的特性。
表达式和求值¶
Julia 代码表示为由 Julia 的 Expr
类型的数据结构而构成的语法树。下面是 Expr
类型的定义:
type Expr
head::Symbol
args::Array{Any,1}
typ
end
head
是标明表达式种类的符号; args
是子表达式数组,它可能是求值时引用变量值的符号,也可能是嵌套的 Expr
对象,还可能是真实的对象值。 typ
域被类型推断用来做类型注释,通常可以被忽略。
有两种“引用”代码的方法,它们可以简单地构造表达式对象,而不需要显式构造 Expr
对象。第一种是内联表达式,使用 :
,后面跟单表达式;第二种是代码块儿,放在 quote ... end
内部。下例是第一种方法,引用一个算术表达式:
julia> ex = :(a+b*c+1)
:(a + b * c + 1)
julia> typeof(ex)
Expr
julia> ex.head
:call
julia> typeof(ans)
Symbol
julia> ex.args
4-element Array{Any,1}:
:+
:a
:(b * c)
1
julia> typeof(ex.args[1])
Symbol
julia> typeof(ex.args[2])
Symbol
julia> typeof(ex.args[3])
Expr
julia> typeof(ex.args[4])
Int64
下例是第二种方法:
julia> quote
x = 1
y = 2
x + y
end
quote # none, line 2:
x = 1 # line 3:
y = 2 # line 4:
x + y
end
符号¶
:
的参数为符号时,结果为 Symbol
对象,而不是 Expr
:
julia> :foo
:foo
julia> typeof(ans)
Symbol
在表达式的上下文中,符号用来指示对变量的读取。当表达式被求值时,符号的值受限于符号的作用域(详见 变量的作用域 )。
有时, 为了防止解析时产生歧义, :
的参数需要添加额外的括号:
julia> :(:)
:(:)
julia> :(::)
:(::)
Symbol
也可以使用 symbol
函数来创建,参数为一个字符或者字符串:
julia> symbol('\'')
:'
julia> symbol("'")
:'
求值和内插¶
指定一个表达式,Julia 可以使用 eval
函数在 global 作用域对其求值。
julia> :(1 + 2)
:(1 + 2)
julia> eval(ans)
3
julia> ex = :(a + b)
:(a + b)
julia> eval(ex)
ERROR: a not defined
julia> a = 1; b = 2;
julia> eval(ex)
3
Every module has its own eval
function that
evaluates expressions in its global scope.
Expressions passed to eval
are not limited to returning values
— they can also have side-effects that alter the state of the enclosing
module’s environment:
julia> ex = :(x = 1)
:(x = 1)
julia> x
ERROR: x not defined
julia> eval(ex)
1
julia> x
1
表达式仅仅是一个 Expr
对象,它可以通过编程构造,然后对其求值:
julia> a = 1;
julia> ex = Expr(:call, :+,a,:b)
:(+(1,b))
julia> a = 0; b = 2;
julia> eval(ex)
3
注意上例中 a
与 b
使用时的区别:
- 表达式构造时,直接使用 变量
a
的值。因此,对表达式求值时a
的值没有任何影响:表达式中的值为1
,与现在a
的值无关 - 表达式构造时,使用的是 符号
:b
。因此,构造时变量b
的值是无关的——:b
仅仅是个符号,此时变量b
还未定义。对表达式求值时,通过查询变量b
的值来解析符号:b
的值
这样构造 Expr
对象太丑了。Julia 允许对表达式对象内插。因此上例可写为:
julia> a = 1;
julia> ex = :($a + b)
:(+(1,b))
编译器自动将这个语法翻译成上面带 Expr
的语法。
代码生成¶
Julia 使用表达式内插和求值来生成重复的代码。下例定义了一组操作三个参数的运算符:
for op = (:+, :*, :&, :|, :$)
eval(quote
($op)(a,b,c) = ($op)(($op)(a,b),c)
end)
end
上例可用 :
前缀引用格式写的更精简:
for op = (:+, :*, :&, :|, :$)
eval(:(($op)(a,b,c) = ($op)(($op)(a,b),c)))
end
使用 eval(quote(...))
模式进行语言内的代码生成,这种方式太常见了。Julia 用宏来简写这个模式:
for op = (:+, :*, :&, :|, :$)
@eval ($op)(a,b,c) = ($op)(($op)(a,b),c)
end
@eval
宏重写了这个调用,使得代码更精简。 @eval
的参数也可以是块代码:
@eval begin
# multiple lines
end
对非引用表达式进行内插,会引发编译时错误:
julia> $a + b
ERROR: unsupported or misplaced expression $
宏¶
宏有点儿像编译时的表达式生成函数。 Just as functions map a tuple of argument values to a return value, macros map a tuple of argument expressions to a returned expression. They allow the programmer to arbitrarily transform the written code to a resulting expression, which then takes the place of the macro call in the final syntax tree.调用宏的语法为:
@name expr1 expr2 ...
@name(expr1, expr2, ...)
注意,宏名前有 @
符号。第一种形式,参数表达式之间没有逗号;第二种形式,宏名后没有空格。这两种形式不要记混。例如,下面的写法的结果就与上例不同,它只向宏传递了一个参数,此参数为多元组 (expr1, expr2, ...)
:
@name (expr1, expr2, ...)
程序运行前, @name
展开函数会对表达式参数处理,用结果替代这个表达式。使用关键字 macro
来定义展开函数:
macro name(expr1, expr2, ...)
...
return resulting_expr
end
下例是 Julia 中 @assert
宏的简单定义:
macro assert(ex)
return :($ex ? nothing : error("Assertion failed: ", $(string(ex))))
end
这个宏可如下使用:
julia> @assert 1==1.0
julia> @assert 1==0
ERROR: Assertion failed: 1 == 0
in error at error.jl:22
宏调用在解析时被展开为返回的结果。这等价于:
1==1.0 ? nothing : error("Assertion failed: ", "1==1.0")
1==0 ? nothing : error("Assertion failed: ", "1==0")
That is, in the first call, the expression :(1==1.0)
is spliced into
the test condition slot, while the value of string(:(1==1.0))
is
spliced into the assertion message slot. The entire expression, thus
constructed, is placed into the syntax tree where the @assert
macro
call occurs. Then at execution time, if the test expression evaluates to
true, then nothing
is returned, whereas if the test is false, an error
is raised indicating the asserted expression that was false. Notice that
it would not be possible to write this as a function, since only the
value of the condition is available and it would be impossible to
display the expression that computed it in the error message.
The actual definition of @assert
in the standard library is more
complicated. It allows the user to optionally specify their own error
message, instead of just printing the failed expression. Just like in
functions with a variable number of arguments, this is specified with an
ellipses following the last argument:
macro assert(ex, msgs...)
msg_body = isempty(msgs) ? ex : msgs[1]
msg = string("assertion failed: ", msg_body)
return :($ex ? nothing : error($msg))
end
Now @assert
has two modes of operation, depending upon the number of
arguments it receives! If there’s only one argument, the tuple of expressions
captured by msgs
will be empty and it will behave the same as the simpler
definition above. But now if the user specifies a second argument, it is
printed in the message body instead of the failing expression. You can inspect
the result of a macro expansion with the aptly named macroexpand()
function:
julia> macroexpand(:(@assert a==b))
:(if a == b
nothing
else
Base.error("assertion failed: a == b")
end)
julia> macroexpand(:(@assert a==b "a should equal b!"))
:(if a == b
nothing
else
Base.error("assertion failed: a should equal b!")
end)
There is yet another case that the actual @assert
macro handles: what
if, in addition to printing “a should equal b,” we wanted to print their
values? One might naively try to use string interpolation in the custom
message, e.g., @assert a==b "a ($a) should equal b ($b)!"
, but this
won’t work as expected with the above macro. Can you see why? Recall
from string interpolation that an
interpolated string is rewritten to a call to the string
function.
Compare:
julia> typeof(:("a should equal b"))
ASCIIString (constructor with 2 methods)
julia> typeof(:("a ($a) should equal b ($b)!"))
Expr
julia> dump(:("a ($a) should equal b ($b)!"))
Expr
head: Symbol string
args: Array(Any,(5,))
1: ASCIIString "a ("
2: Symbol a
3: ASCIIString ") should equal b ("
4: Symbol b
5: ASCIIString ")!"
typ: Any
So now instead of getting a plain string in msg_body
, the macro is
receiving a full expression that will need to be evaluated in order to
display as expected. This can be spliced directly into the returned expression
as an argument to the string
call; see error.jl for
the complete implementation.
The @assert
macro makes great use of splicing into quoted expressions
to simplify the manipulation of expressions inside the macro body.
卫生宏¶
卫生宏 是个更复杂的宏。In short, macros must
ensure that the variables they introduce in their returned expressions do not
accidentally clash with existing variables in the surrounding code they expand
into. Conversely, the expressions that are passed into a macro as arguments are
often expected to evaluate in the context of the surrounding code,
interacting with and modifying the existing variables. Another concern arises
from the fact that a macro may be called in a different module from where it
was defined. In this case we need to ensure that all global variables are
resolved to the correct module. Julia already has a major advantage over
languages with textual macro expansion (like C) in that it only needs to
consider the returned expression. All the other variables (such as msg
in
@assert
above) follow the normal scoping block behavior.
来看一下 @time
宏,它的参数是一个表达式。它先记录下时间,运行表达式,再记录下时间,打印出这两次之间的时间差,它的最终值是表达式的值:
macro time(ex)
return quote
local t0 = time()
local val = $ex
local t1 = time()
println("elapsed time: ", t1-t0, " seconds")
val
end
end
t0
, t1
, 及 val
应为私有临时变量,而 time
是标准库中的 time
函数,而不是用户可能使用的某个叫 time
的变量( println
函数也如此)。
Julia 宏展开机制是这样解决命名冲突的。首先,宏结果的变量被分类为本地变量或全局变量。如果变量被赋值(且未被声明为全局变量)、被声明为本地变量、或被用作函数参数名,则它被认为是本地变量;否则,它被认为是全局变量。本地变量被重命名为一个独一无二的名字(使用 gensym
函数产生新符号),全局变量被解析到宏定义环境中。
但还有个问题没解决。考虑下例:
module MyModule
import Base.@time
time() = ... # compute something
@time time()
end
此例中, ex
是对 time
的调用,但它并不是宏使用的 time
函数。它实际指向的是 MyModule.time
。因此我们应对要解析到宏调用环境中的 ex
代码做修改。这是通过 esc
函数的对表达式“转义”完成的:
macro time(ex)
...
local val = $(esc(ex))
...
end
这样,封装的表达式就不会被宏展开机制处理,能够正确的在宏调用环境中解析。
必要时这个转义机制可以用来“破坏”卫生,从而引入或操作自定义变量。下例在调用环境中宏将 x
设置为 0 :
macro zerox()
return esc(:(x = 0))
end
function foo()
x = 1
@zerox
x # is zero
end
应审慎使用这种操作。
非标准字符串文本¶
字符串 中曾讨论过带标识符前缀的字符串文本被称为非标准字符串文本,它们有特殊的语义。例如:
r"^\s*(?:#|$)"
生成正则表达式对象而不是字符串b"DATA\xff\u2200"
是字节数组文本[68,65,84,65,255,226,136,128]
事实上,这些行为不是 Julia 解释器或编码器内置的,它们调用的是特殊名字的宏。例如,正则表达式宏的定义如下:
macro r_str(p)
Regex(p)
end
因此,表达式 r"^\s*(?:#|$)"
等价于把下列对象直接放入语法树:
Regex("^\\s*(?:#|\$)")
这么写不仅字符串文本短,而且效率高:正则表达式需要被编译,而 Regex
仅在 代码编译时 才构造,因此仅编译一次,而不是每次执行都编译。下例中循环中有一个正则表达式:
for line = lines
m = match(r"^\s*(?:#|$)", line)
if m == nothing
# non-comment
else
# comment
end
end
如果不想使用宏,要使上例只编译一次,需要如下改写:
re = Regex("^\\s*(?:#|\$)")
for line = lines
m = match(re, line)
if m == nothing
# non-comment
else
# comment
end
end
由于编译器优化的原因,上例依然不如使用宏高效。但有时,不使用宏可能更方便:要对正则表达式内插时必须使用这种麻烦点儿的方式;正则表达式模式本身是动态的,每次循环迭代都会改变,生成新的正则表达式。
不止非标准字符串文本,命令文本语法( `echo "Hello, $person"`
)也是用宏实现的:
macro cmd(str)
:(cmd_gen($shell_parse(str)))
end
当然,大量复杂的工作被这个宏定义中的函数隐藏了,但是这些函数也是用 Julia 写的。你可以阅读源代码,看看它如何工作。它所做的事儿就是构造一个表达式对象,用于插入到你的程序的语法树中。
反射¶
In addition to the syntax-level introspection utilized in metaprogramming, Julia provides several other runtime reflection capabilities.
Type fields The names of data type fields (or module members) may be interrogated
using the names
command. For example, given the following type:
type Point
x::FloatingPoint
y
end
names(Point)
will return the array Any[:x, :y]
. The type of
each field in a Point
is stored in the types
field of the Point object:
julia> typeof(Point)
DataType
julia> Point.types
(FloatingPoint,Any)
Subtypes The direct subtypes of any DataType may be listed using
subtypes(t::DataType)
. For example, the abstract DataType FloatingPoint
has four (concrete) subtypes:
julia> subtypes(FloatingPoint)
4-element Array{Any,1}:
BigFloat
Float16
Float32
Float64
Any abstract subtype will also be included in this list, but further subtypes
thereof will not; recursive applications of subtypes
allow to build the
full type tree.
Type internals The internal representation of types is critically important
when interfacing with C code. isbits(T::DataType)
returns true if T is
stored with C-compatible aligment. The offsets of each field may be listed
using fieldoffsets(T::DataType)
.
Function methods The methods of any function may be listed using
methods(f::Function)
.
Function representations Functions may be introspected at several levels
of representation. The lowered form of a function is available
using code_lowered(f::Function, (Args...))
, and the type-inferred lowered form
is available using code_typed(f::Function, (Args...))
.
Closer to the machine, the LLVM Intermediate Representation of a function is
printed by code_llvm(f::Function, (Args...))
, and finally the resulting
assembly instructions (after JIT’ing step) are available using
code_native(f::Function, (Args...)
.