Julia 矩阵对角化

Julia中的对角化（稠密矩阵）

使用CPU计算时，Julia对角化主要由LinearAlgebra.jl这个包提供
文档在Linear Algebra · The Julia Language

一般矩阵对角化

使用eigen(A)就可以简单的对角化矩阵A

julia> A = rand(3,3);

julia> sol = eigen(A);

julia> sol.values   # 特征值
3-element Vector{Float64}:
 -0.29055641751750316
  0.3535525967234129
  1.4829692260025664

julia> sol.vectors  # 对应的特征向量（每列
3×3 Matrix{Float64}:
 -0.484695  -0.326585  -0.746718
  0.860294  -0.654397  -0.616361
 -0.158002   0.681987  -0.250025

特殊矩阵对角化

对角化一些 Matrices with special symmetries and structures 的时候，要先用Julia的构造函数构造出指定类型的矩阵，在输入函数eigen()即可，比对角化一般的矩阵要快不少。根据文档，Julia中已经实现了如下几种矩阵类型，底层调用的包应该是LAPACK，这应该也是eigen()只能输入稠密矩阵的原因吧

Type	Description
`Symmetric`	Symmetric matrix
`Hermitian`	Hermitian matrix
`UpperTriangular`	Upper triangular matrix
`UnitUpperTriangular`	Upper triangular matrix with unit diagonal
`LowerTriangular`	Lower triangular matrix
`UnitLowerTriangular`	Lower triangular matrix with unit diagonal
`UpperHessenberg`	Upper Hessenberg matrix
`Tridiagonal`	Tridiagonal matrix
`SymTridiagonal`	Symmetric tridiagonal matrix
`Bidiagonal`	Upper/lower bidiagonal matrix
`Diagonal`	Diagonal matrix
`UniformScaling`	Uniform scaling operator

1
2
3
4


# 简单演示一些实对称矩阵
A = rand(100, 100)
A = Symmetric(A)    # 一定要转换类型
eigen(A)            # 正常输入即可

底层调用的应该是LinearAlgebra.LAPACK.syev!这个函数：

help?> LinearAlgebra.LAPACK.syev!
  syev!(jobz, uplo, A)

  Finds the eigenvalues (jobz = N) or eigenvalues and eigenvectors (jobz   
  = V) of a symmetric matrix A. If uplo = U, the upper triangle of A is    
  used. If uplo = L, the lower triangle of A is used.

julia> A = rand(3, 3)
3×3 Matrix{Float64}:
 0.174336  0.256671   0.769035
 0.264181  0.857172   0.820843
 0.5553    0.0932148  0.0113623

julia> LinearAlgebra.LAPACK.syev!('N', 'U', A)  # 使用上三角矩阵，仅计算全部特征值
3-element Vector{Float64}:
 -0.8223969680207438
  0.22148121673036028
  1.6437855717321945

julia> sol = LinearAlgebra.LAPACK.syev!('V', 'U', A);  # 使用上三角矩阵，计算全部特征值和对应的特征向量

julia> sol[1]   # 特征值
3-element Vector{Float64}:
 -1.3984285742861604
  0.13262292456038008
  1.2606172659335537

julia> sol[2]   # 对应的特征向量（每列
3×3 Matrix{Float64}:
 0.645942  -0.687149   0.332543
 0.701843   0.363202  -0.612781
 0.300292   0.629214   0.716878

不管是eigen还是LinearAlgebra.LAPACK.syev!都会返回正交归一化的特征向量

julia> sol.vectors[:, 1]' * sol.vectors[:, 1]
1.0000000000000002

julia> sol.vectors[:, 1]' * sol.vectors[:, 2]
-1.1102230246251565e-16

julia> sol.vectors[:, 1]' * sol.vectors[:, 3]
-2.220446049250313e-16

Julia中的对角化（稀疏矩阵）

Julia中稀疏矩阵由包SparseArrays.jl提供
获取其特征值则由Arpack.jl，KrylovKit.jl提供，简单演示一下KrylovKit.jl，我们生成一个20个格点，使用 $S_z = 0$ 的sector的哈密顿量，开边界的一维链的海森堡模型对应的哈密顿矩阵，使用KrylovKit.jl获取其最小的特征值和对应的特征向量（即基态

julia> include("test.jl")
btest (generic function with 1 method)

julia> sol = main();    # 获得哈密顿矩阵和最小的特征值以及对应的特征向量
N: 20 with UP: 10
Dimension of Hilbert space: 184756
We got all required states.
100.0% finished.
We got Hamiltonion Matrix.
Number of nonzero elements: 2032316 (0.006%) of total.

julia> sol[1]   # 哈密顿矩阵，可以看出是个实对称矩阵
184756×184756 SparseMatrixCSC{Float64, Int64} with 2032316 stored entries:
⢻⣶⣄⢀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⢙⡻⣮⣳⣄⠀⢀⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠙⢾⣻⣾⣦⠀⠙⢦⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⢀⠈⠛⠿⣧⡀⠀⠙⢦⠀⠀⠀⠀⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠈⠳⣄⠀⠈⢿⣷⣦⢀⡀⠀⠀⠀⠙⢦⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠈⠳⣄⠈⢛⠿⣧⡙⢦⡀⠀⠀⠀⠙⢦⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠳⣌⢿⣷⣦⡀⠀⠀⠀⠀⠙⢦⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠈⠻⢿⣷⠀⠀⠀⠀⠀⠀⠙⢦⡀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠈⠳⣄⠀⠀⠀⠀⠀⠀⢿⣷⣦⡀⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠳⣄⠀⠀⠀⠀⠈⠻⢿⣷⡙⢦⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠳⣄⠀⠀⠀⠈⠳⣌⢻⣶⣥⡀⠙⢦⡀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠳⣄⠀⠀⠀⠈⠁⠻⢿⣷⡀⠀⠙⢦⡀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠀⠀⠀⠀⠳⣄⠀⠈⢻⣶⣤⡀⠁⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠳⣄⠀⠻⡿⣯⡷⣄⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠁⠀⠙⢯⡻⣮⣅⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠁⠙⠿⣧

julia> sol[2][1]    # 最小特征值
1-element Vector{Float64}:
 -8.682473334398972

julia> sol[2][2][1] # 最小特征值对应的特征向量
184756-element Vector{Float64}:
 -1.135549029396922e-16
 -3.0113364158208294e-16
 -1.6514843550594074e-16
  3.1987004020425103e-16
  3.3517396266049933e-16
  3.685711501055183e-16
  4.372602319957195e-16
  2.111201672285677e-17
  ⋮
  6.320568687676405e-16
  7.685930508565765e-16
  2.4501843267160276e-16
  2.833358444525711e-16
 -4.0818433639642274e-17
 -3.756562576224366e-16
 -6.090099987357374e-16

julia> vec = sol[2][2][1];

julia> using Gadfly

julia> using Cairo

julia> t_pl = plot(y = vec.^2, Geom.bar);   # 画出概率幅

julia> img = PDF("t_pl.pdf", 12inch, 8inch) # 保存为PDF

julia> draw(img, t_pl)  # 保存文件

点此下载PDF

要获取其他特征值可以去Arpack.jl，KrylovKit.jl文档看看

在Julia中使用CUDA完成对角化

Julia中使用CUDA大多要引入CUDA.jl 这个包，应该是CUDA官方工具的Julia包装
一些函数使用可以?CUSOLVER.syevd!查看，最好去CUDA官方的文档¹
cuSULVER本身可以看做LAPACK的CUDA版本

总之，在Julia中使用CUDA对角化矩阵，大概有以下几个步骤

引入CUDA，CUSOLVER
在CPU创建矩阵上传至GPU，或直接在GPU中创建矩阵
调用CUSOLVER求解

julia> using CUDA

julia> using CUDA.CUSOLVER

julia> or = rand(3, 3);

julia> cp_or = Symmetric(or);   # 在内存中创建矩阵

julia> cu_or = CuArray(cp_or);  # 上传至GPU

julia> cu_sol = CUSOLVER.syevd!('V', 'U', cu_or);    # 使用CUDA获取全部特征值与特征向量

julia> cu_sol[1]    # 全部特征值
3-element CuArray{Float64, 1}:
 -1.2974509929372127
 -0.574395622968929
  1.13580364783116

julia> cu_sol[2]    # 对应特征向量（列
3×3 CuArray{Float64, 2}:
 -0.617345    0.676925  0.400822
  0.784067    0.487837  0.383737
 -0.0642259  -0.551169  0.831918

Julia中使用CUDA对角化较大尺寸时速度比CPU快很多

我电脑是I7-8700ES，显卡蓝天1070，分别使用CPU和GPU对角化了三个尺寸的实对称矩阵，测定他们对角化的时间，其中未计入GPU内存复制的时间，因为我推测只有开始和结束的一次复制罢了，目前GPU插在PCIEx16槽上，复制应该占不了多少时间。

实际操作中使用CPU对角化最多占用50%，应该是julia调用的库（LAPACK）自带的多线程

使用CUDA对角化3000x3000的矩阵功率跑到60w/115w，明显没有跑满；对角化6000、12870的矩阵则111.9w/115w，应该是跑满了

矩阵尺寸	3000	6000	12870
CPU	6s	43s	未测
CUDA	2s	9.4s	84s

julia> versioninfo() # I7-8700es
Julia Version 1.6.2
Commit 1b93d53fc4 (2021-07-14 15:36 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: Genuine Intel(R) CPU 0000 @ 2.90GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, skylake)

julia> or = rand(3000, 3000);   #3000尺寸

julia> cp_or = Symmetric(or);

julia> @time eigen(cp_or);
  6.023117 seconds (164 allocations: 207.078 MiB, 0.10% compilation time)

julia> @time eigen(cp_or);
  6.700551 seconds (17 allocations: 207.070 MiB)

julia> cu_or = CuArray(Symmetric(or));

julia> @time CUSOLVER.syevd!('V', 'U', cu_or);
  2.238742 seconds (472.85 k allocations: 12.055 MiB, 1.36% gc time, 7.59% compilation time)

julia> cu_or = CuArray(Symmetric(or));

julia> @time CUSOLVER.syevd!('V', 'U', cu_or);
  1.571150 seconds (445.83 k allocations: 6.804 MiB)

julia> or = rand(6000, 6000);   #6000尺寸

julia> cp_or = Symmetric(or);

julia> @time eigen(cp_or);
 42.822619 seconds (17 allocations: 826.127 MiB)

julia> @time eigen(cp_or);
 42.796838 seconds (17 allocations: 826.127 MiB, 0.28% gc time)

julia> cu_or = CuArray(Symmetric(or));

julia> @time CUSOLVER.syevd!('V', 'U', cu_or);
  9.449970 seconds (3.15 M allocations: 48.097 MiB)

julia> or = rand(12870, 12870); #12870尺寸

julia> cu_or = CuArray(Symmetric(or));

julia> @time CUSOLVER.syevd!('V', 'U', cu_or);
 83.690132 seconds (30.04 M allocations: 458.418 MiB)

julia> cu_or = CuArray(Symmetric(or));

julia> @time CUSOLVER.syevd!('V', 'U', cu_or);
 85.245942 seconds (28.41 M allocations: 433.664 MiB, 0.63% gc time, 0.00% compilation time)

julia>

有参考：

点击链接末尾的回车符可以跳转回引用处~

cuSOLVER :: CUDA Toolkit Documentation
https://docs.nvidia.com/cuda/cusolver/index.html#eig_examples ↩︎