StarNEig Library  version v0.1-beta.5
A task-based library for solving nonsymmetric eigenvalue problems
Introduction

The StarNEig library aims to provide a complete task-based software stack for solving dense nonsymmetric (generalized) eigenvalue problems. The library is built on top of the StarPU runtime system and targets both shared memory and distributed memory machines. Some components of the library support GPUs.

The four main components of the library are:

  • Hessenberg(-triangular) reduction: A dense matrix (or a dense matrix pair) is reduced to upper Hessenberg (or Hessenberg-triangular) form.
  • Schur reduction: An upper Hessenberg matrix (or a Hessenberg-triangular matrix pair) is reduced to (generalized) Schur form. The (generalized) eigenvalues can be determined from the diagonal blocks.
  • Eigenvalue reordering: Reorders a user-selected set of (generalized) eigenvalues to the upper left corner of an updated (generalized) Schur form.
  • Eigenvectors: Computes (generalized) eigenvectors for a user-selected set of (generalized) eigenvalues.

A brief summary of the StarNEig library can be found in a recent poster: Task-based, GPU-accelerated and Robust Algorithms for Solving Dense Nonsymmetric Eigenvalue Problems, Swedish eScience Academy, Lund, Sweden, October 15-16, 2019 (download)

The library has been developed as a part of the NLAFET project. The project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 671633. Support has also been received from eSSENCE, a collaborative e-Science programme funded by the Swedish Government via the Swedish Research Council (VR).

The library is open source and published under BSD 3-Clause licence.

Please cite the following article when referring to StarNEig:

Mirko Myllykoski, Carl Christian Kjelgaard Mikkelsen: Introduction to StarNEig — A Task-based Library for Solving Nonsymmetric Eigenvalue Problems, In Parallel Processing and Applied Mathematics, 13th International Conference, PPAM 2019, Bialystok, Poland, September 8–11, 2019, Revised Selected Papers, Part I, Lecture Notes in Computer Science, Vol. 12043, Wyrzykowski R., Deelman E., Dongarra J., Karczewski K. (eds), Springer International Publishing, pp. 70-81, 2020, doi: 10.1007/978-3-030-43229-4_7

Current status

The library is currently in a beta state and only real arithmetic is supported. In addition, some interface functions are implemented as LAPACK and ScaLAPACK wrappers.

Current status:

Component                         | Shared memory            | Distributed memory  | Accelerators (GPUs)
----------------------------------|--------------------------|---------------------|--------------------
Hessenberg reduction              | Complete                 | ScaLAPACK wrapper   | Single GPU
Schur reduction                   | Complete                 | Complete            | Experimental
Eigenvalue reordering             | Complete                 | Complete            | Experimental
Eigenvectors                      | Complete                 | Waiting integration | Not planned
Hessenberg-triangular reduction   | LAPACK wrapper / Planned | ScaLAPACK wrapper   | Not planned
Generalized Schur reduction       | Complete                 | Complete            | Experimental
Generalized eigenvalue reordering | Complete                 | Complete            | Experimental
Generalized eigenvectors          | Complete                 | Waiting integration | Not planned

Known problems

  • Some older OpenMPI versions (pre-summer 2017, e.g., <= 2.1.1) have a bug that might lead to a segmentation fault during a parallel aggressive early deflation (AED).
  • OpenBLAS version 0.3.1 has a bug that might lead to an incorrect result.
  • OpenBLAS versions 0.3.3-0.3.5 might lead to poor scalability.
  • Some MKL versions might lead to poor scalability. The problem appears to be related to Intel's OpenMP library. Setting the KMP_AFFINITY environment variable to disabled fixes the problem in all known cases.
  • StarPU versions 1.2.4 - 1.2.8 and some StarPU 1.3 snapshots cause poor CUDA performance. The problem can be fixed by compiling StarPU with --disable-cuda-memcpy-peer. It is possible that newer versions of StarPU are also affected by this problem.
  • The STARPU_MINIMUM_AVAILABLE_MEM and STARPU_TARGET_AVAILABLE_MEM environment variables can be used to fix some GPU-related memory allocation problems:
    STARPU_MINIMUM_AVAILABLE_MEM=10 STARPU_TARGET_AVAILABLE_MEM=15 ...
  • The library has an unsolved memory leak problem with OpenMPI. Only large problem sizes are affected. It is not known whether this problem is related to StarNEig, StarPU, OpenMPI or something else. The memory leak is sometimes accompanied by the following warning:
    mpool.c:38 UCX WARN object 0x2652000 was not returned to mpool ucp_requests
    The problem is known to occur with PMIx 2.2.1, UCX 1.5.0, OpenMPI 3.1.3, and StarPU 1.2.8.
  • If GPU support is enabled, then the starneig_SEP_SM_Hessenberg() interface function cannot always handle problems that do not fit into the GPU's memory. The cause of this problem is not known.
  • The outputs of the starneig_GEP_SM_Schur() and starneig_GEP_DM_Schur() interface functions are not in the so-called standard format. It is possible that some diagonal entries in the right-hand side output matrix are negative. This will be fixed in the next version of the library.
  • The starneig_GEP_SM_Eigenvectors() interface function may scale the input matrices. This will be fixed in the next version of the library.

Related publications

Research papers

  • Mirko Myllykoski, Carl Christian Kjelgaard Mikkelsen: Task-based, GPU-accelerated and Robust Library for Solving Dense Nonsymmetric Eigenvalue Problems, Invited article submitted to Concurrency and Computation: Practice and Experience, arXiv:2002.05024
  • Mirko Myllykoski, Carl Christian Kjelgaard Mikkelsen: Introduction to StarNEig — A Task-based Library for Solving Nonsymmetric Eigenvalue Problems, In Parallel Processing and Applied Mathematics, 13th International Conference, PPAM 2019, Bialystok, Poland, September 8–11, 2019, Revised Selected Papers, Part I, Lecture Notes in Computer Science, Vol. 12043, Wyrzykowski R., Deelman E., Dongarra J., Karczewski K. (eds), Springer International Publishing, pp. 70-81, 2020, doi: 10.1007/978-3-030-43229-4_7
  • Carl Christian Kjelgaard Mikkelsen, Mirko Myllykoski: Parallel Robust Computation of Generalized Eigenvectors of Matrix Pencils, presented at PPAM 2019, In Parallel Processing and Applied Mathematics, 13th International Conference, PPAM 2019, Bialystok, Poland, September 8–11, 2019, Revised Selected Papers, Part I, Lecture Notes in Computer Science, Vol. 12043, Wyrzykowski R., Deelman E., Dongarra J., Karczewski K. (eds), Springer International Publishing, pp. 58-69, 2020, doi: 10.1007/978-3-030-43229-4_6
  • Carl Christian Kjelgaard Mikkelsen, Angelika Schwarz, and Lars Karlsson: Parallel Robust Solution of Triangular Linear Systems, Concurrency and Computation: Practice and Experience, 31 (19), 2019, doi: 10.1016/j.parco.2019.04.001
  • Mirko Myllykoski: A Task-Based Algorithm for Reordering the Eigenvalues of a Matrix in Real Schur Form, In Parallel Processing and Applied Mathematics, 12th International Conference, PPAM 2017, Lublin, Poland, September 10-13, 2017, Revised Selected Papers, Part I, Lecture Notes in Computer Science, Vol. 10777, Wyrzykowski R., Dongarra J., Deelman E., Karczewski K. (eds), Springer International Publishing, pp. 207-216, 2018, doi: 10.1007/978-3-319-78024-5_19
  • Carl Christian Kjelgaard Mikkelsen, Lars Karlsson. Blocked Algorithms for Robust Solution of Triangular Linear Systems, In Parallel Processing and Applied Mathematics, 12th International Conference, PPAM 2017, Lublin, Poland, September 10-13, 2017, Revised Selected Papers, Part I, Lecture Notes in Computer Science, Vol. 10777, Wyrzykowski R., Dongarra J., Deelman E., Karczewski K. (eds), Springer International Publishing, pp. 207-216, 2018, doi: 10.1007/978-3-319-78024-5_7

Reports, deliverables, etc.

  • Angelika Schwarz, Carl Christian Kjelgaard Mikkelsen, Lars Karlsson: Robust Parallel Eigenvector Computation For the Non-Symmetric Eigenvalue Problem, Report UMINF 20.02, Department of Computing Science, Umeå University, SE-901 87 Umeå, Sweden, 2020 (download)
  • Angelika Schwarz: Towards efficient overflow-free solvers for systems of triangular type, Licentiate thesis, Department of computing science, Umeå University, ISSN: 0348-0542, 2019
  • Mirko Myllykoski, Carl Christian Kjelgaard Mikkelsen, Angelika Schwarz, Bo Kågström: D2.7 Eigenvalue solvers for nonsymmetric problems, public NLAFET deliverable, 2019 (download)
  • Lars Karlsson, Mahmoud Eljammaly, Mirko Myllykoski: D6.5 Evaluation of auto-tuning techniques, public NLAFET deliverable, 2019 (download)
  • Bo Kågström et al.: D7.8 Release of the NLAFET library, public NLAFET deliverable, 2019 (download)
  • Mirko Myllykoski, Lars Karlsson, Bo Kågström, Mahmoud Eljammaly, Srikara Pranesh, Mawussi Zounon: D2.6 Prototype Software for Eigenvalue Problem Solvers, public NLAFET deliverable, 2018 (download)
  • Mirko Myllykoski, Carl Christian Kjelgaard Mikkelsen, Lars Karlsson, Bo Kågström: Task-Based Parallel Algorithms for Reordering of Matrices in Real Schur Forms, NLAFET Working Note WN-11, 2017. Also as Report UMINF 17.11, Department of Computing Science, Umeå University, SE-901 87 Umeå, Sweden (download)
  • Carl Christian Kjelgaard Mikkelsen, Mirko Myllykoski, Björn Adlerborn, Lars Karlsson, Bo Kågström: D2.5 Eigenvalue Problem Solvers, public NLAFET deliverable, 2017 (download)

The standard eigenvalue problem

Given a square matrix $A$ of size $n \times n$, the standard eigenvalue problem (SEP) consists of finding eigenvalues $\lambda_i \in \mathbb C$ and associated eigenvectors $0 \neq v_i \in \mathbb C^{n}$ such that

\[ A v_i = \lambda_i v_i, \text{ for } i = 1, 2, \dots, n. \]

The eigenvalues are the $n$ (potentially complex) roots of the polynomial $\text{det}(A - \lambda I) = 0$ of degree $n$. There is often a full set of $n$ linearly independent eigenvectors, but if there are multiple eigenvalues (i.e., if $\lambda_{i} = \lambda_{j}$ for some $i \neq j$) then there might not be a full set of independent eigenvectors.

Reduction to Hessenberg form

The dense matrix $A$ is condensed to Hessenberg form by computing a Hessenberg decomposition

\[ A = Q_{1} H Q_{1}^{H}, \]

where $Q_{1}$ is unitary and $H$ is upper Hessenberg. This greatly accelerates the subsequent computation of a Schur decomposition: when working on $H$ of size $n \times n$, the amount of work in each iteration of the QR algorithm drops from $\mathcal{O}(n^3)$ to $\mathcal{O}(n^2)$ flops.

Reduction to Schur form

Starting from the Hessenberg matrix $H$ we compute a Schur decomposition

\[ H = Q_{2} S Q_{2}^H, \]

where $Q_{2}$ is unitary and $S$ is upper triangular. The eigenvalues of $A$ can now be determined as they appear on the diagonal of $S$, i.e., $\lambda_i = s_{ii}$. For real matrices there is a similar decomposition known as the real Schur decomposition

\[ H = Q_{2} S Q_{2}^T, \]

where $Q_{2}$ is orthogonal and $S$ is upper quasi-triangular with $1\times 1$ and $2 \times 2$ blocks on the diagonal. The $1 \times 1$ blocks correspond to the real eigenvalues and each $2 \times 2$ block corresponds to a pair of complex conjugate eigenvalues.

Eigenvalue reordering and invariant subspaces

Given a subset consisting of $m \leq n$ of the eigenvalues, we can reorder the eigenvalues on the diagonal of the Schur form by constructing a unitary matrix $Q_{3}$ such that

\[ S = Q_{3} \begin{bmatrix} \hat S_{11} & \hat S_{12} \\ 0 & \hat S_{22} \end{bmatrix} Q_{3}^{H} \]

and the eigenvalues of the $m \times m$ block $\hat S_{11}$ are the selected eigenvalues. The first $m$ columns of $Q_{3}$ span an invariant subspace associated with the selected eigenvalues.

Computation of eigenvectors

Given a subset consisting of $m \leq n$ of the eigenvalues $\lambda_{i}$ for $i = 1, 2, \ldots, m$ and a Schur decomposition $A = Q S Q^{H}$, we can compute for each $\lambda_{i}$ an eigenvector $v_{i} \neq 0$ such that $A v_{i} = \lambda_{i} v_{i}$ by first computing an eigenvector $w_{i}$ of $S$ and then transforming it back to the original basis by pre-multiplication with $Q$.
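For an upper triangular $S$ with distinct diagonal entries, the eigenvector $w$ for $\lambda = s_{kk}$ can be obtained by back substitution; the back transformation is then the product $v = Q w$. A minimal self-contained C sketch of the substitution step (illustration only; unlike StarNEig's robust eigenvector kernels, it does not guard against overflow):

```c
#include <assert.h>
#include <math.h>

/* Eigenvector of an n x n upper triangular S (row-major, distinct
   diagonal entries) for the eigenvalue lambda = S[k][k]: set w[k] = 1,
   w[j] = 0 for j > k, and back-substitute upwards for j < k. */
static void tri_eigvec(int n, const double *S, int k, double *w)
{
    double lambda = S[k*n + k];
    for (int j = 0; j < n; j++)
        w[j] = (j == k) ? 1.0 : 0.0;
    for (int j = k-1; j >= 0; j--) {
        double s = 0.0;
        for (int i = j+1; i <= k; i++)
            s += S[j*n + i]*w[i];
        w[j] = -s / (S[j*n + j] - lambda);
    }
}
```

The computed $w$ satisfies $S w = \lambda w$ exactly up to rounding.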

The generalized eigenvalue problem

Given a square matrix pencil $A - \lambda B$, where $A, B \in \mathbb{C}^{n \times n}$, the generalized eigenvalue problem (GEP) consists of finding generalized eigenvalues $\lambda_i \in \mathbb C$ and associated generalized eigenvectors $0 \neq v_i \in \mathbb C^{n}$ such that

\[ A v_{i} = \lambda_{i} B v_{i}, \text{ for } i = 1, 2, \dots, n. \]

The eigenvalues are the $n$ (potentially complex) roots of the polynomial $\text{det}(A - \lambda B) = 0$ of degree $n$. There is often a full set of $n$ linearly independent generalized eigenvectors, but if there are multiple eigenvalues (i.e., if $\lambda_{i} = \lambda_{j}$ for some $i \neq j$) then there might not be a full set of independent eigenvectors.

At least in principle, a GEP can be transformed into a SEP provided that $B$ is invertible, since

\[ A v = \lambda B v \Leftrightarrow (B^{-1} A) v = \lambda v. \]

However, in finite precision arithmetic this practice is not recommended, since forming $B^{-1} A$ explicitly can be numerically unstable when $B$ is ill-conditioned.

Reduction to Hessenberg-triangular form

The dense matrix pair $(A, B)$ is condensed to Hessenberg-triangular form by computing a Hessenberg-triangular decomposition

\[ A = Q_{1} H Z_{1}^{H}, \quad B = Q_{1} Y Z_{1}^{H}, \]

where $Q_{1}, Z_{1}$ are unitary, $H$ is upper Hessenberg, and $Y$ is upper triangular. This is done in order to greatly accelerate the subsequent computation of a generalized Schur decomposition.

Reduction to generalized Schur form

Starting from the Hessenberg-triangular pencil $H - \lambda Y$ we compute a generalized Schur decomposition

\[ H = Q_{2} S Z_{2}^H, \quad Y = Q_{2} T Z_{2}^{H}, \]

where $Q_{2}, Z_{2}$ are unitary and $S, T$ are upper triangular. The eigenvalues of $A - \lambda B$ can now be determined from the diagonal element pairs $(s_{ii}, t_{ii})$, i.e., $\lambda_i = s_{ii} / t_{ii}$ (if $t_{ii} \neq 0$). If $s_{ii} \neq 0$ and $t_{ii} = 0$, then $\lambda_{i} = \infty$ is an infinite eigenvalue of the matrix pair $(S,T)$. (If both $s_{ii} = 0$ and $t_{ii} = 0$ for some $i$, then the pencil is singular and the eigenvalues are undetermined; all complex numbers are eigenvalues). For real matrix pairs there is a similar decomposition known as the real generalized Schur decomposition

\[ H = Q_{2} S Z_{2}^T, \quad Y = Q_{2} T Z_{2}^{T}, \]

where $Q_{2}, Z_{2}$ are orthogonal, $S$ is upper quasi-triangular with $1 \times 1$ and $2 \times 2$ blocks on the diagonal, and $T$ is upper triangular. The $1 \times 1$ blocks on the diagonal of $S - \lambda T$ correspond to the real generalized eigenvalues and each $2 \times 2$ block corresponds to a pair of complex conjugate generalized eigenvalues.
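Extracting the generalized eigenvalues from the diagonal pairs $(s_{ii}, t_{ii})$ follows the case analysis above. A minimal self-contained C sketch for a single $1 \times 1$ pair (illustration only, not a StarNEig interface):

```c
#include <assert.h>
#include <math.h>

enum gev_kind { GEV_FINITE, GEV_INFINITE, GEV_UNDETERMINED };

/* Classify one diagonal pair (s_ii, t_ii) of a generalized Schur form
   and, when the eigenvalue is finite, return lambda = s_ii / t_ii. */
static enum gev_kind classify_pair(double s, double t, double *lambda)
{
    if (t != 0.0) {
        *lambda = s / t;
        return GEV_FINITE;
    }
    if (s != 0.0)
        return GEV_INFINITE;      /* infinite eigenvalue */
    return GEV_UNDETERMINED;      /* singular pencil: all of C */
}
```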

Eigenvalue reordering and deflating subspaces

Given a subset consisting of $m \leq n$ of the generalized eigenvalues, we can reorder the generalized eigenvalues on the diagonal of the generalized Schur form by constructing unitary matrices $Q_{3}$ and $Z_{3}$ such that

\[ S - \lambda T = Q_{3} \begin{bmatrix} \hat S_{11} - \lambda \hat T_{11} & \hat S_{12} - \lambda \hat T_{12} \\ 0 & \hat S_{22} - \lambda \hat T_{22} \end{bmatrix} Z_{3}^{H} \]

and the eigenvalues of the $m \times m$ block pencil $\hat S_{11} - \lambda \hat T_{11}$ are the selected generalized eigenvalues. The first $m$ columns of $Z_{3}$ span a right deflating subspace associated with the selected generalized eigenvalues.

Computation of generalized eigenvectors

Given a subset consisting of $m \leq n$ of the eigenvalues $\lambda_{i}$ for $i = 1, 2, \ldots, m$ and a generalized Schur decomposition $A - \lambda B = Q (S - \lambda T) Z^{H}$, we can compute for each $\lambda_{i}$ a generalized eigenvector $v_{i} \neq 0$ such that $A v_{i} = \lambda_{i} B v_{i}$ by first computing a generalized eigenvector $w_{i}$ of $S - \lambda_{i} T$ and then transforming it back to the original basis by pre-multiplication with $Z$.