StarNEig User's Guide  v0.1.4
A task-based library for solving dense nonsymmetric eigenvalue problems
Introduction

StarNEig library aims to provide a complete task-based software stack for solving dense nonsymmetric (generalized) eigenvalue problems. The library is built on top of the StarPU runtime system and targets both shared memory and distributed memory machines. Some components of the library support GPUs.

The four main components of the library are:

  • Hessenberg(-triangular) reduction: A dense matrix (or a dense matrix pair) is reduced to upper Hessenberg (or Hessenberg-triangular) form.
  • Schur reduction (QR/QZ algorithm): A upper Hessenberg matrix (or a Hessenberg-triangular matrix pair) is reduced to (generalized) Schur form. The (generalized) eigenvalues can be determined from the diagonal blocks.
  • Eigenvalue reordering and deflating subspaces: Reorders a user-selected set of (generalized) eigenvalues to the upper left corner of an updated (generalized) Schur form.
  • Computation of eigenvectors: Computes (generalized) eigenvectors for a user-selected set of (generalized) eigenvalues.

The library has been developed as a part of the NLAFET project. The project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 671633. Support has also been received from eSSENCE, a collaborative e-Science programme funded by the Swedish Government via the Swedish Research Council (VR), and VR Grant E0485301. The library is published under open-source BSD 3-Clause license.

Please cite the following article when referring to StarNEig:

Mirko Myllykoski, Carl Christian Kjelgaard Mikkelsen: Task-based, GPU-accelerated and Robust Library for Solving Dense Nonsymmetric Eigenvalue Problems, Concurrency and Computation: Practice and Experience, 2020, doi: 10.1002/cpe.5915

Please see publications and authors.

Performance

Performance comparisons against MAGMA (GPU) and ScaLAPACK (distributed memory), and strong scalability on shared and distributed memory machines:

performance.png

Also, see following publications:

  • Mirko Myllykoski: A Task-based Multi-shift QR/QZ Algorithm with Aggressive Early Deflation, arXiv:2007.03576
  • Mirko Myllykoski, Carl Christian Kjelgaard Mikkelsen: Task-based, GPU-accelerated and Robust Library for Solving Dense Nonsymmetric Eigenvalue Problems, Concurrency and Computation: Practice and Experience, 2020, doi: 10.1002/cpe.5915
  • Mirko Myllykoski, Carl Christian Kjelgaard Mikkelsen, Angelika Schwarz, Bo Kågström: D2.7 Eigenvalue solvers for nonsymmetric problems, public NLAFET deliverable, 2019 (download)

Current status (stable series)

The library currently supports only real arithmetic (real input and output matrices but real and/or complex eigenvalues and eigenvectors). In addition, some interface functions are implemented as LAPACK and ScaLAPACK wrapper functions.

Standard eigenvalue problems:

Component Shared memory Distributed memory CUDA
Hessenberg reduction Complete ScaLAPACK Single GPU
Schur reduction Complete Complete Experimental
Eigenvalue reordering Complete Complete Experimental
Eigenvectors Complete

Generalized eigenvalue problems:

Component Shared memory Distributed memory CUDA
HT reduction LAPACK 3rd party
Schur reduction Complete Complete Experimental
Eigenvalue reordering Complete Complete Experimental
Eigenvectors Complete

Please see changelog and known problems.

Documentation:

The StarNEig User's Guide is available in both HTML and PDF formats at https://nlafet.github.io/StarNEig. The PDF version is also available under releases.

Quickstart guide

Dependencies

Library dependencies:

  • Linux
  • CMake 3.3 or newer
  • Portable Hardware Locality (hwloc)
  • Starpu 1.2 or 1.3
    • Newer versions require the user set the STARPU_LIBRARIES, STARPU_MPI_LIBRARIES and STARPU_INCLUDE_PATH environmental variables.
  • OpenBLAS, MKL, GotoBLAS or single-threaded BLAS library
  • LAPACK
  • MPI (optional)
  • CUDA + cuBLAS (optional)
  • ScaLAPACK + BLACS (optional)

Test program and example code dependencies:

  • pkg-config
  • GNU Scientific Library (optional)
  • MAGMA (optional)

Configure, build and install

Execute in the same directory as this README.md file:

$ mkdir build
$ cd build/
$ cmake ../
$ make
$ make test
$ sudo make install

Example

The following example demonstrates how a dense matrix A is reduced to real Schur form:

#include <stdlib.h>
#include <time.h>
int main()
{
int n = 3000;
srand((unsigned) time(NULL));
// generate a random matrix A
int ldA = ((n/8)+1)*8;
double *A = malloc(n*ldA*sizeof(double));
for (int j = 0; j < n; j++)
for (int i = 0; i < n; i++)
A[j*ldA+i] = C[j*ldC+i] = 2.0*rand()/RAND_MAX - 1.0;
// generate an identity matrix Q
int ldQ = ((n/8)+1)*8;
double *Q = malloc(n*ldA*sizeof(double));
for (int j = 0; j < n; j++)
for (int i = 0; i < n; i++)
Q[j*ldQ+i] = i == j ? 1.0 : 0.0;
// allocate space for the eigenvalues
double *real = malloc(n*sizeof(double));
double *imag = malloc(n*sizeof(double));
// initialize the StarNEig library
// reduce matrix A to real Schur form S = Q^T A Q
n, A, ldA, Q, ldQ, real, imag, NULL, NULL, NULL, NULL);
// de-initialize the StarNEig library
free(A); free(Q); free(real); free(imag);
return 0;
}