StarNEig User's Guide
v0.1.4
A task-based library for solving dense nonsymmetric eigenvalue problems
|
Configuration structures and functions for the expert interface functions. More...
Data Structures | |
struct | starneig_hessenberg_conf |
Hessenberg reduction configuration structure. More... | |
struct | starneig_schur_conf |
Schur reduction configuration structure. More... | |
struct | starneig_reorder_conf |
Eigenvalue reordering configuration structure. More... | |
struct | starneig_eigenvectors_conf |
Eigenvector computation configuration structure. More... | |
Hessenberg reduction | |
void | starneig_hessenberg_init_conf (struct starneig_hessenberg_conf *conf) |
Initializes a Hessenberg reduction configuration structure with default parameters. More... | |
#define | STARNEIG_HESSENBERG_DEFAULT_TILE_SIZE -1 |
Default tile size. | |
#define | STARNEIG_HESSENBERG_DEFAULT_PANEL_WIDTH -1 |
Default panel width. | |
#define | STARNEIG_HESSENBERG_DEFAULT_PARALLEL_WORKER_SIZE -1 |
Default parallel worker size. | |
Schur reduction | |
void | starneig_schur_init_conf (struct starneig_schur_conf *conf) |
Initializes a Schur reduction configuration structure with default parameters. More... | |
#define | STARNEIG_SCHUR_DEFAULT_INTERATION_LIMIT -1 |
Default iteration limit. | |
#define | STARNEIG_SCHUR_DEFAULT_TILE_SIZE -1 |
Default tile size. | |
#define | STARNEIG_SCHUR_DEFAULT_SMALL_LIMIT -1 |
Default sequential QR limit. | |
#define | STARNEIG_SCHUR_DEFAULT_AED_WINDOW_SIZE -1 |
Default AED window size. | |
#define | STARNEIG_SCHUR_DEFAULT_AED_NIBBLE -1 |
Default nibble value. | |
#define | STARNEIG_SCHUR_DEFAULT_AED_PARALLEL_SOFT_LIMIT -1 |
Default soft sequential AED limit. | |
#define | STARNEIG_SCHUR_DEFAULT_AED_PARALLEL_HARD_LIMIT -1 |
Default hard sequential AED limit. | |
#define | STARNEIG_SCHUR_DEFAULT_SHIFT_COUNT -1 |
Default shift count. | |
#define | STARNEIG_SCHUR_DEFAULT_WINDOW_SIZE -1 |
Default bulge chasing window size. | |
#define | STARNEIG_SCHUR_ROUNDED_WINDOW_SIZE -2 |
Rounded bulge chasing window. | |
#define | STARNEIG_SCHUR_DEFAULT_SHIFTS_PER_WINDOW -1 |
Default number of shifts per bulge chasing window. | |
#define | STARNEIG_SCHUR_DEFAULT_UPDATE_WIDTH -1 |
Default left-hand side update width. | |
#define | STARNEIG_SCHUR_DEFAULT_UPDATE_HEIGHT -1 |
Default right-hand side update height. | |
#define | STARNEIG_SCHUR_DEFAULT_THRESHOLD -1 |
Default deflation threshold. | |
#define | STARNEIG_SCHUR_NORM_STABLE_THRESHOLD -2 |
Norm stable deflation threshold. | |
#define | STARNEIG_SCHUR_LAPACK_THRESHOLD -3 |
LAPACK-style deflation threshold. | |
Eigenvectors | |
void | starneig_eigenvectors_init_conf (struct starneig_eigenvectors_conf *conf) |
Initializes an eigenvectors configuration structure with default parameters. More... | |
#define | STARNEIG_EIGENVECTORS_DEFAULT_TILE_SIZE -1 |
Default tile size. | |
Configuration structures and functions for the expert interface functions.
struct starneig_hessenberg_conf |
Hessenberg reduction configuration structure.
Data Fields | ||
---|---|---|
int | tile_size |
The matrices are divided into square tiles. This parameter defines the used tile size. If the parameter is set to STARNEIG_HESSENBERG_DEFAULT_TILE_SIZE, then the implementation will determine a suitable tile size automatically. |
int | panel_width |
The reduction is performed one panel at a time. This parameter defines the used panel width. If the parameter is set to STARNEIG_HESSENBERG_DEFAULT_PANEL_WIDTH, then the implementation will determine a suitable panel width automatically. |
int | parallel_worker_size |
The CPU variants of the panel reduction and trailing matrix update tasks are multithreaded. This parameter defines the number of cores allocated to these tasks. If the parameter is set to STARNEIG_HESSENBERG_DEFAULT_PARALLEL_WORKER_SIZE, then the implementation will determine a suitable CPU core count automatically. |
struct starneig_schur_conf |
Schur reduction configuration structure.
Data Fields | ||
---|---|---|
int | iteration_limit |
The QR/QZ is an iterative algorithm. This parameter defines the maximum number of iterations the algorithm is allowed to perform. If the parameter is STARNEIG_SCHUR_DEFAULT_INTERATION_LIMIT, then the implementation will determine a suitable iteration limit automatically. |
int | tile_size |
The matrices are divided into square tiles. This parameter defines the used tile size. If the parameter is set to STARNEIG_SCHUR_DEFAULT_TILE_SIZE, then the implementation will determine a suitable tile size automatically. |
int | small_limit |
As the QR/QZ algorithm progresses, the size of the active region shrinks. Once the size of the active region is small enough, then the remaining problem is solved in a sequential manner. This parameter defines the transition point where the implementation switches to a sequential QR algorithm. If the parameter is set to STARNEIG_SCHUR_DEFAULT_SMALL_LIMIT, then the implementation will determine a suitable switching point automatically. |
int | aed_window_size |
The implementation relies on a so-called Aggressive Early Deflation (AED) technique to accelerate the convergence of the algorithm. Each AED is performed inside a small diagonal window. This parameter defines used AED window size. If the parameter is set to STARNEIG_SCHUR_DEFAULT_AED_WINDOW_SIZE, then the implementation will determine a suitable AED window size automatically. |
int | aed_nibble |
The implementation relies on a so-called Aggressive Early Deflation (AED) technique to accelerate the convergence of the algorithm. Each AED is performed inside a small diagonal window. If the number deflated (converged) eigenvalues is larger than |
int | aed_parallel_soft_limit |
The implementation relies on a so-called Aggressive Early Deflation (AED) technique to accelerate the convergence of the algorithm. Each AED is performed inside a small diagonal window. An AED can be performed sequentially or in parallel. This parameter defines the transition point where the implementation allowed to switch to a sequential AED algorithm. The decision is made based on the size of the AED window. If the parameter is set to STARNEIG_SCHUR_DEFAULT_AED_PARALLEL_SOFT_LIMIT, then the implementation will determine a suitable switching point automatically. |
int | aed_parallel_hard_limit |
The implementation relies on a so-called Aggressive Early Deflation (AED) technique to accelerate the convergence of the algorithm. Each AED is performed inside a small diagonal window. An AED can be performed sequentially or in parallel. This parameter defines the transition point where the implementation switches to a sequential AED algorithm. The decision is made based on the size of the AED window. If the parameter is set to STARNEIG_SCHUR_DEFAULT_AED_PARALLEL_HARD_LIMIT, then the implementation will determine a suitable switching point automatically. |
int | shift_count |
The QR/QZ algorithm chases a set of bulges across the diagonal of the Hessenberg(-triangular) decomposition. Two shifts (eigenvalue estimates) are required to generate each bulge. This parameter defines the number of shifts to use. If the parameter is set to STARNEIG_SCHUR_DEFAULT_SHIFT_COUNT, then the implementation will determine a suitable shift count automatically. |
int | window_size |
The QR/QZ algorithm chases a set of bulges across the diagonal of the Hessenberg(-triangular) decomposition. The bulges are chased in batches. The related similarity transformations are initially restricted to inside a small diagonal window and the accumulated transformation are applied only later as BLAS-3 updates. This parameter defines the used bulge chasing window size. If the parameter is set to STARNEIG_SCHUR_ROUNDED_WINDOW_SIZE, then
If the parameter is set to STARNEIG_SCHUR_DEFAULT_WINDOW_SIZE, then the implementation will determine a suitable window size automatically. |
int | shifts_per_window |
The QR/QZ algorithm chases a set of bulges across the diagonal of the Hessenberg(-triangular) decomposition. The bulges are chased in batches. This parameter defines the used batch size. If the parameter is set to STARNEIG_SCHUR_DEFAULT_SHIFTS_PER_WINDOW then the implementation will determine a suitable batch size automatically. |
int | update_width |
The similarity similarity transformations are initially restricted to inside a small diagonal window and the accumulated transformation are applied only later as BLAS-3 updates. This parameter defines the width of each left-hand side update task. The value should be multiple of the tile size. If the parameter is set to STARNEIG_SCHUR_DEFAULT_UPDATE_WIDTH, then the implementation will determine a suitable width automatically. |
int | update_height |
The similarity similarity transformations are initially restricted to inside a small diagonal window and the accumulated transformation are applied only later as BLAS-3 updates. This parameter defines the height of each right-hand side update task. The value should be multiple of the tile size. If the parameter is set to STARNEIG_SCHUR_DEFAULT_UPDATE_HEIGHT, then the implementation will determine a suitable height automatically. |
double | left_threshold |
The QR/QZ algorithm is allowed to set tiny matrix entires to zero as long as their magnitudes are smaller that a given threshold. This parameter defines the threshold for the left-hand side matrix ( ). If the parameter is set to STARNEIG_SCHUR_DEFAULT_THRESHOLD, then the implementation will determine a suitable threshold automatically. If the parameter is set to STARNEIG_SCHUR_NORM_STABLE_THRESHOLD, then the implementation will use the threshold , where is the unit roundoff and is the Frobenius norm of the matrix . If the parameter is set to STARNEIG_SCHUR_LAPACK_THRESHOLD, then the implementation will use a deflation threshold that is compatible with LAPACK. |
double | right_threshold |
The QZ algorithm is allowed to set tiny matrix entires to zero as long as their magnitudes are smaller that a given threshold. This parameter defines the threshold for the right-hand side matrix ( ) off-diagonal entires. If the parameter is set to STARNEIG_SCHUR_DEFAULT_THRESHOLD, then the implementation will determine a suitable threshold automatically. If the parameter is set to STARNEIG_SCHUR_NORM_STABLE_THRESHOLD, then the implementation will use the threshold , where is the unit roundoff and is the Frobenius norm of the matrix . If the parameter is set to STARNEIG_SCHUR_LAPACK_THRESHOLD, then the implementation will use a deflation threshold that is compatible with LAPACK. |
double | inf_threshold |
The QZ algorithm is allowed to set tiny matrix entires to zero as long as their magnitudes are smaller that a given threshold. This parameter defines the threshold for the right-hand side matrix ( ) diagonal entries. If the parameter is set to STARNEIG_SCHUR_DEFAULT_THRESHOLD, then the implementation will determine a suitable threshold automatically. If the parameter is set to STARNEIG_SCHUR_NORM_STABLE_THRESHOLD, then the implementation will use the threshold , where is the unit roundoff and is the Frobenius norm of the matrix . |
struct starneig_reorder_conf |
Eigenvalue reordering configuration structure.
Data Fields | ||
---|---|---|
starneig_reorder_plan_t | plan |
This parameter plan defines the used reordering plan. If the parameter is set to STARNEIG_REORDER_DEFAULT_PLAN, then the implementation will determine a suitable reordering plan automatically. |
starneig_reorder_blueprint_t | blueprint |
This parameter defines the used task insertion blueprint. If the parameter is set to STARNEIG_REORDER_DEFAULT_BLUEPRINT, then the implementation will determine a suitable task insertion blueprint automatically. |
int | tile_size |
The matrices are divided into square tiles. This parameter defines the used tile size. If the parameter is set to STARNEIG_REORDER_DEFAULT_TILE_SIZE, then the implementation will determine a suitable tile size automatically. |
int | values_per_chain |
The selected eigenvalues are processed in batches and each batch is assigned a window chain. This parameter defines the number of selected eigenvalues processed by each window chain. If the parameter is set to STARNEIG_REORDER_DEFAULT_VALUES_PER_CHAIN, then the implementation will determine a suitable value automatically. |
int | window_size |
The similarity similarity transformations are initially restricted to inside a small diagonal window and the accumulated transformation are applied only later as BLAS-3 updates. This parameter defines the size of the window. If the parameter is set to STARNEIG_REORDER_ROUNDED_WINDOW_SIZE, then
If the parameter is set to STARNEIG_REORDER_DEFAULT_WINDOW_SIZE, then the implementation will determine a suitable window size automatically. |
int | small_window_size |
Larger diagonal window are processed using even smaller diagonal windows in a recursive manner. This parameter defines the used small window size. If the parameter is set to STARNEIG_REORDER_DEFAULT_SMALL_WINDOW_SIZE, then the implementation will determine a suitable small window size automatically. |
int | small_window_threshold |
Larger diagonal window are processed using even smaller diagonal windows in a recursive manner. This parameter defines the largest diagonal window that is processed in a scalar manner. If the parameter is set to STARNEIG_REORDER_DEFAULT_SMALL_WINDOW_THRESHOLD, then the implementation will determine a suitable threshold automatically. |
int | update_width |
The similarity similarity transformations are initially restricted to inside a small diagonal window and the accumulated transformation are applied only later as BLAS-3 updates. This parameter defines the width of each left-hand side update task. The value should be multiple of the tile size. If the parameter is set to STARNEIG_REORDER_DEFAULT_UPDATE_WIDTH, then the implementation will determine a suitable width automatically. |
int | update_height |
The similarity similarity transformations are initially restricted to inside a small diagonal window and the accumulated transformation are applied only later as BLAS-3 updates. This parameter defines the height of each right-hand side update task. The value should be multiple of the tile size. If the parameter is set to STARNEIG_REORDER_DEFAULT_UPDATE_HEIGHT, then the implementation will determine a suitable height automatically. |
struct starneig_eigenvectors_conf |
Eigenvector computation configuration structure.
Data Fields | ||
---|---|---|
int | tile_size |
The matrices are divided into tiles. This parameter defines the used tile size. If the parameter is set to STARNEIG_EIGENVECTORS_DEFAULT_TILE_SIZE, then the implementation will determine a suitable tile size automatically. |
Reordering plan enumerator.
Eigenvalues that fall within a diagonal computation window are reordered such that all selected eigenvalues are moved to the upper left corner of the window. The corresponding orthogonal transformations are accumulated to separate accumulator matrix / matrices.
A window chain comprises from multiple overlapping diagonal computation windows that are intended to be processed in a particular order. More precisely, the windows are placed such that the overlap between two windows is big enough to accommodate all selected eigenvalues that fall within the preceding windows. In this way, the windows can be processed in sequential order, starting from the bottom window, such that the reordering that takes place in one window always moves the preceding selected eigenvalues to the lower right corner of the next window. In the end, all selected that fall within the combined computation area of the chain are moved to the upper left corner of the topmost window.
An example showing how an eigenvalue can be moved six entries upwards by using three diagonal windows:
The number of selected eigenvalues that can be moved by a single window chain is limited by the windows size. Thus, the whole reordering procedure usually involves multiple chains that must be be processed in a particular order. A chain list describes a list of chains that are intended to be processed together. Window chains that belong to different chain lists are processed separately.
A plan consists from one or more chain lists that are intended to be processed in a particular order.
STARNEIG_REORDER_ONE_PART_PLAN :
The first chain is placed in the upper left corner of the matrix and its size is chosen such that it contains a desired number of selected eigenvalues (starneig_reorder_conf::values_per_chain parameter). The next chain is places such that its upper left corner is located one entry after the location where the last selected eigenvalue, that falls within the first chain, would be after the reordering. The chain is sized such that the part of the chain, that does not intersect the first chain, contain the desired number of selected eigenvalues. This same procedure is repeated until all selected eigenvalues have been accounted for. All chains belong to the same chain lists and are intended to be processed sequentially.
An example showing the placement of the chains in a case where each chain wields two selected eigenvalues:
An example showing what happens when the first three chains are processed:
If necessary, each chain is re-sized to avoid splitting any tiles.
Windows are placed such that the first window is located in the lower right corner of the computation area of the window chain. The last window is correspondingly placed in the upper left corner of the computation area.
If necessary, each window is re-sized to avoid splitting any tiles.
STARNEIG_REORDER_MULTI_PART_PLAN :
A multi-part reordering plan is derived from an one-part reordering plan by splitting the chains into sub-chains as shown below:
Note that the chains that belong to the same chain list are independent from each other and can therefore be processed in an arbitrary order.
Enumerator | |
---|---|
STARNEIG_REORDER_DEFAULT_PLAN | Default plan. |
STARNEIG_REORDER_ONE_PART_PLAN | One part plan. |
STARNEIG_REORDER_MULTI_PART_PLAN | Multi part plan. |
Task insertion blueprint.
A task insertion blueprint defines how a reordering plan is carried out.
void starneig_hessenberg_init_conf | ( | struct starneig_hessenberg_conf * | conf | ) |
Initializes a Hessenberg reduction configuration structure with default parameters.
[out] | conf | The Hessenberg reduction configuration structure. |
void starneig_schur_init_conf | ( | struct starneig_schur_conf * | conf | ) |
Initializes a Schur reduction configuration structure with default parameters.
[out] | conf | The Schur reduction configuration structure. |
void starneig_reorder_init_conf | ( | struct starneig_reorder_conf * | conf | ) |
Initializes an eigenvalue reordering configuration structure with default parameters.
[out] | conf | The eigenvalue reordering configuration structure. |
void starneig_eigenvectors_init_conf | ( | struct starneig_eigenvectors_conf * | conf | ) |
Initializes an eigenvectors configuration structure with default parameters.
[out] | conf | The eigenvectors configuration structure. |