Yes.
Approach No. 1
Within cuSOLVER you can use QR decomposition, see QR decomposition for solving linear systems in CUDA .
Approach No. 2
Alternatively, you can calculate the inverse of the sequence involution
cublas<t>getrfBatched()
which calculates the LU decomposition of the matrix, and
cublas<t>getriBatched()
which computes the inverse matrix starting with its LU decomposition.
Approach No. 3
The ultimate opportunity is to use
cublas<t>getrfBatched()
followed by a double call
cublas<t>trsm()
which solves the upper or lower triangular linear systems.
As Robert Kroella noted, the answer may vary depending on the size and type of matrices involved.
Code for the nr approach. 1
Please see the QR decomposition for solving linear systems in CUDA .
Code for nr approaches. 2 and nr. 3
Below I report a working example for implementing nr approaches. 2 and 3 . Hankel matrices are used to power approaches with well-conditioned, reversible matrices. Please note that the approach is nr. 3 requires a permutation (permutation) of the vector of system coefficients in accordance with the summary array obtained after calling cublas<t>getrfBatched() . This permutation can be performed on the CPU.
#include <stdio.h>
The Utilities.cu and Utilities.cuh files needed to run this example are supported on this github page. TimingGPU.cu and TimingGPU.cuh supported on this github page.
Some useful links to the third approach:
NAG Fortran Training Document
Computational Library Library (SCSL) User Guide
https://www.cs.drexel.edu/~jjohnson/2010-11/summer/cs680/programs/lapack/Danh/verify_sequential.c
EDIT
Dates (in ms) for nr approaches. 2 and 3 (tests performed on a GTX960 card, see 5.2).
N LU decomposition Approach nr. 2 Approach nr. 3 100 1.08 2.75 1.28 500 45.4 161 45.7 1000 302 1053 303
As it arises, we approach nr. 3 is more convenient, and its cost is, in fact, the cost of calculating the factorization of LU. Moreover:
- Solving linear systems using LU decomposition is faster than using QR decomposition (see QR decomposition for solving linear systems in CUDA );
- LU decomposition is limited to square linear systems, and QR decomposition helps with non-square linear systems .
Below is the matlab code to check the results
clear all close all clc warning off N = 1000; % --- Setting the problem solution x = ones(N, 1); %%%%%%%%%%%%%%%%%%%%% % NxN HANKEL MATRIX % %%%%%%%%%%%%%%%%%%%%% % --- https://en.wikipedia.org/wiki/Hankel_matrix load A.txt load y.txt A = reshape(A, N, N); yMatlab = A * x; fprintf('Percentage rms between coefficients vectors in Matlab and CUDA %f\n', 100 * sqrt(sum(sum(abs(yMatlab - y).^2)) / sum(sum(abs(yMatlab).^2)))); %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % COMPUTATION OF THE LU DECOMPOSITION % %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% [Lmatlab, Umatlab] = lu(A); load Adecomposed.txt Adecomposed = reshape(Adecomposed, N, N); L = eye(N); for k = 1 : N L(k + 1 : N, k) = Adecomposed(k + 1 : N, k); end U = zeros(N); for k = 1 : N U(k, k : N) = Adecomposed(k, k : N); end load pivotArray.txt Pj = eye(N); for j = 1 : N tempVector = Pj(j, :); Pj(j, :) = Pj(pivotArray(j), :); Pj(pivotArray(j), :) = tempVector; end fprintf('Percentage rms between Pj * A and L * U in CUDA %f\n', 100 * sqrt(sum(sum(abs(Pj * A - L * U).^2)) / sum(sum(abs(Pj * A).^2)))); xprime = inv(Lmatlab) * yMatlab; xMatlab = inv(Umatlab) * xprime; fprintf('Percentage rms between reference solution and solution in Matlab %f\n', 100 * sqrt(sum(sum(abs(xMatlab - x).^2)) / sum(sum(abs(x).^2)))); load xApproach1.txt fprintf('Percentage rms between reference solution and solution in CUDA for approach nr.1 %f\n', 100 * sqrt(sum(sum(abs(xApproach1 - x).^2)) / sum(sum(abs(x).^2)))); load xApproach2.txt fprintf('Percentage rms between reference solution and solution in CUDA for approach nr.2 %f\n', 100 * sqrt(sum(sum(abs(xApproach2 - x).^2)) / sum(sum(abs(x).^2))));