On solving separable block tridiagonal linear systems using a GPU implementation of radix-4 PSCR method

Myllykoski, Mirko; Rossi, Tuomo; Toivanen, Jari

doi:10.1016/j.jpdc.2018.01.004

dc.contributor.author	Myllykoski, Mirko
dc.contributor.author	Rossi, Tuomo
dc.contributor.author	Toivanen, Jari
dc.date.accessioned	2018-02-20T13:07:09Z
dc.date.available	2020-06-01T21:35:12Z
dc.date.issued	2018
dc.identifier.citation	Myllykoski, M., Rossi, T., & Toivanen, J. (2018). On solving separable block tridiagonal linear systems using a GPU implementation of radix-4 PSCR method. Journal of Parallel and Distributed Computing, 115(May), 56-66. https://doi.org/10.1016/j.jpdc.2018.01.004
dc.identifier.other	CONVID_27908455
dc.identifier.other	TUTKAID_76845
dc.identifier.uri	https://jyx.jyu.fi/handle/123456789/57129
dc.description.abstract	The partial solution variant of the cyclic reduction (PSCR) method is a direct solver that can be applied to certain types of separable block tridiagonal linear systems. Such linear systems arise, e.g., from the Poisson and Helmholtz equations discretized with bilinear finite elements. Furthermore, the separability of the linear system entails that the discretization domain has to be rectangular and the discretization mesh orthogonal. A generalized graphics processing unit (GPU) implementation of the PSCR method is presented. The numerical results indicate up to 24-fold speedups when compared to an equivalent CPU implementation that utilizes a single CPU core. The attained floating point performance is analyzed using the roofline performance analysis model, and the resulting models show that it is mainly limited by the off-chip memory bandwidth and the effectiveness of the tridiagonal solver used for the arising tridiagonal subproblems. Off-line autotuning techniques are used to improve the performance.	en
dc.language.iso	eng
dc.publisher	Academic Press
dc.relation.ispartofseries	Journal of Parallel and Distributed Computing
dc.subject.other	reduction
dc.subject.other	Fast direct solver
dc.subject.other	GPU computing
dc.subject.other	partial solution technique
dc.subject.other	PSCR method
dc.subject.other	Roofline model
dc.subject.other	Separable block tridiagonal linear system
dc.title	On solving separable block tridiagonal linear systems using a GPU implementation of radix-4 PSCR method
dc.type	article
dc.identifier.urn	URN:NBN:fi:jyu-201802191534
dc.contributor.laitos	Informaatioteknologian tiedekunta	fi
dc.contributor.laitos	Faculty of Information Technology	en
dc.contributor.oppiaine	Tietotekniikka	fi
dc.contributor.oppiaine	Mathematical Information Technology	en
dc.type.uri	http://purl.org/eprint/type/JournalArticle
dc.date.updated	2018-02-19T13:15:05Z
dc.type.coar	http://purl.org/coar/resource_type/c_2df8fbb1
dc.description.reviewstatus	peerReviewed
dc.format.pagerange	56-66
dc.relation.issn	0743-7315
dc.relation.numberinseries	May
dc.relation.volume	115
dc.type.version	acceptedVersion
dc.rights.copyright	© 2018 Elsevier Inc. This is a final draft version of an article whose final and definitive form has been published by Elsevier Inc. Published in this repository with the kind permission of the publisher.
dc.rights.accesslevel	openAccess	fi
dc.relation.grantnumber	295897
dc.subject.yso	information technology
dc.subject.yso	linear models
dc.subject.yso	reduction
jyx.subject.uri	http://www.yso.fi/onto/yso/p5462
jyx.subject.uri	http://www.yso.fi/onto/yso/p25748
jyx.subject.uri	http://www.yso.fi/onto/yso/p25829
dc.relation.doi	10.1016/j.jpdc.2018.01.004
dc.relation.funder	Suomen Akatemia	fi
dc.relation.funder	Research Council of Finland	en
jyx.fundingprogram	Akatemiahanke, SA	fi
jyx.fundingprogram	Academy Project, AoF	en
jyx.fundinginformation	The research of the first author was supported by the Academy of Finland [grant number 252549]; the Jyväskylä Doctoral Program in Computing and Mathematical Sciences; and the Foundation of Nokia Corporation (project number 201510310). The research of the third author was supported by the Academy of Finland [grant numbers 252549, 295897].
dc.type.okm	A1
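
The abstract analyzes the attained floating point performance with the roofline model, which bounds attainable performance by min(peak floating point performance, arithmetic intensity × off-chip memory bandwidth). The minimal Python sketch below evaluates that bound. The function names and the device figures `PEAK` and `BW` are hypothetical illustrations chosen for this example, not values or code from the paper.

```python
"""Minimal roofline-model sketch (hypothetical figures, not from the paper)."""


def ridge_point(peak_gflops: float, bandwidth_gbs: float) -> float:
    """Arithmetic intensity (flop/byte) where the memory and compute roofs meet."""
    return peak_gflops / bandwidth_gbs


def attainable_gflops(intensity: float, peak_gflops: float, bandwidth_gbs: float) -> float:
    """Roofline bound: min(peak compute, intensity * off-chip bandwidth).

    intensity     -- flops performed per byte moved to/from off-chip memory
    peak_gflops   -- peak floating point throughput of the device, GFlop/s
    bandwidth_gbs -- sustained off-chip memory bandwidth, GB/s
    """
    if intensity <= 0:
        raise ValueError("arithmetic intensity must be positive")
    return min(peak_gflops, intensity * bandwidth_gbs)


def is_memory_bound(intensity: float, peak_gflops: float, bandwidth_gbs: float) -> bool:
    """True when the kernel sits on the sloped (bandwidth-limited) part of the roof."""
    return intensity < ridge_point(peak_gflops, bandwidth_gbs)


if __name__ == "__main__":
    # Hypothetical device figures chosen for illustration only.
    PEAK = 1500.0  # GFlop/s
    BW = 300.0     # GB/s
    for ai in (0.25, 1.0, 5.0, 20.0):
        bound = attainable_gflops(ai, PEAK, BW)
        regime = "memory" if is_memory_bound(ai, PEAK, BW) else "compute"
        print(f"intensity {ai:5.2f} flop/B -> at most {bound:7.1f} GFlop/s ({regime} bound)")
```

With these assumed figures the ridge point is 5 flop/byte: any kernel with a lower arithmetic intensity is capped by memory bandwidth rather than by compute, which corresponds to the bandwidth-limited behavior the abstract reports.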

Files in this item

Name:: myllykoskirossitoivanenonsolvi ...
Size:: 456.0 KB
Format:: PDF
Description:: Final Draft

This item appears in the following collection(s)

Faculty of Information Technology [2136]

Similar items

Implementation techniques for the lattice Boltzmann method

Lyapunov quantities and limit cycles in two-dimensional dynamical systems : analytical methods, symbolic computation and visualization

On GPU-accelerated fast direct solvers and their applications in image denoising

Stability and oscillation of dynamical systems : theory and applications

Efficient Bayesian generalized linear models with time-varying coefficients : The walker package in R