dc.contributor.author | Myllykoski, Mirko | |
dc.contributor.author | Rossi, Tuomo | |
dc.contributor.author | Toivanen, Jari | |
dc.date.accessioned | 2018-02-20T13:07:09Z | |
dc.date.available | 2020-06-01T21:35:12Z | |
dc.date.issued | 2018 | |
dc.identifier.citation | Myllykoski, M., Rossi, T., & Toivanen, J. (2018). On solving separable block tridiagonal linear systems using a GPU implementation of radix-4 PSCR method. <i>Journal of Parallel and Distributed Computing</i>, <i>115</i>(May), 56-66. <a href="https://doi.org/10.1016/j.jpdc.2018.01.004" target="_blank">https://doi.org/10.1016/j.jpdc.2018.01.004</a> | |
dc.identifier.other | CONVID_27908455 | |
dc.identifier.other | TUTKAID_76845 | |
dc.identifier.uri | https://jyx.jyu.fi/handle/123456789/57129 | |
dc.description.abstract | The partial solution variant of the cyclic reduction (PSCR) method is a direct solver that can be applied to certain types of separable block tridiagonal linear systems. Such linear systems arise, e.g., from the Poisson and the Helmholtz equations discretized with bilinear finite elements. Furthermore, the separability of the linear system entails that the discretization domain has to be rectangular and the discretization mesh orthogonal. A generalized graphics processing unit (GPU) implementation of the PSCR method is presented. The numerical results indicate up to 24-fold speedups when compared to an equivalent CPU implementation that utilizes a single CPU core. The attained floating point performance is analyzed using the roofline performance analysis model, and the resulting models show that it is mainly limited by the off-chip memory bandwidth and the effectiveness of the tridiagonal solver used to solve the arising tridiagonal subproblems. The performance is accelerated using off-line autotuning techniques. | en |
dc.language.iso | eng | |
dc.publisher | Academic Press | |
dc.relation.ispartofseries | Journal of Parallel and Distributed Computing | |
dc.subject.other | reduction | |
dc.subject.other | Fast direct solver | |
dc.subject.other | GPU computing | |
dc.subject.other | partial solution technique | |
dc.subject.other | PSCR method | |
dc.subject.other | Roofline model | |
dc.subject.other | Separable block tridiagonal linear system | |
dc.title | On solving separable block tridiagonal linear systems using a GPU implementation of radix-4 PSCR method | |
dc.type | article | |
dc.identifier.urn | URN:NBN:fi:jyu-201802191534 | |
dc.contributor.laitos | Informaatioteknologian tiedekunta | fi |
dc.contributor.laitos | Faculty of Information Technology | en |
dc.contributor.oppiaine | Tietotekniikka | fi |
dc.contributor.oppiaine | Mathematical Information Technology | en |
dc.type.uri | http://purl.org/eprint/type/JournalArticle | |
dc.date.updated | 2018-02-19T13:15:05Z | |
dc.type.coar | http://purl.org/coar/resource_type/c_2df8fbb1 | |
dc.description.reviewstatus | peerReviewed | |
dc.format.pagerange | 56-66 | |
dc.relation.issn | 0743-7315 | |
dc.relation.numberinseries | May | |
dc.relation.volume | 115 | |
dc.type.version | acceptedVersion | |
dc.rights.copyright | © 2018 Elsevier Inc. This is a final draft version of an article whose final and definitive form has been published by Elsevier Inc. Published in this repository with the kind permission of the publisher. | |
dc.rights.accesslevel | openAccess | fi |
dc.relation.grantnumber | 295897 | |
dc.subject.yso | tietotekniikka | |
dc.subject.yso | lineaariset mallit | |
dc.subject.yso | pienennys | |
jyx.subject.uri | http://www.yso.fi/onto/yso/p5462 | |
jyx.subject.uri | http://www.yso.fi/onto/yso/p25748 | |
jyx.subject.uri | http://www.yso.fi/onto/yso/p25829 | |
dc.relation.doi | 10.1016/j.jpdc.2018.01.004 | |
dc.relation.funder | Suomen Akatemia | fi |
dc.relation.funder | Research Council of Finland | en |
jyx.fundingprogram | Akatemiahanke, SA | fi |
jyx.fundingprogram | Academy Project, AoF | en |
jyx.fundinginformation | The research of the first author was supported by the Academy of Finland [grant number 252549]; the Jyväskylä Doctoral Program in Computing and Mathematical Sciences; and the Foundation of Nokia Corporation (Project number 201510310). The research of the third author was supported by the Academy of Finland [grant numbers 252549, 295897]. | |
dc.type.okm | A1 | |