
dc.contributor.author: Myllykoski, Mirko
dc.contributor.author: Rossi, Tuomo
dc.contributor.author: Toivanen, Jari
dc.date.accessioned: 2018-02-20T13:07:09Z
dc.date.available: 2020-06-01T21:35:12Z
dc.date.issued: 2018
dc.identifier.citation: Myllykoski, M., Rossi, T., & Toivanen, J. (2018). On solving separable block tridiagonal linear systems using a GPU implementation of radix-4 PSCR method. Journal of Parallel and Distributed Computing, 115(May), 56-66. https://doi.org/10.1016/j.jpdc.2018.01.004
dc.identifier.other: CONVID_27908455
dc.identifier.other: TUTKAID_76845
dc.identifier.uri: https://jyx.jyu.fi/handle/123456789/57129
dc.description.abstract: The partial solution variant of the cyclic reduction (PSCR) method is a direct solver that can be applied to certain types of separable block tridiagonal linear systems. Such linear systems arise, e.g., from the Poisson and the Helmholtz equations discretized with bilinear finite elements. Furthermore, the separability of the linear system entails that the discretization domain has to be rectangular and the discretization mesh orthogonal. A generalized graphics processing unit (GPU) implementation of the PSCR method is presented. The numerical results indicate up to 24-fold speedups when compared to an equivalent CPU implementation that utilizes a single CPU core. The attained floating-point performance is analyzed using the roofline performance analysis model, and the resulting models show that the attained floating-point performance is mainly limited by the off-chip memory bandwidth and the effectiveness of the tridiagonal solver used to solve the arising tridiagonal subproblems. The performance is accelerated using off-line autotuning techniques. [en]
dc.language.iso: eng
dc.publisher: Academic Press
dc.relation.ispartofseries: Journal of Parallel and Distributed Computing
dc.subject.other: reduction
dc.subject.other: Fast direct solver
dc.subject.other: GPU computing
dc.subject.other: partial solution technique
dc.subject.other: PSCR method
dc.subject.other: Roofline model
dc.subject.other: Separable block tridiagonal linear system
dc.title: On solving separable block tridiagonal linear systems using a GPU implementation of radix-4 PSCR method
dc.type: article
dc.identifier.urn: URN:NBN:fi:jyu-201802191534
dc.contributor.laitos: Informaatioteknologian tiedekunta [fi]
dc.contributor.laitos: Faculty of Information Technology [en]
dc.contributor.oppiaine: Tietotekniikka [fi]
dc.contributor.oppiaine: Mathematical Information Technology [en]
dc.type.uri: http://purl.org/eprint/type/JournalArticle
dc.date.updated: 2018-02-19T13:15:05Z
dc.type.coar: http://purl.org/coar/resource_type/c_2df8fbb1
dc.description.reviewstatus: peerReviewed
dc.format.pagerange: 56-66
dc.relation.issn: 0743-7315
dc.relation.numberinseries: May
dc.relation.volume: 115
dc.type.version: acceptedVersion
dc.rights.copyright: © 2018 Elsevier Inc. This is a final draft version of an article whose final and definitive form has been published by Elsevier Inc. Published in this repository with the kind permission of the publisher.
dc.rights.accesslevel: openAccess [fi]
dc.relation.grantnumber: 295897
dc.subject.yso: tietotekniikka (information technology)
dc.subject.yso: lineaariset mallit (linear models)
dc.subject.yso: pienennys (reduction)
jyx.subject.uri: http://www.yso.fi/onto/yso/p5462
jyx.subject.uri: http://www.yso.fi/onto/yso/p25748
jyx.subject.uri: http://www.yso.fi/onto/yso/p25829
dc.relation.doi: 10.1016/j.jpdc.2018.01.004
dc.relation.funder: Suomen Akatemia [fi]
dc.relation.funder: Research Council of Finland [en]
jyx.fundingprogram: Akatemiahanke, SA [fi]
jyx.fundingprogram: Academy Project, AoF [en]
jyx.fundinginformation: The research of the first author was supported by the Academy of Finland [grant number 252549]; the Jyväskylä Doctoral Program in Computing and Mathematical Sciences; and the Foundation of Nokia Corporation (Project number 201510310). The research of the third author was supported by the Academy of Finland [grant numbers 252549, 295897].
dc.type.okm: A1

