PETSc (the Portable, Extensible Toolkit for Scientific Computation) is one of the fundamental general-purpose numerical libraries in high-performance computing. It is widely used to solve problems involving partial differential equations, sparse linear algebra, and related areas. PETSc helps developers build parallel programs quickly, which improves computational efficiency on high-performance systems. This paper first ports PETSc to the SW26010-pro processor. We then choose a representative solver from PETSc's KSP module and apply many-core optimization to the four hotspot functions that this solver invokes, targeting the Sunway heterogeneous architecture. Experimental results show that, after many-core optimization of these core hotspot functions, the maximum speedup on a single node reaches 62.3. This indicates that many-core optimization of hotspot functions in PETSc linear solvers achieves excellent parallel efficiency on the Sunway supercomputer.
Solving large-scale sparse linear systems is a critical problem in scientific and engineering computing. Partial differential equations model problems in many fields, and a range of methods transform them into large-scale linear systems; tridiagonal linear systems, solved in parallel, are one such case. Solving the linear system is very time-consuming in most of these problems, accounting for more than half of the total time. Load balancing reduces the time processes spend waiting and improves computational efficiency, and it is the focus of many algorithms. Building on the recursive doubling algorithm proposed by Stone, this article presents an improved algorithm for solving tridiagonal linear systems that uses the full-recursive-doubling communication model and the Möbius transform. The improved algorithm can solve million-dimensional linear systems. Numerical experiments show that, compared with ordinary parallel algorithms, the improved algorithm is up to 2x faster than the original version, and in some cases up to 3x. In addition, load balancing is greatly improved: the time difference between processes is 1/7 of that of the original version. Because the running time of each process differs little, the improved algorithm avoids process waiting and wasted resources.
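To make the role of recursive doubling and the Möbius transform concrete, the following is a minimal serial sketch in Python; it is not the improved algorithm described above, and it does not model the full-recursive-doubling communication between processes. It assumes the textbook formulation in which the LU pivot recurrence u_i = b_i - a_i c_{i-1} / u_{i-1} is written as a Möbius map stored in a 2x2 matrix, so that all pivots follow from prefix products computed in about log2(n) sweeps. The function names, the tuple layout (p, q, r, s) for the map x -> (p x + q) / (r x + s), and the array convention (a is the sub-diagonal with a[0] unused, c is the super-diagonal with c[n-1] unused) are illustrative assumptions. No pivoting or rescaling is applied, so entries of the matrix products can grow quickly for large systems.

```python
def compose(m, n):
    """Compose two Mobius maps stored as 2x2 matrices (m is applied after n)."""
    (a, b, c, d), (e, f, g, h) = m, n
    return (a * e + b * g, a * f + b * h, c * e + d * g, c * f + d * h)


def apply_mobius(m, x):
    """Evaluate x -> (p*x + q) / (r*x + s) for m = (p, q, r, s)."""
    p, q, r, s = m
    return (p * x + q) / (r * x + s)


def prefix_products(mats):
    """Recursive doubling: out[i] = mats[i] * ... * mats[0] after ~log2(n) sweeps."""
    out = list(mats)
    step = 1
    while step < len(out):
        # Every entry of a sweep depends only on the previous sweep, so in a
        # parallel setting each entry could be updated by a different process.
        out = [compose(out[i], out[i - step]) if i >= step else out[i]
               for i in range(len(out))]
        step *= 2
    return out


def run_recurrence(mats, x0):
    """Return [x0, x1, x2, ...] where x_k = mats[k-1](x_{k-1})."""
    return [x0] + [apply_mobius(p, x0) for p in prefix_products(mats)]


def solve_tridiagonal(a, b, c, d):
    """Solve a tridiagonal system; a = sub-, b = main, c = super-diagonal, d = rhs."""
    n = len(b)
    # Pivots of the LU factorization: u_i = (b_i*u_{i-1} - a_i*c_{i-1}) / u_{i-1}.
    u = run_recurrence([(b[i], -a[i] * c[i - 1], 1.0, 0.0) for i in range(1, n)], b[0])
    # Forward substitution y_i = d_i - (a_i/u_{i-1})*y_{i-1}, an affine (special Mobius) map.
    y = run_recurrence([(-a[i] / u[i - 1], d[i], 0.0, 1.0) for i in range(1, n)], d[0])
    # Back substitution x_i = (y_i - c_i*x_{i+1}) / u_i, scanned from the last row upward.
    x_rev = run_recurrence(
        [(-c[i] / u[i], y[i] / u[i], 0.0, 1.0) for i in range(n - 2, -1, -1)],
        y[n - 1] / u[n - 1],
    )
    return x_rev[::-1]
```

For example, `solve_tridiagonal([0.0, 1.0, 1.0, 1.0], [4.0] * 4, [1.0, 1.0, 1.0, 0.0], [6.0, 12.0, 18.0, 19.0])` solves a 4x4 system whose exact solution is 1, 2, 3, 4. The list comprehension inside `prefix_products` only emulates one parallel sweep; how the partial products are distributed and exchanged between processes is where load balancing becomes an issue.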