Publications
2025
- Vardas, I., Träff, J. L., Laso, R., & Hunold, S. (2025). Mpisee: communicator-centric profiling of MPI applications. Concurrency and Computation: Practice and Experience, 37(15–17), Article e70158. https://doi.org/10.1002/cpe.70158
- Träff, J. L. (2025). Communication Round and Computation Efficient Exclusive Prefix-Sums Algorithms (for MPI_Exscan). arXiv. https://doi.org/10.34726/10821
- Salimi Beni, M., Laso, R., Cosenza, B., Benkner, S., & Hunold, S. (2025). Optimizing Distributed Deep Learning Training by Tuning NCCL. In ASHPC25 : Austrian-Slovenian HPC Meeting 2025 : Rimske Terme, Slovenia : 19-22 May 2025 (pp. 38–38). https://doi.org/10.34726/10424
- Vardas, I., Laso Rodriguez, R., & Salimi Beni, M. (2025). ncclsee: A Lightweight Profiling Tool for NCCL. In ASHPC25 : Austrian-Slovenian HPC Meeting 2025 : Rimske Terme, Slovenia : 19-22 May 2025 (pp. 39–39). https://doi.org/10.34726/10426
- Träff, J. L. (2025). Optimal, Non-pipelined Reduce-scatter and Allreduce Algorithms. arXiv. https://doi.org/10.34726/10760
- Carpentieri, L., De Caro, A., Salimibeni, M., Fan, K., & Cosenza, B. (2025). Phase-Based Frequency Scaling for Energy-Efficient Heterogeneous Computing. In 2025 IEEE International Parallel and Distributed Processing Symposium (IPDPS) (pp. 824–836). IEEE. https://doi.org/10.1109/IPDPS64566.2025.00078
2024
- Salimibeni, M., Cosenza, B., & Hunold, S. (2024). MPI Collective Algorithm Selection in the Presence of Process Arrival Patterns. In Proceedings : 2024 IEEE International Conference on Cluster Computing : 24 – 27 September 2024 Kobe, Japan (pp. 108–119). https://doi.org/10.1109/CLUSTER59578.2024.00017
- Vardas, I., Hunold, S., Swartvagher, P., & Träff, J. L. (2024). Exploring Mapping Strategies for Co-allocated HPC Applications. In Demetris Zeinalipour, D. Blanco Heras, G. Pallis, H. Herodotou, D. Trihinas, D. Balouek, P. Diehl, T. Cojean, K. Fürlinger, M. H. Kirkeby, M. Nardelli, & P. Di Sanzo (Eds.), Euro-Par 2023: Parallel Processing Workshops : Euro-Par 2023 International Workshops, Limassol, Cyprus, August 28 – September 1, 2023, Revised Selected Papers, Part II (pp. 271–276). Springer Nature. https://doi.org/10.1007/978-3-031-48803-0_41
- Träff, J. L. (2024). Lectures on Parallel Computing. arXiv. https://doi.org/10.34726/10819
- Träff, J. L. (2024). Optimal Broadcast Schedules in Logarithmic Time with Applications to Broadcast, All-Broadcast, Reduction and All-Reduction. arXiv. https://doi.org/10.34726/10820
- Laso Rodriguez, R., Krupitza, D., & Hunold, S. (2024). pSTL-Bench: A Micro-Benchmark Suite for Assessing Scalability of C++ Parallel STL Implementations. arXiv. https://doi.org/10.48550/arXiv.2402.06384
- Hunold, S., Xie, B., & Shu, K. (Eds.). (2024). Benchmarking, Measuring, and Optimizing : 15th BenchCouncil International Symposium, Bench 2023, Revised Selected Papers (Vol. 14521). Springer Singapore. https://doi.org/10.1007/978-981-97-0316-6
- Laso Rodriguez, R., Krupitza, D., & Hunold, S. (2024). Exploring Scalability in C++ Parallel STL Implementations. In ICPP ’24: Proceedings of the 53rd International Conference on Parallel Processing (pp. 284–293). ACM. https://doi.org/10.1145/3673038.3673065
- Salimi Beni, M., Hunold, S., & Cosenza, B. (2024). Analysis and prediction of performance variability in large-scale computing systems. Journal of Supercomputing, 80(10), 14978–15005. https://doi.org/10.1007/s11227-024-06040-w
- Vardas, I., Hunold, S., Swartvagher, P., & Träff, J. L. (2024). Improved Parallel Application Performance and Makespan by Colocation and Topology-aware Process Mapping. In 2024 IEEE 24th International Symposium on Cluster, Cloud and Internet Computing (CCGrid) (pp. 119–124). IEEE. https://doi.org/10.1109/CCGrid59990.2024.00023
2023
- Träff, J. L. (2023). Round-optimal 𝑛-Block Broadcast Schedules in Logarithmic Time. arXiv. https://doi.org/10.34726/7320
- Hunold, S. (2023, December 8). Unveiling the Complexities of Performance Analysis and Optimization in HPC Systems. Universität Münster, Münster, Germany.
- Hunold, S. (2023). Verifying Performance Guidelines for MPI Collectives at Scale. In Proceedings of 2023 SC23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis (SC23 Workshops) (pp. 1264–1268). ACM. https://doi.org/10.1145/3624062.3625532
- Swartvagher, P., Hunold, S., Träff, J. L., & Vardas, I. (2023). Using Mixed-Radix Decomposition to Enumerate Computational Resources of Deeply Hierarchical Architectures. In Proceedings of 2023 SC23 Workshops of the International Conference on High Performance Computing, Network, Storage, and Analysis (SC 2023 Workshops) (pp. 405–415). ACM. https://doi.org/10.1145/3624062.3624109
- Laso Rodriguez, R., & Casado, F. E. (2023, November 3). The research career after the PhD. CiTIUS (USC), Santiago de Compostela, Spain.
- Träff, J. L., & Vardas, I. (2023). Library Development with MPI: Attributes, Request Objects, Group Communicator Creation, Local Reductions, and Datatypes. In Proceedings of the 30th European MPI Users’ Group Meeting (EUROMPI 23). 30th European MPI Users’ Group Meeting (EuroMPI 2023), Bristol, United Kingdom. ACM. https://doi.org/10.1145/3615318.3615323
- Schuchart, J., Hunold, S., & Bosilca, G. (2023). Synchronizing MPI Processes in Space and Time. In EuroMPI ’23: Proceedings of the 30th European MPI Users’ Group Meeting (pp. 1–11). ACM. https://doi.org/10.1145/3615318.3615325
- Forsell, M., Roivainen, J., Leppänen, V., & Träff, J. L. (2023). Realizing multioperations and multiprefixes in Thick Control Flow processors. Microprocessors and Microsystems, 98, Article 104807. https://doi.org/10.1016/j.micpro.2023.104807
- Forsell, M., Roivainen, J., Leppänen, V., & Träff, J. L. (2023). Preliminary Performance and Memory Access Scalability Study of Thick Control Flow Processors. In J. Nurmi, M. Shen, P. Ellervee, P. Koch, & F. Moradi (Eds.), Proceedings 2023 IEEE Nordic Circuits and Systems Conference (NorCAS) (pp. 1–7). IEEE. https://doi.org/10.1109/NorCAS58970.2023.10305463
- Hunold, S., & Hagn, M. (2023). MPI is Good, Control is Better: Checking Performance Guidelines of Collectives. In E. Reiter (Ed.), Austrian-Slovenian HPC Meeting 2023 - ASHPC23 (pp. 60–60). EuroCC Austria. https://doi.org/10.34726/5367
- Hunold, S., & Kraßnitzer, K. D. V. (2023). A Quantitative Analysis of OpenMP Task Runtime Systems. In A. Gainaru, C. Zhang, & C. Luo (Eds.), Benchmarking, Measuring, and Optimizing : 14th BenchCouncil International Symposium, Bench 2022, Virtual Event, November 7-9, 2022, Revised Selected Papers (pp. 3–18). Springer. https://doi.org/10.1007/978-3-031-31180-2_1
- Hunold, S., & Steiner, S. (2023). OMPICollTune: Autotuning MPI Collectives by Incremental Online Learning. In Proceedings of PMBS 2022: performance modeling, benchmarking and simulation of high performance computer systems (pp. 123–128). IEEE. https://doi.org/10.1109/PMBS56514.2022.00016
- Hunold, S., Vardas, I., Ibis, G., & Langer, T. (2023). Massively Scaling Molecular Screening Workloads on EuroHPC Supercomputers. In E. Reiter (Ed.), Austrian-Slovenian HPC Meeting 2023 - ASHPC23 (pp. 51–51). EuroCC Austria. https://doi.org/10.34726/5366
- Swartvagher, P., Vardas, I., Hunold, S., & Träff, J. L. (2023). Rank Reordering within MPI Communicators to Exploit Deep Hierarchal Architectures of Supercomputers. In E. Reiter (Ed.), Austrian-Slovenian HPC Meeting 2023 - ASHPC23 (pp. 61–61). EuroCC Austria. https://doi.org/10.34726/5368
- Träff, J. L., Hunold, S., Vardas, I., & Funk, N. M. (2023). Uniform Algorithms for Reduce-scatter and (most) other Collectives for MPI. In 2023 IEEE International Conference on Cluster Computing (CLUSTER) (pp. 284–294). IEEE. https://doi.org/10.1109/CLUSTER52292.2023.00031
- Vardas, I., Hunold, S., Swartvagher, P., & Träff, J. L. (2023). Effects of Mapping Strategies on Average Duration and Throughput of Colocated HPC Applications. In E. Reiter (Ed.), Austrian-Slovenian HPC Meeting 2023 - ASHPC23 (pp. 10–10). EuroCC Austria. https://doi.org/10.34726/5330
2022
- Träff, J. L. (2022). Brief Announcement: Fast(er) Construction of Round-optimal n-Block Broadcast Schedules. In K. Agrawal & I.-T. A. Lee (Eds.), Proceedings of the 34th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA 2022) (pp. 143–146). ACM. https://doi.org/10.1145/3490148.3538560
- Hunold, S., Ajanohoun, J. I., Vardas, I., & Träff, J. L. (2022). An Overhead Analysis of MPI Profiling and Tracing Tools. In C. Scully-Allison, R. Liem, & A. V. Solorzano (Eds.), PERMAVOST 2022: Proceedings of the 2nd Workshop on Performance Engineering, Modelling, Analysis, and Visualization Strategy (pp. 5–13). Association for Computing Machinery (ACM). https://doi.org/10.1145/3526063.3535353
- Hunold, S., & Przybylski, B. (2022, May 18). Scheduling.jl - Collaborative and Reproducible Scheduling Research with Julia. New Challenges in Scheduling Theory (Centre CNRS “Paul-Langevin”, Aussois, France), Aussois, France.
- Ajanohoun, J. I., Vardas, I., Träff, J. L., & Hunold, S. (2022). MPI Performance Tools under the Microscope: A Thorough Overhead Analysis. In E. Reiter (Ed.), Austrian-Slovenian HPC Meeting 2022 - ASHPC22 (p. 16). EuroCC Austria.
- Forsell, M., Nikula, S., Roivainen, J., Leppänen, V., & Träff, J. L. (2022). Performance and programmability comparison of the thick control flow architecture and current multicore processors. The Journal of Supercomputing, 78(3), 3152–3183. https://doi.org/10.1007/s11227-021-03985-0
- Hunold, S. (2022). Performance Tuning of MPI Collectives - Status Quo and Open Problems. CaSToRC HPC National Competence Center Fall Seminar Series 2022, Unknown.
- Träff, J. L. (2022). (Poly)Logarithmic Time Construction of Round-optimal n-Block Broadcast Schedules for Broadcast and irregular Allgather in MPI. arXiv. https://doi.org/10.48550/arXiv.2205.10072
- Träff, J. L. (2022). Fast(er) Construction of Round-optimal n-Block Broadcast Schedules. In Proceedings IEEE International Conference on Cluster Computing (CLUSTER 2022) (pp. 142–151). IEEE. https://doi.org/10.1109/CLUSTER51413.2022.00028
- Vardas, I., Hunold, S., Ajanohoun, J. I., & Träff, J. L. (2022). mpisee: MPI Profiling for Communication and Communicator Structure. In 2022 IEEE 36th International Parallel and Distributed Processing Symposium Workshops (IPDPSW 2022) (pp. 520–529). IEEE. https://doi.org/10.1109/IPDPSW55747.2022.00092
- Vardas, I., Hunold, S., Ajanohoun, J. I., & Träff, J. L. (2022). mpisee: MPI Profiling for Communication and Communicator Structure. In E. Reiter (Ed.), Austrian-Slovenian HPC Meeting 2022 - ASHPC22 (p. 15). EuroCC Austria.
2021
- Hunold, S., & Przybylski, B. (2021). Teaching Complex Scheduling Algorithms. In 2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW). 11th NSF/TCPP Workshop on Parallel and Distributed Computing Education (EduPar 2021) in conjunction with 35th IEEE IPDPS 2021 - Online Conference, Portland, Oregon, USA. IEEE. https://doi.org/10.1109/ipdpsw52791.2021.00058
- Hunold, S., Ajanohoun, J. I., & Carpen-Amarie, A. (2021). MicroBench Maker: Reproduce, Reuse, Improve. In 2021 International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS). 12th IEEE International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS 2021) in conjunction with SC 2021, St. Louis, Missouri, United States of America. IEEE. https://doi.org/10.1109/pmbs54543.2021.00013
- Träff, J. L. (2021). A Doubly-pipelined, Dual-root Reduction-to-all Algorithm and Implementation. arXiv. https://doi.org/10.48550/arXiv.2109.12626
- Träff, J. L., & Pöter, M. (2021). A more pragmatic implementation of the lock-free, ordered, linked list. In J. Lee & E. Petrank (Eds.), Proceedings of the 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. ACM. https://doi.org/10.1145/3437801.3441579
- Träff, J. L., Hunold, S., Mercier, G., & Holmes, D. J. (2021). MPI collective communication through a single set of interfaces: A case for orthogonality. Parallel Computing: Systems & Applications, 107(102826), 102826. https://doi.org/10.1016/j.parco.2021.102826
2020
- Faraj, M. F., van der Grinten, A., Meyerhenke, H., Träff, J. L., & Schulz, C. (2020). High-Quality Hierarchical Process Mapping. arXiv. https://doi.org/10.48550/arXiv.2001.07134
- Faraj, M. F., van der Grinten, A., Meyerhenke, H., Träff, J. L., & Schulz, C. (2020). High-Quality Hierarchical Process Mapping. In S. Faro & D. Cantone (Eds.), 18th International Symposium on Experimental Algorithms, SEA 2020 (pp. 4:1-4:15). Schloss Dagstuhl - Leibniz-Zentrum für Informatik. https://doi.org/10.4230/LIPIcs.SEA.2020.4
- Forsell, M., Roivainen, J., & Träff, J. L. (2020). Optimizing Memory Access in TCF Processors with Compute-Update Operations. In 2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW). 22nd Workshop on Advances in Parallel and Distributed Computational Models (APDCM 2020) in conjunction with IPDPS 2020 - Online Conference, New Orleans, United States of America. IEEE. https://doi.org/10.1109/ipdpsw50202.2020.00100
- Hunold, S., & Przybylski, B. (2020). Scheduling.jl - Collaborative and Reproducible Scheduling Research with Julia. arXiv. https://doi.org/10.48550/arXiv.2003.05217
- Hunold, S., Bhatele, A., Bosilca, G., & Knees, P. (2020). Predicting MPI Collective Communication Performance Using Machine Learning. In 2020 IEEE International Conference on Cluster Computing (CLUSTER). IEEE International Conference on Cluster Computing (IEEE Cluster 2020) - Online Conference, Kobe, Japan. IEEE. https://doi.org/10.1109/cluster49012.2020.00036
- Hunold, S., von Kirchbach, K., Lehr, M., Schulz, C., & Träff, J. L. (2020). Efficient Process-to-Node Mapping Algorithms for Stencil Computations. arXiv. https://doi.org/10.48550/arXiv.2005.09521
- Kirchbach, K. V., Schulz, C., & Träff, J. L. (2020). Better Process Mapping and Sparse Quadratic Assignment. ACM Journal on Experimental Algorithmics, 25, 1–19. https://doi.org/10.1145/3409667
- Lehr, M., & von Kirchbach, K. (2020). Improved Cartesian Topology Mapping in MPI. In A. Schlögl, J. Kiss, & S. Elefante (Eds.), Austrian High-Performance-Computing Meeting (AHPC 2020) (p. 27). IST Austria. https://doi.org/10.15479/AT:ISTA:7474
- Pachajoa, C., Levonyak, M., Pacher, C., Träff, J. L., & Gansterer, W. (2020). Classical and pipelined preconditioned conjugate gradient methods with node-failure resilience. In A. Schlögl, J. Kiss, & S. Elefante (Eds.), Austrian High-Performance-Computing Meeting (AHPC 2020) (p. 13). IST Austria. https://doi.org/10.15479/AT:ISTA:7474
- Träff, J. L. (2020). Decomposing MPI Collectives for Exploiting Multi-lane Communication. SPCL_Bcast, ETH Zürich, Zürich, Switzerland.
- Träff, J. L. (2020). k-ported vs. k-lane Broadcast, Scatter, and Alltoall Algorithms. arXiv. https://doi.org/10.48550/arXiv.2008.12144
- Träff, J. L. (2020). Exploiting Multi-lane Communication in MPI Collectives. In A. Schlögl, J. Kiss, & S. Elefante (Eds.), Austrian High-Performance-Computing Meeting (AHPC 2020) (p. 30). IST Austria. https://doi.org/10.15479/AT:ISTA:7474
- Träff, J. L. (2020). Signature Datatypes for Type Correct Collective Operations, Revisited. In 27th European MPI Users’ Group Meeting. 27th European MPI Users’ Group Meeting (EuroMPI/USA 2020) - Online Conference, Austin, United States of America. IEEE. https://doi.org/10.1145/3416315.3416324
- Träff, J. L., & Hoefler, T. (2020). Special issue: Selected papers from EuroMPI 2019. Parallel Computing, 99, Article 102695. https://doi.org/10.1016/j.parco.2020.102695
- Träff, J. L., & Hunold, S. (2020). Decomposing MPI Collectives for Exploiting Multi-lane Communication. In 2020 IEEE International Conference on Cluster Computing (CLUSTER). IEEE International Conference on Cluster Computing (IEEE Cluster 2020) - Online Conference, Kobe, Japan. IEEE. https://doi.org/10.1109/cluster49012.2020.00037
- Träff, J. L., & Pöter, M. (2020). A more Pragmatic Implementation of the Lock-free, Ordered, Linked List. arXiv. https://doi.org/10.48550/arXiv.2010.15755
- Träff, J. L., Hunold, S., Mercier, G., & Holmes, D. J. (2020). Collectives and Communicators: A Case for Orthogonality. In 27th European MPI Users’ Group Meeting. 27th European MPI Users’ Group Meeting (EuroMPI/USA 2020) - Online Conference, Austin, United States of America. IEEE. https://doi.org/10.1145/3416315.3416319
- von Kirchbach, K., Lehr, M., Hunold, S., Schulz, C., & Träff, J. L. (2020). Efficient Process-to-Node Mapping Algorithms for Stencil Computations. In 2020 IEEE International Conference on Cluster Computing (CLUSTER). IEEE International Conference on Cluster Computing (IEEE Cluster 2020) - Online Conference, Kobe, Japan. IEEE. https://doi.org/10.1109/cluster49012.2020.00011