Publications
2025
- | Mpisee: communicator-centric profiling of MPI applications at reposiTUm , opens an external URL in a new windowVardas, I., Träff, J. L., Laso, R., & Hunold, S. (2025). Mpisee: communicator-centric profiling of MPI applications. CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 37(15–17), Article e70158. https://doi.org/10.1002/cpe.70158
- | Communication Round and Computation Efficient Exclusive Prefix-Sums Algorithms (for MPI_Exscan) at reposiTUm , opens an external URL in a new windowTräff, J. L. (2025). Communication Round and Computation Efficient Exclusive Prefix-Sums Algorithms (for MPI_Exscan). arXiv. https://doi.org/10.34726/10821
- | Optimizing Distributed Deep Learning Training by Tuning NCCL at reposiTUm , opens an external URL in a new windowSalimi Beni, M., Laso, R., Cosenza, B., Benkner, S., & Hunold, S. (2025). Optimizing Distributed Deep Learning Training by Tuning NCCL. In ASHPC25 : Austrian-Slovenian HPC Meeting 2025 : Rimske Terme, Slovenia : 19-22 May 2025 (pp. 38–38). https://doi.org/10.34726/10424
- | ncclsee: A Lightweight Profiling Tool for NCCL at reposiTUm , opens an external URL in a new windowVardas, I., Laso Rodriguez, R., & Salimi Beni, M. (2025). ncclsee: A Lightweight Profiling Tool for NCCL. In ASHPC25 : Austrian-Slovenian HPC Meeting 2025 : Rimske Terme, Slovenia : 19-22 May 2025 (pp. 39–39). https://doi.org/10.34726/10426
- | Optimal, Non-pipelined Reduce-scatter and Allreduce Algorithms at reposiTUm , opens an external URL in a new windowTräff, J. L. (2025). Optimal, Non-pipelined Reduce-scatter and Allreduce Algorithms. arXiv. https://doi.org/10.34726/10760
- | Phase-Based Frequency Scaling for Energy-Efficient Heterogeneous Computing at reposiTUm , opens an external URL in a new windowCarpentieri, L., De Caro, A., Salimibeni, M., Fan, K., & Cosenza, B. (2025). Phase-Based Frequency Scaling for Energy-Efficient Heterogeneous Computing. In 2025 IEEE International Parallel and Distributed Processing Symposium (IPDPS) (pp. 824–836). IEEE. https://doi.org/10.1109/IPDPS64566.2025.00078
2024
- | MPI Collective Algorithm Selection in the Presence of Process Arrival Patterns at reposiTUm , opens an external URL in a new windowSalimibeni, M., Cosenza, B., & Hunold, S. (2024). MPI Collective Algorithm Selection in the Presence of Process Arrival Patterns. In Proceedings : 2024 IEEE International Conference on Cluster Computing : 24 – 27 September 2024 Kobe, Japan (pp. 108–119). https://doi.org/10.1109/CLUSTER59578.2024.00017
- | Exploring Mapping Strategies for Co-allocated HPC Applications at reposiTUm , opens an external URL in a new windowVardas, I., Hunold, S., Swartvagher, P., & Träff, J. L. (2024). Exploring Mapping Strategies for Co-allocated HPC Applications. In Demetris Zeinalipour, D. Blanco Heras, G. Pallis, H. Herodotou, D. Trihinas, D. Balouek, P. Diehl, T. Cojean, K. Fürlinger, M. H. Kirkeby, M. Nardelli, & P. Di Sanzo (Eds.), Euro-Par 2023: Parallel Processing Workshops : Euro-Par 2023 International Workshops, Limassol, Cyprus, August 28 – September 1, 2023, Revised Selected Papers, Part II (pp. 271–276). Springer Nature. https://doi.org/10.1007/978-3-031-48803-0_41
- | Lectures on Parallel Computing at reposiTUm , opens an external URL in a new windowTräff, J. L. (2024). Lectures on Parallel Computing. arXiv. https://doi.org/10.34726/10819
- | Optimal Broadcast Schedules in Logarithmic Time with Applications to Broadcast, All-Broadcast, Reduction and All-Reduction at reposiTUm , opens an external URL in a new windowTräff, J. L. (2024). Optimal Broadcast Schedules in Logarithmic Time with Applications to Broadcast, All-Broadcast, Reduction and All-Reduction. arXiv. https://doi.org/10.34726/10820
- | pSTL-Bench: A Micro-Benchmark Suite for Assessing Scalability of C++ Parallel STL Implementations at reposiTUm , opens an external URL in a new windowLaso Rodriguez, R., Krupitza, D., & Hunold, S. (2024). pSTL-Bench: A Micro-Benchmark Suite for Assessing Scalability of C++ Parallel STL Implementations. arXiv. https://doi.org/10.48550/arXiv.2402.06384
- | Benchmarking, Measuring, and Optimizing : 15th BenchCouncil International Symposium, Bench 2023, Revised Selected Papers at reposiTUm , opens an external URL in a new windowHunold, S., Xie, B., & Shu, K. (Eds.). (2024). Benchmarking, Measuring, and Optimizing : 15th BenchCouncil International Symposium, Bench 2023, Revised Selected Papers (Vol. 14521). Springer Singapore. https://doi.org/10.1007/978-981-97-0316-6
- | Exploring Scalability in C++ Parallel STL Implementations at reposiTUm , opens an external URL in a new windowLaso Rodriguez, R., Krupitza, D., & Hunold, S. (2024). Exploring Scalability in C++ Parallel STL Implementations. In ICPP ’24: Proceedings of the 53rd International Conference on Parallel Processing (pp. 284–293). ACM. https://doi.org/10.1145/3673038.3673065
- | Analysis and prediction of performance variability in large-scale computing systems at reposiTUm , opens an external URL in a new windowSalimi Beni, M., Hunold, S., & Cosenza, B. (2024). Analysis and prediction of performance variability in large-scale computing systems. Journal of Supercomputing, 80(10), 14978–15005. https://doi.org/10.1007/s11227-024-06040-w
- | Improved Parallel Application Performance and Makespan by Colocation and Topology-aware Process Mapping at reposiTUm , opens an external URL in a new windowVardas, I., Hunold, S., SWARTVAGHER, P., & Träff, J. L. (2024). Improved Parallel Application Performance and Makespan by Colocation and Topology-aware Process Mapping. In 2024 IEEE 24th International Symposium on Cluster, Cloud and Internet Computing (CCGrid) (pp. 119–124). IEEE. https://doi.org/10.1109/CCGrid59990.2024.00023
2023
- | Round-optimal 𝑛-Block Broadcast Schedules in Logarithmic Time at reposiTUm , opens an external URL in a new windowTräff, J. L. (2023). Round-optimal 𝑛-Block Broadcast Schedules in Logarithmic Time. arXiv. https://doi.org/10.34726/7320
- | Unveiling the Complexities of Performance Analysis and Optimization in HPC Systems at reposiTUm , opens an external URL in a new windowHunold, S. (2023, December 8). Unveiling the Complexities of Performance Analysis and Optimization in HPC Systems. Universität Münster, Münster, Germany.
- | Verifying Performance Guidelines for MPI Collectives at Scale at reposiTUm , opens an external URL in a new windowHunold, S. (2023). Verifying Performance Guidelines for MPI Collectives at Scale. In Proceedings of 2023 SC23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis (SC23 Workshops) (pp. 1264–1268). ACM. https://doi.org/10.1145/3624062.3625532
- | Using Mixed-Radix Decomposition to Enumerate Computational Resources of Deeply Hierarchical Architectures at reposiTUm , opens an external URL in a new windowSwartvagher, P., Hunold, S., Träff, J. L., & Vardas, I. (2023). Using Mixed-Radix Decomposition to Enumerate Computational Resources of Deeply Hierarchical Architectures. In Proceedings of 2023 SC23 Workshops of the International Conference on High Performance Computing, Network, Storage, and Analysis (SC 2023 Workshops) (pp. 405–415). ACM. https://doi.org/10.1145/3624062.3624109
- | The research career after the PhD at reposiTUm , opens an external URL in a new windowLaso Rodriguez, R., & Casado, F. E. (2023, November 3). The research career after the PhD. CiTIUS (USC), Santiago de Compostela, Spain.
- | Library Development with MPI: Attributes, Request Objects, Group Communicator Creation, Local Reductions, and Datatypes at reposiTUm , opens an external URL in a new windowTräff, J. L., & Vardas, I. (2023). Library Development with MPI: Attributes, Request Objects, Group Communicator Creation, Local Reductions, and Datatypes. In Proceedings of the 30th European MPI Users’ Group Meeting (EUROMPI 23). 30th European MPI Users’ Group Meeting (EuroMPI 2023), Bristol, United Kingdom of Great Britain and Northern Ireland (the). ACM. https://doi.org/10.1145/3615318.3615323
- | Synchronizing MPI Processes in Space and Time at reposiTUm , opens an external URL in a new windowSchuchart, J., Hunold, S., & Bosilca, G. (2023). Synchronizing MPI Processes in Space and Time. In EuroMPI “23: Proceedings of the 30th European MPI Users” Group Meeting (pp. 1–11). ACM. https://doi.org/10.1145/3615318.3615325
- | Realizing multioperations and multiprefixes in Thick Control Flow processors at reposiTUm , opens an external URL in a new windowForsell, M., Roivainen, J., Leppänen, V., & Träff, J. L. (2023). Realizing multioperations and multiprefixes in Thick Control Flow processors. Microprocessors and Microsystems, 98, Article 104807. https://doi.org/10.1016/j.micpro.2023.104807
- | Preliminary Performance and Memory Access Scalability Study of Thick Control Flow Processors at reposiTUm , opens an external URL in a new windowForsell, M., Roivainen, J., Leppänen, V., & Träff, J. L. (2023). Preliminary Performance and Memory Access Scalability Study of Thick Control Flow Processors. In J. Nurmi, M. Shen, P. Ellervee, P. Koch, & F. Moradi (Eds.), Proceedings 2023 IEEE Nordic Circuits and Systems Conference (NorCAS) (pp. 1–7). IEEE. https://doi.org/10.1109/NorCAS58970.2023.10305463
- | MPI is Good, Control is Better: Checking Performance Guidelines of Collectives at reposiTUm , opens an external URL in a new windowHunold, S., & Hagn, M. (2023). MPI is Good, Control is Better: Checking Performance Guidelines of Collectives. In E. Reiter (Ed.), Austrian-Slovenian HPC Meeting 2023 - ASHPC23 (pp. 60–60). EuroCC Austria. https://doi.org/10.34726/5367
- | A Quantitative Analysis of OpenMP Task Runtime Systems at reposiTUm , opens an external URL in a new windowHunold, S., & Kraßnitzer, K. D. V. (2023). A Quantitative Analysis of OpenMP Task Runtime Systems. In A. Gainaru, C. Zhang, & C. Luo (Eds.), Benchmarking, Measuring, and Optimizing : 14th BenchCouncil International Symposium, Bench 2022, Virtual Event, November 7-9, 2022, Revised Selected Papers (pp. 3–18). Springer. https://doi.org/10.1007/978-3-031-31180-2_1
- | OMPICollTune: Autotuning MPI Collectives by Incremental Online Learning at reposiTUm , opens an external URL in a new windowHunold, S., & Steiner, S. (2023). OMPICollTune: Autotuning MPI Collectives by Incremental Online Learning. In Proceedings of PMBS 2022: performance modeling, benchmarking and simulation of high performance computer systems (pp. 123–128). IEEE. https://doi.org/10.1109/PMBS56514.2022.00016
- | Massively Scaling Molecular Screening Workloads on EuroHPC Supercomputers at reposiTUm , opens an external URL in a new windowHunold, S., Vardas, I., Ibis, G., & Langer, T. (2023). Massively Scaling Molecular Screening Workloads on EuroHPC Supercomputers. In E. Reiter (Ed.), Austrian-Slovenian HPC Meeting 2023 - ASHPC23 (pp. 51–51). EuroCC Austria. https://doi.org/10.34726/5366
- | Rank Reordering within MPI Communicators to Exploit Deep Hierarchal Architectures of Supercomputers at reposiTUm , opens an external URL in a new windowSwartvagher, P., Vardas, I., Hunold, S., & Träff, J. L. (2023). Rank Reordering within MPI Communicators to Exploit Deep Hierarchal Architectures of Supercomputers. In E. Reiter (Ed.), Austrian-Slovenian HPC Meeting 2023 - ASHPC23 (pp. 61–61). EuroCC Austria. https://doi.org/10.34726/5368
- | Uniform Algorithms for Reduce-scatter and (most) other Collectives for MPI at reposiTUm , opens an external URL in a new windowTräff, J. L., Hunold, S., Vardas, I., & Funk, N. M. (2023). Uniform Algorithms for Reduce-scatter and (most) other Collectives for MPI. In 2023 IEEE International Conference on Cluster Computing (CLUSTER) (pp. 284–294). IEEE. https://doi.org/10.1109/CLUSTER52292.2023.00031
- | Effects of Mapping Strategies on Average Duration and Throughput of Colocated HPC Applications at reposiTUm , opens an external URL in a new windowVardas, I., Hunold, S., Swartvagher, P., & Träff, J. L. (2023). Effects of Mapping Strategies on Average Duration and Throughput of Colocated HPC Applications. In E. Reiter (Ed.), Austrian-Slovenian HPC Meeting 2023 - ASHPC23 (pp. 10–10). EuroCC Austria. https://doi.org/10.34726/5330
2022
- | Brief Announcement: Fast(er) Construction of Round-optimal n-Block Broadcast Schedules at reposiTUm , opens an external URL in a new windowTräff, J. L. (2022). Brief Announcement: Fast(er) Construction of Round-optimal n-Block Broadcast Schedules. In K. Agrawal & I.-T. A. Lee (Eds.), Proceedings of the 34th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA 2022) (pp. 143–146). ACM. https://doi.org/10.1145/3490148.3538560
- | An Overhead Analysis of MPI Profiling and Tracing Tools at reposiTUm , opens an external URL in a new windowHunold, S., Ajanohoun, J. I., Vardas, I., & Träff, J. L. (2022). An Overhead Analysis of MPI Profiling and Tracing Tools. In C. Scully-Allison, R. Liem, & A. V. Solorzano (Eds.), PERMAVOST 2022: Proceedings of the 2nd Workshop on Performance Engineering, Modelling, Analysis, and Visualization Strategy (pp. 5–13). Association for Computing Machinery (ACM). https://doi.org/10.1145/3526063.3535353
- | Scheduling.jl - Collaborative and Reproducible Scheduling Research with Julia at reposiTUm , opens an external URL in a new windowHunold, S., & Przybylski, B. (2022, May 18). Scheduling.jl - Collaborative and Reproducible Scheduling Research with Julia. New Challenges in Scheduling Theory (Centre CNRS “Paul-Langevin”, Aussois, France), Aussois, France.
- | MPI Performance Tools under the Microscope: A Thorough Overhead Analysis at reposiTUm , opens an external URL in a new windowAjanohoun, J. I., Vardas, I., Träff, J. L., & Hunold, S. (2022). MPI Performance Tools under the Microscope: A Thorough Overhead Analysis. In E. Reiter (Ed.), Austrian-Slovenian HPC Meeting 2022 - ASHPC22 (p. 16). EuroCC Austria.
- | Performance and programmability comparison of the thick control flow architecture and current multicore processors at reposiTUm , opens an external URL in a new windowForsell, M., Nikula, S., Roivainen, J., Leppänen, V., & Träff, J. L. (2022). Performance and programmability comparison of the thick control flow architecture and current multicore processors. The Journal of Supercomputing, 78(3), 3152–3183. https://doi.org/10.1007/s11227-021-03985-0
- | Performance Tuning of MPI Collectives - Status Quo and Open Problems at reposiTUm , opens an external URL in a new windowHunold, S. (2022). Performance Tuning of MPI Collectives - Status Quo and Open Problems. CaSToRC HPC National Competence Center Fall Seminar Series 2022, Unknown.
- | (Poly)Logarithmic Time Construction of Round-optimal n-Block Broadcast Schedules for Broadcast and irregular Allgather in MPI at reposiTUm , opens an external URL in a new windowTräff, J. L. (2022). (Poly)Logarithmic Time Construction of Round-optimal n-Block Broadcast Schedules for Broadcast and irregular Allgather in MPI. arXiv. https://doi.org/10.48550/arXiv.2205.10072
- | Fast(er) Construction of Round-optimal n-Block Broadcast Schedules at reposiTUm , opens an external URL in a new windowTräff, J. L. (2022). Fast(er) Construction of Round-optimal n-Block Broadcast Schedules. In Proceedings IEEE International Conference on Cluster Computing (CLUSTER 2022) (pp. 142–151). IEEE. https://doi.org/10.1109/CLUSTER51413.2022.00028
- | mpisee: MPI Profiling for Communication and Communicator Structure at reposiTUm , opens an external URL in a new windowVardas, I., Hunold, S., Ajanohoun, J. I., & Traff, J. L. (2022). mpisee: MPI Profiling for Communication and Communicator Structure. In 2022 IEEE 36th International Parallel and Distributed Processing Symposium Workshops (IPDPSW 2022) (pp. 520–529). IEEE. https://doi.org/10.1109/IPDPSW55747.2022.00092
- | mpisee: MPI Profiling for Communication and Communicator Structure at reposiTUm , opens an external URL in a new windowVardas, I., Hunold, S., Ajanohoun, J. I., & Träff, J. L. (2022). mpisee: MPI Profiling for Communication and Communicator Structure. In E. Reiter (Ed.), Austrian-Slovenian HPC Meeting 2022 - ASHPC22 (p. 15). EuroCC Austria.
2021
- | Teaching Complex Scheduling Algorithms at reposiTUm , opens an external URL in a new windowHunold, S., & Przybylski, B. (2021). Teaching Complex Scheduling Algorithms. In 2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW). 11th NSF/TCPP Workshop on Parallel and Distributed Computing Education (EduPar 2021) in conjunction with 35th IEEE IPDPS 2021 - Online Conference, Portland, Oregon, USA, United States of America (the). IEEE. https://doi.org/10.1109/ipdpsw52791.2021.00058
- | MicroBench Maker: Reproduce, Reuse, Improve at reposiTUm , opens an external URL in a new windowHunold, S., Ajanohoun, J. I., & Carpen-Amarie, A. (2021). MicroBench Maker: Reproduce, Reuse, Improve. In 2021 International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS). 12th IEEE International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS 2021) in conjunction with SC 2021, St. Louis, Missouri, United States of America (the). IEEE. https://doi.org/10.1109/pmbs54543.2021.00013
- | A Doubly-pipelined, Dual-root Reduction-to-all Algorithm and Implementation at reposiTUm , opens an external URL in a new windowTräff, J. L. (2021). A Doubly-pipelined, Dual-root Reduction-to-all Algorithm and Implementation. arXiv. https://doi.org/10.48550/arXiv.2109.12626
- | A more pragmatic implementation of the lock-free, ordered, linked list at reposiTUm , opens an external URL in a new windowTräff, J. L., & Pöter, M. (2021). A more pragmatic implementation of the lock-free, ordered, linked list. In J. Lee & E. Petrank (Eds.), Proceedings of the 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. ACM. https://doi.org/10.1145/3437801.3441579
- | MPI collective communication through a single set of interfaces: A case for orthogonality at reposiTUm , opens an external URL in a new windowTräff, J. L., Hunold, S., Mercier, G., & Holmes, D. J. (2021). MPI collective communication through a single set of interfaces: A case for orthogonality. Parallel Computing: Systems & Applications, 107(102826), 102826. https://doi.org/10.1016/j.parco.2021.102826
2020
- | High-Quality Hierarchical Process Mapping at reposiTUm , opens an external URL in a new windowFaraj, M. F., van der Grinten, A., Meyerhenke, H., Träff, J. L., & Schulz, C. (2020). High-Quality Hierarchical Process Mapping. arXiv. https://doi.org/10.48550/arXiv.2001.07134
- | High-Quality Hierarchical Process Mapping at reposiTUm , opens an external URL in a new windowFaraj, M. F., van der Grinten, A., Meyerhenke, H., Träff, J. L., & Schulz, C. (2020). High-Quality Hierarchical Process Mapping. In S. Faro & D. Cantone (Eds.), 18th International Symposium on Experimental Algorithms, SEA 2020 (pp. 4:1-4:15). Schloss Dagstuhl - Leibniz-Zentrum für Informatik. https://doi.org/10.4230/LIPIcs.SEA.2020.4
- | Optimizing Memory Access in TCF Processors with Compute-Update Operations at reposiTUm , opens an external URL in a new windowForsell, M., Roivainen, J., & Träff, J. L. (2020). Optimizing Memory Access in TCF Processors with Compute-Update Operations. In 2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW). 22nd Workshop on Advances in Parallel and Distributed Computational Models (APDCM 2020) in conjunction with IPDPS 2020 - Online Conference, New Orleans, United States of America (the). IEEE. https://doi.org/10.1109/ipdpsw50202.2020.00100
- | Scheduling.jl - Collaborative and Reproducible Scheduling Research with Julia at reposiTUm , opens an external URL in a new windowHunold, S., & Przybylski, B. (2020). Scheduling.jl - Collaborative and Reproducible Scheduling Research with Julia. arXiv. https://doi.org/10.48550/arXiv.2003.05217
- | Predicting MPI Collective Communication Performance Using Machine Learning at reposiTUm , opens an external URL in a new windowHunold, S., Bhatele, A., Bosilca, G., & Knees, P. (2020). Predicting MPI Collective Communication Performance Using Machine Learning. In 2020 IEEE International Conference on Cluster Computing (CLUSTER). IEEE International Conference on Cluster Computing (IEEE Cluster 2020) - Online Conference, Kobe, Japan. IEEE. https://doi.org/10.1109/cluster49012.2020.00036
- | Efficient Process-to-Node Mapping Algorithms for Stencil Computations at reposiTUm , opens an external URL in a new windowHunold, S., von Kirchbach, K., Lehr, M., Schulz, C., & Träff, J. L. (2020). Efficient Process-to-Node Mapping Algorithms for Stencil Computations. arXiv. https://doi.org/10.48550/arXiv.2005.09521
- | Better Process Mapping and Sparse Quadratic Assignment at reposiTUm , opens an external URL in a new windowKirchbach, K. V., Schulz, C., & Träff, J. L. (2020). Better Process Mapping and Sparse Quadratic Assignment. ACM Journal on Experimental Algorithmics, 25, 1–19. https://doi.org/10.1145/3409667
- | Improved Cartesian Topology Mapping in MPI at reposiTUm , opens an external URL in a new windowLehr, M., & von Kirchbach, K. (2020). Improved Cartesian Topology Mapping in MPI. In A. Schlögl, J. Kiss, & S. Elefante (Eds.), Austrian High-Performance-Computing Meeting (AHPC 2020) (p. 27). IST Austria. https://doi.org/10.15479/AT:ISTA:7474
- | Classical and pipelined preconditioned conjugate gradient methods with node-failure resilience at reposiTUm , opens an external URL in a new windowPachajoa, C., Levonyak, M., Pacher, C., Träff, J. L., & Gansterer, W. (2020). Classical and pipelined preconditioned conjugate gradient methods with node-failure resilience. In A. Schlögl, J. Kiss, & S. Elefante (Eds.), Austrian High-Performance-Computing Meeting (AHPC 2020) (p. 13). IST Austria. https://doi.org/10.15479/AT:ISTA:7474
- | Decomposing MPI Collectives for Exploiting Multi-lane Communication at reposiTUm , opens an external URL in a new windowTräff, J. L. (2020). Decomposing MPI Collectives for Exploiting Multi-lane Communication. SPCL_Bcast, ETH Zürich, Zürich, Switzerland.
- | k-ported vs. k-lane Broadcast, Scatter, and Alltoall Algorithms at reposiTUm , opens an external URL in a new windowTräff, J. L. (2020). k-ported vs. k-lane Broadcast, Scatter, and Alltoall Algorithms. arXiv. https://doi.org/10.48550/arXiv.2008.12144
- | Exploiting Multi-lane Communication in MPI Collectives at reposiTUm , opens an external URL in a new windowTräff, J. L. (2020). Exploiting Multi-lane Communication in MPI Collectives. In A. Schlögl, J. Kiss, & S. Elefante (Eds.), Austrian High-Performance-Computing Meeting (AHPC 2020) (p. 30). IST Austria. https://doi.org/10.15479/AT:ISTA:7474
- | Signature Datatypes for Type Correct Collective Operations, Revisited at reposiTUm , opens an external URL in a new windowTräff, J. L. (2020). Signature Datatypes for Type Correct Collective Operations, Revisited. In 27th European MPI Users’ Group Meeting. 27th European MPI Users’ Group Meeting (EuroMPI/USA 2020) - Online Conference, Austin, United States of America (the). IEEE. https://doi.org/10.1145/3416315.3416324
- | Special issue: Selected papers from EuroMPI 2019 at reposiTUm , opens an external URL in a new windowTräff, J. L., & Hoefler, T. (2020). Special issue: Selected papers from EuroMPI 2019. Parallel Computing, 99, Article 102695. https://doi.org/10.1016/j.parco.2020.102695
- | Decomposing MPI Collectives for Exploiting Multi-lane Communication at reposiTUm , opens an external URL in a new windowTräff, J. L., & Hunold, S. (2020). Decomposing MPI Collectives for Exploiting Multi-lane Communication. In 2020 IEEE International Conference on Cluster Computing (CLUSTER). IEEE International Conference on Cluster Computing (IEEE Cluster 2020) - Online Conference, Kobe, Japan. IEEE. https://doi.org/10.1109/cluster49012.2020.00037
- | A more Pragmatic Implementation of the Lock-free, Ordered, Linked List at reposiTUm , opens an external URL in a new windowTräff, J. L., & Pöter, M. (2020). A more Pragmatic Implementation of the Lock-free, Ordered, Linked List. arXiv. https://doi.org/10.48550/arXiv.2010.15755
- | Collectives and Communicators: A Case for Orthogonality at reposiTUm , opens an external URL in a new windowTräff, J. L., Hunold, S., Mercier, G., & Holmes, D. J. (2020). Collectives and Communicators: A Case for Orthogonality. In 27th European MPI Users’ Group Meeting. 27th European MPI Users’ Group Meeting (EuroMPI/USA 2020) - Online Conference, Austin, United States of America (the). IEEE. https://doi.org/10.1145/3416315.3416319
- | Efficient Process-to-Node Mapping Algorithms for Stencil Computations at reposiTUm , opens an external URL in a new windowvon Kirchbach, K., Lehr, M., Hunold, S., Schulz, C., & Träff, J. L. (2020). Efficient Process-to-Node Mapping Algorithms for Stencil Computations. In 2020 IEEE International Conference on Cluster Computing (CLUSTER). IEEE International Conference on Cluster Computing (IEEE Cluster 2020) - Online Conference, Kobe, Japan. IEEE. https://doi.org/10.1109/cluster49012.2020.00011