This page lists some of the research papers associated with the ROSE project over the last several years. We feel that the latest papers are the best ones, which is likely typical of any ambitious project, but we have included everything for completeness. We hope each paper makes clear the underlying goal of supporting high-level abstractions, together with our attempts to address the performance issues that their use raises in scientific computing.


  • Chunhua Liao and Yonghong Yan and Bronis R. de Supinski and Daniel J. Quinlan and Barbara Chapman, Early Experiences With The OpenMP Accelerator Model, 9th International Workshop on OpenMP (IWOMP), Canberra, Australia, 2013 LLNL-CONF-636479  pdf

In this paper, we examine the newly released accelerator directives and create an initial reference implementation, referred to as HOMP (Heterogeneous OpenMP). Focused on targeting NVIDIA GPUs, our work is based on an existing OpenMP implementation in the ROSE source-to-source compiler infrastructure. HOMP includes extensions to parse the new constructs and represent them in the AST, along with other compiler translation details. We also provide initial runtime support. For our evaluation, we have adapted a few existing OpenMP codes to use the accelerator model directives and present preliminary performance results. Finally, we critique the accelerator model in terms of its impact on developers and compiler writers and suggest possible improvements.
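For readers unfamiliar with the accelerator directives, the sketch below shows the kind of code the model targets. It is not taken from the paper: the function name `axpy_target` is hypothetical, and only the standard OpenMP 4.0 `target` and `map` syntax is assumed. Without an offloading compiler the pragmas are ignored or fall back to the host, and the loop still computes the same result.

```cpp
#include <cassert>

// Hypothetical example kernel (not from the paper): y = a*x + y.
// "target" asks the compiler to offload the region to an accelerator;
// "map" describes which array sections are copied to and from the device.
void axpy_target(double a, const double* x, double* y, int n) {
    assert(x != nullptr && y != nullptr && n >= 0);
    #pragma omp target map(to: x[0:n]) map(tofrom: y[0:n])
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

A source-to-source implementation has to parse these clauses, represent them in the AST, and generate both the device kernel and the data-transfer calls into a runtime library.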


  • Hongyi Ma, Qichang Chen, Liqiang Wang, Chunhua Liao, and Daniel Quinlan, “OpenMP-Checker: Detecting Concurrency Errors of OpenMP Programs Using Hybrid Program Analysis”, Poster paper accepted by ICPP’12, The 41st International Conference on Parallel Processing, Pittsburgh, PA, September 10-13, 2012. LLNL-CONF-543377

This paper presents a novel technique to detect data races and deadlocks in OpenMP programs using hybrid program analysis. Specifically, we use an SMT-solver-based static analysis to analyze OpenMP source code. Then we use a dynamic analysis to confirm, or rule out, the potential errors. The static analysis narrows down the code regions and events that need to be monitored, significantly reducing the overhead of the dynamic analysis. Our experiments show that, on a set of chosen benchmarks, OpenMP-Checker is more scalable and more accurate at pinpointing concurrency errors than two commercial tools, Sun Thread Analyzer and Intel Thread Checker.

  • Jacob Lidman, Daniel J. Quinlan, Chunhua Liao, Sally A. McKee, “ROSE::FTTransform – A Source-to-Source Translation Framework for Exascale Fault-Tolerance Research”, submitted to Fault-Tolerance for HPC at Extreme Scale (FTXS 2012), Boston, June 25-28, 2012. LLNL-CONF-541631

This paper presents a compiler-based transformation released in ROSE and demonstrates the use of Triple Modular Redundancy as an approach to provide HPC software with fault tolerance against transient faults, as we expect them to manifest themselves on future Exascale architectures. The paper presents performance results showing that for a randomly selected subset of benchmarks the overhead of this extra layer of support is about 20%. We expect that this may be competitive with future approaches to fault tolerance using checkpoint-restart, which may be much more expensive or even intractable for Exascale. This work is released as a framework within ROSE to support research work in this area by ourselves and collaborators.
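As a rough illustration of the idea behind Triple Modular Redundancy, the sketch below runs a computation three times and takes a majority vote. It is a minimal hand-written example, not the transformation generated by ROSE::FTTransform; the name `tmr_run` and the choice to throw when no two replicas agree are assumptions made for illustration.

```cpp
#include <stdexcept>

// Hypothetical helper (not the ROSE::FTTransform output): execute the same
// computation three times and return the value that at least two runs agree on.
template <typename F>
auto tmr_run(F compute) -> decltype(compute()) {
    auto r1 = compute();
    auto r2 = compute();
    auto r3 = compute();
    if (r1 == r2 || r1 == r3) return r1;  // r1 is part of a majority
    if (r2 == r3) return r2;              // r1 was the faulty replica
    // All three results differ: the fault cannot be masked by voting.
    throw std::runtime_error("TMR: no two replicas agree");
}
```

A single transient fault that corrupts one replica is masked by the vote. In practice a compiler-based transformation also has to keep the optimizer from merging the redundant computations, which this sketch does not address.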

  • Sara Royuela, Alejandro Duran, Chunhua Liao, Daniel J. Quinlan, “Auto-scoping for OpenMP tasks”, accepted by the 8th International Workshop on OpenMP, IWOMP 2012, Rome, June 11-13, 2012. LLNL-CONF-534493

This paper presents an auto-scoping algorithm that works with OpenMP tasks. (Auto-scoping is the process of automatically determining the data-sharing attributes of variables in OpenMP programs.) Tasks make this a much more complex challenge because of the uncertainty about when a task will be executed, which makes it harder to determine which parts of the program will run concurrently. We also introduce an implementation of the algorithm and present results for several benchmarks showing that the algorithm correctly scopes a large percentage of the variables appearing in them.
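To make the problem concrete, the sketch below shows a task-based routine with the data-sharing clauses written out by hand. These clauses are the kind of information an auto-scoping pass would aim to infer; the example is a standard textbook pattern, not taken from the paper, and the function name `fib` is only illustrative.

```cpp
#include <cassert>

// Illustrative only: the shared/firstprivate clauses below are written by hand.
long fib(int n) {
    assert(n >= 0);
    if (n < 2) return n;
    long x = 0, y = 0;
    // x and y must be shared so the parent sees the children's results;
    // n is only read, so each task can take its own copy.
    #pragma omp task shared(x) firstprivate(n)
    x = fib(n - 1);
    #pragma omp task shared(y) firstprivate(n)
    y = fib(n - 2);
    // The parent must wait before reading x and y.
    #pragma omp taskwait
    return x + y;
}
```

Without the `shared` clauses, `x` and `y` would default to `firstprivate` inside the tasks and the results would be lost, which is the sort of mistake automatic scoping is meant to prevent.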

  • Shah Mohammad Faizur Rahman, Jichi Guo, Akshatha Bhat, Carlos Garcia, Majedul Haque Sujon, Qing Yi, Chunhua Liao, Daniel J. Quinlan, “Studying The Impact Of Application-level Optimizations On The Power Consumption Of Multi-Core Architectures”, accepted by The ACM International Conference on Computing Frontiers 2012 (CF’12), May 15th-17th, 2012, Cagliari, Italy. LLNL-CONF-599780

This paper presents an extensive study of the impact of application-level optimizations on both the performance and power efficiencies of applications from a wide range of scientific and embedded systems domains. We observe that application-level optimizations often have a much larger impact on performance than on power consumption. However, optimizing for performance does not necessarily lead to better power consumption, and vice versa. Compared to sequential applications, multithreaded applications give more room for performance and power improvements. Additionally, a number of optimizations, including loop and thread affinity optimizations, have shown great potential in supporting collective enhancement of both performance and power efficiency. Our experimental results provide several insights to help exploit these optimizations effectively.


  • Shalf, J. and Quinlan, D. and Janssen, C., “Rethinking Hardware-Software Codesign for Exascale Systems”, Computer, Vol. 44, issue 11, pages 22-30; November 2011.

This paper presents work combining the LBL node simulator, the SNL network simulator, and the ROSE compiler to demonstrate software analysis and the workflow such tools require to analyze the power requirements of HPC code, using autotuning to define optimal points in the design space. The paper lays out an approach to co-design at the start of work that is part of the CoDEX project, led by LBL and including both SNL and LLNL.

  • Dan Quinlan and Chunhua Liao, The ROSE Source-to-Source Compiler Infrastructure, Cetus Users and Compiler Infrastructure Workshop, in conjunction with PACT 2011, Galveston Island, Texas, USA, October 10, 2011. pdf
  • M.J. Sottile, C. Rasmussen, W.N. Weseloh, R.W. Robey, D. Quinlan, J. Overbey (2011). “ForOpenCL: Transformations Exploiting Array Syntax in Fortran for Accelerator Programming.” Proceedings of the 2nd International Workshop on GPUs and Scientific Applications (GPUScA), Galveston Island, Texas. October, 2011.

This paper presents an OpenCL code generator leveraging the semantics of the F90 array constructs. Such GPU work is expected to be an important part of future Exascale programming environments. This work demonstrates how ROSE is used to support the analysis of the input code, and the translation and code generation required to generate OpenCL code for GPUs.

  • Peter Pirkelbauer, Chunhua Liao, Thomas Panas, and Daniel J. Quinlan, “Runtime Detection of C-Style Errors in UPC Code”, Fifth Conference on Partitioned Global Address Space Programming Models (PGAS’11), Galveston, TX, October 2011. LLNL-CONF-502592. pdf

This paper presents work to define a dynamic analysis for the correctness of UPC usage and leverages the RTED test suite from Iowa State University. This work is released in ROSE and shows how to build dynamic analysis support to catch errors as represented by test codes in the RTED test suite for UPC. The correct use of programming models is an important aspect of the design of future programming models for Exascale. This paper shows how to design dynamic analysis-based tools to evaluate correctness of programs written in the UPC programming model.


  • Chunhua Liao, Daniel J. Quinlan, Thomas Panas and Bronis de Supinski, A ROSE-based OpenMP 3.0 Research Compiler Supporting Multiple Runtime Libraries, International Workshop on OpenMP (IWOMP) 2010, accepted in March 2010. LLNL-CONF-422873 pdf
  • Chunhua Liao, Daniel J. Quinlan, Jeremiah J. Willcock and Thomas Panas, Semantic-Aware Automatic Parallelization of Modern Applications Using High-Level Abstractions, International Journal of Parallel Programming, accepted in Jan. 2010. pdf


  • Chunhua Liao, Daniel J. Quinlan and Thomas Panas, Towards an Abstraction-Friendly Programming Model for High Productivity and High Performance Computing, Workshop on Non-Traditional Programming Models for High-Performance Computing, Los Alamos Computer Science Symposium (LACSS) 2009, Santa Fe, New Mexico, October 13-14, 2009 LLNL-CONF-417691 pdf .
    This is a short position paper.
  • Chunhua Liao, Daniel J. Quinlan, Richard Vuduc and Thomas Panas, Effective Source-to-Source Outlining to Support Whole Program Empirical Optimization, The 22nd International Workshop on Languages and Compilers for Parallel Computing, Newark, Delaware, USA. October 8-10, 2009. pdf .

This paper describes our work using ROSE to build an effective source-to-source outliner in order to support whole program empirical optimization (also called autotuning). The ROSE outliner addresses the problem of extracting tunable kernels out of large scale applications, thereby helping to convert the challenging whole-program tuning problem into a set of more manageable kernel tuning tasks. In particular, the outliner can generate kernels that preserve the performance characteristics of the tuning targets and can be easily handled by other tools. This work also demonstrates how one can use ROSE’s compiler analyses to enhance the quality of source-to-source translation.
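The sketch below illustrates what outlining means in this context: a loop embedded in a larger routine is moved into a standalone function whose parameters are the variables the loop uses. It is written by hand for illustration; the function names are hypothetical and the code is not actual output of the ROSE outliner.

```cpp
#include <cassert>
#include <cstddef>

// Before: the tunable loop is embedded in a larger routine.
double smooth_original(double* a, const double* b, std::size_t n, double w) {
    assert(a != nullptr && b != nullptr);
    double residual = 0.0;
    for (std::size_t i = 1; i + 1 < n; ++i) {
        a[i] = w * (b[i - 1] + b[i + 1]);
        residual += a[i];
    }
    return residual;
}

// After: the loop body lives in its own kernel. Read-only variables are
// passed by value; the variable written by the loop is passed by pointer.
void outlined_kernel(double* a, const double* b, std::size_t n, double w,
                     double* residual) {
    for (std::size_t i = 1; i + 1 < n; ++i) {
        a[i] = w * (b[i - 1] + b[i + 1]);
        *residual += a[i];
    }
}

// The original routine now just calls the outlined kernel.
double smooth_outlined(double* a, const double* b, std::size_t n, double w) {
    assert(a != nullptr && b != nullptr);
    double residual = 0.0;
    outlined_kernel(a, b, n, w, &residual);
    return residual;
}
```

Once the kernel is a separate function, it can be compiled, timed, and transformed in isolation by an autotuning tool. How parameters are passed affects whether the kernel keeps the performance behavior of the original loop, which is the concern the paper addresses.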

  • Saebjornsen, A., Willcock, J., Panas, T., Quinlan, D., and Su, Z. 2009. Detecting code clones in binary executables. In Proceedings of the Eighteenth international Symposium on Software Testing and Analysis (Chicago, IL, USA, July 19 – 23, 2009). ISSTA ’09. ACM, New York, NY, 117-128. pdf
  • Panas, T. and Quinlan, D. 2009. Techniques for software quality analysis of binaries: applied to Windows and Linux. In Proceedings of the 2nd international Workshop on Defects in Large Software Systems: Held in Conjunction with the ACM SIGSOFT international Symposium on Software Testing and Analysis (ISSTA 2009) (Chicago, Illinois, July 19, 2009). B. Liblit, N. Nagappan, and T. Zimmermann, Eds. DEFECTS ’09. ACM, New York, NY, 6-10. pdf
  • Chunhua Liao, Daniel J. Quinlan, Jeremiah J. Willcock and Thomas Panas, “Extending Automatic Parallelization to Optimize High-Level Abstractions for Multicore,” In Proceedings of the 5th international Workshop on OpenMP: Evolving OpenMP in An Age of Extreme Parallelism (Dresden, Germany, June 03 – 05, 2009). pdf .

This paper describes an approach to extending automatic parallelization to optimize applications written using high-level abstractions. This work exemplifies a typical usage of ROSE and represents our initial work on the general subject of how to leverage the semantics associated with high-level abstractions to enable more optimizations.


  • Panas, T. Signature visualization of software binaries. In Proceedings of the 4th ACM Symposium on Software Visualization (Ammersee, Germany, September 16 – 17, 2008). SoftVis ’08. ACM, New York, NY, 185-188. pdf
  • Daniel J. Quinlan, Gergo Barany and Thomas Panas, Towards Distributed Memory Parallel Program Analysis, Scalable Program Analysis, Dagstuhl Seminar Proceedings, 1862-4405, Dagstuhl, Germany, 2008 pdf


  • Quinlan, D ; Barany, G ; Panas, T, Shared and Distributed Memory Parallel Security Analysis of Large-Scale Source Code and Binary Applications, Static Analysis Summit II, Fairfax, VA, United States, Nov 08 – Nov 09, 2007 pdf
  • Panas, T., Epperly, T., Quinlan, D., Saebjornsen, A., and Vuduc, R. 2007. Communicating Software Architecture using a Unified Single-View Visualization. In Proceedings of the 12th IEEE international Conference on Engineering Complex Computer Systems (July 11 – 14, 2007). ICECCS. IEEE Computer Society, Washington, DC, 217-228. pdf
  • Quinlan, D. J., Vuduc, R. W., and Misherghi, G. 2007. Techniques for specifying bug patterns. In Proceedings of the 2007 ACM Workshop on Parallel and Distributed Systems: Testing and Debugging (London, United Kingdom, July 09, 2007). PADTAD ’07. ACM, New York, NY, 27-35. pdf
  • Thomas Panas, Dan Quinlan, and Richard Vuduc. Analyzing and visualizing whole program architectures. In Proc. 3rd Workshop on Aerospace Software Engineering (AeroSE) at the Int’l Conf. on Software Engineering (ICSE), Minneapolis, MN, USA, May 2007. pdf
  • Panas, T., Quinlan, D., and Vuduc, R. 2007. Tool Support for Inspecting the Code Quality of HPC Applications. In Proceedings of the 29th international Conference on Software Engineering Workshops (May 20 – 26, 2007). ICSEW. IEEE Computer Society, Washington, DC, 182. pdf


  • Vuduc, R., Schulz, M., Quinlan, D., de Supinski, B., and Saebjornsen, A. 2006. Improving distributed memory applications testing by message perturbation. In Proceedings of the 2006 Workshop on Parallel and Distributed Systems: Testing and Debugging (Portland, Maine, USA, July 17, 2006). PADTAD ’06. ACM, New York, NY, 27-36. pdf
  • Dan Quinlan, Richard Vuduc, Thomas Panas, Jochen Hardtlein, and Andreas Saebjornsen. Support for Whole-Program Analysis and the Verification of the One-Definition Rule in C++, NIST – National Institute of Standards and Technology – Static Analysis Summit, Gaithersburg, MD, USA, June 2006 pdf
  • Parameterization and Search-space Exploitation of Loop Fusion, 2006 pdf .
    This represents some recent work on empirical optimization (not yet published). This entry should likely be removed until it is published.


  • White, B. S., McKee, S. A., de Supinski, B. R., Miller, B., Quinlan, D., and Schulz, M. 2005. Improving the computational intensity of unstructured mesh applications. In Proceedings of the 19th Annual international Conference on Supercomputing (Cambridge, Massachusetts, June 20 – 22, 2005). ICS ’05. ACM, New York, NY, 341-350. pdf .

This paper is about the optimization of unstructured grid applications and represents preparatory work for future automated transformations specific to unstructured grid applications within DOE using ROSE.

  • Qing Yi, Daniel J. Quinlan: Applying Loop Optimizations to Object-Oriented Abstractions Through General Classification of Array Semantics. LCPC 2004: 253-267 pdf .

This paper outlines an approach to the optimization of user-defined abstractions. This work represents a substantial goal for ROSE and our initial work on the general subject of how to write code at a very high level of abstraction and have the lower-level code required for good performance generated automatically. This paper covers the details of optimizing object-oriented abstractions using ROSE. Unfortunately, ROSE is not mentioned anywhere in the paper, a ridiculous oversight, but oh well. The subject is the optimization, not the ROSE compiler infrastructure.

  • Daniel J. Quinlan, Markus Schordan, Qing Yi, Andreas Saebjornsen: Classification and Utilization of Abstractions for Optimization. ISoLA 2004: 57-73 pdf .
    This paper is a general introduction to recent work in the ROSE project.
  • Schordan M., Quinlan D., “A Source-To-Source Architecture for User-Defined Optimizations”, Joint Modular Languages Conference held in conjunction with EuroPar’03, Austria, August 2003 pdf .
    This paper covers the architecture of ROSE as a project.
  • Daniel J. Quinlan, Markus Schordan, Qing Yi, Bronis R. de Supinski: Semantic-Driven Parallelization of Loops Operating on User-Defined Containers. LCPC 2003: 524-538 pdf .

This paper is the informal proceedings version. It demonstrates the optimization of generalized container abstractions and is related to Active Library research (or so I understand). It is also related to Telescoping Language research. The paper demonstrates a few of the newest features in ROSE and has served as an introduction for the authors into the optimization of the STL more generally.

  • Daniel J. Quinlan, Markus Schordan, Qing Yi, Bronis R. de Supinski: A C++ Infrastructure for Automatic Introduction and Translation of OpenMP Directives. WOMPAT 2003: 13-25 pdf .

This paper demonstrates the use of ROSE to recognize OpenMP pragmas and, using the Nanos OpenMP runtime library, build a subset of an OpenMP-specific compiler for C++.

  • Quinlan, D. J., Miller, B., Philip, B., and Schordan, M. 2002. Treating a User-Defined Parallel Library as a Domain-Specific Language. In Proceedings of the 16th international Parallel and Distributed Processing Symposium (April 15 – 19, 2002). IEEE Computer Society, Washington, DC, 324. pdf .

This paper is specific to compile-time optimization of array classes. It demonstrates what was at the time the most current work on the compile-time optimization of an array class library. ROSE is more general, but this paper is very specific to the optimization of a single library.

  • Quinlan, D., Schordan, M., Philip, B., Kowarschik, M., “Parallel Object-Oriented Framework Optimization”, Special Issue of Concurrency: Practice and Experience (2003), also in Proceedings of Conference on Parallel Compilers (CPC2001), Edinburgh, Scotland, June 2001. pdf .

This is one of the first papers on ROSE. It was presented at CPC2001 and later updated for publication in the journal Concurrency: Practice and Experience.

  • Quinlan, D., Schordan, M., Philip, B., Kowarschik, M., “The Specification of Source-To-Source Transformations for the Compile-Time Optimization of Parallel Object-Oriented Scientific Applications”, Submitted to Parallel Processing Letters, also in Proceedings of 14th Workshop on Languages and Compilers for Parallel Computing (LCPC2001), Cumberland Falls, KY, August 1-3 2001. pdf .

This paper specified some elements of what later became the string-based AST rewrite mechanism used in ROSE.

  • D. Quinlan and B. Philip, “ROSETTA: The Compile-Time Recognition of Object-Oriented Library Abstractions and Their Use Within User Applications”, in Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA 2001), 2001 pdf .

This paper describes the development of a tool, ROSETTA, which builds object-oriented Intermediate Representations (IRs) for compilers. It is a tool used within ROSE to build the SAGE III IR, which we use internally with the EDG front-end. It is specific to details of the internal ROSE compiler infrastructure.

2000 and Earlier

  • Quinlan, D., “ROSE: Compiler Support for Object-Oriented Frameworks” Proceedings of Conference on Parallel Compilers (CPC2000), Aussois, France, January 2000. Also published in special issue of Parallel Processing Letters, Vol. 10. pdf 

This paper was an introduction to the work being done at the time on ROSE, complete with a more detailed motivation for compile-time optimization of specific libraries.

  • Kei Davis and Dan Quinlan, ROSE II: An Optimizing Code Transformer for C++ Object-Oriented Array Class Libraries, World Multiconference on Systemics, Cybernetics and Informatics and 5th International Conference on Information Systems Analysis and Synthesis Vol.5: Computer Science and Engineering, Jul 31-Aug 4, 1999, Orlando, Florida pdf 
    This paper presents preliminary work on the compile-time optimization of array class libraries.
  • F. Bassetti, K. Davis, D. Quinlan, “C++ Expression Templates Performance Issues in Scientific Computing,” ipps, pp.0635, 12th. International Parallel Processing Symposium, 1998 pdf .

This paper discusses the different approaches to the optimization of array class libraries. Optimization of array class libraries led to the development of ROSE as a project, though ROSE is not at all specific to array class libraries and addresses the optimization of libraries generally. This paper can be helpful in understanding what work was done using language template features within C++ before attempting to address the optimization issues more generally at compile time. Prior work started on ROSE had been abandoned because of the perceived significant advantages of template meta-programming techniques for scientific computing. Several papers on the details of template use were written; this is the most complete of them. It is included with these papers to provide a bit of perspective (currently historical).

Related Papers

This section is incomplete.

  •  Dan Quinlan and Rebecca Parsons, A++/P++ array classes for architecture independent finite differences computations. In Proceedings of the Second Annual Object-Oriented Numerics Conference (OON-SKI’94), April 1994.
  •  Lemke, M., Quinlan, D. P++, a C++ Virtual Shared Grids Based Programming Environment for Architecture-Independent Development of Structured Grid Applications, In Proceedings of CONPAR/VAPP V, September 1992, Lyon, France. Published in Lecture Notes in Computer Science, Springer Verlag, September 1992.
  •  Brown, D., Henshaw, W., Quinlan, D., “OVERTURE: A Framework for the Complex Geometries,” Proceedings of the ISCOPE’99 Conference, San Francisco, CA, Dec 1999.
  • Guy Steele’s “Growing A Language”
  • Telescoping Languages work at Rice (Ken)
  • Plus/Minus Languages (Bjarne)