Minerando Repositórios de Código com MetricMiner
Maurício Finavaro Aniche
Marco Aurélio Gerosa
Resumo
Nos últimos anos, a área de engenharia de software empírica tem extraído muitas informações de repositórios de código, como Git e SVN. No entanto, o trabalho é árduo: o pesquisador precisa escrever programas que navegam por cada commit, arquivo e modificação desse repositório, extraiam as informações que precisa, e as exportem em um formato aceitável. Raramente as ferramentas desenvolvidas são reutilizadas, fazendo com o que o desenvolvedor gaste muito tempo sempre que começa um novo estudo. Neste minicurso, apresentaremos o arcabouço criado pelo nosso grupo de pesquisa, o MetricMiner, cuja ideia é dar suporte a desenvolvedores que querem analisar repositórios. Dada sua simplicidade, com poucas linhas de código, o pesquisador consegue extrair e analisar uma grande quantidade de repositórios de código.
Instrutores
Maurício Finavaro Aniche é doutorando pela Universidade de São Paulo (IME-USP) sob supervisão do professor Marco Gerosa. É também instrutor e desenvolvedor pela Caelum, renomada instituição de ensino brasileira. Atua na área de engenharia de software e suas principais linhas de pesquisa são Test-Driven Development e Métricas de Código. É criador do MetricMiner, ferramenta de código aberto, que ajuda pesquisadores a minerar repositórios. Para mais informações, visite http://www.aniche.com.br .
Marco Aurélio Gerosa é professor associado no Departamento de Ciência da Computação da Universidade de São Paulo. Sua pesquisa está na interseção entre Engenharia de Software e Sistemas Colaborativos, focando em engenharia de software experimental, mineração de repositórios de software, evolução de software e dimensões sociais do desenvolvimento de software. Recebe bolsa de produtividade do CNPq e coordena projetos de software livre que já receberam diversos prêmios. Para mais informações, visite: http://www.ime.usp.br/~gerosa .
Duração: 3,5 horas
Esse tutorial tem atividades práticas. Traga seu notebook com Eclipse instalado.
The Utrecht Compiler Construction Suite
Abstract
Haskell is the ideal language for building language processing . In the course we will show how to use the Utrecht Parser Combinator Library (uu-parsinglib) and the Utrecht Attribute Grammar System (uuagc) in constructing compilers in a stepwise fashion. We will construct a compiler for a small language on th fly. We will point our common mistakes and how to avoid them. At the end of the course we will shortly show how attribute gramars can be directy realised in Haskell, thus avoiding the use of a preprocesor. We will also show how interleaved parsers (uu-interleaved) can be used in processing command line options (uu-options).
The course is both interesting for people who have written compilers using more conventional tools since we show how things can be done elegantly in Haskell, and for people who know Haskell since we show how to easily build a wide variety of applications.
If you do not know any Haskell the course is an ideal way of getting acquainted with the language and its expressive power. The toools have been used to construct a complete Haskell compiler (uhc).
Instructor
Doaitse Swierstra (1950) studied theoretical physics at Groningen University and got a Ph.D. from Twente University with a thesis subject in the area of programming language design. From 1983 to 2013 he has been a full professor of Computer Science at Utrecht University. He is one of the initiators of the Dutch research school IPA. Currently he is a professor emeritus.
Doaitse Swierstra is a long standing member of the IFIP Working Group 2.1, and has served on numerous program committees, such as POPL, ICFP and the Haskell Symposium. His research interests are programming methodology, functional programming, combinator languages and libraries, compositional programming methods, program analysis, program specification and verification, compiler construction, parsing, attribute grammars, generic programming, programming environments. He has thus far supervised 24 Ph.D. students (see: http://swierstra.net/SupervisedTheses)
Analyzing Software Product Lines
Abstract
Variability is paramount in many software systems: Users configure products tailored for their needs. For example, the Linux kernel has over 10000 compile-time configuration options. However, variability often implies complexity. Already with few configuration options, we can generate a vast number (easily exceeding billions) of potential products. Checking all products individually using classic testing or analysis techniques is not feasible. Recently, researchers have begun to adapt analysis techniques, such as type checking, model checking, theorem proving, static analysis, parsing, and even testing, to incorporate variability. Sampling configurations deliberately can be effective, but additionally researchers have explored ideas to check the entire product line in a single step and to reason about variability locally where it manifests at the code level. In such variability-aware analysis, when the product line passes the check (e.g., the product line is well-typed), it is guaranteed that all possible generated products pass the check as well (e.g., all products that can be generated are well-typed, too). Such variability-aware analyses take variability information into account, including feature models and implementation artifacts, thus spanning problem and solution space. In the tutorial, I will give an overview of the problems and concepts for analyzing software product lines. I will discuss different strategies (brute force, sampling, family-based analysis, feature-based analysis), and show recurring ideas and patterns of how existing analysis are extended for variability. While taking a broad view, I will illustrate key ideas and also initial results by means of a number of case studies, including the development of a variability-aware type system and its application to Linux and the application of variability-aware verification techniques to product-line models. Finally, I will outline how such analysis techniques might be useful beyond software product lines.
Instructor
Christian Kästner is an Assistant Professor of Computer Science in the School of Computer Science at Carnegie Mellon University. He joined Carnegie Mellon in 2012, after a 2-year postdoc in the group of Klaus Ostermann at the University of Marburg. He received his PhD from the University of Magdeburg in 2010 for his work on virtual separation of concerns. For his dissertation he received the prestigious GI Dissertation Award for the best computer-science dissertation in 2010. His research focuses on quality assurance for highly-configurable systems and in understanding and managing variability-induced complexity. He combines programming-language research and software-engineering research in the areas of software product lines, feature-oriented programming, modularity, metaprogramming, software analysis, program comprehension, empirical studies of developers and development artifacts, and program transformations.
Programming by Examples
Abstract
Programming by Examples (PBE) involves synthesizing intended programs in an underlying domain-specific language from example based specifications (Espec). In this tutorial, I will formalize our notion of Espec and describe some principles behind designing appropriate domain-specific languages. A key technical challenge in PBE is to search for programs that are consistent with the Espec provided by the user. I will describe a divide-and-conquer based search paradigm that leverages deductive rules and version space algebras to achieve real time efficiency. Another technical challenge in PBE is to resolve the ambiguity that is inherent in the Espec. I will describe machine learning based ranking techniques to predict an intended program within a set of programs that are consistent with the Espec, and active-learning based user interaction models to help resolve ambiguity in the Espec. These principles will be illustrated using various PBE technologies (FlashFill, FlashExtract, and FlashRelate) that have been developed for the domain of data manipulation. These technologies are quite useful for data scientists who typically spend 80% of their time cleaning data, and for 99% of those end users who do not know programming.
Instructor
Sumit Gulwani is a principal researcher at Microsoft Research (in Redmond, USA), and an adjunct faculty in the Computer Science Department at IIT Kanpur (India). He has expertise in formal methods and automated program analysis and synthesis techniques. His recent research interests are in the cross-disciplinary areas of automating end-user programming and building intelligent tutoring systems. His programming-by-example work led to the Flash Fill feature of Microsoft Excel 2013 that is used by hundreds of millions of people. He was awarded the ACM SIGPLAN Robin Milner Young Researcher Award in 2014. He obtained his PhD in Computer Science from UC-Berkeley in 2005, and was awarded the ACM SIGPLAN Outstanding Doctoral Dissertation Award. He obtained his BTech in Computer Science and Engineering from IIT Kanpur, and was awarded the President's Gold Medal."
Selected topics in the verification of shared-memory multithreaded programs
Abstract
Verification of multithreaded programs is an important and difficult research area. In general, all verification attempts boil down to a continuous fight against state explosion. In the multithreaded context, state explosion refers to the fact that the number of program states (or the number of predicates describing combinations of thread states) grows exponentially in the number of threads. We are going to presents several high-level approaches for dealing with state explosion, based on modular and nonmodular reasoning. We pick the following highlights from the sheer abundance of verification methods for multithreaded programs. We will discuss the following modular approaches:
- multithread-Cartesian abstract interpretation in different disguises (Owicki-Gries proof system, thread-modular verification),
- applying multithread-Cartesian abstract interpretation to threads with recursion,
- refinement of multithread-Cartesian abstract interpretation for finite-state programs,
- abstractions of large threads by small ones,
- precision of reasoning based on resource invariants,
- concurrent separation logic.
We also touch some aspects of nonmodular verification and model-checking for multithreaded programs:
- partial order reduction,
- automata representation of state spaces,
- theorem proving for nonmonotonic properties.
Instructor
Alexander Malkis has obtained his Diploma degree from the University of Saarland, Germany, in 2004-2005, for a work on polyedges (in other terminology, bond animals) under the guidance of Prof. Dr. Raimund Seidel; during his studies Alexander was financed by the prominent foundation “Studienstiftung des deutschen Volkes”. He continued his studies in Saarbrücken and Freiburg, funded by the Max-Planck society and the DFG (German science foundation), obtaining his PhD thesis in 2010 at the University of Freiburg for a work on verification of multithreaded programs under guidance of Prof. Dr. Andreas Podelski. Currently, Alexander is working at Technische Universität München, Germany.
There is a wide range of topics in which Alexander is interested in, among them: polynomial verification of large program classes; emptiness of language intersection (complexity and algorithms); the diameter of multithreaded programs; thread simulations, procedure abstractions under concurrency; multithreaded-Cartesian abstract interpretation and thread-modular verification; multithreaded programs with recursion; automatic verification of software barriers; dataflow analysis and information exposure checkers for multithreaded programs; last but not least multicomponent architectures.
Duration: 3.5 hours