Google Summer of Code 2016 Scala Projects

Google Summer of Code

This year the Scala team applied again for the Google Summer of Code program to work with enthusiastic students on challenging Scala projects

This page provides a list of project ideas. The suggestions are only a starting point for students. We expect students to explore the ideas in much more detail, preferably with their own suggestions and detailed plans on how they want to proceed. Don’t feel constrained by the provided list! We welcome any of your own challenging ideas, but make sure that the proposed project satisfies the main requirements mentioned below.

How to get involved

The best place to propose and discuss your proposals is our “scala-language” mailing list. This way you will get quickly responses from the whole Scala community.

Previous Summer of Code

We encourage you to have a look at our 2015, 2014, 2013, 2012, 2011, Summer of Code 2010 pages to get an idea on what we and you can expect while working on Scala.

Project Ideas

Here are some project ideas. The list is non-binding and any reasonable project related to Scala that is proposed by a student will be thoroughly reviewed.

Revamp Slick Code-generator

There is a long list of smaller and medium sized open tickets for the Slick code generator. Time to tackle them and give it another overhaul. This is going to be a very useful and somewhat easier project.

https://github.com/slick/slick/issues?q=is%3Aopen+is%3Aissue+label%3Atopic%3Atype-providers

Supervised by @cvogt and/or @szeiger

Slick bug and feature hunt

There are lots of open tickets in the Slick issue tracker. Some which have long time fallen behind. This project would be about tackling as many of them as time permits. It will be a good chance to learn about the Slick code base and have an impact in this successful project. Some issues may be simple, but many issues may be challenging to fix and will require someone smart, dedicated and persistent enough to learn the details of how Slick works.

https://github.com/slick/slick/issues

Supervised by @cvogt and/or @szeiger

Super-charged for-comprehensions

Super-charged for-comprehensions are an effort to implement an alternative direct-style dyntax for monadic comprehensions in Scala. The goals are reduction in syntax, enabling more control flow expressions, inline Monad value extraction and transformer operations on the Monad context (as in Comprehensive Comprehensions). At the time of writing the project is in heavy flux. Depending on its status there will likely be interesting idea how to take the project further. One likely topic would be exploring type changing comprehensive comprehension transformers, such as groupBy or aggregations. The implementation heavily relies on macros and will require learning how to work with Scala’s compiler ASTs and make changes to the AST transformation stages.

Work in progress: https://github.com/cvogt/flow-comprehensions

Supervised by @cvogt

A next generation Scala build tool

This project is about extending and contributing to the implementation of a vision that only exist in partial, unreleased prototypes right now. The idea is to build a tool for a intuitive, easy to use, composable, statically checked builds that is used like an ordinary Scala library with no surprises. Several pieces of the puzzle exist as prototypes, but need improvement and many other pieces are entirely missing. In particular this GSOC project could be about building an interface to interoperate with SBT builds and SBT plugins, using them as components in new style builds. Other features yet to be built include publishing artifacts, signing them using GPG, packaging and many more. The exact tasks will depend on the state of the project at the time GSOC applications happen. It will be a chance to work on tooling and contribute to something that has a chance to maybe some day change something fundamental in the Scala eco system.

Supervised by @cvogt

Pushing Scala Coroutines to the Next Level

Scala Coroutines is a library-level extension for the Scala programming language that introduces first-class coroutines.

Coroutines are a language abstraction that generalizes subroutines (i.e. procedures, methods or functions). Unlike a subroutine, which is invoked once and executes until it completed, a coroutine can pause execution and yield control back to the caller, or another coroutine. The caller can then resume the coroutine when appropriate. Coroutines have a number of use cases, including but not limited to:

  • data structure iterators
  • event-driven code without the inversion of control
  • cooperative multitasking
  • concurrency frameworks such as actors, async-await and dataflow networks
  • expressing asynchrony, and better composition of asynchronous systems
  • capturing continuations
  • expressing backtracking
  • AI agents such as behaviour trees

The goal of this project is to extend documentation and tests of Scala Coroutines, potentially improve parts of the Coroutines implementation, and then implement a standard suite of extensions for Scala Coroutines. These extensions will be a separate module in the project, and will offer support for the use cases listed above: concisely creating collection iterators using coroutines, async-await support, continuation support, backtracking testing support, and/or a fiber library module. All the extensions will have to be documented.

Supervised by @axel22

Better Scripts in Scala

There are several hacky, ad-hoc ways to write scripts in Scala already, e.g. SBT’s script-runner, and the Ammonite REPL’s script loader. Your task in this project would be to improve the experience writing scripts using Ammonite to be on-par with the experience writing scripts in any other language, as well as on-par with the experience writing Scala code in a large project. This includes:

  • Improving startup time: SBT is terribly slow, and even Ammonite scripts take 1-2 seconds to boot up, This is speed-up-able.
  • Improved error reporting: Ammonite’s scripts currently report line-numbers and file-names based on the mangled compilation-unit. We should report based on the location within the source file a user wrote.
  • Ammonite requires two steps to install (Java, Ammonite). There is no reason it should take more than 1 command to bootstrap the script-runner on a new system
  • IDE support: Ammonite’s syntax isn’t supported by major IDEs, meaning you lose in-editor errors, autocomplete, and other things when writing your script. This can be added
  • Windows support: the Ammonite-REPL does not currently work on windows due to terminal-interaction issues. We should make it work and put it in CI.

Supervised by @lihaoyi

Flexible Fast Parser Combinators

The FastParse library is a modern replacement for Scala’s old parser combinators library. It provides similar API, drops a lot of redundant operators, and runs 50-100x faster. However, right now it only parses Strings.

This project would be to extend FastParse to parse other things. Cases to cover include:

  • Binary data: Array[Byte], ByteBuffer, ByteString, letting us parse things like binary files, network protocols, or other formats
  • Streaming data: Iterator[String] and Iterator[Char], letting us parse files while not materializing the entire contents in memory
  • Pre-tokenized data: Seq[T] for an arbitrary T, letting you re-use a pre-written lexer before your parser

Using virtual classes, we should be able to extend FastParse to do this while still re-using most of our code. Your task would be to:

  • Make FastParse able to handle 2 or 3 of the above cases
  • Demonstrate use-cases by using FastParse to write parsers for well-known formats: .zip, .class, streaming JSON or XML-SAX parsing

Supervised by @lihaoyi

Fault Handling for Function-Passing

The Function-Passing programming model is a new programming model designed with distributed programming in mind. The model is formalized and comes with a prototype implementation, but while fault-handling is worked out and formalized, it is not yet implemented.

This project would include implementing F-P’s fault-handling specification, and empirically evaluating the model across different sorts of applications.

(Note, there is already a waiting list for this project.)

Supervised by @heathercmiller

DottyDoc: A Documentation Generator for Dotty

Dotty is a brand new Scala compiler, designed to help us try out concepts of future Scala language versions. The compiler is a new design intended to reflect the lessons we learned from work with the Scala compiler.

Along with a new compiler and new foundation for Scala, we’d also like to reimagine generated documentation for Scala. This project aims to build a new documentation generation tool (à la Scaladoc) for the Dotty compiler.

(Note, there is already a waiting list for this project.)

Supervised by @heathercmiller

DottyREPL: A Better REPL for Dotty

Apart from better documentation tool, we’d also like to improve upon the REPL experience in Dotty. In this project you’re going to expand current minimalistic REPL with advanced functionality to improve end-user’s experience.

https://github.com/lampepfl/dotty/issues?utf8=✓&q=is%3Aissue+is%3Aopen+repl

Supervised by @densh

Better off-heap collections for Scala

scala-offheap is a new project that exposes user-controllable memory management to Scala programmers. At the moment it has fairly limited array-based collection API. Your goal in this project would be:

  • Expand existing functionality of combinator methods on offheap.Array[T]
  • Port implementation of maps to scala-offheap (offheap.Map[K, V])
  • Port implementation of sets to scala-offheap (offheap.Set[T])

Supervised by @densh

Java source code (or bytecode) to Scala.js IR compiler

Scala.js is the Scala to JavaScript compiler. While it can compile any .scala source files to JavaScript, it is unable to compiler Java source files. This can be annoying to Scala.js developers, as they cannot reuse existing Java libraries like Scala/JVM developers do.

This project consists in writing a compiler from Java source code to the Intermediate Representation (IR) of Scala.js. The later phases of the existing pipeline would then be reused to compile that IR to JavaScript.

The Java-to-IR compiler should probably reuse an existing an existing Java parser and typechecker. From typechecked Java ASTs, it should be relatively straightforward to compile them down to the Scala.js IR (except some corner cases, but not everything needs to be supported).

Alternative: Writing a compiler from the JVM bytecode to the Scala.js IR. This is a priori much more difficult, but it might be easier for a student already familiar with the JVM bytecode.

The compiler will be written in Scala for the JVM. Working knowledge of Scala is expected.

The project requires a relatively good knowledge of compiler construction in general. Your application should include your background in compiler construction. Ideally, it should point to a compiler project you have written (e.g., a student project of a compiler course).

Supervised by @sjrd

Scala Compiler Plugin for Parser Combinators

Parsing can be found everywhere in computing today: from log analysis to computer languages. Functional programming offers concise approach to parser generators development: a parsing grammar for generator is composed via higher-order functions – parser combinators.

As a program, parser generator should be programmed easily and work fast. Popular parser generators in Scala (Scala Combinator Parsers, parboiled2, FastParse, PapaCarlo, etc.) can be much faster or have complications in development.

Scala Combinator Parsers library is most concise. It is implemented as plain library in Scala. And because of that has few facilities for optimizations.

parboiled2 drops off most of algebraic sanity, and leaves a programmer with imperative-style parsing and Value Stack. That causes type soundness bugs, clumsy code that is hard to debug, and bad error reporting. FastParse fixes most critical issues of parboiled2. But both lacks can’t perform cross-rules analysis and deep optimisations.

The proposed solution requires users to use SBT build system with external plugin. Scala compiler plugin should rewrite Scala AST of a parser: compile high-level concise monadic Scala Combinator Parsers to highly-effective runtime code.

The plugin would be written in Scala. Existing projects written in Scala are expected. Ideally you should have understanding of compilers in general and Scala compiler inner parts.

Supervised by @alexander-myltsev, a core contributor to parboiled2 during GSoC’2013 and biological names parser based or parboiled2 writing notes sometimes in his blog.

New XML for Scala

This project is going to be about new platform-independent (i.e. cross-compile to both JVM and Scala.js) implementation of XML library based on fastparse parser combinators.

  • Finish the XML 1.0 parser
  • Refine the AST model
  • Implement quasiquotes based on the parser and AST model

Original prototype is available at:

https://github.com/densh/scala-nxml

Supervised by @densh

ENSIME project ideas

ENSIME is a JVM process that indexes your dependencies and understands your source code using the scala interactive compiler - the same compiler that compiles your code. This brings IDE features to your favourite text editor.

A number of project ideas around ENSIME project are listed on the following page:

http://ensime.github.io/contributing/ideas/

Mentored by @fommil and/or @a_dev_musing.

Requirements and Guidelines

General Student Application Requirements

This is the sixth time the Scala project has applied to the Summer of Code, and from last years experience, increased popularity of the language and stories of other mentor organizations we expect a high number of applications. First, be aware of the following:

  • Make sure that you understand, fulfill and agree to the general Google Summer of Code rules
  • The work done during GSoC requires some discipline from the students as they have to plan their day-to-day activities by themselves. Nevertheless we expect regular contact with the mentors by the usual forms of communication (mail, chat, phone) to make sure that the development is going according to the plan and students don’t get stuck for weeks at a time (3 months may seem long, but in reality it is very easy to run out of time).
  • The official SoC timetable mentions May 23 as the official start of coding, but if you have time you are encouraged to research your proposals even before that (and definitely learn the basics of Scala, if you haven’t done that already).

Student Application Guidelines

  • Student proposals should be very specific. We want to see evidence that you can succeed in the project. Applications with one-liners and general descriptions definitely won’t make the cut.
  • Because of the nature of our projects students must have at some knowledge of the Scala language. Applicants with Scala programming experience will be preferred. Alternatively, experience with functional programming could suffice, but in your application we want to see evidence that you can quickly be productive in Scala.
  • You can think of Google Summer of Code as a kind of independent internship. Therefore, we expect you to work full-time during the duration. Applicants with other time commitments are unlikely to be selected. From our previous experience we know that students’ finishing their studies (either Bachelor, Master of PhD) are likely to be overwhelmed by their final work, so please don’t be too optimistic and carefully plan your time for the project.
  • If you are unsure whether your proposal is suitable, feel free to discuss it on our “scala-language” mailing list. We have many community members on our mailing list who will quickly answer any of your questions regarding the project. Mentors are also constantly monitoring the mailing list. Don’t be afraid of asking questions, we enjoy solving puzzles like that!

General Proposal Requirements

The proposal will be submitted via the standard web-interface at https://summerofcode.withgoogle.com/, therefore plain text is the best way to go. We expect your application to be in the range of 700-1500 words. Anything less than that will probably not contain enough information for us to determine whether you are the right person for the job.

Your proposal should contain at least the following information, but feel free to include anything that you think is relevant:

  • Please include your name (weird as it may be, people do forget about it)
  • Title of your proposal
  • Abstract of your proposal
  • Detailed description of your idea including explanation on why is it innovative (maybe you already have some prototype?), what contribution do you expect to make to the Scala community and why do you think your project is needed, a rough plan of your development and possible architecture sketches.
  • Description of previous work, existing solutions (links to prototypes, bibliography are more than welcome!)
  • Write us about yourself and convince us that you are the right person for the job (linking to your resume/CV is good but not sufficient)
    • Mention the details of your academic studies, any previous work, internships
    • Any relevant skills that will help you to achieve the goal (programming languages, frameworks)?
    • Any previous open-source projects (or even previous GSoC) you have contributed to?
    • Do you plan to have any other commitments during SoC that may affect you work? Any vacations/holidays planned? Please be specific as much as you can.
  • If you apply to more than one GSoC project, especially if you also apply for a project in another organization, specify which project has your preference. In case two organizations choose to accept your applications, we can then give you the project that is most important to you. Preferring the project of another organization will not influence our decision whether to accept your application.
  • Contact details (very important!)