Scala has evolved from a language used mostly in academic and research circles, with early champions such as Twitter and Foursquare, into a standard choice for businesses, start-ups, and universities alike.
But companies and individuals don’t just use Scala to develop their new ideas; they also use other excellent tools such as Play Framework, Akka, Apache Spark, and Kafka, which are not only some of the top trending tools and projects on the market right now, but were also purposefully built in Scala (for many reasons…).
So, in honor of these outstanding Scala-based technologies, we thought we’d spotlight a few favorites.
Big Fast Data and Scala
Although MapReduce has been the officially approved Big Data engine for writing all compute jobs in Hadoop, its inability to handle event stream processing, its complex API, and ongoing shifts in consumer behavior have sparked interest in alternatives. Scala has come to dominate the field of “Fast” Data, which some describe as the next generation of computation engines: ones that focus on the speed of data processing rather than the batch size, and that can handle event streams in real time. Spark, Scalding, Kafka, and Samza are a few well-known examples of this trend that are quickly gaining recognition and adoption.
Apache Spark
Over the last few years, the Hadoop community has concluded that a comprehensive replacement for MapReduce is required. While MapReduce has served the community well, it is already a decade old, and the limitations noted above must be addressed. Cloudera, the largest Hadoop vendor, formally adopted Apache Spark as the successor in late 2013. Most of the other Hadoop vendors have since followed suit.
Simply put, Spark is a high-performance, general-purpose engine for large-scale data processing. It was developed at UC Berkeley’s AMPLab as an alternative to Hadoop’s two-stage MapReduce model. Spark’s fine-grained operators, in-memory caching of intermediate data, and dataflow optimizations can make some workloads up to 100 times faster. Typesafe is a major Spark enthusiast, and we provide Spark support and training to our clients. Spark Streaming is also popular, as it meets the demands of most event-streaming use cases.
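To give a feel for what “fine-grained operators” means in practice, here is a minimal sketch of a word count written as a chain of small functional operators. It deliberately uses plain Scala collections as a stand-in, not Spark’s API, and the names `DataflowSketch`, `tokenize`, and `wordCount` are our own for illustration. The intermediate `counts` value is computed once, held in memory, and reused by two downstream steps, which loosely mirrors the caching idea described above.

```scala
object DataflowSketch {
  // Split a line into lowercase words, dropping empty tokens.
  def tokenize(line: String): Seq[String] =
    line.toLowerCase.split("\\W+").filter(_.nonEmpty).toSeq

  // Word count as a chain of small operators: flatMap, groupBy, map.
  // Plain Scala collections stand in for a distributed dataset here.
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(tokenize)
      .groupBy(identity)
      .map { case (word, occurrences) => word -> occurrences.size }

  def main(args: Array[String]): Unit = {
    val lines = Seq("Spark is fast", "Scala is fun", "spark and scala")

    // Intermediate result computed once and kept in memory...
    val counts = wordCount(lines)

    // ...then reused by two independent downstream computations.
    val mostFrequent = counts.toSeq.sortBy { case (word, n) => (-n, word) }.take(3)
    val distinctWords = counts.size

    println(mostFrequent)
    println(distinctWords)
  }
}
```

In a two-stage map/reduce model, each extra step would typically mean another job and another round trip to disk; chaining operators like this keeps the whole pipeline in one expression.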
Scalding (by Twitter)
Because its mathematical foundations match common analytical scenarios, functional programming is a good fit for Big Data. The need for a Hadoop API built along those lines prompted Twitter to create Scalding, a Scala API that sits on top of Cascading, which is itself a higher-level Java API on top of MapReduce that exposes useful operations and a dataflow processing model. Scalding gives you access to all of Scala’s syntax and functional operations.
Apache Kafka
Apache Kafka, a publish-subscribe messaging system reimagined as a distributed commit log, is likewise implemented in Scala and exemplifies the language’s versatility. It is at the heart of LinkedIn’s infrastructure, handling hundreds of megabytes of reads and writes every second from thousands of clients.
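The “commit log” idea is easier to see in code. The sketch below is a single-process, in-memory toy, not Kafka’s client API: `CommitLogSketch`, `CommitLog`, and `Consumer` are hypothetical names, and there are no partitions, persistence, or replication. It only illustrates two properties: records are appended and never removed, and each subscriber tracks its own offset, so many consumers can read the same log independently.

```scala
object CommitLogSketch {
  // Append-only log: records are only ever added at the end.
  final class CommitLog[A] {
    private var records = Vector.empty[A]

    // Append a record and return its offset (its position in the log).
    def append(record: A): Long = synchronized {
      records = records :+ record
      records.length - 1L
    }

    // Read up to `max` records starting at `offset`.
    // Reading never removes anything from the log.
    def read(offset: Long, max: Int): Vector[A] = synchronized {
      records.slice(offset.toInt, offset.toInt + max)
    }
  }

  // Each consumer remembers how far it has read, independently of the others.
  final class Consumer[A](log: CommitLog[A]) {
    private var offset = 0L

    // Fetch the next batch and advance this consumer's offset.
    def poll(max: Int): Vector[A] = {
      val batch = log.read(offset, max)
      offset += batch.length
      batch
    }
  }
}
```

Because the log itself holds no per-consumer state, adding another subscriber costs almost nothing, and a consumer that falls behind can simply catch up from its last offset.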
Yahoo created and open-sourced a Kafka Manager console, which several of its teams use, including the Media Analytics team. It is also written in Scala, with the web console built on the Play Framework. Behind the scenes, the console interacts with an actor-based, in-memory model built using Akka and Apache Curator.
Apache Samza
Apache Samza is a framework for distributed stream processing. It uses Kafka for messaging and Apache Hadoop YARN to provide fault tolerance, processor isolation, security, and resource management.
Scala for Performance and Concurrency
Finagle (by Twitter)
Finagle focuses on what is arguably Scala’s finest use case: building highly scalable services through concurrency and efficient use of system resources. It was created in Scala by our friends at Twitter and has idiomatic Scala and Java APIs.
Finagle is an extensible RPC system for the JVM, used to build high-concurrency servers. It offers uniform client and server APIs for various protocols and is optimized for performance and concurrency. Most of Finagle’s code is protocol agnostic, which makes it easier to add new protocols.
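One way to picture “protocol agnostic” is to treat a service as nothing more than an asynchronous function from request to response, and to wrap it in reusable filters that never look at the request type. The sketch below follows that general idea using only the Scala standard library; it is not Finagle’s actual API (Finagle ships its own `Future` and service types), and `ServiceSketch`, `counting`, `withFallback`, and `upperCase` are names we made up for illustration.

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._
import scala.util.control.NonFatal

object ServiceSketch {
  // A service is an asynchronous function from request to response.
  type Service[Req, Rep] = Req => Future[Rep]

  // A filter wraps a service and returns a new service of the same shape.
  type Filter[Req, Rep] = Service[Req, Rep] => Service[Req, Rep]

  // Protocol-agnostic filter: counts requests without inspecting them.
  def counting[Req, Rep](counter: AtomicInteger): Filter[Req, Rep] =
    service => req => {
      counter.incrementAndGet()
      service(req)
    }

  // Protocol-agnostic filter: replaces a failed response with a fallback value.
  def withFallback[Req, Rep](fallback: Rep)(implicit ec: ExecutionContext): Filter[Req, Rep] =
    service => req => service(req).recover { case NonFatal(_) => fallback }

  // A toy "protocol": upper-case a string, failing on empty input.
  val upperCase: Service[String, String] = req =>
    if (req.isEmpty) Future.failed(new IllegalArgumentException("empty request"))
    else Future.successful(req.toUpperCase)

  def main(args: Array[String]): Unit = {
    import ExecutionContext.Implicits.global
    val counter = new AtomicInteger(0)
    val safe: Filter[String, String] = withFallback[String, String]("ERROR")
    val counted: Filter[String, String] = counting[String, String](counter)

    // Stack the filters around the service; the result is still just a Service.
    val service = counted(safe(upperCase))

    println(Await.result(service("ping"), 1.second))
    println(Await.result(service(""), 1.second))
    println(counter.get)
  }
}
```

The two filters would work unchanged around a service for any other request and response types, which is the sense in which the bulk of such a stack can stay independent of the wire protocol.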
Akka
Also among Scala’s high-concurrency technologies, Akka is a very fast, highly concurrent framework for building distributed systems. Akka handles much of the low-level I/O work and code that used to make web app developers throw up their hands and scream, “just throw more boxes at it.” Typesafe co-founder, CTO, and jazz enthusiast Jonas Bonér created Akka, an essential component of the Typesafe Reactive Platform.
Akka, like Finagle, is developed in Scala and has both Scala and Java APIs.
Other interesting information
ADAM is a genomics processing engine and specialized file format built with Apache Avro, Apache Spark, and Parquet. Because whole-genome files are typically very large, ADAM helps crunch the petabytes of population data required for rapid analysis, which in turn helps with the life-or-death situations that specialists in the field frequently face.
Lichess is a hobby chess project that supports thousands of concurrent chess games on a single server. “It’s a Play application that’s open-source and leverages virtually every single feature that Play offers,” Typesafer Will Sargent says. “Most of the time, it is built against a snapshot build. Not sure how he does it, but it makes for some exciting source code reading.”