Scala
created: 18 January 2021
revision: 1
Equality
- Any:
==
delegates toequals
- AnyRef:
new String("A") == new String("A") // true in Scala, false in Java
new String("A").equals(new String("A")) // true in both Scala and Java
new String("A") eq new String("A") // false in Scala, eq() is from AnyRef
Math
map()
does not preserve associativityreduce()
must be associative- floating point is not associative
(d1+d2)+d3 != d1+(d2+d3)
- (Fatin, n.d.) Write pure functions with no side effects. Otherwise, isolate side effects from code.
Notes
- (Prokopec, 2017) As turning a single field case class into a value class is as trivial as extending the AnyVal trait, we recommend that you always use AnyVal wherever possible. The overhead is quite low, and it generate high benefits in terms of garbage collection’s performance. To learn more about value classes, their limitations, and use cases, you can find detailed descriptions at http://docs.scala-lang.org/overviews/core/value-classes.html
-
[other] functional literal syntax example
def check[A](condition: Boolean, onTrue: () => A, onFalse: () => A): A = { if(condition) onTrue() else onFalse() } // () => A is a syntactic alias for Funtion0[A] check(a < 22, () => println("a"), () => println("b"))
Implicit
Implicit Parameters
- (Smith, 2016)* Implicit parameters are method parameters which do not have to be explicitly passed to the method when it is called. If they’re missing, the compiler will look in the surrounding scope for something that fits the bill.
def multiply(x: Int)(implicit y: Int) = x * y
multiply(3)(10) // 30
multiply(4)(10) // 40
multiply(3) // error: could not find implicit value for parameter factor: Int
implicit val z: Int = 10
multiply(3) // 30
multiply(4) // 40
// IN CASE OF TWO IMPLICIT VARS => ERROR
implicit val z2: Int = 11
multiply(3) // error: ambiguous implicit values:
Implicit Conversions
- (Smith, 2016)* Scala compiler will look for opportunities to do an implicit conversion whenever there is a type mismatch.
def alert(msg: String): Unit = println(msg)
alert(7) // error: type mismatch; found: Int(7); required: String
implicit def intToString(i: Int): String = i.toString
alert(7) // 7
// So what’s actually happening on the last line there is:
alert(intToString(7))
- (Smith, 2016)* The compiler will also look for opportunities to implicitly convert types when code tries to access an object member which is not defined for that type.
3.chat // error: value chat is not a member of Int
class LoquaciousInt(x: Int) {
def chat: Unit = for(i <- 1 to x) println("Hi!")
}
implicit def intToLoquaciousInt(x: Int) = new LoquaciousInt(x)
3.chat
// Hi!
// Hi!
// Hi!
- (Smith, 2016)* Shorthand for the above - implicit class and type enrichment
implicit class LoquaciousInt(x: Int) {
def chat: Unit = for(i <- 1 to x) println("Hi!")
}
3.chat
// Hi!
// Hi!
// Hi!
- (Smith, 2016)* Life with type classes.
// INTERFACE OBJECT SYNTAX
// the type class (actually an i-face)
trait Chat[A] { def chat(x: A): String }
object ChattyAddons {
// the type class instance PersonChat
implicit object PersonChat extends Chat[Person]{
def chat(x: Person) = s"Hi, I'm ${x.firstName}"
}
// the type class instance DogChat
implicit object DogChat extends Chat[Dog]{
def chat(x: Dog) = s"Woof, my name is ${x.name}"
}
}
// the type interface
object ChatUtil{
def chat[A](x: A)(implicit chattyThing: CanChat[A]) = chattyThing.chat(x)
}
// in another package:
import ChattyAddons._
ChatUtil.chat(Person("John", "Smith"))
ChatUtil.chat(Dog("Meg"))
- (Smith, 2016)* … plus implicit conversion
// INTERFACE SYNTAX
object ChattyAddons {
implicit object PersonCanChat extends CanChat[Person] {
def chat(x: Person) = s"Hi, I'm ${x.firstName}"
}
implicit object DogCanChat extends CanChat[Dog] {
def chat(x: Dog) = s"Woof, my name is ${x.name}"
}
implicit class ChatUtil[A](x: A) {
def chat(implicit makesChatty: CanChat[A]) = {
makesChatty.chat(x)
}
}
}
// in another package...
import ChattyAddons._
Person("John", "Smith").chat
Dog("Meg").chat
Functional
- In a functional language, functions are treated the same way as data. They can be stored in objects the same way as integers or strings, returned from functions, and passed to other functions. (In Java) these two entities are separate and treated quite differently by the Java compiler.
- View Bound -
T <% V
requires the existence of an implicit conversion fromT
toV
. A context bound has the formT : M
, whereM
is another generic type. It requires that there is an “implicit value” of typeM[T]
.
class Pair[T <: Comparable[T]](val first: T, val second: T) {
def smaller = if (first.compareTo(second) < 0) first else second }
- View Bound - Type constraints give you another way of restricting types. There are three relationships that you can use:
T =:= U
T <:< U
T <%< U
. These constraints test whetherT
equalsU
, is a subtype ofU
, or is view-convertible toU
. To use such a constraint, you add an “implicit evidence parameter”.
class Pair[T](val first: T, val second: T)(implicit ev: T <:< Comparable[T])
- Structural Type - Under the hood Scala uses reflection. Only use structural typing when you model common behavior from classes that cannot share a trait. In the example you can call the
appendLines
method with an instance of any class that has an append method:
def appendLines(
target: { def append(str: String): Any },
lines: Iterable[String]
)
{for (l <- lines)
{target.append(l);
target.append("\n")
}
}
- Structural types are similar to “duck typing” in dynamically typed programming languages such as JavaScript or Ruby. In those languages, variables have no type. When you write
obj.quack()
, the runtime figures out whether the particular object to which obj refers at this point has a quack method. In other words, you don’t have to declare obj as a Duck as long as it walks and quacks like one. - lambda
val twice: Int => Int = (x: Int) => x * 2
, where the(x: Int)
part is the argument to the lambda, andx * 2
is its body. TheInt => Int
express the type of the lambda.
Partially applied function
- (Sinisa, n.d.) if we have a function that takes three parameters, x, y and z, we can only apply the first one and as a result get a function of two parameters. Or we can apply the first two and get a function of only one parameter. For example, having a function that takes two integers and adds them, we can apply only the first one, e.g. 42, and as a result we will get a function that adds the input number to 42:
// Note that if you don’t explicitly define add42 to be of type Int => Int, you will need to explicitly define the type of the unused parameter: add(42, _: Int).
val add: (Int, Int) => Int = (a: Int, b: Int) => a + b
val add42: Int => Int = add(42, _)
println(add42(8)) // 50
- (Sinisa, n.d.) Currying is a similar principle. Main idea with currying is to separate a function of n parameters into n functions of one parameter… Note that if we curry a function, there’s no need for the underscore thingy; with partially applied functions it’s needed because otherwise compiler will rightfully complain that add() takes two parameters, but with currying we are simply providing only the first parameter, which is completely valid:
val add: Int => Int => Int = (a: Int) => (b: Int) => a + b
val add42: Int => Int = add(42)
println(add42(8)) // 50
val add42and8: Int = add(42)(8)
println(add42and8) // 50
- (Sinisa, n.d.) There is a difference between methods with no parenthesis and methods with empty parenthesis: former are basically values, but re-evaluated upon every access, while latter are methods as we know them. Invoke them as they are defined (if there are empty parenthesis in definition, put them in invocation too; this way you don’t have to remember what happens if f is invoked as f() and vice versa).
Stackable Trait
- (Venners, 2009) Stackable traits decorate the core traits at compile time, similar to the way decorator objects modify core objects at run time in the decorator pattern.
- Order of applying is from rightmost to leftmost.
- The trait has a super call on a method declared
abstract
. Such calls are illegal for normal classes, because they will certainly fail at run time. For a trait, however, such a call can actually succeed. Since super calls in a trait are dynamically bound, the super call in traitFiltering
will work so long as the trait is mixed in after another trait or class that gives a concrete definition to the method. This arrangement is frequently needed with traits that implement stackable modifications. To tell the compiler you are doing this on purpose, you must mark such methods asabstract override
. This combination of modifiers is only allowed for members of traits, not classes, and it means that the trait must be mixed into some class that has a concrete definition of the method in question.
- (Venners, 2009) example
// BASE
abstract class IntQueue { def get(): Int; def put(x: Int) }
// CORE
class BasicIntQueue extends IntQueue{
private val buf = new ArrayBuffer[Int]
def get() = buf.remove(0)
def put(x: Int) { buf += x}
}
// STACKABLE
trait Incrementing extends IntQueue {
abstract override def put(x: Int) { super.put(x+1) }
}
trait Filtering extends IntQueue {
abstract override def put(x: Int) { if(x >= 0) super.put(x) }
}
// USAGE 1 : first filtering is applied, then incrementing
val queue = (new BasicIntQueue with Incrementing with Filtering)
queue.put(-1); queue.put(0); queue.put(1)
queue.get() // 1
queue.get() // 2
// USAGE 2 : reverse order of stackables => first incrementing and then filtering
val queue = (new BasicIntQueue with Filtering with Incrementing)
queue.put(-1); queue.put(0); queue.put(1)
queue.get() // 0
queue.get() // 1
queue.get() // 2
Parallel computations
- For parallelism an unevaluated computation should be passed (call by name)
def myParallel[A, B](taskA: => A, taskB: => B): (A, B) = {...} // by name
def myNonParallel[A, B](taskA: A, taskB: B): (A, B) = {...} // by value
- both are equivalent
val (x, y) = parallel(doSomething(a,b,c,d), doSomething(a,b,c+z,d+z))
power(x + y, 1/p)
val t1 = task{doSomething(a,b,c,d)}
val t2 = task{doSomething(a,b,c+z,y+z)}
power(t1 + t2, 1/p)
- join() call
def parallel[A, B](ca: => A, cb: => B): (A, B) = {
val tB: Task[B] = task{cb}
val tA: A = ca
(ta, tb.join) // correct for parallel
}
def parallelWrong[A, B](ca: => A, cb: => B): (A, B) = {
val tB: Task[B] = task{cb}.join // incorrect for parallel
val tA: A = ca
(ta, tb)
}
- Loop:
for (i <- (0 until 100).par) print(i + " ")
- the numbers are printed in the order they are produced by the threads working on the task. In a for/yield loopfor (i <- (0 until 100).par) yield i + " "
the results are assembled in order. - (Theron, 2016) Operations on parallelizable collections are fast, as the newly created structure from
.par
shares the same underlying data setArray, ArrayBuffer,
mutableHashMap
andHashSet, Range, Vector,
immutableHashMap
andHashSet,
and concurrentTrieMap
. - (Theron, 2016) Other Scala collections need to be converted to their parallel counterparts upon calling
.par
. Callingpar
on non-parallelizable collections entails copying their elements into a new collection. Importantly, the conversion from a sequential collection to a parallel one is not itself parallelized, and is a possible sequential bottleneck.- For example, calling par on List takes 55 milliseconds on our machine, whereas calling par on Vector takes 0.025 milliseconds.
object ParNonParallelizableCollections extends App {
val list = List.fill(1000000)("")
val vector = Vector.fill(1000000)("")
log(s"list conversion time: ${timed(list.par)} ms")
log(s"vector conversion time: ${timed(vector.par)} ms")}
- (Theron, 2016) Tip: Converting a non-parallelizable sequential collection to a parallel collection is not a parallel operation; it executes on the caller thread.
- Tip Treat operations invoked on a generic collection type as if they are parallel.
- (Theron, 2016) Note: Binary operators used in parallel operations do not need to be commutative.
- (Theron, 2016) Tip: Make sure that binary operators used in parallel operations are associative.
- in CATS
|@|
for parallel computations
// sequential
for{
a <- Future(1)
b <- Future(2)
c <- Future(3)
} yield a + b + c
// parallel
import cats.implicits._
(Future(1) |@| Future(2) |@| Future(3)).map( _ + _ + _)
Concurrent
- Use promises to bridge the gap between callback-based APIs and futures.
- Use promises to extend futures with additional functional combinators.
- Use promises to implement cancellation, or any other form of two-way communication between the client and the asynchronous computation.
- (Theron, 2016) The lowest cost development option is to replace unnecessarily costly Future creation with the use of
Future.success
orFuture.failure
. The order submission web service took advantage of these factory methods to lift values into aFuture
. As the value is already computed, these factory methods avoid submitting any tasks to the Executor that are referenced by the providedExecutionContext
.- Replacing usages of
Future.apply
with eitherFuture.successful
orFuture.failure
when the value is already computed can yield cost savings.
- Replacing usages of
- (Theron, 2016) All the callers are importing the global
ExecutionContext
to be implicitly used by the method. The default thread pool is backed by aForkJoinPool
, and it is sized based on the available cores on the machine. As such, it is CPU-bound and designed to handle nonblocking, CPU intensive operations. This is a good choice for applications that do not perform blocking calls. However, if your application runs blocking calls asynchronously (that is, in a Future execution), relying on the defaultExecutionContext
will most likely quickly degrade performance. - (Theron, 2016) Iterators on most concurrent collections are weakly consistent. This means that they are not guaranteed to correctly traverse the data structure if some thread concurrently updates the collection during traversal.
- (Prokopec, 2017) Unlike Java, Scala allows you to declare local fields volatile (in this case, local to the closure of the enclosing for loop). A heap object with a volatile field is created for each local volatile variable used in some closure or a nested class. We say the variable is lifted into an object.
Ordering
import math.Ordering
def msort[T](xs: List[T])(implicit ord: Ordering) = { ...}
msort(fruits)(Ordering.String)
msort(fruits) // the compiler figures out the right ordering
- (Fatin, n.d.) Perform reverse sorting in one step, avoiding tmp collections and additional transformation steps:
// Before
seq.sorted.reverse
seq.sortBy(_.property).reverse
seq.sortWith(f(_, _)).reverse
// After
seq.sorted(Ordering[T].reverse)
seq.sortBy(_.property)(Ordering[T].reverse)
seq.sortWith(!f(_, _))
- (Fatin, n.d.) Do not use sorting to find smallest/largest element, so a tmp collection is not created.:
// Before
seq.sorted.head
seq.sortBy(_.property).head
seq.sorted.last
seq.sortBy(_.property).last
// After
seq.min
seq.minBy(_.property)
seq.max
seq.maxBy(_.property)
Mutators
- At any time, you can redefine the getter and setter methods yourself. For example,
class Person {
private var privateAge = 0 // Make private and rename
def age = privateAge
def age_=(newValue: Int) {
if (newValue > privateAge) privateAge = newValue; // Can’t get younger
}
}
- TIP: It may sound scary that Scala generates getter and setter methods for every field. But you have some control over this process.
- If the field is private, the getter and setter are private.
- If the field is a val, only a getter is generated.
- If you don’t want any getter or setter, declare the field as private[this].
- Overriding restrictions
- A def can only override another def.
- A val can only override another val or a parameterless def.
- A var can only override an abstract var
- TIP: You can debug construction order problems with the
-Xcheckinit
compiler flag. This flag generates code that throws an exception (instead of yielding the default value) when an uninitialized field is accessed.
Laziness
- Laziness is not cost-free. Every time a lazy value is accessed, a method is called that checks, in a threadsafe manner, whether the value has already been initialized.
val words = scala.io.Source.fromFile("/.../words").mkString // Evaluated as soon as words is defined
lazy val words = scala.io.Source.fromFile("/.../words").mkString // Evaluated the first time words is used
def words = scala.io.Source.fromFile("/.../words").mkString // Evaluated every time words is used
- (Prokopec, 2017) Avoid cyclic dependencies between lazy values, as they can cause deadlocks.
- (Prokopec, 2017) Cyclic dependencies between lazy values are unsupported in both sequential and concurrent Scala programs. The difference is that they potentially manifest themselves as deadlocks instead of stack overflows in concurrent programming.
- (Prokopec, 2017) Never invoke blocking operations inside lazy value initialization expressions or singleton object constructors.
- (Prokopec, 2017) It is a good practice to initialize the lazy value with an expression that does not depend on the current state of the program.
Streams
- To avoid Stream memoization, it is good practice to avoid storing a Stream in a val. Using a val creates a permanent reference to the head of the Stream, ensuring that every element that is realized will be cached. If a Stream is defined as a def, it can be garbage collected as soon as it is no longer needed.
Views
- Stream methods are computed lazily, delivering results only when they are needed. You can get a similar effect with other collections by applying the
view
method. This method yields a collection on which methods are applied lazily. For example:val powers = (0 until 1000).view.map(pow(10, _))
yields a collection that is unevaluated. (Unlike a stream, not even the first element is evaluated.) When you callpowers(100)
thenpow(10, 100)
is computed, but the other powers are not. Unlike streams, these views do not cache any values. If you callpowers(100)
again,pow(10, 100)
is recomputed - (Theron, 2016) When a view applies transformations, it applies all transformations to each element rather than applying each transformation to all elements… By applying all transformations in one step, the view is able to return the first two elements without evaluating the entire collection. Here, we see the potential performance gains from view usage due to lazy evaluation.
- (Theron, 2016) When transforming a large collection, 1,000,000 elements in our benchmark, a view is more efficient with an increasing differential as the number of transformations increases. For example, with 1,000,000 elements and two transformations, views deliver approximately triple the throughput of List. In the case of a medium size collection, such as 1,000 elements in this example, this is not as clear-cut. When performing a single transformation, an eager List performs better, while a view delivers better throughput when applying more than one transformation.
- (Theron, 2016) A second axis of performance to consider is the nature of the transformation. Transformations that benefit from early termination (for example, find), benefit strongly from lazy evaluation. This benchmark illustrates that it is important to understand the size of your data and the transformations that you intend to perform.
Collections
- (Fatin, n.d.) Prefer
length
tosize
for arrays.Array.size
is implemented via implicit conversion and intermediate wrapper objects are created every call. - (Fatin, n.d.) Do not compute length for empty check. Use
seq.nonEmpty
andseq.isEmpty
instead. - (Fatin, n.d.) Do not compute full length for matching. Use
seq.lengthCompare(n) > 0
- (Fatin, n.d.) Do not use
exists
for emptyness check. Useseq.nonEmpty
. - (Fatin, n.d.) Do not rely on
==
for arrays compare. Usearray1.sameElements(array2)
.==
is always false for different instances. - (Fatin, n.d.) Do not use
sameElements
for collections compare. Useseq1 == seq2
. - (Fatin, n.d.) Retrieve first/last elements with
seq.head
andseq.last
. - (Fatin, n.d.) Prefer
exists
orcontains
for existence and absence. - (Fatin, n.d.) Use
count(p)
instead offilter(p).length
. The call tofilter
creates an intermediate collection (which is not really needed) that takes heap space and loads GC. - (Fatin, n.d.) Merge consecutive
filter
calls, to avoid tmp collections:
seq.filter(p1).filter(p2) // Before
seq.filter(x => p1(x) && p2(x)) // After
- (Fatin, n.d.) Merge consecutive
map
calls, to avoid tmp collections:
seq.map(f).map(g) // Before
seq.map(f.andThen(g)) // After
- (Fatin, n.d.) !!! Do not create tmp collections… my use
view
when a lot of transformations are to be done.
Options
- (Fatin, n.d.) Do not use
None
andSome
to compare. PreferisEmpty
,isDefined
andcontains()
. - (Fatin, n.d.) Do not compare value with
null
. UseOption(v)
instead ofif(v != null) Some(v) else None
.
Data science & Math APIs
- Spark Notebooks - data manipulation
- Apache Zeppelin - data manipulation
- Saddle - high-perf data manipulation
- Algebird - algebra API
- Factorie - probabilistic models, relational graphs…
- Figaro - probabilistic programming
- Spire - numeric lib with polynomial, real, distributions…
- cfor macro
- Breeze - numeric processing, signal processing
- nettib-java
- Blitz - parallel collections - use for primitive data. It is a lot faster.
- Slick - Functional relational mapping
- Scalaz, Cats, scala-hamsters
- H2O - high perf in-memory compute engine for data analysis
- h20_ai - provides deep learning ???
- Spark + MLib + GraphX
- Kroy - serialization
- Miniboxing - http://scala-miniboxing.org/
References
- Fatin, P. Scala Collections Tips and Tricks. Retrieved January 21, 2021, from https://pavelfatin.com/scala-collections-tips-and-tricks/
- Horstmann, C. (2017). Scala for the impatient. Addison-Wesley.
- Prokopec, A. (2017). Learning concurrent programming in Scala. Packt Publishing.
- Sinisa, L. Sinisa Blog. In Medium. https://medium.com/@sinisalouc
- Smith, J. (2016). Implicits and type classes in Scala. In The Guardian. https://www.theguardian.com/info/developer-blog/2016/dec/22/parental-advisory-implicit-content
- Theron, V. (2016). Scala high performance programming. Packt Publishing Limited.
- Venners, B. (2009). Scala’s Stackable Trait Pattern. https://www.artima.com/scalazine/articles/stackable_trait_pattern.html