Equality

  • Any: == delegates to equals
  • AnyRef:
new String("A") == new String("A")      // true in Scala, false in Java
new String("A").equals(new String("A")) // true in both Scala and Java
new String("A") eq new String("A")      // false in Scala, eq() is from AnyRef

Math

  • map() does not preserve associativity
  • reduce() must be associative
  • floating point is not associative (d1+d2)+d3 != d1+(d2+d3)
  • (Fatin, n.d.) Write pure functions with no side effects. Otherwise, isolate side effects from code.

Notes

  • (Prokopec, 2017) As turning a single field case class into a value class is as trivial as extending the AnyVal trait, we recommend that you always use AnyVal wherever possible. The overhead is quite low, and it generate high benefits in terms of garbage collection’s performance. To learn more about value classes, their limitations, and use cases, you can find detailed descriptions at http://docs.scala-lang.org/overviews/core/value-classes.html
  • [other] functional literal syntax example

    def check[A](condition: Boolean, onTrue: () => A, onFalse: () => A): A = {
      if(condition) onTrue() else onFalse()
    }
    // () => A is a syntactic alias for Funtion0[A]
    check(a < 22, () => println("a"), () => println("b"))
    

Implicit

Implicit Parameters

  • (Smith, 2016)* Implicit parameters are method parameters which do not have to be explicitly passed to the method when it is called. If they’re missing, the compiler will look in the surrounding scope for something that fits the bill.
def multiply(x: Int)(implicit y: Int) = x * y
multiply(3)(10) // 30
multiply(4)(10) // 40
multiply(3) // error: could not find implicit value for parameter factor: Int
implicit val z: Int = 10
multiply(3) // 30
multiply(4) // 40

// IN CASE OF TWO IMPLICIT VARS => ERROR
implicit val z2: Int = 11
multiply(3) // error: ambiguous implicit values:

Implicit Conversions

  • (Smith, 2016)* Scala compiler will look for opportunities to do an implicit conversion whenever there is a type mismatch.
def alert(msg: String): Unit = println(msg)
alert(7) // error: type mismatch; found: Int(7); required: String

implicit def intToString(i: Int): String = i.toString
alert(7) // 7

// So what’s actually happening on the last line there is:
alert(intToString(7))
  • (Smith, 2016)* The compiler will also look for opportunities to implicitly convert types when code tries to access an object member which is not defined for that type.
3.chat // error: value chat is not a member of Int

class LoquaciousInt(x: Int) {
  def chat: Unit = for(i <- 1 to x) println("Hi!")
}

implicit def intToLoquaciousInt(x: Int) = new LoquaciousInt(x)
3.chat
// Hi!
// Hi!
// Hi!
  • (Smith, 2016)* Shorthand for the above - implicit class and type enrichment
implicit class LoquaciousInt(x: Int) {
  def chat: Unit = for(i <- 1 to x) println("Hi!")
}
3.chat
// Hi!
// Hi!
// Hi!
// INTERFACE OBJECT SYNTAX
// the type class (actually an i-face)
trait Chat[A] { def chat(x: A): String }
object ChattyAddons {
  // the type class instance PersonChat
  implicit object PersonChat extends Chat[Person]{
    def chat(x: Person) = s"Hi, I'm ${x.firstName}"
  }
  // the type class instance DogChat
  implicit object DogChat extends Chat[Dog]{
    def chat(x: Dog) = s"Woof, my name is ${x.name}"
  }
}
// the type interface
object ChatUtil{
  def chat[A](x: A)(implicit chattyThing: CanChat[A]) = chattyThing.chat(x)
}
// in another package:
import ChattyAddons._

ChatUtil.chat(Person("John", "Smith"))
ChatUtil.chat(Dog("Meg"))
// INTERFACE SYNTAX
object ChattyAddons {
  implicit object PersonCanChat extends CanChat[Person] {
    def chat(x: Person) = s"Hi, I'm ${x.firstName}"
  }
  implicit object DogCanChat extends CanChat[Dog] {
    def chat(x: Dog) = s"Woof, my name is ${x.name}"
  }
  implicit class ChatUtil[A](x: A) {
    def chat(implicit makesChatty: CanChat[A]) = {
      makesChatty.chat(x)
    }
  }
}

// in another package...

import ChattyAddons._

Person("John", "Smith").chat
Dog("Meg").chat

Functional

  • In a functional language, functions are treated the same way as data. They can be stored in objects the same way as integers or strings, returned from functions, and passed to other functions. (In Java) these two entities are separate and treated quite differently by the Java compiler.
  • View Bound - T <% V requires the existence of an implicit conversion from T to V. A context bound has the form T : M, where M is another generic type. It requires that there is an “implicit value” of type M[T].
class Pair[T <: Comparable[T]](val first: T, val second: T) {
   def smaller = if (first.compareTo(second) < 0) first else second }
  • View Bound - Type constraints give you another way of restricting types. There are three relationships that you can use: T =:= U T <:< U T <%< U. These constraints test whether T equals U, is a subtype of U, or is view-convertible to U. To use such a constraint, you add an “implicit evidence parameter”.
class Pair[T](val first: T, val second: T)(implicit ev: T <:< Comparable[T])
  • Structural Type - Under the hood Scala uses reflection. Only use structural typing when you model common behavior from classes that cannot share a trait. In the example you can call the appendLines method with an instance of any class that has an append method:
def appendLines(
  target: { def append(str: String): Any },
  lines: Iterable[String]
  )
    {for (l <- lines)
      {target.append(l);
       target.append("\n")
      }
    }
  • Structural types are similar to “duck typing” in dynamically typed programming languages such as JavaScript or Ruby. In those languages, variables have no type. When you write obj.quack(), the runtime figures out whether the particular object to which obj refers at this point has a quack method. In other words, you don’t have to declare obj as a Duck as long as it walks and quacks like one.
  • lambda val twice: Int => Int = (x: Int) => x * 2, where the (x: Int) part is the argument to the lambda, and x * 2 is its body. The Int => Int express the type of the lambda.

Partially applied function

  • (Sinisa, n.d.) if we have a function that takes three parameters, x, y and z, we can only apply the first one and as a result get a function of two parameters. Or we can apply the first two and get a function of only one parameter. For example, having a function that takes two integers and adds them, we can apply only the first one, e.g. 42, and as a result we will get a function that adds the input number to 42:
// Note that if you don’t explicitly define add42 to be of type Int => Int, you will need to explicitly define the type of the unused parameter: add(42, _: Int).
val add: (Int, Int) => Int = (a: Int, b: Int) => a + b
val add42: Int => Int = add(42, _)
println(add42(8)) // 50
  • (Sinisa, n.d.) Currying is a similar principle. Main idea with currying is to separate a function of n parameters into n functions of one parameter… Note that if we curry a function, there’s no need for the underscore thingy; with partially applied functions it’s needed because otherwise compiler will rightfully complain that add() takes two parameters, but with currying we are simply providing only the first parameter, which is completely valid:
val add: Int => Int => Int = (a: Int) => (b: Int) => a + b
val add42: Int => Int = add(42)
println(add42(8)) // 50
val add42and8: Int = add(42)(8)
println(add42and8) // 50
  • (Sinisa, n.d.) There is a difference between methods with no parenthesis and methods with empty parenthesis: former are basically values, but re-evaluated upon every access, while latter are methods as we know them. Invoke them as they are defined (if there are empty parenthesis in definition, put them in invocation too; this way you don’t have to remember what happens if f is invoked as f() and vice versa).

Stackable Trait

  • (Venners, 2009) Stackable traits decorate the core traits at compile time, similar to the way decorator objects modify core objects at run time in the decorator pattern.
    • Order of applying is from rightmost to leftmost.
    • The trait has a super call on a method declared abstract. Such calls are illegal for normal classes, because they will certainly fail at run time. For a trait, however, such a call can actually succeed. Since super calls in a trait are dynamically bound, the super call in trait Filtering will work so long as the trait is mixed in after another trait or class that gives a concrete definition to the method. This arrangement is frequently needed with traits that implement stackable modifications. To tell the compiler you are doing this on purpose, you must mark such methods as abstract override. This combination of modifiers is only allowed for members of traits, not classes, and it means that the trait must be mixed into some class that has a concrete definition of the method in question.
  • (Venners, 2009) example
// BASE
abstract class IntQueue { def get(): Int; def put(x: Int) }
// CORE
class BasicIntQueue extends IntQueue{
  private val buf = new ArrayBuffer[Int]
  def get() = buf.remove(0)
  def put(x: Int) { buf += x}
}
// STACKABLE
trait Incrementing extends IntQueue {
  abstract override def put(x: Int) { super.put(x+1) }
}
trait Filtering extends IntQueue {
  abstract override def put(x: Int) { if(x >= 0) super.put(x) }
}
// USAGE 1 : first filtering is applied, then incrementing
val queue = (new BasicIntQueue with Incrementing with Filtering)
queue.put(-1); queue.put(0); queue.put(1)
queue.get() // 1
queue.get() // 2
// USAGE 2 : reverse order of stackables => first incrementing and then filtering
val queue = (new BasicIntQueue with Filtering with Incrementing)
queue.put(-1); queue.put(0); queue.put(1)
queue.get() // 0
queue.get() // 1
queue.get() // 2

Parallel computations

  • For parallelism an unevaluated computation should be passed (call by name)
def myParallel[A, B](taskA: => A, taskB: => B): (A, B) = {...} // by name
def myNonParallel[A, B](taskA: A, taskB: B): (A, B) = {...} // by value
  • both are equivalent
val (x, y) = parallel(doSomething(a,b,c,d), doSomething(a,b,c+z,d+z))
power(x + y, 1/p)
val t1 = task{doSomething(a,b,c,d)}
val t2 = task{doSomething(a,b,c+z,y+z)}
power(t1 + t2, 1/p)
  • join() call
def parallel[A, B](ca: => A, cb: => B): (A, B) = {
  val tB: Task[B] = task{cb}
  val tA: A = ca
    (ta, tb.join) // correct for parallel
}
def parallelWrong[A, B](ca: => A, cb: => B): (A, B) = {
  val tB: Task[B] = task{cb}.join // incorrect for parallel
  val tA: A = ca
  (ta, tb)
}
  • Loop: for (i <- (0 until 100).par) print(i + " ") - the numbers are printed in the order they are produced by the threads working on the task. In a for/yield loop for (i <- (0 until 100).par) yield i + " " the results are assembled in order.
  • (Theron, 2016) Operations on parallelizable collections are fast, as the newly created structure from .par shares the same underlying data set Array, ArrayBuffer, mutable HashMap and HashSet, Range, Vector, immutable HashMap and HashSet, and concurrent TrieMap.
  • (Theron, 2016) Other Scala collections need to be converted to their parallel counterparts upon calling .par. Calling par on non-parallelizable collections entails copying their elements into a new collection. Importantly, the conversion from a sequential collection to a parallel one is not itself parallelized, and is a possible sequential bottleneck.
    • For example, calling par on List takes 55 milliseconds on our machine, whereas calling par on Vector takes 0.025 milliseconds.
object ParNonParallelizableCollections extends App {
    val list = List.fill(1000000)("")
    val vector = Vector.fill(1000000)("")
    log(s"list conversion time: ${timed(list.par)} ms")
    log(s"vector conversion time: ${timed(vector.par)} ms")}
  • (Theron, 2016) Tip: Converting a non-parallelizable sequential collection to a parallel collection is not a parallel operation; it executes on the caller thread.
  • Tip Treat operations invoked on a generic collection type as if they are parallel.
  • (Theron, 2016) Note: Binary operators used in parallel operations do not need to be commutative.
  • (Theron, 2016) Tip: Make sure that binary operators used in parallel operations are associative.
  • in CATS |@| for parallel computations
// sequential
for{
  a <- Future(1)
  b <- Future(2)
  c <- Future(3)
} yield a + b + c
// parallel
import cats.implicits._
(Future(1) |@| Future(2) |@| Future(3)).map( _ + _ + _)

Concurrent

  • Use promises to bridge the gap between callback-based APIs and futures.
  • Use promises to extend futures with additional functional combinators.
  • Use promises to implement cancellation, or any other form of two-way communication between the client and the asynchronous computation.
  • (Theron, 2016) The lowest cost development option is to replace unnecessarily costly Future creation with the use of Future.success or Future.failure. The order submission web service took advantage of these factory methods to lift values into a Future. As the value is already computed, these factory methods avoid submitting any tasks to the Executor that are referenced by the provided ExecutionContext.
    • Replacing usages of Future.apply with either Future.successful or Future.failure when the value is already computed can yield cost savings.
  • (Theron, 2016) All the callers are importing the global ExecutionContext to be implicitly used by the method. The default thread pool is backed by a ForkJoinPool, and it is sized based on the available cores on the machine. As such, it is CPU-bound and designed to handle nonblocking, CPU intensive operations. This is a good choice for applications that do not perform blocking calls. However, if your application runs blocking calls asynchronously (that is, in a Future execution), relying on the default ExecutionContext will most likely quickly degrade performance.
  • (Theron, 2016) Iterators on most concurrent collections are weakly consistent. This means that they are not guaranteed to correctly traverse the data structure if some thread concurrently updates the collection during traversal.
  • (Prokopec, 2017) Unlike Java, Scala allows you to declare local fields volatile (in this case, local to the closure of the enclosing for loop). A heap object with a volatile field is created for each local volatile variable used in some closure or a nested class. We say the variable is lifted into an object.

Ordering

import math.Ordering  

def msort[T](xs: List[T])(implicit ord: Ordering) = { ...}  
msort(fruits)(Ordering.String)  
msort(fruits)   // the compiler figures out the right ordering  
  • (Fatin, n.d.) Perform reverse sorting in one step, avoiding tmp collections and additional transformation steps:
  // Before
  seq.sorted.reverse
  seq.sortBy(_.property).reverse
  seq.sortWith(f(_, _)).reverse

  // After
  seq.sorted(Ordering[T].reverse)
  seq.sortBy(_.property)(Ordering[T].reverse)
  seq.sortWith(!f(_, _))
  • (Fatin, n.d.) Do not use sorting to find smallest/largest element, so a tmp collection is not created.:
  // Before
  seq.sorted.head
  seq.sortBy(_.property).head
  seq.sorted.last
  seq.sortBy(_.property).last

  // After
  seq.min
  seq.minBy(_.property)
  seq.max
  seq.maxBy(_.property)

Mutators

  • At any time, you can redefine the getter and setter methods yourself. For example,
class Person {
  private var privateAge = 0 // Make private and rename
  def age = privateAge
  def age_=(newValue: Int) {
    if (newValue > privateAge) privateAge = newValue; // Can’t get younger
  }
}
  • TIP: It may sound scary that Scala generates getter and setter methods for every field. But you have some control over this process.
    • If the field is private, the getter and setter are private.
    • If the field is a val, only a getter is generated.
    • If you don’t want any getter or setter, declare the field as private[this].
  • Overriding restrictions
    • A def can only override another def.
    • A val can only override another val or a parameterless def.
    • A var can only override an abstract var
  • TIP: You can debug construction order problems with the -Xcheckinit compiler flag. This flag generates code that throws an exception (instead of yielding the default value) when an uninitialized field is accessed.

Laziness

  • Laziness is not cost-free. Every time a lazy value is accessed, a method is called that checks, in a threadsafe manner, whether the value has already been initialized.
val words = scala.io.Source.fromFile("/.../words").mkString // Evaluated as soon as words is defined
lazy val words = scala.io.Source.fromFile("/.../words").mkString // Evaluated the first time words is used
def words = scala.io.Source.fromFile("/.../words").mkString // Evaluated every time words is used
  • (Prokopec, 2017) Avoid cyclic dependencies between lazy values, as they can cause deadlocks.
  • (Prokopec, 2017) Cyclic dependencies between lazy values are unsupported in both sequential and concurrent Scala programs. The difference is that they potentially manifest themselves as deadlocks instead of stack overflows in concurrent programming.
  • (Prokopec, 2017) Never invoke blocking operations inside lazy value initialization expressions or singleton object constructors.
  • (Prokopec, 2017) It is a good practice to initialize the lazy value with an expression that does not depend on the current state of the program.

Streams

  • To avoid Stream memoization, it is good practice to avoid storing a Stream in a val. Using a val creates a permanent reference to the head of the Stream, ensuring that every element that is realized will be cached. If a Stream is defined as a def, it can be garbage collected as soon as it is no longer needed.

Views

  • Stream methods are computed lazily, delivering results only when they are needed. You can get a similar effect with other collections by applying the view method. This method yields a collection on which methods are applied lazily. For example: val powers = (0 until 1000).view.map(pow(10, _)) yields a collection that is unevaluated. (Unlike a stream, not even the first element is evaluated.) When you call powers(100) then pow(10, 100) is computed, but the other powers are not. Unlike streams, these views do not cache any values. If you call powers(100) again, pow(10, 100) is recomputed
  • (Theron, 2016) When a view applies transformations, it applies all transformations to each element rather than applying each transformation to all elements… By applying all transformations in one step, the view is able to return the first two elements without evaluating the entire collection. Here, we see the potential performance gains from view usage due to lazy evaluation.
  • (Theron, 2016) When transforming a large collection, 1,000,000 elements in our benchmark, a view is more efficient with an increasing differential as the number of transformations increases. For example, with 1,000,000 elements and two transformations, views deliver approximately triple the throughput of List. In the case of a medium size collection, such as 1,000 elements in this example, this is not as clear-cut. When performing a single transformation, an eager List performs better, while a view delivers better throughput when applying more than one transformation.
  • (Theron, 2016) A second axis of performance to consider is the nature of the transformation. Transformations that benefit from early termination (for example, find), benefit strongly from lazy evaluation. This benchmark illustrates that it is important to understand the size of your data and the transformations that you intend to perform.

Collections

  • (Fatin, n.d.) Prefer length to size for arrays. Array.size is implemented via implicit conversion and intermediate wrapper objects are created every call.
  • (Fatin, n.d.) Do not compute length for empty check. Use seq.nonEmpty and seq.isEmpty instead.
  • (Fatin, n.d.) Do not compute full length for matching. Use seq.lengthCompare(n) > 0
  • (Fatin, n.d.) Do not use exists for emptyness check. Use seq.nonEmpty.
  • (Fatin, n.d.) Do not rely on == for arrays compare. Use array1.sameElements(array2). == is always false for different instances.
  • (Fatin, n.d.) Do not use sameElements for collections compare. Use seq1 == seq2.
  • (Fatin, n.d.) Retrieve first/last elements with seq.head and seq.last.
  • (Fatin, n.d.) Prefer exists or contains for existence and absence.
  • (Fatin, n.d.) Use count(p) instead of filter(p).length. The call to filter creates an intermediate collection (which is not really needed) that takes heap space and loads GC.
  • (Fatin, n.d.) Merge consecutive filter calls, to avoid tmp collections:
  seq.filter(p1).filter(p2) // Before
  seq.filter(x => p1(x) && p2(x)) // After
  • (Fatin, n.d.) Merge consecutive map calls, to avoid tmp collections:
  seq.map(f).map(g) // Before
  seq.map(f.andThen(g)) // After
  • (Fatin, n.d.) !!! Do not create tmp collections… my use view when a lot of transformations are to be done.

Options

  • (Fatin, n.d.) Do not use None and Some to compare. Prefer isEmpty, isDefined and contains().
  • (Fatin, n.d.) Do not compare value with null. Use Option(v) instead of if(v != null) Some(v) else None.

Data science & Math APIs

  • Spark Notebooks - data manipulation
  • Apache Zeppelin - data manipulation
  • Saddle - high-perf data manipulation
  • Algebird - algebra API
  • Factorie - probabilistic models, relational graphs…
  • Figaro - probabilistic programming
  • Spire - numeric lib with polynomial, real, distributions…
    • cfor macro
  • Breeze - numeric processing, signal processing
  • nettib-java
  • Blitz - parallel collections - use for primitive data. It is a lot faster.
  • Slick - Functional relational mapping
  • Scalaz, Cats, scala-hamsters
  • H2O - high perf in-memory compute engine for data analysis
    • h20_ai - provides deep learning ???
  • Spark + MLib + GraphX
  • Kroy - serialization
  • Miniboxing - http://scala-miniboxing.org/

References

  • Fatin, P. Scala Collections Tips and Tricks. Retrieved January 21, 2021, from https://pavelfatin.com/scala-collections-tips-and-tricks/
  • Horstmann, C. (2017). Scala for the impatient. Addison-Wesley.
  • Prokopec, A. (2017). Learning concurrent programming in Scala. Packt Publishing.
  • Sinisa, L. Sinisa Blog. In Medium. https://medium.com/@sinisalouc
  • Smith, J. (2016). Implicits and type classes in Scala. In The Guardian. https://www.theguardian.com/info/developer-blog/2016/dec/22/parental-advisory-implicit-content
  • Theron, V. (2016). Scala high performance programming. Packt Publishing Limited.
  • Venners, B. (2009). Scala’s Stackable Trait Pattern. https://www.artima.com/scalazine/articles/stackable_trait_pattern.html