Learning and Building Deep Neural Networks with Kotlin

Part 1

Michal Harakal
Jul 29, 2024

The first article in this series about deep neural network basics offers a unique perspective on the development and application of machine learning (ML) by comparing classic computational methods with modern ML approaches. Using the example of calculating a sine function – a basic trigonometric function – we highlight the difference between the conventional way of solving problems in the classic software approach and how to implement the calculation with deep neural networks. We’ll also implement the example in Kotlin, and in doing so, we’ll consider Kotlin-specific topics.


In 1969, the same year the Apollo 11 mission went down in history as the first manned moon landing, the book “Perceptrons” by Marvin Minsky and Seymour Papert was published [1]. The book proved that single-layer perceptrons are unable to learn the XOR function. This result led to a decline in interest in neural networks that lasted into the 1980s, known as the AI winter. During this period, neural networks were displaced by other techniques such as expert systems and symbolic AI methods. It wasn’t until backpropagation was popularized in the mid-1980s as an efficient way to compute gradients that neural networks experienced a comeback.

Man versus machine

But if we go back to the Apollo mission in 1969, navigation was one of the most important tasks performed by the Apollo Guidance Computer (AGC). This included calculating sine functions. The sine value of an angle in a right-angled triangle in the unit circle corresponds to the length of the opposite side (Fig. 1). There are several methods for calculating this value numerically. Conventional series expansions like the Taylor series are expensive in terms of CPU computing power, while tables of pre-calculated values take up valuable memory space. The AGC, clocked at around 2 MHz, could perform around 85,000 additions per second. Alongside the limited processor speed, the memory size of 2,048 words of RAM and 36,864 words of ROM was another constraint the Apollo engineers had to work around – and they mastered the task brilliantly. A polynomial (Fig. 2) can be used to approximate the sine function accurately within a certain value range [2], [3]. In Figure 2, the difference between the actual sine function in blue and the approximation in red is visible. The approximation closely follows the sine curve in the angular range (0-π/2), but the values diverge outside this range. As long as the actual calculations only take place in that small range, as was the case for the Apollo mission, an otherwise very complex sine function calculation is possible with only a few multiplications and additions.

Fig. 1: Sine in the unit circle

 

Fig. 2: Apollo mission approximation with a polynomial
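To make the idea concrete, here is a small Kotlin sketch that approximates sine with the first three terms of its Taylor series around zero. This is deliberately not the exact polynomial from the AGC source code [2], [3], just an illustration of the principle that a few multiplications and additions are enough near zero, with a growing error outside that range:

import kotlin.math.PI
import kotlin.math.abs
import kotlin.math.sin

// Taylor polynomial sin(x) ≈ x - x^3/6 + x^5/120 – accurate for small angles,
// diverging outside the range it was designed for (just like in Fig. 2)
fun sinApprox(x: Double): Double {
    val x2 = x * x
    return x * (1.0 - x2 / 6.0 + x2 * x2 / 120.0)
}

fun main() {
    for (x in listOf(0.1, PI / 4, PI / 2, PI)) {
        println("x=$x  approx=${sinApprox(x)}  error=${abs(sinApprox(x) - sin(x))}")
    }
}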

 

The Apollo mission developers wonderfully showed how people can solve complex problems. We’re currently living in times when computing power comparable to the Apollo Guidance Computer can be obtained for a few euros with a standard 8-bit microcontroller. Now we want to challenge ourselves a bit and see if calculating the sine function can also be implemented with a deep neural network.

Software 2.0

Solving software tasks with neural networks is often referred to as Software 2.0. The term goes back to a blog post by Andrej Karpathy [4]. To understand what it’s about, let’s first describe the classic software approach. In this approach, rules are defined in the form of source code, and results are calculated from the given data. The result of our work is source code written by us. We’ve been doing it this way for quite a while and it works. Mostly. But it becomes difficult when the rules get more complex, change over time, or when the amount of data becomes too large. In such cases, defining the rules and maintaining the written source code is hard. The Software 2.0 approach turns this around: the rules aren’t defined by hand, but are learned from the given data through a training process. The result is a model (Fig. 3). In our case, the input data is an angle value and the learned rule is the approximated sine function.

Fig. 3: Traditional software development and software 2.0
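Put into deliberately simplified, hypothetical code, the contrast looks roughly like this: sineClassic encodes the rule by hand, while sineLearned only applies a model whose parameters were learned from data (the model parameter stands in for the network we build later in this article):

import kotlin.math.sin

// Software 1.0: we write the rule ourselves as source code
fun sineClassic(angle: Float): Float = sin(angle)

// Software 2.0 (hypothetical sketch): the rule is encoded in learned weights;
// `model` maps an input array (the angle) to an output array (the sine value)
fun sineLearned(angle: Float, model: (FloatArray) -> FloatArray): Float =
    model(floatArrayOf(angle))[0]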

 

Why Kotlin?

Why do we want to implement a neural network in Kotlin at all? Python has already taken its spot in the machine learning field. Python, with all its advantages and deficits, offers a high level of acceptance in the data science and machine learning community and a broad ecosystem of libraries, frameworks, and tools. The sheer amount of learning material in books, blog posts, and videos makes it very easy to get started with the topic. There are also other specialized programming languages suitable for solving certain tasks (C and C++, Rust, Go for fast computation and portability) or that are popular in different circles, like Julia for research. However, the immense effort that Modular is putting into the development of the Mojo programming language [5] proves there’s a need for new programming languages focusing on AI.

Kotlin is a modern programming language developed by JetBrains [6]. It’s a statically typed language that originally targeted the JVM and is fully interoperable with Java. Kotlin offers many advantages over Java, such as a shorter, more concise syntax, improved type safety, and better support for functional programming. Kotlin is also the preferred language for developing Android apps. But it’s not just the syntax that makes Kotlin interesting for machine learning. It also offers excellent support for developing embedded DSLs (domain-specific languages), a feature we can take advantage of when building specialized libraries for machine learning. Kotlin compiler plug-ins are another ace up its sleeve: a powerful mechanism for extending the compiler’s capabilities.
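To give an impression of what such an embedded DSL might look like, here is a hypothetical sketch. The network builder and dense function are invented for this example and are not part of any existing library; they merely show how a lambda with receiver lets configuration code read like a declarative description:

import kotlin.math.tanh

// Hypothetical builder DSL for describing a network – purely illustrative
class NetworkBuilder {
    val layers = mutableListOf<Pair<Int, (Float) -> Float>>()
    fun dense(neurons: Int, activation: (Float) -> Float = { it }) {
        layers += neurons to activation
    }
}

fun network(block: NetworkBuilder.() -> Unit): NetworkBuilder =
    NetworkBuilder().apply(block)

// Reads almost like a configuration file:
val sineNet = network {
    dense(16) { x -> tanh(x) }
    dense(16) { x -> tanh(x) }
    dense(1)
}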

Not to be left behind are Kotlin Multiplatform, Kotlin/Native, and Compose Multiplatform. These technologies may not have a direct relationship to ML models, but they play an important role when it comes to portability, or to applications that combine ML with user interfaces or run on other platforms such as mobile devices or IoT. Kotlin has already proven its worth in the backend and cloud space.

Kotlin expertise is also being directly expanded in the field of data science and machine learning. Besides Kotlin/JVM-native libraries for mathematics, statistics, and plotting, there is Kotlin Jupyter, which brings the interactive notebook environment well-known from the Python world to Kotlin. It combines the flexibility of Jupyter notebooks with the strengths of Kotlin, whether as a plug-in in JetBrains products like IntelliJ IDEA or integrated into cloud products. It is implemented as a Jupyter kernel and is available as a package on PyPI and Conda, so it can be used with Jupyter Notebook or JupyterLab.
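In a Kotlin Jupyter notebook, the libraries used later in this article can be loaded with the kernel’s %use line magic, as shown in the libraries’ documentation [8], [9]:

// Inside a Kotlin Jupyter notebook cell
%use dataframe
%use kandy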

Neural networks

To implement our sine approximation as a deep neural network in Kotlin, we first need to understand the basic concepts. The first is the neuron (Fig. 4). In the broader context of artificial neural networks, a neuron is a general concept and can be part of simple or complex architectures. A neuron mimics the biological neuron in the human brain. The weighted contribution of each input value is summed in the neuron, corrected by a bias value, and passed through an activation function. The result is passed on to all connected neurons. The activation function must not be linear; otherwise, stacked layers would collapse into a single linear transformation and the network could not learn non-linear functions such as sine. We can describe this process with a formula (Fig. 5): y = K(w₁x₁ + w₂x₂ + … + wₙxₙ + b), where K is the activation function and n corresponds to the number of inputs feeding the neuron. We can also convert this formula into Kotlin in a simplified form (Listing 1).

Figure 4: Symbolic illustration of a neuron

 

 

Fig. 5: Neuron as a formula

 

Listing 1

class Neuron(private val weights: FloatArray, private val bias: Float, val activation: (Float) -> Float) {
  // Weighted sum of all inputs plus bias, passed through the activation function
  fun forward(inputs: FloatArray): Float {
    val sum = inputs.foldIndexed(0.0f) { i, acc, input ->
      acc + (input * weights[i])
    }
    return activation(sum + bias)
  }
}
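
As a quick sanity check, a single neuron from Listing 1 can be exercised like this (the weights, bias, and tanh activation are arbitrary example values, not trained ones):

import kotlin.math.tanh

fun main() {
    val neuron = Neuron(
        weights = floatArrayOf(0.5f, -0.25f),
        bias = 0.1f,
        activation = { x -> tanh(x) }
    )
    // weighted sum: 0.5 * 1.0 + (-0.25) * 2.0 + 0.1 = 0.1, so the output is tanh(0.1)
    println(neuron.forward(floatArrayOf(1.0f, 2.0f)))
}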

The layer is another concept describing how neurons are organized. A layer is a group of neurons working together in a neural network and can consist of any number of neurons. From our point of view, it is interesting that – just like a neuron – a layer can also be considered a function. In our later implementation, this function is usually called forward. It takes the input, applies each neuron’s weights and activation, and passes the output on to the neurons of the next layer. By changing the number of layers or choosing different activation functions, we can improve the model’s performance; these parameters are known as hyperparameters. If every neuron in one layer is connected to every neuron in the next layer, we speak of a fully connected layer. Networks built from such layers are also known as feed-forward networks. For example (Fig. 6), consider a small neural network with two inputs, a hidden layer with three neurons, and one output. Figure 7 shows how the values are calculated (simplified without bias this time). This calculation process, also called forward propagation, is an important step in training a neural network: the inputs flow through the network to produce a prediction. Ultimately, the same thing happens during inference, except that we then speak of a finished model with unchanged weights, which makes predictions or classifications based on the inputs.

Fig. 6: Feed-forward network with weights

 

Fig. 7: Calculation formula with weights
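Written out in Kotlin, the forward pass of the 2-3-1 network from Figure 6 looks roughly like the sketch below. It follows the description above (no bias); the concrete weight values and the tanh activation are made up for illustration:

import kotlin.math.tanh

// Forward pass of the small 2-3-1 network, spelled out by hand
fun forward231(x1: Float, x2: Float): Float {
    // hidden layer: each of the three neurons weights both inputs
    val h1 = tanh(0.2f * x1 + 0.8f * x2)
    val h2 = tanh(-0.5f * x1 + 0.3f * x2)
    val h3 = tanh(0.7f * x1 - 0.1f * x2)
    // output layer: one neuron weights the three hidden activations
    return tanh(0.6f * h1 - 0.4f * h2 + 0.9f * h3)
}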

 

With the necessary building blocks in place, we can now build our network for approximating the sine function in Kotlin. It consists of two hidden layers with 16 neurons each. The input layer has only one neuron, which receives the angle value as input. Likewise, the output layer has only one neuron, which returns the approximated sine value.

An implementation in Kotlin is straightforward (Listing 2); a sketch of how such a network could be instantiated follows the listing.

Listing 2

data class Layer(val neurons: List<Neuron>)

class DenseNet(private val layers: List<Layer>) {

  // Forward propagation: the output of each layer becomes the input of the next
  fun forward(inputData: FloatArray): FloatArray {
    var input = inputData
    layers.forEach { layer ->
      if (layer.neurons.isNotEmpty()) {
        val layerResult = FloatArray(layer.neurons.size)
        layer.neurons.forEachIndexed { index, neuron: Neuron ->
          layerResult[index] = neuron.forward(input)
        }
        input = layerResult
      }
    }
    return input
  }
}
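
The 1-16-16-1 network described above could be instantiated roughly like this. The randomLayer helper is a hypothetical convenience for this sketch; it fills the neurons with random, untrained weights, whereas in the GitHub project [10] the weights are learned during training, which the next article covers:

import kotlin.math.tanh
import kotlin.random.Random

// Hypothetical helper: a fully connected layer with random, untrained weights
fun randomLayer(inputs: Int, neurons: Int, activation: (Float) -> Float) =
    Layer(List(neurons) {
        Neuron(
            weights = FloatArray(inputs) { Random.nextFloat() * 2f - 1f },
            bias = 0f,
            activation = activation
        )
    })

fun main() {
    // 1 input -> 16 -> 16 -> 1 output, the architecture described above
    val net = DenseNet(
        listOf(
            randomLayer(1, 16) { x -> tanh(x) },
            randomLayer(16, 16) { x -> tanh(x) },
            randomLayer(16, 1) { x -> x }   // linear output neuron
        )
    )
    println(net.forward(floatArrayOf(0.5f)).toList())
}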

Kotlin and DSL

The ability to define and use DSLs is one of Kotlin’s strengths. (For the curious reader, I recommend doing some research on the topic – there are many sources that cover it in detail.) We’ll now check whether and how well our sine approximation works, and at the same time show how elegantly source code written with a DSL reads (Listing 3). Though it may not look like it at first glance, Listing 3 is valid Kotlin. We use dataframe [8] and kandy [9] for this, two libraries written by JetBrains for data scientists. The result can be seen in Figure 8. The red line represents the deviation between the actual sine value and the value calculated by the neural network. It runs close to zero, indicating that the neural network we wrote from scratch in Kotlin works. The complete implementation can be found on GitHub [10], both as a JVM project and as a Kotlin Jupyter notebook.

Fig. 8: Sine values calculated and approximated

 

Listing 3

df.plot {
  line {
    x("x")
    y("y")
    color("mode") {
      scale = categorical("sin" to Color.PURPLE, 
                          "nn" to Color.ORANGE, 
                          "error" to Color.RED)
    }
    width = 1.5
  }
}
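
For reference, the data frame behind Listing 3 can be assembled roughly as follows. This is a sketch under assumptions: nnSine is a hypothetical stand-in for a forward pass through the trained network, and the snippet assumes the dataFrameOf overload of the dataframe library [8] that accepts column-name-to-list pairs:

import org.jetbrains.kotlinx.dataframe.api.dataFrameOf
import kotlin.math.PI
import kotlin.math.sin

// Hypothetical stand-in: replace with the trained network's forward pass
fun nnSine(x: Double): Double = sin(x)

val xs = (0..200).map { it * PI / 200 }          // angles from 0 to π
val sinY = xs.map { sin(it) }
val nnY = xs.map { nnSine(it) }
val errY = sinY.zip(nnY) { a, b -> a - b }

// Long format: one "mode" column so kandy can color the three curves
val df = dataFrameOf(
    "x" to xs + xs + xs,
    "y" to sinY + nnY + errY,
    "mode" to List(xs.size) { "sin" } + List(xs.size) { "nn" } + List(xs.size) { "error" }
)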

Result

In this article, we took our first steps toward implementing a simple neural network in Kotlin. We looked at the basics and implemented a sine value approximation as a deep neural network. This time, humans clearly won: the Apollo mission’s polynomial solution is superior to our implementation in several respects, requiring far fewer mathematical operations and far less memory than our network, which must store the weights of all its neurons.

The next article in this series will focus on another one of the fundamental functions of neural networks: training.

 

References

[1] Minsky, Marvin; Papert, Seymour: “Perceptrons: An Introduction to Computational Geometry”, 1969

[2] Commented source code: “Apollo 11 implementation of trigonometric functions”: https://fermatslibrary.com/s/apollo-11-implementation-of-trigonometric-functions

[3] Source code: “Apollo 11 implementation of trigonometric functions”: https://github.com/chrislgarry/Apollo-11/blob/27e2acf88a6345e2b1064c8b006a154363937050/Luminary099/SINGLE_PRECISION_SUBROUTINES.agc

[4] Karpathy, Andrej: “Software 2.0”: https://karpathy.medium.com/software-2-0-a64152b37c35

[5] https://www.modular.com/max/mojo

[6] https://kotlinlang.org

[7] https://github.com/michalharakal/ml-magazine-articles/tree/main/SinusNNFromScratch

[8] https://kotlin.github.io/dataframe/overview.html

[9] https://kotlin.github.io/kandy/welcome.html

[10] https://github.com/michalharakal/ml-magazine-articles/tree/main/SinusNNFromScratch
