Data Science In The Future: Learn data science with Go!

Start learning data science with Go, but should you?

Go is a statically typed language and is not nearly as commonly used for Data Science as Python or R. Still, it is worth trying, so let's figure out how you can do Data Science in Go.

Requirements

You will need to have Go installed on your machine.

Getting Started

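To get started, create a new project directory and initialize a Go module in it with go mod init followed by a module name of your choice. The module is needed later to install third-party packages. Then create a file called main.go and enter the following starter code: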

package main

import "fmt"

func main() {
	// Print a message so we can see that the program compiled and ran.
	fmt.Println("Ready for some data science!")
}

That's all you need to get started. Run it with go run main.go to check that it compiles and runs without errors.

Getting the Data

Most crash courses on Data Science in Python or R use the Iris dataset, and we will use it here as well. Save it as data/iris.csv inside your project directory so the code below can find it.

Go's standard library covers file access through the os package and CSV parsing through the encoding/csv package. For now we only need os to open the file, so enter the following code into main():

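package main

import (
	"log"
	"os"
)

func main() {
	// Open the Iris CSV file; stop the program if it cannot be opened.
	iris, err := os.Open("data/iris.csv")
	if err != nil {
		log.Fatal(err)
	}
	// Close the file when main returns.
	defer iris.Close()
}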

Processing the Data

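If you've done any work with Python or R for data science, you are almost certainly familiar with the concept of a DataFrame. In Go, DataFrames are provided by the dataframe package of the Gota library, so let's install it from inside the project directory:

go get github.com/go-gota/gota/dataframe

(Gota originally lived at github.com/kniren/gota; the path above is the one the project uses today.)

And now we can include it in our imports, together with fmt, which we will use for printing:

import (
	"fmt"
	"log"
	"os"

	"github.com/go-gota/gota/dataframe"
)

To convert the opened file into a DataFrame, add the following line to main() after the error check:

df := dataframe.ReadCSV(iris)

Now print df to the console:

fmt.Println(df)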

This will output:

[150x5] DataFrame
sepal.length sepal.width petal.length petal.width variety
 0: 5.100000     3.500000    1.400000     0.200000    Setosa
 1: 4.900000     3.000000    1.400000     0.200000    Setosa
 2: 4.700000     3.200000    1.300000     0.200000    Setosa
 3: 4.600000     3.100000    1.500000     0.200000    Setosa
 4: 5.000000     3.600000    1.400000     0.200000    Setosa
 5: 5.400000     3.900000    1.700000     0.400000    Setosa
 6: 4.600000     3.400000    1.400000     0.300000    Setosa
 7: 5.000000     3.400000    1.500000     0.200000    Setosa
 8: 4.400000     2.900000    1.400000     0.200000    Setosa
 9: 4.900000     3.100000    1.500000     0.100000    Setosa
    ...          ...         ...          ...         ...
    <float>      <float>     <float>      <float>     <string>
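The last line of the output shows the type Gota assigned to each column: the four measurements became <float> and variety became <string>. The sketch below illustrates one simple way such detection can work, assuming a column counts as a float only if every value parses with strconv.ParseFloat. It uses a hypothetical inferType helper and is a simplified illustration, not Gota's actual implementation, which also recognizes other types such as int and bool.

```go
package main

import (
	"fmt"
	"strconv"
)

// inferType is a hypothetical helper: it reports "float" when every value in
// the column parses as a 64-bit float, and falls back to "string" otherwise.
// An empty column is treated as "float" by this simplified rule.
func inferType(column []string) string {
	for _, v := range column {
		if _, err := strconv.ParseFloat(v, 64); err != nil {
			return "string"
		}
	}
	return "float"
}

func main() {
	sepalLength := []string{"5.1", "4.9", "4.7"}
	variety := []string{"Setosa", "Setosa", "Setosa"}
	fmt.Println(inferType(sepalLength)) // float
	fmt.Println(inferType(variety))     // string
}
```

A single non-numeric value is enough to turn a whole column into strings, which is worth remembering when a CSV file contains placeholders such as "NA".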

Using the Data

Reading data from a CSV file is nice and all, but what we really want is to do something with this DataFrame, right?

Getting The Head of the Data

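To take a subset of rows, such as the head of the DataFrame, use the Subset method, which takes a slice of row indices:

head := df.Subset([]int{0, 3})
fmt.Println(head)

Note that []int{0, 3} is a list of positions, not a range: it selects rows 0 and 3, so the result has two rows. To get the actual first three rows, pass []int{0, 1, 2} instead.

For []int{0, 3}, the console output looks like this: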

[2x5] DataFrame
sepal.length sepal.width petal.length petal.width variety
 0: 5.100000     3.500000    1.400000     0.200000    Setosa
 1: 4.600000     3.100000    1.500000     0.200000    Setosa
    <float>      <float>     <float>      <float>     <string>
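Because Subset picks rows by position, it behaves much like indexing with a list of row numbers. The following sketch reproduces that idea with plain slices, using a hypothetical subsetRows helper and the first four rows of the measurements. It is not Gota's code, just an illustration of why []int{0, 3} yields two rows while []int{0, 1, 2} yields the first three.

```go
package main

import "fmt"

// subsetRows is a hypothetical helper that returns the rows at the given
// positions, in the order the positions are listed. The rows themselves are
// not copied, so the result shares memory with the input.
func subsetRows(rows [][]float64, indexes []int) [][]float64 {
	out := make([][]float64, 0, len(indexes))
	for _, i := range indexes {
		out = append(out, rows[i])
	}
	return out
}

func main() {
	// The first four rows of the Iris measurements (variety column omitted).
	rows := [][]float64{
		{5.1, 3.5, 1.4, 0.2},
		{4.9, 3.0, 1.4, 0.2},
		{4.7, 3.2, 1.3, 0.2},
		{4.6, 3.1, 1.5, 0.2},
	}
	fmt.Println(subsetRows(rows, []int{0, 3}))    // rows 0 and 3 only
	fmt.Println(subsetRows(rows, []int{0, 1, 2})) // the first three rows
}
```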

If Go handles data manipulation decently, it may be worth exploring further, given the advantages it holds over Python outside of data science, such as static typing, compiled binaries, and built-in concurrency.

Let's say you're only interested in the Virginica species and want to store those rows in a separate variable. Here's how you would do that with the Filter method:

virginica := df.Filter(dataframe.F{
 Colname:    "variety",
 Comparator: "==",
 Comparando: "Virginica",
})
fmt.Println(virginica)

With the following output:

[50x5] DataFrame
sepal.length sepal.width petal.length petal.width variety
 0: 6.300000     3.300000    6.000000     2.500000    Virginica
 1: 5.800000     2.700000    5.100000     1.900000    Virginica
 2: 7.100000     3.000000    5.900000     2.100000    Virginica
 3: 6.300000     2.900000    5.600000     1.800000    Virginica
 4: 6.500000     3.000000    5.800000     2.200000    Virginica
 5: 7.600000     3.000000    6.600000     2.100000    Virginica
 6: 4.900000     2.500000    4.500000     1.700000    Virginica
 7: 7.300000     2.900000    6.300000     1.800000    Virginica
 8: 6.700000     2.500000    5.800000     1.800000    Virginica
 9: 7.200000     3.600000    6.100000     2.500000    Virginica
    ...          ...         ...          ...         ...
    <float>      <float>     <float>      <float>     <string>
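Conceptually, the filter above keeps every row whose variety value equals "Virginica". Here is a minimal standard-library sketch of that idea over CSV-style string records, using a hypothetical filterEq helper that only handles equality. Gota's Filter also supports other comparators, such as "<" and ">", which this sketch leaves out.

```go
package main

import "fmt"

// filterEq is a hypothetical helper that keeps the records whose value in
// column col equals want. It does not modify the input slice.
func filterEq(records [][]string, col int, want string) [][]string {
	var out [][]string
	for _, rec := range records {
		if rec[col] == want {
			out = append(out, rec)
		}
	}
	return out
}

func main() {
	// A few Iris rows; column 4 holds the variety.
	records := [][]string{
		{"5.1", "3.5", "1.4", "0.2", "Setosa"},
		{"6.3", "3.3", "6.0", "2.5", "Virginica"},
		{"5.8", "2.7", "5.1", "1.9", "Virginica"},
	}
	virginica := filterEq(records, 4, "Virginica")
	fmt.Println(len(virginica)) // 2
}
```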

Conclusion

To conclude this article about trying Go for data science: using Go for Data Science is possible, but I would still recommend Python, since it is much faster to work with and a lot easier.

Python has a mature ecosystem built for Data Science (pandas, NumPy, scikit-learn), while Go's is much smaller, so pick the right tool for the job.

If you're just starting to learn Data Science, I would not recommend using Go, because the language itself adds difficulty: it is statically typed, while Python is not.

Originally Published on Medium