Data Science In The Future: Learn data science with Go!
Start learning data science with Go, but should you?
Go is a statically typed language that is far less commonly used for data science than Python or R. But it is worth trying. So let’s figure out how you can do data science in Go.
Requirements
You will need to have Go installed on your machine.
Getting Started
To get started, we create a new project. Create a new file called main.go and enter the following starter code:
package main

import "fmt"

func main() {
	// A placeholder line so the fmt import is used and the file compiles.
	fmt.Println("ready")
}
And that’s all you need to get started. Run it with go run main.go to check that it compiles and runs without errors. Note that Go refuses to compile a file with unused imports, so we only import packages as we need them.
Getting the Data
Most crash courses on data science in Python or R use the Iris dataset, and we will use it here as well.
Go’s standard library has everything needed to open a file, so we use the os package to open the CSV. Parsing the contents comes in the next step. To open the file, enter the following code into main.go:
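package main

import (
	"log"
	"os"
)

func main() {
	// Open the CSV file; the path is relative to where you run the program.
	iris, err := os.Open("data/iris.csv")
	if err != nil {
		log.Fatal(err)
	}
	// Close the file when main returns. This also counts as using iris,
	// which Go requires for every declared variable.
	defer iris.Close()
}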
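For readers curious what parsing the CSV looks like without a DataFrame library, here is a minimal sketch using the standard encoding/csv package. The parseCSV helper and the sampleCSV constant are hypothetical names introduced for illustration; the three data rows are the first rows of the Iris file, embedded as a string so the example runs without data/iris.csv.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"log"
	"strings"
)

// sampleCSV is a small excerpt in the same layout as the Iris file
// (assumption: a header row followed by comma-separated values).
const sampleCSV = `sepal.length,sepal.width,petal.length,petal.width,variety
5.1,3.5,1.4,0.2,Setosa
4.9,3.0,1.4,0.2,Setosa
4.7,3.2,1.3,0.2,Setosa
`

// parseCSV splits CSV text into a header row and the remaining data rows.
// csv.NewReader accepts any io.Reader, so the file returned by os.Open
// could be passed to it in the same way as the strings.Reader used here.
func parseCSV(text string) (header []string, rows [][]string, err error) {
	records, err := csv.NewReader(strings.NewReader(text)).ReadAll()
	if err != nil {
		return nil, nil, err
	}
	if len(records) == 0 {
		return nil, nil, fmt.Errorf("no records found")
	}
	return records[0], records[1:], nil
}

func main() {
	header, rows, err := parseCSV(sampleCSV)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("columns:", header)
	fmt.Println("data rows:", len(rows))
}
```

Every cell comes back as a string here, which is one reason a DataFrame library is convenient: it handles the typing of columns for you.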
Processing the Data
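If you’ve done any work with Python or R for data science, you are certainly familiar with the concept of a DataFrame. To use DataFrames in Go, we use the dataframe package from the gota library (it previously lived at github.com/kniren/gota). If your project is not a Go module yet, run go mod init with a module name of your choice first. Then install the package:
go get github.com/go-gota/gota/dataframe
And now we can include it in our imports, using its full import path:
import (
	"fmt"
	"log"
	"os"

	"github.com/go-gota/gota/dataframe"
)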
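To actually convert the data to a DataFrame, use the following line of code:
df := dataframe.ReadCSV(iris)
Now print df to the console. Use fmt.Println from the "fmt" package (it must be in your import list), because Go’s built-in print is only meant for basic types and cannot print a DataFrame:
fmt.Println(df)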
This will output:
[150x5] DataFrame
sepal.length sepal.width petal.length petal.width variety
0: 5.100000 3.500000 1.400000 0.200000 Setosa
1: 4.900000 3.000000 1.400000 0.200000 Setosa
2: 4.700000 3.200000 1.300000 0.200000 Setosa
3: 4.600000 3.100000 1.500000 0.200000 Setosa
4: 5.000000 3.600000 1.400000 0.200000 Setosa
5: 5.400000 3.900000 1.700000 0.400000 Setosa
6: 4.600000 3.400000 1.400000 0.300000 Setosa
7: 5.000000 3.400000 1.500000 0.200000 Setosa
8: 4.400000 2.900000 1.400000 0.200000 Setosa
9: 4.900000 3.100000 1.500000 0.100000 Setosa
... ... ... ... ...
<float> <float> <float> <float> <string>
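The <float> and <string> labels in the last line of the output show that every column ends up with a type. As a rough illustration of what that conversion involves for one numeric column, here is a standard-library sketch. The parseFloatColumn and mean helpers are hypothetical and are not how gota implements it; the cell values are the first five sepal.length entries from the output above.

```go
package main

import (
	"fmt"
	"log"
	"strconv"
)

// parseFloatColumn converts the string cells of one column to float64 values.
// It stops at the first cell that is not a valid number.
func parseFloatColumn(cells []string) ([]float64, error) {
	values := make([]float64, 0, len(cells))
	for i, cell := range cells {
		v, err := strconv.ParseFloat(cell, 64)
		if err != nil {
			return nil, fmt.Errorf("row %d: %w", i, err)
		}
		values = append(values, v)
	}
	return values, nil
}

// mean returns the arithmetic mean of values, or 0 for an empty slice.
func mean(values []float64) float64 {
	if len(values) == 0 {
		return 0
	}
	sum := 0.0
	for _, v := range values {
		sum += v
	}
	return sum / float64(len(values))
}

func main() {
	// Raw CSV cells are strings until they are parsed.
	cells := []string{"5.1", "4.9", "4.7", "4.6", "5.0"}
	values, err := parseFloatColumn(cells)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("parsed %d values, mean %.2f\n", len(values), mean(values))
}
```

Because Go is statically typed, the conversion from string to float64 has to be spelled out somewhere, and a failed conversion surfaces as an explicit error rather than a silent mixed-type column.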
Using the Data
Reading data from a CSV file is nice, but now we want to actually do something with this data frame, right?
Getting The Head of the Data
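To pick rows out of the data frame, use the Subset method and pass it the indexes of the rows you want:
head := df.Subset([]int{0, 3})
fmt.Println(head)
Note that this selects the rows at indexes 0 and 3, not a range. To get the first three rows, you would pass []int{0, 1, 2} instead.
In the console, this looks like this: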
[2x5] DataFrame
sepal.length sepal.width petal.length petal.width variety
0: 5.100000 3.500000 1.400000 0.200000 Setosa
1: 4.600000 3.100000 1.500000 0.200000 Setosa
<float> <float> <float> <float> <string>
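To make the difference between selecting rows by index and taking the first n rows concrete, here is a small standard-library sketch. The subsetRows and headIndexes helpers are hypothetical and only mimic index-based selection on plain slices; they are not part of gota. The sample rows are the first four Iris rows shown earlier.

```go
package main

import (
	"fmt"
	"log"
)

// subsetRows returns the rows at the given indexes, in the order requested.
func subsetRows(rows [][]string, indexes []int) ([][]string, error) {
	out := make([][]string, 0, len(indexes))
	for _, i := range indexes {
		if i < 0 || i >= len(rows) {
			return nil, fmt.Errorf("row index %d out of range [0, %d)", i, len(rows))
		}
		out = append(out, rows[i])
	}
	return out, nil
}

// headIndexes builds the index list 0, 1, ..., n-1, capped at total rows.
func headIndexes(n, total int) []int {
	if n > total {
		n = total
	}
	if n < 0 {
		n = 0
	}
	indexes := make([]int, n)
	for i := range indexes {
		indexes[i] = i
	}
	return indexes
}

func main() {
	rows := [][]string{
		{"5.1", "3.5", "1.4", "0.2", "Setosa"},
		{"4.9", "3.0", "1.4", "0.2", "Setosa"},
		{"4.7", "3.2", "1.3", "0.2", "Setosa"},
		{"4.6", "3.1", "1.5", "0.2", "Setosa"},
	}

	// Indexes 0 and 3 pick exactly two rows, not a range.
	picked, err := subsetRows(rows, []int{0, 3})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("rows 0 and 3:", picked)

	// The first three rows need the indexes 0, 1 and 2.
	first, err := subsetRows(rows, headIndexes(3, len(rows)))
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("first three:", first)
}
```

A slice built the way headIndexes builds it should also be usable as the argument to Subset, though that is an assumption worth checking against the gota documentation.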
If Go can handle data manipulation decently, it might be worth exploring further, given the advantages it has over Python outside of data science.
Let’s say you’re only interested in the virginica species and want to store those rows in a separate variable. Here’s how you would do that:
virginica := df.Filter(dataframe.F{
	Colname:    "variety",
	Comparator: "==",
	Comparando: "Virginica",
})
fmt.Println(virginica)
With the following output:
[50x5] DataFrame
sepal.length sepal.width petal.length petal.width variety
0: 6.300000 3.300000 6.000000 2.500000 Virginica
1: 5.800000 2.700000 5.100000 1.900000 Virginica
2: 7.100000 3.000000 5.900000 2.100000 Virginica
3: 6.300000 2.900000 5.600000 1.800000 Virginica
4: 6.500000 3.000000 5.800000 2.200000 Virginica
5: 7.600000 3.000000 6.600000 2.100000 Virginica
6: 4.900000 2.500000 4.500000 1.700000 Virginica
7: 7.300000 2.900000 6.300000 1.800000 Virginica
8: 6.700000 2.500000 5.800000 1.800000 Virginica
9: 7.200000 3.600000 6.100000 2.500000 Virginica
... ... ... ... ...
<float> <float> <float> <float> <string>
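The filter above keeps every row whose variety column equals a given value. As a sketch of the same idea on plain slices, the following standard-library example does an equality filter by hand. The filterEq helper is hypothetical and is not gota’s implementation; the sample rows are taken from the outputs shown in this article.

```go
package main

import (
	"fmt"
	"log"
)

// filterEq keeps the rows whose value in the named column equals want.
func filterEq(header []string, rows [][]string, col, want string) ([][]string, error) {
	// Find the position of the column in the header.
	colIdx := -1
	for i, name := range header {
		if name == col {
			colIdx = i
			break
		}
	}
	if colIdx < 0 {
		return nil, fmt.Errorf("column %q not found", col)
	}

	var out [][]string
	for _, row := range rows {
		// Skip short rows instead of panicking on a missing cell.
		if colIdx < len(row) && row[colIdx] == want {
			out = append(out, row)
		}
	}
	return out, nil
}

func main() {
	header := []string{"sepal.length", "sepal.width", "petal.length", "petal.width", "variety"}
	rows := [][]string{
		{"5.1", "3.5", "1.4", "0.2", "Setosa"},
		{"6.3", "3.3", "6.0", "2.5", "Virginica"},
		{"4.9", "3.0", "1.4", "0.2", "Setosa"},
		{"5.8", "2.7", "5.1", "1.9", "Virginica"},
	}

	virginica, err := filterEq(header, rows, "variety", "Virginica")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("matching rows:", len(virginica))
}
```

Note that the comparison is case-sensitive, so "virginica" would match nothing in this data.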
Conclusion
To conclude this article about trying Go for data science: it is possible, but I would recommend Python. It is much faster to get results with and a lot easier.
Python has a large ecosystem built for data science and Go just doesn’t, so don’t use the wrong tool.
If you’re just starting to learn data science, I would not recommend starting with Go. Data science is difficult enough on its own, and Go adds the overhead of static typing, which Python does not have.