Hackathon: Training a Blackjack AI

When I was at AWS re:Invent at the end of 2019, I attended a workshop on using machine learning via Amazon SageMaker to teach an AI how to play blackjack. Seeing as re:Invent was held in Vegas, I decided to take the spirit of Vegas home with me and create my own text-based blackjack game in Go. I added a simple interface so it would be easy to create different AI opponents. I had another hackathon coming up at work, and I thought it would be cool to try to train a model to play a better game of blackjack using SageMaker. This would differ from the workshop, which mostly focused on recognizing a card's rank and suit, whereas I wanted to look at dealer/player hand combinations and retrieve predictions on the outcome of various actions.

My goals for the hackathon were the following:

  • Explore SageMaker's training capabilities
  • Generate copious amounts of blackjack game data
  • Train a model using the game data
  • Use predictions from the model to create a blackjack AI

First, I instrumented my blackjack game to record the dealer and player hands, as well as the player's next move and its outcome. I ran a simulation of 1 million rounds with two AI opponents, one that picks moves randomly and one that plays using the generally accepted best strategy. This resulted in 2,501,256 rows that became my training data. Each row holds the move and its outcome, the dealer's hand, and the player's hand.

STAY_WIN,10 5,7 9 A
STAY_WIN,10 5,9 Q
STAY_LOSS,6 7,4 8
STAY_LOSS,4 A,Q 5
STAY_LOSS,4 A,9 10
HIT_NONE,J 5,Q 2
STAY_WIN,J 5,Q 2 7
DOUBLE_WIN,J 5,5 6
HIT_NONE,8 Q,3 6
HIT_NONE,8 Q,3 6 A
HIT_LOSS,8 Q,3 6 A Q
HIT_LOSS,8 Q,2 J
SPLIT_NONE,3 8,8 8
STAY_LOSS,3 8,8 2
STAY_LOSS,3 8,8 A
DOUBLE_LOSS,3 8,8 2

I realized I had a few small problems with the format of my training data (the values needed to be integers, not strings), so I wrote some code to translate it. Each label becomes a number, and each hand is packed into a single integer with two digits per card.

0,510,110907
0,510,1009
1,706,804
1,1104,510
1,1104,1009
6,510,210
0,510,70210
7,510,605
6,1008,603
6,1008,110603
4,1008,10110603
4,1008,1002
11,803,808
1,803,208
1,803,1108
8,803,208

By following the Getting Started tutorial for SageMaker, I was able to get a model trained quickly.

[Image: sagemaker_training]

Deploying the SageMaker model to an endpoint was just another simple command, and then I was able to see some predictions. I was impressed that the model correctly suggested splitting a pair of aces!

[Image: sagemaker_predict]

The next step was to load the model in Go and get predictions locally. Unfortunately, due to mismatched XGBoost versions, machine differences, and unpickling problems, I was unable to load the model locally. (Serialization in Python is called pickling and unpickling. Why? I don't really know.)

So what did I do? I gave up on SageMaker for the time being and trained a new model on my local machine using straight XGBoost/Python. Fortunately, I was able to use all the same hyperparameters.

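import sys

import numpy as np
import pandas as pd
import xgboost as xgb

# Usage: <script> <training.csv> <model-output-path>
# The training CSV has no header row: label, dealer hand, player hand.
dataset = pd.read_csv(sys.argv[1], header=None)

X = dataset.iloc[:, 1:3].values  # features: encoded dealer and player hands
y = dataset.iloc[:, 0].values    # target: encoded move/outcome label

# The scikit-learn wrapper names some native XGBoost parameters differently:
# learning_rate is eta, n_estimators is num_round, and verbosity replaces
# the deprecated silent flag.
classifier = xgb.XGBClassifier(max_depth=5,
                               learning_rate=.2,
                               min_child_weight=6,
                               verbosity=1,
                               objective="multi:softmax",
                               num_class=12,
                               n_estimators=10)

classifier.fit(X, y)

# Sanity check: dealer holds 3 and 8 (803), player holds a pair of aces (1111).
data = np.array([[803, 1111]])
print(data)
result = classifier.predict(data)
print(result)

classifier.save_model(sys.argv[2])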

The Go packages still had issues loading the model for various reasons, but Python could load the model and predict with ease. So I decided to just call the Python script from Go via the command line (don't do this at home, kids).

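// Predict shells out to the Python script and parses the label it prints.
func Predict(dealer *game.Hand, player *game.Hand) Label {
	// Encode both hands the same way as the training data.
	d := strconv.Itoa(ConvertHand(FormatHand(dealer)))
	p := strconv.Itoa(ConvertHand(FormatHand(player)))

	// The script loads the model from disk on every call.
	cmd := exec.Command("./machine/predict.py", model, d, p)
	out, err := cmd.CombinedOutput()
	if err != nil {
		panic(err)
	}

	// The script's output is expected to be a single integer label.
	n, err := strconv.Atoi(strings.TrimSpace(string(out)))
	if err != nil {
		panic(err)
	}

	return Label(n)
}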

The final step was to create an AI that used the predictions from the trained model. Once the AI was done, I ran a simulation to see how the model performed. Here are the results after 1,000 automated rounds.

Marvin is an AI making decisions using predictions from my model.
Joe plays using the generally accepted best strategy.
Larry is an idiot who picks randomly whether to hit or stay.

Larry (*ai.Random) 
  Win: 313 (%31.3) | Loss: 641 (%64.1) | Tie: 46 (%4.6) | $-4497.50 

Joe (*ai.Standard) 
  Win: 439 (%42.8) | Loss: 507 (%49.5) | Tie: 79 (%7.7) | $-65.00 

Marvin (*ai.Machine) 
  Win: 429 (%41.6) | Loss: 534 (%51.8) | Tie: 68 (%6.6) | $-1100.00 

Remember, Marvin is the one using the trained model. Not bad for a first pass. It also goes to show that gambling is not a viable career move. This concluded the hackathon, but I wasn't quite done yet.

I really wanted to be able to get model predictions using Go code so I could remove the command line Python nonsense, which was really slow because it was loading the model every. single. time. XGBoost had been updated to version 1.0 only a month or two earlier, so most third-party packages were still expecting XGBoost 0.90 models. I downgraded my XGBoost version to 0.90 and retrained the model with the same training data. With a little bit of fiddling, I was able to get dmitryikh/leaves to load the model and return predictions!

This was a huge success and a tremendous breakthrough in speed, but I also wanted Marvin to perform better. I fiddled with the AI a little more and adjusted it to look at multiple predictions instead of just the one with the highest score, and to pick the highest-scored prediction that has a favorable result. This resulted in a jump in how well the AI performed! Across multiple thousand-round simulations, Marvin would even beat the standard AI on some occasions. I ran a simulation with a million rounds and here are the results:

Larry (*ai.Random)
  Win: 319618 (%32.0) | Loss: 639397 (%63.9) | Tie: 40985 (%4.1) | $-4455117.50
Joe (*ai.Standard)
  Win: 439344 (%42.7) | Loss: 505073 (%49.1) | Tie: 84569 (%8.2) | $-15747.50
Marvin (*ai.Machine)
  Win: 448082 (%43.8) | Loss: 496821 (%48.6) | Tie: 77213 (%7.6) | $-276612.50

Marvin still isn't performing quite as well as the standard blackjack strategy (Joe), but he's getting there! He's actually winning more games and losing fewer games than Joe, percentage-wise, but he's still losing more money due to poor double-down decisions, etc. I think with some work, Marvin could become a world champion.

This was a really fun project to work on and it was really satisfying to see good results as it progressed. SageMaker allowed me to quickly get up and running with machine learning and provided access to powerful computing resources. For hefty training jobs, it may prove to be a valuable tool. I recommend checking out the full repo here.

Casino image from bestcasinosites.net/free-images

Stan Nelson

I love programming. I'll create stuff for fun outside of work just because I like figuring out problems and the act of creating. I also enjoy cars, games, hiking, and going on adventures.
