FizzBuzz with Deep Learning

The somewhat different aptitude test

Every computer scientist knows it, and most have implemented it themselves: FizzBuzz, the classic aptitude test that actually serves as a mathematics exercise for primary school pupils. But how does FizzBuzz behave when Deep Learning is applied to it? For those who have not yet had the good fortune to come across FizzBuzz, here is the task again:

Write a program that receives integers as input. For multiples of three the program should output "Fizz" and for multiples of five "Buzz". For numbers that are multiples of both three and five, "FizzBuzz" is to be output. For all other numbers, "other" is to be output.

A typical solution for the task looks like this in Python:

def fizz_buzz(n):
    if n % 3 == 0 and n % 5 == 0:
        return "FizzBuzz"
    elif n % 5 == 0:
        return "Buzz"
    elif n % 3 == 0:
        return "Fizz"
    return "other"

Generate data set

To solve this task with Deep Learning, we first need a suitable data set. With a slightly modified version of the FizzBuzz implementation above, we can easily generate the corresponding label for each number as a one-hot vector.

def label_of_int(n):
    if n % 3 == 0 and n % 5 == 0:
        return [0, 0, 0, 1]  # fizzbuzz
    elif n % 5 == 0:
        return [0, 0, 1, 0]  # buzz
    elif n % 3 == 0:
        return [0, 1, 0, 0]  # fizz
    else:
        return [1, 0, 0, 0]  # other number
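To illustrate the label layout, the following minimal sketch builds the labels for a whole range of numbers and prints a few of them. The name `labels` and the range 0 to 1023 are assumptions made for this example; the function is repeated so that the snippet runs on its own.

```python
def label_of_int(n):
    """One-hot label in the order [other, fizz, buzz, fizzbuzz]."""
    if n % 3 == 0 and n % 5 == 0:
        return [0, 0, 0, 1]  # fizzbuzz
    elif n % 5 == 0:
        return [0, 0, 1, 0]  # buzz
    elif n % 3 == 0:
        return [0, 1, 0, 0]  # fizz
    else:
        return [1, 0, 0, 0]  # other number


# Assumption: the data set covers the 1024 integers 0..1023.
all_numbers = list(range(1024))
labels = [label_of_int(n) for n in all_numbers]

# Every label has exactly one active position.
assert all(sum(label) == 1 for label in labels)

for n in (7, 9, 10, 15):
    print(n, labels[n])
```

The position of the 1 is the class index, which is also the key used for that class in class_weight.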

The binary representation of the numbers is used as input for the neural network. To do this, we convert all numbers from 0 to 1023 into their 10-bit binary representation with the following function:

def binary_of_int(n):
    binary = []
    for i in range(10):  # 10 bits cover the numbers 0 to 1023
        binary.append((n >> i) & 1)  # bit i of n, lowest bit first
    return binary
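A short sketch of what this encoding produces may help. It uses an equivalent list comprehension; the `num_bits` parameter is an assumption added for illustration and is not part of the original function.

```python
def binary_of_int(n, num_bits=10):
    """Return the num_bits lowest bits of n, least significant bit first."""
    return [(n >> i) & 1 for i in range(num_bits)]


print(binary_of_int(5))     # bits of 5 = 0b101, lowest bit first
print(binary_of_int(1023))  # largest value that fits into 10 bits
print(binary_of_int(1024))  # bit 10 is cut off
```

With 10 bits, 1024 gets the same encoding as 0, so the network can only tell apart the numbers from 0 to 1023.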

Adjust class weights

This gives us the required features (the binary representation of the numbers) and the associated labels ("Fizz", "Buzz", "FizzBuzz", "other number"), so we could start training right away. However, the labels are not evenly distributed over the range 0 to 1023: most numbers fall into "other number", while "FizzBuzz" is rare. To compensate for this imbalance, we pass class_weight when fitting the model in Keras, so that underrepresented classes are weighted more heavily during training.

total_count = len(all_numbers)
class_weight = {
    0: total_count / count_other_number(all_numbers),
    1: total_count / count_fizz(all_numbers),
    2: total_count / count_buzz(all_numbers),
    3: total_count / count_fizzbuzz(all_numbers),
}
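To make the imbalance concrete, here is a self-contained sketch that computes the same kind of weights with a Counter instead of the count helpers, whose implementation is not shown here. The names `class_index` and `CLASS_NAMES` and the range 0 to 1023 are assumptions for this example.

```python
from collections import Counter

CLASS_NAMES = ["other", "fizz", "buzz", "fizzbuzz"]  # index order of the labels


def class_index(n):
    """Class index of n, matching the one-hot label positions."""
    if n % 15 == 0:
        return 3  # fizzbuzz
    if n % 5 == 0:
        return 2  # buzz
    if n % 3 == 0:
        return 1  # fizz
    return 0      # other number


# Assumption: the data set covers the 1024 integers 0..1023.
all_numbers = range(1024)
counts = Counter(class_index(n) for n in all_numbers)
total_count = len(all_numbers)

# Rare classes get a larger weight: total / count of that class.
class_weight = {index: total_count / counts[index] for index in range(4)}

for index, name in enumerate(CLASS_NAMES):
    print(f"{name:9s} count={counts[index]:4d} weight={class_weight[index]:.2f}")
```

Under these assumptions, "other" covers a bit more than half of the data set, while "fizzbuzz" occurs only 69 times and therefore receives by far the largest weight.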

Network architecture

Next, let's look at the network architecture. Our data set consists of the numbers from 0 to 1023 in binary representation. The largest of them, 1023, needs 10 bits, so the input layer of our neural network has 10 neurons. It is followed by two pairs of layers. The first pair consists of a hidden layer with 128 neurons and relu as activation function, followed by a dropout layer with a dropout rate of 0.3. The second pair differs only in its dropout rate of 0.2 instead of 0.3. The output layer has 4 neurons with softmax as activation function, which enables the classification into the 4 label classes. We use sgd (stochastic gradient descent) as optimiser and categorical cross-entropy as loss function. To find the best hyperparameters, see our blog post on how to optimise a neural network.

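# 10 input neurons for the 10 bits we need to
# represent numbers up to 1023
input_layer = l.Input(shape=(config.NUM_DIGITS_INPUT,))

hidden_layer = l.Dense(units=128, activation='relu')(input_layer)
hidden_layer = l.Dropout(0.3)(hidden_layer)

hidden_layer = l.Dense(units=128, activation='relu')(hidden_layer)
hidden_layer = l.Dropout(0.2)(hidden_layer)

# 4 output neurons for: fizzbuzz | fizz | buzz | other number
output_layer = l.Dense(units=config.NUM_CLASSES_OUTPUT,
                       activation='softmax')(hidden_layer)

model = Model(inputs=[input_layer], outputs=[output_layer])

# optimiser and loss as described in the text; the accuracy metric is an assumption
model.compile(optimizer='sgd', loss='categorical_crossentropy',
              metrics=['accuracy'])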
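As a rough sanity check on the size of this architecture, the number of trainable parameters can be worked out by hand. A fully connected layer with n_in inputs and n_out units has n_in * n_out weights plus n_out biases, and dropout layers add no parameters. The sketch below assumes the layer widths given in the text; `layer_sizes` and `dense_params` are names chosen for this example.

```python
# Layer widths as described in the text: 10 inputs, two hidden layers
# with 128 units each, 4 output classes.
layer_sizes = [10, 128, 128, 4]


def dense_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


params_per_layer = [
    dense_params(n_in, n_out)
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
]
total_params = sum(params_per_layer)

print(params_per_layer)
print(total_params)
```

That is a little over 18,000 parameters for a data set of roughly a thousand samples, which is one reason why dropout is useful here.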



We then train the network for 20,000 epochs with our calculated class_weight and a validation split of 0.2, and then save the model.

model.fit(
    # features, labels and the remaining arguments are omitted here
    batch_size=config.BATCH_SIZE,  # 256
)
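Keras documents that validation_split takes the validation samples from the end of the data passed to fit, before any shuffling. The following sketch shows what a split of 0.2 means in numbers, assuming 1024 samples; the exact rounding used inside Keras is an assumption here, and the variable names are chosen for this example.

```python
import math

num_samples = 1024        # assumption: one sample per integer 0..1023
validation_split = 0.2

# Index at which the data is cut into a training and a validation part.
split_at = math.floor(num_samples * (1.0 - validation_split))

samples = list(range(num_samples))
train_part = samples[:split_at]
validation_part = samples[split_at:]

print(len(train_part), len(validation_part))
```

If the samples were passed in ascending order, the validation set would consist of the largest numbers only, so shuffling the data before calling fit is worth considering. Whether the project does this is not stated here.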

The results in TensorBoard show a validation accuracy of 97% and a loss of 0.44.

Use of the model

To use our model elsewhere, we load both the network architecture and the trained weights with load_model. We ask the user to enter an integer and convert it to its binary representation. With predict, we let the network classify the input and then output the corresponding label.

# returns a compiled model identical to the trained one 
model = load_model('weights/deepbuzz.h5', compile=True)  

user_input = int(input("Input an integer "))  

# binary of input 
binary_number = data.binary_of_int(user_input)  

# prediction 
prediction = model.predict(np.array([binary_number]))  

# get class label of argmax 
prediction = data.class_of_label(prediction.argmax(axis=-1)) 
print(f'The number {user_input} is category: {prediction}')
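The helper data.class_of_label is not shown in this post. A hypothetical version, assuming it simply maps the class index to a readable name in the same order as the one-hot labels, could look like the sketch below. The probabilities are made up for illustration and do not come from the trained model.

```python
CLASS_NAMES = ["other number", "fizz", "buzz", "fizzbuzz"]


def class_of_label(index):
    """Map a class index (position of the 1 in the one-hot label) to its name."""
    return CLASS_NAMES[int(index)]


# Made-up softmax output for one input, for illustration only.
probabilities = [0.05, 0.10, 0.05, 0.80]

# Plain-Python argmax: index of the largest probability.
best_index = max(range(len(probabilities)), key=probabilities.__getitem__)

print(class_of_label(best_index))
```

The softmax output holds one probability per class, so taking the argmax picks the most likely category.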

The entire code of the project is available on our GitHub.


