Kalebu Jordan


Hi guys,

In this article, you’re going to learn how to do text classification with TensorFlow, a popular Python framework for machine learning, in just a few lines of code.

What is text classification?

Text classification is a branch of natural language processing that assigns a piece of text to one of several predefined groups based on its content, for instance labelling a news article as sports, business, or music.

What will you learn?

In this tutorial, we’ll look briefly at how to perform text classification using TensorFlow. You’re going to learn text processing concepts such as word embedding, and how to build a neural network with an embedding layer.

You will learn all of these concepts while building a simple model that classifies text as a positive or negative review, based on the data we use to train it.

What do you need?

To follow along with this tutorial, you need the following Python libraries installed on your machine: TensorFlow, NumPy, Matplotlib, and Jupyter Notebook.

Installation

There are two approaches you can follow when setting up an environment for machine learning and data science projects:

  • Installing Anaconda
  • Installing independently using pip

Installing Anaconda

If it’s your first time hearing about Anaconda, it is a toolkit that equips you to work with thousands of open-source packages and libraries. It saves you the time of installing each library independently and of handling dependency issues yourself.

All you need to do is go to the official website at Anaconda.com and follow the guide to download and install it on your machine, depending on the operating system you’re using.

Once installed, it comes with many of the packages used for machine learning and data science tasks, such as NumPy, pandas, Matplotlib, scikit-learn, Jupyter Notebook, and many others.

Almost there

Once Anaconda and its bundled dependencies are installed, it’s time to install the TensorFlow library. Anaconda comes with its own package manager, known as conda.

Now let’s use conda to create an environment named tf with TensorFlow installed, and then activate it:
conda create -n tf tensorflow

conda activate tf

Installing independently using pip

If you prefer to handle every detail yourself, you can also install all the required Python libraries using pip, as shown below:

pip install tensorflow

pip install numpy 

pip install matplotlib

pip install jupyter

Now that everything is installed, let’s start building our classification model.

Note:

The TensorFlow version used while preparing this tutorial is TensorFlow 2.0, which comes with Keras already integrated. I therefore recommend using it or a newer version to avoid bugs.
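
If you are not sure which version you have, a quick check like the one below can help. This is only a small sketch: the helper name installed_version is my own, and it relies on Python's standard importlib.metadata module (Python 3.8 or newer) rather than anything specific to TensorFlow.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


tf_version = installed_version("tensorflow")
if tf_version is None:
    print("TensorFlow is not installed in this environment")
else:
    # The major version is the part before the first dot, e.g. "2" in "2.0.0"
    major = int(tf_version.split(".")[0])
    status = "OK" if major >= 2 else "older than 2.0, consider upgrading"
    print("TensorFlow", tf_version, "-", status)
```

Run it inside the same environment (for example the conda environment tf) that you will use for the notebook.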

Let’s get started

For convenience, we usually use a Jupyter notebook when training machine learning models, so I would like you to use one too. In this article, each chunk of code I show corresponds to a single cell in a Jupyter notebook.

Starting a Jupyter notebook

Starting a Jupyter notebook is simple and straightforward: just type jupyter notebook in your terminal, and it will automatically open a notebook in your default browser.

Importing all required libraries
>>>import numpy as np
>>>import tensorflow as tf
>>>import matplotlib.pyplot as plt
Create an array of sample text data (features) and labels

The array below acts as the features for training our model. It consists of 4 positive and 4 negative short sentences, along with their respective labels: 1 for positive and 0 for negative.

>>>data_x = [
    'good',
    'well done',
    'nice',
    'Excellent',
    'Bad',
    'OOps I hate it deadly',
    'embrassing',
    'Hate you'
]
>>>label_x = np.array([1,1,1,1, 0,0,0,0])
Use one hot encoding to convert the text features to numbers

One hot encoding is a process by which categorical variables are converted into a form that ML algorithms can work with. Note that the Keras one_hot function used here does not return one-hot vectors: it hashes each word to an integer between 1 and n - 1 (here n is 50), so each sentence becomes a list of integers.

The code below encodes the text features above into numerical values.

>>>one_hot_x = [tf.keras.preprocessing.text.one_hot(d, 50) for d in data_x]
>>>print(one_hot_x)
[[21], [9, 34], [24], [20], [28], [41, 26, 9, 17, 26], [36], [9, 41]]

As we can see, after encoding our text data we end up with arrays of different sizes.

A machine learning model needs its input arrays to have the same length. Therefore we have to process the data again to form arrays of identical length.
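
Before moving on, here is a rough picture of what the encoding step does to each sentence. This is my own plain-Python sketch, not the Keras source: the function names are made up for illustration, and I use an MD5 hash so the numbers stay the same between runs. As far as I know, one_hot relies on Python's built-in hash by default, which can give different numbers in each new session, so don't expect your output to match the one printed above.

```python
import hashlib
import string


def stable_word_index(word, n):
    """Map a word to an integer in the range 1 .. n - 1 using a stable MD5 hash."""
    digest = hashlib.md5(word.encode("utf-8")).hexdigest()
    return int(digest, 16) % (n - 1) + 1


def encode_sentence(text, n):
    """Lowercase the text, turn punctuation into spaces, split, and hash each word."""
    to_spaces = str.maketrans({char: " " for char in string.punctuation})
    words = text.lower().translate(to_spaces).split()
    return [stable_word_index(word, n) for word in words]


sentences = ['good', 'well done', 'OOps I hate it deadly', 'Hate you']
encoded = [encode_sentence(sentence, 50) for sentence in sentences]

for sentence, indices in zip(sentences, encoded):
    print(sentence, '->', indices)
```

Because words are lowercased first, 'hate' and 'Hate' always share one index. Different words can also land on the same index, since only 49 values are available. You can see this in the tutorial output above, where 'well' and 'hate' were both encoded as 9. A larger n makes such collisions less likely.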

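Apply padding to the features array and restrict its length to 4

You can change the length of each array with the maxlen parameter. A good value for maxlen depends on how long most of the texts in your training data are. Shorter sequences are filled with zeros at the end (padding='post'), and longer ones are cut down to maxlen, which is why the five-word sentence below loses its first word.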

>>>padded_x = tf.keras.preprocessing.sequence.pad_sequences(one_hot_x, maxlen=4, padding = 'post')
>>>padded_x
array([[21,  0,  0,  0],
       [ 9, 34,  0,  0],
       [24,  0,  0,  0],
       [20,  0,  0,  0],
       [28,  0,  0,  0],
       [26,  9, 17, 26],
       [36,  0,  0,  0],
       [ 9, 41,  0,  0]], dtype=int32)
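
To make the padding step less magical, here is a minimal plain-Python sketch of the same idea. The function pad_post is a name I made up; it assumes zeros are added at the end and that over-long sequences lose items from the front, which matches the result shown above.

```python
def pad_post(sequences, maxlen, value=0):
    """Pad each sequence at the end with `value`, keeping only the last `maxlen` items."""
    padded = []
    for seq in sequences:
        kept = seq[-maxlen:]  # over-long sequences lose their first items
        padded.append(kept + [value] * (maxlen - len(kept)))
    return padded


sample = [[21], [9, 34], [41, 26, 9, 17, 26]]
for row in pad_post(sample, maxlen=4):
    print(row)
```

The third sequence has five numbers, so its first one (41) is dropped, just like in the padded_x array above.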

Now that we have processed the training data, let’s create our Sequential model to fit it.

Let’s build a Sequential model for our classification
>>>model = tf.keras.models.Sequential()

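Now let’s add an Embedding layer to receive the processed text features. Its arguments are the vocabulary size (50, matching the encoding step), the size of each word vector (8), and the length of each input sequence (4).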

>>>model.add(tf.keras.layers.Embedding(50, 8, input_length=4))

Add a Flatten layer to flatten the embedded features into a single vector

>>>model.add(tf.keras.layers.Flatten())
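
If you are wondering what these two layers do to one padded sentence, the sketch below imitates them in plain Python. It is only an illustration: the table is filled with random numbers here, while the real Embedding layer learns its values during training, and the names embedding_table, embed and flatten are my own.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

VOCAB_SIZE, EMBED_DIM = 50, 8

# One row of 8 numbers for every possible word index (0 .. 49)
embedding_table = [
    [random.uniform(-0.05, 0.05) for _ in range(EMBED_DIM)]
    for _ in range(VOCAB_SIZE)
]


def embed(sequence):
    """Look up the vector for each word index in the sequence."""
    return [embedding_table[index] for index in sequence]


def flatten(rows):
    """Join a list of vectors into one long list."""
    return [value for row in rows for value in row]


padded_sentence = [9, 34, 0, 0]   # 'well done' after padding
embedded = embed(padded_sentence)  # 4 vectors of 8 numbers each
flat = flatten(embedded)           # one vector of 4 * 8 numbers

print(len(embedded), len(embedded[0]), len(flat))
```

So each sentence goes from 4 integers to a 4 x 8 grid, and then to a single list of 32 numbers, which is what the Dense layer receives.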

Finally, let’s add a Dense layer with a sigmoid activation function. It outputs a single value between 0 and 1, which we read as the probability that the text is positive

>>>model.add(tf.keras.layers.Dense(1, activation='sigmoid'))
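
Here is a small sketch of what a single dense unit with a sigmoid activation computes. The weights, bias and input values are made up for illustration (the real layer has 32 inputs and learns its weights), and dense_sigmoid is a hypothetical helper name.

```python
import math


def sigmoid(z):
    """Squash any real number into the range 0 .. 1."""
    return 1.0 / (1.0 + math.exp(-z))


def dense_sigmoid(features, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through the sigmoid."""
    z = sum(f * w for f, w in zip(features, weights)) + bias
    return sigmoid(z)


# Made-up numbers, only to show the arithmetic
features = [1.0, 2.0, 1.0]
weights = [0.5, 0.25, -0.5]
bias = 0.5

probability = dense_sigmoid(features, weights, bias)
print(round(probability, 3))  # the weighted sum is 1.0, so this is about 0.731
```

A weighted sum far above zero gives a value close to 1 (positive), and one far below zero gives a value close to 0 (negative).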
Compile the model and check its summary
>>>model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
>>>model.summary()
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
embedding (Embedding)        (None, 4, 8)              400       
_________________________________________________________________
flatten (Flatten)            (None, 32)                0         
_________________________________________________________________
dense (Dense)                (None, 1)                 33        
=================================================================
Total params: 433
Trainable params: 433
Non-trainable params: 0
_________________________________________________________________
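
The parameter counts in the summary can be checked by hand. The short calculation below reproduces them from the layer sizes we chose; the variable names are just for this sketch.

```python
vocab_size = 50   # first argument of the Embedding layer
embed_dim = 8     # second argument of the Embedding layer
seq_len = 4       # input_length / maxlen
units = 1         # size of the Dense layer

# One vector of embed_dim numbers per possible word index
embedding_params = vocab_size * embed_dim

# Flatten has no weights, it only reshapes (4, 8) into 32 values
flatten_outputs = seq_len * embed_dim

# One weight per input for each unit, plus one bias per unit
dense_params = flatten_outputs * units + units

total_params = embedding_params + dense_params
print(embedding_params, flatten_outputs, dense_params, total_params)
```

This gives 400 for the embedding, 32 values out of the Flatten layer, 33 for the dense layer and 433 in total, the same numbers as in the summary.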

Now let’s fit the model for 1000 epochs and visualize the learning process

history = model.fit(padded_x, label_x, epochs=1000, batch_size=2, verbose=0)
plt.plot(history.history['loss'])
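
The curve plotted above is the binary cross-entropy loss we chose when compiling the model. As a rough sketch of what that number means, here is the formula in plain Python. The predictions are made up, and the small eps clipping is my own guard against taking log(0), not necessarily what Keras does internally.

```python
import math


def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    """Average binary cross-entropy over a batch of labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)  # keep p away from exactly 0 or 1
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


labels = [1, 1, 0, 0]
confident = [0.9, 0.8, 0.1, 0.2]  # mostly right and fairly sure
unsure = [0.5, 0.5, 0.5, 0.5]     # always guessing 50/50

confident_loss = binary_cross_entropy(labels, confident)
unsure_loss = binary_cross_entropy(labels, unsure)
print(round(confident_loss, 3), round(unsure_loss, 3))
```

Confident, correct predictions give a lower loss than 50/50 guesses, so a falling curve means the model's probabilities are moving towards the right labels.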
Testing the model

Let’s create a simple function to predict new text using the model we have just created. It won’t be very smart, since our training data was really small.

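def predict(word):
    # Encode and pad the text exactly as we did for the training data
    one_hot_word = [tf.keras.preprocessing.text.one_hot(word, 50)]
    pad_word = tf.keras.preprocessing.sequence.pad_sequences(one_hot_word, maxlen=4, padding='post')
    # The model returns a probability between 0 and 1
    result = model.predict(pad_word)
    # 0.5 is the usual cut-off for a sigmoid output
    if result[0][0] > 0.5:
        print('you look positive')
    else:
        print('damn you\'re negative')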

Let’s test it by calling the predict function with different inputs

>>>predict('this tutorial is cool')
you look positive
>>>predict('This tutorial is bad as me ')
damn you're negative

Congratulations, you have successfully trained a text classifier using TensorFlow. The Jupyter notebook for this guide is also available for download. If you have any comments, suggestions, or difficulties, drop them in the comment box.
