Abdalsamad Keramatfar
4 min read · Oct 1, 2019


Least Squares Classification with TensorFlow

When it comes to classification, anyone familiar with linear regression may want to transfer that knowledge and ask: why not apply the same simple idea to classification too?

The answer to this question leads us to least squares classification. The main idea is to multiply the feature vector by a weight matrix whose columns each correspond to one class. The loss function is the squared difference between the predicted label and the gold label. The algorithm is simple; for more details and its pros and cons, the reader can refer to chapter 4 of Bishop's book, Pattern Recognition and Machine Learning.
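For intuition, the squared-error version of this model also has a closed-form solution based on the pseudo-inverse. The sketch below is not the code used in this post (which trains with gradient descent in TensorFlow); it is a small NumPy illustration on made-up two-dimensional data, and the function names `fit_least_squares` and `predict` are hypothetical.

```python
import numpy as np

def fit_least_squares(X, T):
    """Return W minimizing ||X_aug @ W - T||^2, with the bias folded into W."""
    X_aug = np.hstack([np.ones((X.shape[0], 1)), X])  # prepend a constant 1 for the bias
    return np.linalg.pinv(X_aug) @ T                  # one column of W per class

def predict(W, X):
    """Predict the class whose output column is largest."""
    X_aug = np.hstack([np.ones((X.shape[0], 1)), X])
    return np.argmax(X_aug @ W, axis=1)

# Toy data (an assumption for illustration): two well-separated Gaussian blobs.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 1.0, size=(50, 2)),
               rng.normal(+2.0, 1.0, size=(50, 2))])
labels = np.array([0] * 50 + [1] * 50)
T = np.eye(2)[labels]  # one-hot targets, shape (100, 2)

W = fit_least_squares(X, T)
accuracy = np.mean(predict(W, X) == labels)
print(W.shape, accuracy)
```

Each column of `W` acts as a separate linear regressor for one class, and the predicted class is simply the column with the largest output.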

As in my last post, I will implement the algorithm on the HCR dataset, a popular dataset in sentiment analysis, but this time with TensorFlow. For more information about sentiment analysis see here.

First of all, we will import all needed libraries:

import pandas as pd
import matplotlib.pyplot as plt
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
import tensorflow as tf
import numpy as np

Then, we need to import data:

train = pd.read_csv('/content/drive/My Drive/Colab Notebooks/GCN/Least Square/data/hcr/train/orig/hcr-train.csv')
train['sentiment'] = train['sentiment'].str.strip()
dev = pd.read_csv('/content/drive/My Drive/Colab Notebooks/GCN/Least Square/data/hcr/dev/orig/hcr-dev.csv')
dev['sentiment'] = dev['sentiment'].str.strip()
test = pd.read_csv('/content/drive/My Drive/Colab Notebooks/GCN/Least Square/data/hcr/test/orig/hcr-test.csv')
test['sentiment'] = test['sentiment'].str.strip()
test.head()

Since I use Colab, the paths point to my Google Drive. Also, I only classify the positive and negative classes:

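# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
data = pd.concat([train, dev, test])
# Keep only rows labeled positive or negative
ndata = data[(data['sentiment'].notnull()) & (data['sentiment'] != 'unsure') & (data['sentiment'] != 'neutral') & (data['sentiment'] != 'irrelevant')]
# One-hot rows, with columns in alphabetical order of the remaining labels
y = pd.get_dummies(ndata['sentiment']).astype('float32').values.tolist()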

We need to split the data into training and test sets and vectorize the text:

X_train, X_test, y_train, y_test = train_test_split(ndata['content'], y, test_size=0.2, random_state=42)
vectorizer = TfidfVectorizer()
vX_train = vectorizer.fit_transform(X_train)
vX_test = vectorizer.transform(X_test)
print(vX_train.shape)
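To make the `fit_transform` / `transform` split concrete, here is a minimal sketch on a tiny made-up corpus (the sentences are an assumption, not taken from HCR). The vocabulary and IDF weights are learned from the training texts only, so words that appear only in the test texts are dropped.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Tiny made-up corpus, only to illustrate the fit/transform split.
train_docs = ["health care reform is good", "this bill is bad", "good bill"]
test_docs = ["bad reform", "unseen words only"]

vectorizer = TfidfVectorizer()
v_train = vectorizer.fit_transform(train_docs)  # learns vocabulary and IDF weights
v_test = vectorizer.transform(test_docs)        # reuses them; unknown words are ignored

print(v_train.shape, v_test.shape)
print(sorted(vectorizer.vocabulary_))
```

Both matrices share the same number of columns, which is what lets the weight matrix in the next step be sized from `vX_train.shape[1]`.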

We now define our placeholders, variables, and model using TensorFlow (the code uses the TensorFlow 1.x API). As I said, the model multiplies the feature vector by our weights and adds a bias:

w = tf.Variable(tf.random_normal(shape=[vX_train.shape[1], 2]))
b = tf.Variable(tf.random_normal(shape=[1, 2]))
data = tf.placeholder(dtype=tf.float32, shape=[None, vX_train.shape[1]])
target = tf.placeholder(dtype=tf.float32, shape=[None, 2])
logit = tf.matmul(data, w) + b

We should specify some hyperparameters:

# Define the learning rate, batch_size etc.
learning_rate = 0.9
batch_size = 8
numb_of_epoch = 50
num_of_iterations = int(len(train)/batch_size)
iter_num = numb_of_epoch * num_of_iterations
iter_num

We now introduce our optimizer and define the loss to be minimized. First we take the difference between the model's prediction and the gold label. Next we compute the Euclidean norm of each row of the result (one value per example) and average the resulting vector. Note that the code averages the norms themselves rather than their squares. In the last step we define our evaluation measure, accuracy.

opt = tf.train.GradientDescentOptimizer(learning_rate)
temp = target - logit
loss = tf.reduce_mean(tf.norm(temp, ord='euclidean', axis=1)) #+ 0.01*tf.nn.l2_loss(w)
goal = opt.minimize(loss)
prediction = tf.math.argmax(logit, axis=1)
gold = tf.math.argmax(target, axis=1)
correct = tf.cast(tf.equal(prediction, gold), dtype=tf.float32)
# Average
accuracy = tf.reduce_mean(correct)
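To check what the loss and accuracy defined above compute, the same quantities can be reproduced in NumPy on a few hand-made rows. The arrays below are hypothetical values chosen for illustration, not outputs of the model.

```python
import numpy as np

# Hypothetical one-hot targets and model outputs for three examples.
target = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
logit = np.array([[0.8, 0.2], [0.3, 0.7], [0.4, 0.6]])

residual = target - logit
per_example = np.linalg.norm(residual, axis=1)  # row-wise norm, like tf.norm(..., axis=1)
loss = per_example.mean()                       # like tf.reduce_mean

prediction = np.argmax(logit, axis=1)
gold = np.argmax(target, axis=1)
accuracy = np.mean(prediction == gold)          # fraction of rows where argmax matches
print(per_example, loss, accuracy)
```

Squaring `per_example` before averaging would give the classical sum-of-squares loss; the code in this post averages the unsquared norms.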

Everything is set, and we can now start training our model. In what follows, we train the model batch by batch and then visualize the output:

init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)

loss_trace = []
train_acc = []
test_acc = []

for iteration in range(iter_num):
    # Generate a random batch index
    batch_index = np.random.choice(len(X_train), size=batch_size)
    batch_train_X = vX_train[batch_index].toarray()
    batch_train_y = np.matrix(np.array(y_train)[batch_index])
    sess.run(goal, feed_dict={data: batch_train_X, target: batch_train_y})
    temp_loss = sess.run(loss, feed_dict={data: vX_train.toarray(), target: np.matrix(y_train)})
    # Convert the labels into a matrix so their shape matches the placeholder
    temp_train_acc = sess.run(accuracy, feed_dict={data: vX_train.toarray(), target: np.matrix(y_train)})
    temp_test_acc = sess.run(accuracy, feed_dict={data: vX_test.toarray(), target: np.matrix(y_test)})
    # Record and print the result once per epoch
    if iteration % num_of_iterations == 0:
        loss_trace.append(temp_loss)
        train_acc.append(temp_train_acc)
        test_acc.append(temp_test_acc)
        print('epoch: {:2f} loss: {:5f} train_acc: {:5f} test_acc: {:5f}'.format(iteration / num_of_iterations, temp_loss, temp_train_acc, temp_test_acc))

# Plot the loss (left) and the train/test accuracy (right)
plt.figure(figsize=(9, 3))
plt.subplot(121)
plt.plot(loss_trace, label='loss_trace')
plt.legend()
plt.subplot(122)
plt.plot(train_acc, label='train_acc')
plt.plot(test_acc, label='test_acc')
plt.legend()
plt.show()
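The loop above relies on TensorFlow 1.x constructs such as `tf.placeholder` and `tf.Session`. As a framework-free illustration of the same update rule, here is a NumPy sketch with the gradient of the mean-of-norms loss written out by hand. The toy data, zero initialization, and hyperparameter values are assumptions chosen for the example, not the settings used in this post.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the TF-IDF features and one-hot labels (an assumption).
X = np.vstack([rng.normal(-2.0, 1.0, size=(100, 2)),
               rng.normal(+2.0, 1.0, size=(100, 2))])
labels = np.array([0] * 100 + [1] * 100)
T = np.eye(2)[labels]

def loss_and_accuracy(X, T, w, b):
    logit = X @ w + b
    loss = np.linalg.norm(T - logit, axis=1).mean()
    acc = np.mean(np.argmax(logit, axis=1) == np.argmax(T, axis=1))
    return loss, acc

w = np.zeros((X.shape[1], 2))
b = np.zeros((1, 2))
learning_rate, batch_size, n_steps = 0.01, 8, 500  # illustrative values

initial_loss, _ = loss_and_accuracy(X, T, w, b)
for step in range(n_steps):
    idx = rng.choice(len(X), size=batch_size)      # sampled with replacement, as above
    xb, tb = X[idx], T[idx]
    residual = tb - (xb @ w + b)
    norms = np.linalg.norm(residual, axis=1, keepdims=True)
    unit = residual / np.maximum(norms, 1e-12)     # derivative of ||r|| w.r.t. r is r / ||r||
    # Gradient of mean(||t - (x w + b)||) with respect to w and b.
    grad_w = -(xb.T @ unit) / batch_size
    grad_b = -unit.mean(axis=0, keepdims=True)
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

final_loss, final_acc = loss_and_accuracy(X, T, w, b)
print(initial_loss, final_loss, final_acc)
```

This toy run is only meant to make the update rule explicit; it does not reproduce the HCR numbers reported in this post.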

The final result:

The best accuracy we obtained is about 71%. I really encourage you to compare this with the result of the logistic regression here.

If we look at the right-hand chart, we can see that our model has overfitted the data: after epoch 4, the gap between training and test accuracy starts to grow. So let's see whether we can do something about it. We change just one line of the code above to:

loss = tf.reduce_mean(tf.norm(temp, ord='euclidean', axis=1)) + 0.01*tf.nn.l2_loss(w)

This adds L2 regularization to our model. Now let's train the model again. The results:

This time, we reach about 75% at best.
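For reference, `tf.nn.l2_loss(w)` returns half the sum of the squared entries of `w`, so the added term penalizes large weights. A NumPy sketch of the regularized loss follows; the helper names `l2_loss` and `regularized_loss` and the example arrays are hypothetical.

```python
import numpy as np

def l2_loss(w):
    """NumPy equivalent of tf.nn.l2_loss: half the sum of squared entries."""
    return np.sum(w ** 2) / 2.0

def regularized_loss(target, logit, w, weight_decay=0.01):
    """Mean row-wise norm of the residual plus the weighted L2 penalty."""
    data_term = np.linalg.norm(target - logit, axis=1).mean()
    return data_term + weight_decay * l2_loss(w)

w = np.array([[3.0, -4.0], [0.0, 0.0]])  # hypothetical weights
target = np.array([[1.0, 0.0]])
logit = np.array([[1.0, 0.0]])           # perfect prediction, so only the penalty remains

penalty = l2_loss(w)
total = regularized_loss(target, logit, w)
print(penalty, total)
```

The gradient of the penalty with respect to `w` is `0.01 * w`, so every gradient step also shrinks the weights slightly toward zero, which is what limits overfitting here.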
