Diagnosing Mental Health Conditions Using Deep Learning

Karan Chahal
6 min read · Oct 16, 2022


According to Johns Hopkins Medicine, about 1 in 4 US adults suffers from a diagnosable mental health disorder. This statistic alone shows how widespread mental health conditions are in modern society. Despite this, doctors and medical professionals still do not have tools that can accurately test for mental health conditions. For example, if a patient suffering from a mental health condition goes to a medical professional, the professional will perform various physiological examinations and tests on the patient and then make a diagnosis based on the results. Statistically speaking, psychiatric diagnosis accuracy ranges from 60% to 0% depending on the type of disorder. Overall, that is very low, and it shows how many individuals suffering from a mental health condition are diagnosed incorrectly 😢

Fortunately, humans aren’t the only ones who can diagnose mental health conditions: Deep Learning models can as well. In this article, we will be creating a Deep Learning model in PyTorch that can diagnose mental health conditions.

What is Deep Learning?

Deep learning is a subset of machine learning that uses artificial neural networks (ANNs). These networks are loosely modeled on the brain’s architecture and are very powerful for certain tasks. If you want to learn more about how a Deep Learning model works, you can check out my other article, Creating a CNN in PyTorch for the CIFAR10 Dataset.

To create a deep learning model that can diagnose mental health conditions, the first step is figuring out what type of data can be used for diagnosis. My initial thoughts involved using audio recordings. However, I soon realized that it would be difficult to get that type of data. Eventually, I stumbled upon quantitative electroencephalogram (qEEG) data.

What is an EEG?

An electroencephalogram (EEG) is a recording of the brain’s electrical activity (brain waves), which medical professionals use to help diagnose several types of brain disorders. A qEEG is created when complex mathematical and statistical analysis is performed on those measured brain waves.

Image of an EEG From Medical News Today

After researching, I found that qEEG findings can point to a specific diagnosis of psychiatric syndromes and that qEEG has been previously used to diagnose depression. With this new knowledge on qEEG’s accuracy, I decided on using qEEG data for training the Deep Learning model.

About the Dataset:

The dataset that I used in this project came from Kaggle and is named EEG Psychiatric Disorders Dataset. It contains quantitative EEG (qEEG) data, and most of its features are qEEG parameters. The qEEG data was recorded while each patient was in the resting state. In terms of size, the dataset has roughly 1,140 features and just under 1,000 samples. As for its background, it was created by scientists from Seoul, South Korea for a study named “Identification of Major Psychiatric Disorders From Resting-State Electroencephalography Using a Machine Learning Approach”.

Now that we have gone over the details of the dataset, let’s go through the code 🚀

Step 1: Importing all the Libraries

Before we begin programming the Deep Learning model, it is important that we import all the libraries and APIs needed for the project.
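The original import list is not reproduced here, so the block below is only a guess at what a project like this typically needs: pandas and NumPy for handling the CSV, scikit-learn for label encoding and scaling, and PyTorch for the model and data loaders.

```python
# Assumed imports for the rest of the walkthrough (not the author's exact list).
import numpy as np
import pandas as pd
import torch
import torch.nn as nn
from sklearn.preprocessing import LabelEncoder, MinMaxScaler
from torch.utils.data import DataLoader, TensorDataset, random_split

# Quick sanity check that the main libraries loaded.
print("torch", torch.__version__, "| pandas", pd.__version__, "| numpy", np.__version__)
```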

Step 2: Define the Disorder Classes that will be Diagnosed

The Deep Learning model will be able to diagnose various mental health conditions from qEEG data. The conditions it can diagnose are listed in the array of class names defined in this step, which makes the model’s predictions easier to interpret. It is important to list the names in this exact sequence because it matches the order in which the Label Encoder numbered them. For instance, ‘Acute stress disorder’ corresponds to a 0 in column 8 of the dataset.
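To see why the order matters, here is a small sketch using scikit-learn's LabelEncoder on a hypothetical handful of disorder names (not the full list from the dataset). The encoder sorts the unique labels, so the position of a name in `classes_` is exactly the integer the model will predict for it.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical subset of disorder names, deliberately listed out of order.
raw_labels = [
    "Schizophrenia",
    "Acute stress disorder",
    "Depressive disorder",
    "Acute stress disorder",
]

encoder = LabelEncoder()
encoded = encoder.fit_transform(raw_labels)

# LabelEncoder sorts the unique labels, so index i of classes_ is the
# name that belongs to the integer label i.
classes = list(encoder.classes_)

print(classes)
print(list(encoded))

# Mapping a predicted integer back to a readable name.
predicted_label = 0
print(classes[predicted_label])
```

Building the array from `encoder.classes_` instead of typing it by hand is one way to keep the names and the integer labels in sync.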

Step 3: Define the Deep Learning Model’s Parameters

Here we define the number of epochs, the batch size and the learning rate. To choose these values, I just used trial and error. They can be changed based on what you think works best.
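As a rough sketch, the three parameters can be kept together at the top of the script. The values below are placeholders I picked for illustration, not the ones used for the results in this article, and `num_train_samples` is a hypothetical training-set size.

```python
import math

# Placeholder hyperparameters (assumptions, tune them by trial and error).
num_epochs = 50
batch_size = 32
learning_rate = 1e-3

# Hypothetical training-set size, used only to show how the values interact.
num_train_samples = 800

# Number of weight updates per epoch and over the whole run.
steps_per_epoch = math.ceil(num_train_samples / batch_size)
total_steps = steps_per_epoch * num_epochs

print(steps_per_epoch, total_steps)
```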

Step 4: Label Encoding and Removing Columns that Contain Unnecessary Data

In this step, we first read the csv file containing the qEEG data into a pandas dataframe. Next, the label encoder from sklearn.preprocessing converts the string values in columns 2 and 8 to numerical values. This makes the gender data in column 2 as well as the specific disorder data in column 8 machine readable. On top of label encoding, the columns containing the date, age, education, IQ and main disorder are removed from the dataframe. After this, the dataframe’s index is reset to the default (0, 1, 2, etc). Resetting the index makes the dataframe easier to navigate in the following steps.
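Here is a minimal sketch of this step on a tiny stand-in dataframe. The column names and values are assumptions modelled on the description above, and in the real project the dataframe would come from `pd.read_csv` on the Kaggle file.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Tiny stand-in for the CSV; column names and values are hypothetical.
df = pd.DataFrame({
    "no.": [1, 2, 3],
    "sex": ["M", "F", "M"],
    "age": [35.0, 28.0, 41.0],
    "eeg.date": ["2012.8.30", "2012.9.6", "2012.9.10"],
    "education": [12.0, 16.0, 9.0],
    "IQ": [101.0, 110.0, 95.0],
    "main.disorder": ["Mood disorder", "Schizophrenia", "Mood disorder"],
    "specific.disorder": ["Depressive disorder", "Schizophrenia", "Bipolar disorder"],
    "AB.A.delta.a.FP1": [35.9, 13.4, 29.9],
})

# Label-encode the two string columns (columns 2 and 8 when counting from 1).
for column in ["sex", "specific.disorder"]:
    df[column] = LabelEncoder().fit_transform(df[column])

# Drop the columns that are not used for training.
df = df.drop(columns=["eeg.date", "age", "education", "IQ", "main.disorder"])

# Reset the index to the default 0, 1, 2, ...
df = df.reset_index(drop=True)

print(df)
```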

Step 5: Removing Column with Null Values

In the current dataframe, there can still be values that are NaN (Not a Number) or None. These values can cause the loss to become NaN, preventing the deep learning model from training properly, so the rows or columns that contain them need to be removed. To find them, I employed a short search algorithm that returns two arrays:

  • deletearrayrow → contains the row numbers of all the rows with NaN and None values
  • deletearraycol → contains the column numbers of all the columns with NaN and None values

In this specific case, I analyzed the arrays and realized that this dataset has just one column with NaN values. Therefore, in the end, I just dropped this column to get rid of all the NaN and None values from the dataframe.
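One way to build these two arrays is sketched below on a small made-up dataframe. It uses `DataFrame.isna` and `np.nonzero` rather than an explicit loop, so it may differ from the search I originally wrote, but it produces the same kind of output.

```python
import numpy as np
import pandas as pd

# Made-up dataframe in which only column "b" has missing values.
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0],
    "b": [np.nan, 5.0, None],
    "c": [7.0, 8.0, 9.0],
})

# Boolean mask that is True wherever a value is NaN or None.
mask = df.isna().to_numpy()

# Positions of every missing value, reduced to unique row and column numbers.
rows, cols = np.nonzero(mask)
deletearrayrow = sorted(set(rows.tolist()))
deletearraycol = sorted(set(cols.tolist()))
print(deletearrayrow, deletearraycol)

# Only one column is affected, so dropping it removes every missing value.
df = df.drop(columns=df.columns[deletearraycol])
print(df.isna().sum().sum())
```

Dropping the single affected column keeps every sample, whereas dropping the affected rows would shrink an already small dataset.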

Step 6: Scaling the Data with Min-Max Normalization and Creating Two Separate Dataframes (One for Samples and One for Labels)

In this step, the original dataframe is split into two dataframes: one contains the samples (x) and the other contains the corresponding labels (y). In addition, the data in the samples dataframe is scaled between 0 and 1 using Min-Max normalization, which brings every feature onto a similar scale. This tends to improve the performance of the deep learning model because no feature dominates simply by having a larger numeric range.
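Min-Max normalization maps each value to (value - min) / (max - min), computed per column. The sketch below applies it with scikit-learn's MinMaxScaler to a made-up dataframe; the column names are hypothetical and the real samples dataframe has far more features.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Made-up dataframe: one label column and two features on different scales.
df = pd.DataFrame({
    "specific.disorder": [0, 2, 1],
    "f1": [10.0, 20.0, 30.0],
    "f2": [-1.0, 0.0, 3.0],
})

# Split into labels (y) and samples (x).
y = df["specific.disorder"]
x = df.drop(columns=["specific.disorder"])

# Scale every feature column to the range [0, 1].
scaler = MinMaxScaler()
x_scaled = pd.DataFrame(scaler.fit_transform(x), columns=x.columns)

print(x_scaled)
```

Note that fitting the scaler on the whole dataset lets the test rows influence the min and max. Fitting it on the training rows only and reusing it on the test rows is the more cautious option.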

Step 7: Splitting the Dataset and Creating the Neural Network

Here we split the dataset (which contains one tensor for the samples and one for the labels) into a train dataset and a test dataset. The train dataset will be used to train the model, while the test dataset will be used to test the accuracy of the model. After splitting the dataset and creating the corresponding data loaders, I created the neural network class. This class defines the neural network’s architecture as well as how the neural network will conduct forward propagation.
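The sketch below shows the general shape of this step on random stand-in tensors. The 80/20 split, the layer sizes, the number of classes and the class name `Net` are all assumptions for illustration, not the architecture behind the results in this article.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset, random_split

torch.manual_seed(0)

# Hypothetical sizes; the real dataset has far more features.
num_features, num_classes, batch_size = 20, 12, 16

# Random stand-ins for the scaled samples and the encoded labels.
x = torch.rand(100, num_features)
y = torch.randint(0, num_classes, (100,))

# Wrap both tensors in one dataset, then split it 80/20.
dataset = TensorDataset(x, y)
train_dataset, test_dataset = random_split(
    dataset, [80, 20], generator=torch.Generator().manual_seed(0)
)
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False)


class Net(nn.Module):
    """Small fully connected network (an assumed architecture)."""

    def __init__(self, in_features, hidden, out_classes):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(in_features, hidden),
            nn.ReLU(),
            nn.Linear(hidden, out_classes),
        )

    def forward(self, inputs):
        # Forward propagation: returns one raw score (logit) per class.
        return self.layers(inputs)


model = Net(num_features, 64, num_classes)
logits = model(x[:4])
print(logits.shape)
```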

Step 8: Training the Model

In this step, the model repeatedly iterates over the train loader for the specified number of epochs and calculates its loss on each batch. Once the loss is calculated, the model conducts backward propagation and updates its weights to reduce the loss. In this manner, the neural network “learns” from the qEEG data. For more information on backward propagation, check out this article.
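A training loop of this kind might look like the sketch below. The data is synthetic with an easy-to-learn label, and the choice of cross-entropy loss and the Adam optimizer is my assumption for a classification task like this one.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic data: the label is 1 when the features sum to more than 5.
x = torch.rand(64, 10)
y = (x.sum(dim=1) > 5).long()
train_loader = DataLoader(TensorDataset(x, y), batch_size=16, shuffle=True)

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
criterion = nn.CrossEntropyLoss()  # assumed loss function
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)  # assumed optimizer

num_epochs = 30
epoch_losses = []
for epoch in range(num_epochs):
    running_loss = 0.0
    for batch_x, batch_y in train_loader:
        optimizer.zero_grad()                      # clear old gradients
        loss = criterion(model(batch_x), batch_y)  # forward pass and loss
        loss.backward()                            # backward propagation
        optimizer.step()                           # weight update
        running_loss += loss.item() * batch_x.size(0)
    # Average loss per sample for this epoch.
    epoch_losses.append(running_loss / len(train_loader.dataset))

print(epoch_losses[0], epoch_losses[-1])
```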

Step 9: Accuracy Calculations

In this step, we calculate the accuracy of the model on the train dataset and the test dataset.
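Accuracy here simply means the percentage of samples whose highest-scoring class matches the true label. Below is a small sketch with a hypothetical helper named `accuracy`; to keep the example checkable by hand, the "model" is `nn.Identity`, so the inputs themselves act as the class scores.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def accuracy(model, loader):
    """Return the percentage of samples in loader that model classifies correctly."""
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():  # no gradients are needed for evaluation
        for batch_x, batch_y in loader:
            predicted = model(batch_x).argmax(dim=1)
            correct += (predicted == batch_y).sum().item()
            total += batch_y.size(0)
    return 100.0 * correct / total


# Hand-made class scores: the predicted classes are 0, 1, 0, 1.
scores = torch.tensor([[2.0, 0.0], [0.0, 3.0], [1.0, 0.0], [0.0, 1.0]])
labels = torch.tensor([0, 1, 1, 1])
loader = DataLoader(TensorDataset(scores, labels), batch_size=2)

result = accuracy(nn.Identity(), loader)
print(result)  # 3 of the 4 predictions match the labels
```

Calling the same function once with the train loader and once with the test loader gives the two numbers compared in the results.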

The Results!

The model achieved 92% accuracy on the training dataset but only 26% on the test dataset. A gap this large suggests that the model is overfitting the training dataset, so it may be possible to improve test accuracy by lowering the number of epochs. Another possible reason for the low test accuracy is the model’s architecture.
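One way to choose a lower number of epochs without guessing is early stopping: track the loss on a held-out split after every epoch and stop once it has not improved for a few epochs in a row. I did not use this in the project, so treat the sketch below as a suggestion. The helper `best_epoch` and the loss history are hypothetical.

```python
def best_epoch(val_losses, patience=3):
    """Return the 1-based epoch with the lowest loss seen before patience runs out."""
    best_idx, best_loss, waited = 0, float("inf"), 0
    for idx, loss in enumerate(val_losses):
        if loss < best_loss:
            best_idx, best_loss, waited = idx, loss, 0
        else:
            waited += 1
            if waited >= patience:
                break  # no improvement for `patience` epochs in a row
    return best_idx + 1


# Made-up held-out losses: they improve at first, then start to rise.
history = [1.9, 1.6, 1.5, 1.55, 1.7, 1.8, 1.4]
print(best_epoch(history))
```

With a patience of 3, the search stops before it reaches the final value in `history`, which is the usual trade-off of early stopping: a shorter run in exchange for possibly missing a later improvement.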

Overall, I hope you learned a lot from building this ANN because I know I did!

If you have any questions or concerns, feel free to contact me 🖥

LinkedIn: https://www.linkedin.com/in/karan-chahal-49554320a/

Email: karanchahal2005.1@gmail.com

Karan Chahal

I am an 18-year-old who is very interested in the integration of technology in the medical field