Evolution of Machine Learning and Artificial Intelligence




by Meghal Dani, Jaswant Singh



Artificial intelligence is changing many aspects of society and is itself evolving rapidly, with new algorithms appearing all the time. Reviewing classic papers can improve one's understanding of the field. Here are some of the most important algorithms in machine learning and deep learning, with references to help you get started.



Machine Learning

Linear Regression [1]

Linear regression is a data analysis technique that predicts a dependent variable as a linear function of one or more independent variables. Consider the function

y = α + βx, where x and y are the independent and dependent variables respectively. This is the equation of a line with slope β and intercept α.
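As a minimal sketch of how α and β can be estimated, the ordinary least-squares formulas for a single feature are shown below. The function name fit_line and the toy data are illustrative assumptions, not part of the original article.

```python
def fit_line(xs, ys):
    """Return (alpha, beta) minimizing the squared error of y = alpha + beta * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    beta = sxy / sxx                 # slope
    alpha = mean_y - beta * mean_x   # intercept
    return alpha, beta


# Toy data generated from y = 1 + 2x, so the fit should recover those values.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
alpha, beta = fit_line(xs, ys)
print(alpha, beta)
```

With noisy real data the recovered line will not pass through every point; it is simply the line with the smallest total squared error.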

Ridge Regression [2] 

As we add features to a linear regression, the number of β coefficients grows. The trained model becomes more flexible, but this may lead to overfitting. To avoid this, regularization is generally applied.

Ridge regression is a regularized linear regression that penalizes large coefficient values by adding a penalty proportional to the sum of their squares to the loss. This makes the model simpler and helps avoid overfitting.
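The shrinking effect of the penalty can be sketched for a single feature, where ridge regression has a simple closed form. The sketch assumes the intercept is left unpenalized (a common convention); the name fit_ridge_line, the penalty strength lam and the toy data are illustrative.

```python
def fit_ridge_line(xs, ys, lam):
    """Minimize sum((y - alpha - beta*x)^2) + lam * beta^2 for one feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    beta = sxy / (sxx + lam)         # lam > 0 shrinks the slope toward zero
    alpha = mean_y - beta * mean_x   # intercept is not penalized
    return alpha, beta


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.2, 2.9, 5.1, 6.8, 9.0]
_, beta_plain = fit_ridge_line(xs, ys, lam=0.0)    # ordinary least squares
_, beta_ridge = fit_ridge_line(xs, ys, lam=10.0)   # regularized
print(beta_plain, beta_ridge)
```

Setting lam to zero recovers plain linear regression, and larger values pull the coefficient closer to zero.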

K-Nearest Neighbors (KNN) [3]

KNN is a supervised algorithm. To classify a new sample, it finds the k training examples nearest to that sample and outputs the class chosen by a majority vote of those neighbors. The distance between points is generally measured as the Euclidean distance.
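The vote described above can be sketched in a few lines. This is a brute-force illustration on made-up points, with knn_predict as an assumed name; practical implementations use faster neighbor searches.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Pair every training point's Euclidean distance to the query with its label.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
labels = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(points, labels, (2, 2)))
print(knn_predict(points, labels, (6, 5)))
```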


Support Vector Machine (SVM) [4]

SVM is a supervised algorithm that is mostly used for classification, though it can also be used for regression. In the simplest case, each data point has two features (in general, N) and can be plotted in 2-D space; classification is then performed by finding a line that separates the two classes. This separating boundary is called a hyperplane: a line for two features, a plane for three. Among all separating hyperplanes, SVM chooses the one with the largest margin to the nearest points of each class. A new data point is assigned a class according to the side of the hyperplane on which it falls.

Sometimes the data cannot be separated linearly. In such cases a kernel is used to map the data into a higher-dimensional space in which the points can be separated.
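The idea of mapping to a higher dimension can be illustrated without training an SVM. In the sketch below, 1-D points cannot be split by a single threshold, but after the assumed feature map x -> (x, x*x) a hyperplane separates them. The hyperplane (w, b) is hand-picked for illustration, not learned; a real SVM would learn a maximum-margin hyperplane and would normally use a kernel function instead of computing the mapping explicitly.

```python
def feature_map(x):
    """Lift a 1-D point into 2-D so that a straight line can separate the classes."""
    return (x, x * x)


def side_of_hyperplane(point, w, b):
    """Return +1 or -1 depending on which side of w.point + b = 0 the point falls."""
    score = sum(wi * pi for wi, pi in zip(w, point)) + b
    return 1 if score >= 0 else -1


# Class +1 lies on both sides of class -1, so no single threshold on x works.
xs = [-3.0, -2.5, -1.0, 0.0, 1.5, 2.5, 3.0]
labels = [1, 1, -1, -1, -1, 1, 1]

w, b = (0.0, 1.0), -4.0  # hand-picked hyperplane in the lifted space: x*x = 4
predictions = [side_of_hyperplane(feature_map(x), w, b) for x in xs]
print(predictions == labels)
```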

Decision Trees [5]  

As the name suggests, decision tree algorithms are based on tree-like structures in which each internal node represents a “test” on an attribute and each leaf node is a final decision (for classification, a class label). It is a supervised algorithm that splits the population into sub-groups based on attributes/variables. The splits are chosen using a criterion such as the Gini index, chi-square, information gain (entropy) or variance reduction, depending on the type of target variable. The path a sample follows from the root to a leaf determines the decision made for that sample.
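To make the splitting criterion concrete, here is a small sketch that scores candidate thresholds on one numeric attribute with the Gini index and keeps the best one. The helper names and the toy "age / bought" data are assumptions for illustration; a full tree would apply this search recursively to each resulting subset.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 0 for a pure node, larger when classes are mixed."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, weighted Gini) of the best split `value <= threshold`."""
    best = (None, float("inf"))
    candidates = sorted(set(values))
    # Try the midpoint between each pair of consecutive distinct values.
    for lo, hi in zip(candidates, candidates[1:]):
        t = (lo + hi) / 2
        left = [lab for v, lab in zip(values, labels) if v <= t]
        right = [lab for v, lab in zip(values, labels) if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(values)
        if score < best[1]:
            best = (t, score)
    return best


ages = [22, 25, 30, 35, 40, 50]
bought = ["no", "no", "no", "yes", "yes", "yes"]
threshold, impurity = best_threshold(ages, bought)
print(threshold, impurity)
```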

Random Forest [6]

This model is made up of many decision trees. When building each tree, the training data points are randomly sampled (typically with replacement), and when splitting a node only a random subset of the features is considered. Because each tree sees different samples and features, the trees make different errors, and combining them reduces the variance (overfitting) of the overall forest. The variance of a single tree could instead be reduced by limiting its depth, but that comes at the expense of increased bias; a random forest lowers variance without paying that price.

The final prediction of a random forest is obtained by averaging the predictions of the individual trees (for regression) or by taking a majority vote (for classification).
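The two sources of randomness and the final combination step can be sketched as follows. To stay short, each "tree" here is a one-split stump chosen by training error, which is a simplifying assumption; real random forests grow deeper trees and usually split on an impurity measure. All names and data are illustrative.

```python
import random
from collections import Counter


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def train_stump(rows, labels, feature_ids):
    """Pick the (feature, threshold) split with the fewest training errors."""
    best = None
    for f in feature_ids:
        values = sorted({row[f] for row in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [lab for row, lab in zip(rows, labels) if row[f] <= t]
            right = [lab for row, lab in zip(rows, labels) if row[f] > t]
            left_label, right_label = majority(left), majority(right)
            errors = sum(lab != left_label for lab in left)
            errors += sum(lab != right_label for lab in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, left_label, right_label)
    if best is None:  # no split possible: always predict the majority class
        label = majority(labels)
        return (feature_ids[0], float("inf"), label, label)
    return best[1:]


def predict_stump(stump, row):
    f, t, left_label, right_label = stump
    return left_label if row[f] <= t else right_label


def train_forest(rows, labels, n_trees=25, n_features=1, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]           # bootstrap sample
        feats = rng.sample(range(len(rows[0])), n_features)      # random feature subset
        forest.append(train_stump([rows[i] for i in idx],
                                  [labels[i] for i in idx], feats))
    return forest


def predict_forest(forest, row):
    """Majority vote over all trees (the classification analogue of averaging)."""
    return majority([predict_stump(stump, row) for stump in forest])


rows = [(1, 1), (2, 1), (1, 2), (2, 2), (8, 8), (9, 8), (8, 9), (9, 9)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
forest = train_forest(rows, labels)
print(predict_forest(forest, (1.5, 1.5)), predict_forest(forest, (8.5, 8.5)))
```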


Gradient Boosting [7]

Ensembling is a common technique in machine learning in which several models predict the same target, the reasoning being that many models together will perform better than one. Ensemble methods are broadly divided into bagging and boosting. In bagging, models are trained independently on different random samples of the data and their predictions are combined, for example by a mean or weighted average. Random forest uses bagging: it averages the predictions of its trees.

In boosting, models are combined sequentially. Gradient boosting is a boosting algorithm in which each new model learns from the mistakes of the previous ones: it is fit to the residual errors of the current ensemble, and its predictions are added so that the remaining error shrinks. XGBoost [8] is an implementation of gradient-boosted decision trees designed for speed and performance.
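The sequential "fit the residuals, then add" loop can be sketched for regression with squared-error loss, where the residuals coincide with the negative gradient of the loss. The weak learner is a one-split stump and the learning rate of 0.1 is an arbitrary choice; all names and data are illustrative, and this is not how XGBoost is implemented internally.

```python
def fit_stump(xs, targets):
    """Best single-threshold regressor (lowest squared error) on one feature."""
    best = None
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [r for x, r in zip(xs, targets) if x <= t]
        right = [r for x, r in zip(xs, targets) if x > t]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((r - left_mean) ** 2 for r in left)
        sse += sum((r - right_mean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, left_mean, right_mean)
    return best[1:]


def stump_value(stump, x):
    t, left_mean, right_mean = stump
    return left_mean if x <= t else right_mean


def gradient_boost(xs, ys, n_rounds=50, learning_rate=0.1):
    base = sum(ys) / len(ys)          # start from the mean prediction
    preds = [base] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]   # current mistakes
        stump = fit_stump(xs, residuals)                 # new model learns them
        stumps.append(stump)
        preds = [p + learning_rate * stump_value(stump, x)
                 for x, p in zip(xs, preds)]
    return base, stumps, learning_rate


def predict(model, x):
    base, stumps, learning_rate = model
    return base + learning_rate * sum(stump_value(s, x) for s in stumps)


def mse(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1]
model = gradient_boost(xs, ys)
mse_before = mse(ys, [sum(ys) / len(ys)] * len(ys))
mse_after = mse(ys, [predict(model, x) for x in xs])
print(mse_before, mse_after)
```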


Neural Network

Deep Neural Network

A neural network basically consists of an input layer, one or more hidden layers and an output layer. When the number of hidden layers grows from one to many, the network is considered a deep neural network. Usually a network with more than two or three layers is considered deep, though there is no exact definition. Deep networks differ from the machine learning algorithms above mainly in that they learn features directly from the input data and so need far less manual feature engineering.
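The layered structure can be sketched as a forward pass through fully connected layers. The weights below are arbitrary numbers chosen for illustration (a real network learns them by backpropagation), and the choice of ReLU for hidden layers and softmax for the output is a common convention assumed here, not something the article specifies.

```python
import math


def dense(inputs, weights, biases):
    """One fully connected layer: each output is a weighted sum plus a bias."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def relu(values):
    return [max(0.0, v) for v in values]


def softmax(values):
    shifted = [math.exp(v - max(values)) for v in values]
    total = sum(shifted)
    return [s / total for s in shifted]


def forward(x, layers):
    """Apply ReLU after every hidden layer and softmax after the output layer."""
    for weights, biases in layers[:-1]:
        x = relu(dense(x, weights, biases))
    weights, biases = layers[-1]
    return softmax(dense(x, weights, biases))


# Two hidden layers and an output layer, with made-up weights.
layers = [
    ([[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]], [0.0, 0.1, -0.1]),   # 2 inputs -> 3 units
    ([[0.2, -0.5, 0.3], [0.7, 0.1, -0.4]], [0.05, -0.05]),        # 3 units -> 2 units
    ([[1.0, -1.0], [-1.0, 1.0]], [0.0, 0.0]),                     # 2 units -> 2 classes
]
probs = forward([1.0, 2.0], layers)
print(probs)
```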

Convolutional Neural Network (CNN) 

CNNs are a variant of neural networks used mostly in computer vision. Their usual building blocks are convolutional layers, pooling layers and normalization layers. They take an image as input, extract features and then perform a task such as image classification, object detection or segmentation.
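Feature extraction by a convolutional layer followed by a pooling layer can be sketched on a tiny image. The kernel below is a hand-written vertical-edge detector chosen for illustration; in a trained CNN the kernel values are learned. As in most deep learning libraries, "convolution" here means sliding the kernel without flipping it.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + di][j + dj] * kernel[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


def max_pool(feature_map, size=2):
    """Keep the largest value in each non-overlapping size x size window."""
    return [[max(feature_map[i + di][j + dj]
                 for di in range(size) for dj in range(size))
             for j in range(0, len(feature_map[0]) - size + 1, size)]
            for i in range(0, len(feature_map) - size + 1, size)]


# A 5x5 image that is dark on the left and bright on the right.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
vertical_edge = [[-1, 1],
                 [-1, 1]]

features = conv2d(image, vertical_edge)   # responds only where the edge is
pooled = max_pool(features)               # smaller map, edge still visible
print(features)
print(pooled)
```

The feature map is non-zero only in the column where brightness changes, and pooling halves its size while keeping that response.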

Deep learning has advanced through a series of CNN architectures, many of which proved themselves in competitions like the ImageNet Large Scale Visual Recognition Challenge (ILSVRC). AlexNet [9], VGG [10] and ResNet [11] are some of the best-known networks.

After AlexNet, more layers were added to CNNs to increase performance, so networks kept getting deeper; VGG, for example, has more layers than AlexNet.

Very deep neural networks, however, are difficult to train because of the vanishing gradient problem. ResNet introduced the residual block to address this. Residual blocks use skip connections, which keep the gradient in the early layers from vanishing during backpropagation. Other notable networks include:

Faster R-CNN [12]

An object detection network, state of the art when it was introduced, that combines a Region Proposal Network (RPN) with Fast R-CNN. The RPN generates bounding-box proposals around possible objects; features are pooled from these regions and used to identify the object classes. A final regression layer refines the bounding-box coordinates.

U-Net [13]

This segmentation network outperformed earlier methods on the ISBI 2012 challenge for segmenting neuronal structures in electron microscopy images, and has since been popular for segmentation tasks, especially in the biomedical domain. The architecture is U-shaped and consists of two parts: a contracting path (ordinary convolution and pooling layers) and an expansive path (transposed 2D convolution layers).


Machine Translation

RNN [14]

We cannot understand a sentence or a video if we process each word or frame from scratch and forget what came before. The networks discussed so far cannot either, because they keep no memory of earlier inputs. Recurrent Neural Networks (RNNs) were designed to address this. An RNN can be thought of as a neural network with a loop: at each step the network passes information to its successor, so that information persists. This is how, while reading a sentence, it can keep track of the earlier words and form a meaning out of them. RNNs find major applications in Natural Language Processing (NLP).
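The loop can be sketched with a single scalar hidden state. The update rule h_t = tanh(w_in * x_t + w_rec * h_(t-1) + bias) is the classic simple-RNN form; the weights are arbitrary illustrative values rather than trained ones.

```python
import math


def rnn_forward(inputs, w_in, w_rec, bias=0.0):
    """Run a one-unit RNN over a sequence and return the hidden state at each step."""
    h = 0.0
    states = []
    for x in inputs:
        # The new state mixes the current input with the previous state.
        h = math.tanh(w_in * x + w_rec * h + bias)
        states.append(h)
    return states


# The two sequences differ only in their first element.
states_a = rnn_forward([1.0, 0.0, 0.0], w_in=1.0, w_rec=0.5)
states_b = rnn_forward([0.0, 0.0, 0.0], w_in=1.0, w_rec=0.5)
print(states_a[-1], states_b[-1])
```

Although the last two inputs are identical, the final states differ, because the hidden state still carries a trace of the first input.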

LSTM [15]

Plain RNNs struggle to retain long-term information. Long Short-Term Memory (LSTM) networks are a special kind of RNN that address this problem with a memory cell, so that retaining information for long periods is their default behavior. The basic architecture uses gates, which decide whether to pass a piece of information on to the next cell or to forget it if it is not useful. These gates are what make LSTMs special.
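How the gates control memory can be sketched with a one-unit LSTM cell in the commonly used formulation with a forget gate. The parameter values are hand-picked assumptions that make the effect visible, not trained weights: one setting keeps the forget gate nearly open, the other lets the cell state decay.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, params):
    """One step of a scalar LSTM; params maps a name to (w_x, w_h, bias)."""
    def pre_activation(name):
        w_x, w_h, b = params[name]
        return w_x * x + w_h * h_prev + b

    forget = sigmoid(pre_activation("forget"))       # how much old memory to keep
    write = sigmoid(pre_activation("input"))         # how much new content to store
    read = sigmoid(pre_activation("output"))         # how much memory to expose
    candidate = math.tanh(pre_activation("candidate"))
    c = forget * c_prev + write * candidate          # updated memory cell
    h = read * math.tanh(c)                          # updated hidden state
    return h, c


def run_lstm(inputs, params):
    h, c = 0.0, 0.0
    for x in inputs:
        h, c = lstm_step(x, h, c, params)
    return h, c


remembering = {
    "forget": (0.0, 0.0, 5.0),      # gate stays close to 1: keep memory
    "input": (10.0, 0.0, -5.0),     # write only when the input is 1
    "output": (0.0, 0.0, 5.0),
    "candidate": (5.0, 0.0, 0.0),
}
forgetful = dict(remembering, forget=(0.0, 0.0, 0.0))   # gate at 0.5: memory halves

sequence = [1.0] + [0.0] * 20       # one signal followed by 20 empty steps
_, c_kept = run_lstm(sequence, remembering)
_, c_lost = run_lstm(sequence, forgetful)
print(c_kept, c_lost)
```

With the forget gate nearly open, most of the first input is still in the cell after 20 steps; with the gate at one half, it has almost vanished.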

Bi-LSTM [16]

A Bi-LSTM is simply two independent LSTMs put together so that information flows both ways: from past to future and from future to past. When a word must be predicted not only from the words before it but also from those that follow, so as to capture the context better, a Bi-LSTM is the better choice.
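The two-way flow can be sketched by running one recurrence left to right and another right to left, then pairing their states at each position. To keep the sketch short, a plain tanh recurrence stands in for each LSTM; that substitution, the names and the weights are all assumptions for illustration.

```python
import math


def run_rnn(inputs, w_in, w_rec):
    """Simple tanh recurrence used here as a stand-in for an LSTM."""
    h = 0.0
    states = []
    for x in inputs:
        h = math.tanh(w_in * x + w_rec * h)
        states.append(h)
    return states


def bidirectional(inputs, forward_params, backward_params):
    """Return (forward_state, backward_state) for every position in the sequence."""
    forward = run_rnn(inputs, *forward_params)
    # Process the reversed sequence, then flip the states back into input order.
    backward = run_rnn(inputs[::-1], *backward_params)[::-1]
    return list(zip(forward, backward))


sequence = [0.0, 0.0, 1.0]   # the only signal is at the end
states = bidirectional(sequence, (1.0, 0.5), (1.0, 0.5))
print(states[0])
```

At the first position the forward state is still zero, since nothing has been seen yet, while the backward state already reflects the signal that appears later in the sequence.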



This article has walked through basic machine learning algorithms that can be used for classification and regression problems. They are widely used and build a strong foundation for further study. It then moved on to deep neural networks, presenting well-known architectures for image classification, object detection and segmentation. Beyond image-based problems, it presented RNNs and LSTMs, which are used for text and time-series data. Graphs are now widely used to solve problems in medical image analysis, 3D vision and neuroimaging; understanding graph neural networks and their variations can help new algorithms be developed in this area.


  1. Stanton, J. M. (2001). Galton, Pearson, and the Peas: A Brief History of Linear Regression for Statistics Instructors. Journal of Statistics Education, 9(3). doi: 10.1080/10691898.2001.11910537.
  2. Hoerl, A. E., & Kennard, R. W. (1970). Ridge Regression: Biased Estimation for Nonorthogonal Problems. Technometrics, 12(1), 55-67. www.jstor.org/stable/1267351.
  3. Cover, T., & Hart, P. (1967). Nearest neighbor pattern classification. IEEE Transactions on Information Theory, 13(1), 21-27. doi: 10.1109/TIT.1967.1053964.
  4. Hearst, M., Dumais, S. T., Osman, E., Platt, J., & Scholkopf, B. (1998). Support vector machines. IEEE Intelligent Systems and their Applications, 13, 18-28. doi: 10.1109/5254.708428.
  5. Mitchell, T. M. (1997). Chapter 3: Decision Tree Learning. In Machine Learning. Singapore: McGraw-Hill.
  6. Ho, T. K. (1995). Random decision forests. Proceedings of 3rd International Conference on Document Analysis and Recognition, Montreal, Quebec, Canada, vol. 1, 278-282. doi: 10.1109/ICDAR.1995.598994.
  7. Friedman, J. (2002). Stochastic Gradient Boosting. Computational Statistics & Data Analysis, 38, 367-378. doi: 10.1016/S0167-9473(01)00065-2.
  8. Chen, T., & Guestrin, C. (2016). XGBoost. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
  9. Krizhevsky, A., Sutskever, I., & Hinton, G. (2012). ImageNet Classification with Deep Convolutional Neural Networks. Neural Information Processing Systems, 25. doi: 10.1145/3065386.
  10. Simonyan, K., & Zisserman, A. (2015). Very Deep Convolutional Networks for Large-Scale Image Recognition. CoRR, abs/1409.1556.
  11. He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep Residual Learning for Image Recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 770-778. doi: 10.1109/CVPR.2016.90.
  12. Ren, S., He, K., Girshick, R., & Sun, J. (2015). Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39. doi: 10.1109/TPAMI.2016.2577031.
  13. Ronneberger, O., Fischer, P., & Brox, T. (2015). U-Net: Convolutional Networks for Biomedical Image Segmentation. MICCAI.
  14. Giles, C. L., Kuhn, G., & Williams, R. (1994). Dynamic recurrent neural networks: Theory and applications. IEEE Transactions on Neural Networks, 5, 153-156. doi: 10.1109/TNN.1994.8753425.
  15. Hochreiter, S., & Schmidhuber, J. (1997). Long Short-Term Memory. Neural Computation, 9, 1735-1780.
  16. Graves, A., & Schmidhuber, J. (2005). Framewise phoneme classification with bidirectional LSTM networks. Proceedings of the 2005 IEEE International Joint Conference on Neural Networks, vol. 4, 2047-2052.




Meghal Dani 

I graduated from IIIT-Delhi and am currently a researcher at Tata Research and Innovation Labs, working in the area of Deep Learning and Artificial Intelligence. Areas where technology is put to use for healthcare interest me.




Jaswant Singh

I am currently working with TCS Research & Innovation Labs after completing my master's in Industrial Engineering & Operations Research (IEOR) from IIT Bombay. My areas of expertise include Computer Vision and Medical Image Analysis.





Data Empathy and AI (Part 1). The Need for a Patient Centric Analytics Ecosystem

Data Empathy and AI (Part 2). Legal Theory Disruption and Ethics

A two-part article written by Kevin Michael Mooney, Esq., Senior Director, Enterprise Data Governance, Cleveland Clinic.

It presents an excellent and comprehensive viewpoint on data management, artificial intelligence applications and their associated ethical and legal challenges.

Clarifying and detailing some of the ethical and legal principles, it's a must-read for understanding the context of data management and AI in healthcare beyond building and implementing algorithms.

2018 Year in Review: Machine Learning in Healthcare

A BrainX Community exclusive!

A comprehensive review of 2200+ articles published in 2018 on machine learning applications in healthcare.

After exclusions, 1068 articles were classified into various medical specialties; a synopsis is available through the link below.

2018 Year in Review: Machine Learning in Healthcare.

A copy of the presentation from the BrainX Community Live event of January 2019 is also available via the link below.

BrainX Community Live January 2019 event

Augmented Intelligence: Moving beyond man vs machine


Artificial Intelligence, described as the new electricity by Prof. Andrew Ng, is certainly leading to the construction of new “powerhouses”. I recently had the opportunity to participate in two such powerhouse conferences, both focused on applications of Artificial Intelligence (AI)/Machine Learning (ML) in healthcare.


ai+MD 2018 (http://ai-med.io/aimd-summit-choc-childrens/)

ai+MD, organized by Dr. Anthony Chang’s AI Med group, was a physician-focused conference that discussed various aspects of AI applications in clinical settings: their pitfalls, current state and future possibilities. I was honored to be part of the inaugural panel and delivered my presentation, titled “Truth vs Hype: AI in Healthcare”, which focused on understanding the current scientific evidence and the pitfalls of hype in this field. At the end of the conference, there was a consensus that man+machine is exponentially better than man vs machine, man being the clinician and machine being AI/ML.

ML for Healthcare, 2018 (https://www.mlforhc.org)

At the ML for Healthcare conference at Stanford, hosted by Ken Jung and Nigam Shah, the focus was mostly on the ML aspects of healthcare applications. Professors Abraham Verghese, Russell Greiner and Andrew Ng, amongst many others, presented the clinical needs, a vision and a word of caution for ML in healthcare research and innovation. Being an anesthesiologist, I was also thrilled to see the engagement and leadership of anesthesiologists such as Drs. James Fackler, Randall Wetzel and Micheal Burns at the conference.

Lots of passion and energy from students and academics from around the world was on display. It was humbling to observe that, despite all the fiscal advantages of engaging in various other industries, many of the participants found their passion in healthcare, for the good of humanity and the intrigue of messy healthcare data.

Learning, connecting and sharing data was my motivation to attend these conferences, which bring AI/ML and healthcare experts together. Sharing and building together is the future for us at BrainX Community.

Piyush Mathur, MD, FCCM

BrainX Community: The largest machine learning in healthcare community online.

BrainX Community is the largest and fastest-growing online group for the application of machine learning in healthcare. Our focus is on sharing and growing knowledge in this area and on fostering innovation and research.

A 200+ strong group that has come together in less than 2 months, we are growing very rapidly. Members include physicians, clinicians, executives, machine learning experts and students, amongst others, which makes this group multidisciplinary. All the continents are represented, with members having joined from all over the world.

We have Cleveland-based live interactive monthly sessions featuring experts from various areas presenting their work and their love for the science. https://www.brainxai.org/

We only promote real science, avoiding the hype in AI. Open-source, de-identified data sharing is important and supported for the development of science. You can also learn about various aspects of data in healthcare through the DATA section of BrainX Community. https://www.brainxai.org/data/

New and meaningful publications are posted in the LEARN section of the BrainX Community webpage on a routine basis. https://www.brainxai.org/learn/

Everyone’s participation is valuable. Let’s share the knowledge, spread the word and help shape the future of healthcare and machine learning for good.

Is blockchain the answer to fragmented, inaccessible, expensive healthcare data?

With the new and evolving regulatory requirements, a governance structure as suggested by Stuart McLennan, David Shaw and Leo Celi in the article below is absolutely needed, not just for critical care but for all of healthcare.

McLennan, S., Shaw, D. & Celi, L.A. Intensive Care Med (2018).


Data cannot remain fragmented and inaccessible, increasing cost, weakening accountability and stalling much-needed transformation.

Can blockchain help?
Some of the initial work done at MIT sheds some light on the possibilities.








BrainX Community Live







CEO of VisualDx

Associate Professor of Dermatology & Medical Informatics

University of Rochester College of Medicine

TUESDAY, JULY 10, 2018 



Machine Learning, Diagnosis and Augmented Point of Care Decisions

Dr. Papier is the co-founder and Chief Executive Officer of VisualDx. A thought leader in clinical informatics, Dr. Papier maintains the overall vision for the VisualDx product with a keen focus on software integration and on reducing healthcare costs through clinical accuracy. His entrepreneurial drive, years of clinical experience, and passion for delivering true healthcare solutions have propelled VisualDx clinical decision support to the top in quality and innovation.

A dermatologist and medical informatics expert, Dr. Papier has a particular interest in designing clinical decision support systems based on visually rich knowledge areas to reduce diagnostic error at the point of care. In line with this goal, he is focused on transforming medical education to include training in cognitive error and the use of information technology. Dr. Papier also focuses on consumer health, developing tools to educate and empower patients.