Machine Learning · November 10, 2023

Tips to enter the analytics field

Analytics is a big umbrella, also popularly known as Artificial Intelligence. Here I have provided an overview of what one needs to review and take care of to enter this field. By following these steps you will be able to start your journey in this field.

Analytics contains the following fields. I would suggest starting in the order given below.

  • Artificial Intelligence
    • Data Science and Statistics
      • This field deals with algorithms, i.e. mathematical formulae. These algorithms help you learn and understand patterns in the data, and they also help in making decisions based on those patterns.
      • The major areas of the field are:
        • Hypothesis Testing
        • Linear Algebra
        • Probability Distribution
        • Integration and Differentials
        • Mean
        • Mode
        • Median
        • Range
        • Percentile
        • Quantile
        • IQR [Inter Quartile Range]
        • Variance
        • Standard Deviation
        • Skewness
        • Kurtosis
        • Factor Analysis
        • Principal Component Analysis
        • Cluster Analysis
        • Linear Regression
        • Logistic Regression
        • Ridge Regression
        • Lasso Regression
        • Decision Tree
        • Random Forest
        • Adaboost
        • Adabag
        • SVM [Support Vector Machine]
        • Naive Bayes
        • Apriori Algorithm
        • Gradient Descent
        • Market Basket Analysis
        • Conjoint Analysis
        • Neural Networks and their different flavors
      • Here you don't need to memorize mathematical formulae. The goal is to understand how each algorithm works and what can be done in case you want to tweak it. You should have a basic understanding of why an algorithm behaves in a specific manner with specific data, and of which algorithm will work with which data types (the gradient descent sketch at the end of this post shows what this kind of understanding looks like in practice).
    • Descriptive and Exploratory Data Analysis / Data Mining
      • This field deals with exploring the data and getting insights from it. It also helps in identifying hidden patterns in the data (the descriptive statistics sketch at the end of this post computes several of the measures listed below).
      • In this field we can include visualization as well.
      • Tools:
        • Tibco Spotfire
        • Tableau
        • Microsoft Power BI
        • Visualization libraries like D3.js
        • Python packages like Matplotlib, Plotly, Seaborn, Bokeh, Dash
        • R packages like Shiny, ggplot2, Plotly, rgl
      • Techniques:
        • Mean
        • Mode
        • Median
        • Range
        • Percentile
        • Quantile
        • IQR
        • Variance
        • Standard Deviation
        • Skewness
        • Kurtosis
        • Factor Analysis
        • Cluster Analysis
    • Machine Learning
      • My concept of machine learning is a bit different: when we combine the mathematical algorithms from Data Science with a machine, i.e. a computer, it becomes Machine Learning.
      • In Machine Learning, we implement the mathematical algorithms in different computer languages, e.g. C, C++, Python, R. These implementations then become available to the different tools.
      • In most cases we don't need to implement the algorithms from scratch, as packages already provide implementations of them (the scikit-learn sketch at the end of this post trains two of the algorithms listed above in a few lines).
      • The major tools are R, Python, SAS, SPSS, RapidMiner, Tanagra, Orange and Weka. In these tools you will find a number of packages with pre-built implementations of the algorithms mentioned under the Data Science section.
      • Python Packages:
        • Statistical Modelling
          • Scikit Learn [Sklearn]
          • Statsmodels
        • Deep Learning
          • Keras
          • PyTorch
          • TensorFlow
          • OpenCV
          • HuggingFace
          • LlamaIndex
          • LangChain
          • NLTK
          • spaCy
          • Gensim
          • TextBlob
          • StanfordNLP
      • R Packages:
        • caret
        • e1071
        • randomForest
        • rpart
        • adabag
        • CART
        • You will see that R is good for statistical modelling. For deep learning, R depends on Python: to work with deep learning models (for example through the keras or tensorflow R packages), R needs Python installed on the system.
      • Statistical Modelling or Statistical Learning
        • This field deals with predictive analytics: based on past events and patterns in the data, the machine predicts future events.
        • This field is associated with numerical and tabular data and with problems like house price prediction, disease prediction, and binary classification such as whether a person will default on a loan, or whether a person will survive an accident.
        • For this field the tools are the same as mentioned above. The techniques are as follows (the logistic regression sketch at the end of this post works through the loan-default example):
          • Linear Regression
          • Logistic Regression
          • Ridge Regression
          • Lasso Regression
          • Decision Tree
          • Random Forest
          • Adaboost
          • Adabag
          • SVM [Support Vector Machine]
          • Naive Bayes
          • Apriori Algorithm
          • Gradient Descent
          • Market Basket Analysis
          • Conjoint Analysis
        • Deep Learning
          • Computer Vision
            • This field deals with object detection, image recognition and video analysis. It tries to solve problems related to images and video.
            • This field can provide answers like the chances of skin cancer from a skin image, the chances of violence at a place from people's body language seen through a camera, whether a car should slow down or speed up based on the objects ahead of it, and whether a given food grain is of good quality or not.
            • The most widely used tool is Python, with the PyTorch, Keras and TensorFlow packages.
            • There are also ready-made architectures and pre-trained models like YOLO, SSD, CornerNet, CenterNet, RetinaNet, CNN [Convolutional Neural Networks], ResNet [Residual Network], VGG [Visual Geometry Group], AlexNet and MobileNet. We can utilize these for our own custom visual data as well (the torchvision sketch at the end of this post loads one such pre-trained model).
          • NLP [Natural Language Processing] & NLG [Natural Language Generation]
            • As the name suggests, this field deals with human languages.
            • For NLP, the aim is to understand and interpret human language correctly, and to detect the correct intent. For example, in "I am at the bank to deposit the money", "bank" could mean a river bank or a financial institution; NLP should be able to understand that I am talking about the financial institution and not the river bank (the transformers sketch at the end of this post shows this kind of intent detection).
            • For NLG, the aim is to generate human-like text which can be interpreted and understood correctly. ChatGPT is one of the most famous examples.
            • The most widely used tool is Python, with the PyTorch, Keras and TensorFlow packages.
            • Techniques include Neural Networks and their different flavors, like LSTM [Long Short-Term Memory], BERT [Bidirectional Encoder Representations from Transformers], GPT [Generative Pre-trained Transformer], Seq2Seq and RNN [Recurrent Neural Network]
  • The flow of learning should be from top to bottom, in this particular order. You need to learn both the tools and the techniques to start the journey. I have not included the Big Data side here, like the Hadoop ecosystem, Spark, Scala, Kafka, Hive, Pig, Sqoop etc., as it is part of Data Engineering.
  • Though the list provided here is not exhaustive, you can start your journey from here. The tools and techniques mentioned here are the most widely used, but once you start the journey and dig deeper, you will come across many more tools and techniques for different use cases.
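
To make the sections above more concrete, a few short Python sketches follow. Each is a minimal illustration under the assumptions stated with it, not a definitive implementation. First, the gradient descent sketch referenced in the Data Science and Statistics section: it fits a straight line to made-up data by repeatedly nudging the slope and intercept against the gradient of the squared error. The data, learning rate and step count are arbitrary choices for illustration.

    # Fit y = w*x + b with plain gradient descent on made-up data.
    import numpy as np

    rng = np.random.default_rng(42)
    x = rng.uniform(0, 10, size=100)
    y = 3.0 * x + 5.0 + rng.normal(0, 1, size=100)   # true slope 3, intercept 5, plus noise

    w, b = 0.0, 0.0   # arbitrary starting guess
    lr = 0.01         # learning rate: the kind of knob you can "tweak"

    for step in range(2000):
        error = (w * x + b) - y
        grad_w = 2 * np.mean(error * x)   # derivative of mean squared error w.r.t. w
        grad_b = 2 * np.mean(error)       # derivative of mean squared error w.r.t. b
        w -= lr * grad_w
        b -= lr * grad_b

    print(f"learned slope {w:.2f}, intercept {b:.2f}")   # should end up close to 3 and 5

Libraries hide this loop behind a one-line fit call, but knowing it is there tells you what the learning rate does and why a poor choice of it makes training diverge.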
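
Next, the descriptive statistics sketch referenced in the Descriptive and Exploratory Data Analysis section: it computes several of the listed measures on a made-up, right-skewed numeric column with pandas. The column name and distribution are invented for illustration.

    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(0)
    # A made-up, right-skewed column (something like order values)
    s = pd.Series(rng.exponential(scale=50, size=1_000), name="order_value")

    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    print("mean              :", round(s.mean(), 2))
    print("median            :", round(s.median(), 2))
    print("range             :", round(s.max() - s.min(), 2))
    print("90th percentile   :", round(s.quantile(0.90), 2))
    print("IQR               :", round(q3 - q1, 2))
    print("variance          :", round(s.var(), 2))
    print("standard deviation:", round(s.std(), 2))
    print("skewness          :", round(s.skew(), 2))
    print("kurtosis          :", round(s.kurtosis(), 2))
    # For a visual check, s.hist(bins=40) draws a histogram via matplotlib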
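
Next, the scikit-learn sketch referenced in the Machine Learning section: it trains two of the pre-built algorithms listed above, a Decision Tree and a Random Forest, in a few lines. It uses the small iris dataset that ships with scikit-learn; the split ratio and hyperparameters are arbitrary.

    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

    # The algorithms themselves are already implemented inside the package
    for model in (DecisionTreeClassifier(random_state=42),
                  RandomForestClassifier(n_estimators=100, random_state=42)):
        model.fit(X_train, y_train)
        print(type(model).__name__, "accuracy:", round(model.score(X_test, y_test), 3))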
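
Next, the logistic regression sketch referenced in the Statistical Modelling section: it predicts a made-up "will the person default or not?" label from two invented features, income and loan amount. The data-generating rule is fabricated purely so the example runs end to end; with a real dataset only the loading step would change.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import classification_report
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(7)
    n = 2_000
    income = rng.uniform(20_000, 120_000, n)      # hypothetical yearly income
    loan_amount = rng.uniform(5_000, 60_000, n)   # hypothetical loan amount
    # Made-up rule: a higher loan-to-income ratio means a higher chance of default
    p_default = 1 / (1 + np.exp(-(6 * loan_amount / income - 3)))
    y = rng.binomial(1, p_default)
    X = np.column_stack([income, loan_amount])

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=7)
    model = make_pipeline(StandardScaler(), LogisticRegression())   # scale features, then classify
    model.fit(X_train, y_train)
    print(classification_report(y_test, model.predict(X_test)))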
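
Next, the torchvision sketch referenced in the Computer Vision section: it loads a pre-trained ResNet and classifies one local image. It assumes PyTorch and torchvision (0.13 or newer) are installed, and "photo.jpg" is a hypothetical path to replace with your own image; for custom visual data you would typically fine-tune such a model on your own labels rather than use it as-is.

    import torch
    from PIL import Image
    from torchvision import models

    weights = models.ResNet50_Weights.DEFAULT        # ImageNet pre-trained weights
    model = models.resnet50(weights=weights).eval()
    preprocess = weights.transforms()                # the resizing/normalisation the model expects

    img = Image.open("photo.jpg").convert("RGB")     # hypothetical image path
    batch = preprocess(img).unsqueeze(0)             # shape (1, 3, H, W)

    with torch.no_grad():
        probs = model(batch).softmax(dim=1)
    top_prob, top_class = probs.max(dim=1)
    print(weights.meta["categories"][top_class.item()], round(float(top_prob), 3))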
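
Finally, the transformers sketch referenced in the NLP & NLG section: it uses Hugging Face pipelines for both sides of the field, zero-shot intent detection on the "bank" sentence (NLP) and small-scale text generation with GPT-2 (NLG). It assumes the transformers library is installed and that the models can be downloaded on first use; the candidate labels and prompt are made up for illustration.

    from transformers import pipeline

    # NLP: decide which sense of "bank" the sentence is about
    classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
    result = classifier("I am at the bank to deposit the money",
                        candidate_labels=["financial institution", "river bank"])
    print(result["labels"][0], round(result["scores"][0], 3))   # expected: financial institution

    # NLG: generate a short continuation of a prompt
    generator = pipeline("text-generation", model="gpt2")
    print(generator("Machine learning is", max_new_tokens=30)[0]["generated_text"])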