Get to Grips With The Terminology 

As in any technical industry, buzzwords, abbreviations and obscure synonyms for plainer words are common. They are just part of the game, but learning what they all mean can seem daunting. In a way they act as a barrier to entry for people, when they really shouldn’t be.

Rather than write an exhaustive list, which would take far too long, here are a few important examples and their definitions:

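  • ETL – Extract, Transform, Load: pulling data out of source systems, reshaping it, and loading it into a destination such as a database or data warehouse.
  • Heuristic(s) – A practical rule or shortcut for finding an approximate solution when exact methods are too slow or don’t work.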
  • Structured / Unstructured Data – Data that can easily be put into a table or database vs other data such as PDFs or MP3 files.
  • Supervised / Unsupervised Algorithms – Machine learning algorithms that learn from labelled training data vs those that learn without labels.
  • Greedy Algorithms – Algorithms that pick the locally best option at each step in the hope of reaching the globally optimal solution, which is not always guaranteed.
  • Normalization – Adjusting values measured on different scales to a common scale.
  • Residual (Error) – Deviation of the observed value from the value predicted or estimated by a model.
  • Feature engineering – Process of creating and transforming input variables (features) from raw data for a machine learning algorithm.
  • Application containerization – The process of creating a standard unit of software that packages up code and its dependencies so the application runs reliably across computing environments.
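
A few of these definitions are easier to see in code than in words. The snippet below is a minimal Python sketch, not a reference implementation: the function names (min_max_normalize, residuals, greedy_coin_change) and the sample numbers are made up for illustration, and min-max scaling is only one of several ways to normalize data.

```python
def min_max_normalize(values):
    """Normalization (min-max flavour): rescale values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so there is no spread to rescale.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def residuals(observed, predicted):
    """Residual: observed value minus the value a model predicted."""
    return [o - p for o, p in zip(observed, predicted)]


def greedy_coin_change(amount, coins):
    """Greedy algorithm: always take the largest coin that still fits."""
    picked = []
    for coin in sorted(coins, reverse=True):
        while amount >= coin:
            picked.append(coin)
            amount -= coin
    return picked


if __name__ == "__main__":
    # Hypothetical heights in cm, squeezed onto a 0-1 scale.
    print(min_max_normalize([150, 160, 170, 180, 190]))

    # Hypothetical observations vs model predictions.
    print(residuals([3.0, 5.0, 7.0], [2.5, 5.0, 8.0]))

    # Greedy works well for familiar coin sets...
    print(greedy_coin_change(30, [1, 5, 10, 25]))
    # ...but can miss the best answer for others (3 + 3 would use fewer coins).
    print(greedy_coin_change(6, [1, 3, 4]))
```

The last call is the interesting one: with coins of 1, 3 and 4, the greedy approach pays 6 as 4 + 1 + 1, while 3 + 3 uses fewer coins. Choosing the locally best option at each step does not always lead to the globally best result.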

There are many more, of course, and the terminology is ever changing.

Just don’t treat the number of terms you can define as a measure of your ability as a data scientist. You’ll learn what they mean in time.