Don’t Fear The Mathematics

Reading a scientific research paper can be intimidating when you hit large sections of mathematical notation. To the untrained eye, at least, it can look like a foreign language that only intelligent people can understand. You’ll soon realise that the fundamental elements of these equations, once broken down, are much simpler than many of the complex ideas expressed in plain English. Break an equation into its parts and google each element, and you’ll build up a picture of what it means. Before long, you’ll be decoding whole equations from your understanding of each element.

A good data scientist can take an equation, understand the variables involved and the operators, such as ‘sum over’ (Σ) or ‘series product’ (Π), and turn that equation, or set of equations, into pseudocode. The pseudocode forms the outline for an algorithmic implementation of the methodology described in the paper. In reality, most of the important research that’s used regularly, such as many machine learning algorithms, has already been codified and incorporated into a package for you. However, being able to understand the mathematics behind these implementations is a crucial part of a data scientist’s job. I would suggest keeping a mathematical notation ‘cheat sheet’ to hand, which you can either write yourself or find online. Use it like a dictionary: translate the mathematical notation into concepts you understand.
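As a minimal sketch of that translation step, here is how the two operators mentioned above might look once turned into Python. The formulas chosen (a sample mean, which uses Σ, and a product of independent probabilities, which uses Π) and the function names `sample_mean` and `joint_probability` are illustrative assumptions, not taken from any particular paper.

```python
import math

def sample_mean(xs):
    """(1/n) * Σ_{i=1}^{n} x_i. Assumes xs is a non-empty list of numbers."""
    total = 0.0            # an empty sum starts at 0
    for x in xs:           # Σ: visit every x_i and add it to a running total
        total += x
    return total / len(xs)

def joint_probability(ps):
    """Π_{i=1}^{n} p_i, e.g. the probability that n independent events all occur."""
    result = 1.0           # an empty product starts at 1
    for p in ps:           # Π: visit every p_i and multiply it into a running product
        result *= p
    return result

print(sample_mean([2.0, 4.0, 6.0]))        # 4.0
print(joint_probability([0.5, 0.5, 0.5]))  # 0.125

# Python's built-ins express the same operators directly (math.prod needs Python 3.8+).
print(sum([2.0, 4.0, 6.0]) / 3)            # 4.0
print(math.prod([0.5, 0.5, 0.5]))          # 0.125
```

In practice you would usually reach for `sum()`, `math.prod()` or NumPy’s `np.sum` and `np.prod`. Writing the loop out once simply makes the mapping from each symbol to a line of code explicit.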


The Internet Will Know

I recall a conversation with a colleague that went along the lines of, “How did anyone ever get anything done before Google?” The truth is, I doubt I would be a data scientist if it weren’t for Google’s magic hand. Whilst there isn’t much more to say here besides “Google it”, there are some tips and tricks that may be of use.

The first: copy and paste the error messages you get into Google. Remove the parts specific to your problem, like IDs or column names, but keep the rest. At the start of your career, the errors you hit will mostly be common ones, so you’ll find no shortage of explanations and solutions.

Right-click the search results and use “Open in new tab”. Opening three to six tabs lets you quickly cross-reference answers without having to keep clicking back to the search results.

Close irrelevant tabs before you make a search; they just create cognitive load you don’t need. At the start I used to think, ‘I’ll come back to this page, so I’ll keep it open’, and 90% of the time I never did. If a page is really important, bookmark it. Worst case, you can always pull it up from your history.

Split screens work: always keep a browser window with Google open so you can quickly search while you’re working on a problem. If you don’t have multiple screens, most modern operating systems let you split a single screen.

Try not to rely on Google too much. If you find yourself constantly copying and pasting code without understanding what’s going on, that’s a bad sign. Read multiple solutions, and read the explanations around the code as well as the code itself. Once you’re happy and have a rough idea of what’s going on, write your own code using the answers you found as an outline. You may think you’re saving time by copying and pasting an answer from a forum or GitHub, but you might be wasting time in the long run if you can’t come up with it yourself in the future.


Develop a Problem Solving Attitude

There will, no doubt, be what look like roadblocks. At the start, you’ll most likely find that your code won’t run, or you’ll get an error that makes no sense. Your personal troubleshooting system may not be formalised yet, and you may end up giving up, telling yourself something along the lines of, ‘this is too hard for me’. The truth is, it isn’t; it’s a matter of mental resource allocation, specifically how much you’re willing to dedicate to the problem. As a data scientist, you learn to dedicate a much higher percentage of your brain power to your work than you would to trivial tasks. Working out the quickest journey home, for example, might take 10% of your brain power and one minute. In the beginning, I found I would need hours of intense concentration to fix a single error in my code. After much training and repetition, you develop an instinct for fixing bugs: you subconsciously weigh up how likely each of a set of possible mistakes is to be behind the error, then dig into the most likely one. When you’re starting out, that’s usually a syntax error or a data type error.

The key message is that problem solving takes nothing but time and brain cycles, and you need to keep solving problems, one after another, until you develop that instinct. There’s no way around putting in the work.

Given that you need to dedicate this much mental resource to the problem, you can’t afford to waste brain power beating yourself up because you can’t get it to work. That’s usually a big distraction and could lead to your downfall (of your data science plans, at least). That said, it’s natural, and everyone does it to some degree.

Try to think of a problem not as a roadblock but as a problem with a ‘pending solved’ status. That way you know it’s just a matter of time and brain power until you figure it out.

Your perspective on a given problem can sit anywhere on a spectrum between two ends. At one end, the problem causes you pain and self-doubt, a dead end of sorts. At the other, it’s a fantastic opportunity for pleasure: the more work you dedicate to it, the more pleasure you’ll gain once it’s solved, like an exciting holiday in the not-too-distant future. Rewiring the way you think like this takes time and can be difficult for some, so start with easy problems and don’t bite off more than you can chew. Raise the difficulty a little each time you solve one, and you’ll gradually build momentum for solving much harder problems. It’s very hard to get this momentum if you start straight off the mark with hard problems.

A useful mental crutch for problem solving is to remove any psychological limit on the time and brain power you’re prepared to dedicate to the problem. Tell yourself something like, “I’ll work on this small problem for a year if I have to.” There’s something about completely accepting that the problem will be solved that helps your brain find a solution faster. Or at least, that’s been the case in my experience.


Understand Why You Want a Career in Data Science

I’m a strong believer in conviction over intelligence; I also believe conviction breeds intelligence. How do you find conviction? This is where a little soul searching comes in: think about the context of your life holistically. After all, there’s a reason you’ve chosen to read this. What’s the why? Are you just interested in what it takes to be a data scientist? Are you here because someone told you it’s the sexiest job of the 21st century? Because you want to earn the big bucks? For the job title’s prestige? Maybe you’re bored of your job and want a challenge? It’s probably a mixture of things; I know it was for me. As long as your reasons aren’t concentrated too heavily on what I call ‘non-intrinsic’ motives, you’re in good standing. Non-intrinsic motives are any external reasons beyond the satisfaction of actually doing the work. What’s the cutoff for the mix of intrinsic and non-intrinsic motives? I don’t know, but if it’s 50% money and 50% prestige, I don’t think it’s going to work. There needs to be some level of job satisfaction, some level of enjoyment: that’s the fuel for progress, and the propeller that will carry you on to implementing the practical tips on this site.

That said, I believe everyone has some capacity to get satisfaction from the type of work a data scientist does. However, not everyone starts on an even playing field, and it’s important to be pragmatic about that. Some people have a much bigger mountain to climb.

The people I see going into this field generally have some mathematical experience, though not all of them do. Social scientists, economists, doctors, biologists, chemists, physicists and software engineers are common. However, I’ve seen great data scientists from a range of disciplines, including journalists, historians and philosophers, each with a unique edge they can inject into their data science skill set. Philosophers, for example, tend to be inquisitive, which becomes mighty useful for data exploration, or simply for not assuming that the data you receive is correct.

So what’s the practical tip here? Write down three reasons why you want to become a data scientist on post-it notes. Be honest with yourself: if one of them is related to money, that’s good. For the right type of person, becoming a data scientist is a fantastic way to pull yourself out of poverty and debt. Remember, a mixture of non-intrinsic and intrinsic motives is normal. Once you have your reasons, stick the post-it notes somewhere you’ll see them every day, maybe on your wardrobe or bathroom mirror. That way, you’ll remind yourself each day why you want to become a data scientist.


Are You Ready For a Career In Data Science?

I have great memories of being young and obsessed with building model cars. These miniature classic cars came as plastic kits: I would pick out the plastic components individually, then follow the visual instructions one by one until the car was fully assembled with glue. I would then paint some flames along the sides and soon after run to my family to show off what I had created. Later, I moved on to building all manner of contraptions with Meccano. I once attempted to build a marble gun from Meccano pieces and a handheld fan. It didn’t turn out to be as powerful as I had hoped.

When building these childhood relics, I remember time melting away as if it didn’t exist. I was fully immersed, hunched over my desk under a lamp, and it felt good. I guess you could call it flow. A few years later, I realised I could reach this state with problem solving, puzzles, games and math: if there was a solution to be discovered, I wanted to prove to myself that I could figure it out, and when I did, it felt good. Little did I know that this would become the inherent motivation behind my day job today.

For those of you who are like me, who find peace or pleasure in problem solving and creating, this blog may be for you. If you hope to find a path in life that lets you problem solve and create each day, this blog is definitely for you.

This blog is my personal collection of nuggets of wisdom for becoming a data scientist. Almost all of these practical tips helped me become one; for the tips I didn’t follow, I retrospectively wish I had. It’s my belief that if you follow them, your chances of becoming a data scientist can increase by an order of magnitude. That isn’t to say it’s a complete checklist, nor will every tip be relevant to you. But the collection is substantial enough that, if followed, it will send you on your way. I have purposely tried to avoid vagueness and ambiguity where possible; the best advice is advice that can be easily interpreted and implemented.