Categories
resources

Know Your Strengths

A memory that has stuck with me throughout my career to date is of my data science hiring manager drawing a quadrant on the whiteboard during one of my interviews. The quadrant represents the spectrum of skills you need to become a data scientist, and every data scientist sits somewhere on this landscape. I was asked to draw a dot where I thought I sat across four primary DS skills: Technical/Engineering, Theoretical/Mathematical, Commercial Aptitude and Data Storytelling.

At the time, I feel I greatly overestimated my technical ability – knowing what I know now, I would have rated myself lower on this scale. Now, however, I would say I’d be offset towards technical/engineering. 

I believe this type of honest self-analysis is useful to any data scientist at any stage of their career, and even more so in the early stages. This kind of awareness of your skill set can act as a compass, showing you which areas you need to improve and guiding you towards work you are most likely to succeed at. Have a go at plotting yourself on the quadrant now, and then plot where you would like to be in, say, 5 years from now.

The truth is employers will most likely want you to sit somewhere across multiple skill groups. What use is complex statistical analysis if you can’t extract the valuable knowledge and articulate it to others in the company, be it verbally or visually?

The same goes for brilliant technical data scientists who create amazing pieces of technology that hold no commercial value. Or, vice versa, for a data scientist who has a brilliant grasp of the intricate commercial value of a project, yet does not know where to start on delivering it. The first years of your career will let you feel out which types of tasks you have the greatest aptitude for, although some of you will already know this. And in very rare, exceptional cases, you may have a high aptitude in all four areas of the quadrant.

The DS skills quadrant:

  • Technical/Engineering
  • Theoretical/Mathematical
  • Commercial Aptitude
  • Data Storytelling

One of the dilemmas facing aspiring data scientists today is that the job description is ever expanding: the set of skills that could fall under the umbrella of “Data Scientist” is extremely large and growing by the day. It’s therefore important to rank these skills by some criteria. However you choose to rank them, make sure you don’t become overwhelmed by the endless number of tools and skills you think you require. That said, there are some basic skills that are fundamental to the job.

Categories
mindset

Digitally Immerse Yourself 

As part of taking your career in data science seriously, you should try to surround yourself with as many different sources of information and learning as possible. Build this into your lifestyle as best you can. This will almost certainly involve spending some of your free time ingesting data science related content. As you slowly chip away at it, you’ll start making connections between concepts without even realising; it may even happen on a subconscious level.

The longer you immerse yourself, the more you’ll learn and the more likely you are to stick at it. Being constantly surrounded by data science related content also acts as a daily reminder to keep learning, and helps prevent you from getting distracted and moving on to some other shiny project.

A useful tactic here is to leverage the habits your brain has already formed around social media. Take the involuntary twitch that makes you open Facebook when procrastinating, and replace the not-so-useful content you were ingesting with data science related content. Liking, following and engaging with DS groups and pages will fill your newsfeed with interesting material.

Here are some ways you can fill your digital life with data science information:

  • Podcasts and ebooks
  • Email newsletters 
  • Subscribe to subreddits, Facebook groups and forums
  • Follow data scientists / data science related accounts on Twitter and blogs
  • Watch films related to AI & data science 
  • Download apps that send you content recommendation notifications
  • Subscribe to YouTube channels
Categories
resources

Learn from Free Videos

Some concepts in data science need to be explained to you in a clear and human way. Aside from having a tutor, mentor or teacher, the next best thing can be to watch a quality video explaining the concept. Luckily, there are thousands of videos, of varying quality, explaining most of the important fundamentals. For example, you could type “How decision trees work” into YouTube and start at the top; if one video doesn’t make sense to you, simply try another until you find someone who explains the concept in a way you can understand. You get the added benefit of being able to pause a video and look things up as you go along, and you can rewind sections over and over again to go over anything you may have missed. I have found this a highly effective way to learn.

When I was learning, I created a table with important fundamental concepts and links to videos I found helpful explaining them.

Many paid online course platforms, such as Udemy and Coursera, deliver lessons through video; however, much of the content is already available for free through YouTube and other platforms. Do some research and find a channel and platform that works for you.

Categories
resources

Work on Projects Interesting to You

If you ask a data scientist what projects they’ve worked on, you’ll get a long list of paid and personal projects that have all contributed to their learning. My first real eye-opening and useful data science project was writing a web scraper in Python. The goal was to scrape information on consumer food products, including their ingredients, to determine which contained palm oil. I built a site on top of the data that enabled people to search for products. The idea was that much of the rainforest destruction in Southeast Asia is tied to unsustainable production of palm oil: large areas of land are cleared, most commonly by scorching perimeters of rainforest, in order to make space for palm oil plantations.

The project felt like a useful piece of work and kept me motivated enough to keep solving the challenges it threw up, from fixing encoding issues in raw HTML text to working out the best way to store the data. It was a project I truly believed would add value, so I worked on it ferociously. I was willing to learn for the purpose of creating. You’ll also find your learning is more efficient, as you learn only what you need to know to get your project working. By documenting and recording these projects, you’ll soon have a portfolio you can use to demonstrate your technical skills to employers.
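To give a flavour of the encoding issues that come up with raw HTML text, here is a minimal sketch using only Python's standard library. It is not the code from the palm oil project: the function name, the list of candidate encodings and the sample string are all assumptions made for illustration.

```python
import html


def decode_html_bytes(raw, encodings=("utf-8", "cp1252")):
    """Decode raw page bytes, trying each candidate encoding in turn.

    The candidate encodings are an assumption; real pages usually
    declare their encoding in an HTTP header or a <meta> tag.
    """
    for encoding in encodings:
        try:
            text = raw.decode(encoding)
            break
        except UnicodeDecodeError:
            continue
    else:
        # Last resort: replace undecodable bytes rather than crash.
        text = raw.decode("utf-8", errors="replace")
    # Turn HTML entities such as &amp; or &egrave; back into characters.
    return html.unescape(text)


# Hypothetical snippet of scraped text.
sample = "Cr&egrave;me &amp; palm oil".encode("utf-8")
print(decode_html_bytes(sample))  # Crème & palm oil
```

Trying UTF-8 first and falling back to a legacy encoding is only one possible strategy; the right choice depends on the sites you are scraping.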

While this was a relatively large project, the project you decide to start with can be as big or as small as you like, as long as you can see that it has some incremental value, even if it’s very small. You’ll be amazed by the number of simple projects available to you which are actually useful. Here are some ideas:

  • A Python script to clean up unwanted system junk on your hard drive: a script that automatically deletes files in your trash, caches, logs and downloads folders. Whenever you’re running low on disk space, run the script.
  • A Python script to tell you the chances of a losing hand in poker. Great as it includes some statistics and probability theory.
  • Forecast your costs or revenue over the coming months. Most banks let you download a CSV file of your transactions and balance statements, and you could create some interesting visualisations of predictions for your income and outgoings.
  • A random dinner recipe generator: scrape some recipes online, then write some code to randomly pick a recipe without replacement. You’ll never run out of ideas for what to have for dinner.
  • Simulate how you spend your time: record how long it takes you to do things, estimate how long future tasks would take to complete, then run scenarios of how you could spend your time over the coming weeks and months. Then pick the scenario which is optimal (however you may define that).
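To show how small a first project can be, here is one possible sketch of the recipe idea: picking at random without replacement, so no recipe repeats until every one has been used. The class name and the recipe list are hypothetical stand-ins; in a real version the list would come from your scraper.

```python
import random


class RecipePicker:
    """Pick recipes at random without replacement.

    Once every recipe has been used, the pool is refilled and reshuffled.
    """

    def __init__(self, recipes, seed=None):
        if not recipes:
            raise ValueError("need at least one recipe")
        self._recipes = list(recipes)
        self._rng = random.Random(seed)
        self._remaining = []

    def pick(self):
        if not self._remaining:
            # Start a new cycle: every recipe becomes available again.
            self._remaining = self._recipes[:]
            self._rng.shuffle(self._remaining)
        return self._remaining.pop()


# Hypothetical data standing in for scraped recipes.
recipes = ["dal", "ramen", "tacos", "risotto"]
picker = RecipePicker(recipes, seed=42)

# One full cycle returns every recipe exactly once, in random order.
first_cycle = [picker.pick() for _ in recipes]
print(first_cycle)
```

Shuffling a copy and popping from it is one simple way to sample without replacement; `random.sample` would work just as well for a single cycle.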

Some more advanced DS projects:

  • Sentiment analysis of tweets (lots of existing code and libraries do this).
  • Handwritten digit recognition with the MNIST dataset
  • IMDb movie recommendation system
  • Clustering user data for user segmentation

Luckily, there are plenty of openly available datasets for you to play around with, for example on kaggle.com. Just make sure it’s a project you are genuinely interested in and that you feel can provide some real-world value.

Categories
tools

Understand How a Computer Works

Basic IT literacy is a given, as is a fundamental understanding of computer science. The majority of your work will be done on a computer, but you won’t be spending much of your time in applications like Microsoft Word, Excel or Outlook. Rather, you’ll be in the trenches: in a text editor, the terminal or your IDE, writing and running your own code. I very rarely spend time looking at nice user interfaces. You’ll be ‘closer to the metal’, working with data a layer below conventional applications with graphical user interfaces, closer to the machinery that grinds through all the instructions.

Working in this way opens up huge scope for customisation and flexibility in what the computer can do. Let’s run through some basics you should know.

“What is an OS?” 

“What is Linux?”

“What is a Virtual Machine?”

“What does distributed computing mean?”

“What’s the difference between RAM and ROM?”

“What is a shell/bash script?”

“What’s a CPU?”

“How many MB in a GB?”

“How do you do basic file management in the terminal? E.g. remove, copy and rename files”

“What’s the difference between a relational database and a non-relational database?”

“What is a binary executable?”

There are plenty more important concepts you should understand besides the above, but run through each one to check you know the answer and if you don’t, give it a search online.
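Trying things out is often the quickest way to check an answer. As a rough sketch, the snippet below mirrors the terminal commands `cp`, `mv` and `rm` using Python's standard library, working inside a temporary directory so nothing real is touched. The file names are made up for illustration. It also records the two common answers to the MB-in-a-GB question.

```python
import shutil
import tempfile
from pathlib import Path

# Work inside a throwaway directory that is deleted afterwards.
with tempfile.TemporaryDirectory() as tmp:
    workdir = Path(tmp)

    original = workdir / "notes.txt"
    original.write_text("hello\n")     # create a file

    backup = workdir / "notes_backup.txt"
    shutil.copy(original, backup)      # copy:   cp notes.txt notes_backup.txt

    renamed = workdir / "todo.txt"
    original.rename(renamed)           # rename: mv notes.txt todo.txt

    backup.unlink()                    # remove: rm notes_backup.txt

    remaining = sorted(p.name for p in workdir.iterdir())
    print(remaining)  # ['todo.txt']

# Storage units come in two flavours, which is why the question is a classic.
MB_PER_GB = 1000     # 1 GB  = 1000 MB  (decimal, SI prefixes)
MIB_PER_GIB = 1024   # 1 GiB = 1024 MiB (binary prefixes)
```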

Categories
tools

Make Notes, Keep Them Organised 

This is a tip that, in retrospect, I wish I had listened to.

Notes are the breadcrumbs along the path to where knowledge is stored in your head, and sometimes you need to read through them to resurface that knowledge. While you’re learning, create your own personal knowledge repository of notes, and keep them clear and organised. If you’re aesthetically inclined, make them look nice, something you can be proud of. You’ll probably have to draw upon them a few times in your career; I know I have had to rummage through boxes of old papers from my previous modules, and I regret having thrown many of them out. Besides being useful, notes are a living record and a “horcrux” of the long hours spent learning your trade.

Categories
resources

Get to Grips With The Terminology 

As with any technical industry, buzzwords, abbreviations and obscure words synonymous with less obscure words are common. They are just part of the game, and learning what they all mean can seem daunting. In a way they act as a barrier to entry for people, when they really shouldn’t.

Rather than write an exhaustive list which would take far too long, here are a few important examples and their definitions:

  • ETL – Extract, Transform, Load: pulling data out of a source, reshaping it and loading it into a destination such as a database.
  • Heuristic(s) – A practical rule of thumb for finding an approximate, good-enough solution when exact methods are too slow or don’t exist.
  • Structured / Unstructured Data – Data that can easily be put into a table or database vs other data such as PDFs or MP3 files.
  • Supervised / Unsupervised Algorithms – Machine learning algorithms that learn from labelled training data vs those that learn without labels.
  • Greedy Algorithms – Algorithms that pick the locally best option at each step in the hope of reaching the globally optimal solution, which is not always guaranteed.
  • Normalization – Adjusting values measured on different scales to a common scale.
  • Residual (Error) – Deviation of the observed value from some derived value, such as a model’s prediction.
  • Feature engineering – The process of creating and transforming the input variables (features) that a machine learning algorithm is trained on.
  • Application containerization – The process of creating a standard unit of software that packages up code and its dependencies so the application runs consistently across computing environments.
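A few of these terms are easier to grasp in code than in words. The following is a minimal sketch of normalization (using min-max scaling, which is only one of several approaches), residuals and a greedy algorithm. The function names and sample numbers are illustrative assumptions.

```python
def min_max_normalize(values):
    """Normalization: rescale values so the smallest is 0.0 and the largest 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: there is no spread to rescale.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def residuals(observed, predicted):
    """Residual: observed value minus the value a model derived for it."""
    return [o - p for o, p in zip(observed, predicted)]


def greedy_coin_change(amount, coins):
    """Greedy algorithm: always take the largest coin that still fits."""
    used = []
    for coin in sorted(coins, reverse=True):
        while amount >= coin:
            used.append(coin)
            amount -= coin
    return used


print(min_max_normalize([150, 160, 170, 190]))  # [0.0, 0.25, 0.5, 1.0]
print(residuals([3, 5, 7], [2, 5, 9]))          # [1, 0, -2]

# Greedy uses three coins here, although 3 + 3 would need only two:
# the locally best choice is not always globally optimal.
print(greedy_coin_change(6, [1, 3, 4]))         # [4, 1, 1]
```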

There are many more of course and terminology is ever changing. 

Just don’t treat the number of terms you can define as a measure of your capabilities as a data scientist. You’ll learn what they mean in time.

Categories
help

Find a Mentor 

In my eyes, finding a mentor is one of the most important things you can do to help you become a successful data scientist. I was fortunate enough to find someone willing and able to invest time in teaching me when I started as a junior data scientist, for which I am forever grateful. Before setting out to find a mentor, make sure you’re ‘mentor-able’. This means having the right attitude to be mentored, which largely consists of staying humble and appreciative of their time. This person will provide you with wisdom and teach you to think in a specific way. After all, they spent hundreds of hours figuring things out and/or receiving mentorship themselves. Whilst the thought of approaching an expert and asking for help may seem intimidating, you’ll be surprised by just how many are willing to help. Their career may have given them so much that they feel a responsibility to pay it forward. If they like you, they’ll gain satisfaction from watching you progress and happily trek with you through the dense jungle of data science.

So how do you go about finding a mentor? Here are the four main ways:

  • Assigned a mentor through work 
  • Pay for tutoring / training 
  • Someone you know 
  • Reach out to people

All of these have their benefits and caveats.

Assigned a mentor through work

Mentoring through work is likely to be the optimal choice, as you get paid whilst being mentored; however, not everyone will be fortunate enough to have this option. If you do, it may come about in one of two ways. The first is that you work for a company that has an in-house data science team and supports mentorship for career transition. You may spend time working on a project with a data scientist, which you can use as an opportunity to ask questions and express your interest in learning more; data scientists are often found in cross-functional teams or ‘squads’. The other is to join an apprenticeship or junior role where you can learn quickly on the job. This is likely to be the most efficient way to learn; however, you may have to accept a lower salary or, in the case of an unpaid internship, nothing at all.

Someone you know 

Reaching out to someone in your social network is the next best option, in my opinion. Think of people who know you well, or with whom you share a close mutual friend. There’s no harm in asking; it’s unlikely someone in your social circle will be rude or dismissive, and in a way asking for their expertise can be a great compliment. If you don’t know many people in the professional space, say because you’re a graduate student, ask your parents, an uncle or your friends’ parents. Thanks to the six degrees of separation, the idea that any two people are linked by a chain of roughly six social connections, you only have to keep digging until you find the right person to help you.

Pay for tutoring / training 

There are plenty of data science bootcamps and training programs, online and offline, where you can be assigned a mentor. Academia is another route: most master’s programs will assign you a supervisor, and some academics have real-world experience. However, a supervisor will likely work with many students, be short on time and may not have the full industry-level experience that you seek.

There are websites that offer private tutoring online; however, be prepared to spend a lot of money for the many hours of supervision you might need. Most people don’t have pockets that deep, so seeking cost-effective or free mentorship is their only option. In my experience, the best mentors won’t ask for money, as they are not financially motivated to help, or have reached a level where any money they could earn through mentorship is somewhat unnecessary.

Reach out to people 

If you don’t have any friends or acquaintances in data science, then reach out to data scientists via email, LinkedIn, Twitter or any other social platform. It could be something as quick as a comment on something they posted; data science blogs are a good place to start. Have a clear idea of the sort of person you want to learn from: someone who is in the position or industry you envision yourself working in a few years from now. If it’s data science for finance you’re interested in, reach out to data scientists working at fintechs or banks. Good data scientists receive multiple cold messages a week from recruiters, so they may find it refreshing to receive yours. If the person has published some work, contributed to an open source project or written a blog post, read it and mention it. The trick is to find a way to build an authentic relationship; for example, don’t suck up and praise work you haven’t read. Be honest and open about what you hope to gain, mention what they could gain from the relationship (mentors often find the process beneficial for their own development) and don’t get disheartened if many people say no.

Once you find that mentor, work on building a positive relationship with them as best you can. It could be one of the most important relationships you have in terms of its impact on your career. Once you have a clear and efficient dialogue set up, you can begin unearthing their wisdom, a scarce and valuable resource.

Categories
mindset

Accept You Won’t Understand Everything

This piece of advice is short but sweet: don’t expect to understand everything. Don’t beat yourself up about it; it’s normal. Some things are too complex and simply unnecessary to know. Take machine code: what’s the use of knowing how to write hexadecimal instructions or binary if you don’t need to? The same goes for some mathematical proofs, where understanding the idea is usually enough. Be pragmatic about what you need to know and avoid following rabbit holes into pools of unnecessary knowledge. This can be difficult to begin with, as you’re less experienced at judging what’s useful and what is not. A general rule of thumb is that if there isn’t much information about a topic available online, it’s usually not worth learning to begin with.

As your data science career develops, you will begin to specialise and acquire more of this rare knowledge where necessary. Don’t try to learn it all at the start. It sounds simple, but you might come across people who like to know things others don’t, which leads them to almost brag about something niche and impractical for the sake of sounding intelligent. Avoid this trap at all costs; people will listen if you have something that actually adds value for them.

Categories
mindset

Don’t Fear The Mathematics

Reading a scientific research paper can be intimidating when you see large sections of mathematical notation. To the untrained eye, at least, it can look like a foreign language that only intelligent people can understand. You’ll realise soon enough that the fundamental elements of these equations, broken down, are much simpler than many of the complex ideas expressed in plain English. Breaking an equation down and googling each element will let you build a picture of what it means. Soon, you’ll be decoding equations by understanding what each element means.

A good data scientist can take an equation, understand the variables involved, understand the operators, such as ‘sum over’ or ‘series product’, then turn that equation or set of equations into pseudocode. The pseudocode forms the outline for an algorithmic implementation of the methodology described in the paper. In reality, the majority of the important research that’s implemented regularly, such as many machine learning algorithms, has already been codified and incorporated into some package for you. However, being able to understand the mathematics behind these implementations is a crucial part of the job of a data scientist. I would suggest having a mathematical notation ‘cheat sheet’ to hand, which you can either write yourself or find online. Use it like you would a dictionary, to translate the mathematical notation into concepts that you understand.
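As a small illustration of translating notation into code, take two common operators: the ‘sum over’ symbol (a capital sigma) and the ‘series product’ symbol (a capital pi). Both turn into a loop over the values. The sketch below is only an example; the two formulas chosen, a mean and a product of probabilities, and the function names are assumptions made for illustration.

```python
import math


def mean(xs):
    """Mean of x_1..x_n: (1/n) times the sum over i of x_i."""
    total = 0.0
    for x in xs:        # 'sum over i from 1 to n'
        total += x
    return total / len(xs)


def product(ps):
    """Product over i of p_i, e.g. the joint probability of independent events."""
    result = 1.0
    for p in ps:        # 'product over i from 1 to n'
        result *= p
    return result


values = [2, 4, 6]
probabilities = [0.5, 0.5, 0.8]

print(mean(values))            # 4.0
print(product(probabilities))

# The built-ins do the same job; the loops above just make the notation explicit.
assert math.isclose(mean(values), sum(values) / len(values))
assert math.isclose(product(probabilities), math.prod(probabilities))
```

Once the loop version makes sense, the one-line versions with `sum` and `math.prod` (available from Python 3.8) read almost like the original equation.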