If you ask a data scientist what projects they’ve worked on, you’ll get a long list of paid and personal projects that have all contributed to their learning. My first truly eye-opening and useful data science project was writing a web scraper in Python. The goal was to scrape information on consumer food products, including their ingredients, to determine which products contained palm oil. I then built a site on top of the data that let people search for products. The motivation was that much of the rainforest destruction in Southeast Asia is tied to unsustainable palm oil production: large areas of land are cleared, most commonly by scorching the perimeters of rainforest, in order to make space for palm oil plantations.
The project felt like a useful piece of work, which kept me motivated to keep solving the challenges it faced, from fixing encoding issues in raw HTML text to working out the best way to store the data. It was a project I truly believed would add value, so I worked on it ferociously. I was willing to learn for the purpose of creating. You’ll also find that your learning is more efficient this way, because you learn only what you need to know to get your project working. And by documenting and recording these projects, you’ll soon have a portfolio you can use to demonstrate your technical skills to employers.
While this was a relatively large project, the project you start with can be as big or as small as you like, as long as you can see that it has some incremental value, even if that value is very small. You’ll be amazed by the number of simple projects available to you that are actually useful. Here are some ideas:
- A Python script that cleans up unwanted system junk on your hard drive by automatically deleting files in your trash, cache, log and downloads folders. Whenever you’re running low on disk space, run the script.
- A Python script that tells you the chances of a losing hand in poker. This is a great one, as it involves some statistics and probability theory.
- Forecast your costs or revenue over the coming months. Most banks let you download a CSV file of your transactions and balance statements, and you could also create some interesting visualisations of the predictions for your income and outgoings.
- A random dinner recipe generator. Scrape some recipes online, then write some code to randomly pick a recipe without replacement. You’ll never run out of ideas for what to have for dinner.
- Simulate how you spend your time. Record how long it takes you to do things, create estimates of how long they would take to complete, then run scenarios of how you could spend your time over the coming weeks and months. Finally, pick the scenario that is optimal (however you define that).
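To show how small a first project can be, here is a minimal sketch of the "pick a recipe without replacement" step from the dinner generator idea. The `RecipePicker` class and the recipe names are made up for illustration, and the choice to reshuffle the full list once every recipe has been served is an assumption about how you might want it to behave. The scraping part is left out entirely.

```python
import random


class RecipePicker:
    """Pick recipes at random, without repeats, until all have been used."""

    def __init__(self, recipes, seed=None):
        if not recipes:
            raise ValueError("need at least one recipe")
        self._recipes = list(recipes)
        self._rng = random.Random(seed)  # a seed makes the order reproducible
        self._remaining = []

    def pick(self):
        # Assumption: once every recipe has been served, start a fresh
        # shuffled round rather than raising an error.
        if not self._remaining:
            self._remaining = self._recipes[:]
            self._rng.shuffle(self._remaining)
        return self._remaining.pop()


if __name__ == "__main__":
    # Placeholder names; a real version would load scraped recipes instead.
    picker = RecipePicker(["chilli", "ramen", "dal", "tacos"])
    for day in range(1, 6):
        print(f"Day {day}: {picker.pick()}")
```

Shuffling a copy of the list and popping from it is one simple way to sample without replacement; `random.sample` would also work if you only need a fixed number of picks up front.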
Some more advanced data science projects:
- Sentiment analysis of tweets (lots of existing code and libraries do this).
- Handwritten digit recognition with the MNIST dataset
- IMDB movie recommendation system
- Clustering user data for user segmentation
Luckily, there are plenty of open-source datasets available for you to play around with, for example on kaggle.com. Just make sure it’s a project you are genuinely interested in and that you feel can provide some real-world value.