Machine Learning – how do I get good data to learn from?
This is the fourth article in the Machine Learning series. So far I have introduced you to Machine Learning and described the process and the tools.
My first idea for today was to show you a simple example of the supervised Machine Learning process. Then I realized that, as you already know, Machine Learning and AI need (a lot of) data to play with, and I have not told you yet how to get the data!
For the sake of this series I will be using public datasets. Here are my favourite web pages where you can find good data:
There are reasons why this website comes first. You can find a lot of datasets there, but that is not all! You can also take part in challenges, which means there are real problems to solve! Most of them are for learning purposes: you can see how other people write their code, and you get scored once you publish a solution. You can also join the discussions with other participants.
There are also paid challenges where the complexity of the problem is really enormous.
UCI Machine Learning Repository
This website is maintained by the University of California, Irvine, School of Information and Computer Sciences. There are hundreds of great datasets classified by the type of machine learning problem. You can find univariate and multivariate time-series datasets, as well as datasets for classification, regression or recommendation systems.
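Many datasets in repositories like this one come as plain CSV files, so here is a minimal sketch of how you could load one for a supervised problem. Please treat it as an illustration only: the `load_dataset` helper, the column names and the sample rows are all made up by me, and the in-memory text just stands in for a file you would download yourself.

```python
import csv
import io

# Hypothetical stand-in for a downloaded CSV file.
# The column names and values are invented for illustration only.
SAMPLE_CSV = """feature_1,feature_2,label
5.1,3.5,a
4.9,3.0,a
6.3,3.3,b
"""


def load_dataset(source, target_column):
    """Read CSV rows and split them into numeric features and labels.

    Assumes every column except `target_column` holds a number.
    """
    reader = csv.DictReader(source)
    features, labels = [], []
    for row in reader:
        # Remove the label from the row, then convert the rest to floats.
        labels.append(row.pop(target_column))
        features.append([float(value) for value in row.values()])
    return features, labels


features, labels = load_dataset(io.StringIO(SAMPLE_CSV), "label")
print(len(features), "rows,", len(features[0]), "features per row")
```

With a real file you would pass `open("your_file.csv", newline="")` instead of the `io.StringIO` object. Real datasets often have missing values or text columns, so expect to add some cleaning before the conversion to numbers works.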
This website is dedicated to all types of deep learning problems. You will find datasets from categories like natural images, artificial datasets, music, recommendation systems, faces, speech and text recognition.
This is just another great website with tons of valuable datasets! There are links to public government datasets, finance and economics data, and a lot of datasets similar to those on the deep learning page. But there is more! You can find datasets for autonomous vehicles as well!
Yes, this is the name of a GitHub repo. Go there and you will find out why data is awesome!
Summary
I have to say that there are more sources than the ones I have mentioned in this post. I have shared the ones I already know and have been using during my learning process. I hope you find them interesting!
But… Do not forget about the Cloud vendors as they also have a lot of great data sources ready to use:
Try to find more at Stanford, MIT and Berkeley. These universities have a lot of open datasets for their students. Leave a comment and I will point you there!
Cheers,
Damian