Where to find sample data online


Have you ever struggled to find sample data to play with in Power BI?

Did you spend hours (sounds crazy, but it happened to me too!) just looking for a dataset with insurance data? Healthcare data? Housing prices data?

Did you ever wonder, "Where are people finding the data to create those Netflix and Amazon reports that seem to be everywhere these days? Seriously, tell me your secret!"

Well, let me tell you, it's not that hard to find that data after all!

I want to save you the trouble of getting lost online, so I'm sharing the top 3 websites I use to get data for my sample reports, training sessions, etc.

And the websites are... *drum roll*


1st - Data.World

2nd - Kaggle

3rd - Mockaroo


Now, to the details. Let's check what each of these websites has to offer you!




1st - Data.World





Link to data.world:

https://data.world/datasets/open-data


This one is my all-time favorite!

It has thousands of free datasets (yes, free!) that you can easily download as CSV, or even access through a shared URL to use the data in R or Python. Pretty awesome, isn't it?
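
If you go the Python route, a downloaded CSV can be read with nothing but the standard library. This is only a minimal sketch: the column names and values below are made up for illustration, and in practice you would pass in the contents of the file you downloaded.

```python
import csv
import io


def read_csv_rows(csv_text):
    """Parse CSV text into a list of dictionaries, one per row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [dict(row) for row in reader]


# Hypothetical sample standing in for a downloaded file.
sample = "region,sales\nNorth,120\nSouth,95\n"
rows = read_csv_rows(sample)

# Values come back as strings, so convert before doing math.
total_sales = sum(int(row["sales"]) for row in rows)
```

For a real file, you could read its text with `open("my_download.csv", newline="", encoding="utf-8").read()` (the file name here is just a placeholder) and pass that to `read_csv_rows`.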

To use Data.World, you just go to their web page and search for keywords that describe the dataset you're trying to find, something like "sales", "insurance", "google analytics", "education".

They also have licensing packages that I never explored as I can usually find everything I need in the free version.

Just one remark: you have to sign up to be able to download the files.

Another great thing about Data.World is that you can use SQL to query the data tables directly in your browser, so you can even practice those SQL skills.
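
To give a feel for the kind of query you might practice, here is a small sketch that runs plain SQL from Python against an in-memory SQLite table. It is a local stand-in, not Data.World's own query engine, and the table and column names are invented for the example.

```python
import sqlite3

# Build a tiny throwaway table in memory (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("North", 120), ("South", 95), ("North", 80)],
)

# A typical practice query: total sales per region, largest first.
query = """
    SELECT region, SUM(amount) AS total
    FROM sales
    GROUP BY region
    ORDER BY total DESC
"""
totals = conn.execute(query).fetchall()
conn.close()
```

Here `totals` ends up as a list of `(region, total)` tuples, one per region.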




You can even create charts in Data.World! They don't look as good as in Power BI of course, but if you want to quickly check the data visually it's a great alternative.


Lastly, you can create projects within Data.World, so you can have all your data, definitions, etc. in one place. All of this for free!





2nd - Kaggle





Link to Kaggle:

https://www.kaggle.com/datasets



Kaggle is another great resource. It's a bit more oriented to data science, but still very useful for all things data.

To use Kaggle, you also need to register and create a profile, but again, once you create your profile, you're free to use Kaggle for anything you want, and of course to download the datasets.

It works in a very similar way to Data.World. You just have to search for the dataset you want using keywords, and then you can download the dataset, usually in CSV format.

A very interesting thing about Kaggle is that they regularly host data science competitions, where the winners get a cash prize. So you can learn and earn some money at the same time!

For example, there is one competition live at the moment from Mayo Clinic, with a $10,000 prize for the winner.


But there are competitions where the prize can be as much as $150,000. Yes, you read that right! However, as I mentioned before, these are usually data science projects, so unless you're into data science, it's probably not the best idea to join a Kaggle competition 🤣






3rd - Mockaroo





Link to Mockaroo:

https://www.mockaroo.com/


Mockaroo is very useful if you want to generate data that is not usually available online. Some examples of data you can generate in Mockaroo:

  • First and Last names

  • Email addresses

  • IP addresses

  • Avatar URLs

  • Phone numbers

  • Credit card numbers

  • IBAN

  • NHS number

  • ...

If you're thinking "Oh, there is a lot of personal data there, is that even legal?", don't worry: this generated data is all fake!

You can mix and match different fields and categories, so you can choose which columns you want in your "fake" dataset.


Mockaroo offers a free version; however, you can only download files of up to 1,000 rows. There is no limit on the number of files you can download though, so you can download 5 files and end up with 5,000 rows of data.
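
If you do download several files, you will probably want to stitch them back together. Here is a rough sketch of one way to do that in Python, assuming every file was generated from the same schema, so the columns match and appear in the same order. The sample contents are made up.

```python
import csv
import io


def combine_csv_texts(csv_texts):
    """Merge several CSV exports that share a header into one CSV string."""
    output = io.StringIO()
    writer = None
    for text in csv_texts:
        reader = csv.reader(io.StringIO(text))
        header = next(reader, None)
        if header is None:
            continue  # skip empty exports
        if writer is None:
            writer = csv.writer(output, lineterminator="\n")
            writer.writerow(header)  # keep the header only once
        writer.writerows(reader)  # append the remaining data rows
    return output.getvalue()


# Hypothetical stand-ins for two downloaded files.
part_1 = "id,first_name\n1,Ana\n2,Ben\n"
part_2 = "id,first_name\n3,Cy\n"
combined = combine_csv_texts([part_1, part_2])
```

One thing to watch for: values such as IDs may repeat from one file to the next, so it is worth checking for duplicates before loading the result into Power BI.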

You can also use the API to get the data from Mockaroo. The free version allows you to make up to 200 API requests per day.
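
As a rough idea of what an API call could look like from Python, the sketch below only builds the request URL and keeps track of the daily allowance. The endpoint and the parameter names (`key`, `schema`, `count`) are my assumptions, and the key and schema name are placeholders, so check Mockaroo's API documentation before relying on any of it.

```python
from urllib.parse import urlencode

# Assumed endpoint; confirm it against Mockaroo's API documentation.
BASE_URL = "https://api.mockaroo.com/api/generate.csv"
FREE_DAILY_REQUESTS = 200  # free-tier limit mentioned above


def build_request_url(api_key, schema_name, row_count):
    """Build the URL for one request (parameter names are assumptions)."""
    params = {"key": api_key, "schema": schema_name, "count": row_count}
    return BASE_URL + "?" + urlencode(params)


def requests_left(requests_made_today):
    """How many free requests remain today, never below zero."""
    return max(FREE_DAILY_REQUESTS - requests_made_today, 0)


# Placeholder key and schema name, for illustration only.
url = build_request_url("YOUR_API_KEY", "my_schema", 100)
```

With a real key, the data could then be fetched with something like `urllib.request.urlopen(url).read()`.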



And these are my top 3 websites that I go to whenever I want some sample data.

Each one of the websites offers a lot more than just "sample data", as you now know, so make the most of each one!

I hope you found this post useful, now go and play with some datasets in Power BI 😊
