Crime and car accidents in Poland will be completely eliminated from as early as 2032.

 

How do we know this you may ask? We can see it in the statistical trends, and numbers do not lie…right? On the Polish police force’s page, information on the number of in flagrante delicto arrests, search cases, traffic accidents and their victims is updated on a daily basis. Unfortunately, a simple table showing raw numbers does not say much about these events, their frequency or the aggregate trends. For our contribution, we downloaded these police statistics to Google Sheets, made a few calculations, and reported the results to Google Data Studio in a report titled ‘Polish Police Statistics’. We have also made sure that all these activities are updated automatically 😉 In this article, we will show you how from raw, simplistic website data you can create a readable and interesting report with impressively informative content.

1

A brief explanation before we get to the merits

Before you start reading, we would like you to know that the contentions presented in this article serve only to show the capabilities of the Google Data Studio tool and are not intended to cast influence or opinion on the Police institution in any way, nor can they be treated as source material for other studies. The conclusions presented in this article do not express the opinions of MakoLab but merely serve as examples of the types of applications that can be utilised for data presentation using this Google tool. In this way, the data we have downloaded from the Police service purely provides us with a much more interesting case study than an abstract demonstration ever could. Ok, let’s get started.

Will the need for the police really become obsolete? Certainly not. The tools that we used to prepare the report are useful and precise, but the quantity of initial data, the number of factors that the data set does not take into account, errors in the dataset itself, and the lack of consistency in defining the investigated events effectively prevent the analysis from reaching such far-flung conclusions as zero crime in 15 years. Nevertheless, this case presents a great opportunity to demonstrate the capabilities of Google Sheets and Google Data Studio and to show what pitfalls manifest on the way to constructing strong conclusions.

In addition, from this article you will also learn:

  • How to automatically download data from a website and the web to a spreadsheet
  • How to optimally use the IMPORTHTML function in Google Sheets
  • How to archive the value of one cell into another o in Google Sheets <- unbelievably useful
  • How to use the script editor (GAS) for Google Sheets

Where did we get the data for our analysis?

From the internet ;), but more specifically from the website www.policja.pl, where the statistics on crime and road accidents in Poland are published on a daily basis. Unfortunately, we did not have direct access to the database or API and so we had to download the data straight from the website. To help us in our endeavour, we needed to recruit a simple, effective and all-powerful data processing solution, namely a spreadsheet. The old faithful and almost irreplaceable Excel allows users to import data directly from a web page (Data> From Web). However, the use of Excel requires the active refreshing of such inputted data. For this to happen, you need a switched-on computer and someone who can constantly refresh the downloaded data, but we want everything to happen automatically.

Data preparation for Google Data Studio

To best meet our needs, we required a spreadsheet that would always be available online. It would also be advisable that our “database” can be prepared in such a way as to be easily linked with the Google Data Studio, which would be used to visualize the data later on. Make way Excel, because Google Sheets has fulfilled all of our criteria. Through the use of the IMPORTHTML formula function, users can download tables or lists from a specified URL in a very similar fashion to Excel. We can attain exactly the data we are seeking by simply indicating which table/list we are interested in (1,2,3 …) e.g.:

=IMPORTHTML(“http://www.policja.pl/pol/form/1,dok.html”;”table”;1)

Such a formula allows for the transposition of the entire contents of the table from the website onto the spreadsheet, including all of the headings:

2

The IMPORTHTML formula is a great tool, but it also has its disadvantages. One of critical importance is its performance. If too many formulas are present in a given sheet it can cause a bug on the page from the response times taking too long to update the records. If we refer to multiple subpages in one spreadsheet, some of the formulas simply will not work. In this situation, it is virtually impossible to retrieve historical data from several hundred URLs at once.

Being aware of this pitfall, it is, therefore, worth reducing the number of formulas on each sheet. Historical data – which the Police publishes over several hundred pages – was copied and saved permanently into the sheet. It then became possible to focus on adding and updating only the most recent data, day by day. In this way, we limited the number of necessary formulas to one and thus significantly improved the reliability of the sheet:

=INDEX(IMPORTHTML(“http://www.policja.pl/pol/form/1,dok.html”;”table”;1);2)

 

Preceding the IMPORTHTML function with the INDEX function allows you to specify that only an indicated table row should be returned – in this case, the row responsible for providing the most recently available data. Here we identified the second row for selection as the first row contained the headers.

 

Automatically archiving rows in Google Sheets

Ok, we have historical data saved in the sheet and we can download fresh information about new events every day. But how can this information be copied and added to the end of the table for archiving? Easy! We’ll show you!

At this stage, the sheet should look like this:

3

In row 1 we have the IMPORTHTML formula, which is responsible for downloading the latest data. In the 4th row of the table are the extracted headers, and below these are the historical data from December 2008 onwards which has been permanently saved to the sheet. We then needed to add the latest day to the bottom of the table, thus archiving the latest available data. To do this, we used the Google API Scripts (GAS) editor. To go to the editor from the Google Sheets Tools menu select Script Editor:

4

 

We utilised two functions from the script editor. The first function was responsible for creating an additional custom menu in the Google spreadsheet. We then placed this within a button responsible for the following function:

function onOpen() {
var ui = SpreadsheetApp.getUi();
ui.createMenu(‘Custom Menu’)
.addItem(‘Save Data’,’saveData’)
.addToUi();
}

 

The second function then fulfils our exact request: to retrieve the recently added values from the first row and add them to the last free row on the sheet:

function saveData() {
var ss = SpreadsheetApp.getActiveSpreadsheet();
var sheet = ss.getSheets()[0];
var data = sheet.getRange(‘Sheet1!A1’).getValue();
var zatrzymani = sheet.getRange(‘Sheet1!B1’).getValue();
var poszukiwani = sheet.getRange(‘Sheet1!C1’).getValue();
var nietrzezwi = sheet.getRange(‘Sheet1!D1’).getValue();
var wypadki = sheet.getRange(‘Sheet1!E1’).getValue();
var zabici = sheet.getRange(‘Sheet1!F1’).getValue();
var ranni = sheet.getRange(‘Sheet1!G1’).getValue();
var timestamp = sheet.getRange(‘Sheet1!H1’).getValue();
sheet.appendRow([data,zatrzymani,poszukiwani,nietrzezwi,wypadki,zabici,ranni,timestamp]);
}

 

Of course, the script should be adapted to suit your own file, including indicating the appropriate range of cells in the relevant sheet to then subsequently assign selected variable names to them. The function then works by selecting these variables and delivering their values to the last row of the sheet. In the script editor, it should look like this:

5

If you want to learn more about GAS, we recommend Ben Collins’s blog.

 

Script schedule and auto-refreshing

After saving the script you should check that it works correctly by running the SaveData function. This can be done from the Select Function menu. In addition, after saving the script, the spreadsheet should be enriched with a new item in the menu:

6

By clicking on “Save Data”, you are allowing the following function: the downloading and saving of all values in the sheet. This will happen every time the button is clicked. It can be effectively used it to test the sheet, and when necessary, to re-download the data.

In order for the process to be completely automated we still needed to assign an execution schedule to the script that would run the recalculation once a day. This was achieved by going back to the script editor and selecting the project trigger:

7

At this stage, we have a spreadsheet that automatically downloads and archives the latest data published on the website of the Polish police force. By having this data on the spreadsheet, we can perform many different analytical functions, such as calculating crime averages by selecting a trend and researching the relationship between different months or days of the week. So there you have it! We will still unfortunately have to wait and see what the state of security will be in Poland in 2032 😉

In the next article, we will reveal how to clip the prepared sheet from Google Data Studio and make a self-updating report. Stay tuned;)

The report is available online.

Rate this article
(13)
We use cookies to provide you with a better experience. Carry on browsing if you're happy with this, or learn more about opting-out the following here.
Follow MakoLab on Social Media
Want to be up-to-date with our MakoNews? Sign up to our newsletter.