Pandas in the Pandaemic. (COVID-19 in Scotland — statistics) Part 3
This is the 3rd part of an exploration of pandas package.
Part 2: https://www.vallka.com/blog/pandas-in-the-pandaemic-covid-19-in-scotland-statistics-part-2/
All data are taken from the official website: weekly-deaths-by-location-age-sex.xlsx
Taken from official the site:
Let’s have another look at the graph from the previous part: https://colab.research.google.com/drive/1a5FyWN5psehoqnAUEev0ODE2KNMiFTTE?usp=sharing
New data arrived. The line for the last week is going up a little, but still well below the average.
Ok, sit back, relax, and have a sip from a flask. Why Flask? This is first what comes to mind we start talking about web data visualization with Python. Also we have a long list of libraries, let’s just take one of the most popular — Plotly.
Plotly for Python is actually only a part of a family of products by Plotly. What we will use here — the ability to create data in json format by Plotly for Python.These json data can be used by Plotly for JS to plot a graph on a web page. A bit complicated. Easier done that said.
First we need to install Flask, Plotly and Plotly Express — the last one makes it easier to deal with Plotly.
pip install flask pip install plotly pip install plotly_express
Then we’ll create a Flask application file: mycovidash.py
import json
from flask import Flask,render_template,request
import numpy as np
import pandas as pd
import plotly
import plotly.express as px app = Flask(__name__) def get_data():
data_2021 = pd.read_excel ('https://www.nrscotland.gov.uk/files//statistics/covid19/weekly-deaths-by-location-age-sex.xlsx', sheet_name='Data', skiprows=4, skipfooter=2, usecols='A:F', header=None, names=['week','location','sex','age','cause','deaths'] ) data_2021['year'] = data_2021.week.str.slice(0,2).astype(int)
data_2021['week'] = data_2021.week.str.slice(3,5).astype(int) data_1519 = pd.read_excel ('https://www.nrscotland.gov.uk/files//statistics/covid19/weekly-deaths-by-location-age-group-sex-15-19.xlsx', sheet_name='Data', skiprows=4, skipfooter=2, usecols='A:F', header=None, names=['year','week','location','sex','age','deaths'], ) data_1519['cause'] = 'Non-COVID-19'
data = data_1519.copy() data = data.append(data_2021)
return data def get_plot():
data = get_data()
dt1 = data.groupby(['year','week']).agg({'deaths':np.sum}) dt1.reset_index(inplace=True)
fig = px.line(dt1,x='week', y='deaths',line_group='year',color='year')
fig = json.dumps(fig, cls=plotly.utils.PlotlyJSONEncoder)
return fig @app.route('/')
def init():
fig = get_plot()
return render_template('mycovidash.html',fig=fig)
We basically repeat my exercises from Part 2. But instead of using pandas’ plot function we use Plotly Express’ line function to build a line graph and output this figure as json. What is inside this json data? The short answer — everything needed to display a graph on html page.
We also need an html template file: mycovidash.html
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1"> <title></title>
<link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/bulma@0.9.2/css/bulma.min.css">
</head>
<body>
<section class="section">
<div class="container">
<h1 class="title has-text-centered"> Weekly Deaths in Scotland </h1>
<div id="myPlot"></div>
</div>
</section>
</body>
<script src="https://cdn.plot.ly/plotly-latest.min.js"></script>
<script> Plotly.newPlot('myPlot', {{ fig | safe }} );
</script>
</html>
Div myPlot is a place where the plot will be rendered. Plotly.newPlot function renders the plot, getting the data from the variable fig, which we set in the Python function. Notice that we need to use “| safe” filter (see Jinja documentation), otherwise our data in json format will be html-encoded and thus broken.
…And run the app:
export FLASK_APP=mycovidash.py
export FLASK_ENV=development
flask run
Opening localhost:5000 show show the following page:
This graph is already interactive. You can hover over various parts of the graph, see additional controls which appear when you hover over certain places, play with these controls, e.g. zoom-in — zoom-out. It’s already better than what we had before in Jupyter Notebook. (Yes, we could use the same Plotly functionality in a Jupyter Notebook, but now we wanted to build a web application using Flask)
Let’s go further. Let’s include checkboxes corresponding to different groups in our data (year, age, sex, location, cause) and allow user to include/exclude certain groups by checking/unchecking these checkboxes.
First, we need to find out what unique values these groups contain.
all_years = data['year'].unique()
all_ages = data['age'].unique()
all_sexes = data['sex'].unique()
all_causes = data['cause'].unique()
all_locations = data['location'].unique()
Looking at the data, all_ages contain slightly incorrect info: 0 value is represented twice — as string ‘0’ and as an integer 0. We need to clean the data. As all other values for age are strings, let’s just convert all integer 0s to string ‘0’s, adding a simple line to get_data function:
data.loc[data['age']==0,'age']='0'
Next we need to pass all these lists to our template.
...
def get_plot():
data = get_data()
all_years = data['year'].unique()
all_ages = data['age'].unique()
all_sexes = data['sex'].unique()
all_causes = data['cause'].unique()
all_locations = data['location'].unique() dt1 = data.groupby(['year','week']).agg({'deaths':np.sum})
dt1.reset_index(inplace=True)
fig = px.line(dt1,x='week', y='deaths',line_group='year',color='year') fig = json.dumps(fig, cls=plotly.utils.PlotlyJSONEncoder)
return fig,all_years,all_ages,all_sexes,all_causes,all_locations @app.route('/')
def init():
fig,all_years,all_ages,all_sexes,all_causes,all_locations = get_plot()
return render_template('mycovidash.html',fig=fig,all_years=all_years,all_ages=all_ages,all_sexes=all_sexes,all_causes=all_causes,all_locations=all_locations)
And in mycovidash.html, just after <div id=”myPlot”></div>:
<div class="columns">
<div class="column is-2"> Years:<br>
{% for s in all_years %}
<input type="checkbox" value="{{ s }}" class="years_cb" checked> 20{{ s }}<br>
{% endfor %}
</div>
<div class="column is-2"> Age:<br>
{% for s in all_ages %}
<input type="checkbox" value="{{ s }}" class="ages_cb" checked> {{ s }}<br>
{% endfor %}
</div>
<div class="column is-1"> Sex:<br>
{% for s in all_sexes %}
<input type="checkbox" value="{{ s }}" class="sexes_cb" checked> {{ s }}<br>
{% endfor %}
</div>
<div class="column is-3"> Location:<br>
{% for s in all_locations %}
<input type="checkbox" value="{{ s }}" class="locations_cb" checked> {{ s }}<br>
{% endfor %}
</div>
<div class="column"> Cause:<br>
{% for s in all_causes %}
<input type="checkbox" value="{{ s }}" class="causes_cb" checked> {{ s }}<br>
{% endfor %}
</div>
</div>
<div class="has-text-centered">
<button id="update" class="button is-primary">Update</button>
</div>
(It’s time to notice that in this html file I used Bulma css framework instead of the usual Bootstrap. I found Bulma very promising and easier to use than Bootstrap. Let’s see…)
Now as all checkboxes are populated, we need somehow to react to them. Let’s use it the most usual way — using jQuery and ajax call. These lines are going into our template:
$('#update').click(function(){
let years = '';
$(".years_cb:checked").each(function(){ years += this.value + ',' }) let ages = '';
$(".ages_cb:checked").each(function(){ ages += this.value + ',' }) let sexes = '';
$(".sexes_cb:checked").each(function(){ sexes += this.value + ',' }) let locations = '';
$(".locations_cb:checked").each(function(){ locations += this.value + ',' }) let causes = '';
$(".causes_cb:checked").each(function(){ causes += this.value + ',' }) $.ajax({
url: "/update",
type: "GET",
contentType: 'application/json;charset=UTF-8',
data: { years: years, ages: ages, sexes: sexes, locations: locations, causes: causes },
dataType:"json",
success: function (data) {
Plotly.react('myPlot', data );
}
});
})
In Python file we’ll add an additional route:
@app.route('/update')
def update():
fig,all_years,all_ages,all_sexes,all_causes,all_locations = get_plot([request.args.get('years',''), request.args.get('ages',''), request.args.get('sexes',''), request.args.get('locations',''), request.args.get('causes',''), ]) return fig
… and add processing of these new arguments to get_plot function:
def get_plot(pars=None):
...
years = list(map(int,pars[0].strip(',').split(','))) if pars else None ages = list(pars[1].strip(',').split(',')) if pars else None sexes = list(pars[2].strip(',').split(',')) if pars else None locations = list(pars[3].strip(',').split(',')) if pars else None causes = list(pars[4].strip(',').split(',')) if pars else None if years:
data = data[data['year'].isin(years)] if ages:
data = data[data['age'].isin(ages)] if sexes:
data = data[data['sex'].isin(sexes)] if locations:
data = data[data['location'].isin(locations)] if causes:
data = data[data['cause'].isin(causes)] ...
One thing to notice. We have to use DataFrame.isin() method. Initially I wrote
data = data[data['year'] in (years)]
This failed, with an error message which is really difficult to interpret. But if we think about it, data[‘year’] is a Series object, which reminds a list. So actually using ‘in’ operator here would be something like [15,16,17,18,19,20,21] in [20,21] — what is apparently not what we want. But using .isin() method gives us exactly what we need.
Ok, it works now! You can select/deselect different groups and the plot is changing accordingly… but very slow. Of course, we need to add caching for the data. Let’s use to_pickle method of DataFrame, whatever this format is:
import os
import time
...
def get_data():
store_path = 'data.pkl'
try:
tm = os.path.getmtime(store_path)
if int(time.time())-int(tm) > 24 * 60 * 60:
raise Exception("Cache expired") data = pd.read_pickle(store_path)
return data
except Exception as e:
print (e)
pass
...
data.to_pickle(store_path) return data
It’s all for today.
Here is a GitHub repository: https://github.com/vallka/mycovidash
And here is a working version:
Originally published at https://www.vallka.com.