Sharing Facets Dive Visualizations

Jan Pomikálek | Machine Learning, Open Source

In my previous post I talked about Facets Dive, an excellent visualization tool from Google PAIR for data scientists. Now that you have created beautiful interactive charts from your data analyses and machine learning experiments, you may want to share them with your non-technical colleagues or customers, simply and securely.

If you have a web server handy, you can just upload the Facets Dive files there and send out the link. But what if you don’t? Or what if your data is confidential and you don’t want to spend time setting up authentication? Then you may find today’s tip useful.

I will show you how to save a Facets Dive visualization to a single HTML file that can simply be opened on a local computer. No third-party logins, Python installations or arcane software dependencies are required on the recipient’s part: just a web browser. Sharing the visualization is then as easy as attaching that single file to an email or sharing it via cloud services such as Google Drive or Dropbox (if your organization’s privacy policies permit).

Facets Dive as a single HTML file

Let’s look at the result first. You can download this HTML file and open it on your local computer. It shows a visualization of a small sample (100 records) of the Boston Marathon 2014 results. I intentionally used a small sample to keep the file’s source code readable.

Boston marathon in Facets Dive

Screenshot of the browser after opening the file above. I tested the file in all the browsers I have on my MacBook (Chrome, Safari and Firefox), and it works well in all three. Let me know in the comments below if it gives you any trouble.

Now that you have the example file, you may think that’s all the template you need and be tempted to run away to play with it. But wait, there’s more. I’ll mention a few caveats that you may encounter and also share a Python snippet that makes generating the single HTML file much easier than manually editing the example.

Dependencies

The HTML file has two dependencies: the Facets Dive JavaScript library (obviously) and webcomponents.js. The latter is not required for Chrome, so if you’re sure your file will only ever be opened in Chrome, you can safely remove the following line from the code:

<script type="text/javascript" src="https://cdnjs.cloudflare.com/ajax/libs/webcomponentsjs/0.7.24/webcomponents-lite.js"></script>

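It makes loading slightly faster, but the difference is so small that it’s probably not worth it. Note, though, that Chrome removed its native support for HTML Imports (which this page uses to load Facets Dive) in version 73, so with current Chrome versions you should keep the polyfill as well.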

For the Facets Dive library, the example HTML references the file https://raw.githubusercontent.com/PAIR-code/facets/master/facets-dist/facets-jupyter.html. This is the official PAIR repository, so we’re effectively using GitHub as our CDN. You may want to host a copy of this file on your own web server if you’re worried about GitHub going down or wish to avoid the external reference.

Please keep in mind, though, that both libraries have to be available via HTTPS (not HTTP). Otherwise, when the file is opened on a local computer, the browser will refuse to load them due to its CORS policy.
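If you do host your own copy, a cheap safeguard is to check the library URLs before rendering the page. The sketch below is a minimal illustration: `check_library_urls` is a hypothetical helper name, and it only inspects the URL scheme. It does not verify that the server responds or sends suitable CORS headers.

```python
from urllib.parse import urlparse

# The official copy referenced by the example page.
FACETS_URL = ("https://raw.githubusercontent.com/PAIR-code/facets/"
              "master/facets-dist/facets-jupyter.html")


def check_library_urls(*urls):
    """Raise ValueError if any library URL is not served over HTTPS.

    Only the scheme is checked; reachability and CORS headers are not.
    """
    for url in urls:
        if urlparse(url).scheme != "https":
            raise ValueError("Library must be loaded over HTTPS: %r" % url)
```

For example, `check_library_urls(FACETS_URL, "https://example.com/libs/facets-jupyter.html")` passes silently, where the second URL is a placeholder for your own server, while an `http://` URL raises a `ValueError` before you ever send the file out.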

Data limit and loading speed

Facets Dive can officially handle up to 10,000 records (we have verified this is true). If possible, it’s generally a good idea to sub-sample the data even further. When you get close to the limit, the file easily grows to several megabytes and takes a long time to load in the browser. There’s no loading progress bar; the browser simply shows a blank screen until the file is fully loaded. Impatient users may then conclude that the page is broken and close it. If you can’t keep the file small for whatever reason, it’s a good idea to warn the recipients that opening it takes patience, especially if they are likely to have a slow internet connection.
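Rather than guessing a sample size, you could derive it from a size budget. This is a sketch under stated assumptions: `subsample_records` is a hypothetical helper that works on a plain list of records (for instance `json.loads(df.to_json(orient='records'))`), the 10,000 default mirrors the limit mentioned above, and the 2 MB default budget is an arbitrary placeholder, not a documented Facets Dive limit. Because the sample size is estimated from the average serialized record size, the resulting payload is only approximately under budget.

```python
import json
import random


def subsample_records(records, max_records=10000, max_bytes=2000000, seed=0):
    """Return a random subset of `records` that respects both limits.

    The sample size is estimated from the average JSON size of a record,
    so the embedded payload will only be roughly below `max_bytes`.
    """
    if not records:
        return []
    avg_bytes = len(json.dumps(records).encode("utf-8")) / len(records)
    fits_budget = int(max_bytes // avg_bytes)
    n = max(1, min(len(records), max_records, fits_budget))
    # A seeded generator keeps the sample identical across regenerations.
    return random.Random(seed).sample(records, n)
```

The fixed seed is a deliberate choice: regenerating the file then shows the same records, which avoids confusing recipients who compare two versions. With pandas you can get a similar effect by passing a computed size and a seed to `df.sample(n=..., random_state=...)`.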

Presets

In my example file, you might have noticed that the data isn’t loaded in Facets’ default view. Instead, the records (marathon runners) are arranged in a matrix: the rows group the runners by age and the columns by their marathon pace. This is achieved with presets. If you inspect the example HTML code closely, you will find the following settings:

var presets = {
	"horizontalFacet": "pace",
	"horizontalBuckets": 5,
	"verticalFacet": "age",
	"verticalBuckets": 8,
	"colorBy": "gender",
	"imageFieldName": "name"
};

This is mostly self-explanatory. horizontalFacet and horizontalBuckets define the attribute used to group the columns and the number of buckets, respectively; verticalFacet and verticalBuckets do the same for the rows. colorBy is simply the attribute that drives the color of the points. imageFieldName is probably the only unclear one. It defines the “Display” setting, i.e. the attribute used to label the points. The reason for the name is that the same setting can also be used to define custom images for the points.
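Since presets refer to your data by column name, a misspelled field is easy to overlook. A minimal sketch of a pre-flight check, assuming your columns are available as a list (for a DataFrame, `df.columns`); `missing_preset_fields` is a hypothetical helper, and it covers only the four field-valued keys used in this post, not every Facets Dive property.

```python
# The four preset keys used in this post whose value names a data field;
# other Facets Dive properties are not checked by this sketch.
FIELD_PRESET_KEYS = ("verticalFacet", "horizontalFacet", "colorBy", "imageFieldName")


def missing_preset_fields(presets, columns):
    """Return (key, field) pairs whose field is not one of `columns`."""
    available = set(columns)
    return [(key, presets[key])
            for key in FIELD_PRESET_KEYS
            if key in presets and presets[key] not in available]


PRESETS = {
    "verticalFacet": "age",
    "verticalBuckets": 8,
    "horizontalFacet": "pace",
    "horizontalBuckets": 5,
    "colorBy": "gender",
    "imageFieldName": "name",
}
```

Calling `missing_preset_fields(PRESETS, df.columns)` before writing the file returns an empty list when every referenced field exists, and otherwise names the offending keys, e.g. `[('colorBy', 'sex')]`.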

You can read more about presets in the “Interactive properties” section of the Facets Dive README.

Generating the HTML

Finally, we get to see how the example HTML was generated. Creating the Facets Dive HTML file by hand would be tedious and repetitive. We can do better!

You will be much better off generating it from a template. As promised, below is the code I used to generate the example above. It should be easy to customize for your own use case.

The following assumes that the data to be visualized is in a Pandas DataFrame. That’s a reasonable assumption since Pandas makes it easy to import data from all kinds of formats. Here I’m loading a CSV file from a remote server with a single line of code:

#!/usr/bin/env python
 
CSV_FILE_PATH = 'https://github.com/llimllib/bostonmarathon/blob/master/results/2014/results.csv?raw=true'
OUTPUT_FILE_PATH = './marathon_results.html'
HTML_PAGE_TITLE = u'Marathon results'
 
# Number of lines from the CSV to be sub-sampled for the visualization.
# Set to None to disable sub-sampling.
SUBSAMPLE_SIZE = 100
 
# Facets Dive settings: the initial layout of the visualized data.
# https://github.com/PAIR-code/facets/blob/master/facets_dive/README.md#interactive-properties
PRESETS = {
    u'verticalFacet': u'age',
    u'verticalBuckets': 8,
    u'horizontalFacet': u'pace',
    u'horizontalBuckets': 5,
    u'colorBy': u'gender',
    u'imageFieldName': u'name',
}
 
import json
import pandas as pd
 
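# DataFrame.from_csv was removed in pandas 1.0; read_csv with index_col=0
# mirrors its default of using the first column as the index.
df = pd.read_csv(CSV_FILE_PATH, index_col=0)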
 
facets_dive_html_template = u"""
<!DOCTYPE html>
<html>
<head>
    <meta charset="utf-8" />
    <title>%(title)s</title>
    <script>
        window.addEventListener('DOMContentLoaded', function() {
            var link = document.createElement('link');
            link.rel = "import";
            link.href = "https://raw.githubusercontent.com/PAIR-code/facets/master/facets-dist/facets-jupyter.html";
            link.onload = function() {
                var dive = document.createElement('facets-dive');
                dive.crossOrigin = "anonymous";
                dive.data = %(data)s;
                var presets = %(presets)s;
                for (var key in presets) {
                    if (presets.hasOwnProperty(key))
                        dive[key] = presets[key];
                }
                document.body.appendChild(dive);
            }
            document.head.appendChild(link);
        });
    </script>
    <script type="text/javascript" src="https://cdnjs.cloudflare.com/ajax/libs/webcomponentsjs/0.7.24/webcomponents-lite.js"></script>
    <style>body, html { height: 100%%; margin: 0; padding: 0; width: 100%%; }</style>
</head>
<body></body>
</html>
""".strip()
 
if SUBSAMPLE_SIZE:
    df = df.sample(SUBSAMPLE_SIZE)
 
with open(OUTPUT_FILE_PATH, "wb") as f:
    rendered_template = facets_dive_html_template % {
        'title': HTML_PAGE_TITLE,
        'data': df.to_json(orient='records'),
        'presets': json.dumps(PRESETS)
    }
    f.write(rendered_template.encode('utf-8'))

This is the complete code that generated the standalone HTML example at the beginning of this post.
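Before sending the file out, it’s worth opening it the way your recipient will: straight from disk. A minimal sketch using the standard library’s `webbrowser` and `pathlib` modules; `preview_uri` and `open_preview` are hypothetical helper names. A `file://` URI is built from the absolute path because a bare relative path is not a valid URL.

```python
import webbrowser
from pathlib import Path


def preview_uri(path):
    """Return the file:// URI for a local HTML file (path made absolute)."""
    return Path(path).resolve().as_uri()


def open_preview(path):
    """Open the generated file in the default web browser.

    Returns True if a browser was launched, False otherwise.
    """
    return webbrowser.open(preview_uri(path))
```

For the script above, `open_preview(OUTPUT_FILE_PATH)` should show the same view your recipients get, including the blank screen while the libraries and data load.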

Summary

Facets Dive creates beautiful interactive data visualizations. Using it for your own analyses is valuable enough, but being able to share the visualizations easily as secure, private single-file attachments that need nothing but a web browser on the recipient’s side opens further possibilities and adds even more value to an already great tool. Thanks, Google PAIR!
