Explorations in OmniSci backend rendering with Vega

We would like to expose OmniSci's Vega rendering backend to Jupyter notebook users. Because the rendering happens on the backend, this lets people visualize data that is too large to fit in memory. OmniSci uses Apache Thrift to generate interfaces to the SQL server for various clients. The JavaScript client is notably more complete than the Python client.

Establishing a connection

pymapd lets us connect to an OmniSci server with the following code:

In [1]:

import pymapd
import omnisci_renderer
connection_data = dict(
    user='mapd',
    password='HyperInteractive',
    host='metis.mapd.com',
    port='443',
    dbname='mapd',
    protocol='https'
)
con = pymapd.connect(**connection_data)
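
The connection dictionary recurs throughout this notebook with only the host, port, and protocol changing. As a minimal sketch, a small helper could build it from those three values. The name `make_connection_data` is hypothetical and is not part of pymapd or omnisci_renderer; the default credentials are simply the ones used in the cells here.

```python
# Hypothetical helper: not part of pymapd or omnisci_renderer.
def make_connection_data(host, port, protocol, user='mapd',
                         password='HyperInteractive', dbname='mapd'):
    """Return a connection dict in the shape used throughout this notebook."""
    return dict(
        user=user,
        password=password,
        host=host,
        port=port,
        dbname=dbname,
        protocol=protocol,
    )

# Two of the configurations that appear in this notebook.
https_data = make_connection_data('metis.mapd.com', '443', 'https')
http_data = make_connection_data('vega-demo.mapd.com', '9092', 'http')
```

Either dictionary can then be unpacked into `pymapd.connect(**...)` exactly as in the cell above.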

Once we have a connection, we can send a Vega specification to the backend and have it rendered. Both the Jupyter notebook client (typically a browser) and the kernel (in this case a Python process) can make this request, so there are several ways to proceed in the notebook.

Generate Vega in Python, request in the browser, render in notebook

The following cell magic parses YAML into JSON. The JSON is then sent to the browser along with the relevant connection data. The browser makes the request using the OmniSci browser client and renders the resulting image in the notebook:

In [2]:

connection_data = dict(
    user='mapd',
    password='HyperInteractive',
    host='metis.mapd.com',
    port='443',
    dbname='mapd',
    protocol='https'
)

In [3]:

%%omnisci_vega $connection_data

width: 384
height: 564
config:
    ticks: false
data:
  - name: 'tweets'
    sql: 'SELECT goog_x as x, goog_y as y, tweets_nov_feb.rowid FROM tweets_nov_feb' 

scales:
  - name: 'x'
    type: 'linear'
    domain:
      - -3650484.1235206556
      -  7413325.514451755
    range: 'width'
  - name: 'y'
    type: 'linear'
    domain:
      - -5778161.9183506705
      -  10471808.487466192
    range: 'height'
marks:
  - type: 'points'
    from:
      data: 'tweets'
    properties:
      x:
        scale: 'x'
        field: 'x'
      y:
        scale: 'y'
        field: 'y'
      fillColor: 'green'
      size:
        value: 1
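
Since the cell magic turns the YAML into JSON, the same specification could presumably also be assembled as a plain Python dictionary and serialized with the standard library. The following is a minimal sketch under that assumption; the helper name `scatter_spec` is hypothetical and is not part of omnisci_renderer.

```python
import json

def scatter_spec(sql, x_domain, y_domain, width=384, height=564):
    """Build a point-mark spec mirroring the YAML cell above (hypothetical helper)."""
    def linear_scale(name, domain, range_):
        # One linear scale entry, in the same shape as the YAML 'scales' items.
        return {'name': name, 'type': 'linear',
                'domain': list(domain), 'range': range_}

    return {
        'width': width,
        'height': height,
        'config': {'ticks': False},
        'data': [{'name': 'tweets', 'sql': sql}],
        'scales': [
            linear_scale('x', x_domain, 'width'),
            linear_scale('y', y_domain, 'height'),
        ],
        'marks': [{
            'type': 'points',
            'from': {'data': 'tweets'},
            'properties': {
                'x': {'scale': 'x', 'field': 'x'},
                'y': {'scale': 'y', 'field': 'y'},
                'fillColor': 'green',
                'size': {'value': 1},
            },
        }],
    }

spec = scatter_spec(
    'SELECT goog_x as x, goog_y as y, tweets_nov_feb.rowid FROM tweets_nov_feb',
    x_domain=(-3650484.1235206556, 7413325.514451755),
    y_domain=(-5778161.9183506705, 10471808.487466192),
)
vega_json = json.dumps(spec)  # JSON text equivalent to the YAML above
```

Building the spec in Python makes it easy to parameterize the query and the scale domains without editing YAML by hand.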

You can do the same with Vega-Lite, which is translated to Vega in the browser before being executed with the OmniSci browser client:

In [4]:

%%omnisci_vegalite $connection_data

width: 384
height: 564
data:
    sql: 'SELECT goog_x as x, goog_y as y, tweets_nov_feb.rowid FROM tweets_nov_feb'
mark:
    type: circle
    color: green
    size: 1
encoding:
    x:
        field: x
        type: quantitative
        scale:
            range: width
            domain:
              - -3650484.1235206556
              -  7413325.514451755
    y:
        field: y
        type: quantitative
        scale:
            range: height
            domain:
              - -5778161.9183506705
              -  10471808.487466192

Write Vega directly, request in the browser, render in notebook

We don't necessarily need to use YAML as the input format. The following takes a JSON string, parses it, and sends the result to the browser along with the connection data:

In [5]:

import json

connection_data = dict(
    user='mapd',
    password='HyperInteractive',
    host='vega-demo.mapd.com',
    port='9092',
    dbname='mapd',
    protocol='http'
)

vega1 = """ {
  "width": 733,
  "height": 530,
  "data": [
    {
      "name": "heatmap_query",
      "sql": "SELECT rect_pixel_bin(conv_4326_900913_x(lon), -13847031.457875465, -7451726.712679257, 733, 733) as x,
                     rect_pixel_bin(conv_4326_900913_y(lat), 2346114.147993467, 6970277.197053557, 530, 530) as y,
                     SUM(amount) as cnt
                     FROM fec_contributions_oct
                     WHERE (lon >= -124.39000000000038 AND lon <= -66.93999999999943) AND
                           (lat >= 20.61570573311549 AND lat <= 52.93117449504004) AND
                           amount > 0 AND
                           recipient_party = 'R'
                           GROUP BY x, y"
    }
  ],
  "scales": [
    {
      "name": "heat_color",
      "type": "quantize",
      "domain": [
        10000.0,
        1000000.0
      ],
      "range": [ "#0d0887", "#2a0593", "#41049d", "#5601a4", "#6a00a8",
                 "#7e03a8", "#8f0da4", "#a11b9b", "#b12a90", "#bf3984",
                 "#cb4679", "#d6556d", "#e16462", "#ea7457", "#f2844b",
                 "#f89540", "#fca636", "#feba2c", "#fcce25", "#f7e425", "#f0f921"
      ],
      "default": "#0d0887",
      "nullValue": "#0d0887"
    }
  ],
  "marks": [
    {
      "type": "symbol",
      "from": {
        "data": "heatmap_query"
      },
      "properties": {
        "shape": "square",
        "x": {
          "field": "x"
        },
        "y": {
          "field": "y"
        },
        "width": 1,
        "height": 1,
        "fillColor": {
          "scale": "heat_color",
          "field": "cnt"
        }
      }
    }
  ]
}""".replace('\n', '')
im = omnisci_renderer.OmniSciVegaRenderer(connection_data, json.loads(vega1))
display(im)

<omnisci_renderer.OmniSciVegaRenderer at 0x7f5f793af6d8>
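
The `.replace('\n', '')` call in the cell above matters because the SQL string in `vega1` spans several lines, and strict JSON does not allow a raw newline inside a string value. The following standard-library sketch shows the failure and two ways around it; the sample text is illustrative, not the full spec.

```python
import json

# A JSON document whose string value contains a raw newline, like the SQL in vega1.
raw = """{
  "sql": "SELECT x,
                 y FROM t"
}"""

# Strict parsing (the default) rejects the control character inside the string.
try:
    json.loads(raw)
    strict_ok = True
except json.JSONDecodeError:
    strict_ok = False

# Option 1 (used in this notebook): strip newlines before parsing.
stripped = json.loads(raw.replace('\n', ''))

# Option 2: let the parser accept control characters inside strings.
lenient = json.loads(raw, strict=False)
```

Stripping newlines also removes them from the SQL text itself, which is harmless here because each continuation line begins with spaces.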

Another example, this time rendering zip code polygons colored by average contribution amount:

In [6]:

connection_data = dict(
    user='mapd',
    password='HyperInteractive',
    host='metis.mapd.com',
    port='9092',
    dbname='mapd',
    protocol='http'
)
vega2 = """{
  "width": 1004,
  "height": 336,
  "data": [
    {
      "name": "polys",
      "format": "polys",
      "sql": "SELECT zipcodes.rowid,AVG(contributions_donotmodify.amount) AS avgContrib FROM contributions_donotmodify,zipcodes WHERE contributions_donotmodify.amount IS NOT NULL AND contributions_donotmodify.contributor_zipcode = zipcodes.ZCTA5CE10 GROUP BY zipcodes.rowid ORDER BY avgContrib DESC"
    }
  ],
  "scales": [
    {
      "name": "x",
      "type": "linear",
      "domain": [
        -19646150.75527339,
        19646150.755273417
      ],
      "range": "width"
    },
    {
      "name": "y",
      "type": "linear",
      "domain": [
        -3071257.58106188,
        10078357.267122284
      ],
      "range": "height"
    },
    {
      "name": "polys_fillColor",
      "type": "linear",
      "domain": [
        0, 325, 650, 975,
        1300, 1625, 1950, 2275, 2600
      ],
      "range": [
        "#115f9a", "#1984c5", "#22a7f0", "#48b5c4",
        "#76c68f", "#a6d75b", "#c9e52f", "#d0ee11", "#d0f400"
      ],
      "default": "green",
      "nullValue": "#CACACA"
    }
  ],
  "marks": [
    {
      "type": "polys",
      "from": {
        "data": "polys"
      },
      "properties": {
        "x": {
          "scale": "x",
          "field": "x"
        },
        "y": {
          "scale": "y",
          "field": "y"
        },
        "fillColor": {
          "scale": "polys_fillColor",
          "field": "avgContrib"
        },
        "strokeColor": "white",
        "strokeWidth": 0,
        "lineJoin": "miter",
        "miterLimit": 10
      }
    }
  ]
}""".replace('\n', '')
im = omnisci_renderer.OmniSciVegaRenderer(connection_data, json.loads(vega2))
display(im)

<omnisci_renderer.OmniSciVegaRenderer at 0x7f5f793c2710>

Requesting in Python

Making the OmniSci request in the browser has a major drawback: the image data is not easily available to the Python kernel. Instead, we can make the request on the Python side. In the run below the server rejects the request, reporting a renderbuffer height that exceeds its maximum:

In [7]:

connection_data = dict(
    user='mapd',
    password='HyperInteractive',
    host='vega-demo.mapd.com',
    port='9092',
    dbname='mapd',
    protocol='http'
)
con = pymapd.connect(**connection_data)
con.render_vega(vega1)

---------------------------------------------------------------------------
TMapDException                            Traceback (most recent call last)
<ipython-input-7-0b3e7bdf7783> in <module>
      8 )
      9 con = pymapd.connect(**connection_data)
---> 10 con.render_vega(vega1)

~/quansight/pymapd/pymapd/connection.py in render_vega(self, vega, compression_level)
    580                 vega_json=vega,
    581                 compression_level=compression_level,
--> 582                 nonce=None
    583                 )
    584         rendered_vega = RenderedVega(result)

~/quansight/pymapd/mapd/MapD.py in render_vega(self, session, widget_id, vega_json, compression_level, nonce)
   1609         """
   1610         self.send_render_vega(session, widget_id, vega_json, compression_level, nonce)
-> 1611         return self.recv_render_vega()
   1612 
   1613     def send_render_vega(self, session, widget_id, vega_json, compression_level, nonce):

~/quansight/pymapd/mapd/MapD.py in recv_render_vega(self)
   1637             return result.success
   1638         if result.e is not None:
-> 1639             raise result.e
   1640         raise TApplicationException(TApplicationException.MISSING_RESULT, "render_vega failed: unknown result")
   1641 

TMapDException: TMapDException(error_msg='Exception: /home/jenkins-slave/workspace/mapd2-multi/compiler/gcc/gpu/cuda/host/centos/render/render/Rendering/Renderer/GL/Resources/GLRenderbuffer.cpp:25 Cannot resize renderbuffer with height 90009. It exceeds the max height of 32768')

Questions:

Are there better ways to author Vega? For instance, can we use Altair?
Is the browser-side or Python-side approach better? Should we take a hybrid approach?
Which Vega schema does OmniSci use? Is the "sql" data attribute a custom syntax?