Swetha Subramanian     About     Research     Archive     CV     Portfolio

Connecting to presto database using python

Recently I ran into fair bit of trouble trying to connect to the presto+hive database using python. I first tried installing prestodb/presto-python-client, which ended up giving me a ‘ImportError: No module named kerberos_sspi’ error. I was trying it on a OSX machine and the kerberos_sspi error seemed to be a Windows dependency issue. Strange. Could not resolve it. I then moved on to PyHive. This was an easy install.

However, connecting to it wasn’t as simple. Here is a quick how-to code-snippet for future reference.

from pyhive import presto
import pandas as pd
cursor = presto.connect(host = $HOSTNAME, port = $PORTNUMBER, 
username = $USERNAME).cursor()
query = "SELECT ALPHA FROM GREEK_LETTERS"
cursor.execute(query)
df = pd.DataFrame(cursor.fetchall())

Note:

  1. If you are used to ending SQL commands with ‘;’, please refrain from doing so here. It will throw an error.

  2. The above code snippet only returns the data, but with column names as numbers. To have it return the dataframe with the column names, use the following commands instead of the last line.

column_names = [desc[0] for desc in cursor.description]
df = pd.DataFrame(cursor.fetchall(), columns = column_names)