SAS notebooks

Overview

SAS notebooks are available for those researchers who are more comfortable using SAS and its ecosystem. These are built off the same base image as python notebooks, but include the official SASPy library to allow for the execution of SAS in a notebook environment.

Working with SAS a notebook environment is slightly different than the SAS desktop application, in that we need to utilize python to interchange data with SAS. This step is quite simple, and doesn't require any expertise in python – see working with tabular data below.

Enabling SAS notebooks

Because SAS is proprietary software, you will need to have a licensed version of SAS 9.4 in order to enable SAS notebooks on Redivis. Organizations can specify license information in their settings, which will make SAS notebooks available to all members of their organization. Alternatively, you can provide your own SAS license in your workspace.

Working with tabular data

In order to load data into SAS, we first pull it into a data frame in python, and then pass that variable into SAS. If you're unfamiliar with python, you can just copy+paste the below into the first cell of your notebook to load the data in python.

import saspy
import redivis

sas_session = saspy.SASsession()

# We first load the table via python, and then pass it into SAS
df = redivis.table("_source_").to_pandas_dataframe(dtype_backend="numpy")

# Load the table into SAS, giving it the name "df"
sas_data = sas_session.df2sd(df, table="df")
sas_session.datasets() # print datasets currently loaded in our session

View the Table.to_pandas_dataframe() python documentation ->

Now that we have the dataset "df" in SAS, we can run SAS code against the data. To do so, we must prefix any SAS cell with the line %%SAS sas_session:

%%SAS sas_session
/*
    # SAS code can be executed inside any cell prefixed with the %%SAS command
    # Make sure to specify the sas_session after the %%SAS command
*/
proc print data=df(obs=5);
run;

Consult the full samples of interchanging data between SAS and python in the SASPy documentation.

Working with geospatial data

SAS offers some support for geospatial datatypes. However, we can't pass geospatial data from python natively, and instead need to first create a shapefile that can then be loaded into SAS.

import saspy
import redivis

sas_session = saspy.SASsession()

# This python code loads a geospatial table and then writes it to a shapefile
geopandas_df = redivis.table("_source_").to_geopandas_dataframe(dtype_backend="numpy")
geopandas_df.to_file("/scratch/df/out.shp")

View the Table.to_geopandas_dataframe() python documentation ->

Next, we can load this shapefile via SAS:

%%SAS sas_session
proc mapimport datafile="/scratch/df/out.shp" out=mymap;
run;

proc print data = mymap(obs = 10); 
run;

Working with larger tables

If your data is too big to fit into memory, you may need to first save the data as a delimited file, and then read that file into SAS:

import saspy
import redivis
sas_session = saspy.SASsession()

# This python code downloads a table and then rewrites it to a CSV

from pyarrow.dataset import write_dataset

output_directory = "/scratch/df" # directory where file should be located
arrow_dataset = redivis.table("_source_").to_arrow_dataset()
write_dataset(
    arrow_dataset, 
    output_directory, 
    format="csv", 
    max_partitions=1, 
    basename_template="out_{i}.csv" # The name of the file
)

View the Table.to_arrow_dataset() python documentation ->

%%SAS sas_session
proc import datafile="/scratch/df/out_0.csv"
        out=df
        dbms=csv
        replace;
run;

proc print data=work.df(obs=5);
run;

Creating output tables

Redivis notebooks offer the ability to materialize notebook outputs as a new table node in your project. This table can then be processed by transforms, read into other notebooks, exported, or even re-imported into a dataset.

To create an output table, we first need to pass our SAS data back to python. We can then use the redivis.current_notebook().create_output_table() method in python to output our data.

If an output table for the notebook already exists, by default it will be overwritten. You can pass append=True to append, rather than overwrite, the table. In order for the append to succeed, all variables in the appended table, which are also present in the existing table, must have the same type.

# Reference the table table named "some_table" in SAS
sas_table = sas_session.sasdata("some_table")

# Convert the sas_table to a pandas dataframe
df = sas_table.to_df()

redivis.current_notebook().create_output_table(df)

Storing files

As you perform your analysis, you may generate files that are stored on the notebook's hard disk. There are two locations that you should write files to: /out for persistent storage, and /scratch for temporary storage.

Any files written to persistent storage will be available when the notebook is stopped, and will be restored to the same state when the notebook is run again. Alternatively, any files written to temporary storage will only exist for the duration of the current notebook session.

%%SAS sas_session
proc export data=datasetname
  outfile='/out/filename.csv'
  dbms=csv
  replace;
run;

Last updated