# SAS notebooks

## Overview

SAS notebooks are available for those researchers who are more comfortable using SAS and its ecosystem. These are built off the same base image as [python notebooks](https://docs.redivis.com/reference/workflows/notebooks/python-notebooks), but include the [official SASPy library](https://support.sas.com/en/software/saspy.html) to allow for the execution of SAS in a notebook environment.

Working with SAS in a notebook environment is slightly different from using the SAS desktop application, in that we need to use python to interchange data with SAS. This step is quite simple, and doesn't require any expertise in python – see [working with tabular data](#working-with-tabular-data) below.

## Enabling SAS notebooks

Because SAS is proprietary software, you will need to have a licensed version of SAS 9.4 in order to enable SAS notebooks on Redivis. Organizations can specify license information in [their settings](https://docs.redivis.com/organizations/settings#stata-and-sas-licenses), which will make SAS notebooks available to all members of their organization. Alternatively, you can provide your own SAS license in [your workspace](https://docs.redivis.com/your-account/workspace#advanced).

## Base image and dependencies

SAS notebooks are based off the [python notebook base image](https://docs.redivis.com/reference/workflows/python-notebooks#base-image-and-dependencies), and can combine SAS with Python and optional python dependencies to create novel workflows.

To further customize your compute environment, you can specify various dependencies by clicking the **Dependencies** button at the top-right of your notebook. Here you will see three tabs: **Packages, pre\_install.sh**, and **post\_install.sh**.

Use packages to specify the *python* packages that you would like to install. When adding a new package, it will be pinned to the latest version of that package, but you can specify another version if preferred.

To manage system dependencies, and for more complicated workflows, you can use the pre- and post-install shell scripts. These scripts are executed before and after the python package installation, respectively, and can run arbitrary code in the shell. For example, you can use `apt` to install system packages (`apt-get update && apt-get install -y <package>`), or `mamba` to install from conda.

{% hint style="info" %}
For notebooks that reference restricted data, internet will be disabled while the notebook is running. This means that the dependencies interface is the *only* place from which you can install dependencies; running `pip install` within your notebook will fail.

Moreover, it is strongly recommended to always install your dependencies through the dependencies interface (regardless of whether your notebook has internet access), as this provides better reproducibility and documentation for future use.
{% endhint %}
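After editing the **Packages** tab, it can be useful to confirm from a python cell which version was actually installed. The following is a minimal sketch using only the Python standard library; `installed_version` is a hypothetical helper name, and `"pandas"` is just an example package, not something this page guarantees is installed.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Example: check a package that you pinned in the Packages tab (name is illustrative)
print(installed_version("pandas"))
```

If the function returns `None`, the package is not installed in the current environment and should be added through the dependencies interface.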

## Working with tabular data

In order to load data into SAS, we use a one-liner to pull any Redivis table, or SQL query result, into a named SAS Data Set. All variable metadata will be loaded with the data.

```python
import redivis
redivis.table("_source_").to_sas("ds") # "ds" is the SAS Data Set name

# We can also reference any table in the workflow by name
redivis.table("transform_1_output").to_sas("ds")

# Or run a SQL query and load its results into SAS
redivis.query("SELECT * FROM _source_ LIMIT 10").to_sas("ds")
```

Next, in a separate cell, we use the `%%SAS` "magic" at the start of our cell to specify that this is SAS code.

```sas
%%SAS
/*
    # SAS code can be executed inside any cell prefixed with the %%SAS command
*/
proc print data=ds(obs=5);
run;
```

{% hint style="info" %}
Consult the full samples of interchanging data between SAS and python in the [SASPy documentation](https://support.sas.com/en/software/saspy.html).
{% endhint %}

## Working with geospatial data

Loading geospatial data looks much the same as other tabular data, in that we can just call the `.to_sas()` method on any table or query in python. If that table or query result contains a geography variable, it will automatically be loaded as geospatial data to SAS via the `mapimport` proc.

In the uncommon case where your table has multiple variables of type `geography`, you'll need to explicitly specify which variable to use as the geography.

If you want to bypass the default behavior and load a table with a geography variable as a standard SAS Data Set, you can explicitly set `.to_sas("ds", geography_variable=None)`.

```python
import redivis

redivis.table("some_geo_table").to_sas("ds")

# If our table has more than one variable of type 'geography', 
#   we must specify the `geography_variable` param. 
redivis.table("some_geo_table").to_sas("ds", geography_variable="geo_var")
```

```sas
%%SAS
proc print data = ds(obs = 10);
run;
```

## Creating output tables

Redivis notebooks offer the ability to materialize notebook outputs as a new [table node](https://docs.redivis.com/reference/workflows/tables) in your workflow. This table can then be processed by transforms, read into other notebooks, exported, or even [re-imported into a dataset](https://docs.redivis.com/guides/create-and-manage-datasets/cleaning-tabular-data).

To create an output table, we must first save our SAS data set to a `.sas7bdat` file. We can then use the `redivis.current_notebook().create_output_table()` method in python to output our data.

If an output table for the notebook already exists, by default it will be overwritten. You can pass `append=True` to append, rather than overwrite, the table. In order for the append to succeed, all variables in the appended table, which are also present in the existing table, must have the same type.

```sas
%%SAS
libname out "/scratch";
proc copy in=work out=out;
/* ds is the name of the SAS Data Set we want to save*/
    select ds; 
run;
```

```python
# Run in a separate cell, this is python code
# The file name ("ds") will be the name of the SAS Data Set that we previously saved
redivis.current_notebook().create_output_table("/scratch/ds.sas7bdat")
```
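The `append=True` option described in this section can be sketched as follows. This is a minimal illustration, not a definitive implementation: `output_sas_table` is a hypothetical helper (it is not part of the `redivis` library) that checks the saved file exists before calling `create_output_table()`, and it assumes the code runs inside a Redivis notebook where `redivis` is importable.

```python
from pathlib import Path


def output_sas_table(path, append=False):
    """Send a saved .sas7bdat file to this notebook's output table.

    Hypothetical helper: validates the path first so that a typo in the
    file name fails early with a clear message.
    """
    file = Path(path)
    if file.suffix != ".sas7bdat":
        raise ValueError(f"Expected a .sas7bdat file, got: {file.name}")
    if not file.is_file():
        raise FileNotFoundError(f"No SAS data set found at {file}")

    # Imported here because redivis is only needed once the file is validated
    import redivis

    # append=False overwrites any existing output table; append=True adds rows
    redivis.current_notebook().create_output_table(str(file), append=append)
```

Calling `output_sas_table("/scratch/ds.sas7bdat")` would overwrite any existing output table, while `output_sas_table("/scratch/ds.sas7bdat", append=True)` would append to it, provided the shared variables have matching types.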

## Storing files

As you perform your analysis, you may generate files and figures that are stored on the notebook's hard disk. There are two locations that you should write files to: `/out` for persistent storage, and `/scratch` for temporary storage.

Any files written to persistent storage will be available when the notebook is stopped, and will be restored to the same state when the notebook is run again. In contrast, any files written to temporary storage will only exist for the duration of the current notebook session.

```sas
%%SAS sas_session
proc export data=datasetname
  outfile='/out/filename.csv'
  dbms=csv
  replace;
run;
```
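To confirm that an export like the one above landed where expected, the storage directories can be inspected from a python cell. The sketch below uses only the Python standard library; `list_files` is a hypothetical helper name, and the `/out` and `/scratch` paths are the storage locations described in this section.

```python
from pathlib import Path


def list_files(directory):
    """Return the sorted relative paths of all files under a directory.

    Returns an empty list if the directory does not exist.
    """
    root = Path(directory)
    if not root.is_dir():
        return []
    return sorted(str(p.relative_to(root)) for p in root.rglob("*") if p.is_file())


# Files in persistent storage, kept after the notebook stops
print(list_files("/out"))

# Files in temporary storage, only present during the current session
print(list_files("/scratch"))
```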
