# Work with data in notebooks

Redivis notebooks provide a performant, flexible environment for analyzing and visualizing the data in your workflows using Python, R, Stata, or SAS. Because the computation happens on Redivis, you don't need to configure an environment on a local machine or server, or export data from Redivis. This makes for easy iteration and collaboration, while also providing better security and data throughput.

Before working with a notebook, you'll first need to [create a workflow](https://docs.redivis.com/guides/analyze-data-in-a-workflow) and add data to it. You can then create a notebook from any table in your workflow.

{% hint style="info" %}
If you are working with very large tables (>10GB is a good rule of thumb), it's always a good idea to first reshape and reduce the data via [transforms](https://docs.redivis.com/guides/analyze-data-in-a-workflow/reshape-data-in-transforms), since they can be significantly more performant for large data operations than running code in Python, R, Stata, or SAS.
{% endhint %}

## 1. Create a notebook

Once you have a table that you're ready to analyze, you can create a notebook by clicking the **+ Notebook** button at any time. You'll need to name it and choose a kernel (Python, R, Stata, or SAS).

Notebooks can only reference tables within their workflow, so we recommend keeping all related work together in the same workflow.

![](https://1672950126-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-LVodLwUXgJUGcm5Cvso%2Fuploads%2FoxhzY08Aab3JTc9vuIW2%2FScreenshot%202024-12-09%20at%207.54.00%E2%80%AFPM_out.png?alt=media\&token=4f2e9800-de46-4959-923d-70b95f7919c6)

{% hint style="info" %}

### Python

Python notebooks come pre-installed with a variety of common scientific packages for Python. [*Learn more about working with Python notebooks.*](https://docs.redivis.com/reference/workflows/notebooks/python-notebooks)<br>

### R

R notebooks come pre-installed with a variety of common scientific packages for R. [*Learn more about working with R notebooks.*](https://docs.redivis.com/reference/workflows/notebooks/r-notebooks)<br>

### Stata

Stata notebooks are based on Python notebooks, but offer affordances for moving data between Python and Stata. [*Learn more about working with Stata notebooks.*](https://docs.redivis.com/reference/workflows/notebooks/stata-notebooks)<br>

### SAS

SAS notebooks are based on Python notebooks, but offer affordances for moving data between Python and SAS. [*Learn more about working with SAS notebooks.*](https://docs.redivis.com/reference/workflows/notebooks/sas-notebooks)
{% endhint %}

## 2. Define dependencies

All notebooks come with a number of common packages pre-installed, depending on the [notebook type](https://docs.redivis.com/reference/workflows/notebooks/notebook-concepts#analyzing-data). If you need something more specific, you can add [versioned packages](https://docs.redivis.com/reference/workflows/notebooks/notebook-concepts#dependencies) or write a [pre- or post-install script](https://docs.redivis.com/reference/workflows/notebooks/notebook-concepts#pre-install-and-post-install-scripts) by clicking the **Edit dependencies** button in the start modal or the toolbar.

![](https://1672950126-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-LVodLwUXgJUGcm5Cvso%2Fuploads%2FFifiFnTKxvcPsu81OXBL%2FScreenshot%202024-12-09%20at%207.57.49%E2%80%AFPM_out.png?alt=media\&token=5b3c9d60-4c09-4b4e-a073-a9168fa74e38)

*Learn more in the* [*Notebooks*](https://docs.redivis.com/reference/workflows/notebooks/notebook-concepts#dependencies) *reference section.*
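
Before adding a dependency, it can be worth checking whether a package already ships with the notebook. The following is a minimal sketch using only Python's standard library, meant to be run in a Python notebook cell. The helper name is hypothetical, and the package names in the list are illustrative placeholders, not a statement of what Redivis pre-installs:

```python
from importlib import metadata


def installed_versions(packages):
    """Map each package name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Placeholder names: replace with the packages your analysis needs.
for name, version in installed_versions(["pandas", "numpy", "pystac"]).items():
    print(f"{name}: {version or 'not installed'}")
```

Any package reported as not installed, or installed at a version you don't want, is a candidate for the dependencies editor.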

## 3. Compute resources

The default notebook configuration is free, and provides access to 2 CPUs and 32GB working memory, alongside a 60GB (SSD) disk and gigabit network. The computational power of these default notebooks is comparable to that of most personal computers, and will be more than enough for many analyses.

If you're working with larger tables, creating an ML model, or performing other particularly intensive tasks, you may choose to configure additional [compute resources](https://docs.redivis.com/reference/workflows/notebooks/compute-resources) for the notebook. This will cost an hourly rate to run based on your chosen environment, and require you to purchase [compute credits](https://docs.redivis.com/reference/your-account/compute-credits-and-billing) on your account.

Clicking the **Edit compute configuration** button in the start modal or the toolbar will allow you to choose from different preconfigured machine types. The notebook will then default to this compute configuration each time it starts up.

<figure><img src="https://1672950126-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-LVodLwUXgJUGcm5Cvso%2Fuploads%2F1igbwbbyZ9W3Wn6yYejZ%2FScreenshot%202024-12-09%20at%207.58.35%E2%80%AFPM_out.png?alt=media&#x26;token=e70b0ba4-78bf-4b94-9d4b-401a23021e43" alt=""><figcaption></figcaption></figure>

*Learn more in the* [*Compute resources*](https://docs.redivis.com/reference/workflows/notebooks/compute-resources) *reference section.*
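
To get a rough picture of the resources a running notebook can see, you can inspect them from a Python cell. This sketch uses only the standard library and a hypothetical helper name. It reports what the operating system exposes, which in a containerized environment may not match the configured machine type exactly:

```python
import os
import shutil


def describe_resources(path="."):
    """Return the visible CPU count and disk usage (in GB) for `path`."""
    usage = shutil.disk_usage(path)
    gb = 1024 ** 3
    return {
        "cpus": os.cpu_count(),  # may be None if it cannot be determined
        "disk_total_gb": round(usage.total / gb, 1),
        "disk_free_gb": round(usage.free / gb, 1),
    }


print(describe_resources())
```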

## 4. Start the notebook

Notebook nodes need to be started before you can edit or execute cells. When you first click on a notebook node, you will see a read-only view of its contents (including cell outputs). Click the **Start notebook** button in the toolbar to connect this notebook to compute resources.

A newly created notebook will start automatically.

![](https://1672950126-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-LVodLwUXgJUGcm5Cvso%2Fuploads%2FX1H54X6fazK3O21ysle3%2FScreenshot%202024-12-09%20at%207.55.37%E2%80%AFPM_out.png?alt=media\&token=6b944b69-6a09-4349-9093-c45be2b5fa9d)

## 5. Load data

To do meaningful work in your notebook, you'll want to load the tabular and/or unstructured data in your workflow into the notebook.

### Referencing tables

Notebooks come pre-populated with templated code that loads data from the notebook's source table. Run this cell to pull the data into the notebook; if it succeeds, the cell prints a preview of the loaded data.

You can reference any other tables in this workflow by replicating this script and executing it with a different table reference. As a rule of thumb, notebooks will easily support interactive analysis of tables up to \~1GB; if your table is larger, try reducing it first by creating a [transform](https://docs.redivis.com/guides/analyze-data-in-a-workflow/reshape-data-in-transforms), or make sure to familiarize yourself with the tools for working with larger tables in the notebook's programming language.

{% tabs %}
{% tab title="Python" %}

```python
import redivis

# The source table of this notebook can always be referenced as "_source_"
table = redivis.table("_source_")

# Load table as a pandas dataframe. 
# Consult the documentation for more load options.
df = table.to_pandas_dataframe()

# We can also reference any other table in this workflow by name.
df2 = redivis.table("my_other_table").to_pandas_dataframe()

print(df)
print(df2)
```

[*See more examples in the Python notebooks reference.*](https://docs.redivis.com/reference/workflows/notebooks/python-notebooks)
{% endtab %}

{% tab title="R" %}

```r
# The source table of this notebook can always be referenced as "_source_"
redivis_table <- redivis$table("_source_")

# Load table as a tidyverse tibble. 
# Consult the documentation for more load options.
df <- redivis_table$to_tibble()

# We can also reference any other table in this workflow by name.
df2 <- redivis$table("my_other_table")$to_tibble()

print(df)
print(df2)
```

[*See more examples in the R notebooks reference.*](https://docs.redivis.com/reference/workflows/notebooks/r-notebooks)
{% endtab %}

{% tab title="Stata" %}

```python
# To load data into Stata, we first have to bring it into Python.
# This code loads the "_source_" table into the Python variable `df`.
# We can then pass this variable to Stata as our dataset.

import redivis

# The source table of this notebook can always be referenced as "_source_"
# Reference any other table in this workflow by name.
table = redivis.table("_source_")

df = table.to_pandas_dataframe(dtype_backend="numpy")
```

```stata
%%stata -d df -force
/*
# Use the %%stata magic to load our dataframe, specified by the -d parameter
# The -force flag replaces the current working dataset in Stata

# The rest is just Stata code!
*/

describe
```

[*See more examples in the Stata notebooks reference.*](https://docs.redivis.com/reference/workflows/notebooks/stata-notebooks)
{% endtab %}

{% tab title="SAS" %}

```python
import redivis
import saspy

sas = saspy.SASsession(results='HTML')

# We first load the table via python, and then pass the dataframe into SAS
df = redivis.table("_source_").to_pandas_dataframe(dtype_backend="numpy")

sas_data = sas.df2sd(df, '_df') # second argument is the name of the table in SAS
# Example plot; assumes the table has `msrp` and `horsepower` columns
sas_data.heatmap('msrp', 'horsepower')
```

[*See more examples in the SAS notebooks reference.*](https://docs.redivis.com/reference/workflows/notebooks/sas-notebooks)
{% endtab %}
{% endtabs %}
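
The rule of thumb of roughly 1GB above refers to the size of the table, which may differ from the footprint of the same data once loaded into memory. As a back-of-the-envelope check, the sketch below estimates the memory needed for purely numeric columns, assuming 8 bytes per value (the size of a 64-bit integer or float); string columns typically need more. The function name and the example numbers are illustrative only:

```python
def estimate_numeric_memory_gb(n_rows, n_columns, bytes_per_value=8):
    """Estimate in-memory size in GB for a table of fixed-width numeric values."""
    return n_rows * n_columns * bytes_per_value / 1024 ** 3


# Hypothetical table: 10 million rows by 20 numeric columns.
size_gb = estimate_numeric_memory_gb(10_000_000, 20)
print(f"Estimated size: {size_gb:.2f} GB")
```

Once a dataframe is loaded in a Python notebook, pandas' `df.memory_usage(deep=True).sum()` reports the actual footprint in bytes.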

### Referencing files

Any files with unstructured data stored in Redivis tables can be referenced by their globally unique `file_id`. You can also reference these `file_id` values in any derivative tables, allowing you to query and download specific subsets of files.

When working with large files, you'll want to consider saving the files to disk and/or working with the streaming interfaces to reduce memory overhead and improve performance.

{% tabs %}
{% tab title="Python" %}

```python
import redivis
import pystac  # needed only for the fsspec example below
from PIL import Image

# See https://redivis.com/datasets/yz1s-d09009dbb/files for example data
table = redivis.table("demo.example_data_files:yz1s:v1_3.example_file_types:4c10")
text_file = table.file("pandas_core.py")
image_file = table.file("bogota.tiff")

## Read file contents
text = text_file.read(as_text=True)
data = image_file.read()

## Open the file, as if it was on the filesystem
with image_file.open("rb") as f:
  f.read(100) # read 100 bytes

with text_file.open() as f:
  f.readline() # read first line

# Tools that integrate with fsspec can open Redivis URIs
# (replace table_ref with a real table reference):
pystac.Catalog.from_file("redivis://table_ref/stac/catalog.json")

Image.open(table.file("bogota.tiff")) # PIL will automatically call open() on the file

## Download the file
image_file.download("./path") # will be downloaded as ./path/bogota.tiff
text_file.download("./path/renamed.txt") # will be downloaded as ./path/renamed.txt
```

[*See more examples in the Python notebooks reference.*](https://docs.redivis.com/reference/workflows/notebooks/python-notebooks#working-with-non-tabular-files)
{% endtab %}

{% tab title="R" %}

```r
# See https://redivis.com/datasets/yz1s-d09009dbb/files for example data

t <- redivis$table("demo.example_data_files:yz1s:v1_3.example_file_types:4c10")

text_file <- t$file("pandas_core.py")
con <- text_file$open()
readLines(con)

binary_file <- t$file("bogota.tiff")
con <- binary_file$open("rb")
readBin(con, what = "raw", n = 100) # read the first 100 bytes

file_contents <- text_file$read(as_text=TRUE) # Read all contents directly to memory

binary_file$download() # download to current working directory

# You can also use R's native open()
con <- open(redivis$table("table_ref")$file("filename"), "rb")
```

[*See more examples in the R notebooks reference.*](https://docs.redivis.com/reference/workflows/notebooks/r-notebooks#working-with-non-tabular-files)
{% endtab %}
{% endtabs %}
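
For large files, reading in fixed-size chunks keeps memory use bounded. The sketch below computes a checksum from any binary file-like object. It uses an in-memory `io.BytesIO` as a stand-in so that it runs anywhere, and it assumes the handle returned by `open("rb")` in the Python example above supports the standard `read(size)` call, as the `f.read(100)` line suggests. The helper name and chunk size are illustrative:

```python
import hashlib
import io


def sha256_in_chunks(fileobj, chunk_size=1024 * 1024):
    """Hash a binary file-like object without loading it fully into memory."""
    digest = hashlib.sha256()
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:  # empty bytes means end of file
            break
        digest.update(chunk)
    return digest.hexdigest()


# Stand-in for a large file; in a notebook you would pass an opened file handle.
fake_file = io.BytesIO(b"example contents " * 1000)
print(sha256_in_chunks(fake_file, chunk_size=4096))
```

The same loop shape applies to any per-chunk processing, such as writing to disk or parsing records incrementally.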

## 6. Analyze data

At this point, you have all the tools you need to work with your data in your chosen language. The Python, R, Stata, and SAS ecosystems contain myriad tools and libraries for performing sophisticated data analysis and visualization.&#x20;

The notebook interface is based on [Jupyter notebooks](https://jupyter.org/), and has similar capabilities. You can also export a read-only copy of your notebook as an .ipynb, PDF, or HTML file.

![](https://1672950126-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-LVodLwUXgJUGcm5Cvso%2Fuploads%2FYAdVetAInYeKjmpKkOiG%2FScreenshot%202024-12-09%20at%207.56.58%E2%80%AFPM_out.png?alt=media\&token=c6195423-eb43-479a-b5ee-3d583e043a3c)

*Learn more in the* [*Notebooks*](https://docs.redivis.com/reference/workflows/notebooks) *reference section.*
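
An exported .ipynb file is a JSON document, so it can be inspected with standard tools. As a rough sketch, the helper below pulls the source of each code cell out of a notebook that follows the Jupyter nbformat 4 layout. The sample notebook is built inline so that the example is self-contained, and the helper name is hypothetical:

```python
import json


def code_cell_sources(notebook_json):
    """Return the source text of every code cell in an nbformat-4 notebook."""
    notebook = json.loads(notebook_json)
    sources = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            source = cell.get("source", "")
            # nbformat allows `source` to be a string or a list of lines.
            sources.append("".join(source) if isinstance(source, list) else source)
    return sources


# Minimal stand-in for the contents of an exported .ipynb file.
sample = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {"cell_type": "code", "metadata": {}, "outputs": [],
         "execution_count": None, "source": ["import redivis\n", "print('hi')"]},
    ],
})
print(code_cell_sources(sample))
```

For a real export, read the file's text first and pass it to the helper.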

## 7. Create an output table

Notebooks can produce an output table, which you can sanity check and then use elsewhere in your workflow, for example by analyzing it in other notebooks or exporting it to other systems.

{% tabs %}
{% tab title="Python" %}

```python
import redivis

# Read table into a pandas dataframe
df = redivis.table('_source_').to_pandas_dataframe()

# Perform various data manipulation actions
# (some_processing_fn is a placeholder for your own function)
df2 = df.apply(some_processing_fn)

# Create an output table with the contents of this dataframe
redivis.current_notebook().create_output_table(df2)
```

[*See more examples in the Python notebooks reference.*](https://docs.redivis.com/reference/workflows/notebooks/python-notebooks#creating-output-tables)
{% endtab %}

{% tab title="R" %}

```r
# Read table into a tibble
tbl <- redivis$table('_source_')$to_tibble()

# Perform various data manipulation actions
tbl2 <- tbl %>% mutate(...)

# Create an output table with the contents of this tibble
redivis$current_notebook()$create_output_table(tbl2)
```

[*See more examples in the R notebooks reference.*](https://docs.redivis.com/reference/workflows/notebooks/r-notebooks#creating-output-tables)
{% endtab %}

{% tab title="Stata" %}

```stata
%%stata -doutd df2
/*
  Once this cell executes, the current dataset will be pushed 
  to the python variable df2
*/
rename v* newv*
```

```python
# Via python, pass this dataframe to the output table
redivis.current_notebook().create_output_table(df2)
```

[*See more examples in the Stata notebooks reference.*](https://docs.redivis.com/reference/workflows/notebooks/stata-notebooks#creating-output-tables)
{% endtab %}

{% tab title="SAS" %}

```python
# Convert a SAS table to a pandas dataframe
# (sas_table is a saspy SASdata object, such as the one returned by sas.df2sd())
df = sas_table.to_df()

# Create an output table with the contents of this dataframe
redivis.current_notebook().create_output_table(df)
```

[*See more examples in the SAS notebooks reference.*](https://docs.redivis.com/reference/workflows/notebooks/sas-notebooks#creating-output-tables)
{% endtab %}
{% endtabs %}

## Next steps

#### Share and collaborate

All Redivis notebooks support real-time collaboration, allowing multiple editors to edit and run cells in a running notebook. When another editor is active in a notebook, you will see a colored cursor associated with them (much like a Google Doc).&#x20;

[Share your workflow](https://docs.redivis.com/reference/workflows/overview#managing-a-project) to work with collaborators in real time, and make it public so that others can fork and build upon your work.

#### Cite datasets in your publications

If your work leads to a publication, consult the dataset page for each dataset you've used; the data administrators provide information there on [how to correctly cite it](https://docs.redivis.com/redivis-for-open-science/citations).
