# Table

## *class* <mark style="color:purple;">Table</mark>

Tables are the fundamental data-containing entity in Redivis. Tables belong to either a dataset or workflow, and are made up of rows and variables (columns). Various methods allow you to read table data, as well as to create / update / delete tables belonging to an unreleased version of a dataset.

Certain tables may be [file index tables](https://docs.redivis.com/reference/datasets/files#folders-and-index-tables), which represent a collection of non-tabular files, where each row corresponds to a file. File index tables provide additional methods for working with these files.

## Constructors

<table data-header-hidden><thead><tr><th width="319">Method</th><th>Description</th></tr></thead><tbody><tr><td><a href="redivis/redivis.table"><strong><code>redivis.table</code></strong></a>(name)</td><td>Return a Table within the <a href="..#environment-variables">current default scope</a> (either a dataset or workflow). In a Redivis notebook, the default scope will always be the notebook's workflow. If no default scope is specified, the table_reference must be fully qualified (see below).<br><br><code>table_reference</code> is a string that identifies the unique table. In some cases this may be the table name, though in others you'll want to include additional information to identify the table and to ensure reproducibility if the table name changes. Consult the <a href="../../../referencing-resources">referencing resources</a> documentation to learn more. <br><br>If you are operating within a Redivis notebook, you can specify <code>"_source_"</code> as the table reference to automatically refer to the notebook's source table.</td></tr><tr><td><a href="dataset/dataset.table"><strong><code>Dataset.table</code></strong></a>(name)</td><td>Return a Table within a specific <a href="dataset">dataset</a>. The table_reference does not need to be fully qualified, since the table lookup is already scoped to a dataset.<br><br>Consult the <a href="../../../referencing-resources">referencing resources</a> documentation to learn more. </td></tr><tr><td><a href="workflow/workflow.table"><strong><code>Workflow.table</code></strong></a>(name)</td><td>Return a Table within a specific <a href="workflow">workflow</a>. 
The table_reference does not need to be fully qualified, since the table lookup is already scoped to a workflow.</td></tr><tr><td><a href="dataset/dataset.list_tables"><strong><code>Dataset.list_tables</code></strong></a>()</td><td>Returns a list of Tables within a dataset</td></tr><tr><td><a href="workflow/workflow.list_tables"><strong><code>Workflow.list_tables</code></strong></a>()</td><td>Returns a list of Tables within a workflow</td></tr></tbody></table>

## Examples

{% tabs %}
{% tab title="Basics" %}

```python
table = redivis.table("Demo.iris_species.Iris") # owner.dataset|workflow.table

table.exists() # -> True
table.get() # table.properties is now populated with the table resource definition

table.variable("SepalLengthCm") # -> Returns a variable reference
table.to_pandas_dataframe()     # -> Returns a pandas dataframe for the table
table.file("filename.png")      # -> Return a reference to a file in a file index table
```

{% endtab %}

{% tab title="Read tabular data" %}

```python
# Can also do: redivis.table("Demo.iris_species.Iris")
dataset = redivis.organization("Demo").dataset("iris_species")
table = dataset.table("Iris")

table.to_pandas_dataframe()
# 	Id	SepalLengthCm	SepalWidthCm	PetalLengthCm	PetalWidthCm	Species
# 0	33	5.2	        4.1	        1.5	        0.1	        Iris-setosa
# ...

# Other methods to read data:
# table.to_arrow_batch_iterator()
# table.to_arrow_dataset()
# table.to_arrow_table()
# table.to_geopandas_dataframe()
# table.to_dask_dataframe()
# table.to_polars_lazyframe()
```

{% endtab %}

{% tab title="Read files" %}

```python
import redivis
from io import TextIOWrapper
from PIL import Image

# See https://redivis.com/datasets/yz1s-d09009dbb/files for example data
table = redivis.table("demo.example_data_files:yz1s:v1_3.example_file_types:4c10")
text_file = table.file("pandas_core.py")
image_file = table.file("bogota.tiff")

## Read file contents
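text_content = text_file.read(as_text=True) # decoded text; avoids shadowing the built-in str
image_bytes = image_file.read()             # raw bytes; avoids shadowing the built-in bytes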

## Open the file, as if it was on the filesystem
with image_file.open() as f:
  f.read(100) # read 100 bytes

with TextIOWrapper(text_file.open()) as f:
  f.readline() # read first line
  
Image.open(table.file("bogota.tiff")) # PIL will automatically call open() on the file
  
## Download the file  
image_file.download("./path") # will be downloaded as ./path/bogota.tiff
text_file.download("./path/renamed.txt") # will be downloaded as ./path/renamed.txt
```

{% endtab %}

{% tab title="Create a table" %}

<pre class="language-python"><code class="lang-python">dataset = redivis.user("user_or_organization_name").dataset("my dataset")

# Tables can only be created on an unreleased version. 
# If necessary, create a new version:
<strong># dataset = dataset.create_next_version()
</strong>
table = dataset.table("my_new_table")
table.create(description="some description")

# Learn more about uploading data in the Upload documentation
upload = table.upload('data.csv').create('/path/to/file')
</code></pre>

{% endtab %}

{% tab title="List variables" %}

```python
dataset = redivis.organization("Demo").dataset("iris_species")
table = dataset.table("Iris")

variables = table.list_variables()

for variable in variables:
    print(variable.properties) 
```

{% endtab %}
{% endtabs %}

## Attributes

<table data-header-hidden><thead><tr><th>Name</th><th>Description</th></tr></thead><tbody><tr><td><strong><code>dataset</code></strong></td><td>A reference to the <a href="dataset">Dataset</a> instance that constructed this table. Will be <code>None</code> if the table belongs to a workflow.</td></tr><tr><td><strong><code>workflow</code></strong></td><td>A reference to the <a href="workflow">Workflow</a> instance that constructed this table. Will be <code>None</code> if the table belongs to a dataset.</td></tr><tr><td><strong><code>properties</code></strong></td><td>A dict containing the <a href="../../../../resource-definitions/table#base-definition">API resource representation of the table</a>. This will only be populated after certain methods are called, particularly the <code>get</code> method, and will otherwise be <code>None</code>.</td></tr><tr><td><strong><code>qualified_reference</code></strong></td><td><p></p><p>The <a href="../../../referencing-resources">fully qualified reference</a> to this table, for use (e.g.) in a SQL query. </p><p>For example,</p><pre class="language-http"><code class="lang-http">demo.reddit:prpw:v1_0.posts:7q4m
</code></pre></td></tr><tr><td><strong><code>scoped_reference</code></strong></td><td>The canonical reference for the table, without any qualifiers. E.g., <code>posts:7q4m</code></td></tr></tbody></table>
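To make the relationship between the two reference attributes concrete, here is a minimal sketch that works purely on the example string above. The helper names `split_qualified_reference` and `build_preview_query` are hypothetical and not part of the Redivis client. The sketch assumes a qualified reference has exactly three dot-separated parts with no dots inside any part, and that backtick quoting is appropriate for the SQL dialect in use.

```python
def split_qualified_reference(qualified_reference: str) -> dict:
    """Split 'owner.dataset.table' into its dot-separated parts.

    Assumption: exactly three parts, and no dots inside any part.
    """
    owner, dataset, table = qualified_reference.split(".")
    return {"owner": owner, "dataset": dataset, "table": table}


def build_preview_query(qualified_reference: str, limit: int = 10) -> str:
    """Build a SELECT statement for the referenced table.

    Assumption: the reference is quoted with backticks in SQL.
    """
    return f"SELECT * FROM `{qualified_reference}` LIMIT {limit}"


# Example value taken from the qualified_reference description above
reference = "demo.reddit:prpw:v1_0.posts:7q4m"
parts = split_qualified_reference(reference)
preview_sql = build_preview_query(reference)

print(parts["table"])  # the trailing part corresponds to the scoped reference
print(preview_sql)
```

In practice you would read `table.qualified_reference` from a Table instance rather than hard-coding the string.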

## Methods

<table data-header-hidden><thead><tr><th width="441"></th><th></th></tr></thead><tbody><tr><td><strong>Reading data and metadata</strong></td><td></td></tr><tr><td><a href="table/table.download"><strong><code>Table.download</code></strong></a>([path, *, format, ...])</td><td>Export a table in a particular format and download it to disk.</td></tr><tr><td><a href="table/table.exists"><strong><code>Table.exists</code></strong></a>()</td><td>Check whether the table exists</td></tr><tr><td><a href="table/table.file"><strong><code>Table.file</code></strong></a>(name, *, [...])</td><td>Reference a <a href="file">File</a> within a file index table.</td></tr><tr><td><a href="table/table.get"><strong><code>Table.get</code></strong></a>()</td><td>Fetch table metadata. Once called, the <code>properties</code> attribute on the table will be fully populated.</td></tr><tr><td><a href="table/table.list_files"><strong><code>Table.list_files</code></strong></a>([max_results, *, ...])</td><td>Return a list of <a href="file">File</a> instances in a file index table.</td></tr><tr><td><a href="table/table.list_variables"><strong><code>Table.list_variables</code></strong></a>([max_results])</td><td>Return a list of <a href="variable">Variable</a> instances associated with this table.</td></tr><tr><td><a href="table/table.to_arrow_batch_iterator"><strong><code>Table.to_arrow_batch_iterator</code></strong></a>([...])</td><td>Return an iterator that yields <a href="https://arrow.apache.org/docs/python/generated/pyarrow.RecordBatch.html">pyarrow.RecordBatches</a>, for processing the table's data in a memory-efficient streaming manner.</td></tr><tr><td><a href="table/table.to_arrow_dataset"><strong><code>Table.to_arrow_dataset</code></strong></a>([max_results, ...])</td><td>Return a <a href="https://arrow.apache.org/docs/python/generated/pyarrow.dataset.Dataset.html">pyarrow.dataset.Dataset</a> for the table. 
Data is backed by disk, allowing for larger-than-memory analysis.</td></tr><tr><td><a href="table/table.to_arrow_table"><strong><code>Table.to_arrow_table</code></strong></a>([max_results, ...])</td><td>Return a <a href="https://arrow.apache.org/docs/python/generated/pyarrow.Table.html">pyarrow.Table</a> with the table's data.</td></tr><tr><td><a href="table/table.to_geopandas_dataframe"><strong><code>Table.to_geopandas_dataframe</code></strong></a>([...])</td><td>Return a <a href="https://geopandas.org/en/stable/docs/reference/api/geopandas.GeoDataFrame.html">geopandas.GeoDataFrame</a>. Use with tables that contain a geography variable.</td></tr><tr><td><a href="table/table.to_dask_dataframe"><strong><code>Table.to_dask_dataframe</code></strong></a>([max_results, ...])</td><td>Return a <a href="https://docs.dask.org/en/stable/dataframe.html">dask.DataFrame</a>. Data is backed by disk, allowing for larger-than-memory analysis.</td></tr><tr><td><a href="table/table.to_directory"><strong><code>Table.to_directory</code></strong></a>([max_results, ...])</td><td>Create a virtual <a href="directory">Directory</a> based on a file index table.</td></tr><tr><td><a href="table/table.to_pandas_dataframe"><strong><code>Table.to_pandas_dataframe</code></strong></a>([max_results, ...])</td><td>Return a <a href="https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html">pandas.DataFrame</a> with the table's data.</td></tr><tr><td><a href="table/table.to_polars_lazyframe"><strong><code>Table.to_polars_lazyframe</code></strong></a>([max_results, ...])</td><td>Return a <a href="https://pola-rs.github.io/polars/py-polars/html/reference/lazyframe/index.html">polars.LazyFrame</a>. 
Data is backed by disk, allowing for larger-than-memory analysis.</td></tr><tr><td><a href="table/table.to_read_streams"><strong><code>Table.to_read_streams</code></strong></a>([target_count, ...])</td><td>Return a list of <a href="readstream">ReadStreams</a> for parallel processing.</td></tr><tr><td><a href="table/table.to_sas"><strong><code>Table.to_sas</code></strong></a>(name, [max_results, ...])</td><td>Load data into the current SAS session. Typically only used within a <a href="https://app.gitbook.com/s/-LVodLwUXgJUGcm5Cvso/reference/workflows/notebooks/sas-notebooks">SAS Redivis notebook</a>.</td></tr><tr><td><a href="table/table.to_stata"><strong><code>Table.to_stata</code></strong></a>([max_results, ...])</td><td>Load data into the current Stata session. Typically only used within a <a href="https://app.gitbook.com/s/-LVodLwUXgJUGcm5Cvso/reference/workflows/notebooks/stata-notebooks">Stata Redivis notebook</a>.</td></tr><tr><td><a href="table/table.update_variables"><strong><code>Table.update_variables</code></strong></a>(variables)</td><td>Batch update variable metadata on a table.</td></tr><tr><td><a href="table/table.variable"><strong><code>Table.variable</code></strong></a>(name)</td><td>Reference a <a href="variable">Variable</a> within the table.</td></tr><tr><td></td><td></td></tr><tr><td><strong>Uploading and modifying data</strong></td><td></td></tr><tr><td><a href="table/table.add_files"><strong><code>Table.add_files</code></strong></a>(*, [files, directory])</td><td>Upload non-tabular files to an unreleased file index table.</td></tr><tr><td><a href="table/table.create"><strong><code>Table.create</code></strong></a>([description, ...])</td><td>Create a table within a dataset if it doesn't already exist. 
Table must belong to an unreleased version of the dataset.</td></tr><tr><td><a href="table/table.delete"><strong><code>Table.delete</code></strong></a>()</td><td>Delete a table belonging to an unreleased version of a dataset.</td></tr><tr><td><a href="table/table.list_uploads"><strong><code>Table.list_uploads</code></strong></a>([max_results])</td><td>Return a list of uploads on a table</td></tr><tr><td><a href="table/table.update"><strong><code>Table.update</code></strong></a>()</td><td>Update properties on the table (name, description).</td></tr><tr><td><a href="table/table.upload"><strong><code>Table.upload</code></strong></a>()</td><td>Create a reference to an <a href="upload">Upload</a> on the table, which can subsequently be used to upload tabular data.</td></tr></tbody></table>
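As an illustration of the streaming pattern that `Table.to_arrow_batch_iterator` is described as enabling, the sketch below consumes batches one at a time and keeps only a running total in memory. The helper `count_rows` is hypothetical. It assumes only that each yielded batch exposes a `num_rows` attribute, as `pyarrow.RecordBatch` does, and it uses stand-in objects so that it runs without a Redivis connection.

```python
from types import SimpleNamespace


def count_rows(batches) -> int:
    """Sum num_rows across an iterable of record batches.

    Only one batch needs to be held in memory at a time.
    """
    total = 0
    for batch in batches:
        total += batch.num_rows
    return total


# Stand-in batches for demonstration; real batches would be pyarrow.RecordBatch objects
demo_batches = [SimpleNamespace(num_rows=n) for n in (3, 2)]
demo_total = count_rows(demo_batches)
print(demo_total)

# Hypothetical usage against Redivis (requires the redivis package and credentials):
# import redivis
# table = redivis.table("Demo.iris_species.Iris")
# print(count_rows(table.to_arrow_batch_iterator()))
```

The same loop shape applies to any per-batch aggregation, such as running sums or filtered writes to disk.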
