# Dataset

## *class* <mark style="color:purple;">Dataset</mark>

Datasets are the entities on Redivis where data is stored. A dataset is made up of tables, non-tabular files, and various metadata. Datasets can be owned by a user or an organization, and are version controlled.

## Constructors

<table data-header-hidden><thead><tr><th width="424">Method</th><th>Description</th></tr></thead><tbody><tr><td><a href="organization/organization.dataset"><strong><code>Organization.dataset</code></strong></a>(dataset_reference)</td><td>Construct a new dataset instance that references a dataset owned by an organization.</td></tr><tr><td><a href="organization/organization.list_datasets"><strong><code>Organization.list_datasets</code></strong></a>([max_results])</td><td>Returns a list of Datasets owned by an organization.</td></tr><tr><td><a href="user/user.dataset"><strong><code>User.dataset</code></strong></a>(dataset_reference)</td><td>Construct a new dataset instance that references a dataset owned by a user.</td></tr><tr><td><a href="user/user.list_datasets"><strong><code>User.list_datasets</code></strong></a>([max_results])</td><td>Returns a list of Datasets owned by a user.</td></tr></tbody></table>

## Examples

{% tabs %}
{% tab title="Basics" %}

```python
import redivis

dataset = redivis.organization("Demo").dataset("US Fires")

# Raises an error if the dataset doesn't exist.
# Call dataset.exists() first to check for existence.
dataset.get()

print(dataset.properties)
```

{% endtab %}

{% tab title="Create" %}

```python
import redivis

dataset = redivis.user("my_username").dataset("My dataset")

dataset.create(public_access_level="overview")

print(dataset.properties)
```

{% endtab %}

{% tab title="New version" %}

```python
import redivis

dataset = redivis.user("my_username").dataset("My dataset")

dataset = dataset.create_next_version()

# We can upload new data to existing tables once we have a "next" version
with open("data.csv", "rb") as file:
    dataset.table("My table").upload("data.csv").upload_file(file)
    
dataset.release()

```

{% endtab %}

{% tab title="Query" %}

```python
import redivis

dataset = redivis.organization("Demo").dataset("CMS 2014 Medicare Data")

# The home_health_agencies table is assumed to be within the dataset,
#   since it isn't otherwise qualified
query = dataset.query("""
    SELECT * FROM home_health_agencies
    WHERE state = 'CA'
""")

print(query.to_dataframe())
```

{% endtab %}
{% endtabs %}

## Attributes

| **`organization`**        | A reference to the [Organization](https://docs.redivis.com/api/client-libraries/redivis-python/reference/organization) instance that constructed this dataset. Will be `None` if the dataset belongs to a user.                                                                                                                                                                                                                                                                                                                                                                                                           |
| ------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **`properties`**          | <p>A dict containing the <a href="../../../resource-definitions/dataset">API resource representation of the dataset</a>. This will be fully populated after calling <a href="dataset/dataset.get">get()</a>, <a href="dataset/dataset.create_next_version">create\_next\_version()</a>, or <a href="dataset/dataset.release">release()</a>; otherwise it will be <code>None</code>. <br><br>It will also be partially populated for datasets returned via the <a href="organization/organization.list_datasets">Organization.list\_datasets</a> and <a href="user/user.list_datasets">User.list\_datasets</a> methods.</p> |
| **`qualified_reference`** | The [fully qualified reference](https://docs.redivis.com/api/referencing-resources) for the dataset, which can be used in SQL queries or the REST API. E.g., `demo.ghcn_daily_weather_data:v1_1:7br5`                                                                                                                                                                                                                                                                                                                                                                                                                     |
| **`scoped_reference`**    | The canonical reference for the dataset, without the owner (user or organization) qualifier. E.g., `ghcn_daily_weather_data:v1_1:7br5` |
| **`user`**                | A reference to the [User](https://docs.redivis.com/api/client-libraries/redivis-python/reference/user) instance that constructed this dataset. Will be `None` if the dataset belongs to an organization.                                                                                                                                                                                                                                                                                                                                                                                                                  |
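
Going by the two example values above, `qualified_reference` appears to be the `scoped_reference` with the owner's name and a `.` prepended. The sketch below illustrates that relationship on plain strings. The helper name `split_qualified_reference` is hypothetical and not part of redivis-python, and it assumes the owner name itself contains no `.` character.

```python
def split_qualified_reference(qualified_reference):
    """Split a qualified reference into (owner, scoped_reference).

    Assumption: the owner is everything before the first "." in the string,
    as in the example "demo.ghcn_daily_weather_data:v1_1:7br5".
    """
    owner, separator, scoped_reference = qualified_reference.partition(".")
    if not separator or not owner or not scoped_reference:
        raise ValueError(
            f"Expected '<owner>.<scoped reference>', got {qualified_reference!r}"
        )
    return owner, scoped_reference


owner, scoped = split_qualified_reference("demo.ghcn_daily_weather_data:v1_1:7br5")
print(owner)   # demo
print(scoped)  # ghcn_daily_weather_data:v1_1:7br5
```

In practice there should be no need to parse these strings, since both forms are available directly as attributes on the dataset instance.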

## Methods

<table data-header-hidden><thead><tr><th width="450"></th><th></th></tr></thead><tbody><tr><td><a href="dataset/dataset.add_labels"><strong><code>Dataset.add_labels</code></strong></a>(labels)</td><td>Add labels to a dataset.</td></tr><tr><td><a href="dataset/dataset.create"><strong><code>Dataset.create</code></strong></a>([*, public_access_level, ...])</td><td>Create a new dataset.</td></tr><tr><td><a href="#create_next_version-if_not_exists-false"><strong><code>Dataset.create_next_version</code></strong></a>([*, if_not_exists])</td><td>Create a "next" (unreleased) version on the dataset. Data can only be uploaded to unreleased versions.</td></tr><tr><td><a href="dataset/dataset.delete"><strong><code>Dataset.delete</code></strong></a>()</td><td>Delete the dataset.</td></tr><tr><td><a href="dataset/dataset.exists"><strong><code>Dataset.exists</code></strong></a>()</td><td>Check whether the dataset exists.</td></tr><tr><td><a href="dataset/dataset.get"><strong><code>Dataset.get</code></strong></a>()</td><td>Get the dataset, populating the <code>properties</code> attribute on the current instance.</td></tr><tr><td><a href="dataset/dataset.list_tables"><strong><code>Dataset.list_tables</code></strong></a>([max_results])</td><td>List all tables in the dataset.</td></tr><tr><td><a href="dataset/dataset.list_versions"><strong><code>Dataset.list_versions</code></strong></a>([max_results])</td><td>List all versions for the dataset.</td></tr><tr><td><a href="dataset/dataset.next_version"><strong><code>Dataset.next_version</code></strong></a>()</td><td>Return a reference to the dataset at the version subsequent to the currently referenced version.</td></tr><tr><td><a href="dataset/dataset.previous_version"><strong><code>Dataset.previous_version</code></strong></a>()</td><td>Return a reference to the dataset at the version prior to the currently referenced version.</td></tr><tr><td><a 
href="dataset/dataset.query"><strong><code>Dataset.query</code></strong></a>(query_string)</td><td>Create a query scoped to the dataset.</td></tr><tr><td><a href="dataset/dataset.release"><strong><code>Dataset.release</code></strong></a>()</td><td>Release the <code>next</code> version of the dataset.</td></tr><tr><td><a href="dataset/dataset.remove_labels"><strong><code>Dataset.remove_labels</code></strong></a>()</td><td>Remove labels from a dataset.</td></tr><tr><td><a href="dataset/dataset.table"><strong><code>Dataset.table</code></strong></a>(table_reference)</td><td>Create a reference to a specific table within the dataset.</td></tr><tr><td><a href="dataset/dataset.unrelease"><strong><code>Dataset.unrelease</code></strong></a>()</td><td>Unrelease the current version of the dataset, moving it back to an unreleased, "next" version.</td></tr><tr><td><a href="dataset/dataset.update"><strong><code>Dataset.update</code></strong></a>([*, name, public_access_level, ...])</td><td>Update certain attributes on the dataset.</td></tr><tr><td><a href="dataset/dataset.update_variables"><strong><code>Dataset.update_variables</code></strong></a>(variables)</td><td>Batch update variable metadata across tables in a dataset.</td></tr><tr><td><a href="dataset/dataset.version"><strong><code>Dataset.version</code></strong></a>([tag])</td><td>Create a reference to a <a href="version">version instance</a> at a particular tag.</td></tr></tbody></table>
