Data uploads


In order to create new tables (or release new versions of existing ones), you will need to upload your data from existing sources. Redivis supports data ingest from numerous sources across myriad data formats, as well as the ability to combine multiple uploads into a single table.
For a guided walkthrough of uploading data to a dataset, please see the Creating a dataset guide.

Supported file types

  • Text-delimited (.csv, .tsv, .psv, .dsv, .txt, .tab, *): Any text-delimited file. Redivis will auto-infer the delimiter, or you can specify it manually. This is also the default format for files with missing file extensions. See working with text-delimited files below.
  • Avro: Compressed data blocks using the DEFLATE and Snappy codecs are supported. Nested and repeated fields are not supported.
  • Parquet: Nested and repeated fields are not supported.
  • ORC: Nested and repeated fields are not supported.
  • Newline-delimited JSON: Nested and repeated fields are not supported.
  • Excel (.xls, .xlsx): Only the first sheet will be ingested.
  • SAS (.sas7bdat): Default formats will be interpreted as the corresponding variable type, and variable labels will automatically be imported. User-defined formats (.sas7bcat) are not supported.
  • Stata (.dta): Variable labels and value labels will automatically be imported.
  • SPSS (.sav): Variable labels and value labels will automatically be imported.
  • Google Sheets: A Sheets file stored in Google Drive. Only the first tab of data will be ingested.
Uploading compressed (gzipped) files:
Generally, you should upload uncompressed data files to Redivis, as uncompressed files can be read in parallel and thus upload substantially faster. If you prefer to store your source data in a compressed format, Avro, Parquet, and ORC are the preferred data formats, as these support parallelized compressed data ingestion at the row level.
Redivis will decompress text-delimited files, though the data ingest process may be substantially slower. If your file is compressed, it must have the .gz file extension if you're uploading locally (e.g., my_data.csv.gz), or have its header set to Content-Encoding: gzip if served from a URL or cloud storage location.
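If you do compress a text-delimited file before a local upload, Python's gzip module can produce a file with the required .gz extension. A minimal sketch (the file name and contents here are hypothetical):

```python
import gzip
import shutil

# Hypothetical source file
with open("my_data.csv", "w") as f:
    f.write("id,name\n1,Alice\n2,Bob\n")

# Compress for upload; locally uploaded compressed files must end in .gz
# (e.g., my_data.csv.gz)
with open("my_data.csv", "rb") as src, gzip.open("my_data.csv.gz", "wb") as dst:
    shutil.copyfileobj(src, dst)
```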

Quotas & limits

  • Max dataset size: 15TB
  • Max row size: 100MB
  • Max file sizes
    • Avro: 5TB
    • Parquet: 5TB
    • ORC: 5TB
    • CSV: 5TB (4GB compressed)
    • NDJSON: 5TB (4GB compressed)
    • SAS(.sas7bdat): 500GB
    • Stata(.dta), SPSS(.sav): 25GB
    • Excel(xls, xlsx): 25GB
  • Max variables (columns): 9,990 * †
  • Max uploads per table, per version: 500 †
* The variable maximum applies across all versions of a table. If a variable exists in a previous version of the table, but is subsequently deleted, it will still count towards this variable maximum.
† Depending on the length of your variable names, as well as the typing of your variables, the actual limits for max variables and uploads may be lower. The query generated to select variables from each upload cannot exceed 256K characters, and the query generated to select across all uploads cannot exceed 12M characters. If either of these limits are reached, the upload will fail with an accompanying error message.

Working with text-delimited files

A text-delimited file is a file that uses a specific character (the delimiter) to separate columns, with newlines separating rows.

Delimited file requirements

  • Must be UTF-8 encoded (ASCII is a valid subset of UTF-8)
  • Quote characters in cells must be properly escaped. For example, if a cell contains the content: Jane said, "Why hasn't this been figured out by now?" it must be encoded as: "Jane said, ""Why hasn't this been figured out by now?"""
  • The quote character must be used to escape the quote character. For example, the sequence \" is not valid for an escaped quote; it must be ""
  • Empty strings will be converted to null values
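If you generate delimited files programmatically, Python's csv module applies this quote-doubling escape automatically. A quick sketch:

```python
import csv
import io

# csv.writer escapes embedded quote characters by doubling them,
# matching the requirements above
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(['Jane said, "Why hasn\'t this been figured out by now?"'])
print(buf.getvalue())
# "Jane said, ""Why hasn't this been figured out by now?"""
```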

Delimited file options

The delimiter will be auto-inferred based upon an analysis of the file being uploaded. In rare cases, this inference may fail; you can specify the delimiter to override this inference.
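If you want to check a file's delimiter locally before uploading, Python's csv.Sniffer performs a similar inference. A sketch with hypothetical sample data:

```python
import csv

# A small sample of a pipe-delimited file (hypothetical data)
sample = "id|name|score\n1|Alice|90\n2|Bob|85\n"

# Restrict the sniffer to a set of candidate delimiters
dialect = csv.Sniffer().sniff(sample, delimiters=",|\t;")
print(dialect.delimiter)  # |
```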
Quote character
Specify the character used to escape delimiters. Generally " , though some files may not have a quote character (in which case, they must not include the delimiter within any cells).
Has header row
Specifies whether the first row is a header containing the variable names. This will cause data to be read beginning on the 2nd row. If you don't provide a header in your file, variables will be automatically created as var1, var2, var3, etc...
Allow quoted newlines
This option is necessary if there are newline characters within particular cells in your data (e.g., multiple paragraphs in one cell). Checking this box unnecessarily will not cause any errors, though it will substantially slow down data ingest, and may cause inaccuracies in other error reporting.
Skip corrupted records
By default, an upload will fail if a corrupted record is encountered. This includes a record that has a mismatched number of columns, or is otherwise not parsable. If this box is checked, corrupted records will be skipped, and the number of skipped records will be displayed next to each file once it has been imported.

Variable names and types

Renaming variables
Variable names are automatically inferred from the source data, with invalid characters replaced with an underscore (_). If the same variable is found more than once in any given file, it will automatically have a counter appended to it (e.g., "varName2").
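The renaming behavior described above can be approximated as follows. This is a sketch of the described rules, not Redivis's actual implementation, and the exact set of invalid characters is an assumption:

```python
import re
from collections import Counter

def sanitize_names(names):
    """Replace invalid characters with '_' and de-duplicate with a counter."""
    seen = Counter()
    result = []
    for name in names:
        # Assumption: non-word characters count as invalid
        clean = re.sub(r"\W", "_", name)
        seen[clean] += 1
        if seen[clean] > 1:
            clean = f"{clean}{seen[clean]}"
        result.append(clean)
    return result

print(sanitize_names(["varName", "var name", "varName"]))
# ['varName', 'var_name', 'varName2']
```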
Variable type inference
All values of a variable must be compatible with its type. Redivis will automatically choose the most specific, valid type for a variable, with string being the default type.
Please note the following rules:
  • If all values of a variable are null, its type will be string
  • Numeric values with leading zeros will be stored as string in order to preserve the leading zeros (e.g., 000583 )
  • Data stored with decimal values will be stored as a float , even if that value is a valid integer (e.g., 1.0 ).
  • Temporal data types should be formatted using the canonical types below. Redivis will attempt to parse other common date(time) formats, though this will only be successful when the format is unambiguous and internally consistent.
    • Date: YYYY-[M]M-[D]D
    • DateTime: YYYY-[M]M-[D]D[( |T)[H]H:[M]M:[S]S[.DDDDDD]]
    • Time: [H]H:[M]M:[S]S[.DDDDDD]
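To avoid relying on inference, you can normalize temporal values to the canonical format before upload with Python's datetime. A sketch; the source format "%m/%d/%Y" here is a hypothetical example:

```python
from datetime import datetime

def to_canonical_date(value, source_format="%m/%d/%Y"):
    """Normalize an unambiguous source format to the canonical YYYY-MM-DD."""
    return datetime.strptime(value, source_format).strftime("%Y-%m-%d")

print(to_canonical_date("03/07/2021"))  # 2021-03-07
```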

Working with multiple uploads

You can create up to 500 uploads per table, per version. Files will automatically be appended to each other based on their variable names (case insensitive), with the goal of creating one continuous table with a consistent schema.
Conflicting variable types
If files have conflicting types across a given variable, the lowest-denominator type for that variable is chosen when the files are merged.
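As an illustration, a plausible lowest-denominator rule looks like the sketch below. The exact type lattice is an assumption, not Redivis's documented behavior; the only grounded point is that string is the most general fallback type, as noted in the type-inference rules above:

```python
# Assumed generality ranking: string is the most general (the fallback),
# float subsumes integer; boolean is its own narrow type
GENERALITY = {"string": 3, "float": 2, "integer": 1, "boolean": 1}

def merge_types(type_a, type_b):
    """Pick the lowest-denominator (most general) of two conflicting types."""
    if type_a == type_b:
        return type_a
    a, b = GENERALITY.get(type_a, 3), GENERALITY.get(type_b, 3)
    # Incompatible non-string types (e.g., boolean vs. integer) fall back to string
    if a == b:
        return "string"
    return type_a if a > b else type_b

print(merge_types("integer", "float"))   # float
print(merge_types("integer", "string"))  # string
```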

Import sources

By default, you may upload data from your local computer or a public URL. However, Redivis supports numerous integrations for data ingest across common sources.

Google Cloud Storage (GCS)

You may import any object that you have read access to in GCS by specifying a bucket name and path to that object. You may import multiple objects at once by providing a prefix followed by a wildcard character, e.g.: /my-bucket/my-folder/* or /my-bucket/my-folder/prefix* .

Amazon S3

You may import any object that you have read access to in S3 by specifying a bucket name and path to that object. You may import multiple objects at once by providing a prefix followed by a wildcard character, e.g.: /my-bucket/my-folder/* or /my-bucket/my-folder/prefix* .

Google Drive

You may import any file of valid format that you have stored within your Drive, including Google Sheets.

Google BigQuery

You may import any table that you have read access to in BigQuery. You must specify the table in the form project_name.dataset_id.table_id . To import multiple tables within a dataset, you may use wildcards. E.g., project_name.dataset_id.* or project_name.dataset_id.prefix* .
Please note that importing from table views is not currently supported.


Scripted imports

In addition to uploading data through the browser interface, you can leverage the redivis-python and redivis-js client libraries to automate data ingest and data release pipelines.

Batch uploads

import redivis

dataset = redivis.user("your-username").dataset("some dataset")
table = (
    dataset
    .table("Table name")
    .create(description="Some description")
)

with open("path/to/file", "rb") as f:
    table.create_upload(name="file.csv", type="delimited", data=f)

Streamed uploads

import redivis

table = redivis.user("user_name").dataset("dataset_name").table("table_name")

schema = [
    { "name": "string_var", "type": "string" },
    { "name": "int_var", "type": "integer" },
    { "name": "datetime_var", "type": "dateTime" },
]

rows = [
    { "string_var": "hello", "int_var": 1, "datetime_var": None },
    { "string_var": "world", "int_var": 2, "datetime_var": "2020-01-01T00:00:00.123" },
]

upload = table.upload("some_streamed_data")
# Only call create if the upload doesn't already exist
upload.create(type="stream", schema=schema)
upload.insert_rows(rows)
Consult the complete client library documentation for more details and additional examples:
Redivis API

Error handling

A file may fail to import for several reasons; in each case, Redivis endeavors to provide a clear error message to help you fix the problem.
To view full error information, including a snapshot of where the error occurred in your source file (when applicable), double-click on the failed upload in the upload manager.

Network issues

When transferring a file from your computer (or more rarely, from other import sources), there may be an interruption to the internet connection that prevents the file from being fully uploaded. In these cases, you should simply try uploading the file again.

Invalid or corrupted source data

Data invalidity is most common when uploading text-delimited files, though it can happen with any file format. While some data invalidity errors may require further investigation off of Redivis, others may be due to incorrect options provided in the file upload process. When possible, Redivis will display ~1000 characters that are near the error in the source file, allowing you to identify the potential source of failure.
For example, the error snapshot might show that a single cell contains multiple line break characters. Reimporting the file with the "Allow quoted newlines" option set to true will resolve this problem.