Comment on page
In order to create new tables, or update existing tables with new data, you will need to ingest a valid tabular data file from a supported data source. Redivis supports numerous tabular data formats, with robust error handling and type inference. You can also perform scripted imports via the programmatic interface.
Tables on Redivis can be made up of one or more tabular file uploads. The multi-upload use case applies when data with the same general schema is broken up across files. For example, a dataset where there is a separate file for each year. In general, it's best to combine such files into a single table, as it is easier for researchers to query a single "tall" table than multiple tables (and Redivis's high-performance query engine keeps these queries quick, even at terabyte scale).
Redivis will try to detect the file type based on the file extension, though you can manually specify the type as needed.
Uploading compressed (gzipped) files:
Generally, you should upload uncompressed data files to Redivis, as uncompressed files can be read in parallel and thus upload substantially faster.
If you prefer to store your source data in a compressed format, Avro, Parquet, and ORC are the preferred data formats, as these support parallelized compressed data ingestion at the row level.
Redivis will decompress text-delimited files, though the data ingest process may be substantially slower. If your file is compressed, it must have the .gz file extension if you're uploading locally (e.g.,
my_data.csv.gz) or have it's header set to
Content-Encoding: gzipif served from a URL or cloud storage location.
A text-delimited file is a file that uses a specific character (the delimiter) to separate columns, with newlines separating rows.
- Must be UTF-8 encoded (ASCII is a valid subset of UTF-8)
- Quote characters in cells must be properly escaped. For example, if a cell contains the content:
Jane said, "Why hasn't this been figured out by now?"it must be encoded as:
"Jane said, ""Why hasn't this been figured out by now?"""
- The quote character must be used to escape the quote character. For example, the sequence
\"is not valid for an escaped quote; it must be
- Empty strings will be converted to
Specifies whether the first row is a header containing the variable names. This will cause data to be read beginning on the 2nd row. If you don't provide a header in your file, variables will be automatically created as
var1, var2, var3, etc...
By default, an upload will fail if an invalid record is encountered. This includes a record that has a mismatched number of columns, or is otherwise not parsable. If this box is checked, the number of skipped records will be displayed on each upload once it has been imported.
Whether to allow rows that contain fewer or more columns than the first row of your file. It is recommended to leave this option unchecked, as jagged rows are generally a sign of a parsing error that should be remedied by changing other options or fixing the file.
Whether newlines exist within specific data cells (e.g., paragraphs of text). If set to Auto, Redivis will determine the value based on analysis of the beginning of the file.
It is best to only set this value to "Yes" if you know your data contain line breaks, as this will slow down the import and may cause incorrect error reporting.
The delimiter will be auto-inferred based upon an analysis of the file being uploaded. In rare cases, this inference may fail; you can specify the delimiter to override this inference.
Specify the character used to escape delimiters. Generally
", though some files may not have a quote character (in which case, they must not include the delimiter within any cells).
Cells containing a quote character must have that character escaped.
Typically, the escape sequence character is the same as the quote character, but some files may use a different value, such as a backslash (\).
Variable names are automatically inferred from the source data. They can only contain alphanumeric or underscore characters, and must start with a letter or underscore. Any invalid characters will be replaced with an underscore (
Variable names must be unique within the table. If the same variable is found more than once in any given file, it will automatically have a counter appended to it (e.g., "variable_2").
The max number of characters for a variable name is 60. Any names with more characters will be truncated.
Variable type inference
Please note the following rules:
- If all values of a variable are null, its type will be
- Numeric values with leading zeros will be stored as
stringin order to preserve the leading zeros (e.g.,
- Data stored with decimal values will be stored as a
float, even if that value is a valid integer (e.g.,
- Temporal data types should be formatted using the canonical types below. Redivis will attempt to parse other common date(time) formats, though this will only be successful when the format is unambiguous and internally consistent.
You can create up to 500 uploads per table, per version. Files will automatically be appended to each other based on their variable names (case insensitive), with the goal of creating one continuous table with a consistent schema.
If a variable is missing in some of the files you uploaded, the values for the missing variable will be set to null for all rows in the upload.
Conflicting variable types
A file may fail to import due to several reasons; in each case, Redivis endeavors to provide a clear error message for you to fix the error.
In order to view full error information, including a snapshot of where the error occurred in your source file (when applicable), click on the failed upload in the upload manager
When transferring a file from your computer (or more rarely, from other import sources), there may be an interruption to the internet connection that prevents the file from being fully uploaded. In these cases, you should simply try uploading the file again.
Data invalidity is most common when uploading text-delimited files, though it can happen with any file format. While some data invalidity errors may require further investigation outside of Redivis, others may be due to incorrect options provided in the file upload process. When possible, Redivis will display ~1000 characters that are near the error in the source file, allowing you to identify the potential source of failure.
The Redivis data import tool has been built to gracefully handle a wide range of data formats and encodings. However, errors can still occur if the source data is "invalid"; some common problems (and their solutions) are outlined below.
Some SQL databases and tutorials will generate invalid CSV escape sequences by default. Specifically:
val1,val2,"string with \"quotes\" inside"
val1,val2,"string with ""quotes"" inside"
The "proper" escape sequence is a doubling of the quote character. For MySQL, this would look like
INTO OUTFILE '/.../out.csv'
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"', ESCAPED BY '"'
LINES TERMINATED BY '\n';
If you only have access to the invalid file generated by a previous database dump, you can specify a custom Quote Character of
\in the advanced import options, and Redivis will reformat the file as part of the ingest process (Redivis will also auto-detect this custom escape sequence in many scenarios). Using a custom escape sequence may cause data import processing to take a bit longer.
If your data has paragraphs of text within a particular data cell, and the "Has quoted newlines" advanced option isn't set, the data import may fail (see above screenshot for example). Redivis will automatically set this option to true if it identifies a quoted newline in the top ~1000 records of the file, but if quoted newlines don't occur until later, you'll need to make sure to set this option manually for the import to succeed.
While rare, it is always possible that data transfers will be interrupted by the vagaries of networking. If this happens, we recommend simply retrying your upload. If the problem persists, please reach out to [email protected].
Last modified 1mo ago