Modify a Data Schema#

Whether you hand-crafted your schema or Apperate inferred it from a data sample, it’s good to test the schema and fine-tune it to fit your needs. Here are some possible reasons to update a schema:

  • Add property descriptions

  • Add a new property

  • Remove a deprecated property

  • Change a property type

  • Index a property

  • SmartLink a property

  • Require values for a property

  • Forbid null values for a property

  • Rename your dataset

Let’s tour schema editing.

Warning

Changing the schema of a parent dataset can break or alter its associated views.

Important

If you plan to update your schema, do so early, before you add lots of data to the dataset.

Open the Schema Editor#

From your dataset’s Overview page, click Edit. The schema editor appears.

Here are the editor sections:

Dataset ID: This field allows you to rename the dataset.

Select action for existing data: This drop-down menu provides options for updating the dataset’s existing data with regard to any schema modifications you apply. The Specify How to Handle Existing Data step below explains the options and how they relate to various schema changes.

Properties: This table shows your dataset columns, their types and constraints, indexes, and descriptions. For details, see Dataset Properties.

Opt-in to IEX Cloud’s metadata graph: This section enables you to SmartLink a primary (or secondary) index property to the financial metadata graph. See Understanding Datasets to learn more about SmartLinks.

Update Your Properties#

In the Properties table, modify properties as needed, delete unneeded properties, and add any new properties. You can change a property’s type, constraints, and description, and designate the property as one of the dataset’s three indexes.

See also

Dataset Properties details the properties options.

Note

You can’t change a property’s name. Instead, add a new property as a replacement and remove the existing property.
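If you also plan to reload data under the replacement property's name, the records themselves need the new key. The following is only a rough sketch, assuming you keep a local copy of your records as a list of dictionaries. The function name and the field names are hypothetical, and nothing here calls Apperate.

```python
def rename_key(records, old_name, new_name):
    """Return copies of the records with `old_name` renamed to `new_name`.

    Hypothetical local helper: it only reshapes in-memory records so they
    match a schema where the old property was replaced by a new one.
    """
    renamed = []
    for record in records:
        copy = dict(record)  # leave the caller's record untouched
        if old_name in copy:
            copy[new_name] = copy.pop(old_name)
        renamed.append(copy)
    return renamed


# Example: the (hypothetical) property "ticker" was replaced by "symbol".
local_records = [{"ticker": "AAPL", "close": 1.5}]
print(rename_key(local_records, "ticker", "symbol"))
```

The helper returns new dictionaries rather than editing in place, so the original copy stays available if the reload needs to be retried.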

Specify How to Handle Existing Data#

Before you update the schema, you must inform Apperate how to handle the existing data. Here are the options:

  • Leave existing data as is: Preserves the existing data.

  • Delete all existing data: Removes ALL the existing data. Before doing this, MAKE SURE to back up any data you want to preserve.

  • Update existing data: Immediately modifies the data to adhere to the schema.

  • Reingest data using a new schema: Reloads the existing data, validating it with the new schema and replacing the existing data, indexes, and metadata graph mappings (SmartLinks).

Important

Reingestion is available only for datasets with 1,000,000 records or fewer, and is intended for use only at the beginning of a dataset’s lifetime.

Data Handling Best Practices#

Here are some best practices to consider for existing data with regard to specific schema modifications.

  • Specify a new/different index: Select Reingest data using a new schema.

  • Specify a new/different SmartLink: Select Reingest data using a new schema.

  • Allow/forbid null for a property: If you want to update existing data, select Reingest data using a new schema. If you are forbidding null values for the property, existing records that have null values are dropped. See Troubleshooting Schema Update Issues below for guidance on handling these records.

  • Require values for a property: If you want to update existing data, select Update existing data. Existing records missing the property are dropped. See Troubleshooting Schema Update Issues below for guidance on handling these records.

  • Change a property type: If you want to update existing data, select Update existing data. Supported conversions are integer → number, date → string, and string → date.

  • Add a plain (unindexed, unmapped) property: If you want to update existing data, select Update existing data to add the new property with the type’s default value. The Reingest data using a new schema action is unnecessary.

  • Remove a plain (unindexed, unmapped) property: No particular best practices. Select the action that best fits your scenario.
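Because forbidding nulls, requiring values, and changing types can all drop or reject existing records, it may help to preview the impact on a local sample before applying the change. The sketch below is one possible way to do that in Python. It assumes records are available locally as dictionaries, and it assumes string dates are in ISO 8601 form (YYYY-MM-DD), which may not match what Apperate accepts. All function and field names are hypothetical, and nothing here calls Apperate.

```python
from datetime import date


def find_dropped_records(records, prop, *, required=False, allow_null=True):
    """Return the records that would violate the new constraints on `prop`."""
    dropped = []
    for record in records:
        missing = prop not in record
        is_null = not missing and record[prop] is None
        if (required and missing) or (not allow_null and is_null):
            dropped.append(record)
    return dropped


def find_unconvertible_dates(records, prop):
    """Preview a string -> date change: return records whose value does not
    parse as an ISO 8601 date (an assumed format)."""
    bad = []
    for record in records:
        value = record.get(prop)
        if value is None:
            continue  # nulls are covered by find_dropped_records
        try:
            date.fromisoformat(value)
        except (TypeError, ValueError):
            bad.append(record)
    return bad


# Hypothetical sample data.
sample = [
    {"symbol": "AAPL", "close": 1.0, "day": "2023-01-03"},
    {"symbol": "MSFT", "close": None, "day": "01/03/2023"},
    {"symbol": "IBM", "day": "2023-01-04"},
]
print(find_dropped_records(sample, "close", allow_null=False))
print(find_dropped_records(sample, "close", required=True))
print(find_unconvertible_dates(sample, "day"))
```

Running a check like this on a representative sample gives a rough count of the records you would later need to repair through the ingestion logs.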

Apply Your Changes#

When you’re done modifying the schema and selecting your existing data action, click Update Dataset. Apperate applies the schema modifications to your data, and your dataset’s Overview page appears.

If you have schema update issues, see how you can troubleshoot them next. Otherwise, enjoy your updated schema!

Troubleshooting Schema Update Issues#

If data reingestion fails for a schema update, the invalid records are excluded from ingestion. Here’s how to troubleshoot the invalid records.

  1. Go to the Ingestion Logs.

  2. Select the Ingestion Jobs tab.

  3. Check the job’s Invalid Records column. The document icon in the Invalid Records column links to the ingestion job’s invalid record list.


  4. Click the Invalid Records icon to view or download the invalid records CSV file.

  5. Copy the record data and fix it to make it valid.

  6. Add the record to the dataset using one of these ways:
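For step 5, repairing many records by hand can be tedious, so a small script may help. The sketch below assumes the downloaded invalid records file is a CSV with a header row. The column names, the date format being fixed, and both function names are hypothetical, and the actual layout of the file may differ.

```python
import csv
import io
from datetime import datetime


def repair_rows(csv_text, fix_row):
    """Parse CSV text, apply `fix_row` to each row, and return repaired CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow(fix_row(row))
    return out.getvalue()


def normalize_day(row):
    """Hypothetical fix: rewrite MM/DD/YYYY values in the "day" column as YYYY-MM-DD."""
    try:
        parsed = datetime.strptime(row["day"], "%m/%d/%Y")
        row["day"] = parsed.date().isoformat()
    except ValueError:
        pass  # value is not in MM/DD/YYYY form; leave it unchanged
    return row


# Hypothetical contents of a downloaded invalid records file.
invalid_csv = "symbol,day\nAAPL,01/03/2023\nMSFT,2023-01-04\n"
print(repair_rows(invalid_csv, normalize_day))
```

Review the repaired rows before adding them back to the dataset, since a mechanical fix can mask records that were invalid for a different reason.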

Terrific! You’re becoming a schema management expert.