Data

Your machine learning models depend on the data you integrate into the system. Integration means setting up a data source; once configured, loading from it can be automated. All operations occur at the dataset level, so you need to know how to create and manage both datasets and data sources.

Overview of the Process

Here’s an outline of the steps involved in setting up a data source:

Create a Dataset

A dataset is the project-level container in which your data sources live. One dataset can hold multiple data sources.

This image shows how to create a new dataset in the system. Datasets allow you to organize your data sources for different business cases.

Add a Data Source

Within the dataset, create one or more data sources to load the necessary data.

This image displays the various data source connection options such as CSV, JSON, Google BigQuery, and Microsoft SQL Server.

Select the Source

Choose the data source you want to integrate from the list of supported sources:

CSV
JSON
Google BigQuery
Google Cloud Storage
Microsoft SQL Server
SFTP
Agillic
ActiveCampaign

Enter Credentials

Input the credentials required to connect to the data source. This might involve entering project IDs, API keys, or authentication tokens, depending on the source.

This image shows examples of credential input for Google BigQuery and Microsoft SQL Server.
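
As a rough illustration of keeping credentials out of files and scripts before you enter them, the sketch below collects them from environment variables and reports any that are missing. The field names (`project_id`, `service_account_key`, `host`, and so on) and the environment variable naming scheme are assumptions made for this example; the actual credential form may ask for different fields.

```python
import os

# Hypothetical required fields per source type; the real form may differ.
REQUIRED_FIELDS = {
    "bigquery": ["project_id", "service_account_key"],
    "mssql": ["host", "database", "username", "password"],
}


def load_credentials(source_type, env=os.environ):
    """Collect credentials from variables named like MSSQL_HOST or BIGQUERY_PROJECT_ID."""
    fields = REQUIRED_FIELDS[source_type]
    prefix = source_type.upper() + "_"
    values = {field: env.get(prefix + field.upper()) for field in fields}
    # Treat unset and empty variables the same way: both count as missing.
    missing = [field for field, value in values.items() if not value]
    if missing:
        raise KeyError(f"missing credentials for {source_type}: {missing}")
    return values


# Example with an explicit mapping instead of the real environment.
example_env = {
    "MSSQL_HOST": "db.example.com",
    "MSSQL_DATABASE": "crm",
    "MSSQL_USERNAME": "reader",
    "MSSQL_PASSWORD": "not-a-real-password",
}
print(sorted(load_credentials("mssql", env=example_env)))
```

Passing `env` explicitly keeps the function easy to test; in normal use it falls back to the process environment.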

Preview Data

Once connected, preview the data to ensure the source is correctly connected and the data format is as expected.

In this image, you can see a preview of the data fields (e.g., ContactID, Gender, Birthdate) before confirming the connection.
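
For a CSV source, the same kind of check can be done locally before connecting. The sketch below is a minimal example using only Python's standard library; the sample rows are made up, and only the column names (ContactID, Gender, Birthdate) come from the preview described above.

```python
import csv
import io


def preview_csv(text, max_rows=5):
    """Return the header and the first few rows of CSV text for a quick sanity check."""
    reader = csv.DictReader(io.StringIO(text))
    rows = []
    for index, row in enumerate(reader):
        if index >= max_rows:
            break
        rows.append(row)
    return reader.fieldnames, rows


# Made-up sample data using the column names from the preview above.
SAMPLE = """ContactID,Gender,Birthdate
1001,F,1988-04-12
1002,M,1975-11-30
"""

header, rows = preview_csv(SAMPLE)
print(header)
for row in rows:
    print(row)
```

If the header or the first rows look wrong here (shifted columns, unexpected delimiters, inconsistent date formats), they will most likely look wrong in the preview step as well.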

Create Data Mapping

After previewing, map the data fields to specific entities in the allyy data structure. Mapping allows the system to understand the relationship between fields in your data and allyy’s internal structure (Contacts, Responses, Offers, etc.).
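
Conceptually, a mapping is a lookup from each source column to an entity and a field on that entity. The sketch below illustrates the idea only: the entity name `Contacts` comes from the text above, but the target field names (`id`, `gender`, `birthdate`) and the `apply_mapping` helper are hypothetical and do not describe the platform's internal format.

```python
# Hypothetical mapping: source column -> (entity, target field).
CONTACT_MAPPING = {
    "ContactID": ("Contacts", "id"),
    "Gender": ("Contacts", "gender"),
    "Birthdate": ("Contacts", "birthdate"),
}


def apply_mapping(row, mapping):
    """Group a source row's values by target entity and list any unmapped columns."""
    entities = {}
    unmapped = []
    for column, value in row.items():
        if column not in mapping:
            unmapped.append(column)
            continue
        entity, field = mapping[column]
        entities.setdefault(entity, {})[field] = value
    return entities, unmapped


source_row = {
    "ContactID": "1001",
    "Gender": "F",
    "Birthdate": "1988-04-12",
    "Notes": "not mapped",
}
entities, unmapped = apply_mapping(source_row, CONTACT_MAPPING)
print(entities)
print(unmapped)
```

Listing the unmapped columns makes it easy to see which fields would be ignored before you commit to a mapping.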

Streaming vs Batch:

Batch Data

Synchronization happens manually or on a schedule. To pull in data, click the synchronize button or schedule it via a workflow.

Streaming Data

Data flows in real time. Use the start/stop streaming buttons to manage continuous data flow. Streaming data cannot be scheduled.
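
The difference between the two modes can be summarized in a small model. This is a sketch for illustration only: the `DataSource` class, its method names, and the schedule string are hypothetical and are not the platform's API. The one rule it encodes is taken from the text above: batch sources can be scheduled, streaming sources can only be started and stopped.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Mode(Enum):
    BATCH = "batch"
    STREAMING = "streaming"


@dataclass
class DataSource:
    name: str
    mode: Mode
    schedule: Optional[str] = None  # only meaningful for batch sources
    is_streaming: bool = False

    def set_schedule(self, schedule: str) -> None:
        """Attach a schedule; allowed for batch sources only."""
        if self.mode is not Mode.BATCH:
            raise ValueError("streaming data sources cannot be scheduled")
        self.schedule = schedule

    def start_streaming(self) -> None:
        """Begin continuous data flow; allowed for streaming sources only."""
        if self.mode is not Mode.STREAMING:
            raise ValueError("only streaming data sources can be started")
        self.is_streaming = True

    def stop_streaming(self) -> None:
        """Stop continuous data flow; allowed for streaming sources only."""
        if self.mode is not Mode.STREAMING:
            raise ValueError("only streaming data sources can be stopped")
        self.is_streaming = False


nightly = DataSource("contacts_csv", Mode.BATCH)
nightly.set_schedule("daily at 02:00")

live = DataSource("responses_feed", Mode.STREAMING)
live.start_streaming()
```

Calling `live.set_schedule(...)` raises a `ValueError`, mirroring the rule that streaming data cannot be scheduled.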

Conclusion

Setting up data sources brings external data into the system. By following this process, you configure a data source once and reuse it indefinitely, whether you pull data manually, synchronize it on a schedule, or stream it in real time. Mapping the data correctly to the allyy data structure is essential for using it in models and predictions.
