Concepts

This page explains the terminology and concepts you will encounter while using Obelisk. We strongly recommend reading it before you start interacting with the platform.

Datasets

Obelisk revolves around data: our mission is to store your data safely and securely and to make it accessible for a wide range of applications. A Dataset¹ is the key component for organizing this data.

Each Dataset holds data belonging to a single logical entity, e.g. a research project, an application that uses Obelisk as its data layer, or an open-data source published via Obelisk. Within a Dataset, individual pieces of information are structured in separate time series called Metrics. When data is stored in Obelisk, the Dataset and Metric it belongs to are attached as immutable metadata properties: until a data record is removed, it always belongs to the same Dataset and Metric.

Datasets are the unit of data isolation and are used to manage access control. The owner of a Dataset can define access roles, invite other users to the Dataset, assign roles to existing members, etc.

Membership

You can only interact with a Dataset if you are a member (or if the Dataset is marked as open data). To become a member of a Dataset, you can:

- Navigate to the Dataset via the Catalog Overview [TODO: link] and click 'Request Access'².
- Click a Dataset invite link generated by one of the Dataset managers. You can then review the type of Membership associated with the invite and click the 'Accept' button to become a member.

Roles

Each member can be assigned zero or more roles within a Dataset. A role is defined by a set of individual permissions (READ, WRITE, MANAGE) and an optional read filter, which restricts members with this role to a subset of the data available in the Dataset. By default, each Dataset has the following roles:

Role name	Permissions granted	Description
consumer	`READ`	A consumer can only read from the Dataset.
contributor	`READ`, `WRITE`	A contributor can read from and write to the Dataset.
manager	`READ`, `WRITE`, `MANAGE`	A manager can read from, write to and manage the Dataset (e.g. perform access control).
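The default roles above can be pictured as named sets of permissions. The sketch below assumes that a member holding several roles gets the union of their permissions; that merging rule and the names `Permission`, `DEFAULT_ROLES` and `effective_permissions` are illustrative assumptions, not Obelisk's API, and read filters are left out.

```python
from enum import Enum

class Permission(Enum):
    READ = "READ"
    WRITE = "WRITE"
    MANAGE = "MANAGE"

# Default roles from the table above; read filters are left out of this sketch.
DEFAULT_ROLES: dict[str, frozenset[Permission]] = {
    "consumer": frozenset({Permission.READ}),
    "contributor": frozenset({Permission.READ, Permission.WRITE}),
    "manager": frozenset({Permission.READ, Permission.WRITE, Permission.MANAGE}),
}

def effective_permissions(role_names: list[str]) -> frozenset[Permission]:
    """Union of the permissions of all assigned roles (assumed merging rule)."""
    granted: set[Permission] = set()
    for name in role_names:
        granted |= DEFAULT_ROLES[name]
    return frozenset(granted)

# A member holding both roles can read and write, but not manage the Dataset.
print(sorted(p.value for p in effective_permissions(["consumer", "contributor"])))
# prints ['READ', 'WRITE']
```

A member with zero roles ends up with an empty permission set, which matches the idea that roles are the only source of access in this sketch.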

Metrics

When storing data in Obelisk, in addition to assigning a target Dataset, you also need to specify the time series (Metric) that the individual records belong to. A metric is defined by a name and a type separated by the character sequence ::, for example: humidity_rh::number

The metric name is humidity_rh, referring to a series of relative humidity measurements. The metric type is number, indicating that the values of the series must be numerical.

Obelisk supports the following metric types:

Type	Designation	Description
`number`	primitive	Metric Type used to represent numerical series. This type can be used for both integer and floating-point numbers, but internally Obelisk uses a 64-bit floating-point representation, so integers with a magnitude above 2^53 may not be stored exactly.
`number[]`	primitive	Metric Type used to represent series for which the values are number arrays of a limited size.
`bool`	primitive	Metric Type used to represent series for which the values are booleans.
`string`	complex	Metric Type used to represent series for which the values are strings of a limited size.
`json`	complex	Metric Type used to represent series for which the values are JSON objects of a limited size.
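Combining the name::type convention with the table above, a client could split and check a metric identifier before sending data. This is a minimal sketch: `parse_metric_id` is a hypothetical helper, it only checks that the name is non-empty (Obelisk may enforce stricter naming rules), and it splits on the last :: on the assumption that type names never contain that sequence.

```python
# Designations taken from the table of supported metric types above.
METRIC_TYPES = {
    "number": "primitive",
    "number[]": "primitive",
    "bool": "primitive",
    "string": "complex",
    "json": "complex",
}

def parse_metric_id(metric_id: str) -> tuple[str, str, str]:
    """Split '<name>::<type>' and return (name, type, designation)."""
    # rpartition splits on the last '::', since type names never contain it.
    name, sep, metric_type = metric_id.rpartition("::")
    if not sep or not name:
        raise ValueError(f"expected '<name>::<type>', got {metric_id!r}")
    if metric_type not in METRIC_TYPES:
        raise ValueError(f"unsupported metric type {metric_type!r}")
    return name, metric_type, METRIC_TYPES[metric_type]

print(parse_metric_id("humidity_rh::number"))
# prints ('humidity_rh', 'number', 'primitive')
```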

Notice that there are two categories of types based on their designation: primitive types and complex types.

Primitive types can be stored very efficiently using compression techniques such as Double Delta (delta-of-delta) or Gorilla encoding, and generally impose less strain on the system when data of these types is added or retrieved.

Complex types cannot be compressed efficiently and will impose more overhead on the system in terms of resources (CPU, memory, disk).
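To see why regular numeric series compress well, consider the timestamps of a sensor reporting at a fixed interval. Delta-of-delta (double delta) encoding keeps the first timestamp and the first delta, then stores only how much each delta differs from the previous one, which is zero for a perfectly regular series. Real schemes such as Gorilla additionally pack these small numbers into variable-length bit fields. The sketch below illustrates only the encoding idea and does not claim to be Obelisk's storage code.

```python
def delta_of_delta_encode(timestamps: list[int]) -> list[int]:
    """Encode as [first value, first delta, delta-of-deltas...]."""
    if len(timestamps) < 2:
        return list(timestamps)
    encoded = [timestamps[0], timestamps[1] - timestamps[0]]
    prev_delta = encoded[1]
    for prev, curr in zip(timestamps[1:], timestamps[2:]):
        delta = curr - prev
        encoded.append(delta - prev_delta)  # zero when the interval is unchanged
        prev_delta = delta
    return encoded

def delta_of_delta_decode(encoded: list[int]) -> list[int]:
    """Invert delta_of_delta_encode."""
    if len(encoded) < 2:
        return list(encoded)
    timestamps = [encoded[0], encoded[0] + encoded[1]]
    delta = encoded[1]
    for dod in encoded[2:]:
        delta += dod
        timestamps.append(timestamps[-1] + delta)
    return timestamps

# Readings every 60 s (epoch milliseconds), with the fourth one arriving 1 s late.
ts = [1_700_000_000_000 + 60_000 * i for i in range(5)]
ts[3] += 1_000
print(delta_of_delta_encode(ts))
# prints [1700000000000, 60000, 0, 1000, -2000]
```

Most stored values end up as zero or close to it, which is what makes bit-level packing effective; arbitrary strings or JSON documents offer no such structure.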

Traditional time-series databases typically only support the primitive type category, but with Obelisk we wanted to open up this type of functionality to a broader range of applications by also supporting strings and full JSON objects.

However, we recommend using primitive types whenever possible. This is reflected in the way the rate limiting mechanism works: clients can always perform more ingests and queries per hour on primitive types than on complex types.

Things

For each data point stored in Obelisk, the Dataset and Metric it belongs to are mandatory dimensions that can never be omitted. Data providers can also attach a source³ to the data point, referencing the Thing that produced the data.
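The dimensions of a data point can be pictured as a small record in which Dataset and Metric are required and the source is optional. The `DataPoint` class and its field names below are hypothetical and do not reproduce the actual Metric Event format. The class is frozen to echo the fact that a stored record's Dataset and Metric cannot change; in this sketch every field is frozen, which is stricter than the text requires.

```python
from dataclasses import dataclass, FrozenInstanceError
from typing import Any, Optional

@dataclass(frozen=True)
class DataPoint:
    """Hypothetical data point; not the real Metric Event schema."""
    dataset: str                  # mandatory dimension
    metric: str                   # mandatory dimension, e.g. "humidity_rh::number"
    timestamp_ms: int
    value: Any
    source: Optional[str] = None  # optional: the Thing that produced the data

point = DataPoint("my-dataset", "humidity_rh::number", 1_700_000_000_000, 41.5,
                  source="sensor-42")
try:
    point.dataset = "other-dataset"  # frozen: fields cannot be reassigned
except FrozenInstanceError:
    print("dataset is immutable")
```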

Your Account

Most interactions with the Obelisk platform require you to be authenticated. Our services use User information to implement access control and fair system usage, but also to provide a better contextualized user experience.

Your account gives you access to:

My Teams

A User can be a member of multiple teams. Team membership can affect how the User interacts with the platform:

- A team can be assigned one or more roles in a Dataset. All team members automatically inherit the same role(s).
- A team can have a usage plan that allows using more Obelisk system resources (e.g. more ingest operations, more queries). All team members automatically benefit from the higher limits granted by this plan.
- Team members can view the other members of a Team and list which development clients are available.
- Team members can list and download Data Exports that were shared by other members.
- Team members can list and subscribe to Data Streams that were shared by other members.

'My Teams' allows a User to list his/her teams, or create a new Team⁴.

My Datasets

'My Datasets' allows a User to list the Datasets he/she is a member of, or create a new Dataset⁴.

Dataset membership determines data access control.

My Clients

Developers can create clients to facilitate interfacing with Obelisk from their applications and services. 'My Clients' allows a User to list his/her clients, or to create additional clients.

There is no limit to the number of personal clients a User can create, because all these clients are subject to a single rate limiting pool linked to the User.
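One way to picture a single rate limiting pool per User is a counter keyed by the owning User rather than by the client, so creating extra clients adds no capacity. The fixed one-hour window, the `HourlyPool` class and the limit value below are assumptions for illustration; this page does not specify how Obelisk counts requests internally.

```python
import time
from collections import defaultdict
from typing import Callable

class HourlyPool:
    """Fixed-window counter: one budget per owner per clock hour (assumed algorithm)."""

    def __init__(self, limit: int, clock: Callable[[], float] = time.time) -> None:
        self.limit = limit
        self.clock = clock
        # (owner, hour index) -> units consumed; old windows are never pruned here.
        self.counts: defaultdict[tuple[str, int], int] = defaultdict(int)

    def try_consume(self, owner: str, amount: int = 1) -> bool:
        """Consume `amount` units of the owner's budget if it fits in this hour."""
        key = (owner, int(self.clock() // 3600))
        if self.counts[key] + amount > self.limit:
            return False
        self.counts[key] += amount
        return True

# Hypothetical personal clients that both belong to the User "alice".
client_owner = {"client-a": "alice", "client-b": "alice"}
pool = HourlyPool(limit=3, clock=lambda: 0.0)  # fixed clock for a repeatable demo
results = [pool.try_consume(client_owner[c])
           for c in ("client-a", "client-b", "client-a", "client-b")]
print(results)  # prints [True, True, True, False]
```

Because both clients resolve to the same owner, the fourth request is rejected even though each client individually made only two.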

My Rate Limits

To ensure fair usage, each User is subjected to a number of usage limits. 'My Rate Limits' gives an overview of all applicable limits and their current quota.

My Streams

'My Streams' allows a User to list active streams or to set up a new data stream. The number of active streams is restricted to a maximum defined by the usage limits imposed on the User.

My Exports

'My Exports' allows a User to list bulk data exports or to initiate the generation of new exports. The number of exports that can be stored for an Obelisk account is restricted to a maximum defined by the usage limits imposed on the User.

Teams

Teams are a way of organizing users who work with Obelisk towards a common goal and share a certain level of trust (e.g. members of an organisation working on a research project that uses Obelisk).

Teams can be leveraged for giving access to a Dataset for a large group of users with a single operation, or to share development clients, streams and exports with a group of trusted users (e.g. colleagues or project partners).

Team Membership

You can only interact with a Team if you are a member. A team manager can generate an invite link that users can use to join the team directly.

There are no configurable roles associated with a Team (unlike Datasets), but members can be marked as 'team manager', which grants special permissions, including:

- Updating team metadata
- Managing team members, streams and exports
- Creating or revoking team invites

Team Clients

Team members can create clients associated with the Team. These clients are visible to other team members and can be modified by team managers.

The user who creates a team client remains its 'owner', but unlike personal clients, each team client has its own individual set of usage limits, imposed by the team usage plan.

Team Streams / Exports

Team members can create shared Data Streams and Data Exports within a Team. These streams / exports are visible to other team members and can be modified by team managers.

The user who creates a stream / export remains its 'owner'. Unlike team clients, team streams / exports count towards the personal quota of that user. For example, a user who has already reached his/her maximum number of exports cannot create a shared team export.

Clients

When interfacing with Obelisk personally (e.g. using the Catalog or third-party dashboards), users can authenticate via an OpenID provider, and all authentication operations are handled by the application and your browser in the background.

To enable software to interact with Obelisk without human intervention, Obelisk supports generating development Clients, which can authenticate with Obelisk using OAuth 2 protocols. You can read our guide on How to authenticate? to learn more about this topic.
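For machine-to-machine use, OAuth 2.0 defines the client credentials grant (RFC 6749, section 4.4), in which a client exchanges its ID and secret for an access token. The sketch below only builds such a token request with Python's standard library. The token endpoint URL and the credentials are placeholders, and which grant types and endpoints Obelisk actually uses is described in the How to authenticate? guide, not assumed here.

```python
import base64
import urllib.parse
import urllib.request

def build_token_request(token_url: str, client_id: str,
                        client_secret: str) -> urllib.request.Request:
    """Build (but do not send) an OAuth 2.0 client credentials token request."""
    # RFC 6749 section 2.3.1: form-encode the ID and secret, then use HTTP Basic auth.
    credentials = (f"{urllib.parse.quote_plus(client_id)}:"
                   f"{urllib.parse.quote_plus(client_secret)}")
    basic = base64.b64encode(credentials.encode("utf-8")).decode("ascii")
    body = urllib.parse.urlencode({"grant_type": "client_credentials"}).encode("ascii")
    return urllib.request.Request(
        token_url,
        data=body,
        headers={
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )

# Placeholder endpoint and credentials, for illustration only.
req = build_token_request("https://auth.example.org/token", "my-client", "s3cret")
print(req.get_method(), req.data)  # prints POST b'grant_type=client_credentials'
```

Sending this request with `urllib.request.urlopen(req)` against a real token endpoint would return a JSON document containing an `access_token` (RFC 6749, section 5.1), which the client then presents on subsequent API calls.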

Data Streams

Obelisk is primarily designed as an efficient storage system for time-based data with fine-grained access control and powerful query APIs. However, we acknowledge that some applications can benefit from push-based communication.

Obelisk allows creating and subscribing to Data Streams, which enable clients to actively listen for new incoming data (instead of polling the Query APIs). Obelisk uses Server-Sent Events (SSE) to implement push-based communication.
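Server-Sent Events arrive as a text/event-stream: lines of the form field: value, with a blank line terminating each event. The minimal parser below implements a subset of the WHATWG parsing rules (the event, data and id fields, plus comment lines). It is a generic sketch of the protocol; the payload of Obelisk stream events is documented with Data Streams, and the JSON in the example is made up.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional

@dataclass
class SseEvent:
    event: str
    data: str
    id: Optional[str] = None

def parse_sse(lines: Iterable[str]) -> Iterator[SseEvent]:
    """Yield events from text/event-stream lines (subset of the WHATWG rules)."""
    event_type = ""
    data_lines: list[str] = []
    last_id: Optional[str] = None  # persists across events, as in the spec
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the buffered event if it has any data.
            if data_lines:
                yield SseEvent(event_type or "message", "\n".join(data_lines), last_id)
            event_type, data_lines = "", []
            continue
        if line.startswith(":"):
            continue  # comment line, typically used as a keep-alive
        field, sep, value = line.partition(":")
        if sep and value.startswith(" "):
            value = value[1:]  # strip a single leading space
        if field == "event":
            event_type = value
        elif field == "data":
            data_lines.append(value)
        elif field == "id":
            last_id = value
        # Other fields (e.g. "retry") are ignored in this sketch.

# A hypothetical stream chunk; the JSON payload is made up for illustration.
chunk = ['event: metric\n', 'data: {"value": 21.5}\n', '\n',
         ': keep-alive\n', 'data: a\n', 'data: b\n', '\n']
for ev in parse_sse(chunk):
    print(ev.event, repr(ev.data))
```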

A number of restrictions apply:

- The number of active streams is limited:
  - For Users and personal clients, the maximum is determined by the personal usage limits of the User.
  - For team clients, the maximum is determined by the client usage limits imposed by the team usage plan.
- The producer of the data can choose whether the data is available for streaming⁵.

You can read more about Data Streams here.

Data Exports

While the Obelisk Query API supports paging through large amounts of data, it can be cumbersome to use when targeting millions of records (because of the request overhead, rate limits, etc.).
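A quick calculation shows how paging interacts with hourly query limits. The page size and the hourly limit below are hypothetical figures chosen for illustration; the real values depend on the Query API and on the limits shown under My Rate Limits.

```python
def paging_cost(total_records: int, page_size: int,
                hourly_query_limit: int) -> tuple[int, int]:
    """Return (requests needed, hours of query quota needed) to page through the data."""
    requests = -(-total_records // page_size)      # ceiling division
    hours = -(-requests // hourly_query_limit)     # ceiling division
    return requests, hours

# Hypothetical figures: 5 million records, 10,000 records per page,
# and an hourly limit of 200 raw event queries.
print(paging_cost(5_000_000, 10_000, 200))  # prints (500, 3)
```

Under these assumptions, reading the data would consume three hours' worth of query quota, with nothing left for other queries in that time.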

Data Exports allow users to request bulk downloads of large data collections. The data is collected on the Obelisk server, converted into CSV and then compressed. This feature can be very useful for data scientists who want to perform offline processing on a specific dataset.

You can read more about Data Exports here.

Rate limiting

Obelisk limits the number of requests Users and Clients can make, as well as the number of active streams and data exports they can have, to ensure the scalability and stability of the system.

Rate limiting is implemented using two concepts:

Usage Plans

A Usage Plan defines the usage boundaries of a Team in terms of the following properties:

Usage Plan Property	Description
maxUsers	Maximum number of members the Team can have.
userUsageLimit	The set of Usage Limits the plan grants to members of the Team⁶.
maxClients	Maximum number of clients that can be associated with the Team (see Team Clients).
clientUsageLimit	The set of Usage Limits the clients that are associated with the Team are subjected to.

When no specific plan is assigned to a Team, the system will fall back to a default Usage Plan.
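The footnote on userUsageLimit states that the limits applied to a User are the aggregated maximum over the User's personal Usage Limits and the limits granted through Team memberships. A per-property maximum is one reading of that rule, sketched below with two of the Usage Limit properties listed under Usage Limits; a Team without a specific plan would contribute the default plan's userUsageLimit. The `effective_limits` helper and all numbers are hypothetical.

```python
# Each Usage Limits set maps a property name to its value (values are made up).
LimitSet = dict[str, int]

def effective_limits(personal: LimitSet, team_user_limits: list[LimitSet]) -> LimitSet:
    """Per-property maximum over personal limits and all Team-granted user limits."""
    result = dict(personal)
    for limits in team_user_limits:
        for prop, value in limits.items():
            result[prop] = max(result.get(prop, 0), value)
    return result

personal = {"maxHourlyPrimitiveEventQueries": 1_000, "maxDataStreams": 2}
research_team = {"maxHourlyPrimitiveEventQueries": 5_000, "maxDataStreams": 1}
print(effective_limits(personal, [research_team]))
# prints {'maxHourlyPrimitiveEventQueries': 5000, 'maxDataStreams': 2}
```

Taking the maximum per property means joining a Team can only raise a User's limits, never lower them.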

Usage Limits

Individual Users and team clients are each restricted by a set of Usage Limits:

Usage Limit Property	Description
maxHourlyPrimitiveEventsStored	Number of primitive Metric Event instances that can be sent to the Ingest API each hour with the intention of storing the data (i.e. when not using the stream_only mode).
maxHourlyComplexEventsStored	Number of complex Metric Event instances that can be sent to the Ingest API each hour with the intention of storing the data (i.e. when not using the stream_only mode).
maxHourlyPrimitiveEventsStreamed	Number of primitive Metric Event instances that can be sent to the Ingest API each hour with the intention of streaming the data (i.e. when not using the store_only mode).
maxHourlyComplexEventsStreamed	Number of complex Metric Event instances that can be sent to the Ingest API each hour with the intention of streaming the data (i.e. when not using the store_only mode).
maxHourlyPrimitiveEventQueries	Maximum number of raw event queries per hour, targeting primitive Metric types.
maxHourlyComplexEventQueries	Maximum number of raw event queries per hour, targeting complex Metric types (or a combination of primitive and complex types).
maxHourlyPrimitiveStatsQueries	Maximum number of aggregate (stats) queries per hour, targeting primitive Metric types.
maxHourlyComplexStatsQueries	Maximum number of aggregate (stats) queries per hour, targeting numerical data derived from complex Metric types.
maxDataExports	Maximum number of exports a User can have available at the same time.
maxDataExportRecords	Maximum number of records each Data Export can contain.
maxDataStreams	Maximum number of concurrent active streams a User can have.

When no specific set of Usage Limits is assigned, the system will fall back to a default set.
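Reading the four ingest rows of the table together: an event counts towards the stored limit unless stream_only is used, and towards the streamed limit unless store_only is used, so an ingest using neither mode counts towards both. The helper below encodes that reading; the function name and the use of `None` for "neither mode" are assumptions for illustration.

```python
from typing import Optional

def ingest_limits_consumed(designation: str, mode: Optional[str] = None) -> list[str]:
    """Usage Limit properties an ingested event counts towards.

    designation: "primitive" or "complex" (see the metric types table).
    mode: None when neither mode is used, or "stream_only" / "store_only".
    """
    if designation not in ("primitive", "complex"):
        raise ValueError(f"unknown designation {designation!r}")
    if mode not in (None, "stream_only", "store_only"):
        raise ValueError(f"unknown mode {mode!r}")
    kind = designation.capitalize()  # "Primitive" or "Complex"
    consumed = []
    if mode != "stream_only":
        consumed.append(f"maxHourly{kind}EventsStored")
    if mode != "store_only":
        consumed.append(f"maxHourly{kind}EventsStreamed")
    return consumed

print(ingest_limits_consumed("complex", "store_only"))
# prints ['maxHourlyComplexEventsStored']
```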

How to extend my limits?

Obelisk supports different levels of usage plans and limits, so a more demanding project can be granted higher limits.

Granting a bigger usage plan is the equivalent of subscribing to a more expensive subscription on a commercial platform. Contact us using the Issue Tracker to discuss terms.

¹ Users familiar with older versions of Obelisk will know the term 'Scope'. 'Dataset' in Obelisk v3 replaces 'Scope', but there are some differences in how it functions, hence the change in name.
² Only available if the Dataset is configured to be listed publicly (this is the case by default).
³ See also source in the Metric Event format.
⁴ The creation of new Datasets or Teams is only available if the platform manager enabled this feature when installing the system.
⁵ This is part of the auto-regulating traffic policy of Obelisk. Data producers can only ingest a limited amount of data that is marked as available for streaming. Combined with a maximum number of active streams per User/Client, this gives us control over the total amount of data that is streamed.
⁶ The effective set of limits used to evaluate requests coming from the User or his/her clients is the aggregated maximum of all Usage Limits affecting the User (personal usage limits + the usage limits granted through Team memberships).