The join operator in Axiom Processing Language (APL) combines rows from two datasets based on matching values in specified columns. Use join to correlate data from different sources or datasets, such as linking logs to traces or enriching logs with additional metadata.

This operator is useful when you want to:

  • Combine information from two datasets with shared keys.
  • Analyze relationships between different types of events.
  • Enrich existing data with supplementary details.

The join operator is currently in private preview. To try it out, contact Axiom.

The preview of the join operator works with an upper limit of 5,000 events on the left side of the join and 50,000 on the right side of the join.

Kinds of join

The kinds of join and their typical use cases are the following:

  • inner (default): Returns rows where the join conditions exist in both datasets. All matching rows from the right dataset are included for each matching row in the left dataset. Useful to retain all matches without limiting duplicates.
  • innerunique: Matches rows from both datasets where the join conditions exist in both. For each row in the left dataset, only the first matching row from the right dataset is returned. Optimized for performance when duplicate matching rows on the right dataset are irrelevant.
  • leftouter: Returns all rows from the left dataset. If a match exists in the right dataset, the matching rows are included; otherwise, columns from the right dataset are null. Retains all data from the left dataset, enriching it with matching data from the right dataset.
  • rightouter: Returns all rows from the right dataset. If a match exists in the left dataset, the matching rows are included; otherwise, columns from the left dataset are null. Retains all data from the right dataset, enriching it with matching data from the left dataset.
  • fullouter: Returns all rows from both datasets. Matching rows are combined, while non-matching rows from either dataset are padded with null values. Combines both datasets while retaining unmatched rows from both sides.
  • leftanti: Returns rows from the left dataset that have no matches in the right dataset. Identifies rows in the left dataset that do not have corresponding entries in the right dataset.
  • rightanti: Returns rows from the right dataset that have no matches in the left dataset. Identifies rows in the right dataset that do not have corresponding entries in the left dataset.
  • leftsemi: Returns rows from the left dataset that have at least one match in the right dataset. Only columns from the left dataset are included. Filters rows in the left dataset based on existence in the right dataset.
  • rightsemi: Returns rows from the right dataset that have at least one match in the left dataset. Only columns from the right dataset are included. Filters rows in the right dataset based on existence in the left dataset.

The preview of the join operator currently only supports inner join. Support for other kinds of join is coming soon.

Summary of kinds of join

Kind of joinBehaviorMatches returned
innerAll matches between left and right datasetsMultiple matches allowed
inneruniqueFirst match for each row in the left datasetOnly unique matches
leftouterAll rows from the left, with matching rows from the right or nullLeft-dominant
rightouterAll rows from the right, with matching rows from the left or nullRight-dominant
fullouterAll rows from both datasets, with unmatched rows padded with nullComplete join
leftantiRows in the left dataset with no matches in the right datasetNo matches
rightantiRows in the right dataset with no matches in the left datasetNo matches
leftsemiRows in the left dataset with at least one match in the right datasetMatching rows (left dataset only)
rightsemiRows in the right dataset with at least one match in the left datasetMatching rows (right dataset only)

Choose the right kind of join

  • Use inner for standard joins where you need all matches.
  • Use leftouter or rightouter when you need to retain all rows from one dataset.
  • Use leftanti or rightanti to find rows that do not match.
  • Use fullouter for complete combinations of both datasets.
  • Use leftsemi or rightsemi to filter rows based on existence in another dataset.

For users of other query languages

If you come from other query languages, this section explains how to adjust your existing queries to achieve the same results in APL.

Usage

Syntax

LeftDataset
| join kind=KindOfJoin RightDataset on Conditions

Parameters

  • LeftDataset: The first dataset, also known as the outer dataset or the left side of the join. If you expect one of the datasets to contain consistently less data than the other, specify the smaller dataset as the left side of the join.
  • RightDataset: The second dataset, also known as the inner dataset or the right side of the join.
  • KindOfJoin: Optionally, the kind of join to perform.
  • Conditions: The conditions for matching rows. The conditions are equality expressions that determine how Axiom matches rows from the LeftDataset (left side of the equality expression) with rows from the RightDataset (right side of the equality expression)).
    • To join datasets on a field that has the same name in the two datasets, simply use the field name. For example, on id.
    • To join datasets on a field that has different names in the two datasets, define the two field names in an equality expression such as on id == trace_id.
    • The two sides of the equality expression must have the same data type.
    • You can use expressions in the join conditions. For example, to compare two fields of different data types, use on id_string == tostring(trace_id_int).
    • You can define multiple join conditions. To separate conditions, use commas (,). Don’t use and. For example, on id == trace_id, span == span_id.

Returns

The join operator returns a new table containing rows that match the specified join condition. The fields from the left and right datasets are included.

Use case examples

Join HTTP logs with trace data to correlate user activity with performance metrics.

Query

['otel-demo-logs']
| join ['http-logs']
| join kind=inner ['otel-demo-traces'] on span_id

Output

_timespan_idtrace_idduration
2024-12-01user123trace123500ms

This query links user activity in HTTP logs to trace data to investigate performance issues.

  • union: Combines rows from multiple datasets without requiring a matching condition.
  • where: Filters rows based on conditions, often used with join for more precise results.