join
This page explains how to use the join operator in APL.
The join
operator in Axiom Processing Language (APL) combines rows from two datasets based on matching values in specified columns. Use join
to correlate data from different sources or datasets, such as linking logs to traces or enriching logs with additional metadata.
This operator is useful when you want to:
- Combine information from two datasets with shared keys.
- Analyze relationships between different types of events.
- Enrich existing data with supplementary details.
The join
operator is currently in private preview. To try it out, contact Axiom.
The preview of the join
operator works with an upper limit of 5,000 events on the left side of the join and 50,000 on the right side of the join.
Kinds of join
The kinds of join and their typical use cases are the following:
inner
(default): Returns rows where the join conditions exist in both datasets. All matching rows from the right dataset are included for each matching row in the left dataset. Useful to retain all matches without limiting duplicates.innerunique
: Matches rows from both datasets where the join conditions exist in both. For each row in the left dataset, only the first matching row from the right dataset is returned. Optimized for performance when duplicate matching rows on the right dataset are irrelevant.leftouter
: Returns all rows from the left dataset. If a match exists in the right dataset, the matching rows are included; otherwise, columns from the right dataset arenull
. Retains all data from the left dataset, enriching it with matching data from the right dataset.rightouter
: Returns all rows from the right dataset. If a match exists in the left dataset, the matching rows are included; otherwise, columns from the left dataset arenull
. Retains all data from the right dataset, enriching it with matching data from the left dataset.fullouter
: Returns all rows from both datasets. Matching rows are combined, while non-matching rows from either dataset are padded withnull
values. Combines both datasets while retaining unmatched rows from both sides.leftanti
: Returns rows from the left dataset that have no matches in the right dataset. Identifies rows in the left dataset that do not have corresponding entries in the right dataset.rightanti
: Returns rows from the right dataset that have no matches in the left dataset. Identifies rows in the right dataset that do not have corresponding entries in the left dataset.leftsemi
: Returns rows from the left dataset that have at least one match in the right dataset. Only columns from the left dataset are included. Filters rows in the left dataset based on existence in the right dataset.rightsemi
: Returns rows from the right dataset that have at least one match in the left dataset. Only columns from the right dataset are included. Filters rows in the right dataset based on existence in the left dataset.
The preview of the join
operator currently only supports inner
join. Support for other kinds of join is coming soon.
Summary of kinds of join
Kind of join | Behavior | Matches returned |
---|---|---|
inner | All matches between left and right datasets | Multiple matches allowed |
innerunique | First match for each row in the left dataset | Only unique matches |
leftouter | All rows from the left, with matching rows from the right or null | Left-dominant |
rightouter | All rows from the right, with matching rows from the left or null | Right-dominant |
fullouter | All rows from both datasets, with unmatched rows padded with null | Complete join |
leftanti | Rows in the left dataset with no matches in the right dataset | No matches |
rightanti | Rows in the right dataset with no matches in the left dataset | No matches |
leftsemi | Rows in the left dataset with at least one match in the right dataset | Matching rows (left dataset only) |
rightsemi | Rows in the right dataset with at least one match in the left dataset | Matching rows (right dataset only) |
Choose the right kind of join
- Use
inner
for standard joins where you need all matches. - Use
leftouter
orrightouter
when you need to retain all rows from one dataset. - Use
leftanti
orrightanti
to find rows that do not match. - Use
fullouter
for complete combinations of both datasets. - Use
leftsemi
orrightsemi
to filter rows based on existence in another dataset.
For users of other query languages
If you come from other query languages, this section explains how to adjust your existing queries to achieve the same results in APL.
Usage
Syntax
Parameters
LeftDataset
: The first dataset, also known as the outer dataset or the left side of the join. If you expect one of the datasets to contain consistently less data than the other, specify the smaller dataset as the left side of the join.RightDataset
: The second dataset, also known as the inner dataset or the right side of the join.KindOfJoin
: Optionally, the kind of join to perform.Conditions
: The conditions for matching rows. The conditions are equality expressions that determine how Axiom matches rows from theLeftDataset
(left side of the equality expression) with rows from theRightDataset
(right side of the equality expression)).- To join datasets on a field that has the same name in the two datasets, simply use the field name. For example,
on id
. - To join datasets on a field that has different names in the two datasets, define the two field names in an equality expression such as
on id == trace_id
. - The two sides of the equality expression must have the same data type.
- You can use expressions in the join conditions. For example, to compare two fields of different data types, use
on id_string == tostring(trace_id_int)
. - You can define multiple join conditions. To separate conditions, use commas (
,
). Don’t useand
. For example,on id == trace_id, span == span_id
.
- To join datasets on a field that has the same name in the two datasets, simply use the field name. For example,
Returns
The join
operator returns a new table containing rows that match the specified join condition. The fields from the left and right datasets are included.
Use case examples
Join HTTP logs with trace data to correlate user activity with performance metrics.
Query
Output
_time | span_id | trace_id | duration |
---|---|---|---|
2024-12-01 | user123 | trace123 | 500ms |
This query links user activity in HTTP logs to trace data to investigate performance issues.