Using Match Recognize SQL to retrieve knowledge within data series

The match recognize clause can be used since Oracle Database 12c. This clause is useful for data transformation of sequence data i.e. web session data. Considering a simple data set with web sessions of only two Users where the time_id denotes a point in time on the time axis and User_id the respective User:

image

A web session is defined as a series of action (on the timeaxis) within a gap of less then 10 time units.

Match recognize allows to transform the above data into partitions that meet certain conditions e.g. actions within less the 10 units from the previous action. Using the following statement on above Table:

SELECT
time_id,
userid,
session_id
FROM (SELECT
TO_NUMBER(j.session_doc.time_id) as time_id,
j.session_doc.user_id as userid
FROM json_sessionization j)
MATCH_RECOGNIZE(
PARTITION BY userid ORDER BY time_id
MEASURES match_number() as session_id
ALL ROWS PER MATCH
AFTER MATCH SKIP PAST LAST ROW
PATTERN (b s*)
DEFINE
s as (time_id - PREV(time_id) <=10)
);

The statement will create a new column that specifies a session indicator:

image

The statement can be enhanced to provide even more detail about the session in terms of start and end time and duration:

SELECT
time_id,
userid,
session_id,
no_of_events,
start_time,
end_time,
session_duration
FROM (SELECT
TO_NUMBER(j.session_doc.time_id) as time_id,
j.session_doc.user_id as userid
FROM json_sessionization j)
MATCH_RECOGNIZE(
PARTITION BY userid ORDER BY time_id
MEASURES match_number() as session_id,
COUNT(*) as no_of_events,
FIRST(b.time_id) start_time,
LAST(s.time_id) end_time,
LAST(s.time_id) - FIRST(b.time_id) session_duration
ALL ROWS PER MATCH
PATTERN (b s+)
DEFINE
s as (time_id - PREV(time_id) <= 10)
);

This will return the following calculations on the session:

image

Conclusion

Thus, using match recognize the data can also be aggregated to calculate measures.

A complete tutorial to create the test data can be found here.

Advertisements