Ref url: http://blogs.oracle.com/optimizer/ for more information…
….
Star transformation was introduced in Oracle 8i to process star queries efficiently. These queries are commonly used in data warehouse applications that follow the Star Schema data model. The Star Schema is so called because the data model diagram resembles a star. The center of the star consists of one or more fact tables and the points of the star are the dimension tables.
The basic idea of this transformation is to steer clear of using a full table scan access method on large tables, referred to as fact tables in the Star Schema. In a typical star query, the fact table is joined to several much smaller dimension tables. The fact table typically contains one key (referred to as foreign key) for every dimension table as well as a number of measure columns such as sales amount. The corresponding key in the dimension table is referred to as the primary key. The join is based on a foreign key of the fact table with the corresponding primary key of the dimension table. The query also contains filter predicates on other columns of the dimension tables that typically are very restrictive. The combination of these filters help to dramatically reduce the data set processed from the fact table. The goal of star transformation is to access only this reduced set of data from the fact table.
Consider the following star query Q1. The query is to find the total sales amount in all cities in California for quarters Q1 and Q2 of year 1999 through the Internet.
Q1:
SELECT c.cust_city, t.calendar_quarter_desc, SUM(s.amount_sold) sales_amount
FROM sales s, times t, customers c, channels ch
WHERE s.time_id = t.time_id
AND s.cust_id = c.cust_id
AND s.channel_id = ch.channel_id
AND c.cust_state_province = ‘CA’
AND ch.channel_desc = ‘Internet’
AND t.calendar_quarter_desc IN (‘1999-01′,’1999-02’)
GROUP BY c.cust_city, t.calendar_quarter_desc;
Sales is the fact table while the other tables are considered as dimension tables. The Sales table contains one row for every sale of a product and thus it may contain billions of sales records. However only a few of them are sold to customers in California through the Internet for the specified quarters. The query is transformed into Q2.
Q2:
SELECT c.cust_city, t.calendar_quarter_desc, SUM(s.amount_sold) sales_amount
FROM sales s, times t, customers c
WHERE s.time_id = t.time_id
AND s.cust_id = c.cust_id
AND c.cust_state_province = ‘CA’
AND t.calendar_quarter_desc IN (‘1999-01′,’1999-02’)
AND s.time_id IN (SELECT time_id
FROM times
WHERE calendar_quarter_desc IN(‘1999-01′,’1999-02’))
AND s.cust_id IN (SELECT cust_id
FROM customers
WHERE cust_state_province=’CA’)
AND s.channel_id IN (SELECT channel_id
FROM channels
WHERE channel_desc = ‘Internet’)
GROUP BY c.cust_city, t.calendar_quarter_desc;
Star transformation is essentially about adding subquery predicates corresponding to the constraint dimensions. These subquery predicates are referred to as bitmap semi-join predicates. The transformation is performed when there are indexes on the fact join columns (s.timeid, s.custid…). By driving bitmap AND and OR operations (bitmaps can be from bitmap indexes or generated from regular B-Tree indexes) of the key values supplied by the subqueries, only the relevant rows from the fact table need to be retrieved. If the filters on the dimension tables filter out a lot of data, this can be much more efficient than a full table scan on the fact table. After the relevant rows have been retrieved from the fact table, they may need to be joined back to the dimension tables, using the original predicates. In some cases, the join back can be eliminated. We will discuss this situation later.
Table 1 shows the query plan for the transformed query. Note that the sales table has a bitmap access path instead of a full table scan. For each key value coming from the subqueries (lines 11, 16, 21), the bitmaps are retrieved from the fact table indexes (lines 12, 17, 22). Each bit in the bitmap corresponds to a row in fact table. The bit is set if the key value from the subquery is same as the value in the row of fact table. For example, the bitmap [1][0][1][0][0][0]…(all 0s for remaining rows) indicate that rows 1 and 3 of fact table has matching key value from subquery. Lets say the above bitmap is for a key value from customers table subquery.
The operations in lines 9, 14, 19 iterates over the keys from the subqueries and get the corresponding bitmaps. Lets say the customers subquery produces one more key value with the bitmap [0][1][0][0][0][0]…
The bitmaps for each subquery are merged (ORed) (lines 8, 13 and 18). In the above example, it will produce a single bitmap [1][1][1][0][0][0]… for customers subquery after merging the two bitmaps.
The merged bitmaps are ANDed (line 7). Lets say the bitmap from channels is [1][0][0][0][0][0]… If you AND this bitmap with the bitmap from customers subquery it will produce [1][0][0][0][0]…
The corresponding rowids of the final bitmap are generated (line 6). The fact table rows are retrieved using the rowids (line 5). In the above example, it will generate only 1 rowid corresponding to the first row and fetches only a single row instead of scanning the entire fact table.
The representation of bitmaps in the above example is for illustration purpose only. In oracle, they are represented and stored in a compressed form.
Table 1: The plan of the transformed query
Id | Operation | Name |
0 | SELECT STATEMENT | |
1 | HASH GROUP BY | |
2 | HASH JOIN | |
3 | HASH JOIN | |
4 | PARTITION RANGE SUBQUERY | |
5 | TABLE ACCESS BY LOCAL INDEX ROWID | SALES |
6 | BITMAP CONVERSION TO ROWIDS | |
7 | BITMAP AND | |
8 | BITMAP MERGE | |
9 | BITMAP KEY ITERATION | |
10 | BUFFER SORT | |
11 | TABLE ACCESS FULL | CHANNELS |
12 | BITMAP INDEX RANGE SCAN | SALES_CHANNEL_BIX |
13 | BITMAP MERGE | |
14 | BITMAP KEY ITERATION | |
15 | BUFFER SORT | |
16 | TABLE ACCESS FULL | TIMES |
17 | BITMAP INDEX RANGE SCAN | SALES_TIME_BIX |
18 | BITMAP MERGE | |
19 | BITMAP KEY ITERATION | |
20 | BUFFER SORT | |
21 | TABLE ACCESS FULL | CUSTOMERS |
22 | BITMAP INDEX RANGE SCAN | SALES_CUST_BIX |
23 | TABLE ACCESS FULL | CUSTOMERS |
24 | TABLE ACCESS FULL | TIMES |
Join back elimination
The subqueries and their bitmap tree only filter the fact table based on the dimension filters, so it may still be necessary to join to the dimension table. The join back of the dimension table is eliminated when all the predicates on dimension tables are part of the semijoin subquery predicate, the column(s) selected from the subquery are unique and the dimension columns are not in select list, group by etc. In the above example, the table channels is not joined back to the sales table since it is not referenced outside and channel_id is unique.
Temporary table transformation
If the join back is not eliminated, Oracle stores the results of the subquery in a temporary table to avoid re-scanning the dimension table (for bitmap key generation and join back). In addition to this, the results are materialized if the query is run in parallel, so that each slave can select the results from the temporary tables instead of executing the subquery again.
For example, if Oracle materializes the results of the subquery on customers into a temporary table, the transformed query Q3 will be as follows.
Q3:
SELECT t1.c1 cust_city, t.calendar_quarter_desc calendar_quarter_desc,
sum(s.amount_sold) sales_amount
FROM sales s, sh.times t, sys_temp_0fd9d6621_e7e24 t1
WHERE s.time_id=t.time_id
AND s.cust_id=t1.c0
AND (t.calendar_quarter_desc=’1999-q1′ OR t.calendar_quarter_desc=’1999-q2′)
AND s.cust_id IN (SELECT t1.c0 FROM sys_temp_0fd9d6621_e7e24 t1)
AND s.channel_id IN (SELECT ch.channel_id
FROM channels ch
WHERE ch.channel_desc=’internet’)
AND s.time_id IN (SELECT t.time_id
FROM times t
WHERE t.calendar_quarter_desc=’1999-q1′
OR t.calendar_quarter_desc=’1999-q2′)
GROUP BY t1.c1, t.calendar_quarter_desc
Note that customers is replaced by the temporary table sys_temp_0fd9d6621_e7e24 and references to columns cust_id and cust_city are replaced by the corresponding columns of the temporary table. The temporary table will be created with 2 columns – (c0 number, c1 varchar2(30)). These columns corresponds to cust_id and cust_city of customers table. The table will be populated using the following query Q4 at the beginning of the execution of the statement Q3.
Q4:
SELECT c.cust_id, c.cust_city FROM customers WHERE c.cust_state_province = ‘CA’
Table 2 shows the plan for the transformed query.
Table 2: Plan with temporary table transformation
0 | SELECT STATEMENT | |||
1 | TEMP TABLE TRANSFORMATION | |||
2 | LOAD AS SELECT | sys_temp_ 0fd9d6621_e7e24 | ||
3 | TABLE ACCESS FULL | CUSTOMERS | ||
4 | HASH GROUP BY | |||
5 | HASH JOIN | |||
6 | HASH JOIN | |||
7 | PARTITION RANGE SUBQUERY | |||
8 | TABLE ACCESS BY LOCAL INDEX ROWID | SALES | ||
9 | BITMAP CONVERSION TO ROWIDS | |||
10 | BITMAP AND | |||
11 | BITMAP MERGE | |||
12 | BITMAP KEY ITERATION | |||
13 | BUFFER SORT | |||
14 | TABLE ACCESS FULL | CHANNELS | ||
15 | BITMAP INDEX RANGE SCAN | SALESCHANNELBIX
|
||
16 | BITMAP MERGE | |||
17 | BITMAP KEY ITERATION | |||
18 | BUFFER SORT | |||
19 | TABLE ACCESS FULL | TIMES | ||
20 | BITMAP INDEX RANGE SCAN | SALESTIMEBIX
|
||
21 | BITMAP MERGE | |||
22 | BITMAP KEY ITERATION | |||
23 | BUFFER SORT | |||
24 | TABLE ACCESS FULL | sys_temp_0fd9d6621_e7e24 | ||
25 | BITMAP INDEX RANGE SCAN | SALESCUSTBIX | ||
26 | TABLE ACCESS FULL | sys_temp_0fd9d6621_e7e24 | ||
27 | TABLE ACCESS FULL | TIMES |
The lines 1,2 and 3 of the plan materialize the customers subquery into the temporary table. In line 24, it scans the temporary table (instead of the subquery) to build the bitmap from the fact table. Line 26 is for scanning the temporary table for joining back instead of scanning customers. Note that the filter on customers is not needed to be applied on the temporary table since the filter is already applied while materializing the temporary table.
Enabling the transformation
Star transformation is controlled by the star_transformation_enabled parameter. The parameter takes 3 values.
- TRUE – The Oracle optimizer performs transformation by identifying fact and constraint dimension tables automatically. This is done in a cost-based manner, i.e. the transformation is performed only if the cost of the transformed plan is lower than the non-transformed plan. Also the optimizer will attempt temporary table transformation automatically whenever materialization improves performance.
- FALSE – The transformation is not tried.
- TEMP_DISABLE – This value has similar behavior as TRUE except that temporary table transformation is not tried.
The default value of the parameter is FALSE. You have to change the parameter value and create indexes on the joining columns of the fact table to take advantage of this transformation.
Summary
Star transformation improves the performance of queries with a very big fact table joined to multiple dimension tables when the dimension tables have very selective predicates. The transformation avoids the full scan of the fact table. It fetches only relevant rows from the fact table that will eventually join to the constraint dimension rows. The transformation is performed based on cost – only when the cost of the transformed plan is lower than that of the non-transformed plan. If the dimension filters do not significantly reduce the amount of data to be retrieved from the fact table, then a full table scan is more efficient.
In this post we have tried to illustrate the basic ideas behind star transformation by showing simple example queries and plans. Oracle can do star transformation in more complex cases. For example, a query with multiple fact tables, snowflakes (dimension is a join of several normalized tables instead of denormalized single table), etc.
- Simple view merging, for simple select-project-join views.
- Outer-join view merging for outer-joined views.
- Complex view merging, for distinct and group by views.
select e.first_name, e.last_name, dept_locs_v.street_address, dept_locs_v.postal_code from employees e, (select d.department_id, d.department_name, l.street_address, l.postal_code from departments d, locations l where d.location_id = l.location_id) dept_locs_v where dept_locs_v.department_id = e.department_id and e.last_name = 'Smith';
----------------------------------------------------------------- | Id | Operation | Name | Cost (%CPU)| ----------------------------------------------------------------- | 0 | SELECT STATEMENT | | 7 (15)| |* 1 | HASH JOIN | | 7 (15)| | 2 | TABLE ACCESS BY INDEX ROWID| EMPLOYEES | 2 (0)| |* 3 | INDEX RANGE SCAN | EMP_NAME_IX | 1 (0)| | 4 | VIEW | | 5 (20)| |* 5 | HASH JOIN | | 5 (20)| | 6 | TABLE ACCESS FULL | LOCATIONS | 2 (0)| | 7 | TABLE ACCESS FULL | DEPARTMENTS | 2 (0)| ----------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - access("DEPT_LOCS_V"."DEPARTMENT_ID"="E"."DEPARTMENT_ID") 3 - access("E"."LAST_NAME"='Smith') 5 - access("D"."LOCATION_ID"="L"."LOCATION_ID")
select e.first_name, e.last_name, l.street_address, l.postal_code from employees e, departments d, locations l where d.location_id = l.location_id and d.department_id = e.department_id and e.last_name = 'Smith';
------------------------------------------------------------------- | Id | Operation | Name | Cost (%CPU)| ------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 4 (0)| | 1 | NESTED LOOPS | | | | 2 | NESTED LOOPS | | 4 (0)| | 3 | NESTED LOOPS | | 3 (0)| | 4 | TABLE ACCESS BY INDEX ROWID| EMPLOYEES | 2 (0)| |* 5 | INDEX RANGE SCAN | EMP_NAME_IX | 1 (0)| | 6 | TABLE ACCESS BY INDEX ROWID| DEPARTMENTS | 1 (0)| |* 7 | INDEX UNIQUE SCAN | DEPT_ID_PK | 0 (0)| |* 8 | INDEX UNIQUE SCAN | LOC_ID_PK | 0 (0)| | 9 | TABLE ACCESS BY INDEX ROWID | LOCATIONS | 1 (0)| ------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 5 - access("E"."LAST_NAME"='Smith') 7 - access("E"."DEPARTMENT_ID"="D"."DEPARTMENT_ID") 8 - access("D"."LOCATION_ID"="L"."LOCATION_ID")
- The view contains constructs other than select, project, join, including:
- Group by
- Distinct
- Outer-join
- Spreadsheet clause
- Connect by
- Set operators
- Aggregation
- The view appears on the right side of a semi- or anti-join.
- The view contains subqueries in the select list.
- The outer query block contains PL/SQL functions.
select e1.first_name||' '||e1.last_name emp_name, dept_managers_v.manager_name, dept_managers_v.department_name from employees e1, (select e2.manager_id, e2.first_name||' '||e2.last_name as manager_name, d.department_id, d.department_name from departments d, employees e2 where d.manager_id = e2.employee_id) dept_managers_v where dept_managers_v.department_id = e1.department_id(+) and dept_managers_v.manager_id = e1.manager_id(+);
select e1.first_name||' '||e1.last_name emp_name, dept_managers_v.manager_name, dept_managers_v.department_name from employees e1, (select e2.manager_id, e2.first_name||' '||e2.last_name as manager_name, d.department_id, d.department_name from departments d, employees e2 where d.manager_id = e2.employee_id) dept_managers_v where dept_managers_v.department_id = e1.department_id(+);
select e1.first_name||' '||e1.last_name emp_name, e2.first_name||' '||e2.last_name as manager_name, d.department_name from employees e1, employees e2, departments d where d.manager_id = e2.employee_id and d.department_id = e1.department_id(+);
create view cust_prod_totals_v as select sum(s.quantity_sold) total, s.cust_id, s.prod_id from sales s group by s.cust_id, s.prod_id; select c.cust_id, c.cust_first_name, c.cust_last_name, c.cust_email from customers c, products p, cust_prod_totals_v where c.country_id = 'US' and c.cust_id = cust_prod_totals_v.cust_id and cust_prod_totals_v.total > 100 and cust_prod_totals_v.prod_id = p.prod_id and p.prod_name = 'T3 Faux Fur-Trimmed Sweater';
select c.cust_id, cust_first_name, cust_last_name, cust_email from customers c, products p, sales s where c.country_id = 'US' and c.cust_id = s.cust_id and s.prod_id = p.prod_id and p.prod_name = 'T3 Faux Fur-Trimmed Sweater' group by s.cust_id, s.prod_id, p.rowid, c.rowid, c.cust_email, c.cust_last_name, c.cust_first_name, c.cust_id having sum(s.quantity_sold) > 100;
-------------------------------------------------------- | Id | Operation | Name | Cost (%CPU)| -------------------------------------------------------- | 0 | SELECT STATEMENT | | 2101 (18)| |* 1 | FILTER | | | | 2 | HASH GROUP BY | | 2101 (18)| |* 3 | HASH JOIN | | 2099 (18)| |* 4 | HASH JOIN | | 1801 (19)| |* 5 | TABLE ACCESS FULL| PRODUCTS | 96 (5)| | 6 | TABLE ACCESS FULL| SALES | 1620 (15)| |* 7 | TABLE ACCESS FULL | CUSTOMERS | 296 (11)| -------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter(SUM("QUANTITY_SOLD")>100) 3 - access("C"."CUST_ID"="CUST_ID") 4 - access("PROD_ID"="P"."PROD_ID") 5 - filter("P"."PROD_NAME"='T3 Faux Fur-Trimmed Sweater') 7 - filter("C"."COUNTRY_ID"='US')
There is no view in the plan above, which is what one would expect after the view has been merged. However, there are some cases where a view will still appear in the plan even after view merging, with a name like VW_NWVW_1. We’ll discuss the reasons why in a moment, but first let’s look at an example. This also gives us a chance to look at an example of distinct view merging. Consider this query to find customers in the US that bought a particular product:
select c.cust_id, c.cust_first_name, c.cust_last_name, c.cust_email from customers c, products p, (select distinct s.cust_id, s.prod_id from sales s) cust_prod_v where c.country_id = 'US' and c.cust_id = cust_prod_v.cust_id and cust_prod_v.prod_id = p.prod_id and p.prod_name = 'T3 Faux Fur-Trimmed Sweater';
The view can be merged, though it is based on cost, since the reduction in data due to distinct may make the join cheaper. In this case, however, it is cheaper to merge the view, so we get this equivalent query:
select nwvw.cust_id, nwvw.cust_first_name, nwvw.cust_last_name, nwvw.cust_email from (select distinct c.rowid, p.rowid, s.prod_id, s.cust_id, c.cust_id, c.cust_first_name, c.cust_last_name, c.cujst_email from customers c, products p, sales s where c.country_id = 'US' and c.cust_id = s.cust_id and s.prod_id = p.prod_id and p.prod_name = 'T3 Faux Fur-Trimmed Sweater') nwvw;
and this plan:
------------------------------------------- | Id | Operation | Name | ------------------------------------------- | 0 | SELECT STATEMENT | | | 1 | VIEW | VM_NWVW_1 | | 2 | HASH UNIQUE | | |* 3 | HASH JOIN | | |* 4 | HASH JOIN | | |* 5 | TABLE ACCESS FULL| PRODUCTS | | 6 | TABLE ACCESS FULL| SALES | |* 7 | TABLE ACCESS FULL | CUSTOMERS | ------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 3 - access("C"."CUST_ID"="S"."CUST_ID") 4 - access("S"."PROD_ID"="P"."PROD_ID") 5 - filter("P"."PROD_NAME"='T3 Faux Fur-Trimmed Sweater') 7 - filter("C"."COUNTRY_ID"='US')
So why do we still have a view after we’ve supposedly merged the view? The new view is what we call a « projection view ». When we merge the view, we move the distinct to the outer query block. But when we move the distinct, we have to add several additional columns, in order to maintain semantic equivalence with the original query. So we put all of that into a new view, so we can select out just the columns we want in the outer query block’s select list. But we still get all of the benefits we promised from merging the view — all of the tables are in one query block and the optimizer is free to permute them as it desires in the final join order, and the distinct operation has been delayed until after all of the joins are completed. These projection views appear in queries where a distinct view has been merged, or a group by view is merged into an outer query block which also contains group by, having, and/or aggregates. In the latter case, the projection view contains the group by, having, and aggregates from the original outer query block.
- The outer query tables do not have a rowid or unique column
- View appears in a connect by query block
- View contains grouping sets, rollup, pivot
- View or outer query block contains spreadsheet clause