Queries often require complex conditions (or "query filters") to return appropriate results. WHERE clauses (or conditions) reduce the amount of data to be processed in a SELECT statement (comprised of desired result objects or columns) by specifying that only those rows meeting the criteria in the WHERE clause are displayed. Depending upon which tool you use to query the Data Warehouse, your degree of control over the query language and operators will vary.
- BusinessObjects and Microsoft Access employ graphical interfaces that construct appropriate SQL behind the scenes based on the tables and joins the user indicates.
- Oracle SQL*Plus, on the other hand, requires the user to write his or her own SQL statments.
- The Oracle database system, which forms the foundation of the Data Warehouse, permits the use of powerful SQL operators, some of which may also be available as post-query operators in your desktop tool.
The following guidelines are for all users of the Data Warehouse, regardless of query tool, and provide good practices for efficient querying.
The main difficulty most people have with writing queries is knowing which table to use. If you are unsure about which table to use, refer to the table help sections "Common Uses" and "Cautions." These sections may help you decide part of the data collection documentation on the web. Use the search functionality on the main page of each collection to help find exactly what you need.
If possible, include an indexed data element in your condition statement. A query with a record selection condition using an indexed data element tells the system to go directly to the rows in the table that contain the value indicated and to stop retrieving data when the value is no longer found. If a query does not select records based on an indexed data element in its record selection condition, the system starts searching at the first row in the table and works through every row until it reaches the last row in the table. Indexed columns are noted in each collection's documentation.
Certain operators or query segments are processed by the system without the use of indexes, even if the column in the condition is indexed. It may, of course, be necessary for you to construct your query in this manner to retrieve correct results, but in considering alternatives in query construction you may wish to keep in mind the following situations where indexes may not be used:
- Negative comparisons such as Not Equal (represented by =! in SQL), Different From, or Not In. Avoid negative phrasing of condition statements as much as possible. In general, it is easier (both for the system and for you) to interpret a positive phrase than a negative phrase. For example, instead of the condition statement "If term is not greater than 1998A," rephrase the statement to "If term is less than or equal to 1998A." Or, if practical, eliminate the condition from the query and filter your results on the desktop.
- Nulls such as Is Null or Is Not Null.
- Like or Matches Pattern comparison with a date or number column. For example, to retrieve employee payments from March (of any year) use "FISCAL_MONTH_SEQ = '09'" rather than "CHECK_DATE Like 03/%".
- Wildcards at the beginning of a string. Avoid matching patterns beginning with a wildcard (Like %...). A wildcard at the end of a pattern is definitely appropriate and can be very efficient (e.g., Where Fund Like 5% will retrieve all Funds 500000 - 599999).
- Indexed columns modified by an expression or function (e.g., rather than concatenating all Chart of Account segments as COA_CNAC||COA_ORG||COA_BC||COA_FUND||COA_OBJECT||COA_PROGRAM||COA_CREF, select the column COA_ACCOUNT, which is indexed). Also, comparing an indexed column to another indexed column using Greater Than (>) or Less Than (<).
- Check the "and/or" qualifiers in the records selection criteria of the condition statement.
- For example, a query coded to get students with the following conditions statement will actually return every student in COL for 1998A and every student for 1998C regardless of the division:
- If division is equal to 'COL' and term is equal to '1998A' or term is equal to '1998C'
- The query coded to get students with the following conditions statement will return every student in COL for 1998A and 1998C:
- If division is equal to 'COL' and (term is equal to '1998A' or term is equal to '1998C')
For example, if you are authorized to access data only from a particular department, one of your record selection conditions should state "If Organization='My Organization'," where organization is the code for your department.
Check to make sure that your query is as precise as possible. This includes selecting the tables that will give the best results, reviewing selection conditions and sort criteria, and if it makes sense to do so, including at least one indexed data element in the conditions statement. For example, if you want to find all undergraduate freshmen and their names, choose the Person table rather than the ADDRESS table. This is because a student can have multiple addresses, and choosing the ADDRESS table would return a name for each address the student has listed.
For example, a grade change can affect a student's grade point average (GPA). A query executed before and after the grade change may or may not result in a changed GPA. In addition, keep in mind that there is a "data delay" between Warehouse collections and their respective source systems. Refresh schedules are noted in each collection's documentation. Note the date, time of the query when creating reports or communicating results to others so they can keep in mind the timing of the data when reviewing results.
Queries can take many minutes to execute; complex queries can take longer. It is not uncommon for a query to take 5 to 10 minutes to complete. In general, let the query run until it finishes. If the query takes longer than 1 hour to complete, contact Enterprise Information & Analytics.