SQL Basics – Working with ERP Data

24 01 2010

January 24, 2010

This blog article is based on a portion of a presentation that I gave at a regional ERP user’s group meeting.  While some of the information is specific to that particular ERP platform, the concepts should be general enough that the material may be applied to other environments.

Typically, a language called Structured Query Language (SQL) is used to directly communicate with the database.  As with all languages, there are syntax rules that must be followed.  In general, data is stored in a series of tables, which may be thought of as if they were worksheets in an Excel spreadsheet.  The various tables may be joined together to provide greater detail, but great care must be taken to correctly join the tables together.  The correct table joining conditions may be partially determined by examining the primary and foreign key relationships between the tables, and we will talk about that more later in the presentation.

Tips:

Relationships between tables containing related information may be determined by:

  • Primary (parent) and foreign (child) relationships defined in the database (see Data Dict Foreign Keys worksheet).
  • Primary key columns are often named ID, and the foreign key columns are often named table_ID, for example: ACCOUNT.ID = ACCOUNT_BALANCE.ACCOUNT_ID
  • Relationships may be discovered by searching for other tables in the database containing the same column names (see Data Dict Tables worksheet).

SQL Basics:

Indexes on table columns may allow a query to execute faster, but it is important that all of the beginning columns in the index are used (don’t forget the TYPE column when retrieving information from the WORK_ORDER table, or the WORKORDER_TYPE column when accessing the OPERATION table).  While indexes usually help when a small amount of information is needed from a table, other methods (full table scan) are sometimes more appropriate. 

Indexes usually cannot be used for those columns in the WHERE clause if the column appears inside a function name – index will not be used for   TRUNC(LABOR_TICKET.TRANSACTION_DATE) =  – unless a function based index is created for that function and column combination.

When multiple tables must be accessed, each column retrieved should be prefixed with the table name (or an aliased name for the table) containing the column.  Prefixing the columns improves the readabilty of the SQL statement and prevents errors that happen when two tables contain columns with the same names.

In a WHERE clause, character type data should appear in single quotes ( ‘ ), and number type data should not appear in single quotes.  Dates should not rely on implicit data type conversion –  don’t use ’24-JAN-2010′ as there is a chance that the implicit conversion will fail in certain environments.

Information retrieved from the database using a SQL statement may be grouped to summarize the data.

Executing SQL:

Assume that we are new to SQL and just start typing a SQL statement, hoping that the database will be able to help us make a correct request – since that kind of works in Microsoft Access.

SELECT DISTINCT
  *
FROM
  WORK_ORDER,
  OPERATION,
  REQUIREMENT;

When we execute this SQL statement, the database server spins and spins (not the formal meaning of a spin), until the SQL statement finally falls over and dies (to the uninitiated, this is not supposed to happen when a query executes).

Mon Jan 07 11:01:32 2009
ORA-1652: unable to extend temp segment by 128 in tablespace TEMPORARY_DATA
Mon Jan 07 11:12:31 2009
ORA-1652: unable to extend temp segment by 128 in tablespace TEMPORARY_DATA
Mon Jan 07 11:15:11 2009
ORA-1652: unable to extend temp segment by 128 in tablespace TEMPORARY_DATA
Mon Jan 07 11:25:28 2009
ORA-1652: unable to extend temp segment by 128 in tablespace TEMPORARY_DATA
Mon Jan 07 11:28:13 2009
ORA-1652: unable to extend temp segment by 128 in tablespace TEMPORARY_DATA

Depending on the database engine and the database administrator, it might be that the database is down for a long time, or that just the query tool that submitted the SQL statement crashes after forcing the CPUs on the server to spin excessively.  Be careful about who has access to a query tool that access the database.

Simple SQL – Retrieve the part ID, description, product code, and quantity on hand for all parts:

The following is a simple SQL statement which will retrieve four columns from the PART table for all parts, essentially in random order.  You may notice that my SQL statement is formatted in a very specific way – the reason for this formatting will become more clear later.  Essentially, standardized formats help improve database performance (by reducing the number of hard parses) – for ad hoc SQL statements (those created for one time use), the performance difference probably will not be noticed, but when placed into various applications that execute the SQL statements repeatedly, the performance difference will be very clear.

SELECT
  ID,
  DESCRIPTION,
  PRODUCT_CODE,
  QTY_ON_HAND
FROM
  PART;

Retrieve the part ID, description, product code, and quantity on hand for all parts with a commodity code of AAAA:

SELECT
  ID,
  DESCRIPTION,
  PRODUCT_CODE,
  QTY_ON_HAND
FROM
  PART
WHERE
  COMMODITY_CODE = 'AAAA';

Retrieve the part ID, description, product code, and quantity on hand for all parts with a commodity code of AAAA with more than 10 on hand:

SELECT
  ID,
  DESCRIPTION,
  PRODUCT_CODE,
  QTY_ON_HAND
FROM
  PART
WHERE
  COMMODITY_CODE = 'AAAA'
  AND QTY_ON_HAND > 10

Retrieve the part ID, description, product code, and quantity on hand for all parts with a commodity code beginning with  A  with 10 to 100 on hand:

SELECT
  ID,
  DESCRIPTION,
  PRODUCT_CODE,
  QTY_ON_HAND
FROM
  PART
WHERE
  COMMODITY_CODE LIKE 'A%'
  AND QTY_ON_HAND BETWEEN 10 AND 100;

Retrieve the part ID, description, product code, and quantity on hand sorted by product code, then part ID – Fixing the Random Order:

SELECT
  ID,
  DESCRIPTION,
  PRODUCT_CODE,
  QTY_ON_HAND
FROM
  PART
WHERE
  COMMODITY_CODE LIKE 'A%'
  AND QTY_ON_HAND BETWEEN 10 AND 100
ORDER BY
  PRODUCT_CODE,
  ID;

Retrieve the product code, and total quantity on hand by product code, sorted by product code:

SELECT
  PRODUCT_CODE,
  SUM(QTY_ON_HAND) AS TOTAL_QTY
FROM
  PART
WHERE
  COMMODITY_CODE LIKE 'F%'
GROUP BY
  PRODUCT_CODE
ORDER BY
  PRODUCT_CODE;

The above example changed the previous example quite a bit, so that only those parts with a commodity code beginning with F are returned – in the example, I want to determine the total number of parts on hand by product code (labeled TOTAL_QTY) for those parts with a commodity code beginning with F.  In addition to the ORDER BY clause, a GROUP BY clause was also needed.  The columns that must be listed in the group by clause are those columns in the SELECT clause which are not inside a SUM(), AVG(), MIN(), MAX(), or similar function.

Retrieve the product code, and total quantity on hand by product code, return only those with a total quantity on hand more than 100, sorted by product code:

SELECT
  PRODUCT_CODE,
  SUM(QTY_ON_HAND) AS TOTAL_QTY
FROM
  PART
WHERE
  COMMODITY_CODE LIKE 'F%'
GROUP BY
  PRODUCT_CODE
HAVING
  SUM(QTY_ON_HAND) > 100
ORDER BY
  PRODUCT_CODE;

Retrieve the top level part ID produced by all unreleased, firmed, and released work orders, include the work order, lot, part description, and quantity on hand:

Now that we know how to work with data stored in a single table, let’s take a look at an example with two tables.  Each column returned from the tables should be prefixed with the table name – primarily in case where the same column name appears in both tables, but doing this also makes it easier to troubleshoot problems with the SQL statement at a later time.  The following SQL statement retrieves a list of all parts produced by non-closed and non-canceled work orders that are in the system (status is unreleased, firmed, or released).

SELECT
  WORK_ORDER.BASE_ID,
  WORK_ORDER.LOT_ID,
  WORK_ORDER.SPLIT_ID,
  WORK_ORDER.PART_ID,
  PART.DESCRIPTION,
  PART.QTY_ON_HAND,
  WORK_ORDER.DESIRED_QTY,
  WORK_ORDER.RECEIVED_QTY
FROM
  WORK_ORDER,
  PART
WHERE
  WORK_ORDER.TYPE = 'W'
  AND WORK_ORDER.SUB_ID='0'
  AND WORK_ORDER.PART_ID=PART.ID
  AND WORK_ORDER.DESIRED_QTY > WORK_ORDER.RECEIVED_QTY
  AND WORK_ORDER.STATUS IN ('U', 'F', 'R')
ORDER BY
  WORK_ORDER.PART_ID,
  WORK_ORDER.BASE_ID,
  WORK_ORDER.LOT_ID,
  WORK_ORDER.SPLIT_ID;

The following SQL statement is essentially the same SQL statement as the last, just with table aliases (or short-names) which significantly reduce the amount of typing.

SELECT
  WO.BASE_ID,
  WO.LOT_ID,
  WO.SPLIT_ID,
  WO.PART_ID,
  P.DESCRIPTION,
  P.QTY_ON_HAND,
  WO.DESIRED_QTY,
  WO.RECEIVED_QTY
FROM
  WORK_ORDER WO,
  PART P
WHERE
  WO.TYPE = 'W'
  AND WO.SUB_ID='0'
  AND WO.PART_ID=P.ID
  AND WO.DESIRED_QTY > WO.RECEIVED_QTY
  AND WO.STATUS IN ('U', 'F', 'R')
ORDER BY
  WO.PART_ID,
  WO.BASE_ID,
  WO.LOT_ID,
  WO.SPLIT_ID;

Retrieve the engineering master information for a part

Back to the original example which brought down the database server (or at the least filled the temp tablespace to its maximum size), adding in two references to the PART table, each with a different alias name.  This SQL statement will retrieve the main header card, all operations, and all material requirements for a specific fabricated part.  But, there is a catch.  Operations without material requirements are excluded from the output.  Fixing that problem requires the use of an outer join, which on Oracle is indicated by a (+) following the column name that is permitted to be NULL, and on SQL Server the outer join is indicated by an * to the side of the equality that is NOT permitted to be NULL.  (Note that there are also ANSI style inner and outer joins, but these are not mentioned here).

SELECT
  WO.BASE_ID || DECODE(O.WORKORDER_SUB_ID, '0', '/', '-' || O.WORKORDER_SUB_ID  || '/') || WO.LOT_ID AS WORK_ORDER,
  WO.DESIRED_QTY - WO.RECEIVED_QTY AS REMAINING_QTY,
  WO.PART_ID AS WO_PART_ID,
  P.DESCRIPTION AS WO_PART_DESC,
  O.SEQUENCE_NO AS OP,
  O.RESOURCE_ID,
  SR.DESCRIPTION AS RESOURCE_DESC,
  O.SETUP_HRS,
  O.RUN_HRS,
  O.CALC_END_QTY,
  R.PIECE_NO,
  R.PART_ID AS REQ_PART_ID,
  P2.DESCRIPTION AS REQ_PART_DESC,
  R.CALC_QTY
FROM
  WORK_ORDER WO,
  PART P,
  OPERATION O,
  SHOP_RESOURCE SR,
  REQUIREMENT R,
  PART P2
WHERE
  WO.TYPE = 'M'
  AND P.ID = 'ABC123'
  AND P.ID = WO.BASE_ID
  AND P.ENGINEERING_MSTR = WO.LOT_ID
  AND WO.SPLIT_ID = '0'
  AND WO.SUB_ID = '0'
  AND WO.TYPE = O.WORKORDER_TYPE
  AND WO.BASE_ID = O.WORKORDER_BASE_ID
  AND WO.LOT_ID = O.WORKORDER_LOT_ID
  AND WO.SPLIT_ID = O.WORKORDER_SPLIT_ID
  AND O.RESOURCE_ID = SR.ID(+)
  AND O.WORKORDER_TYPE = R.WORKORDER_TYPE(+)
  AND O.WORKORDER_BASE_ID = R.WORKORDER_BASE_ID(+)
  AND O.WORKORDER_LOT_ID = R.WORKORDER_LOT_ID(+)
  AND O.WORKORDER_SPLIT_ID = R.WORKORDER_SPLIT_ID(+)
  AND O.WORKORDER_SUB_ID = R.WORKORDER_SUB_ID(+)
  AND O.SEQUENCE_NO = R.OPERATION_SEQ_NO(+)
  AND R.PART_ID = P2.ID(+)
ORDER BY
  O.WORKORDER_SUB_ID,
  O.SEQUENCE_NO,
  R.PART_ID;

Analyze the UNIT_MATERIAL_COST column in the PART table.  For each part, find the relative cost (high to low) ranking, average cost, smallest cost, highest cost, and the total number in each group, when the parts are grouped individually by product code, commodity code, and also preferred vendor (all parts without a preferred vendor are grouped together):

SELECT
  ID,
  DESCRIPTION,
  PRODUCT_CODE,
  COMMODITY_CODE,
  UNIT_MATERIAL_COST,
  ROW_NUMBER() OVER (PARTITION BY PRODUCT_CODE ORDER BY COMMODITY_CODE,ID) PART_WITHIN_PC,
  COUNT(1) OVER (PARTITION BY PRODUCT_CODE ORDER BY COMMODITY_CODE,ID) PART_WITHIN_PC2,
  RANK() OVER (PARTITION BY PRODUCT_CODE ORDER BY UNIT_MATERIAL_COST DESC NULLS LAST) RANK_PC_COST,
  AVG(UNIT_MATERIAL_COST) OVER (PARTITION BY PRODUCT_CODE) AVG_PC_COST,
  MIN(UNIT_MATERIAL_COST) OVER (PARTITION BY PRODUCT_CODE) MIN_PC_COST,
  MAX(UNIT_MATERIAL_COST) OVER (PARTITION BY PRODUCT_CODE) MAX_PC_COST,
  COUNT(UNIT_MATERIAL_COST) OVER (PARTITION BY PRODUCT_CODE) COUNT_PC,
  RANK() OVER (PARTITION BY COMMODITY_CODE ORDER BY UNIT_MATERIAL_COST DESC NULLS LAST) RANK_CC_COST,
  AVG(UNIT_MATERIAL_COST) OVER (PARTITION BY COMMODITY_CODE) AVG_CC_COST,
  MIN(UNIT_MATERIAL_COST) OVER (PARTITION BY COMMODITY_CODE) MIN_CC_COST,
  MAX(UNIT_MATERIAL_COST) OVER (PARTITION BY COMMODITY_CODE) MAX_CC_COST,
  COUNT(UNIT_MATERIAL_COST) OVER (PARTITION BY COMMODITY_CODE) COUNT_CC,
  RANK() OVER (PARTITION BY NVL(PREF_VENDOR_ID,'IN_HOUSE_FAB') ORDER BY UNIT_MATERIAL_COST
    DESC NULLS LAST) RANK_VENDOR_COST,
  AVG(UNIT_MATERIAL_COST) OVER (PARTITION BY NVL(PREF_VENDOR_ID,'IN_HOUSE_FAB')) AVG_VENDOR_COST,
  MIN(UNIT_MATERIAL_COST) OVER (PARTITION BY NVL(PREF_VENDOR_ID,'IN_HOUSE_FAB')) MIN_VENDOR_COST,
  MAX(UNIT_MATERIAL_COST) OVER (PARTITION BY NVL(PREF_VENDOR_ID,'IN_HOUSE_FAB')) MAX_VENDOR_COST,
  COUNT(UNIT_MATERIAL_COST) OVER (PARTITION BY PREF_VENDOR_ID) COUNT_VENDOR
FROM
  PART
ORDER BY
  ID;

On Oracle, there are also analytical functions which allow information to be grouped together without the need for a GROUP BY clause, and each column returned could potentially be grouped using different criteria.  There are several interesting analytical functions that make otherwise difficult comparisons both easy to accomplish and efficient to execute.  Many of the analytical functions allow data to be summarized by groups without losing the detail contained in each row of the data, for instance we are able to select the part_ID, description, and unit_material_cost without grouping on those columns.  PARTITION BY may be thought of as behaving like GROUP BY.  The inclusion of ORDER BY within the OVER clause means that only those rows encountered to that point, when sorted in the specified order, will be considered.

Show the sum of the hours worked for each employee by shift date, along with the previous five days and the next five days, and the next Monday after the shift date – looking at previous and next rows in the data, using inline view:

SELECT
  EMPLOYEE_ID,
  SHIFT_DATE,
  NEXT_DAY(SHIFT_DATE,'MONDAY') PAYROLL_PREPARE_DATE,
  LAG(HOURS_WORKED,5,0) OVER (PARTITION BY EMPLOYEE_ID ORDER BY SHIFT_DATE) PREV5_HOURS,
  LAG(HOURS_WORKED,4,0) OVER (PARTITION BY EMPLOYEE_ID ORDER BY SHIFT_DATE) PREV4_HOURS,
  LAG(HOURS_WORKED,3,0) OVER (PARTITION BY EMPLOYEE_ID ORDER BY SHIFT_DATE) PREV3_HOURS,
  LAG(HOURS_WORKED,2,0) OVER (PARTITION BY EMPLOYEE_ID ORDER BY SHIFT_DATE) PREV2_HOURS,
  LAG(HOURS_WORKED,1,0) OVER (PARTITION BY EMPLOYEE_ID ORDER BY SHIFT_DATE) PREV_HOURS,
  HOURS_WORKED,
  LEAD(HOURS_WORKED,1,0) OVER (PARTITION BY EMPLOYEE_ID ORDER BY SHIFT_DATE) NEXT_HOURS,
  LEAD(HOURS_WORKED,2,0) OVER (PARTITION BY EMPLOYEE_ID ORDER BY SHIFT_DATE) NEXT2_HOURS,
  LEAD(HOURS_WORKED,3,0) OVER (PARTITION BY EMPLOYEE_ID ORDER BY SHIFT_DATE) NEXT3_HOURS,
  LEAD(HOURS_WORKED,4,0) OVER (PARTITION BY EMPLOYEE_ID ORDER BY SHIFT_DATE) NEXT4_HOURS,
  LEAD(HOURS_WORKED,5,0) OVER (PARTITION BY EMPLOYEE_ID ORDER BY SHIFT_DATE) NEXT5_HOURS
FROM
  (SELECT
    EMPLOYEE_ID,
    SHIFT_DATE,
    SUM(HOURS_WORKED) HOURS_WORKED
  FROM
    LABOR_TICKET
  WHERE
    SHIFT_DATE>=TRUNC(SYSDATE-14)
  GROUP BY
    EMPLOYEE_ID,
    SHIFT_DATE
  ORDER BY
    EMPLOYEE_ID,
    SHIFT_DATE);

LAG and LEAD are interesting functions which permit looking at previous and next rows, when sorted in the specified order.

SQL coding is not hard to understand – as long as you build out from a simple SQL statement to the SQL statement that returns the desired output.