Eliminate Rows Having a Letter and Number Combination

3 01 2010

January 3, 2010

(Back to the Previous Post in the Series)

In a recent message thread on the comp.databases.oracle.misc group, the following question was asked:

I am currently using Oracle9i Enterprise Edition Release 9.2.0.4.0. I have a table with following data

Table 1 (Sample data)

a12345
A123423
g13452
G452323
h34423
r34323
b23232
n232323

I am currently using this as a subquery in one of the query. As per a new request I have to now exclude all values which start with h, b or n followed by numeric values. So end result the subquery should give me is

Table 1 (Sample data)

a12345
A123423
g13452
G452323
r34323

I am little stumped on this for now. Could not get it right in my query. Can anyone please advise here. Let me know if any more information is needed from my side.

Note: The starting character in all values can sometimes in “lower case” or sometimes in “upper case”.

Interesting problem, although it would have been helpful had the OP provided the DDL and DML to create the test case.  Let’s see if there is a hard way to solve this problem:

CREATE TABLE T10(HOMEWORK VARCHAR2(20));

INSERT INTO T10 VALUES ('a12345');
INSERT INTO T10 VALUES ('A123423');
INSERT INTO T10 VALUES ('g13452');
INSERT INTO T10 VALUES ('G452323');
INSERT INTO T10 VALUES ('h34423');
INSERT INTO T10 VALUES ('r34323');
INSERT INTO T10 VALUES ('b23232');
INSERT INTO T10 VALUES ('n232323');
INSERT INTO T10 VALUES ('NB151517');
INSERT INTO T10 VALUES ('C0151517');
INSERT INTO T10 VALUES ('f9151517');
INSERT INTO T10 VALUES ('HE4423');

COMMIT;

Note that I added a couple of extra rows just for fun (actually to help with testing).

Let’s look at the ASCII values of the first and second characters:

SELECT
  HOMEWORK,
  ASCII(SUBSTR(HOMEWORK,1,1)) ASC_VAL1,
  ASCII(SUBSTR(HOMEWORK,2,1)) ASC_VAL2
FROM
  T10;

HOMEWORK     ASC_VAL1   ASC_VAL2
---------- ---------- ----------
a12345             97         49
A123423            65         49
g13452            103         49
G452323            71         52
h34423            104         51
r34323            114         51
b23232             98         50
n232323           110         50
NB151517           78         66
C0151517           67         48
f9151517          102         57
HE4423             72         69

OK, I see the ones that we want to exclude, let’s build a matrix:

SELECT
  HOMEWORK,
  ASCII(SUBSTR(HOMEWORK,1,1)) ASC_VAL1,
  ASCII(SUBSTR(HOMEWORK,2,1)) ASC_VAL2,
  DECODE(ASCII(SUBSTR(HOMEWORK,1,1)),104,1,72,1,66,1,98,1,78,1,110,1,0) IS_EXC1,
  DECODE(SIGN(ASCII(SUBSTR(HOMEWORK,2,1))-47),1,DECODE(SIGN(ASCII(SUBSTR(HOMEWORK,2,1))-58),-1,1,0),0) IS_EXC2
FROM
  T10;

HOMEWORK     ASC_VAL1   ASC_VAL2    IS_EXC1    IS_EXC2
---------- ---------- ---------- ---------- ----------
a12345             97         49          0          1
A123423            65         49          0          1
g13452            103         49          0          1
G452323            71         52          0          1
h34423            104         51          1          1
r34323            114         51          0          1
b23232             98         50          1          1
n232323           110         50          1          1
NB151517           78         66          1          0
C0151517           67         48          0          1
f9151517          102         57          0          1
HE4423             72         69          1          0

If there is a 1 in both of the right-most columns, then the row should be eliminated.  What is the easiest way to tell if there is a 1 in both columns?  Multiply the column values together, and if we receive a product of 1 then the row should be excluded:

SELECT
  *
FROM
  (SELECT
    HOMEWORK,
    ASCII(SUBSTR(HOMEWORK,1,1)) ASC_VAL1,
    ASCII(SUBSTR(HOMEWORK,2,1)) ASC_VAL2,
    DECODE(ASCII(SUBSTR(HOMEWORK,1,1)), 104,1,72,1,66,1,98,1,78,1,110,1,0) IS_EXC1,
    DECODE(SIGN(ASCII(SUBSTR(HOMEWORK,2,1))-47),1,DECODE(SIGN(ASCII(SUBSTR(HOMEWORK,2,1))-58),-1,1,0),0) IS_EXC2
  FROM
    T10)
WHERE
  IS_EXC1*IS_EXC2<>1;

HOMEWORK     ASC_VAL1   ASC_VAL2    IS_EXC1    IS_EXC2
---------- ---------- ---------- ---------- ----------
a12345             97         49          0          1
A123423            65         49          0          1
g13452            103         49          0          1
G452323            71         52          0          1
r34323            114         51          0          1
NB151517           78         66          1          0
C0151517           67         48          0          1
f9151517          102         57          0          1
HE4423             72         69          1          0

An explanation of the IS_EXC2 column follows:

The numbers 0 through 9 have ASCII values ranging from 48 to 57.

  1. Obtain the second character in the column: SUBSTR(HOMEWORK,2,1) 
  2. Use the ASCII function to find the ASCII value of the second character 
  3. Subtract 47 from the ASCII value for the second character
  4. If the difference is greater than 0, then:
    ** Subtract 58 from that ASCII value
  5. If the difference is less than 0, then we found an ASCII value between 48 and 57 – therefore the second character must be a number
    ** Return the number 1 if the ASCII value is between 48 and 57, otherwise return 0

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Maxim Demenko offered the following as a solution to the problem which uses the RTRIM function: 

Just to mention another approach regarding your question:

SQL> with t as (
   2   select 'a12345' c from dual  union all
   3   select 'A123423' from dual  union all
   4   select 'g13452' from dual  union all
   5   select 'G452323' from dual  union all
   6   select 'h34423' from dual  union all
   7   select 'r34323' from dual  union all
   8   select 'b23232' from dual  union all
   9   select 'n' from dual union all
  10   select 'n232323' from dual
  11  )
  12  -- End test data
  13  select c
  14  from t
  15  where not lower(rtrim(c,'0123456789')) in ('h','b','n')
  16  /

C
-------
a12345
A123423
g13452
G452323
r34323

Maxim’s solution is quite impressive.  Here is an explanation of his solution:

SELECT
  *
FROM
  T10;

HOMEWORK
--------
a12345
A123423
g13452
G452323
h34423
r34323
b23232
n232323
NB151517
C0151517
f9151517
HE4423

The demo table has 12 rows.

The first part of his solution does this:

SELECT
  HOMEWORK,
  RTRIM(HOMEWORK,'0123456789') TEST
FROM
  T10;

HOMEWORK   TEST
---------- ----
a12345     a
A123423    A
g13452     g
G452323    G
h34423     h
r34323     r
b23232     b
n232323    n
NB151517   NB
C0151517   C
f9151517   f
HE4423     HE

Notice in the above that the TEST column shows that the RTRIM function eliminated everything to the right of the first digit, including that first digit.  Then, his solution simply determines if what is left (in the TEST column) is one of h, b, or n, and if it is, the row is eliminated.

The output of Maxim’s solution:

SELECT
  HOMEWORK
FROM
  T10
WHERE
  NOT LOWER(RTRIM(HOMEWORK,'0123456789')) IN ('h','b','n');

HOMEWORK
---------
a12345
A123423
g13452
G452323
r34323
NB151517
C0151517
f9151517
HE4423

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Now that we have seen the hard way to solve the problem and a very clever way to solve it, are there any other ways?

A CASE structure could be used rather than the cumbersome nested DECODE and SIGN statements.  A CASE structure will be easier to maintain:

SELECT
  CASE WHEN ASCII(SUBSTR(HOMEWORK,2,1)) >= 48
        AND ASCII(SUBSTR(HOMEWORK,2,1)) <= 57 THEN 1
    ELSE 0 END IS_EXC2
FROM
  T10;

You could transform this section to a CASE structure also:

DECODE(ASCII(SUBSTR(HOMEWORK,1,1)),104,1,72,1,66,1,98,1,78,1,110,1,0) IS_EXC1

 

SELECT
  CASE ASCII(SUBSTR(HOMEWORK,1,1))
    WHEN 104 THEN 1
    WHEN 72 THEN 1
    WHEN 66 THEN 1
    WHEN 98 THEN 1
    WHEN 78 THEN 1
    WHEN 110 THEN 1
    ELSE 0 END IS_EXC1
FROM
  T10;

Finally, you could combine the two CASE structures in the WHERE clause:

SELECT
  HOMEWORK,
  ASCII(SUBSTR(HOMEWORK,1,1)) ASC_VAL1,
  ASCII(SUBSTR(HOMEWORK,2,1)) ASC_VAL2
FROM
  T10
WHERE
  (CASE ASCII(SUBSTR(HOMEWORK,1,1))
    WHEN 104 THEN 1
    WHEN 72 THEN 1
    WHEN 66 THEN 1
    WHEN 98 THEN 1
    WHEN 78 THEN 1
    WHEN 110 THEN 1
    ELSE 0 END) *
  (CASE WHEN ASCII(SUBSTR(HOMEWORK,2,1)) >= 48
        AND ASCII(SUBSTR(HOMEWORK,2,1)) <= 57 THEN 1
    ELSE 0 END) = 0;

HOMEWORK     ASC_VAL1   ASC_VAL2
---------- ---------- ----------
a12345             97         49
A123423            65         49
g13452            103         49
G452323            71         52
r34323            114         51
NB151517           78         66
C0151517           67         48
f9151517          102         57
HE4423             72         69

Here are a couple more solutions:
The silly way with a MINUS operation:

SELECT
  HOMEWORK
FROM
  T10
MINUS
SELECT
  HOMEWORK
FROM
  T10
WHERE
  UPPER(SUBSTR(HOMEWORK,1,1)) IN ('H','B','N')
  AND SUBSTR(HOMEWORK,2,1) IN ('1','2','3','4','5','6','7','8','9','0');

HOMEWORK
--------
A123423
C0151517
G452323
HE4423
NB151517
a12345
f9151517
g13452
r34323

The neat solution with MINUS:

SELECT
  HOMEWORK
FROM
  T10
MINUS
SELECT
  HOMEWORK
FROM
  T10
WHERE
  UPPER(SUBSTR(HOMEWORK,1,1)) IN ('H','B','N')
  AND SUBSTR(HOMEWORK,2,1) IN (
    SELECT
      TO_CHAR(ROWNUM-1)
    FROM
      DUAL
    CONNECT BY
      LEVEL<=10);

HOMEWORK
--------
A123423
C0151517
G452323
HE4423
NB151517
a12345
f9151517
g13452
r34323

The NOT method:

SELECT
  HOMEWORK
FROM
  T10
WHERE
  NOT(UPPER(SUBSTR(HOMEWORK,1,1)) IN ('H','B','N')
    AND SUBSTR(HOMEWORK,2,1) IN
('1','2','3','4','5','6','7','8','9','0'));

HOMEWORK
--------
a12345
A123423
g13452
G452323
r34323
NB151517
C0151517
f9151517
HE4423

The neat solution with NOT:

SELECT
  HOMEWORK
FROM
  T10
WHERE
  NOT(UPPER(SUBSTR(HOMEWORK,1,1)) IN ('H','B','N')
    AND SUBSTR(HOMEWORK,2,1) IN (
      SELECT
        TO_CHAR(ROWNUM-1)
      FROM
        DUAL
      CONNECT BY
        LEVEL<=10));

HOMEWORK
--------
a12345
A123423
g13452
G452323
r34323
NB151517
C0151517
f9151517
HE4423

The left outer join method:

SELECT
  T10.HOMEWORK
FROM
  T10,
  (SELECT
    HOMEWORK
  FROM
    T10
  WHERE
    (UPPER(SUBSTR(HOMEWORK,1,1)) IN ('H','B','N'))
    AND (SUBSTR(HOMEWORK,2,1) IN (
      SELECT
        TO_CHAR(ROWNUM-1)
      FROM
        DUAL
      CONNECT BY
        LEVEL<=10))) NT10
WHERE
  T10.HOMEWORK=NT10.HOMEWORK(+)
  AND NT10.HOMEWORK IS NULL;

HOMEWORK
--------
A123423
C0151517
r34323
HE4423
g13452
f9151517
a12345
G452323
NB151517

The Cartesian join method:

SELECT
  HOMEWORK
FROM
  T10
WHERE
  UPPER(SUBSTR(HOMEWORK,1,2)) NOT IN
(SELECT
  L||N
FROM
  (SELECT
    DECODE(ROWNUM,1,'H',2,'B',3,'N') L
  FROM
    DUAL
  CONNECT BY
    LEVEL<=3),
  (SELECT
    TO_CHAR(ROWNUM-1) N
  FROM
    DUAL
  CONNECT BY
    LEVEL<=10));

HOMEWORK
--------
a12345
A123423
g13452
G452323
r34323
NB151517
C0151517
f9151517
HE4423

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Mark Powell offered the following method using the TRANSLATE function:

Here is a solution that uses a translate function.  My result vary because I could not remember the actual starting letters specified by the OP as I do not have access to Oracle and the forum at the same time.  I made my solution case sensitive and used “b,g, and h”.  I added two rows to ensure at least one row that started with one of the exclude letters when followed by digits whould appear in the output.

1 > select * from t10
  2  where homework not in (
  3    select homework
  4    from t10
  5    where ( substr(homework,1,1) in ('b','g','h')
  6    and instr(translate(homework,'012345678','999999999'),'9') > 0 ))
  7  /

HOMEWORK
--------------------
a12345
A123423
G452323
r34323
n232323
NB151517
C0151517
f9151517
HE4423
hxxxxxxx          -- added
gabcdefg          -- added

11 rows selected.

The above assumes that all the data is of the form Letter || digits and that no data with mixed letters and digits where the presence of letters should cause the data to not be excluded.  The following would handle data with those rules using something like h123x as a test case.

  5    where ( substr(homework,1,1) in (‘b’,’g’,’h’)
  6    and       replace(translate(substr(homework,2,length (homework)),
  7            ‘012345678’,’999999999′),’9′,”) is null

Using an upper or lower rtrim depending on case sensitivity desired as Maxum demostrated does seem a lot slicker of a solution.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If the OP were running Oracle 10g R1 or later the following would also work:
REGEXP_INSTR method:

SELECT
  HOMEWORK
FROM
  T10
WHERE
  REGEXP_INSTR(UPPER(HOMEWORK),'[HBN][0123456789]')<>1;

HOMEWORK
--------
a12345
A123423
g13452
G452323
r34323
NB151517
C0151517
f9151517
HE4423

Shortened version of the above:

SELECT
  HOMEWORK
FROM
  T10
WHERE
  REGEXP_INSTR(UPPER(HOMEWORK),'[HBN][0-9]')<>1

HOMEWORK
--------
a12345
A123423
g13452
G452323
r34323
NB151517
C0151517
f9151517
HE4423

REGEXP_REPLACE method:

SELECT
  HOMEWORK
FROM
  T10
WHERE
  REGEXP_REPLACE(SUBSTR(UPPER(HOMEWORK),1,2),'[HBN][0123456789]',NULL) IS NOT NULL;

HOMEWORK
--------
a12345
A123423
g13452
G452323
r34323
NB151517
C0151517
f9151517
HE4423