January 3, 2010
(Back to the Previous Post in the Series)
In a recent message thread on the comp.databases.oracle.misc group, the following question was asked:
I am currently using Oracle9i Enterprise Edition Release 9.2.0.4.0. I have a table with following data
Table 1 (Sample data)
a12345 A123423 g13452 G452323 h34423 r34323 b23232 n232323I am currently using this as a subquery in one of the query. As per a new request I have to now exclude all values which start with h, b or n followed by numeric values. So end result the subquery should give me is
Table 1 (Sample data)
a12345 A123423 g13452 G452323 r34323I am little stumped on this for now. Could not get it right in my query. Can anyone please advise here. Let me know if any more information is needed from my side.
Note: The starting character in all values can sometimes in “lower case” or sometimes in “upper case”.
Interesting problem, although it would have been helpful had the OP provided the DDL and DML to create the test case. Let’s see if there is a hard way to solve this problem:
CREATE TABLE T10(HOMEWORK VARCHAR2(20)); INSERT INTO T10 VALUES ('a12345'); INSERT INTO T10 VALUES ('A123423'); INSERT INTO T10 VALUES ('g13452'); INSERT INTO T10 VALUES ('G452323'); INSERT INTO T10 VALUES ('h34423'); INSERT INTO T10 VALUES ('r34323'); INSERT INTO T10 VALUES ('b23232'); INSERT INTO T10 VALUES ('n232323'); INSERT INTO T10 VALUES ('NB151517'); INSERT INTO T10 VALUES ('C0151517'); INSERT INTO T10 VALUES ('f9151517'); INSERT INTO T10 VALUES ('HE4423'); COMMIT;
Note that I added a couple of extra rows just for fun (actually to help with testing).
Let’s look at the ASCII values of the first and second characters:
SELECT HOMEWORK, ASCII(SUBSTR(HOMEWORK,1,1)) ASC_VAL1, ASCII(SUBSTR(HOMEWORK,2,1)) ASC_VAL2 FROM T10; HOMEWORK ASC_VAL1 ASC_VAL2 ---------- ---------- ---------- a12345 97 49 A123423 65 49 g13452 103 49 G452323 71 52 h34423 104 51 r34323 114 51 b23232 98 50 n232323 110 50 NB151517 78 66 C0151517 67 48 f9151517 102 57 HE4423 72 69
OK, I see the ones that we want to exclude, let’s build a matrix:
SELECT HOMEWORK, ASCII(SUBSTR(HOMEWORK,1,1)) ASC_VAL1, ASCII(SUBSTR(HOMEWORK,2,1)) ASC_VAL2, DECODE(ASCII(SUBSTR(HOMEWORK,1,1)),104,1,72,1,66,1,98,1,78,1,110,1,0) IS_EXC1, DECODE(SIGN(ASCII(SUBSTR(HOMEWORK,2,1))-47),1,DECODE(SIGN(ASCII(SUBSTR(HOMEWORK,2,1))-58),-1,1,0),0) IS_EXC2 FROM T10; HOMEWORK ASC_VAL1 ASC_VAL2 IS_EXC1 IS_EXC2 ---------- ---------- ---------- ---------- ---------- a12345 97 49 0 1 A123423 65 49 0 1 g13452 103 49 0 1 G452323 71 52 0 1 h34423 104 51 1 1 r34323 114 51 0 1 b23232 98 50 1 1 n232323 110 50 1 1 NB151517 78 66 1 0 C0151517 67 48 0 1 f9151517 102 57 0 1 HE4423 72 69 1 0
If there is a 1 in both of the right-most columns, then the row should be eliminated. What is the easiest way to tell if there is a 1 in both columns? Multiply the column values together, and if we receive a product of 1 then the row should be excluded:
SELECT * FROM (SELECT HOMEWORK, ASCII(SUBSTR(HOMEWORK,1,1)) ASC_VAL1, ASCII(SUBSTR(HOMEWORK,2,1)) ASC_VAL2, DECODE(ASCII(SUBSTR(HOMEWORK,1,1)), 104,1,72,1,66,1,98,1,78,1,110,1,0) IS_EXC1, DECODE(SIGN(ASCII(SUBSTR(HOMEWORK,2,1))-47),1,DECODE(SIGN(ASCII(SUBSTR(HOMEWORK,2,1))-58),-1,1,0),0) IS_EXC2 FROM T10) WHERE IS_EXC1*IS_EXC2<>1; HOMEWORK ASC_VAL1 ASC_VAL2 IS_EXC1 IS_EXC2 ---------- ---------- ---------- ---------- ---------- a12345 97 49 0 1 A123423 65 49 0 1 g13452 103 49 0 1 G452323 71 52 0 1 r34323 114 51 0 1 NB151517 78 66 1 0 C0151517 67 48 0 1 f9151517 102 57 0 1 HE4423 72 69 1 0
An explanation of the IS_EXC2 column follows:
The numbers 0 through 9 have ASCII values ranging from 48 to 57.
- Obtain the second character in the column: SUBSTR(HOMEWORK,2,1)
- Use the ASCII function to find the ASCII value of the second character
- Subtract 47 from the ASCII value for the second character
- If the difference is greater than 0, then:
** Subtract 58 from that ASCII value - If the difference is less than 0, then we found an ASCII value between 48 and 57 – therefore the second character must be a number
** Return the number 1 if the ASCII value is between 48 and 57, otherwise return 0
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Maxim Demenko offered the following as a solution to the problem which uses the RTRIM function:
Just to mention another approach regarding your question:
SQL> with t as ( 2 select 'a12345' c from dual union all 3 select 'A123423' from dual union all 4 select 'g13452' from dual union all 5 select 'G452323' from dual union all 6 select 'h34423' from dual union all 7 select 'r34323' from dual union all 8 select 'b23232' from dual union all 9 select 'n' from dual union all 10 select 'n232323' from dual 11 ) 12 -- End test data 13 select c 14 from t 15 where not lower(rtrim(c,'0123456789')) in ('h','b','n') 16 / C ------- a12345 A123423 g13452 G452323 r34323
Maxim’s solution is quite impressive. Here is an explanation of his solution:
SELECT * FROM T10; HOMEWORK -------- a12345 A123423 g13452 G452323 h34423 r34323 b23232 n232323 NB151517 C0151517 f9151517 HE4423
The demo table has 12 rows.
The first part of his solution does this:
SELECT HOMEWORK, RTRIM(HOMEWORK,'0123456789') TEST FROM T10; HOMEWORK TEST ---------- ---- a12345 a A123423 A g13452 g G452323 G h34423 h r34323 r b23232 b n232323 n NB151517 NB C0151517 C f9151517 f HE4423 HE
Notice in the above that the TEST column shows that the RTRIM function eliminated everything to the right of the first digit, including that first digit. Then, his solution simply determines if what is left (in the TEST column) is one of h, b, or n, and if it is, the row is eliminated.
The output of Maxim’s solution:
SELECT HOMEWORK FROM T10 WHERE NOT LOWER(RTRIM(HOMEWORK,'0123456789')) IN ('h','b','n'); HOMEWORK --------- a12345 A123423 g13452 G452323 r34323 NB151517 C0151517 f9151517 HE4423
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Now that we have seen the hard way to solve the problem and a very clever way to solve it, are there any other ways?
A CASE structure could be used rather than the cumbersome nested DECODE and SIGN statements. A CASE structure will be easier to maintain:
SELECT CASE WHEN ASCII(SUBSTR(HOMEWORK,2,1)) >= 48 AND ASCII(SUBSTR(HOMEWORK,2,1)) <= 57 THEN 1 ELSE 0 END IS_EXC2 FROM T10;
You could transform this section to a CASE structure also:
DECODE(ASCII(SUBSTR(HOMEWORK,1,1)),104,1,72,1,66,1,98,1,78,1,110,1,0) IS_EXC1
SELECT CASE ASCII(SUBSTR(HOMEWORK,1,1)) WHEN 104 THEN 1 WHEN 72 THEN 1 WHEN 66 THEN 1 WHEN 98 THEN 1 WHEN 78 THEN 1 WHEN 110 THEN 1 ELSE 0 END IS_EXC1 FROM T10;
Finally, you could combine the two CASE structures in the WHERE clause:
SELECT HOMEWORK, ASCII(SUBSTR(HOMEWORK,1,1)) ASC_VAL1, ASCII(SUBSTR(HOMEWORK,2,1)) ASC_VAL2 FROM T10 WHERE (CASE ASCII(SUBSTR(HOMEWORK,1,1)) WHEN 104 THEN 1 WHEN 72 THEN 1 WHEN 66 THEN 1 WHEN 98 THEN 1 WHEN 78 THEN 1 WHEN 110 THEN 1 ELSE 0 END) * (CASE WHEN ASCII(SUBSTR(HOMEWORK,2,1)) >= 48 AND ASCII(SUBSTR(HOMEWORK,2,1)) <= 57 THEN 1 ELSE 0 END) = 0; HOMEWORK ASC_VAL1 ASC_VAL2 ---------- ---------- ---------- a12345 97 49 A123423 65 49 g13452 103 49 G452323 71 52 r34323 114 51 NB151517 78 66 C0151517 67 48 f9151517 102 57 HE4423 72 69
Here are a couple more solutions:
The silly way with a MINUS operation:
SELECT HOMEWORK FROM T10 MINUS SELECT HOMEWORK FROM T10 WHERE UPPER(SUBSTR(HOMEWORK,1,1)) IN ('H','B','N') AND SUBSTR(HOMEWORK,2,1) IN ('1','2','3','4','5','6','7','8','9','0'); HOMEWORK -------- A123423 C0151517 G452323 HE4423 NB151517 a12345 f9151517 g13452 r34323
The neat solution with MINUS:
SELECT HOMEWORK FROM T10 MINUS SELECT HOMEWORK FROM T10 WHERE UPPER(SUBSTR(HOMEWORK,1,1)) IN ('H','B','N') AND SUBSTR(HOMEWORK,2,1) IN ( SELECT TO_CHAR(ROWNUM-1) FROM DUAL CONNECT BY LEVEL<=10); HOMEWORK -------- A123423 C0151517 G452323 HE4423 NB151517 a12345 f9151517 g13452 r34323
The NOT method:
SELECT HOMEWORK FROM T10 WHERE NOT(UPPER(SUBSTR(HOMEWORK,1,1)) IN ('H','B','N') AND SUBSTR(HOMEWORK,2,1) IN ('1','2','3','4','5','6','7','8','9','0')); HOMEWORK -------- a12345 A123423 g13452 G452323 r34323 NB151517 C0151517 f9151517 HE4423
The neat solution with NOT:
SELECT HOMEWORK FROM T10 WHERE NOT(UPPER(SUBSTR(HOMEWORK,1,1)) IN ('H','B','N') AND SUBSTR(HOMEWORK,2,1) IN ( SELECT TO_CHAR(ROWNUM-1) FROM DUAL CONNECT BY LEVEL<=10)); HOMEWORK -------- a12345 A123423 g13452 G452323 r34323 NB151517 C0151517 f9151517 HE4423
The left outer join method:
SELECT T10.HOMEWORK FROM T10, (SELECT HOMEWORK FROM T10 WHERE (UPPER(SUBSTR(HOMEWORK,1,1)) IN ('H','B','N')) AND (SUBSTR(HOMEWORK,2,1) IN ( SELECT TO_CHAR(ROWNUM-1) FROM DUAL CONNECT BY LEVEL<=10))) NT10 WHERE T10.HOMEWORK=NT10.HOMEWORK(+) AND NT10.HOMEWORK IS NULL; HOMEWORK -------- A123423 C0151517 r34323 HE4423 g13452 f9151517 a12345 G452323 NB151517
The Cartesian join method:
SELECT HOMEWORK FROM T10 WHERE UPPER(SUBSTR(HOMEWORK,1,2)) NOT IN (SELECT L||N FROM (SELECT DECODE(ROWNUM,1,'H',2,'B',3,'N') L FROM DUAL CONNECT BY LEVEL<=3), (SELECT TO_CHAR(ROWNUM-1) N FROM DUAL CONNECT BY LEVEL<=10)); HOMEWORK -------- a12345 A123423 g13452 G452323 r34323 NB151517 C0151517 f9151517 HE4423
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Mark Powell offered the following method using the TRANSLATE function:
Here is a solution that uses a translate function. My result vary because I could not remember the actual starting letters specified by the OP as I do not have access to Oracle and the forum at the same time. I made my solution case sensitive and used “b,g, and h”. I added two rows to ensure at least one row that started with one of the exclude letters when followed by digits whould appear in the output.
1 > select * from t10 2 where homework not in ( 3 select homework 4 from t10 5 where ( substr(homework,1,1) in ('b','g','h') 6 and instr(translate(homework,'012345678','999999999'),'9') > 0 )) 7 / HOMEWORK -------------------- a12345 A123423 G452323 r34323 n232323 NB151517 C0151517 f9151517 HE4423 hxxxxxxx -- added gabcdefg -- added 11 rows selected.
The above assumes that all the data is of the form Letter || digits and that no data with mixed letters and digits where the presence of letters should cause the data to not be excluded. The following would handle data with those rules using something like h123x as a test case.
5 where ( substr(homework,1,1) in (‘b’,’g’,’h’)
6 and replace(translate(substr(homework,2,length (homework)),
7 ‘012345678’,’999999999′),’9′,”) is null
Using an upper or lower rtrim depending on case sensitivity desired as Maxum demostrated does seem a lot slicker of a solution.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
If the OP were running Oracle 10g R1 or later the following would also work:
REGEXP_INSTR method:
SELECT HOMEWORK FROM T10 WHERE REGEXP_INSTR(UPPER(HOMEWORK),'[HBN][0123456789]')<>1; HOMEWORK -------- a12345 A123423 g13452 G452323 r34323 NB151517 C0151517 f9151517 HE4423
Shortened version of the above:
SELECT HOMEWORK FROM T10 WHERE REGEXP_INSTR(UPPER(HOMEWORK),'[HBN][0-9]')<>1 HOMEWORK -------- a12345 A123423 g13452 G452323 r34323 NB151517 C0151517 f9151517 HE4423
REGEXP_REPLACE method:
SELECT HOMEWORK FROM T10 WHERE REGEXP_REPLACE(SUBSTR(UPPER(HOMEWORK),1,2),'[HBN][0123456789]',NULL) IS NOT NULL; HOMEWORK -------- a12345 A123423 g13452 G452323 r34323 NB151517 C0151517 f9151517 HE4423
Recent Comments