SQL 문법
개요
NazareDB에서 사용할 수 있는 SQL 문법에 대해 설명합니다. NazareDB는 Apache DataFusion을 기반으로 하며, DataFusion이 제공하는 SQL 문법과 호환되는 기능을 제공합니다. 다만 현재 NazreDB는 읽기에 최적화된 데이터 베이스 입니다. 따라서 테이블을 생성하는 DDL 문법과 데이터를 삽입하는 DML 문법 등 은 지원하지 않습니다. 또한 일부 최신 DataFusion의 기능은 지원하지 않을 수 있습니다.
아래의 문법 들은 도구 메뉴의 JupyterHub 서비스 를 실행하여 확인하실 수 있습니다. 자세한 Apache DataFusion의 SQL 문법은 DataFusion SQL Syntax를 참조하세요.
Data Types
NazareDB는 다음과 같은 SQL 데이터 타입을 지원합니다. 해당 데이터 타입 들은 Arrow DataType과 호환됩니다.
-
문자 타입 (Character Types)
CHAR
VARCHAR
TEXT
STRING
-
숫자 타입 (Numeric Types)
TINYINT
SMALLINT
INT
orINTEGER
BIGINT
FLOAT
REAL
DOUBLE
DECIMAL(precision, scale)
-
날짜/시간 타입 (Date/Time Types)
DATE
TIME
TIMESTAMP
: Nanosecond precision 만 지원합니다.INTERVAL
-
불리언 타입 (Boolean Types)
BOOLEAN
-
이진 타입 (Binary Types)
BYTEA
DataTypes에 대한 자세한 설명은 DataFusion SQL DataTypes를 참조하세요.
SQL 문법
이 섹션에서는 NazareDB에서 사용할 수 있는 기본 SQL 문법에 대해 설명합니다.
Information Schema
아래의 명령어로 NazareDB의 테이블 리스트와 각 테이블의 컬럼 스키마를 확인할 수 있습니다.
show tables;
+---------------+--------------------+------------+------------+
| table_catalog | table_schema | table_name | table_type |
+---------------+--------------------+------------+------------+
| datafusion | public | t | BASE TABLE |
| datafusion | information_schema | tables | VIEW |
| datafusion | information_schema | views | VIEW |
| datafusion | information_schema | columns | VIEW |
+---------------+--------------------+------------+------------+
show columns from table;
+---------------+--------------+------------+-------------+-----------+-------------+
| table_catalog | table_schema | table_name | column_name | data_type | is_nullable |
+---------------+--------------+------------+-------------+-----------+-------------+
| datafusion | public | t | Int64(1) | Int64 | NO |
+---------------+--------------+------------+-------------+-----------+-------------+
Select
NazareDB는 다음과 같은 SELECT 문을 지원합니다.
- With
```sql
WITH x AS (SELECT a, MAX(b) AS b FROM t GROUP BY a)
SELECT a, b FROM x;
- Select
SELECT a, b, a + b FROM table
SELECT DISTINCT person, age FROM employees
- Where
SELECT a FROM table WHERE a > 10
- Group By
SELECT a, b, MAX(c) FROM table GROUP BY a, b
SELECT a, b, ARRAY_AGG(c, ORDER BY d) FROM table GROUP BY a, b
SELECT a, b, MAX(c) FROM table GROUP BY a, b HAVING MAX(c) > 10
- Union
SELECT
a,
b,
c
FROM table1
UNION ALL
SELECT
a,
b,
c
FROM table2
- Order By
SELECT age, person FROM table ORDER BY age;
SELECT age, person FROM table ORDER BY age DESC;
SELECT age, person FROM table ORDER BY age, person DESC;
- Limit
SELECT age, person FROM table LIMIT 10
- Join
-- Inner Join
select * from x inner join x y ON x.column_1 = y.column_1;
+----------+----------+----------+----------+
| column_1 | column_2 | column_1 | column_2 |
+----------+----------+----------+----------+
| 1 | 2 | 1 | 2 |
+----------+----------+----------+----------+
-- Left Outer Join
select * from x left join x y ON x.column_1 = y.column_2;
+----------+----------+----------+----------+
| column_1 | column_2 | column_1 | column_2 |
+----------+----------+----------+----------+
| 1 | 2 | | |
+----------+----------+----------+----------+
-- Right Outer Join
select * from x right join x y ON x.column_1 = y.column_2;
+----------+----------+----------+----------+
| column_1 | column_2 | column_1 | column_2 |
+----------+----------+----------+----------+
| | | 1 | 2 |
+----------+----------+----------+----------+
-- Full Outer Join
select * from x full outer join x y ON x.column_1 = y.column_2;
+----------+----------+----------+----------+
| column_1 | column_2 | column_1 | column_2 |
+----------+----------+----------+----------+
| 1 | 2 | | |
| | | 1 | 2 |
+----------+----------+----------+----------+
-- Natural Join
select * from x natural join x y;
+----------+----------+
| column_1 | column_2 |
+----------+----------+
| 1 | 2 |
+----------+----------+
-- Cross Join
select * from x cross join x y;
+----------+----------+----------+----------+
| column_1 | column_2 | column_1 | column_2 |
+----------+----------+----------+----------+
| 1 | 2 | 1 | 2 |
+----------+----------+----------+----------+
자세한 SELECT 문법은 DataFusion SQL SELECT를 참조하세요.
Subquery
-- Subquery in SELECT clause
SELECT
column_1,
(
SELECT
first_value(string)
FROM
y
WHERE
number = x.column_1
) AS "numeric string"
FROM
x;
+----------+----------------+
| column_1 | numeric string |
+----------+----------------+
| 1 | one |
| 2 | two |
+----------+----------------+
-- Subquery in FROM clause
SELECT
column_2
FROM
(
SELECT
*
FROM
x
WHERE
column_1 > 1
);
+----------+
| column_2 |
+----------+
| 4 |
+----------+
-- Subquery in WHERE clause
SELECT
*
FROM
x
WHERE
column_2 > (
SELECT
AVG(number)
FROM
y
);
+----------+----------+
| column_1 | column_2 |
+----------+----------+
| 2 | 4 |
+----------+----------+
-- Subquery in HAVING clause
SELECT
AVG(number) AS avg,
(number % 2 = 0) AS even
FROM
y
GROUP BY
even
HAVING
avg = (
SELECT
MAX(column_1)
FROM
x
);
+-------+--------+
| avg | even |
+-------+--------+
| 2 | false |
+-------+--------+
Operators
NazareDB는 다음과 같은 연산자를 지원합니다.
-- Numerical Operators
SELECT 1 + 2;
SELECT 4 - 3;
SELECT 2 * 3;
SELECT 8 / 4;
SELECT 7 % 3;
-- Comparison Operators
SELECT 1 = 1;
SELECT 1 != 2;
SELECT 3 < 4;
SELECT 3 <= 3;
SELECT 6 > 5;
SELECT 5 >= 5;
SELECT 0 IS DISTINCT FROM NULL;
SELECT NULL IS NOT DISTINCT FROM NULL;
-- Regex Match
SELECT 'datafusion' ~ '^datafusion(-cli)*';
+-------------------------------------------------+
| Utf8("datafusion") ~ Utf8("^datafusion(-cli)*") |
+-------------------------------------------------+
| true |
+-------------------------------------------------+
-- Regex Case-Insensitive Match
SELECT 'datafusion' ~* '^DATAFUSION(-cli)*';
+--------------------------------------------------+
| Utf8("datafusion") ~* Utf8("^DATAFUSION(-cli)*") |
+--------------------------------------------------+
| true |
+--------------------------------------------------+
-- Not Regex Match
SELECT 'datafusion' !~ '^DATAFUSION(-cli)*';
+--------------------------------------------------+
| Utf8("datafusion") !~ Utf8("^DATAFUSION(-cli)*") |
+--------------------------------------------------+
| true |
+--------------------------------------------------+
-- Not Regex Case-Insensitive Match
SELECT 'datafusion' !~* '^DATAFUSION(-cli)+';
+---------------------------------------------------+
| Utf8("datafusion") !~* Utf8("^DATAFUSION(-cli)+") |
+---------------------------------------------------+
| true |
+---------------------------------------------------+
-- Like Match
SELECT 'datafusion' ~~ 'dat_f%n';
+---------------------------------------+
| Utf8("datafusion") ~~ Utf8("dat_f%n") |
+---------------------------------------+
| true |
+---------------------------------------+
-- Case-Insensitive Like Match
SELECT 'datafusion' ~~* 'Dat_F%n';
+----------------------------------------+
| Utf8("datafusion") ~~* Utf8("Dat_F%n") |
+----------------------------------------+
| true |
+----------------------------------------+
-- Not Like Match
SELECT 'datafusion' !~~ 'Dat_F%n';
+----------------------------------------+
| Utf8("datafusion") !~~ Utf8("Dat_F%n") |
+----------------------------------------+
| true |
+----------------------------------------+
-- Not Case-Insensitive Like Match
SELECT 'datafusion' !~~* 'Dat%F_n';
+-----------------------------------------+
| Utf8("datafusion") !~~* Utf8("Dat%F_n") |
+-----------------------------------------+
| true |
+-----------------------------------------+
+-----------------------------------------+
| Utf8("datafusion") !~~* Utf8("Dat%F_n") |
+-----------------------------------------+
| true |
+-----------------------------------------+
-- Logical Operators
SELECT true AND true;
+---------------------------------+
| Boolean(true) AND Boolean(true) |
+---------------------------------+
| true |
+---------------------------------+
SELECT false OR true;
+---------------------------------+
| Boolean(false) OR Boolean(true) |
+---------------------------------+
| true |
+---------------------------------+
-- Bitwise Operators
SELECT 5 & 3;
+---------------------+
| Int64(5) & Int64(3) |
+---------------------+
| 1 |
+---------------------+
SELECT 5 | 3;
+---------------------+
| Int64(5) | Int64(3) |
+---------------------+
| 7 |
+---------------------+
SELECT 5 # 3;
+---------------------+
| Int64(5) # Int64(3) |
+---------------------+
| 6 |
+---------------------+
SELECT 5 >> 3;
+----------------------+
| Int64(5) >> Int64(3) |
+----------------------+
| 0 |
+----------------------+
SELECT 5 << 3;
+----------------------+
| Int64(5) << Int64(3) |
+----------------------+
| 40 |
+----------------------+
-- String Concatenation
SELECT 'Hello, ' || 'DataFusion!';
+----------------------------------------+
| Utf8("Hello, ") || Utf8("DataFusion!") |
+----------------------------------------+
| Hello, DataFusion! |
+----------------------------------------+
-- Array Contains
SELECT make_array(1,2,3) @> make_array(1,3);
+-------------------------------------------------------------------------+
| make_array(Int64(1),Int64(2),Int64(3)) @> make_array(Int64(1),Int64(3)) |
+-------------------------------------------------------------------------+
| true |
+-------------------------------------------------------------------------+
-- Array Is Contained By
SELECT make_array(1,3) <@ make_array(1,2,3);
+-------------------------------------------------------------------------+
| make_array(Int64(1),Int64(3)) <@ make_array(Int64(1),Int64(2),Int64(3)) |
+-------------------------------------------------------------------------+
| true |
+-------------------------------------------------------------------------+
자세한 연산자에 대한 설명은 DataFusion SQL Operators를 참조하세요.
Aggregate Functions
NazareDB는 다음과 같은 일반 집계 함수 들을 지원합니다.
- 일반 함수
- avg
- bit_and
- bit_or
- bit_xor
- bool_and
- bool_or
- count
- max
- mean
- median
- min
- sum
- array_agg
- first_value
- last_value
- 통계 함수
- corr
- covar
- covar_pop
- covar_samp
- stddev
- stddev_pop
- stddev_samp
- var
- var_pop
- var_samp
- regr_avgx
- regr_avgy
- regr_count
- regr_intercept
- regr_r2
- regr_slope
- regr_sxx
- regr_syy
- regr_sxy
- 근사 함수
- approx_distinct
- approx_median
- approx_percentile_cont
- approx_percentile_cont_with_weight
각 집계 함수에 대한 설명은 DataFusion SQL Aggregate Functions를 참조하세요.
Window Functions
아래는 window 함수를 사용한 예제입니다.
SELECT depname, empno, salary,
rank() OVER (PARTITION BY depname ORDER BY salary DESC)
FROM empsalary;
+-----------+-------+--------+--------+
| depname | empno | salary | rank |
+-----------+-------+--------+--------+
| personnel | 2 | 3900 | 1 |
| develop | 8 | 6000 | 1 |
| develop | 10 | 5200 | 2 |
| develop | 11 | 5200 | 2 |
| develop | 9 | 4500 | 4 |
| develop | 7 | 4200 | 5 |
| sales | 1 | 5000 | 1 |
| sales | 4 | 4800 | 2 |
| personnel | 5 | 3500 | 2 |
| sales | 3 | 4800 | 2 |
+-----------+-------+--------+--------+
Window 함수 사용시에 위에서 설명한 모든 Aggregate
함수를 사용할 수 있습니다.
추가적으로 아래의 Ranking 함수와 Analytic 함수를 지원합니다.
- Ranking Functions
- row_number
- rank
- dense_rank
- ntile(expression)
- Analytic Functions
- cume_dist
- percent_rank
- lag(expression, offset, default)
- lead(expression, offset, default)
- first_value(expression)
- last_value(expression)
- nth_value(expression, n)
자세한 Window 함수에 대한 설명은 DataFusion SQL Window Functions를 참조하세요.
Scalar Functions
NazareDB는 다음과 같은 스칼라 함수 들을 지원합니다.
- Math Functions
- abs(numeric_expression)
- acos(numeric_expression)
- acosh(numeric_expression)
- asin(numeric_expression)
- asinh(numeric_expression)
- atan(numeric_expression)
- atanh(numeric_expression)
- atan2(expression_y, expression_x)
- cbrt(numeric_expression)
- ceil(numeric_expression)
- cos(numeric_expression)
- cosh(numeric_expression)
- degrees(numeric_expression)
- exp(numeric_expression)
- factorial(numeric_expression)
- floor(numeric_expression)
- gcd(expression_x, expression_y)
- isnan(numeric_expression)
- iszero(numeric_expression)
- lcm(expression_x, expression_y)
- ln(numeric_expression)
- log(base, numeric_expression)
- log10(numeric_expression)
- log2(numeric_expression)
- nanvl(expression_x, expression_y)
- pi()
- power(base, exponent)
- radians(numeric_expression)
- random()
- round(numeric_expression[, decimal_places])
- signum(numeric_expression)
- sin(numeric_expression)
- sinh(numeric_expression)
- sqrt(numeric_expression)
- tan(numeric_expression)
- tanh(numeric_expression)
- trunc(numeric_expression[, decimal_places])
- Conditional Functions
- coalesce(expression1[, ..., expression_n])
- nullif(expression1, expression2)
- nvl(expression1, expression2)
- nvl2(expression1, expression2, expression3)
- String Functions
- ascii(str)
- bit_length(str)
- btrim(str[, trim_str]) or trim(str[, trim_str])
- length(str) or char_length(str)
- concat(str[, ..., str_n])
- concat_ws(separator, str[, ..., str_n])
- chr(expression)
- ends_with(str, substr)
- initcap(str)
- left(str, n)
- length(str)
- lower(str)
- lpad(str, n[, padding_str])
- ltrim(str[, trim_str])
- octet_length(str)
- repeat(str, n)
- replace(str, substr, replacement)
- reverse(str)
- right(str, n)
- rpad(str, n[, padding_str])
- rtrim(str[, trim_str])
- split_part(str, delimiter, pos)
- starts_with(str, substr)
- strpos(str, substr) or instr(str, substr)
- substr(str, start_pos[, length]) or substring(str, start_pos[, length])
- translate(str, chars, translation)
- to_hex(int)
- upper(str)
- uuid()
- overlay(str PLACING substr FROM pos [FOR count])
- levenshtein(str1, str2)
- substr_index(str, delim, count)
- find_in_set(str, strlist)
- Binary String Functions
- encode(expression, format)
- decode(expression, format)
- Regular Expression Functions
- regexp_like(str, regexp[, flags])
- regexp_match(str, regexp[, flags])
- regexp_replace(str, regexp, replacement[, flags])
- position(substr in origstr)
- Time and Date Functions
- now()
- current_date() or today()
- current_time()
- date_bin(interval, expression, origin-timestamp)
- date_trunc(precision, expression)
- date_part(part, expression)
- extract(field FROM source)
- make_date(year, month, day)
- to_char(expression, format)
- to_date(expression[, ..., format_n])
- to_timestamp(expression[, ..., format_n])
- to_timestamp_millis(expression[, ..., format_n])
- to_timestamp_micros(expression[, ..., format_n])
- to_timestamp_nanos(expression[, ..., format_n])
- to_timestamp_seconds(expression[, ..., format_n])
- from_unixtime(expression)
- to_unixtime(expression[, ..., format_n])
- Array Funcitons
- array_append(array, element)
- array_sort(array, desc, nulls_first)
- array_resize(array, size, value)
- array_concat(array[, ..., array_n])
- array_has(array, element)
- array_has_all(array, sub-array)
- array_has_any(array, sub-array)
- array_dims(array)
- array_distinct(array)
- array_element(array, index)
- array_fill(element, array)
- flatten(array)
- array_intersect(array1, array2)
- array_length(array, dimension)
- array_ndims(array, element)
- array_prepend(element, array)
- array_pop_front(array)
- array_pop_back(array)
- array_position(array, element[, index])
- array_positions(array, element)
- array_repeat(element, count)
- array_remove(array, element)
- array_remove_n(array, element, max)
- array_remove_all(array, element)
- array_replace(array, from, to)
- array_replace_n(array, from, to, max)
- array_replace_all(array, from, to)
- array_reverse(array)
- array_slice(array, begin, end)
- array_to_string(array, delimiter)
- array_union(array1, array2)
- array_except(array1, array2)
- cardinality(array)
- empty(array)
- generate_series(start, stop, step)
- make_array(expression1[, ..., expression_n])
- string_to_array(str, delimiter[, null_str])
- trim_array(array, n)
- unnest(array)
- range(startm end, step)
- Map Functions is not supported yet
- Hash Functions
- digest(expression, algorithm)
- md5(expression)
- sha224(expression)
- sha256(expression)
- sha384(expression)
- sha512(expression)
- Other Functions
- arrow_cast(expression, datatype)
- arrow_typeof(expression)
그 외 다양한 스칼라 함수에 대한 설명은 DataFusion SQL Scalar Functions를 참조하세요.