SQL 문법

개요

NazareDB에서 사용할 수 있는 SQL 문법에 대해 설명합니다. NazareDB는 Apache DataFusion을 기반으로 하며, DataFusion이 제공하는 SQL 문법과 호환되는 기능을 제공합니다. 다만 현재 NazreDB는 읽기에 최적화된 데이터 베이스 입니다. 따라서 테이블을 생성하는 DDL 문법과 데이터를 삽입하는 DML 문법 등 은 지원하지 않습니다. 또한 일부 최신 DataFusion의 기능은 지원하지 않을 수 있습니다.

아래의 문법 들은 도구 메뉴의 JupyterHub 서비스 를 실행하여 확인하실 수 있습니다. 자세한 Apache DataFusion의 SQL 문법은 DataFusion SQL Syntax를 참조하세요.

Data Types

NazareDB는 다음과 같은 SQL 데이터 타입을 지원합니다. 해당 데이터 타입 들은 Arrow DataType과 호환됩니다.

문자 타입 (Character Types)
- CHAR
- VARCHAR
- TEXT
- STRING
숫자 타입 (Numeric Types)
- TINYINT
- SMALLINT
- INT or INTEGER
- BIGINT
- FLOAT
- REAL
- DOUBLE
- DECIMAL(precision, scale)
날짜/시간 타입 (Date/Time Types)
- DATE
- TIME
- TIMESTAMP: Nanosecond precision 만 지원합니다.
- INTERVAL
불리언 타입 (Boolean Types)
- BOOLEAN
이진 타입 (Binary Types)
- BYTEA

DataTypes에 대한 자세한 설명은 DataFusion SQL DataTypes를 참조하세요.

SQL 문법

이 섹션에서는 NazareDB에서 사용할 수 있는 기본 SQL 문법에 대해 설명합니다.

Information Schema

아래의 명령어로 NazareDB의 테이블 리스트와 각 테이블의 컬럼 스키마를 확인할 수 있습니다.

show tables;
+---------------+--------------------+------------+------------+
| table_catalog | table_schema       | table_name | table_type |
+---------------+--------------------+------------+------------+
| datafusion    | public             | t          | BASE TABLE |
| datafusion    | information_schema | tables     | VIEW       |
| datafusion    | information_schema | views      | VIEW       |
| datafusion    | information_schema | columns    | VIEW       |
+---------------+--------------------+------------+------------+

show columns from table;
+---------------+--------------+------------+-------------+-----------+-------------+
| table_catalog | table_schema | table_name | column_name | data_type | is_nullable |
+---------------+--------------+------------+-------------+-----------+-------------+
| datafusion    | public       | t          | Int64(1)    | Int64     | NO          |
+---------------+--------------+------------+-------------+-----------+-------------+

Select

NazareDB는 다음과 같은 SELECT 문을 지원합니다.

- With
```sql
WITH x AS (SELECT a, MAX(b) AS b FROM t GROUP BY a)
SELECT a, b FROM x;

Select

SELECT a, b, a + b FROM table
SELECT DISTINCT person, age FROM employees

Where

SELECT a FROM table WHERE a > 10

Group By

SELECT a, b, MAX(c) FROM table GROUP BY a, b
SELECT a, b, ARRAY_AGG(c, ORDER BY d) FROM table GROUP BY a, b
SELECT a, b, MAX(c) FROM table GROUP BY a, b HAVING MAX(c) > 10

Union

SELECT
    a,
    b,
    c
FROM table1
UNION ALL
SELECT
    a,
    b,
    c
FROM table2

Order By

SELECT age, person FROM table ORDER BY age;
SELECT age, person FROM table ORDER BY age DESC;
SELECT age, person FROM table ORDER BY age, person DESC;

Limit

SELECT age, person FROM table LIMIT 10

Join

-- Inner Join
select * from x inner join x y ON x.column_1 = y.column_1;
+----------+----------+----------+----------+
| column_1 | column_2 | column_1 | column_2 |
+----------+----------+----------+----------+
| 1        | 2        | 1        | 2        |
+----------+----------+----------+----------+
-- Left Outer Join
select * from x left join x y ON x.column_1 = y.column_2;
+----------+----------+----------+----------+
| column_1 | column_2 | column_1 | column_2 |
+----------+----------+----------+----------+
| 1        | 2        |          |          |
+----------+----------+----------+----------+
-- Right Outer Join
select * from x right join x y ON x.column_1 = y.column_2;
+----------+----------+----------+----------+
| column_1 | column_2 | column_1 | column_2 |
+----------+----------+----------+----------+
|          |          | 1        | 2        |
+----------+----------+----------+----------+
-- Full Outer Join
select * from x full outer join x y ON x.column_1 = y.column_2;
+----------+----------+----------+----------+
| column_1 | column_2 | column_1 | column_2 |
+----------+----------+----------+----------+
| 1        | 2        |          |          |
|          |          | 1        | 2        |
+----------+----------+----------+----------+
-- Natural Join
select * from x natural join x y;
+----------+----------+
| column_1 | column_2 |
+----------+----------+
| 1        | 2        |
+----------+----------+
-- Cross Join
select * from x cross join x y;
+----------+----------+----------+----------+
| column_1 | column_2 | column_1 | column_2 |
+----------+----------+----------+----------+
| 1        | 2        | 1        | 2        |
+----------+----------+----------+----------+

자세한 SELECT 문법은 DataFusion SQL SELECT를 참조하세요.

Subquery

-- Subquery in SELECT clause
SELECT
  column_1,
  (
    SELECT
      first_value(string)
    FROM
      y
    WHERE
      number = x.column_1
  ) AS "numeric string"
FROM
  x;

+----------+----------------+
| column_1 | numeric string |
+----------+----------------+
|        1 | one            |
|        2 | two            |
+----------+----------------+
-- Subquery in FROM clause
SELECT
  column_2
FROM
  (
    SELECT
      *
    FROM
      x
    WHERE
      column_1 > 1
  );

+----------+
| column_2 |
+----------+
|        4 |
+----------+
-- Subquery in WHERE clause
SELECT
  *
FROM
  x
WHERE
  column_2 > (
    SELECT
      AVG(number)
    FROM
      y
  );

+----------+----------+
| column_1 | column_2 |
+----------+----------+
|        2 |        4 |
+----------+----------+

-- Subquery in HAVING clause
SELECT
  AVG(number) AS avg,
  (number % 2 = 0) AS even
FROM
  y
GROUP BY
  even
HAVING
  avg = (
    SELECT
      MAX(column_1)
    FROM
      x
  );

+-------+--------+
|   avg | even   |
+-------+--------+
|     2 | false  |
+-------+--------+

Operators

NazareDB는 다음과 같은 연산자를 지원합니다.

-- Numerical Operators
SELECT 1 + 2;
SELECT 4 - 3;
SELECT 2 * 3;
SELECT 8 / 4;
SELECT 7 % 3;

-- Comparison Operators
SELECT 1 = 1;
SELECT 1 != 2;
SELECT 3 < 4;
SELECT 3 <= 3;
SELECT 6 > 5;
SELECT 5 >= 5;
SELECT 0 IS DISTINCT FROM NULL;
SELECT NULL IS NOT DISTINCT FROM NULL;

-- Regex Match
SELECT 'datafusion' ~ '^datafusion(-cli)*';
+-------------------------------------------------+
| Utf8("datafusion") ~ Utf8("^datafusion(-cli)*") |
+-------------------------------------------------+
| true                                            |
+-------------------------------------------------+
-- Regex Case-Insensitive Match
SELECT 'datafusion' ~* '^DATAFUSION(-cli)*';
+--------------------------------------------------+
| Utf8("datafusion") ~* Utf8("^DATAFUSION(-cli)*") |
+--------------------------------------------------+
| true                                             |
+--------------------------------------------------+
-- Not Regex Match
SELECT 'datafusion' !~ '^DATAFUSION(-cli)*';
+--------------------------------------------------+
| Utf8("datafusion") !~ Utf8("^DATAFUSION(-cli)*") |
+--------------------------------------------------+
| true                                             |
+--------------------------------------------------+
-- Not Regex Case-Insensitive Match
SELECT 'datafusion' !~* '^DATAFUSION(-cli)+';
+---------------------------------------------------+
| Utf8("datafusion") !~* Utf8("^DATAFUSION(-cli)+") |
+---------------------------------------------------+
| true                                              |
+---------------------------------------------------+
-- Like Match
SELECT 'datafusion' ~~ 'dat_f%n';
+---------------------------------------+
| Utf8("datafusion") ~~ Utf8("dat_f%n") |
+---------------------------------------+
| true                                  |
+---------------------------------------+
-- Case-Insensitive Like Match
SELECT 'datafusion' ~~* 'Dat_F%n';
+----------------------------------------+
| Utf8("datafusion") ~~* Utf8("Dat_F%n") |
+----------------------------------------+
| true                                   |
+----------------------------------------+
-- Not Like Match
SELECT 'datafusion' !~~ 'Dat_F%n';
+----------------------------------------+
| Utf8("datafusion") !~~ Utf8("Dat_F%n") |
+----------------------------------------+
| true                                   |
+----------------------------------------+
-- Not Case-Insensitive Like Match
SELECT 'datafusion' !~~* 'Dat%F_n';
+-----------------------------------------+
| Utf8("datafusion") !~~* Utf8("Dat%F_n") |
+-----------------------------------------+
| true                                    |
+-----------------------------------------+
+-----------------------------------------+
| Utf8("datafusion") !~~* Utf8("Dat%F_n") |
+-----------------------------------------+
| true                                    |
+-----------------------------------------+

-- Logical Operators
SELECT true AND true;
+---------------------------------+
| Boolean(true) AND Boolean(true) |
+---------------------------------+
| true                            |
+---------------------------------+

SELECT false OR true;
+---------------------------------+
| Boolean(false) OR Boolean(true) |
+---------------------------------+
| true                            |
+---------------------------------+

-- Bitwise Operators
SELECT 5 & 3;
+---------------------+
| Int64(5) & Int64(3) |
+---------------------+
| 1                   |
+---------------------+

SELECT 5 | 3;
+---------------------+
| Int64(5) | Int64(3) |
+---------------------+
| 7                   |
+---------------------+

SELECT 5 # 3;
+---------------------+
| Int64(5) # Int64(3) |
+---------------------+
| 6                   |
+---------------------+

SELECT 5 >> 3;
+----------------------+
| Int64(5) >> Int64(3) |
+----------------------+
| 0                    |
+----------------------+

SELECT 5 << 3;
+----------------------+
| Int64(5) << Int64(3) |
+----------------------+
| 40                   |
+----------------------+

-- String Concatenation
SELECT 'Hello, ' || 'DataFusion!';
+----------------------------------------+
| Utf8("Hello, ") || Utf8("DataFusion!") |
+----------------------------------------+
| Hello, DataFusion!                     |
+----------------------------------------+

-- Array Contains
SELECT make_array(1,2,3) @> make_array(1,3);
+-------------------------------------------------------------------------+
| make_array(Int64(1),Int64(2),Int64(3)) @> make_array(Int64(1),Int64(3)) |
+-------------------------------------------------------------------------+
| true                                                                    |
+-------------------------------------------------------------------------+

-- Array Is Contained By
SELECT make_array(1,3) <@ make_array(1,2,3);
+-------------------------------------------------------------------------+
| make_array(Int64(1),Int64(3)) <@ make_array(Int64(1),Int64(2),Int64(3)) |
+-------------------------------------------------------------------------+
| true                                                                    |
+-------------------------------------------------------------------------+

자세한 연산자에 대한 설명은 DataFusion SQL Operators를 참조하세요.

Aggregate Functions

NazareDB는 다음과 같은 일반 집계 함수 들을 지원합니다.

일반 함수
- avg
- bit_and
- bit_or
- bit_xor
- bool_and
- bool_or
- count
- max
- mean
- median
- min
- sum
- array_agg
- first_value
- last_value
통계 함수
- corr
- covar
- covar_pop
- covar_samp
- stddev
- stddev_pop
- stddev_samp
- var
- var_pop
- var_samp
- regr_avgx
- regr_avgy
- regr_count
- regr_intercept
- regr_r2
- regr_slope
- regr_sxx
- regr_syy
- regr_sxy
근사 함수
- approx_distinct
- approx_median
- approx_percentile_cont
- approx_percentile_cont_with_weight

각 집계 함수에 대한 설명은 DataFusion SQL Aggregate Functions를 참조하세요.

Window Functions

아래는 window 함수를 사용한 예제입니다.

SELECT depname, empno, salary,
       rank() OVER (PARTITION BY depname ORDER BY salary DESC)
FROM empsalary;

+-----------+-------+--------+--------+
| depname   | empno | salary | rank   |
+-----------+-------+--------+--------+
| personnel | 2     | 3900   | 1      |
| develop   | 8     | 6000   | 1      |
| develop   | 10    | 5200   | 2      |
| develop   | 11    | 5200   | 2      |
| develop   | 9     | 4500   | 4      |
| develop   | 7     | 4200   | 5      |
| sales     | 1     | 5000   | 1      |
| sales     | 4     | 4800   | 2      |
| personnel | 5     | 3500   | 2      |
| sales     | 3     | 4800   | 2      |
+-----------+-------+--------+--------+

Window 함수 사용시에 위에서 설명한 모든 Aggregate 함수를 사용할 수 있습니다.

추가적으로 아래의 Ranking 함수와 Analytic 함수를 지원합니다.

Ranking Functions
- row_number
- rank
- dense_rank
- ntile(expression)
Analytic Functions
- cume_dist
- percent_rank
- lag(expression, offset, default)
- lead(expression, offset, default)
- first_value(expression)
- last_value(expression)
- nth_value(expression, n)

자세한 Window 함수에 대한 설명은 DataFusion SQL Window Functions를 참조하세요.

Scalar Functions

NazareDB는 다음과 같은 스칼라 함수 들을 지원합니다.

Math Functions
- abs(numeric_expression)
- acos(numeric_expression)
- acosh(numeric_expression)
- asin(numeric_expression)
- asinh(numeric_expression)
- atan(numeric_expression)
- atanh(numeric_expression)
- atan2(expression_y, expression_x)
- cbrt(numeric_expression)
- ceil(numeric_expression)
- cos(numeric_expression)
- cosh(numeric_expression)
- degrees(numeric_expression)
- exp(numeric_expression)
- factorial(numeric_expression)
- floor(numeric_expression)
- gcd(expression_x, expression_y)
- isnan(numeric_expression)
- iszero(numeric_expression)
- lcm(expression_x, expression_y)
- ln(numeric_expression)
- log(base, numeric_expression)
- log10(numeric_expression)
- log2(numeric_expression)
- nanvl(expression_x, expression_y)
- pi()
- power(base, exponent)
- radians(numeric_expression)
- random()
- round(numeric_expression[, decimal_places])
- signum(numeric_expression)
- sin(numeric_expression)
- sinh(numeric_expression)
- sqrt(numeric_expression)
- tan(numeric_expression)
- tanh(numeric_expression)
- trunc(numeric_expression[, decimal_places])
Conditional Functions
- coalesce(expression1[, ..., expression_n])
- nullif(expression1, expression2)
- nvl(expression1, expression2)
- nvl2(expression1, expression2, expression3)
String Functions
- ascii(str)
- bit_length(str)
- btrim(str[, trim_str]) or trim(str[, trim_str])
- length(str) or char_length(str)
- concat(str[, ..., str_n])
- concat_ws(separator, str[, ..., str_n])
- chr(expression)
- ends_with(str, substr)
- initcap(str)
- left(str, n)
- length(str)
- lower(str)
- lpad(str, n[, padding_str])
- ltrim(str[, trim_str])
- octet_length(str)
- repeat(str, n)
- replace(str, substr, replacement)
- reverse(str)
- right(str, n)
- rpad(str, n[, padding_str])
- rtrim(str[, trim_str])
- split_part(str, delimiter, pos)
- starts_with(str, substr)
- strpos(str, substr) or instr(str, substr)
- substr(str, start_pos[, length]) or substring(str, start_pos[, length])
- translate(str, chars, translation)
- to_hex(int)
- upper(str)
- uuid()
- overlay(str PLACING substr FROM pos [FOR count])
- levenshtein(str1, str2)
- substr_index(str, delim, count)
- find_in_set(str, strlist)
Binary String Functions
- encode(expression, format)
- decode(expression, format)
Regular Expression Functions
- regexp_like(str, regexp[, flags])
- regexp_match(str, regexp[, flags])
- regexp_replace(str, regexp, replacement[, flags])
- position(substr in origstr)
Time and Date Functions
- now()
- current_date() or today()
- current_time()
- date_bin(interval, expression, origin-timestamp)
- date_trunc(precision, expression)
- date_part(part, expression)
- extract(field FROM source)
- make_date(year, month, day)
- to_char(expression, format)
- to_date(expression[, ..., format_n])
- to_timestamp(expression[, ..., format_n])
- to_timestamp_millis(expression[, ..., format_n])
- to_timestamp_micros(expression[, ..., format_n])
- to_timestamp_nanos(expression[, ..., format_n])
- to_timestamp_seconds(expression[, ..., format_n])
- from_unixtime(expression)
- to_unixtime(expression[, ..., format_n])
Array Funcitons
- array_append(array, element)
- array_sort(array, desc, nulls_first)
- array_resize(array, size, value)
- array_concat(array[, ..., array_n])
- array_has(array, element)
- array_has_all(array, sub-array)
- array_has_any(array, sub-array)
- array_dims(array)
- array_distinct(array)
- array_element(array, index)
- array_fill(element, array)
- flatten(array)
- array_intersect(array1, array2)
- array_length(array, dimension)
- array_ndims(array, element)
- array_prepend(element, array)
- array_pop_front(array)
- array_pop_back(array)
- array_position(array, element[, index])
- array_positions(array, element)
- array_repeat(element, count)
- array_remove(array, element)
- array_remove_n(array, element, max)
- array_remove_all(array, element)
- array_replace(array, from, to)
- array_replace_n(array, from, to, max)
- array_replace_all(array, from, to)
- array_reverse(array)
- array_slice(array, begin, end)
- array_to_string(array, delimiter)
- array_union(array1, array2)
- array_except(array1, array2)
- cardinality(array)
- empty(array)
- generate_series(start, stop, step)
- make_array(expression1[, ..., expression_n])
- string_to_array(str, delimiter[, null_str])
- trim_array(array, n)
- unnest(array)
- range(startm end, step)
Map Functions is not supported yet
Hash Functions
- digest(expression, algorithm)
- md5(expression)
- sha224(expression)
- sha256(expression)
- sha384(expression)
- sha512(expression)
Other Functions
- arrow_cast(expression, datatype)
- arrow_typeof(expression)

그 외 다양한 스칼라 함수에 대한 설명은 DataFusion SQL Scalar Functions를 참조하세요.

개요​

Data Types​

SQL 문법​

Information Schema​

Select​

Subquery​

Operators​

Aggregate Functions​

Window Functions​

Scalar Functions​

개요

Data Types

SQL 문법

Information Schema

Select

Subquery

Operators

Aggregate Functions

Window Functions

Scalar Functions