Database) Chapter11. Indexing and Hashing

2월 21, 2017

Database) Chapter11. Indexing and Hashing

Chapter 11. Indexing and Hashing

Index file의 Basic Concept -＞Linked List와 비슷한 느낌

(Type내에 Search key와 key가 어디에 있는지에 대한 pointer가 담겨 있음)

Indices에는 두 가지 basic type이 있음:

-Ordered indices: search keys are stored in sorted order -> search key들이 정렬된 상태로 분포되어 있는 index

-Hash indices: search keys are distributed uniformly across “buckets” using a “Hashing function” -> Hashing function을 이용하여 bucket을 가로질러 search key들이 불균등하게 분포되어 있다

---Ordered Indices => Good for equality and range search

+Primary index: 순차적으로 정렬된 파일 내에서 search key가 파일의 순서를 정하는 index를 의미함. (Index내의 하나의tuple이 하나의 데이터만을 결정해야한다 -> Primary key를 포함하고 있어야함)

+Secondary index: search key가 파일의 sequential order과는 다른 순서를 결정하는 index. (Index내의 하나의 tuple이 복수의 데이터를 결정할 수 있는 index)

+Dense index: Index record appears for every search-key value in the file

파일 내의 모든 search-key를 포함하고 있는 index -> 즉 search key type으로 결정된 domain의 모든 값을 포함하고 있어야 함.

(ex) dept_name : Biology, Comp. Sci., Elec. Eng., Finance,

(ex) ID => 이 경우는 Primary index

+Sparse Index: contain index records for only some search-key values

=> Index의 모든 tuple이 file내의 모든 data를 참조하지 않는 index

-+To locate a record with search-key value K:

=> Find index record with largest search-key value <= K

K보다 작은 search-key value의 pointer를 참조한 다음에 file data에서 value를 찾는다.

=> Good Tradeoff: Sparse index with an index entry for every block in file, corresponding to least search-key value in the block

=> search key들을 찾고 그 후에 데이터를 찾는 특성상 data들이 일정한 block으로 나누어짐.

+Secondary Index => Dense해지기 위한 조건?

Secondary index -> Dense index -> data 식으로 particular한 index를 하나 더 거칠 경우 해결 => Multilevel index

+Multilevel index => 몇 가지의 index들을 거쳐서 data를 참조하는 index형태

-+Outer index : a sparse index of primary index

-+inner index: the primary index file

Index Update: Deletion

-+Single-level index entry deletion

--+Dense index – deletion of search-key is similar to file record deletion

--+Sparse index – if an entry for the search key exists in the index, it is deleted by replacing the entry in the index with the next search-key value in the file.

===>만약 next search-key value가 index내에 이미 있을 경우 entry는 대체 없이 그냥 삭제된다.

B+ Tree index file

B+ Tree properties

-+Node에 n-1개의 search key, n개의 pointer가 있다.

-+Search key의 앞 pointer는 value가 < search key인 node들을 가리키고,

뒤 pointer는 value가 >= search key인 node들을 가리킨다.

-+All paths from root to lear are of the same length

-+each node that is not a root or a leaf has between upper(n/2) and n children.

-+A leaf node has between upper( (n-1)/2) and n-1 values

-*Root가 leaf가 아닐 경우 최소 2개의 children을 가지고 있어야 함.

Node의 구성

...

Pn-1

Kn-1

Ki are the search key values ( k1<k2<...<kn-1)

Pi are pointers to children or pointers to records or buckets of records.

Properties of a leaf node

-+ For I=1,2,...,n-1, pointer Pi points to a file record with search key value Ki

-+ If Li, Lj are leaf nodes and i<j, Li’s search-key values are less than or equal to Lj’s search-key values

-+ Pn points to next leaf node in search-key order

Updates on B+ Trees: Insertion

Search-key value가 들어가야 할 leaf node를 찾는다.

만약 search-key가 leaf node에 이미 있다면 -> file에 기록을 추가하고 만약 필요하다면, bucket에 pointer를 추가한다.

만약 search-key가 leaf node에 없다면 -> main file에 record를 추가한다.

만약 leaf node에 공간이 있다면 insert한다.

만약 leaf node에 공간이 없다면 node를 쪼갠다. 크기는 upper( n/2)와 나머지. 전자 node는 original node에 위치하고new node를 움직여야한다. 이를 node의 parent에 insert한다.

만약 부모 노드가 꽉 찼다면, 부모 노드의 부모 노드까지 살펴본다. (계속 iteration)

Updates on B+ Trees: Deletion

delete가 되어야 할 record를 찾는다, 그것을 main file과 bucket으로부터 삭제한다.

leaf node로부터 search-key value, pointer를 삭제한다.

만약 node가 너무 적은 entry를 보유하고 있으면, 그 entry는 한 single node로부터 들어가 merge sibling을 진행한다: 두 개의 node에 있는 모든 search-key value들을 하나의 node로 몰아넣고, other node를 삭제한다, 그 Ki-1와 Pi를 삭제하고,부모로부터도 node를 삭제하고, 반복적인 방법을 수행한다.

Hashing Function

-Static Hashing

+-Bucket : a unit of storage containing one or more records

+-Hash file organization : We obtain the bucket of a record directly from its search-key value using a hash function

+-Hashing function h is a function from the set of all search-key values K to the set of all bucket addresses B. => Hash Function은 접근용도로 record를 어느 특정 구간에 위치시키기 위해 만들어진 function이다.

(example)

+- h(Music) = 1, h(History) = 2,

=> 각각의 Hashing function에 따른 결과에 의해 bucket x에 접근 => 접근 속도가 향상될 수 있음

=> Ideal한 Hashing function은 uniform하고 random해야함.

+- Bucket에서 overflow가 생길 때 -> Overflow bucket을 이용하여 Linked List 형식으로 bucket과 그 다음 bucket을 이어줌.

+- Overflow bucket을 사용하는 hashing을 closed hashing이라고 하고, Overflow bucket을 사용하지 않는 hashing을open hashing이라고 한다.

+- Hash index는 Search key들을 organize하는데, 이는 그들의 record pointer와 함께 hash file structure를 referencing할 수 있는 구조이다.

+- 그렇기 때문에 Hash Indices는 항상 secondary indices 구조가 된다 (하나의 Hash index (bucker)안에 다수의 pointer가 있음)

+- 정적인 구조의 Hashing에서는 bucket address의 고정된 set값만이 저장할 수 있음.

=> 초기의 memory가 작을 경우 overflow가 너무 많이 일어날 수 있음. => 이럴 경우 bucket의 숫자를 dynamically하게 수정하는 것이 좋음.

Dynamic Hashing -> Good for database that grows and shrinks in size

+- Hash function이 dynamically하게 modify될 수 있도록 함.

+- Extendable hashing – one form of dynamic hashing => bit를 이용하여서 만드는 hashing function (Generally하게32bit integer를 이용하여 확장시킴. -> Bucket address table size는 2^i)

+- Hashing function을 통해나온 integer bit

=> A1, A2, A3, A4, A5, A6, ... , A32

1) Limit bit i를 정함

2) Bucket size s를 정함

3) 맨 처음 bit를 비교해서 bucket에 넣다가 Bucket size s를 초과하면

-> 비교하는 bit의 수를 1 증가시킨다.

4) 만약 비교하는 bit의 수가 limit bit i를 초과할 경우, Overflow bucket을 이용하여 Linked List 형식으로 묶음.

이 블로그 검색

Giyeon with CS

Database) Chapter11. Indexing and Hashing

댓글

댓글 쓰기

가장 많이 본 글

2018년 봄학기 카이스트 전산학부 석사과정 면접후기...

SWRC) Universal Dependency Annotation for Multilingual Parsing 번역, 정리