てっくめも

主に技術的なことをつらつらと

Groonga読書会4 参加レポート

2014-08-25 に開催されたGroonga読書会4の参加レポートです。

Groongaの公式ドキュメントを輪読形式で進めていき、Groongaに対する理解を深めるのが目的です。

内容

今回はドキュメントの「4.4. さまざまな検索条件」〜 「4.6. タグ検索・参照関係の逆引」の範囲でした。

4.4. さまざまな検索条件

4.4.1. JavaScriptに似た文法での絞込・全文検索

filter と query を用いるとANDでつないた形とおなじになる

実行結果

> select --table Site --filter "_id <= 1" --output_columns _id,_key
[[0,1408964913.55891,9.65595245361328e-05],[[[1],[["_id","UInt32"],["_key","ShortText"]],[1,"http://example.org/"]]]]

> select --table Site --filter "_id >= 4 && _id <= 6" --output_columns _id,_key
[[0,1408964936.79873,8.98838043212891e-05],[[[3],[["_id","UInt32"],["_key","ShortText"]],[4,"http://example.net/afr"],[5,"http://example.org/aba"],[6,"http://example.com/rab"]]]]

> select --table Site --filter "_id <= 2 || _id >= 7" --output_columns _id,_key
[[0,1408964943.76177,0.000809192657470703],[[[5],[["_id","UInt32"],["_key","ShortText"]],[1,"http://example.org/"],[2,"http://example.net/"],[7,"http://example.net/atv"],[8,"http://example.org/gat"],[9,"http://example.com/vdw"]]]]

4.4.2. scorer を利用したソート

--filter “1”

1は真なので全てのレコードが返ってくる。
省略できない。(_scoreは仮想カラムで、結果セットにしか存在しないため)

HIT数が多い場合は重くなるかもしれない

実行結果

> select --table Site --filter "1" --scorer "_score = rand()" --output_columns _id,_key,_score --sortby _score
[[0,1408965005.85477,0.00125598907470703],[[[9],[["_id","UInt32"],["_key","ShortText"],["_score","Int32"]],[1,"http://example.org/",233665123],[3,"http://example.com/",468703135],[7,"http://example.net/atv",635723058],[4,"http://example.net/afr",1101513929],[9,"http://example.com/vdw",1125898167],[6,"http://example.com/rab",1315634022],[8,"http://example.org/gat",1369133069],[5,"http://example.org/aba",1801979802],[2,"http://example.net/",2145174067]]]]

> select --table Site --filter "1" --scorer "_score = rand()" --output_columns _id,_key,_score --sortby _score
[[0,1408965006.82172,0.00114774703979492],[[[9],[["_id","UInt32"],["_key","ShortText"],["_score","Int32"]],[9,"http://example.com/vdw",608413784],[3,"http://example.com/",628175011],[7,"http://example.net/atv",859484421],[1,"http://example.org/",1059961393],[5,"http://example.org/aba",1131176229],[6,"http://example.com/rab",1653377373],[4,"http://example.net/afr",1656478042],[8,"http://example.org/gat",1914544919],[2,"http://example.net/",2089018456]]]]

> select --table Site --filter "1" --scorer "_score = rand()" --output_columns _id,_key,_score --sortby _score
[[0,1408965008.39878,0.00113821029663086],[[[9],[["_id","UInt32"],["_key","ShortText"],["_score","Int32"]],[4,"http://example.net/afr",149798315],[7,"http://example.net/atv",184803526],[8,"http://example.org/gat",412776091],[1,"http://example.org/",756898537],[6,"http://example.com/rab",1129566413],[9,"http://example.com/vdw",1424268980],[2,"http://example.net/",1734575198],[3,"http://example.com/",1973594324],[5,"http://example.org/aba",2038664370]]]]

4.4.3. 位置情報を用いた絞込・ソート

単純にX軸Y軸は難しい(平面的なのは)
球体の座標計算をしているため

平面モードも存在するらしい。

Groongaには測地系の違いを計算する機能がある。
が、位置によって誤差が異なるので微妙。
正確な場所を求めるのには用いないほうが良い。

緯度経度の区切りは'x'と','が使える

小数の話(前回と同じ)

度数表記からミリ秒表記に変換する際にも当然誤差が発生するので、 もしミリ秒表記を持っているのであればそちらを用いたほうが良い。
が、ない場合は、Googleなどで変換などせずに、そのままGroongaに変換させたほうが良い。

filter と query は動作は同じ
表現力は filter の方が大きい

> column_create --table Site --name location --type WGS84GeoPoint
[[0,1408965236.0137,0.00267982482910156],true]

> load --table Site
[
 {"_key":"http://example.org/","location":"128452975x503157902"}
 {"_key":"http://example.net/","location":"128487316x502920929"},
]
> > > > [[0,1408965258.59775,0.00572419166564941],2]

> select --table Site --query "_id:1 OR _id:2" --output_columns _key,location
[[0,1408965276.89475,0.00426530838012695],[[[2],[["_key","ShortText"],["location","WGS84GeoPoint"]],["http://example.org/","128452975x503157902"],["http://example.net/","128487316x502920929"]]]]
> select --table Site --query "_id:1 OR _id:2" --output_columns _key,location,_score --scorer '_score = geo_distance(location, "128515259x503187188")'
[[0,1408965341.19305,0.00405526161193848],[[[2],[["_key","ShortText"],["location","WGS84GeoPoint"],["_score","Int32"]],["http://example.org/","128452975x503157902",2054],["http://example.net/","128487316x502920929",6720]]]]
> select --table Site --query "_id:1 OR _id:2" --output_columns _key,location,_score --scorer '_score = geo_distance(location, "128515259x503187188")' --sortby -_score
[[0,1408965369.80098,0.0018768310546875],[[[2],[["_key","ShortText"],["location","WGS84GeoPoint"],["_score","Int32"]],["http://example.net/","128487316x502920929",6720],["http://example.org/","128452975x503157902",2054]]]]
> select --table Site --output_columns _key,location --filter 'geo_in_circle(location, "128515259x503187188", 5000)'
[[-22,1408965388.87275,0.00421857833862305,"geo_in_circle(): index for <Site.location> is missing",[["grn_selector_geo_in_circle","geo.c",794]]],[]]

4.5. ドリルダウン

大きな特徴としては、複数のGROUP BY結果を一度に返せる

_nsubrecs は仮想カラム
これも結果セットについている

Drilldownの先の結果については、Groongaでは取得できない。
(機能としてはもっているが、クエリが表現できない。)
Droongaであれば可能。

準備

> table_create --name SiteDomain --flags TABLE_HASH_KEY --key_type ShortText
[[0,1408966250.54083,0.00404238700866699],true]
> table_create --name SiteCountry --flags TABLE_HASH_KEY --key_type ShortText
[[0,1408966256.82079,0.00423598289489746],true]
> column_create --table Site --name domain --flags COLUMN_SCALAR --type SiteDomain
[[0,1408966262.80479,0.00373363494873047],true]
> column_create --table Site --name country --flags COLUMN_SCALAR --type SiteCountry
[[0,1408966268.84379,0.00376415252685547],true]
> load --table Site
[
{"_key":"http://example.org/","domain":".org","country":"japan"},
{"_key":"http://example.net/","domain":".net","country":"brazil"},
{"_key":"http://example.com/","domain":".com","country":"japan"},
{"_key":"http://example.net/afr","domain":".net","country":"usa"},
{"_key":"http://example.org/aba","domain":".org","country":"korea"},
{"_key":"http://example.com/rab","domain":".com","country":"china"},
{"_key":"http://example.net/atv","domain":".net","country":"china"},
{"_key":"http://example.org/gat","domain":".org","country":"usa"},
{"_key":"http://example.com/vdw","domain":".com","country":"japan"}
]
> > > > > > > > > > > [[0,1408966275.01298,0.00594520568847656],9]

実行結果

> select --table Site --limit 0 --drilldown domain
[[0,1408966319.46992,0.0010530948638916],[[[9],[["_id","UInt32"],["_key","ShortText"],["country","SiteCountry"],["domain","SiteDomain"],["link","Site"],["links","Site"],["location","WGS84GeoPoint"],["title","ShortText"]]],[[3],[["_key","ShortText"],["_nsubrecs","Int32"]],[".org",3],[".net",3],[".com",3]]]]

> select --table Site --limit 0 --drilldown country
[[0,1408966364.33675,0.000985622406005859],[[[9],[["_id","UInt32"],["_key","ShortText"],["country","SiteCountry"],["domain","SiteDomain"],["link","Site"],["links","Site"],["location","WGS84GeoPoint"],["title","ShortText"]]],[[5],[["_key","ShortText"],["_nsubrecs","Int32"]],["japan",3],["brazil",1],["usa",2],["korea",1],["china",2]]]]

実行結果

> select --table Site --limit 0 --drilldown domain --drilldown_output_columns _id,_key,_nsubrecs
[[0,1408966464.3928,0.000682592391967773],[[[9],[["_id","UInt32"],["_key","ShortText"],["country","SiteCountry"],["domain","SiteDomain"],["link","Site"],["links","Site"],["location","WGS84GeoPoint"],["title","ShortText"]]],[[3],[["_id","UInt32"],["_key","ShortText"],["_nsubrecs","Int32"]],[1,".org",3],[2,".net",3],[3,".com",3]]]]

実行結果

> select --table Site --limit 0 --filter "domain._id == 1" --drilldown country
[[0,1408966496.30478,0.00199079513549805],[[[3],[["_id","UInt32"],["_key","ShortText"],["country","SiteCountry"],["domain","SiteDomain"],["link","Site"],["links","Site"],["location","WGS84GeoPoint"],["title","ShortText"]]],[[3],[["_key","ShortText"],["_nsubrecs","Int32"]],["japan",1],["korea",1],["usa",1]]]]

4.5.1. 複数のカラムでドリルダウン

実行結果

> select --table Site --limit 0 --drilldown domain,country
[[0,1408967388.49174,0.00192952156066895],[[[9],[["_id","UInt32"],["_key","ShortText"],["country","SiteCountry"],["domain","SiteDomain"],["link","Site"],["links","Site"],["location","WGS84GeoPoint"],["title","ShortText"]]],[[3],[["_key","ShortText"],["_nsubrecs","Int32"]],[".org",3],[".net",3],[".com",3]],[[5],[["_key","ShortText"],["_nsubrecs","Int32"]],["japan",3],["brazil",1],["usa",2],["korea",1],["china",2]]]]

4.5.2. ドリルダウン結果をソートする

実行結果

> select --table Site --limit 0 --drilldown country --drilldown_sortby _nsubrecs
[[0,1408967430.09778,0.00185227394104004],[[[9],[["_id","UInt32"],["_key","ShortText"],["country","SiteCountry"],["domain","SiteDomain"],["link","Site"],["links","Site"],["location","WGS84GeoPoint"],["title","ShortText"]]],[[5],[["_key","ShortText"],["_nsubrecs","Int32"]],["brazil",1],["korea",1],["usa",2],["china",2],["japan",3]]]]

4.5.3. ドリルダウン結果の制限

drilldown_limits -> drilldown_limit

drilldown_limit に -1を指定すると制限なし

> select --table Site --limit 0 --drilldown country --drilldown_sortby _nsubrecs --drilldown_limit 2 --drilldown_offset 2
[[0,1408967474.53076,0.00110244750976562],[[[9],[["_id","UInt32"],["_key","ShortText"],["country","SiteCountry"],["domain","SiteDomain"],["link","Site"],["links","Site"],["location","WGS84GeoPoint"],["title","ShortText"]]],[[5],[["_key","ShortText"],["_nsubrecs","Int32"]],["usa",2],["china",2]]]]

4.6. タグ検索・参照関係の逆引

Groongaはインデックスが表に出ているイメージ。 そもそもインデックスもカラムの1つ。 Groongaでのテーブルはカラムのキーを管理している存在。

準備

> table_create --name Video --flags TABLE_HASH_KEY --key_type UInt32
[[0,1408968190.51778,0.00485920906066895],true]
> table_create --name Tag --flags TABLE_HASH_KEY --key_type ShortText
[[0,1408968197.06486,0.00412249565124512],true]
> column_create --table Video --name title --flags COLUMN_SCALAR --type ShortText
[[0,1408968202.91378,0.00779294967651367],true]
> column_create --table Video --name tags --flags COLUMN_VECTOR --type Tag
[[0,1408968209.15978,0.00527548789978027],true]
> column_create --table Tag --name index_tags --flags COLUMN_INDEX --type Video --source tags
[[0,1408968215.54476,0.0100023746490479],true]
> load --table Video
[
{"_key":1,"title":"Soccer 2010","tags":["Sports","Soccer"]},
{"_key":2,"title":"Zenigata Kinjirou","tags":["Variety","Money"]},
{"_key":3,"title":"groonga Demo","tags":["IT","Server","groonga"]},
{"_key":4,"title":"Moero!! Ultra Baseball","tags":["Sports","Baseball"]},
{"_key":5,"title":"Hex Gone!","tags":["Variety","Quiz"]},
{"_key":6,"title":"Pikonyan 1","tags":["Animation","Pikonyan"]},
{"_key":7,"title":"Draw 8 Month","tags":["Animation","Raccoon"]},
{"_key":8,"title":"K.O.","tags":["Animation","Music"]}
]
> > > > > > > > > > [[0,1408968221.05899,0.0045313835144043],8]

実行結果

> select --table Video --query tags:@Variety --output_columns _key,title
[[0,1408968267.01973,0.00113749504089355],[[[2],[["_key","UInt32"],["title","ShortText"]],[2,"Zenigata Kinjirou"],[5,"Hex Gone!"]]]]

> select --table Video --query tags:@Sports --output_columns _key,title
[[0,1408968286.70478,0.000999212265014648],[[[2],[["_key","UInt32"],["title","ShortText"]],[1,"Soccer 2010"],[4,"Moero!! Ultra Baseball"]]]]

> select --table Video --query tags:@Animation --output_columns _key,title
[[0,1408968302.63875,0.00431966781616211],[[[3],[["_key","UInt32"],["title","ShortText"]],[6,"Pikonyan 1"],[7,"Draw 8 Month"],[8,"K.O."]]]]

> select --table Video --query tags:@Quiz --output_columns _key,title
[[0,1408968483.84577,0.00119495391845703],[[[1],[["_key","UInt32"],["title","ShortText"]],[5,"Hex Gone!"]]]]