Tech Blog 2019-5-4

Created a LINE bot to search Wikipedia

Python Chatbot

Built with `flask`, `line-bot-sdk`, and `wikipedia`, and deployed to `heroku`.

Please give it a try.

Final Result

screenshot

The repository is here.

Preparation

So that pip install -r requirements.txt works, I listed the three libraries mentioned above in requirements.txt, and I completed the Heroku and LINE@ setup (obtaining the channel access token and channel secret). I also added runtime.txt and Procfile. For the most part, I followed the article "Running a LINE BOT (python) on Heroku".

About Wikipedia

Everyone's favorite Wikipedia.

pip install wikipedia

The wikipedia library this installs is very convenient. I understand it as a Python wrapper for the MediaWiki API: as far as I can tell, it calls the API with requests, parses the markup with something like BeautifulSoup4, and returns the result.

import wikipedia

wikipedia.set_lang("ja")

After setting it to Japanese Wikipedia with this,

wikipedia.search("string")

returns a list of page titles. Each Wikipedia page is identified by its title, so

wikipedia.page("page name")

gives you a wikipedia.WikipediaPage object. This page object has attributes such as categories, links, content, and summary, which are basically strings or lists of strings (titles or URLs).
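Put together, the calls above form a short lookup flow. The following is only a minimal sketch: first_summary is a hypothetical helper, not part of the wikipedia library, and the search and page functions are passed in as arguments so the flow can be tried without network access.

```python
from typing import Callable, List, Optional


def first_summary(
    query: str,
    search: Callable[[str], List[str]],
    page: Callable[[str], object],
) -> Optional[str]:
    """Return "title: summary" for the top search hit, or None if nothing matches."""
    titles = search(query)  # list of page titles, best match first
    if not titles:
        return None
    top = page(titles[0])  # the title doubles as the page ID
    return f"{top.title}: {top.summary}"
```

With the real library, this would be called as first_summary("string", wikipedia.search, wikipedia.page) after wikipedia.set_lang("ja").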

>>> help(wikipedia.WikipediaPage)
>>>
"""
categories
    List of categories of a page.
content
    Plain text content of the page, excluding images, tables, and other data.
coordinates
    Tuple of Decimals in the form of (lat, lon) or None
images
    List of URLs of images on the page.
links
    List of titles of Wikipedia page links on a page.
    Only includes articles from namespace 0, meaning no Category, User talk, or other meta-Wikipedia pages.
parent_id
    Revision ID of the parent version of the current revision of this
    page. See ``revision_id`` for more information.
references
    List of URLs of external links on a page.
    May include external links within page that aren't technically cited anywhere.
revision_id
    Revision ID of the page.
    The revision ID is a number that uniquely identifies the current
"""

For this LINE bot, I decided to pick one page from the candidates returned for the search word and send that page's summary (the overview right after the title, which I believe is also what appears in OGP previews) together with a link to the page back to the LINE chat room.
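The "pick one page from the candidates" step can be sketched as a loop that takes the first candidate that loads and skips the ones that fail. This is an illustration under assumptions, not the bot's actual code: pick_page, load, and skip are hypothetical names, and the loader is injected so the logic can be checked offline.

```python
from typing import Callable, Iterable, Optional, Tuple, Type


def pick_page(
    titles: Iterable[str],
    load: Callable[[str], object],
    skip: Tuple[Type[Exception], ...],
) -> Optional[object]:
    """Return the first candidate that loads, or None if every candidate fails."""
    for title in titles:
        try:
            return load(title)
        except skip:
            # e.g. a disambiguation page: move on to the next candidate
            continue
    return None
```

With the wikipedia library, a call might look like pick_page(wikipedia.search(word), wikipedia.page, (wikipedia.exceptions.DisambiguationError,)).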

Files, etc.

.
├── Procfile
├── README.md
├── __pycache__
│   ├── app.cpython-36.pyc
│   └── parser.cpython-36.pyc
├── app.py
├── assets
│   └── img
│       └── linebot-icon.webp
├── messenger.py
├── parser.py
├── requirements.txt
├── runtime.txt
└── test.py

`app.py`

Next, I quickly wrote the application part on top of flask, though this is also mostly copy-paste. linebot hides the tedious parts, which is very convenient.

from flask import Flask, request, abort

from linebot import (
    LineBotApi, WebhookHandler
)
from linebot.exceptions import (
    InvalidSignatureError
)
from linebot.models import (
    MessageEvent, TextMessage, TextSendMessage,
)

import parser
import os

app = Flask(__name__)

YOUR_CHANNEL_ACCESS_TOKEN = os.environ.get("YOUR_CHANNEL_ACCESS_TOKEN")
YOUR_CHANNEL_SECRET = os.environ.get("YOUR_CHANNEL_SECRET")

line_bot_api = LineBotApi(YOUR_CHANNEL_ACCESS_TOKEN)
handler = WebhookHandler(YOUR_CHANNEL_SECRET)


@app.route("/callback", methods=['POST'])
def callback():
    # get X-Line-Signature header value
    signature = request.headers['X-Line-Signature']

    # get request body as text
    body = request.get_data(as_text=True)
    app.logger.info("Request body: " + body)

    # handle webhook body
    try:
        handler.handle(body, signature)
    except InvalidSignatureError:
        print("Invalid signature. Please check your channel access token/channel secret.")
        abort(400)

    return 'OK'


@handler.add(MessageEvent, message=TextMessage)
def handle_message(event):
    line_bot_api.reply_message(
        event.reply_token,
        TextSendMessage(text=parser.answer(event.message.text)))


if __name__ == "__main__":
    port = int(os.getenv("PORT", 5000))
    app.run(host="0.0.0.0", port=port)

A flask-based sample app.py is published in the official line-bot-sdk-python repository: [sample] app.py

In fact, this time I used the sample code posted in the README more or less as-is: sample code on GitHub
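One of the tedious parts that handler.handle(body, signature) takes care of is checking the X-Line-Signature header. As I understand the Messaging API, the signature is the Base64-encoded HMAC-SHA256 digest of the request body, keyed with the channel secret. The sketch below illustrates that check with the standard library only. is_valid_signature is a hypothetical name, and this is not the SDK's own implementation.

```python
import base64
import hashlib
import hmac


def is_valid_signature(channel_secret: str, body: str, signature: str) -> bool:
    """Check an X-Line-Signature value against the raw request body."""
    digest = hmac.new(
        channel_secret.encode("utf-8"),
        body.encode("utf-8"),
        hashlib.sha256,
    ).digest()
    expected = base64.b64encode(digest)
    # compare_digest avoids leaking information through timing differences
    return hmac.compare_digest(expected, signature.encode("utf-8"))
```

The body must be the raw text exactly as received, which is why app.py reads it with request.get_data(as_text=True) before doing anything else.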

`parser.py`

The parser keeps the language setting in a module-level (global) variable, which is admittedly a questionable design.
usage(), meant for showing help when the input is wrong, is not implemented yet.
WikipediaPage.summary can be surprisingly long, so I truncate it at 1500 characters in case it would otherwise hit the Messaging API's upper limit.

import wikipedia


# Default language; module-level state shared by every request
lang = "ja"
wikipedia.set_lang(lang)


def init() -> None:
    """Reset the Wikipedia language to the module default."""
    wikipedia.set_lang(lang)


def tokenize(text: str) -> str:
    """Extract the search word from input of the form "word" or "lang word"."""
    tokens = text.split()
    if len(tokens) == 1:
        return tokens[0]
    elif len(tokens) == 2:
        # wikipedia.languages() returns a dict keyed by language prefix
        if tokens[0] in wikipedia.languages():
            change_lang(tokens[0])
        return tokens[1]
    else:
        # usage() is not implemented yet, so this branch returns None
        usage()


def search(text: str, rank=0) -> "wikipedia.wikipedia.WikipediaPage":
    """Search Wikipedia page by Word
    arg
    ---
    rank : int : Return the contents of the search result of the set rank.
    """
    try:
        page = wikipedia.page(wikipedia.search(text)[rank])
    except wikipedia.exceptions.DisambiguationError:
        # Fall back to the next candidate if the hit is a disambiguation page
        page = wikipedia.page(wikipedia.search(text)[rank+1])
    return page


def encode(page: "wikipedia.wikipedia.WikipediaPage", threshold=1500) -> str:
    """Transform data into the text for LINE message
    """
    summary = page.summary
    if len(summary) > threshold:
        summary = summary[:threshold] + "..."

    return f"Result: {page.title}\n\n{summary}\n\n{page.url}"


def answer(text: str) -> str:
    """Build the reply text for one incoming message."""
    init()
    word = tokenize(text)
    page = search(word)
    return encode(page)


def change_lang(language: str) -> None:
    wikipedia.set_lang(language)


def usage():
    pass


if __name__ == "__main__":
    import argparse

    parser = argparse.ArgumentParser()
    parser.parse_args()

For what it's worth, the code follows PEP 8, and I wrote type hints and docstrings. I want to make that a habit.
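One possible way to address the module-variable concern in tokenize is to return the language together with the query instead of changing global state while parsing. The following is only a sketch under assumptions: split_query, known_langs, and default_lang are hypothetical names, and unlike the version above it treats every word after the language prefix as part of the query.

```python
from typing import Collection, Optional, Tuple


def split_query(
    text: str, known_langs: Collection[str], default_lang: str = "ja"
) -> Optional[Tuple[str, str]]:
    """Split "<lang> <query>" or "<query>" into (lang, query); None if empty."""
    words = text.split()
    if not words:
        return None
    # A leading language prefix only counts if a query follows it
    if len(words) > 1 and words[0] in known_langs:
        return words[0], " ".join(words[1:])
    return default_lang, " ".join(words)
```

The caller can then pass the returned language to wikipedia.set_lang right before searching, using the keys of wikipedia.languages() as known_langs.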

Deployment, etc.

$ heroku login
$ heroku create heroku-line-bot
$ heroku config:set YOUR_CHANNEL_SECRET="<Channel Secret>"
$ heroku config:set YOUR_CHANNEL_ACCESS_TOKEN="<Access Token>"
$ git push heroku master

atsuya koba