Stanza - A Python NLP Library for Many Human Languages

I tested out Stanza. The English tokenizer definitely works. I ran a quick test with Japanese ("Good morning, everyone! How are you?"), and the output was somewhat unexpected.

    import stanza

    # Japanese model: "ja"; for the English model use "en"
    stanza.download("ja")
    nlp = stanza.Pipeline("ja")
    doc = nlp("皆さんおはようございます! ご機嫌いかがですか?")
    for i, sentence in enumerate(doc.sentences):
        print(f"===== Sentence {i+1} tokens =====")
        print(*[f"word: {word.text}\t upos: {word.upos} xpos: {word.xpos}" for word in sentence.words], sep="\n")

The output is:

    ===== Sentence 1 tokens =====
    word: 皆さん upos: PRON xpos: NP
    word: おは upos: VERB xpos: VV
    word: よう upos: AUX xpos: AV
    word: ござい upos: VERB xpos: VV
    word: ます upos: AUX xpos: AV
    word: ! upos: PUNCT xpos: SYM
    ===== Sentence 2 tokens =====
    word: ご upos: NOUN xpos: XP
    word: 機 upos: NOUN xpos: XS
    word: 嫌い upos: NOUN xpos: NN
    word: か upos: PART xpos: PF
    word: が upos: ADP xpos: PS
    word: です upos: AUX xpos: AV
    word: か upos: PART xpos: PE
    word: ? upos: PUNCT xpos: SYM

I’m not qualified to evaluate the accuracy of the POS tags and so on, but at least as far as tokenization goes, I would expect ...

April 12, 2020 · 1 min · Naoko Reeves

Install GraalVM and run Python with a debugger

What is GraalVM? GraalVM is a high-performance, embeddable, polyglot virtual machine for running applications written in JavaScript, Python, Ruby, R, JVM-based languages like Java, Scala, and Kotlin, and LLVM-based languages such as C and C++. Here is the official doc link. Hmm… Okay, I have to see it.

Let’s install

Below is how I installed GraalVM Community Edition on Ubuntu 18.04. For other platforms, the official installation guide is here.

    # update this number to the latest version from here: https://github.com/oracle/graal/releases
    version=1.0.0-rc15
    wget https://github.com/oracle/graal/releases/download/vm-${version}/graalvm-ce-${version}-linux-amd64.tar.gz
    tar -xvzf graalvm-ce-${version}-linux-amd64.tar.gz
    # clean up
    rm graalvm-ce-${version}-linux-amd64.tar.gz
    # move it wherever you want (here: ~/bin/graalvm)
    mkdir -p ~/bin
    mv graalvm-ce-${version}/ ~/bin/graalvm
    # the path must match the directory above; to make it permanent, put this in your ~/.bashrc
    export PATH=$HOME/bin/graalvm/bin:$PATH

Now that graalvm/bin is in your PATH, you’ll get the GraalVM versions of those runtimes. ...
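Why prepending matters: the shell searches PATH from left to right and runs the first matching executable, so putting GraalVM's bin directory first makes its commands shadow any system copies with the same name. A minimal sketch of that lookup rule using Python's shutil.which with an explicit path list; the temporary directories and the fake "node" scripts are hypothetical stand-ins, not real GraalVM files.

```python
import os
import shutil
import stat
import tempfile
from typing import List, Optional


def make_fake_executable(directory: str, name: str) -> str:
    """Create a tiny executable shell script called `name` in `directory`."""
    path = os.path.join(directory, name)
    with open(path, "w") as f:
        f.write("#!/bin/sh\n")
    os.chmod(path, os.stat(path).st_mode | stat.S_IXUSR)
    return path


def resolve(command: str, path_dirs: List[str]) -> Optional[str]:
    """Return the first executable named `command` along `path_dirs`,
    searched left to right the way the shell searches PATH."""
    return shutil.which(command, path=os.pathsep.join(path_dirs))


with tempfile.TemporaryDirectory() as graal_bin, tempfile.TemporaryDirectory() as system_bin:
    # Hypothetical stand-ins: both directories provide a "node" command.
    graal_node = make_fake_executable(graal_bin, "node")
    system_node = make_fake_executable(system_bin, "node")

    # like export PATH=$HOME/bin/graalvm/bin:$PATH -> GraalVM's copy wins
    prepended = resolve("node", [graal_bin, system_bin])
    # like export PATH=$PATH:$HOME/bin/graalvm/bin -> the system copy still wins
    appended = resolve("node", [system_bin, graal_bin])

    print(prepended == graal_node)  # True
    print(appended == system_node)  # True
```

Appending the directory instead of prepending it would leave the system versions in charge, which is why the export line puts GraalVM's bin first.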

April 14, 2019 · 3 min · Naoko Reeves

Install Python on Ubuntu

Check the latest version here. At the time of writing, 3.8.0 is the latest and 3.8.5 has a release candidate.

Also make sure libsqlite3-dev, libbz2-dev, and libffi-dev are installed:

    sudo apt-get install libsqlite3-dev libbz2-dev libffi-dev

    version=3.8.5
    wget https://www.python.org/ftp/python/${version}/Python-${version}.tgz
    tar xzvf Python-${version}.tgz
    cd Python-${version}
    # On Linux (or any Unix-like system), the default prefix and exec-prefix are /usr/local,
    # so you should be able to omit --prefix here.
    # --enable-optimizations gives a significant speed boost (10-20%) but a much
    # slower build process.
    ./configure --prefix /usr/local --enable-optimizations
    make
    sudo make install
    # OR, if you want to skip creating the python3 link:
    sudo make altinstall

In case you want to remove it and install it again because some software was missing before installation ...
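If one of those -dev packages is missing when you build, the matching extension module may not get built (sqlite3 needs libsqlite3-dev, bz2 needs libbz2-dev, ctypes needs libffi-dev), and importing it fails later. A small sketch to run with the freshly installed interpreter, e.g. `python3.8 check_build.py`; the script name and the OPTIONAL_MODULES mapping are my own illustration, not part of the Python build.

```python
import importlib
import sys
from typing import Iterable, List

# Module -> apt package that provides the headers it needs at build time
# (hypothetical helper mapping, matching the packages installed above).
OPTIONAL_MODULES = {
    "sqlite3": "libsqlite3-dev",
    "bz2": "libbz2-dev",
    "ctypes": "libffi-dev",
}


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the module names that fail to import in this interpreter."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:  # also catches ModuleNotFoundError
            missing.append(name)
    return missing


if __name__ == "__main__":
    print(sys.version)
    for name in missing_modules(OPTIONAL_MODULES):
        print(f"{name} is missing: install {OPTIONAL_MODULES[name]} and rebuild")
```

If nothing is reported after the version line, those three modules were built.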

October 15, 2018 · 1 min · Naoko Reeves

Intercept Python Logging

The problem: I need to ship specific log records, and I had a formatter written in Python. It is a pretty complex transformation. I thought of using Logstash, but then I would need to either convert this Python logic or write a plugin to use the already-written Python parser. Plus, I would need to install Logstash… I wanted a simpler solution.

How to solve it: use a custom Python logging Handler and Filter!

    import logging

    messages = []

    logger = logging.getLogger(__name__)
    logger.setLevel(logging.DEBUG)


    class ListenFilter(logging.Filter):
        def filter(self, record):
            """Determine which log records to output.
            Returns 0 for no, nonzero for yes.
            """
            if record.getMessage().startswith('dont: '):
                return False
            return True


    class RequestsHandler(logging.Handler):
        def emit(self, record):
            """Send the log records (created by loggers) to the appropriate destination."""
            messages.append(record.getMessage())


    handler = RequestsHandler()
    logger.addHandler(handler)

    filter_ = ListenFilter()
    logger.addFilter(filter_)

    # log I want
    logger.info("logme: Howdy!")
    # log I want to skip
    logger.info("dont: I'm doing great!")

    # prints ['logme: Howdy!']
    print(messages)

Cheers!
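One variation worth knowing: a filter added with logger.addFilter is only consulted for records logged directly on that logger. Records from child loggers (for example getLogger("app.requests")) propagate straight to the parent's handlers and skip the parent's logger filter. Attaching the filter to the handler covers both cases. A minimal sketch, assuming Python 3.2+ (where a plain function can serve as a filter); the logger names, ListHandler, and skip_dont are hypothetical.

```python
import logging

messages = []


class ListHandler(logging.Handler):
    """Collect formatted messages in a list (stand-in for shipping them)."""

    def emit(self, record):
        messages.append(self.format(record))


def skip_dont(record):
    # A plain callable works as a filter; a falsy return value drops the record.
    return not record.getMessage().startswith("dont: ")


parent = logging.getLogger("app")
parent.setLevel(logging.DEBUG)
parent.propagate = False  # keep the demo away from any root-logger handlers

handler = ListHandler()
handler.addFilter(skip_dont)  # on the handler, so propagated records are filtered too
parent.addHandler(handler)

# Log through a child logger: its records reach the parent's handler.
child = logging.getLogger("app.requests")
child.info("logme: Howdy!")
child.info("dont: I'm doing great!")

print(messages)  # ['logme: Howdy!']
```

With the filter on the parent logger instead of the handler, both child messages would end up in the list.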

July 26, 2018 · 1 min · Naoko Reeves

Python Project Install - develop vs install & setuptools vs pip

The problem:

- I don’t understand the difference between setup.py develop and setup.py install.
- I don’t understand the difference between setup.py develop and pip install -e [dir].
- I don’t see the changes to my code when I import it.

The difference between setup.py develop and setup.py install

In short, you want to run setup.py develop while you are editing code. setup.py install copies your code into site-packages, so to test your latest code you would need to install (copy) it again. With develop, on the other hand, a link to your source code is created, so when you import your code, you get your latest code. ...
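For the "I don’t see the changes" case, one quick check is the module's __file__: after setup.py install it typically points into site-packages (the copy), while after setup.py develop or pip install -e it typically points into your source tree. A minimal sketch; the script name where_from.py and the package name mypackage are placeholders.

```python
import importlib
import os
import sys
import sysconfig


def import_origin(module_name: str) -> str:
    """Return the real path of the file a module is imported from."""
    module = importlib.import_module(module_name)
    if getattr(module, "__file__", None) is None:
        # e.g. built-in modules and namespace packages have no __file__
        raise ValueError(f"{module_name} has no __file__")
    return os.path.realpath(module.__file__)


def in_site_packages(path: str) -> bool:
    """True if `path` lives under this interpreter's site-packages directories."""
    paths = sysconfig.get_paths()
    roots = {os.path.realpath(paths["purelib"]), os.path.realpath(paths["platlib"])}
    path = os.path.realpath(path)
    return any(os.path.commonpath([path, root]) == root for root in roots)


if __name__ == "__main__":
    # Usage (placeholder names): python where_from.py mypackage
    for name in sys.argv[1:]:
        origin = import_origin(name)
        if in_site_packages(origin):
            where = "site-packages copy (re-install to pick up edits)"
        else:
            where = "outside site-packages (e.g. a develop / -e link)"
        print(f"{name}: {origin} -> {where}")
```

If your package shows up as a site-packages copy while you expected a develop link, an older setup.py install is probably shadowing your source.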

March 5, 2018 · 3 min · Naoko Reeves