Since jsLDA seems to have some traction I want to say a bit about its design goals. These are based on experience teaching ML to a range of experience levels in classes and workshops. https://twitter.com/ted_underwood/status/1319727150967250945
First and most important BY FAR: no setup. If you can open a web browser you’re good to go. Every time I used Java or Python I would lose some percent of the audience at every step. In Python classes now we budget a week just to get everyone’s environment ready.
Some never manage to get it working. I cannot overstate how much of a problem the dumpster fire of Python package management is for learners. (Yes, I know about the thing you are about to suggest.) With JavaScript it just works.
This is also important on the server side: by going all JavaScript I only need to serve static documents. All the computation (and the data) stays local.
Second, results first. Start by showing what a topic model does. The magic trick aspect matters 🎩🐇. Once people see what an ML tool is for, it becomes much easier to explain it. No one cares about the equations! (... yet.)
I also wanted to show what *happens*. I deliberately made it so that the first thing you see is random initialization, so users can see patterns emerge. Someone always asks “who decides what the topics are?” — a totally reasonable question. I want to make it clear it’s the algo.
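To make the "random initialization" step concrete, here is a minimal sketch, not jsLDA's actual source. The function name `randomInit` and the shape of the returned object are assumptions for illustration. Every token gets a topic drawn uniformly at random, so the initial topics are noise and any structure that shows up later comes from the sampler.

```javascript
// Hypothetical sketch of random initialization for a topic model.
// docs: array of documents, each an array of word strings.
// rng: function returning a number in [0, 1); defaults to Math.random.
function randomInit(docs, numTopics, rng = Math.random) {
  // One word-count table per topic, plus a running token total per topic.
  const topicWordCounts = Array.from({ length: numTopics }, () => new Map());
  const topicTotals = new Array(numTopics).fill(0);

  // Assign every token a uniformly random topic and update the counts.
  const assignments = docs.map((tokens) =>
    tokens.map((word) => {
      const topic = Math.floor(rng() * numTopics);
      const counts = topicWordCounts[topic];
      counts.set(word, (counts.get(word) || 0) + 1);
      topicTotals[topic] += 1;
      return topic;
    })
  );

  return { assignments, topicWordCounts, topicTotals };
}
```

Printing the most frequent words per topic right after this step should show an unstructured jumble, which is the starting point users see before patterns emerge.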
Third, balance power and simplicity. I worked hard to make jsLDA fast enough to be usable for real work on substantial collections, but also simple enough that a new user can just press a button and get a result, and an intermediate user can follow the source code.
I specifically did not use certain tricks that I used in Mallet because I wanted to make the relationship between equations and code transparent. (I’m not against math, I just don’t want to lead with it.)
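As an illustration of what a transparent equation-to-code mapping can look like, here is the textbook collapsed Gibbs update for a single token, written without any sparsity or caching tricks. This is a sketch under assumptions, not the code in jsLDA: the `state` layout, the function name `resampleToken`, and the use of integer word ids are all hypothetical. The sampling weight for topic k is (docTopic[d][k] + alpha) * (topicWord[k][w] + beta) / (topicTotal[k] + vocabSize * beta), and the loop computes exactly that expression.

```javascript
// Textbook collapsed Gibbs update for one token (illustrative only).
// state.docs[d][i] is an integer word id; state.z[d][i] is its current topic.
function resampleToken(state, d, i, rng = Math.random) {
  const { docs, z, docTopic, topicWord, topicTotal, alpha, beta, vocabSize } = state;
  const w = docs[d][i];
  const K = topicTotal.length;
  const old = z[d][i];

  // Remove the token's current assignment from all three count tables.
  docTopic[d][old] -= 1;
  topicWord[old][w] -= 1;
  topicTotal[old] -= 1;

  // Unnormalized weight of each topic: one line per the formula above.
  const weights = new Array(K);
  let sum = 0;
  for (let k = 0; k < K; k++) {
    weights[k] =
      ((docTopic[d][k] + alpha) * (topicWord[k][w] + beta)) /
      (topicTotal[k] + vocabSize * beta);
    sum += weights[k];
  }

  // Draw a topic in proportion to its weight.
  let u = rng() * sum;
  let k = 0;
  while (k < K - 1 && u >= weights[k]) {
    u -= weights[k];
    k += 1;
  }

  // Record the new assignment and add the token back into the counts.
  z[d][i] = k;
  docTopic[d][k] += 1;
  topicWord[k][w] += 1;
  topicTotal[k] += 1;
  return k;
}
```

A full training sweep would call this once per token per iteration. Faster samplers reorganize the weight computation, at the cost of hiding the formula.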
Fourth, emphasize corpus curation. You can interactively add and remove stop words and see the effect. I think @boydgraber suggested this? It’s a much more important interface than # of topics, but people don’t know that at first.
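A small sketch of why the stop list matters so much, assuming a hypothetical helper `topWords` and a deliberately crude tokenizer (lowercase, split on anything that is not a-z). It is not taken from jsLDA. It ranks words by frequency while skipping anything on the stop list, so you can compare the ranking before and after adding words.

```javascript
// Hypothetical helper: most frequent words in a collection, ignoring a stop list.
// docs: array of strings; stoplist: Set of lowercase words; n: how many to return.
function topWords(docs, stoplist, n) {
  const counts = new Map();
  for (const doc of docs) {
    // Crude tokenizer for illustration: lowercase, split on non-letters.
    for (const token of doc.toLowerCase().split(/[^a-z]+/)) {
      if (token && !stoplist.has(token)) {
        counts.set(token, (counts.get(token) || 0) + 1);
      }
    }
  }
  // Sort by count (descending), breaking ties alphabetically.
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1] || (a[0] < b[0] ? -1 : 1))
    .slice(0, n)
    .map(([word]) => word);
}

const docs = ["The cat and the dog", "the dog and the bird"];
const before = topWords(docs, new Set(), 2);
const after = topWords(docs, new Set(["the", "and"]), 2);
console.log(before, after);
```

With an empty stop list the ranking is dominated by function words. Adding them to the stop list lets content words surface, which is the effect the interactive interface lets users see directly.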
Fifth, uploads and downloads. This was tricky to get working, but makes the difference between a toy and a tool. It also allows instructors to customize, which means learners get an example of working with their own collections.
Finally, integrate algorithms with interface. The browser is a high performance computing environment (JavaScript is MUCH faster than Python) embedded in an excellent interactive graphics environment. Plus there’s a code environment hidden underneath! Print those variables!
Want to train people in ML? Start by showing them what it does. No math, no yak shaving. Then allow many directions for play and exploration. Use the browser. Follow @nsthorat, TensorFlow.js is great. What other tools make in-browser ML easy and fun?
You can follow @dmimno.