A couple years ago, OOPSLA was in Amsterdam, and I was closing out a talk about Ringer, the web record and replay tool we’d built. I wanted to show off the higher-level tools we can build on top of Ringer, and I’d started work on Helena, which uses Ringer as a building block, so I planned to demo Helena. I decided to kill two birds with one stone - I’d use Helena to scrape data that would tell me what to do on a visit to Amsterdam.

I live in Seattle, so in particular I was looking for things I could do in Amsterdam that aren’t accessible in Seattle. And I really like food, so if I’m being honest, I was mostly interested in what I could eat in Amsterdam that I can’t eat in Seattle. So, step 1: collect reviews for all the restaurants in Seattle and Amsterdam. I fired up Helena and collected the reviews. See the GIF below for a look at that process:

GIF of using Helena to scrape Yelp reviews.

I wanted to look for things that appear more often in the Amsterdam data than in the Seattle data. Looking at the prevalence of each word would be ok, but I figured that might obscure some interesting patterns, since some dishes would be multi-word strings. (For example, “rice table” turned out to be an interesting Dutch meal, and the string “rice table” appeared much more often in Amsterdam than Seattle, but “rice” and “table” were both about evenly represented in the two cities.) I didn’t want to just try every n-gram up to a given size since that sounded pretty slow. So I figured I’d use the fact that Yelp already makes an effort to highlight notable features about the restaurants on its platform. For instance, here are Yelp’s featured reviews for one of the Amsterdam restaurants:

Featured reviews for an Amsterdam restaurant, according to Yelp.

Obviously this is a bit of a mixed bag. The featured reviews highlight two items - “rice table” and “coconut ice cream” - that definitely look like the kind of thing we want. But the third highlighted item, “different dishes,” seems a little vague. Still, this seemed good enough for my very casual purposes. So, step 2: collect the featured items from the featured reviews for all Seattle and Amsterdam restaurants. Here’s a GIF showing how I used Helena to collect these key phrases:

GIF of using Helena to scrape key phrases from Yelp's featured reviews.

So now we have four datasets: reviews of Seattle restaurants, reviews of Amsterdam restaurants, key phrases from Seattle restaurants, and key phrases from Amsterdam restaurants. I pooled the key phrases from the two cities to make one combined list of interesting phrases. Next, I calculated how many reviews from each city used each interesting phrase. That gives us the data below, showing the percentage of reviews by city that mention each phrase. This chart includes all the phrases for which 0.5% or more of reviews mentioned the phrase, in at least one of the cities.
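For anyone who wants to try this step at home, here is a minimal Python sketch of the counting. It is not the code behind the charts: the function name, the toy reviews, and the choice of case-insensitive substring matching are all assumptions made for illustration (substring matching means "crisp" also counts "crispy", for instance).

```python
def phrase_incidence(reviews, phrases):
    """Return {phrase: percentage of reviews that mention it at least once}.

    Matching is a case-insensitive substring test, which is an assumption;
    the post does not say how mentions were detected.
    """
    lowered = [review.lower() for review in reviews]
    incidence = {}
    for phrase in phrases:
        needle = phrase.lower()
        hits = sum(1 for text in lowered if needle in text)
        incidence[phrase] = 100.0 * hits / len(lowered) if lowered else 0.0
    return incidence


# Toy data, invented for illustration; not the scraped Yelp reviews.
amsterdam_reviews = [
    "The rice table was incredible.",
    "Great bitterballen and cold beer.",
    "We ordered the rice table again.",
    "Nice view of the canal.",
]
seattle_reviews = [
    "Loved the rice bowl at our table.",
    "Crisp fries and a great burger.",
]
amsterdam_phrases = ["rice table", "bitterballen"]
seattle_phrases = ["crisp"]

# Pool the key phrases from both cities into one list of candidates.
phrases = sorted(set(amsterdam_phrases) | set(seattle_phrases))

amsterdam_pct = phrase_incidence(amsterdam_reviews, phrases)
seattle_pct = phrase_incidence(seattle_reviews, phrases)

# Keep phrases mentioned in at least 0.5% of reviews in at least one city.
shown = [p for p in phrases if max(amsterdam_pct[p], seattle_pct[p]) >= 0.5]
```

Note that the Seattle review containing both "rice" and "table" does not count as a mention of "rice table", which is exactly the distinction that single-word counts would miss.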

Wow, lots of stuff that you can find in Amsterdam or Seattle. (All key phrases that appear in 0.5% or more of reviews, in Amsterdam or Seattle.)

Ok, so that one’s a little overwhelming. I didn’t get a lot out of that one. I mean, feel free to mouse over it and look at the prevalence of each phrase in each city. Apparently Seattle loves the word “crisp”? But basically, this was too much data. Let’s filter it a little more.

All key phrases that appear in 2% or more of reviews, in Amsterdam or Seattle.

This is getting a little more reasonable. We can see individual phrases, and we can see that some things appear much more in one city or the other. Looks like “AMS” is pretty much just showing up in Amsterdam, not so much in Seattle; that’s the Amsterdam airport code, so that makes sense. But we’re getting a lot of phrases that actually have about the same incidence in both cities, and that’s not what I was seeking, so let’s try again.

Difference in incidence rates in Amsterdam and Seattle. (All key phrases where difference is about 1 percentage point or greater.)

Here we go. Now we’re charting the difference in incidence across the two cities. Above the line, the phrase appears more in Amsterdam reviews. Below the line, the phrase appears more in Seattle reviews. We’re seeing some good stuff here. Looks like Seattle likes “jus” - always thought that was weirdly prevalent here. “Ave” gets more play in Seattle. (Hello, The Ave! Hello, UW friends!) But it’s clearly time to zoom in on what we’re really seeking here: things that are much more prevalent in Amsterdam than in Seattle.
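The difference chart boils down to one subtraction per phrase. A rough Python sketch, assuming the per-city percentages are already stored in dictionaries; the function name, the exact 1.0 cutoff, and the numbers below are made up for illustration:

```python
def incidence_gap(amsterdam_pct, seattle_pct, min_gap=1.0):
    """Return {phrase: Amsterdam % minus Seattle %} for phrases whose
    absolute gap is at least min_gap percentage points.

    Positive values lean Amsterdam, negative values lean Seattle.
    A phrase missing from one city's dictionary is treated as 0%.
    """
    gaps = {}
    for phrase in amsterdam_pct.keys() | seattle_pct.keys():
        gap = amsterdam_pct.get(phrase, 0.0) - seattle_pct.get(phrase, 0.0)
        if abs(gap) >= min_gap:
            gaps[phrase] = gap
    return gaps


# Hypothetical percentages, not the real Yelp numbers.
amsterdam_pct = {"rice table": 1.5, "jus": 0.25, "happy hour": 2.0}
seattle_pct = {"rice table": 0.0, "jus": 1.5, "happy hour": 2.5}

gaps = incidence_gap(amsterdam_pct, seattle_pct)
```

In this toy example "happy hour" drops out because its gap is only half a percentage point, while "rice table" lands above the line and "jus" below it.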

Difference in incidence rates in Amsterdam and Seattle. (All key phrases where Seattle incidence is minuscule and Amsterdam incidence is at least 0.3% of reviews.)
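As a sketch of that last filter in Python: the 0.3% Amsterdam floor comes from the caption, but "minuscule" is never given a number, so the 0.05% Seattle ceiling below is purely an assumption, as are the function name and the toy percentages.

```python
def amsterdam_specialties(amsterdam_pct, seattle_pct,
                          min_amsterdam=0.3, max_seattle=0.05):
    """Return phrases common in Amsterdam reviews and nearly absent in
    Seattle reviews, sorted by Amsterdam incidence (highest first).

    max_seattle is an assumed stand-in for "minuscule".
    """
    picks = [
        phrase
        for phrase, pct in amsterdam_pct.items()
        if pct >= min_amsterdam and seattle_pct.get(phrase, 0.0) <= max_seattle
    ]
    return sorted(picks, key=lambda phrase: amsterdam_pct[phrase], reverse=True)


# Hypothetical percentages, not the real Yelp numbers.
amsterdam_pct = {
    "bitterballen": 0.75,
    "rice table": 0.5,
    "happy hour": 2.0,
    "stroopwafel": 0.125,
}
seattle_pct = {"bitterballen": 0.0, "rice table": 0.0, "happy hour": 2.5}

specialties = amsterdam_specialties(amsterdam_pct, seattle_pct)
```

Here "happy hour" is excluded because Seattle mentions it plenty, and "stroopwafel" is excluded because it falls below the Amsterdam floor.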

Here, at last, the key recommendations! Looks like we want to seek out: Euros (ok, we’ll need that to buy the food, fine); Dutch food (I mean sure, but that’s very vague, let’s get some deets); bitterballen (Dutch meatballs! yes!); canals (ok, not food, but I’m on board); Red Light District (…); Amstel (the canal? the beer?); Central Station (yeah, trains can get me to food!); rice table (a huge variety of Indonesian dishes in one meal! yes please!); rijsttafel (the Dutch word for rice table); Vondelpark (a very nice park, and I’m interested, but sadly not food); Leidseplein (hm, again a place, not food); poffertjes (traditional Dutch mini pancakes - sign me up!); Dam Square (not food, but yeah, you should probably go); Rembrandtplein (another good place to go; I guess you can eat on your way there or back?). Overall, definitely a bunch of stuff that is available in Amsterdam and not in Seattle! Success! Although I conclude that Yelp is not using its featured items to highlight only specialty dishes. Still, I’m not going to complain about being steered to the Rembrandtplein or the Vondelpark! So there you have it - a couple quick demonstrations, a few scraper runs, and you too can discover the local specialties at your next destination!

Enjoying a delightful rijsttafel.
Epilogue. I actually did follow all of my data's food suggestions, and they were very satisfying. I was particularly a fan of the rice table/rijsttafel, which I tried at a number of different restaurants. (See photo! So many dishes!) The bitterballen were a tasty bar snack with their fancy gravy. The mini pancakes were very frivolous and very indulgent and generally a treat. In short, data-based trip planning is a great albeit silly success!