A couple years ago, OOPSLA was in Amsterdam, and I was closing out a talk about Ringer, the web record and replay tool we’d built. I wanted to show off the higher-level tools we can build on top of Ringer, and I’d started work on Helena, which uses Ringer as a building block, so I planned to demo Helena. I decided to kill two birds with one stone - I’d use Helena to scrape data that would tell me what to do on a visit to Amsterdam.
I live in Seattle, so in particular I was looking for things I could do in Amsterdam that aren’t accessible in Seattle. And I really like food, so if I’m being honest, I was mostly interested in what I could eat in Amsterdam that I can’t eat in Seattle. So, step 1: collect reviews for all the restaurants in Seattle and Amsterdam. I fired up Helena and collected the reviews. See the GIF below for a look at that process:
I wanted to look for things that appear more often in the Amsterdam data than in the Seattle data. Looking at the prevalence of each word would be ok, but I figured that might obscure some interesting patterns, since some dishes would be multi-word strings. (For example, “rice table” turned out to be an interesting Dutch meal, and the string “rice table” appeared much more often in Amsterdam than Seattle, but “rice” and “table” were both about evenly represented in the two cities.) I didn’t want to just try every n-gram up to a given size since that sounded pretty slow. So I figured I’d use the fact that Yelp already makes an effort to highlight notable features about the restaurants on its platform. For instance, here are Yelp’s featured reviews for one of the Amsterdam restaurants:
Obviously this is a bit of a mixed bag. The featured reviews highlight two items - “rice table” and “coconut ice cream” - that definitely look like the kind of thing we want. But the third highlighted item, “different dishes,” seems a little vague. Still, this seemed good enough for my very casual purposes. So, step 2: collect the featured items from the featured reviews for all Seattle and Amsterdam restaurants. Here’s a GIF showing how I used Helena to collect these key phrases:
So now we have four datasets: reviews of Seattle restaurants, reviews of Amsterdam restaurants, key phrases from Seattle restaurants, and key phrases from Amsterdam restaurants. I pooled the key phrases from the two cities to make one master list of interesting phrases. Next, I calculated how many reviews from each city used each interesting phrase. That gives us the data below, showing the percentage of reviews by city that mention each phrase. This chart includes every phrase that shows up in 0.5% or more of the reviews in at least one of the cities.
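If you want to try the counting step yourself, here is a minimal Python sketch. It is not the code behind the charts in this post. The function names, the toy reviews, and the choice of case-insensitive substring matching are all assumptions made for illustration, and the real matching rule may well have been different.

```python
from collections import Counter


def phrase_percentages(reviews, phrases):
    """Return {phrase: percentage of reviews that mention it at least once}.

    Assumption: a "mention" is a case-insensitive substring match, so a
    short phrase like "jus" would also match inside "just".
    """
    if not reviews:
        return {phrase: 0.0 for phrase in phrases}
    counts = Counter()
    for review in reviews:
        text = review.lower()
        for phrase in phrases:
            if phrase.lower() in text:
                counts[phrase] += 1
    return {p: 100.0 * counts[p] / len(reviews) for p in phrases}


def frequent_phrases(seattle_pct, amsterdam_pct, threshold=0.5):
    """Keep phrases that reach the threshold (in percent) in at least one city."""
    return sorted(
        p for p in seattle_pct
        if max(seattle_pct[p], amsterdam_pct[p]) >= threshold
    )


# Toy data, invented for this example only.
seattle_reviews = [
    "Great french dip with jus.",
    "The rice was fine and the table was wobbly.",
    "Crisp fries on the Ave.",
    "Nice patio.",
]
amsterdam_reviews = [
    "The rice table was amazing.",
    "Bitterballen by the canals.",
    "Rice table again, plus poffertjes.",
    "Lovely view.",
]

# Pool the key phrases from both cities into one master list.
seattle_phrases = ["jus", "crisp"]
amsterdam_phrases = ["rice table", "bitterballen"]
phrases = sorted(set(seattle_phrases) | set(amsterdam_phrases))

seattle_pct = phrase_percentages(seattle_reviews, phrases)
amsterdam_pct = phrase_percentages(amsterdam_reviews, phrases)
kept = frequent_phrases(seattle_pct, amsterdam_pct, threshold=0.5)
```

Note that the second Seattle review contains both "rice" and "table" but does not count as a mention of "rice table", which is the reason for matching whole phrases rather than single words.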
Ok, so that one’s a little overwhelming. I didn’t get a lot out of it. I mean, feel free to mouse over it and look at the prevalence of each phrase in each city. Apparently Seattle loves the word “crisp”? But basically, this was too much data. Let’s filter it a little more.
This is getting a little more reasonable. We can see individual phrases, and we can see that some things appear much more in one city or the other. Looks like “AMS” is pretty much just showing up in Amsterdam, not so much in Seattle; that’s the Amsterdam airport code, so that makes sense. But we’re getting a lot of phrases that actually have about the same incidence in both cities, and that’s not what I was seeking, so let’s try again.
Here we go. Now we’re charting the difference in incidence across the two cities. Above the line, the phrase appears more in Amsterdam reviews. Below the line, the phrase appears more in Seattle reviews. We’re seeing some good stuff here. Looks like Seattle likes “jus” - always thought that was weirdly prevalent here. “Ave” gets more play in Seattle. (Hello, The Ave! Hello, UW friends!) But it’s clearly time to zoom in on what we’re really seeking here: things that are much more prevalent in Amsterdam than in Seattle.
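For the curious, the difference in incidence is just one subtraction per phrase. Here is a small Python sketch, assuming you already have per-city percentages stored in dictionaries. The function names and the numbers are made up for illustration and are not the values from the chart.

```python
def incidence_difference(amsterdam_pct, seattle_pct):
    """Amsterdam percentage minus Seattle percentage for every phrase.

    Positive values mean the phrase is more common in Amsterdam reviews,
    negative values mean it is more common in Seattle reviews.
    """
    phrases = set(amsterdam_pct) | set(seattle_pct)
    return {
        p: amsterdam_pct.get(p, 0.0) - seattle_pct.get(p, 0.0)
        for p in phrases
    }


def rank_by_difference(diff):
    """Sort phrases from most Amsterdam-heavy to most Seattle-heavy."""
    return sorted(diff.items(), key=lambda item: item[1], reverse=True)


# Hypothetical percentages, not the real data from the post.
amsterdam_pct = {"rice table": 2.0, "jus": 0.25, "crisp": 1.0}
seattle_pct = {"rice table": 0.0, "jus": 1.5, "crisp": 1.0}

diff = incidence_difference(amsterdam_pct, seattle_pct)
ranked = rank_by_difference(diff)
```

The top of the ranked list holds the Amsterdam-heavy phrases and the bottom holds the Seattle-heavy ones, while a phrase like "crisp" in this toy data lands at zero because it is equally common in both cities.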
Here, at last, the key recommendations! Looks like we want to seek out: Euros (ok, we’ll need that to buy the food, fine); Dutch food (I mean sure, but that’s very vague, let’s get some deets); bitterballen (Dutch meatballs! yes!); canals (ok, not food, but I’m on board); Red Light District (…); Amstel (the canal? the beer?); Central Station (yeah, trains can get me to food!); rice table (a huge variety of Indonesian dishes in one meal! yes please!); rijsttafel (the Dutch word for rice table); Vondelpark (a very nice park, and I’m interested, but sadly not food); Leidseplein (hm, again a place, not food); poffertjes (traditional Dutch mini pancakes - sign me up!); Dam Square (not food, but yeah, you should probably go); Rembrandtplein (another good place to go; I guess you can eat on your way there or back?). Overall, definitely a bunch of stuff that is available in Amsterdam and not in Seattle! Success! Although I conclude that Yelp is not using its featured items to highlight only specialty dishes. Still, I’m not going to complain about being steered to the Rembrandtplein or the Vondelpark! So there you have it - a couple quick demonstrations, a few scraper runs, and you too can discover the local specialties at your next destination!