JungleDragon specie engine, the basic UI

FERDY CHRISTANT - JAN 14, 2012 (02:41:13 PM)

In the last few updates concerning JungleDragon, I mentioned how I'm working on the specie engine, the part that integrates specie information from Wikipedia with JungleDragon photos. None of this is live yet, so you can't see it. There wasn't any development UI to demonstrate either; it was just me complaining about how tedious it is to get structured data out of Wikipedia.

That is still true, and my struggles in that area continue, but I do want to share some first UI work of the specie engine. The scenario is simple: you have uploaded a photo and are asked to identify the specie on the photo. For that there is an "Add specie" button, which brings up this dialog:

Since multiple species can appear on a single photo, you can add more than one specie, but you add them one by one. As the dialog states, you can search both by common name (e.g. "Polar bear") and by Latin name (e.g. "Ursus maritimus").

As you type a specie name, the list helps you with suggestions. These suggestions cover species known to JungleDragon, meaning they have been used before. I do not have a database with all species. Instead, when you add a specie not known to JungleDragon, it becomes a known specie from that point on.
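As a rough illustration of that suggestion behavior (not the actual JungleDragon code), here is a minimal Python sketch. The `KnownSpecies` class, its method names, and the prefix-matching rule are all assumptions made for the example; the real engine may match differently.

```python
class KnownSpecies:
    """In-memory stand-in (hypothetical) for the species already used on the site."""

    def __init__(self):
        # Maps a lowercase search key (common or Latin name) to the display name.
        self._names = {}

    def add(self, common_name, latin_name):
        """Register a specie so both of its names become searchable."""
        self._names[common_name.lower()] = common_name
        self._names[latin_name.lower()] = common_name

    def suggest(self, typed, limit=10):
        """Return display names whose common or Latin name starts with `typed`."""
        prefix = typed.strip().lower()
        if not prefix:
            return []
        matches = {
            display
            for key, display in self._names.items()
            if key.startswith(prefix)
        }
        return sorted(matches)[:limit]


known = KnownSpecies()
known.add("Polar bear", "Ursus maritimus")
known.add("Brown bear", "Ursus arctos")
print(known.suggest("pol"))    # search by common name
print(known.suggest("ursus"))  # search by Latin name
```

Only species that were added earlier can be suggested, which mirrors the idea that the list grows as people use it.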

What makes a valid specie? Here are the current rules:

  • There must be an English Wikipedia page for your query, or a redirect to such a page
  • That page in particular must be a specie page, meaning:
    • It has the "taxobox" on the right
    • It has to be a specie or a subspecie, meaning it has either the "binomial" or "trinomial" name property. For example, "Bear" is not a specie, but "Brown bear" is.
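The rules above could be checked against a page's raw wikitext roughly like this. This is a sketch under assumptions: it takes the wikitext as a plain string, looks for a classic `{{Taxobox ...}}` template, and treats a non-empty `binomial` or `trinomial` parameter as the deciding property. The function name and regular expressions are illustrative, not the engine's real implementation.

```python
import re

# Opening of a classic taxobox template, e.g. "{{Taxobox" or "{{ taxobox".
TAXOBOX_START = re.compile(r"\{\{\s*taxobox\b", re.IGNORECASE)

# A non-empty "| binomial = ..." or "| trinomial = ..." parameter on its own line.
# "[ \t]" is used instead of "\s" so an empty value cannot spill onto the next line.
NAME_PARAM = re.compile(
    r"^[ \t]*\|[ \t]*(binomial|trinomial)[ \t]*=[ \t]*\S.*$",
    re.IGNORECASE | re.MULTILINE,
)


def classify_page(wikitext):
    """Return 'specie', 'subspecie', or None if the page does not qualify."""
    if not TAXOBOX_START.search(wikitext):
        return None  # no taxobox at all
    found = {m.group(1).lower() for m in NAME_PARAM.finditer(wikitext)}
    if "trinomial" in found:
        return "subspecie"
    if "binomial" in found:
        return "specie"
    return None  # taxobox present, but for a higher rank such as a family


brown_bear = "{{Taxobox\n| name = Brown bear\n| binomial = ''Ursus arctos''\n}}"
bear = "{{Taxobox\n| name = Bear\n| familia = Ursidae\n}}"
print(classify_page(brown_bear))
print(classify_page(bear))
```

The sample strings are hand-written stand-ins, not copies of the real Wikipedia pages.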

Ok, given that you entered a valid specie name, one of two things will happen:

  • If the specie is known to JungleDragon already, it is instantly associated with a photo.
  • If the specie is not known to JungleDragon, yet it is a valid specie, I will parse it from Wikipedia in real-time, which takes a few seconds. A loading indicator will make this clear. From that point on, it is a known specie to JungleDragon.
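The two outcomes above amount to a lookup with a slow fallback. A minimal sketch, assuming species are stored in a dictionary keyed by lowercase name and that the Wikipedia parser is passed in as a function (`add_specie`, `known`, and `parse_from_wikipedia` are hypothetical names):

```python
def add_specie(query, photo_species, known, parse_from_wikipedia):
    """Associate `query` with a photo, parsing it first if it is not yet known.

    photo_species        -- list of specie records already attached to the photo
    known                -- dict of previously parsed species, keyed by lowercase name
    parse_from_wikipedia -- callable returning a record dict, or None if invalid
    """
    key = query.strip().lower()
    record = known.get(key)
    if record is None:
        # Slow path: the real-time parse, during which the UI shows a loading indicator.
        record = parse_from_wikipedia(query)
        if record is None:
            return False  # not a valid specie; nothing is stored
        known[key] = record  # a known specie from this point on
    if record not in photo_species:
        photo_species.append(record)
    return True


def fake_parser(query):
    """Stand-in for the real parser: only recognizes 'Impala'."""
    if query.strip().lower() == "impala":
        return {"common_name": "Impala"}
    return None


known = {}
first_photo, second_photo = [], []
add_specie("Impala", first_photo, known, fake_parser)   # parsed, then stored
add_specie("impala", second_photo, known, fake_parser)  # served from `known`
```

The second call never reaches the parser, which is the point of remembering each specie after its first use.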

So, that's how the "Add specie" dialog works. It's how you identify a specie on a photo. Once I know that relationship, I can visualize rich specie information next to such a photo. Here's a very early preview:

Check out the sidebar on the right. This photo of an Impala has been associated with the specie Impala, and as a result, it shows the common name, binomial name, description, and range map. 

Be aware that this is just a simple start. I have a lot more data about the specie and I can also visualize it any way I like. Take note of the concept though. This is what JungleDragon v2 is all about: instantly learning about what is on the photo. And of course, later on you can click through on the specie name, which will show a full page with everything there is to know about it.

Wiki parsing engine updates

I need to reserve some room in this post once again for self-pity. To complain about parsing Wikipedia. The overall complaint is that each time I extend my test set of specie queries, I find new problems, new ways in which Wikipedia pages are structured that my engine cannot handle yet. It's one step forward, two steps back. Here are two recent situations:

  • I've been relying on the taxobox on a specie page to parse the species' taxonomy. Finally I had my engine robust enough to deal with the unlimited ways in which that taxobox can be structured: levels in the taxonomy can be there or not, the number of levels varies, the spelling of levels varies, and the value of a level can be plain text or contain any Wiki markup. Then I discovered yesterday that some specie pages do not use a taxobox at all. They use an "automatic" taxobox, a complex variation based on specie keys.
  • Another problem as a result of testing is this. Say you'd have a photo of an African Elephant. In the "Add specie" dialog the instructions are clear: "Elephant" will not work as it is not a specie, and thereby not specific enough. So you try "African Elephant". You'd expect this to be specific enough, unless you're a zoologist. See, in this case, even "African Elephant" is not enough. It's not a specie, instead "African Bush Elephant" is the correct specie, according to science and according to Wikipedia. But probably not according to you. In these situations, I have therefore implemented a routing mechanism. It's a manual table that I maintain in which I map a source, in this case a commonly failing yet well-intended query, into a valid target specie. So if you'd type "African Elephant", I will map it to "African Bush Elephant" for you. Over time I'm hoping this table will give you a better chance at the result you expect.
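The routing mechanism in the second item can be pictured as a plain lookup table consulted before anything else. A minimal sketch, assuming the table is a dictionary keyed by the normalized query; `ROUTES` and `route_query` are hypothetical names, and the only entry shown is the one mentioned above:

```python
# Hypothetical routing table: commonly failing query -> valid target specie.
ROUTES = {
    "african elephant": "African Bush Elephant",
}


def route_query(query, routes=ROUTES):
    """Return the mapped target specie for `query`, or the query unchanged."""
    cleaned = " ".join(query.split())  # trim and collapse repeated whitespace
    return routes.get(cleaned.lower(), cleaned)


print(route_query("African Elephant"))
print(route_query("Impala"))
```

Queries without an entry pass through untouched, so the table only needs rows for the cases that actually fail.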

Without a doubt, there will be dozens more problems coming my way. But I will persist through them, because I dearly believe in the concept. No matter what it takes, this will get done, and it will be done right.
