Alexa Skills
Got to try developing a demo Alexa Skill recently. That was... interesting.
Firstly, my experience is tainted by needing to communicate with an existing service which doesn't support OAuth and in some cases requires support for Windows Auth, etc...
In other words, our endpoint isn't exactly what Amazon had in mind. Also, we don't really have a great answer to how this would roll out to a customer, given that each customer might potentially have a different endpoint. But, proof of concept!
In the end, I went with JavaScript as my development language. The servers of ours that Alexa can actually talk to aren't configured with Windows Auth, so with jsdom and by hacking XMLHttpRequest into jQuery I was able to make things work.
Alexa skills seem to be a lot like a state machine, which is a straightforward thing once you know how/when/why things are activated. At the same time... it is really F***ed up. Reason being, JavaScript is a scripting language. It wasn't really built as a thing to supply states and state handlers to a state machine. Neither was C#, and the same likely goes for the other supported languages. On top of that, it seems like most of the time it starts out in the prior state, but sometimes it starts in the initial state. Which makes for some fun when you realize that intents meant for one state are coming in on another and you need to determine how to handle them.
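One way to cope with intents showing up in the "wrong" state is to keep the handlers in a plain table keyed by state and make the fallback explicit. This is only a minimal sketch of that idea, not how any Alexa SDK does it: the state names, intent names, the `handlers` and `anyState` tables, and the `dispatch` function are all made up for illustration, and the current state is simply passed in as an argument (where it gets stored between requests is left out).

```javascript
// Hypothetical states and intents; none of these names come from an Alexa SDK.
const INITIAL_STATE = "START";

// Handlers grouped by the state they are meant for.
const handlers = {
  START: {
    LaunchRequest: () => ({ say: "Which report do you want?", nextState: "PICK_REPORT" }),
  },
  PICK_REPORT: {
    PickReportIntent: (slots) => ({ say: "Opening " + slots.Report, nextState: "START" }),
  },
};

// Fallbacks for intents that can reasonably arrive in any state.
const anyState = {
  "AMAZON.HelpIntent": (slots, state) => ({ say: "You can ask for a report.", nextState: state }),
};

function dispatch(state, intentName, slots) {
  // An unknown state is treated as the initial state.
  const current = handlers[state] ? state : INITIAL_STATE;
  const handler = handlers[current][intentName] || anyState[intentName];
  if (!handler) {
    // Intent meant for another state: say so and reset rather than guess.
    return { say: "Sorry, I can't do that right now.", nextState: INITIAL_STATE };
  }
  return handler(slots || {}, current);
}

console.log(dispatch("START", "PickReportIntent", { Report: "sales" }));
```

The point of the explicit fallback branch is that an intent arriving in a state that has no handler for it gets one predictable answer instead of whatever the wrong handler would have done.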
On top of this... it is one of the worst documented things I have ever seen and one of the worst laid out.
I went the route of building an AWS Lambda function.
I will tell you this right now: there is no official documentation that takes you from start to finish on even a simple Hello World app for this, in any supported language. There are, thankfully, unofficial guides out there. What is documented are pieces of code and links to complete code samples for the body of a Lambda function in a single language. Omitted from the guide and code are:
- A link to the structure of the intents "file"
- A link to the structure of the utterances "file"
- An easily locatable reference on the different parameter types in an utterance and how to use them
- Where to upload the lambda function
- How to tie the lambda function to the Skill
- Where to find logs when stuff fails
- Clear indication that N. Virginia is the only region which supports the free tier
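For what it's worth, here is roughly what the first two items look like as far as I can tell: the intent schema is a JSON document listing intents and their slots, and the sample utterances are plain lines that start with an intent name and mark slots with curly braces. Treat the shapes below as an assumption to check against the current documentation. The intent and slot names are made up, and `checkUtterances` is just a hypothetical helper to show how the two pieces relate.

```javascript
// Illustrative intent schema; the intent and slot names are made up.
const intentSchema = {
  intents: [
    { intent: "OrderStatusIntent", slots: [{ name: "OrderNumber", type: "AMAZON.NUMBER" }] },
    { intent: "AMAZON.HelpIntent" },
  ],
};

// Illustrative sample utterances: intent name first, then the spoken phrase.
const sampleUtterances = [
  "OrderStatusIntent what is the status of order {OrderNumber}",
  "OrderStatusIntent check order {OrderNumber}",
  "OrderStatusIntent where is {Order}", // deliberate mistake: slot is not declared
];

// Returns a list of problems: unknown intents, or slots missing from the schema.
function checkUtterances(schema, utterances) {
  const slotsByIntent = new Map(
    schema.intents.map((i) => [i.intent, new Set((i.slots || []).map((s) => s.name))])
  );
  const problems = [];
  utterances.forEach((line, index) => {
    const intentName = line.split(/\s+/)[0];
    const declared = slotsByIntent.get(intentName);
    if (!declared) {
      problems.push("line " + (index + 1) + ": unknown intent " + intentName);
      return;
    }
    for (const match of line.matchAll(/\{([^}]+)\}/g)) {
      if (!declared.has(match[1])) {
        problems.push("line " + (index + 1) + ": undeclared slot " + match[1]);
      }
    }
  });
  return problems;
}

console.log(checkUtterances(intentSchema, sampleUtterances));
```

Run against the sample data, the helper flags only the third utterance, which uses a slot the schema never declares.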
I'm sure I missed several things. And don't get me wrong. I'm not saying doc on those individual things does not exist. It does. Just not in one place. And not in the place you're taken to when you click on the guide to help build an Alexa Skill. And not in a place you'll find by blindly stumbling around. I ONLY found everything due to someone else's personal blog on building an Alexa skill.
Also clearly lacking: a well-documented 1st party test harness/environment/IDE/whatever.
Again, it isn't that such stuff doesn't exist. I found a couple of things that would load your skill and allow you to send simulated requests to test your function without uploading. But none of them are 1st party or linked from the guides.
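The kind of check those tools do can be approximated by hand: call the handler with a fake request and look at what comes back. A minimal sketch, assuming the callback-style Node.js Lambda signature `handler(event, context, callback)` and a heavily trimmed request and response shape. The `HelloIntent` name and the `fakeIntentRequest` helper are hypothetical, and a real request carries many more fields than this.

```javascript
// In a real Lambda this function would be assigned to exports.handler.
function handler(event, context, callback) {
  const request = event.request;
  let text = "Sorry, I didn't get that.";
  if (request.type === "LaunchRequest") {
    text = "Welcome.";
  } else if (request.type === "IntentRequest" && request.intent.name === "HelloIntent") {
    text = "Hello world.";
  }
  callback(null, {
    version: "1.0",
    response: {
      outputSpeech: { type: "PlainText", text: text },
      shouldEndSession: true,
    },
  });
}

// Builds a cut-down IntentRequest event; real events have more fields.
function fakeIntentRequest(intentName, slots) {
  return {
    version: "1.0",
    session: { new: true, attributes: {} },
    request: { type: "IntentRequest", intent: { name: intentName, slots: slots || {} } },
  };
}

// Local "test run": no upload, no device, just a function call.
handler(fakeIntentRequest("HelloIntent"), {}, (err, result) => {
  console.log(result.response.outputSpeech.text);
});
```

It is nowhere near a real test environment, but it is enough to catch a handler that blows up or answers the wrong intent before going through an upload.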
I don't want to say it would be impossible to figure this out from the guides. But it would certainly be incredibly difficult.
All of that said. It is still REALLY cool when you ask a puck on your desk to interact with your application and it just works.
Much of my griping ignores the fact of course that my uses for this device don't really align with Amazon's intentions for it.
That being said... here are a few things Amazon could do better as far as Lambda functions are concerned:
1. Guide should be in the developer portal. In fact, it should be the wizard for configuring the skill
2. Links to the formats for utterances, intents, and lists and all types should be exposed as you configure... note: some of this IS present when configuring the skill.
3. I should be able to upload my Lambda function code directly while configuring my skill
4. The upload should be able to parse my uploaded zip and extract the utterance and intent definitions (either by expecting them in predefined files or via a manifest of some sort)
   - Without this there are extra manual deployment steps, and frustrations around versioning
   - Should also be able to upload custom lists in this fashion
5. Would be nice to modify lists in code or at least be able to programmatically update them
   - Lists improve voice recognition, and being able to limit them to values relevant to a particular user could improve accuracy drastically
   - Same goes for intents and utterances
   - Being able to alter these lists dynamically would allow devs to tailor more complex apps to what makes sense on a case by case basis.
6. A pre-configured sample app including a local test harness
   - Even if it is just for one of the supported languages, it would improve things 100 fold
7. Ability to run apps on the local network
8. Ability to prompt for information via cards
The above is a long list. But I'll concede the latter points are more aimed at line-of-business (LoB) or larger and more complex apps (aside from 7 and 8). #2 would result in an immediate and direct improvement.
#1 and 2 are huge problems, especially for Lambda skills. You need to write the Lambda before you can link it to your Alexa Skill, but the definition of the utterances and intents lives with the skill itself. So, if you don't have prior experience, you write up this Lambda function, confused to hell as to how it all actually works. Then, when you finally get back to configuring the skill, the utterance and intent definitions shine some light on it... but then you might find you designed your Lambda function in a piss poor way. It is all back and forth and SUCKS royally.
And if you ever need to change intents, you always have to do so in three different places (well, only 2 if you focus on Amazon sites): you have to change your code, upload the Lambda in one place and then reconfigure the skill itself in another.
And now you can see why 2 and 3 are big items for me. If the intents and utterances were files I uploaded along with my Lambda, I could work on them in my local code along with the Lambda function. If I could then also upload it all through the same portal I use to configure the skill, I would just need one FUCKING dev portal AND it would all actually make sense.
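Until something like that exists, one workaround is to keep a single definition next to the Lambda code and generate both pieces of portal text from it, so a change to an intent is made in one file. This is a rough sketch of that idea and not a feature of any Amazon tool: the `skillDefinition` layout and the two `build...` functions are invented here, and the output formats are the JSON intent schema and line-based sample utterances as I understand them.

```javascript
// Hypothetical single source of truth kept alongside the Lambda code.
const skillDefinition = [
  {
    intent: "OrderStatusIntent",
    slots: { OrderNumber: "AMAZON.NUMBER" },
    utterances: ["what is the status of order {OrderNumber}", "check order {OrderNumber}"],
  },
  { intent: "AMAZON.HelpIntent", slots: {}, utterances: [] },
];

// Produces the JSON text for the intent schema.
function buildIntentSchema(definition) {
  const intents = definition.map((d) => {
    const slots = Object.keys(d.slots).map((name) => ({ name: name, type: d.slots[name] }));
    // Intents without slots are emitted without a "slots" key.
    return slots.length > 0 ? { intent: d.intent, slots: slots } : { intent: d.intent };
  });
  return JSON.stringify({ intents: intents }, null, 2);
}

// Produces the sample utterances text, one "IntentName phrase" per line.
function buildUtterances(definition) {
  const lines = [];
  definition.forEach((d) => d.utterances.forEach((u) => lines.push(d.intent + " " + u)));
  return lines.join("\n");
}

console.log(buildIntentSchema(skillDefinition));
console.log(buildUtterances(skillDefinition));
```

The generated text still has to be pasted into the skill configuration by hand, so this only removes the "edit it in two places" half of the problem, not the second portal.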
But, for all the rage... I really have nothing against it. But then, I was getting paid a salary while figuring this all out. I just think it is a million times worse than what Microsoft generally delivers and probably a thousand times worse than what Google and Apple typically deliver to devs.