Project in short: Voice as a multimodal input into 3D worlds

(I'm hoping to animate this face according to tone of voice)
Description
To build an environment, either 3D, web-based, or both, that is influenced in an original manner by voice commands; specifically, using a mobile phone to influence 2D or 3D action and models. I intend to build a new model in MAYA, but during the course of the thesis I will also use my prefabricated models from last semester, including a head, a body, a car, and a bar countertop.
Personal Statement
My work in Dynamic Web Development and Ubiquitous Computing for Mobile Devices led me to this attempt to combine VoiceXML/DTMF with Interactive 3D. Although voice-driven menus are dry as can be and irritate the hell out of customers, I'm inspired by using VoiceXML for creative purposes: specifically, navigating a historical map of Northern Africa, or telling a car to travel along a path. Unfortunately, VoiceXML is a hosted solution and the voice engine is proprietary, so I'm hoping to explore other ways of interfacing voice to 3D. Carnegie Mellon University created SPHINX, an open-source engine; time permitting, I'll explore this software, although I don't want to get bogged down in the technical details of creating a viable software build, which is beyond the scope of this thesis. Of note, VXML falls under the "multimodal" umbrella. This is significant because an XML web site can be parsed into VXML, meaning that all web sites could conceivably be "voice enabled", and that authoring for one input device can be inclusive of another.
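As a feasibility sketch, the outline below follows the pattern of Sphinx-4's published Java demos: a ConfigurationManager wires up a recognizer and a microphone, and recognized text comes back as a Result. The configuration file name here is my placeholder, and a real build would need the Sphinx-4 jars and an acoustic model on the classpath; this is just to gauge the scope, not a working build.

    import edu.cmu.sphinx.frontend.util.Microphone;
    import edu.cmu.sphinx.recognizer.Recognizer;
    import edu.cmu.sphinx.result.Result;
    import edu.cmu.sphinx.util.props.ConfigurationManager;

    public class VoiceSketch {
        public static void main(String[] args) {
            // "voice.config.xml" is a placeholder config naming the
            // recognizer, microphone, grammar, and acoustic model
            ConfigurationManager cm = new ConfigurationManager(
                    VoiceSketch.class.getResource("voice.config.xml"));
            Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
            recognizer.allocate();
            Microphone mic = (Microphone) cm.lookup("microphone");
            if (!mic.startRecording()) {
                System.err.println("Cannot start microphone.");
                recognizer.deallocate();
                return;
            }
            while (true) {
                Result result = recognizer.recognize(); // blocks until an utterance ends
                if (result != null) {
                    // e.g. "turn left", "drive forward"
                    System.out.println("Heard: " + result.getBestFinalResultNoFiller());
                }
            }
        }
    }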
Background
W3C specifications for VoiceXML and multimodal devices
CMU Sphinx software
VoiceXML references and articles
PlayStation 2 voice-activated commands
Readings in Interactive Telecommunications and Design for Voice Interface


SOCOM II for the PlayStation 2 features voice input with a USB headset: you can call in additional troops through a contextual menu. I believe the underlying technology is Dragon NaturallySpeaking for headsets; it works well, with a low margin of error.
Audience
The audience for this project falls into several categories: 2D, 3D for the web, and live interactive 3D. For the broadest reach, I'd like to publish a Virtools .cmo for the web; if the project goes as planned, experiencing the 3D would require just a telephone and a browser with the Virtools plug-in. The web audience could also access JavaScripted pages for examples of 2D interactivity.
The second audience is more specific: I envision a real-time demonstration on a PC with an Internet connection and Virtools.
User Scenario
-The user dials in to my VXML provider or turns on a piece of hardware for a live demonstration.
-The dial-in user accesses my account and listens to a voice menu of options regarding voice commands to the web.
-The user points the browser to either a Processing applet or a .cmo file on the web.
-The web applet/.cmo makes a connection to my database (a sketch of this polling follows the list).
-The database stores user "moves" and commands.
-The user speaks to an actor on the web page.
-The user navigates the aforementioned map.
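A rough sketch of the applet side of that loop, assuming the database sits behind a simple web script: the hostname and the /lastCommand endpoint are placeholders for whatever actually fronts the moves table. A Processing applet or the Virtools side would do the equivalent, mapping each new command onto the scene.

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.URL;

    public class CommandPoller {
        public static void main(String[] args) throws Exception {
            String lastSeen = "";
            // example.com/lastCommand is a placeholder endpoint that returns
            // the newest row of the moves database as one line of text
            URL endpoint = new URL("http://example.com/lastCommand");
            while (true) {
                BufferedReader in = new BufferedReader(
                        new InputStreamReader(endpoint.openStream()));
                String command = in.readLine(); // e.g. "drive", "stop", "turn left"
                in.close();
                if (command != null && !command.equals(lastSeen)) {
                    lastSeen = command;
                    // here the applet/.cmo would move the car, face, etc.
                    System.out.println("New voice command: " + command);
                }
                Thread.sleep(1000); // poll once a second
            }
        }
    }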
Implementation
This project is mostly software-based, using front-end and back-end web programming, Virtools scripting, and modeling in MAYA.
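On the back end, the glue could look something like the servlet below: a VoiceXML submit on the hosted platform posts the recognized field to a URL, and the server writes it into the moves table and replies with more VXML so the call can continue. The parameter name, table, and connection string are all assumptions for illustration.

    import java.io.IOException;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import javax.servlet.ServletException;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;

    public class CommandServlet extends HttpServlet {
        protected void doGet(HttpServletRequest req, HttpServletResponse resp)
                throws ServletException, IOException {
            // "command" is the hypothetical VXML field submitted by the gateway
            String command = req.getParameter("command");
            try {
                Class.forName("com.mysql.jdbc.Driver"); // JDBC driver on classpath
                // placeholder connection string and table
                Connection conn = DriverManager.getConnection(
                        "jdbc:mysql://localhost/thesis", "user", "pass");
                PreparedStatement stmt = conn.prepareStatement(
                        "INSERT INTO moves (command) VALUES (?)");
                stmt.setString(1, command);
                stmt.executeUpdate();
                conn.close();
            } catch (Exception e) {
                throw new ServletException(e);
            }
            // answer in VXML so the phone dialog keeps going
            resp.setContentType("text/xml");
            resp.getWriter().println(
                "<?xml version=\"1.0\"?><vxml version=\"2.0\">" +
                "<form><block>Got it.</block></form></vxml>");
        }
    }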
Preliminary Observations
Thus far, I've learned that VXML is powerful but proprietary. Voice engines are a science in and of themselves. VXML as a specification is multimodal, so if you want to access data from the web, it can be formatted for voice too. I know that voice commands can be captured in a web page through Processing or platform-specific JavaScripting, but I really don't know how it's going to work with exported Virtools .cmo's.
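To illustrate what "formatted for voice too" means in practice, here is a toy rendering of the same record for a browser and for a VXML gateway; the place name and the exact markup are illustrative only.

    public class MultimodalRender {
        // same data, two presentations: HTML for a browser...
        static String asHtml(String place) {
            return "<p>Destination: " + place + "</p>";
        }
        // ...and VXML for a voice gateway to speak aloud
        static String asVxml(String place) {
            return "<?xml version=\"1.0\"?><vxml version=\"2.0\"><form><block>"
                 + "<prompt>Destination " + place + "</prompt>"
                 + "</block></form></vxml>";
        }
        public static void main(String[] args) {
            System.out.println(asHtml("Tangier")); // hypothetical map waypoint
            System.out.println(asVxml("Tangier"));
        }
    }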
References
Here is a link to groundbreaking work in 3D by NYU's MRL lab, Ken Perlin's "Responsive Face":
http://mrl.nyu.edu/~perlin/experiments/head/Face.html
The expressiveness of this model is remarkable.
On the voice-recognition side, inspiration for the project's input came from Dennis Crowley's Ubiquitous Computing for Mobile Devices, where I learned how to interface a mobile phone to the web and a database.