Ithaca - a virtual reality piece using VoiceXML into a 3D Space

Found the .cmo

I think I found the source files on my web server (!), give me a buzz if you'd like it and the terms of usage.

Reflections Two Years Later

Unfortunately due to a catastrophic drive failure I don't think I can recover my original Virtools Source Files for this project. That's OK. I'd like to redo it some day anyway with alternative tools.

I have some further thoughts on what it means. From a technical standpoint, I had a very primitive multi-user 3D environment using an alternative interface. But more importantly, I'm certain that (positive) human contact in virtual environments can have resonance and beauty. I chose particles because of their impermanence and voice as an input because it is one of the primary modes of expression. Celestial metaphors can get cheesy but I really think there's a hell of a lot going on in the skies above and I attempted to replicate that in terms of very basic connection. As with many interactive pieces, it only gets interesting as more participants join and sometimes leave.

I never really did figure out a systematic way to handle cameras. Is one God-like view better than individual ones that bring you closer to the action but cut out most of the picture?

Voice Interface Article from Speech Technology Magazine

Speech Tech Mag March 2008

DUMBO Artist's Studio Show

Excerpts from this work will appear at the DUMBO artist's studio located at 20 Jay St., Brooklyn New York - studio 209 on the second floor, Saturday May 12th. Special thank you to Jordan Schachter for video editing!

What is this?

This is an art project (NYU Interactive Telecommunications) using the VR/interactive 3D software Virtools, IVR and web programming. The purpose is to have a rich user interaction using increasingly mature technologies that are intuitive and scale for web audiences and larger groups. The idea is that phone keys or speech can navigate 3D spaces.

Here is a "2D" web test I made that uses a Java applet so that the Object can move around the screen with IVR input (you can download Java3 quickly here). Would be great for maps or games!

The flower - if you scroll further down - is a pre-recorded simulation of how VoiceXML can control 3D objects.

Please contact me for a demo or to discuss applications.

SIGGRAPH 2006

A good show, not quite as good as last year. There was no ostensible voice to virtual reality work, but a lot of what I saw is begging to be speech enabled.

Here were the "best of show" im(not so)ho:

Pixels in Space?

Truly remarkable

swf

Show, continued user testing

Both pieces I created worked. Comments included:

-using the screen as a storefront where pedestrians could call in and see a clothing line
-more movement options
-lower opacity on the background for more particles to stand out
-laminated instructions were helpful
-DTMF used much more than voice since the room was noisy.

In this photo, someone has just called in on the right.

Visual Interface

Today's demonstration of the visual interface went well.

Z-Buffer

Virtools' Z-buffer (example of usage) may improve the visual context.

Call Volume

Several providers restrict the number of incoming calls. Going ahead, this means that 3 callers maximum should be on the system.

.avi

This is a rendering of the particles that must disappear altogether in favour of a more literal representation that works, i.e. a cube in space.

21.6 MB .avi - Win. only

Herewith a cube; it may work better.

Bad Demo

Everything that could possibly go wrong in the second demo did:

-Poor call routing and timeout
-Difficulty seeing particle systems
-Bad music
-VXML interface problems
-Recognition errors
-Camera motion objectionable

There were some good suggestions:

-Place all movement in a grid so that there is context
-Give the user the chance to select music
-Get rid of extraneous animation and graphics
-Add to initial voice menu
-For future, record user's voice input to display later in VIRTOOLS

I have exactly two weeks to iron out some of these very important issues. As suggested, it's best to trim and focus instead of increasing the scope at this point.

Web Playback

Neither the .vmo nor the .cmo render on a web page. This could be because I've overloaded on particles and there is just too much going on in the .cmo. Here's a screenshot - for web playback, I need to think of something else.

Design Problem

This doesn't look right and the particles get lost. It also doesn't quite go with the egalitarian theme I had in mind. Maybe I can change the opacity on the central figure and scale it somehow so that it isn't as distracting.

Final Design

I've been sketching what the project is finally going to look like. Here is a link to a QuickTime movie of the circles that will contain the particles. Each ring will be the parent of a system of generic particles that frame the particle actors. I think I'll remove the NURBS terrain and restrict it just to particles - the challenge is going to be how to make them stand out... Also this weekend, I have to address sound and how it will move according to the camera. Here is a link to the path camera version I will try to approximate in VIRTOOLS.

User Testing II

I demonstrated the .vmo in SciViz and it went well, for the most part. It worked!

Comments:

-Terrain and environment need to improve, cameras, colors, perhaps even people at the bottom
-It is unclear which particle system represents whom - they need to be tagged visually and in the VoiceXML menu
-Navigation doesn't work properly, especially with a rotating camera and a move left or right command
-The effect is painterly, this could be improved with a background or better proximity to the particles

Server Issues III

I need to readjust the link delay.

Scaling/Shaders

The calls are working, there seems to be some kind of Internet lag or server-side caching. I'm not sure.

Regarding scaling - this piece has to scale for larger audiences. Imposing a limit of 10 is difficult to defend from an aesthetic and theoretical perspective. The reason why I chose 10 was that it would be manageable and an easy modulus value. Hard-coding 10 users has the unfortunate effect of inefficient code - too many ifs, too many redundant server calls. I'm not pleased with this, but need to spend more time on the look of the environment.

There is a phenomenal shader example in the class .cmos. It looks extremely complex, but I'd really like to add liquids or a body of water to the environment.

Telephony problems

2 things:

The provider's "home" URL, or all three of them, are cached for a brief time

My local numbers are not working. There could be several reasons for this - badly formatted and cached VXML, or some other limitation with the numbers. I'll wait before trying the number that is now just linked to a "hello world" as opposed to a page that's mostly PHP.

VXML Bug II

The bugs were all VXML syntax-related. One extra tab, or single quote and the whole page breaks. As far as the busy number, I can't say for 100% sure that the bug caused that.
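A minimal sketch of one way to catch this class of bug before a call ever reaches the page: parse the VXML with Python's standard XML parser and report the first well-formedness error. The function name check_vxml and both sample pages are hypothetical, and this only checks XML well-formedness, so it will not catch anything specific to a provider's VoiceXML interpreter.

```python
import xml.etree.ElementTree as ET


def check_vxml(text):
    """Return None if text is well-formed XML, else a short error message."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        return str(err)
    return None


# Hypothetical pages for illustration only.
good = '<vxml version="2.0"><form><block>Hello</block></form></vxml>'
# The version attribute below is missing its closing quote.
bad = '<vxml version="2.0><form><block>Hello</block></form></vxml>'

print(check_vxml(good))             # no error message for the good page
print(check_vxml(bad) is not None)  # the broken page yields a message
```

Running a check like this on the PHP output (not just on the static template) would show whether a stray quote came from the generated markup.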

I need to test the phone number for volume. Also, wrt redundancy, this is really the ideal set-up:

myServer - provider 1
stage - provider 2
myServer - provider 1
stage - provider 2

For presentation purposes, I need to consolidate the four phone numbers I have into two, otherwise it could be a mess. The key thing over the next three weeks is user testing.

VXML Bug

I'm getting busy signals on my two development phone numbers - either this is a bug in the code or a telephony thing.
Redundancy for the project will be as follows:

my server + provider 1
stage + provider 2
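The redundancy plan above could be sketched as an ordered fallback: try the first pairing, and only if it fails, read from the second. Everything here is an assumption for illustration: the function fetch_with_fallback, the two stand-in fetchers, and the value 8 are hypothetical, and real code would make network requests instead.

```python
def fetch_with_fallback(sources):
    """Try each (name, fetch) pair in order; return (name, value) from the first that works."""
    errors = []
    for name, fetch in sources:
        try:
            return name, fetch()
        except Exception as err:  # any failure means "try the next pairing"
            errors.append(f"{name}: {err}")
    raise RuntimeError("all sources failed: " + "; ".join(errors))


def primary():
    # Stand-in for reading the value via "my server + provider 1".
    raise ConnectionError("busy signal")


def backup():
    # Stand-in for reading the value via "stage + provider 2".
    return 8


name, value = fetch_with_fallback([("my server", primary), ("stage", backup)])
print(name, value)
```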

Starting the terrain

A terrain helps put the particles in context. For the camera, I think I'm going to get rid of the transition and rotate it on the x-axis. For the user movement, I'm planning on using either yaw, pitch, and roll or some combination with up, down, left, and right. The thing is, the movement has to be different from my earlier 2D work. These VXML options really should be more inventive than something hard-coded, as criticized before in class.

Scaled for 10 callers

I have a basic switching mechanism in place. How would this scale for more than 10 users - like 100 or so? My next step is to figure out how to have each move independently of each other while still pulling VXML from the same page.

Database

This database screenshot conveys what I'm trying to do wrt switching telephone calls programmatically. Here is the flow:

Initialize cells from VIRTOOLS
For each page view (i.e. phone call) of menu page, write a database value according to a modulus of the ID
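The flow above can be sketched in a few lines. This is a minimal illustration, not the PHP that ran on the server: the names MAX_SLOTS and slot_for_call are hypothetical, an in-memory dict stands in for the database table, and the limit of 10 is taken from the caller limit discussed elsewhere in these notes.

```python
MAX_SLOTS = 10  # assumed limit, matching the 10-caller design in these notes


def slot_for_call(call_id, max_slots=MAX_SLOTS):
    """Map a running call ID onto one of max_slots particle-system slots."""
    return call_id % max_slots


# A dict stands in for the database table of slot values.
slots = {}
for call_id in range(1, 13):
    # Once call IDs wrap around, a later call overwrites an earlier slot.
    slots[slot_for_call(call_id)] = call_id

print(len(slots))  # number of distinct slots in use
```

Because the slot is computed rather than chosen by a chain of ifs, raising the limit is a one-line change, although callers still collide once more than max_slots are on the line.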

Interface for Scalable Environment

I need to redesign the interface so that users can see their particle system when they call in. The current implementation transitions from the one particle to the other. It's going to be interesting to see in tomorrow's demonstration if the users can make the system work, or if it breaks altogether.

Call Switching from the web back end

I'm trying to use PHPMyAdmin and the modulo operator in PHP to switch calls. When the .vmo boots up, it sends a flag to the server to initialize the particle instances.

Qualitative Comments

As is, the particles look too similar and don't give the user a sense that they differ significantly. Over the next couple of weeks, I'll build and refine a NURBS terrain to house them in and give some sense of perspective.

Session Management in VoiceXML/PHP

I'm trying to figure out how session management will work in the final application. Setting up x amount of phone numbers with the provider is _not_ an intelligent solution to this problem. Essentially, I need to route callers so that their input can write to different database tables, which are in turn pulled in by the Web Get Data building block in VIRTOOLS. Having an exclusive feed into VIRTOOLS gives better interactivity so that the particle systems are autonomous for movement and camera transition. In an ideal world, the number of users could scale up, but there is probably no time for this feature, so I'll set this value at 15 or something.

Here are the remaining technical steps I have till May:

-Find a way to have a unique voice session on one phone number
-Design a new voice interface (the existing hard-coded one was met with disapproval - it basically consists of users saying their names)
-Create new database tables
-Create new web feeds into VIRTOOLS

User Response

My first set of user-testing in scientific visualization went well today - the system worked on the web page and I received some positive feedback.

Critique Notes, in short

The following project was mentioned with respect to my project:

Build your own shoe

This indicates the potential of a news event. Two different effects are at play in this project: voice and visualization. An element of spectacle is present. One way to improve the piece is to add location data to determine how the actors appear.

Phone number and URL

This is a demo that worked OK the last time I checked; press any key to navigate the particles.
You can download the web viewer from Virtools

It's hard to see depth in these particles, but the idea is that any particle emitter coordinate is gettable at any point, or could be driven by IVR menus.

For I S L A N D demo

Call - 1 646 358 4368

http://www.dimitridarras.com/3D/demo13.htm

I S L A N D Screen shots

Here I've taken screen shots from two different actors' perspectives. This piece is predicated on multiple users navigating the system - an attempt to visualize John Donne's meditation XVII "no man is an island".

Visualization

Here is a screenshot of what the visual element will look like. I'm still working on the logic to get the camera to transition between the actors. Looking ahead, here is the technical to-do list:

-Vary the background with a slight alpha value
-Instantiate actors when they call the number
-Get a voice print if possible
-Decide if additional data feeds are necessary
-Test cross browser and platform
-Can anything interesting be done with the web data?
-Compose an original soundtrack in Digital Performer that is spatially located within the piece
-Create redundant DB values on stage with a redundant .cmo

Camera Array

I'm making a camera array (by hand, not programmatic, yet) for the actors.

Pending:

Testing .vmo for performance
Scripting the transition camera to go between actors.

I'm scaling back on the input parameters from VXML for now - there are just too many options available, so for testing purposes I'm limiting them to just one.

NURBS to POLYGONS

This animated gif illustrates an interesting problem. Virtools only uses polygon models, but NURBS are the best choice for organic modeling. I'm noticing with this sketch that converting to polygons increases the triangle count, textures are uneven and soft edges become jagged.

Particles

I am using particles for their painterly effects. They are causing some problems however:

-Mac render crashes
-Using lots of emitters causes crashes
-Moving particles in 3D space is hard to visualize since they have no real 3D presence

Here is a screenshot of the navigation I have that unfortunately doesn't make sense for the interaction. The user scenario I have in mind is for several participants - I'm unclear how the movement is going to be animated though, through cameras, objects, or some combination thereof.

2 Actors

My goal is to figure out how to combine two users - in this conception as moveable particle systems:

Audio, .cmo, and test app. QA

I think I've solved the MP3 problem and am delaying the frame rate so that it doesn't loop within the conditional. The transition camera seems to break after a couple of minutes for some reason. The .vmo works fine once the MP3s are placed in the same dir. However, on the Mac, there is no audio (perhaps this has to do with the .wav BB).

I've asked three friends to test my work over the next couple of weeks - I believe they'll offer feedback on A) if it works and B) if the content therein is any good.

I've been thinking further about the interactivity and am happy to see that VIRTOOLS scales - this means that I could potentially get 15 feeds in at the same time on one phone line. Having a sound track to the .vmo has shed new light on where this project is going.

2 Feeds

I've established two feeds. On the to-do list:

-Create database tables for x amt. of feeds
-Switch between MP3 feed
-Find a reasonable way to route calls (initial menu linking to x amount of dynamic web pages)

Screenshots of basic working mechanism

Herewith the VoiceXML code and screenshots of my skyscraper with transitions up and down depending on what the phone user says. Here is a link to the web version:
http://www.dimitridarras.com/3D/demo01.htm

To interact, call 1 646 502 9002

Hit "1" at the prompt

Say "up" (2) or "down" (8)
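A minimal sketch of how the spoken words and the matching phone keys could be folded into one command table. The words "up" and "down" and the keys 2 and 8 come from the instructions above; the MOVES table, its (dx, dy) convention, and the function parse_command are assumptions for illustration, not the logic in the actual VoiceXML page.

```python
# Hypothetical movement table: each command maps to a (dx, dy) step.
MOVES = {"up": (0, 1), "down": (0, -1)}
# DTMF keys that stand in for the spoken words.
DTMF_TO_WORD = {"2": "up", "8": "down"}


def parse_command(raw):
    """Normalise a spoken word or a key press to a movement, or None if unknown."""
    token = raw.strip().lower()
    word = DTMF_TO_WORD.get(token, token)
    return MOVES.get(word)


print(parse_command("2"))
print(parse_command(" Down "))
print(parse_command("5"))
```

Keeping the key-to-word mapping separate from the movement table means a noisy room can fall back to the keypad without a second code path.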

Sound

I'm attempting to switch sounds based on the server value.

Files Retrieved

Seems like it's going to be OKAY. There was a disk error but my files are retrievable.

Server Crash

I've documented some of my work, getting the transition camera back along with the other BBs will take me into next week.

Polygonal head expression issues II

In VIRTOOLS, it seems like the best strategy to modify my head's facial expressions is to use Morphing. I will post examples of this shortly in my Virtools Blog.

Thesis Statement Expanded - Weekend Tasks

Splitting into groups two weeks ago was really helpful. It helped me expand on my thesis statements, what I'm trying to prove and disprove.

Thesis:

VoiceXML is a more intuitive interface to installations, architectural and data visualization than traditional point and click metaphors.

This just about covers the technical part. I have 8 weeks left to refine the content for this project, really, the heart of it.

This weekend, there are some pending issues in VIRTOOLS:

-Getting the (awesome) transition camera to well, transition, seamlessly on event handler.
-Figuring out IK handles in case I need them (this is part of the assignment for Scientific Visualization)
-Figuring out how to animate a MESH in VIRTOOLS (this is important if I'm going to do the Wizard-of-Oz head thing)
-Placing a torus in a 3D grid and figuring out the Voice interface as to how to do this (it falls off of the page now till I tell it to go "up")
-Getting the web player to work with the VXML

Polygonal head expression issues

I'm trying out the polygon proxy to try to change the fixed expression and am running into some challenges. The way the model is built, the polygons stretch in an unnatural way. The smile is pinched, there are overlapping vertices, some N-sided polygons, and a seam on the rendered head. Here are some screen shots to illustrate what I'm referring to.

Server Issues II

Today all socket connections seem fine. For my thesis, I'd better have a video backup in the event that my server and stage give problems.

Continued Server Problems

I'm having the same problems on stage. I'm not sure if this is a browser or LAN thing. Regardless, I need a more scientific way of testing the connection. The few times I can get it running here at NYU, it craps out.

There's probably a maximum number of times I can poll one page. How many other connections is the server able to handle? What is the frame rate of the VIRTOOLS file? I don't need to poll the server at 60 fps, so I need to slow it down. I do, however, need an asynchronous data flow or something close.
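One way to picture the slowdown is to poll only every Nth frame and reuse the cached value in between. This is a sketch under assumptions: the ThrottledPoller class, the interval of 15 frames, and the stand-in fetch function are all hypothetical, and it throttles rather than making the request truly asynchronous.

```python
class ThrottledPoller:
    """Call fetch() at most once every `every` frames and cache the result."""

    def __init__(self, fetch, every):
        self.fetch = fetch
        self.every = every
        self.frame = 0
        self.value = None
        self.polls = 0

    def tick(self):
        """Advance one frame; poll only on frames 0, every, 2*every, ..."""
        if self.frame % self.every == 0:
            self.value = self.fetch()
            self.polls += 1
        self.frame += 1
        return self.value


# Simulate one second at 60 fps with a stand-in for the server request.
poller = ThrottledPoller(fetch=lambda: 2, every=15)
for _ in range(60):
    poller.tick()
print(poller.polls)  # server requests made during the 60 simulated frames
```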

Here is my development set up:

Windows XP Professional Version 5.1
IE 6.0.2900

VIRTOOLS 3.5

PHPMyAdmin

I've tested this on my Mac (OS X 10.3.9) with Safari and am not having the same problem, so this eliminates the LAN/network as a culprit. Maybe it has something to do with Windows and system-wide caching.

Web Server II

Something's wrong with the PHP files on my server; either that or there are some DB problems. I'm using NYU's stage server instead, since I somehow think it's better than my ISP's. I've scripted PERL pages and output to a PHP page (in Dynamic Web, I couldn't figure out the PERL syntax for a stack). For the demo, the user could watch a 2D version work in tandem with a 3D version - I can't really think of any use for this, maybe it's like watching color instead of B&W.

PHP problems

I'm running into some kind of PHP coding, DB, ISP, or network traffic problem that's preventing me from pumping integers in and out seamlessly. It's important to note, I have no redundancy planned in the event that network traffic slows the whole application down.

Test model

I'm planning on using a test model before creating my final one - the head at the start of this blog. I may deform a mesh instead of using bones - this will probably save a major headache. Animating facial expressions is something I have no experience with - at CADA they teach a whole section on that alone for character animation. I'm not going to spend all of my time on this and will be happy with a smile or a blink - it's more important to concentrate on the final model.

Better Focus

Monday I had a meeting that was greatly beneficial in terms of focusing on the model part of this thesis and learning about multimodal inputs. I'm roughly on target with my schedule but definitely need all of Spring Break to start building! I'm astonished by VIRTOOLS' capabilities - last year, I just didn't get it. Now, a different story altogether.

Schedule in Brief

Herewith a very loose description of tasks ahead. I would like to have some kind of working prototype as a .cmo by the middle of March. I'm going to use the head posted at the beginning of this blog - it seems like a good point of departure, but I'm hoping for a more elaborate final model.

Development Schedule – Draft I – Voice to 3D

Date
Task
Resource

Feb 23 – Feb 28th
Establish Database Connection in Virtools - make this data gettable for animation
Web/Virtools/VXML

March 1st – 6th
Stitch head model and place IK bones for expressions – link DB variable to basic movement
MAYA/Virtools

March 7th – 15th
Scrape gettable VXML vars from log page (needs some kind of automated login)
Web

March 15th – March 30
Plan and start building model and design preliminary voice interface
MAYA, VXML

Meaningful Interaction

There needs to be more to the interaction than just telling an object what to do. I've pretty much concluded that configuring open source is not going to happen, much less a telephone server - that is a different project.

On the upside, my developer environment gives gettable (I think) properties from an IVR interaction.
Here is a screenshot of what their debugging tool looks like. I'm hoping to script access and scrape some of these parameters.

I'm seeing only three parameters of interest ("confidence", "utterance", and "interpretation"); perhaps I can extrapolate more. To get these values, I'm going to have to script a login, scrape the web page, and return the value back to VIRTOOLS. Would be great to see the Hidden Markov Model for the confidence utterance!
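Once those values are scraped, a simple use for them is to ignore low-confidence recognitions. The sketch below assumes the scraped result arrives as a dict with "confidence" (a number between 0 and 1) and "utterance" keys; that shape, the function accept_utterance, and the 0.5 threshold are assumptions for illustration, not what the provider's debugging tool actually returns.

```python
def accept_utterance(result, threshold=0.5):
    """Return the recognised utterance if confidence clears the threshold, else None."""
    if result.get("confidence", 0.0) >= threshold:
        return result.get("utterance")
    return None


# Hypothetical scraped results.
print(accept_utterance({"confidence": 0.82, "utterance": "up"}))
print(accept_utterance({"confidence": 0.31, "utterance": "down"}))
```

A rejected utterance could then re-prompt the caller or fall back to the keypad instead of moving the object on a bad guess.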

Sphinx Comments

The SPHINX documentation is mainly for voice software experts only. CMU has linked SPHINX up to an IVR system.

Here it is:

1-877-CMU-PLAN (1-877-268-7526) or at +1 412 268 1084.

The quality in one quick pass-through is very good.

This work, I believe, would be a massive project in and of itself - from a business perspective companies could use it, I think, free of charge because of SPHINX's licensing.

I need to find a better way to use IVR, delving into open source is not viable unless I can find some kind of authoritative how-to on the subject.

To summarize: my focus is on VIRTOOLS building blocks and interactivity based on database values - not configuring open source speech software.

Introductory Comments

Project in short: Voice as a multimodal input into 3D worlds

(I'm hoping to animate this face according to tone of voice)

Description

To build an environment, either 3D, web based, or both, that is influenced in an original manner by voice commands, specifically, using a mobile phone to influence 2D or 3D action and models. I intend to build a new model in MAYA, but during the course of the thesis will use my pre-fabricated models from last semester including a head, a body, a car, and a bar counter top.

Personal Statement

My work in Dynamic Web Development and Ubiquitous Computing for Mobile Devices led me to this attempt to combine VoiceXML/DTMF with Interactive 3D. Although voice-driven menus are dry as can be and irritate the hell out of customers, I'm inspired by using VoiceXML for creative purposes - specifically, navigating an historical map of Northern Africa, or telling a car to travel along a path. Unfortunately VoiceXML is a hosted solution - the voice engine is proprietary, so I'm hoping to explore other alternatives to interfacing voice to 3D. Carnegie Mellon University created SPHINX, an open-source engine - time permitting, I'll explore this software although I don't want to get bogged down in the technical details of creating a viable software build - that is beyond the scope of this thesis. Of note, VXML falls under the umbrella of "multimodal" data types. This is significant because an XML web site can be parsed into VXML, meaning that all web sites could conceivably be "voice enabled", or authoring for one input device can be inclusive of another.

Background

W3C specifications for VoiceXML, multimodal devices
Sphinx CMU software
VoiceXML references and articles
Playstation II Voice activated commands
Readings in Interactive Telecommunications and Design for Voice Interface

SOCOM II for PlayStation 2 features voice input with a USB headset. Here you can call on additional troops with a contextual menu. I believe the underlying technology is Dragon NaturallySpeaking for headsets - this works well, with a low margin of error.

Audience

The audience for this project falls into several categories - 2D, 3D for the web, and live interactive 3D. For the broadest reach, I'd like to publish a Virtools .cmo for the web. This would just require a telephone and a browser with the Virtools plug-in to experience the 3D, if the project goes as planned. The web audience could also access JavaScripted pages for examples of 2D interactivity.

The second audience is more specific - I envision a real-time demonstration on a PC with Internet connection and Virtools.

User Scenario

-The user would dial to my VXML provider or turn on a piece of hardware for a live demonstration.
-The dial-in user accesses my account and listens to a voice menu of options regarding voice commands to web.
-The user points the browser to either a Processing Applet or .cmo file on the web
-The web applet/.cmo makes a connection to my Database
-The Database stores user "moves" and commands
-User speaks to actor on web page
-User navigates aforementioned map

Implementation

This project is mostly software based, using front-end and backend web programming, Virtools Scripting, and modeling in MAYA.

Preliminary Observations

Thus far, I've learned that VXML is powerful, but proprietary. Voice engines are a science in and of themselves. VXML as a specification is multimodal, so if you want to access data from the web, it can be formatted for voice too. I know that voice commands can be captured in a web page through Processing or platform-specific JavaScripting, but I really don't know how it's going to work with exported Virtools .cmo's.

References

Here is a link to groundbreaking work in 3D by NYU's MRL lab and Ken Perlin's "Responsive Face": http://mrl.nyu.edu/~perlin/experiments/head/Face.html The expressiveness of this model is remarkable.

On the voice-recognition side, inspiration for the project input came from Dennis Crowley's Ubiquitous Computing for Mobile Devices where I learned how to interface a mobile phone to the web and a database.


Monday, April 14, 2008

Found the .cmo

Friday, March 28, 2008

Reflections Two Years Later

Wednesday, March 12, 2008

Voice Interface Article from Speech Technology Magazine

Friday, May 11, 2007

DUMBO Artist's Studio Show

Sunday, August 06, 2006

What is this?

Friday, August 04, 2006

SIGGRAPH 2006

Friday, June 30, 2006

swf

Friday, May 12, 2006

Show, continued user testing

Wednesday, April 26, 2006

Visual Interface

Sunday, April 23, 2006

Z-Buffer

Thursday, April 20, 2006

Call Volume

Wednesday, April 19, 2006

.avi

Tuesday, April 18, 2006

Bad Demo

Monday, April 17, 2006

Web Playback

Design Problem

Thursday, April 13, 2006

Final Design

User Testing II

Monday, April 10, 2006

Server Issues III

Scaling/Shaders

Sunday, April 09, 2006

Telephony problems

Friday, April 07, 2006

VXML Bug II

VXML Bug

Wednesday, April 05, 2006

Starting the terrain

Scaled for 10 callers

Tuesday, April 04, 2006

Database

Interface for Scalable Environment

Call Switching from the web back end

Sunday, April 02, 2006

Qualitative Comments

Session Management in VoiceXML/PHP

Wednesday, March 29, 2006

User Response

Thursday, March 23, 2006

Critique Notes, in short

Tuesday, March 21, 2006

Phone number and URL

I S L A N D Screen shots

Monday, March 20, 2006

Visualization

Wednesday, March 15, 2006

Camera Array

Tuesday, March 14, 2006

NURBS to POLYGONS

Particles

Sunday, March 12, 2006

2 Actors

Audio, .cmo, and test app. QA

2 Feeds

Saturday, March 11, 2006

Screenshots of basic working mechanism

Sound

Friday, March 10, 2006

Files Retrieved

Thursday, March 09, 2006

Server Crash

Wednesday, March 08, 2006

Polygonal head expression issues II

Saturday, March 04, 2006

Thesis Statement Expanded - Weekend Tasks