Aug 10, 2011

I look at you and say “hey” – as if I just ran into you at the water cooler, or stopped by your cube, but without the planes, trains and automobiles to get us to the same physical location. In fact, there is an ocean in between us. We have a quick conversation and move on.

The scenario above is NOT video calling, video conferencing or telepresence. Making a video call is work and involves uncertainty and planning. The scenario above involves instant personal interaction, rather than the transaction of a call. The impromptu meeting above requires real-time, always-on, multi-way video. Video calls, yes even the best engineered telepresence sessions at 1080p with 30 FPS, actually limit or prohibit this type of interaction.

This is not the future of all video conferencing or telepresence, but it is the future for use cases such as distributed work teams that work best with seamless real-time video interaction. Compare IM and mobile push-to-talk (like the service Nextel pioneered in the US) to email and phone calls to see the differences between the always-on paradigm and the transaction paradigm. Now we need to extend the always-on paradigm to video.

We need three main developments to get to an always-on video paradigm:

  1. Move beyond SIP and H.323
  2. Visual presence
  3. Personal network enabling services

Beyond SIP and H.323

Video calls are transactions, not real-time interaction. Video calls are in fact multi-step transactions – find number, initiate call, hope it works, hope the other party picks up. Transactions that involve friction such as loss of time, uncertainty, delay, error, aggravation and overhead. Transactions that may be different depending on who you are calling and where you are calling from, adding more friction and moving us further away from real-time interaction.

SIP and H.323 are designed for calling transactions, not real-time interaction. While you are waiting for your call to connect, SIP or H.323 is setting up the session. This includes aspects like authentication, capabilities exchange, codec negotiation and opening of ports. For use cases such as distributed workers in a team that require real-time interaction, we need protocols and methods to pre-build the video sessions, proactively “assuming” our video clients or endpoints will be talking. Methods and protocols that are built for the always-on paradigm, rather than the every-call-is-a-brand-new-call model of SIP and H.323.

We don’t necessarily need to entirely replace SIP and H.323 – we can overlay and augment – for example proactively exchange capabilities, broadcast relevant updates, cache control plane information, perform periodic discovery, put sessions in a “sleep” mode with a quick wake-up, etc. To get to always-on video, we need to stop setting up each and every call as if it were the first call ever attempted, either with new protocols or by overlaying new methods on the existing ones.
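To make the overlay idea concrete, here is a minimal sketch of a pre-built session cache: capabilities are exchanged once, ahead of time, and the session is parked in a “sleep” state so that waking it skips negotiation. It is not part of SIP, H.323 or any real product; the names `SessionCache`, `WarmSession`, `prebuild`, `wake` and `sleep` are hypothetical, and codec selection is reduced to “first codec both sides list”.

```python
from dataclasses import dataclass, field
from enum import Enum


class SessionState(Enum):
    ASLEEP = "asleep"   # negotiated and cached, no media flowing
    ACTIVE = "active"   # media flowing


@dataclass
class WarmSession:
    """A hypothetical session whose negotiation is already done."""
    peer: str
    codec: str
    state: SessionState = SessionState.ASLEEP


@dataclass
class SessionCache:
    """Hypothetical cache of pre-negotiated sessions, one per peer."""
    local_codecs: list[str]
    sessions: dict[str, WarmSession] = field(default_factory=dict)

    def prebuild(self, peer: str, peer_codecs: list[str]) -> WarmSession:
        # Negotiate once, before anyone wants to talk:
        # pick the first local codec the peer also supports.
        common = [c for c in self.local_codecs if c in peer_codecs]
        if not common:
            raise ValueError(f"no common codec with {peer}")
        session = WarmSession(peer=peer, codec=common[0])
        self.sessions[peer] = session
        return session

    def wake(self, peer: str) -> WarmSession:
        # Resume a parked session without renegotiating.
        # Raises KeyError if the peer was never pre-built.
        session = self.sessions[peer]
        session.state = SessionState.ACTIVE
        return session

    def sleep(self, peer: str) -> None:
        # Park the session again but keep the negotiated state cached.
        self.sessions[peer].state = SessionState.ASLEEP


cache = SessionCache(local_codecs=["H.264", "H.263"])
cache.prebuild("alice", ["H.263", "H.264"])
resumed = cache.wake("alice")
print(resumed.codec, resumed.state.value)
```

The point of the design is that `wake` does no negotiation at all; everything that makes a call feel like a transaction happens in `prebuild`, long before the conversation starts.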

Visual presence

How does presence work if we are in the same physical area? I see and hear you and observe what you are doing, your expressions, any activity going on in the background, any cues you give me, etc. and judge whether you are available for a conversation. Online presence, on the other hand, is still mainly one-dimensional: a green light icon that just means you are online on some device or service. We can do better than that. We need visual, multi-dimensional presence.

For much of the past ten years, I’ve managed large, distributed, international teams, mainly from my home office. My tools have improved dramatically, especially for video. I now have a Cisco Telepresence EX90 with HD video calling, a touch-screen control for normal use, a web UI and CLI for more robust functionality, and decent interoperability – both SIP and H.323. But visual presence would make my current EX90 presence paradigm seem like a fax machine.

At the high end of the future visual presence spectrum, my EX90 continually transmits video. Audio is muted (until I choose to send it), but a text overlay at the bottom of my video tells my co-workers if I am sending or receiving audio in other conversations. Of course, that is a lot of bandwidth and potentially many MCU or switching ports for large distributed teams. But it will happen in some cases.

At the lower end of the future visual presence spectrum, my EX90 (or video client on my tablet or PC) will periodically broadcast a relatively low resolution “snapshot” to my team every few minutes. The snapshot could be a still image with a text overlay describing the audio state at that time, or a short video clip.
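As a rough illustration of the low-end case, the snapshot can be thought of as a small record plus a timer. The sketch below is an assumption-laden toy, not an EX90 feature or API: `PresenceSnapshot`, `AudioState`, `overlay_text` and `is_due` are hypothetical names, and the 180-second interval is just one reading of “every few minutes”.

```python
from dataclasses import dataclass
from enum import Enum


class AudioState(Enum):
    MUTED = "audio muted"
    IN_CONVERSATION = "in another conversation"


@dataclass
class PresenceSnapshot:
    """Hypothetical low-resolution presence update sent to the team."""
    user: str
    taken_at: float      # seconds since the epoch
    audio: AudioState    # audio state at the moment of capture
    image: bytes         # low-resolution still; encoding left open

    def overlay_text(self) -> str:
        # Text shown at the bottom of the snapshot.
        return f"{self.user}: {self.audio.value}"


def is_due(last_sent: float, now: float, interval_s: float = 180.0) -> bool:
    """True when at least interval_s seconds have passed since the last send."""
    return now - last_sent >= interval_s


snap = PresenceSnapshot("bob", taken_at=0.0, audio=AudioState.MUTED, image=b"")
print(snap.overlay_text())
```

A client would call `is_due` from whatever timer or event loop it already has and send a fresh `PresenceSnapshot` whenever it returns true.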

In the examples above, or similar paradigms, visual presence will enable us to meet at the water cooler at any time. Seamlessly, without friction, without work. I can tap different icons to start with IM only (like waving to you), voice-only (like saying “hi”) or go right to full video (like walking into your office). When we are ready for video, we instantly resume our video session, already connected via methods like the ones described above.

I can’t quantify how much this would improve the effectiveness of distributed teams, or how much it would help people work remotely who otherwise wouldn’t do very well in that paradigm. It would certainly be an order of magnitude “larger” than the sum of all the technology and process improvements that I’ve seen in my experience managing distributed teams.

Personal video networks

Consider how Skype’s personal network building functionality enables us to quickly build our personal networks and easily communicate across those networks. That functionality gives us dial tone. Yes, it is only dial tone on the Skype island. However, I know anyone can easily and quickly get on the island, so it is not a prohibitive barrier.

Enterprise video services, however, are full of barriers, because the islands are usually defined by IT administrators, not controlled by each user. This needs to change. Enterprise video service providers and vendors need to add personal video network building capabilities and integrate with other personal networks so that we can have always-on video communications and visual presence.

Timing for moving beyond SIP, adding visual presence and making enterprise video into personal network-centric services? I don’t know but it will happen and I look forward to seeing you at the water cooler, anytime, anywhere, instantly.

  • Zock

    You are right. My thoughts were going in the same direction. I see great obstacles with security. However, IM started in personal, informal networks where security is not so important. I believe these ideas will arrive in the near future. Moreover, there are corporate platforms that are ready for always-on video, like Cisco Webex Connect and IBM Sametime. I think that Microsoft has something similar.
    Anyway, I think that always-on video will start as personal, non-corporate free software, just as ICQ was the first really popular IM, at least in my country.

  • Vuongdnguyen

    I agree that the traditional SIP solution to video conferencing is not adequate. Shacast Communicator is a SIP phone that complies with the SIP standard, but we also overlay a P2P layer on top in order to achieve full P2P video conferencing. You can have up to 5-way video conferencing without the need for any server. For more information please visit
