On Point Writing Solutions
 

Voice 2.0: What's It All About

Voice 2.0 didn’t come into being at one fixed point in time so much as it evolved through technologies and concepts that allowed real-time, IP-based transmission of voice, video, data and instant messages from anywhere. That process began back in the 1990s and continues today. The cost-savings implications for telecom are huge, and those savings will grow as Voice 2.0 apps are developed that anticipate and accommodate all the ways that IP telephony users want to communicate.

This doesn’t pre-empt carriers from retaining a piece of the action. If anything, they can be part of the evolution. How? Sprint’s corporate strategy VP Russ McGuire suggests two things: operate networks that are reliable and robust enough to handle the expanded communications traffic, and be flexible enough to let customers have a say in how they receive the service. It becomes all about how the carriers support applications on their networks.

Voice 2.0: The Basics and More

Definition and Buzzword

Voice 2.0 is the next step up from Voice 1.0 – the traditional telephony system we all knew growing up. It’s about voice, but it’s so much more than voice – so the term is more buzzword than exact description. Generally speaking, it refers to how developments in technology, merged with new and expanding needs for business-process and personal communications, find their expression in Web-based IP telephony applications.

While these are voice-enabled apps, voice is just the start. Indeed, in some quarters, pure voice is seen as the least of the communication modes, and the one that will drop off in value. ifByPhone CEO Irv Shapiro takes that view, expressing a preference for his firm’s “apptel” business – i.e., merging apps and communications – to working in the declining space of raw voice calls. Because it’s in that “apptel” niche, Shapiro doesn’t “have to compete with the race for voice calls to the bottom”.

Though Voice 2.0 isn’t that new anymore, there aren’t many well-known examples of it. Skype, with its video component, is one such app. American Express’ customer service operation uses a Smart Interactive Voice Response app. Voice 2.0 is the driver for some of the iPhone apps – things like video capture, editing and sharing; and multi-purpose messaging via text, photos, video and audio (also message forwarding). GrandCentral’s Voice 2.0 offering, which Google obtained in 2007, generated a lot of interest with its palette of features, including next-generation answering machine capabilities and versatile call-routing and call-switching functionality. It also provided some standard VoIP facility.

In and of itself, VoIP can be part of the Voice 2.0 mix, but as a standalone service (e.g., Magic Jack and Vonage) it wouldn’t be considered true Voice 2.0.

PSTN Interface

The beauty of Voice 2.0 is that it works the way people already expect. They don’t need to employ special codes or get retrained to make sense of it. It’s user-friendly, so users simply place calls over the Public Switched Telephone Network (PSTN, a.k.a. Plain Old Telephone Service, or POTS) as they would from a regular phone. The devices and software do the work through Dual-Tone Multi-Frequency (DTMF) touch tones.
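To make the touch-tone mechanism concrete, here is a minimal Python sketch. It is an illustration only, not code from any product mentioned in this article. Each key is signaled by two simultaneous tones, one from a low-frequency group and one from a high-frequency group; the frequencies are the standard DTMF assignments, while the function names, the 8 kHz sample rate and the 100 ms duration are assumptions chosen for the example.

```python
import math

# Standard DTMF grid: each key pairs one row (low) tone with one column (high) tone.
ROW_HZ = (697, 770, 852, 941)
COL_HZ = (1209, 1336, 1477, 1633)
KEYPAD = ("123A", "456B", "789C", "*0#D")


def dtmf_pair(key):
    """Return the (low, high) frequency pair in Hz for a single keypad key."""
    if len(key) != 1:
        raise ValueError(f"expected one key, got {key!r}")
    for row, keys in enumerate(KEYPAD):
        col = keys.find(key)
        if col != -1:
            return ROW_HZ[row], COL_HZ[col]
    raise ValueError(f"not a DTMF key: {key!r}")


def dtmf_samples(key, duration_s=0.1, sample_rate=8000):
    """Synthesize one key press as the sum of its two sine tones.

    duration_s and sample_rate are illustrative defaults (assumptions).
    """
    low, high = dtmf_pair(key)
    n = round(duration_s * sample_rate)
    return [
        0.5 * math.sin(2 * math.pi * low * t / sample_rate)
        + 0.5 * math.sin(2 * math.pi * high * t / sample_rate)
        for t in range(n)
    ]
```

A receiving system does the reverse: it detects which two frequencies are present and maps them back to the key that was pressed.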

Voice 2.0 is the logical extension of the migration of legacy PSTN onto a VoIP foundation, since Voice 2.0 is the integration of IP telephony and the Web. Again, the value of the old networks in this new world is that they are large enough to support the resulting new technologies.

Traditional Voice Applications

Whatever happens with traditional voice applications in the future, they are the stuff that distinguishes Voice 2.0 now. These include IVR (where a computer identifies voice and keypad entries), conferencing, text-to-voice (as with your iPhone reading your e-mail aloud to you), and unified messaging. That last one, unified messaging, suggests the larger goal of Voice 2.0: a converged system that ties in all modes of communication so they can be accessed from any single interface, wherever you are. That, in turn, points to something called Fixed Mobile Convergence, which is the full integration of VoIP, landline and cellular communications. Everything happens for you from a single telephone number. Industry observer Robert Poe of VoIP News points out the huge benefit of this capability for callers: they can own their own phone numbers and link them through any service or provider they choose. That’s a big step up from today’s number portability.

Business-Process Automation with Voice

The sophistication of new tools like VoiceXML is making it possible to create even richer, more expansive voice applications with more business-process value. Based upon voice-recognition software, VoiceXML gives users access to automated information and services via the Web through telephones and other voice interfaces. This is a big money-saver, too: developers build the automated voice services with the same technology used to build visual Web sites, so the infrastructure and service delivery costs are reduced.
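As a rough illustration of how a voice service can be built with ordinary Web tooling, the Python sketch below generates a small VoiceXML 2.0 menu document of the kind a Web server might hand to a voice browser. The element and attribute names (vxml, menu, prompt, choice, dtmf, next) come from the VoiceXML 2.0 standard; the function name, the menu wording and the target URLs are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"  # VoiceXML 2.0 namespace


def build_menu(prompt_text, choices):
    """Build a one-menu VoiceXML document and return it as a string.

    choices: list of (dtmf_key, label, target_url) tuples.
    The target URLs are placeholders supplied by the caller (assumption).
    """
    vxml = ET.Element("vxml", {"version": "2.0", "xmlns": VXML_NS})
    menu = ET.SubElement(vxml, "menu", {"id": "main"})
    ET.SubElement(menu, "prompt").text = prompt_text
    for key, label, target in choices:
        # Each choice can be selected by speaking its label or pressing its key.
        choice = ET.SubElement(menu, "choice", {"dtmf": key, "next": target})
        choice.text = label
    return ET.tostring(vxml, encoding="unicode")


if __name__ == "__main__":
    print(build_menu(
        "Say sales or press 1. Say support or press 2.",
        [
            ("1", "sales", "http://example.com/sales.vxml"),
            ("2", "support", "http://example.com/support.vxml"),
        ],
    ))
```

Because the output is plain XML served over HTTP, the same servers and scripting languages that deliver visual pages can deliver voice dialogs.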

Where voice and business process automation really complement one another is through voice-enabled IT apps like sales force automation, Customer Relationship Management (CRM), payroll, accounting and e-mail. Every operational tool in an enterprise becomes a communications mode with voice. Right now, soft phones let business travelers use their laptops to make online phone calls, but the next generation soft phones will become platforms for building a comprehensive system of applications for knowledge workers.

The Voice-Web Mash-Up

Without knowing what they are, we can know that a whole new class of applications will come out of the mash-up of voice and the Internet that is the voice Web. VoiceXML is the jumping off point for this process, and then tools like PHP Voice – a voice-designated programming language similar to VoiceXML – will carry it further with web scripting tools that can create more sophisticated voice applications.

The wealth of services that can come out of the voice-enabled Web is where the real business-process value lies. Application possibilities are only limited by the imagination. Consider some of them already in play: MLS real-estate descriptions coupled with mapping; voice-activated matchmaker services; variations on customer service, involving functions like inventory, ordering, availability and delivery tracking. All of these and more will emerge out of the fertile combination of text, voice, Web and programmatic access to data.

Voice 2.0 and The Cloud

Like anything else involving the Web these days, Voice 2.0 has its place in the cloud. Mario Dal Canto, the CEO and Chairman of cloud computing platform provider SIMtone Corp., has observed that cloud computing makes the network and service provider data center the repository of all programming and maintenance challenges. Hence, the end-point is agnostic to these operations. What that means is that Voice 2.0 transmissions via the cloud can be customized to fit any end-point configuration. Voice 2.0 enables voice on the Web so that it can be ubiquitous and take form in voice-to-voice, voice-to-Web and Web-to-voice communications. This makes it easier for service providers to build more efficient Voice 2.0 offerings. That directly addresses the problems Voice 2.0 has faced in gaining widespread adoption, because it’s been a costly and difficult undertaking for Web services to deliver rich content in real time.

The cloud can provide a hosted model for service delivery of Voice 2.0 applications. The app servers are managed in hosted infrastructure, and a hosted global managed service network is the routing nexus for all incoming and outgoing calls. And, hosted voice can reliably deliver business services to homes as well as enterprises. In writing recently for No Jitter, a leading online community idea exchange about subjects such as unified communications, Dave Michels identified hosted voice as one of several key influences the cloud will have upon voice. Among the others are:

  • Simplified PBX, where multiple delivery systems and services can easily and inexpensively send advanced voice features (e.g., voice-mail transcription; conferencing and conference transcription; call recording, call center queues and click-to-dial) to any phone system in the world.
  • Voice-enabling APIs, where new-line cloud enterprises are using APIs as catalysts for creating voice-on-demand services from outside-source data or functionality.
  • Phone top applications which exploit the largely untapped browser capabilities of IP phones to develop specialized applications. Since most desk phones are adjacent to computer desktops, with their much greater functionality, this hasn’t happened much. But that’s changing with apps like attendance and school/emergency safety services (for schools) and room service, valet parking and tee times (for the hospitality industry). It only makes sense, because phones are located in more places than computers.
  • Super PBX mobility, where the PSTN’s Direct Inward Dialing (DID) numbers accommodate voice system integrations. This is a higher-level PBX, and it makes all sorts of phones – cell, home, digital, office, soft and even hotel – into virtual PBX extensions that not only route calls to the phones’ DID numbers, but stay connected to monitor the call for more service requests (e.g., for recording or call transfer).
  • Virtual phone, the counterpart to phone top applications, in that it aggressively pushes soft phone solutions which are much more powerful than a phone. We can expect that an expanding array of enterprise services like collaboration, mobility and videoconferencing will create more integration between unified communications services and the desktop computer.
  • Distributed phone systems, which transcend the fixed-site location of phone systems to make system components available nationwide. This farms out pieces of the system to cities across the country (e.g., the conference bridge could be sent to Houston, the main call manager to San Antonio, and the voice mail and session manager to Portland and Seattle, respectively).

The larger point here isn’t the specific applications that Voice 2.0 will run through the cloud. Sure, we’ll have phone systems grafted onto virtual in-cloud servers and tied in with multiple APIs and PSTN receivers. We’ll have XML apps that are browser-activated and run API calls right through desktop computers. No, the larger point is that enterprise voice will realize its full innovative potential in the cloud.

Voice-recognition apps that are hosted in-cloud can produce multiple benefits. One noted speech recognition consultant lists several of them:

  • Front-end capital investment is eliminated or drastically reduced;
  • Scaling and upgrading are easier, and capacity can meet demand for high or quickly-changing call volumes;
  • New apps are quickly deployed to market;
  • Integration with existing web self-service is smoother;
  • Best-of-breed technologies can be used;
  • Network reliability and redundancy are top-notch; and
  • There’s compatibility with legacy telephone and Web infrastructure.

Call Fire, a provider of on-demand cloud telephony systems, has prescribed the essential characteristics such systems should include: namely, a scalable infrastructure that can handle large call-volume spikes; customizable IVR that is vendor-agnostic; Web, mobile and API interfaces that manage all your voice services in real time, wherever they are; and text-to-speech conversion that is clear, crisp and free of delay. It should go without saying that these should be user-friendly systems backed by dependable tech support.

The Good, The Bad and The (Potentially) Ugly

Strengths

Voice 2.0 has formidable advantages, starting with cost. It’s simply cheaper to provide than POTS. And, in tandem with the enhanced web functionality that is sometimes called Web 2.0, the sophistication of enterprise applications and employee productivity are expanding apace. In his blog for iPhone Developers Journal, John Hart describes how the emergence of new voice and rich-media technologies has software vendors “integrating Web 2.0 collaborative features and enhanced voice features – like click-to-call, automatic conferencing and find-me, follow-me – into existing enterprise applications, such as customer relationship management”. The productivity gains happen because so many employees – in customer service and sales, among other places – can reach more customers more quickly. While VoIP is a starting place for Voice 2.0, it, too, has features that can shrink operational expenses from international and other long-distance calls.

The convergence of voice, video, instant messaging and data along the same IP-enabled path is the primary economy introduced through Voice 2.0. Businesses can merge all this traffic onto one line, which slashes or even eliminates set-up and infrastructure costs. This isn’t a given, however, because service providers and Web developers have to devise common standards and equipment that can run all these types of data smoothly. And, the telcos and other network operators must have broad and deep enough networks to shoulder the voice- and video-traffic loads that advanced technologies have – hopefully – decreased to manageable size.

Because Voice 2.0 opens up a whole new universe of communications play beyond the single, static, two-way exchange of Voice 1.0, it perfectly complements the dynamic, interactive character of rich media. But there are multiple factors that businesses must consider in deciding how to integrate the voice-and-rich-media combination into applications. From a cost standpoint, Hart raises a few key questions:

  • Is the deployment of these capabilities affordable, or can the enterprise buy voice and rich media features for on-demand usage and lower the financial risk and capital costs of the undertaking?
  • At what price will it be possible to augment voice with social media and rich media features?
  • Will operational communications cost more or less?
  • Will deploying voice, mobile and rich media into basic and Web 2.0 applications boost or lower telephony costs?

The over-arching factor is how software vendors will price the voice and Web interactive features they introduce in their next-generation products. Will they charge an ongoing service fee to provide those features or will they make the fee a one-time charge as part of the upgrade? To the extent vendors make some of these features available as basic services, at no extra cost, it will cut deployment costs to enterprise managements and let them pay for only as much service as they use.

Voice 2.0 functionality begins and ends in the same place, really – with all of your phone lines completely integrated to receive and deliver all possible voice-enabled applications. You can switch call functions among and between any and all of your lines without hanging up the call. There’s a continuous relay capability that keeps the communication alive wherever you or your devices are located. The irresistible momentum is towards creating devices with unlimited capability to transmit and receive instant messages, e-mails, videos and calls over the most direct and inexpensive pathways.

At the very least, while these super-versatile devices are in development, Voice 2.0 can certainly make it easier to move calls between gadgets with the help of a SIP-based standard (Session Initiation Protocol) that chooses the best place to route communications at any given moment.

If the solutions providers are on the same wavelength as their enterprise customers, then expanding the capacity of existing telephony networks with new voice-and-Web capabilities can be done seamlessly so that the system doesn’t miss a beat and employees have a very user-friendly learning curve to master the new apps. The new features can be powerful enough to link calls globally rather than just regionally. To the extent vendors build and install features that utilize commonly-accepted inter-operability standards instead of proprietary protocols, the functional strength and reliability of the operating infrastructure increases.

Presence management – which monitors and coordinates the online availability of people to participate in communications – also comes into being with Voice 2.0. The principle is that a product enables a single phone number to access any user communications interface – be it a PDA, landline, cell or laptop. Call management responds to the device the user is working on, their upcoming schedule and their availability. This coordination function creates a lot of efficiency for businesses in that it arranges meetings for team members at times of maximum convenience for all of them. Call centers, too, would benefit from this by saving money on unnecessary calls and voice mails. There’s even a little sleight-of-hand made possible by presence management, where companies have created temporary, disposable phone numbers for communicating the desired information without divulging normal contact data.
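The single-number idea can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not a description of any vendor’s product: the Device fields, the priority scheme and the voicemail fallback are all hypothetical, and a real system would draw presence from signaling and calendar data rather than hard-coded flags.

```python
from dataclasses import dataclass


@dataclass
class Device:
    name: str      # e.g. "cell", "desk", "laptop" (illustrative labels)
    online: bool   # is the device currently reachable?
    busy: bool     # is the user already on a call on this device?
    priority: int  # lower number = preferred device (assumed scheme)


def route_call(devices, do_not_disturb=False):
    """Pick which device should ring for a call to the user's single number.

    Returns the device name, or "voicemail" when nobody can take the call.
    """
    if do_not_disturb:
        return "voicemail"
    available = [d for d in devices if d.online and not d.busy]
    if not available:
        return "voicemail"
    return min(available, key=lambda d: d.priority).name


if __name__ == "__main__":
    devices = [
        Device("desk", online=False, busy=False, priority=1),
        Device("cell", online=True, busy=False, priority=2),
        Device("laptop", online=True, busy=True, priority=3),
    ]
    print(route_call(devices))
```

In this example the desk phone is preferred but offline and the laptop soft phone is busy, so the call goes to the cell phone; if nothing is available, the caller is sent to voice mail rather than ringing a device nobody will answer.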

Other apps, such as faxes, have also gotten in sync with IP telephony with the advent of the necessary inter-operability protocol.

So often, the focus on business-process Voice 2.0 technology is how it makes life easier for the enterprise, through intra-office communications as well as multi-modal transmissions between workers stationed at home and road locations. But, really, the ultimate focus is on the customer, and one of the prime apps to come along for exploiting that relationship is click-to-call. This utilizes a product that lets consumers talk with company sales or customer-service reps immediately. It doesn’t cost either person anything, nobody has to dial a phone number and there’s no need for new software to make it happen, either.

It’s impossible to track the ultimate evolution of Voice 2.0; it’s challenging enough to anticipate everything that could happen in the near term, much less the distant future. But what is inevitable is that the builders and vendors of voice and rich media apps that have Web functionality will be turning out more options with greater versatility. As those apps become more widespread, the cost of adoption and implementation is bound to drop – until the next great new thing arrives. Also, there is clear pressure to give the user more and more control over the direction of voice, video, data and instant messaging communications. So, more open-source and less proprietary technology becomes inevitable, too. Eventually, the user could be making all the decisions about what functionality is available.

Voice 2.0 makes telecom a much higher-quality carrier of long-distance calls. Coding can arrange to have Voice 2.0 companies send these calls over inexpensive, conventional uncompressed voice circuits and use VoIP only for switching purposes. (Relying on VoIP for the complete call process in a multi-modal communications infrastructure can muddy the sound considerably.) This is the best of both worlds – traditional telephony sound and IP versatility.

There’s a competitive impetus for Web-based telecom apps, too. Their selling points are quality and reliability, but they need to match what Voice 2.0 providers can do, since those companies can negate the telco quality advantage by delivering their calls over the same wire-line voice and cellular circuits the telcos use. Hence, telecoms must work with Web coders to develop feature-rich, efficient and convenient apps at a reasonable price. If they make that concession and give up the old monopoly mindset of dictating service options from on high, the telecoms not only can survive, but even thrive in the era of Voice 2.0.

In fact, there are some strong telco assets that will have significant value in the Voice 2.0 universe, at least for the next few years. Sanjay Jhawar, partner in the strategy consulting firm Ideas and Plans, has identified those which can inform application development. These include subscriber intelligence and profile data (which can alert carriers to their customers’ needs to move out of a captive legacy environment, and therefore offer them a more open standard approach); call control inbound triggers; outbound identity; three-screen linkages and content bundles; and video optimization for device and dynamic network conditions.

Being able to carry your own phone number with you throughout your life is one of the signature perks of Voice 2.0. But this means more than just keeping your number for whatever phone mechanism you like. As Poe emphasizes, it frees you up from being tethered to any technology or service provider you don’t like or you suspect may go out of business. Plus, you control who does and doesn’t have access to your number, as you can block unwanted calls at will.

For professional purposes, it’s even better if you can designate and keep your own business number beyond the time you’re actually employed at any particular corporation. Having that number, instead of the one that comes with your office location or computer, lets you stay in contact with choice customers from your prior workplace – to your benefit and that of your current employer.

There’s another twist on portability with mobile phones, too, where you can input multiple numbers to your proprietary cellular phone number and identify the various callers through customized ringtones and call messages. A variation on that application theme lets you use incoming caller IDs to vary your voice mail greetings for different callers. No longer does a one-number, one-phone status limit what you can do with your calls.

As appealing as portability is, it hasn’t been a given for everyone who wants it. Whether the user wants local number portability, for fixed lines, or full mobile number portability, for their cell devices, there have been limits on what numbers can be transferred, and those have had to do with the location of the number, the extent and quality of coverage in its service area and technology restrictions. Some users who had been assigned the same numbers for decades found that they couldn’t port those numbers over to a given carrier to take advantage of its voice apps. To get a number that is portable from that point on, they faced paying a big switching cost.

Fortunately, some of the inertia here is falling away. Last fall, for instance, Google allowed subscribers to begin porting their existing phone numbers over to a Google Voice account. Previously, you had to obtain a new number from Google to utilize their voice app. Though some features, like call screening and recording, aren’t available under that arrangement, Google’s decision raised the stakes for traditional telecom carriers, who may find it harder to compete on services like enhanced voicemail.

Technology developments also require the intelligence capability to correctly route calls through ported numbers and charge appropriate costs for them. The efficiency and timeliness of the call routing are issues in this regard: if calls are misrouted, there may be transit penalties, higher termination costs and corrupted call clarity.
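The routing intelligence described here can be pictured with a short Python sketch. It is a deliberately simplified assumption of how such a lookup might work: individually ported numbers are checked first, and only then does routing fall back to the carrier that originally owned the number block. The function name, the table layout, the phone numbers and the carrier names are all invented for illustration.

```python
def carrier_for(number, ported, block_owners):
    """Resolve which carrier should terminate a call to `number`.

    ported:       dict mapping individual ported numbers to their current carrier.
    block_owners: dict mapping number prefixes to the carrier originally
                  assigned that block (hypothetical data layout).
    """
    # A ported number overrides whatever its original block would suggest.
    if number in ported:
        return ported[number]
    # Otherwise the longest matching prefix identifies the block owner.
    for length in range(len(number), 0, -1):
        owner = block_owners.get(number[:length])
        if owner is not None:
            return owner
    raise LookupError(f"no route for {number}")


if __name__ == "__main__":
    ported = {"3125550142": "Carrier B"}                        # invented example data
    block_owners = {"312555": "Carrier A", "312": "Carrier C"}  # invented example data
    print(carrier_for("3125550142", ported, block_owners))  # ported number
    print(carrier_for("3125550199", ported, block_owners))  # falls back to block owner
```

Skipping the ported-number check is exactly the kind of mistake that sends a call to the wrong carrier first, which is where the transit penalties and higher termination costs come from.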

Weaknesses

Now the caveats about Voice 2.0. For all its realized potential – and future prospects – Voice 2.0 is not the ultimate expression of voice-cum-Web telephony. At least, not yet.

First off, the technology is deficient in one of the most vital indices of telephony communications quality – speed. Computers don’t translate voice entries quickly, which can be unnerving if you’re waiting on a message transmission.

App development is only as advanced as the capabilities of the vendors and solutions providers who are cranking out new tools. At last year’s PTC’09 panel discussion, moderator Gary Kim posed the operative question here: do vendors develop apps that demonstrate a real, incisive understanding of business problems and help facilitate the conduct of business? Coming out with usage features just for their own sake, and unrelated to meeting dollars-and-cents communications needs, doesn’t do much good. The corollary question is this: do these apps inspire the IT people inside the enterprise to build their own telephony solutions? Here’s where open standards are very important, because they can kick-start customers out of the legacy environment and support their own voice app development. Automating processes for customers also helps them serve their clientele with better-targeted apps.

The risk with app development is that the developers can be too smart for their own good. Limited-focus niche apps aren’t helpful with voice-Web communication functions that are basically similar across many kinds of large enterprises and small businesses, yet have differences that reflect each particular industry. Developers need a keen understanding of business logic and processes, as well as technological acumen, to come up with a mix of solutions that are standardized and customized to meet specific needs. As calls become more likely to originate from within web applications over time, voice has to become less commoditized and more concerned with enriching the call interaction, not just making sure the data is transmitted from point to point.

Data network capacity isn’t always sufficient to carry bandwidth-intensive voice and video traffic, and the presence of VoIP further strains networks and even threatens to overwhelm them. So, the technology has to stay at least a step ahead of the voice and video communications volume by spreading it out or reducing it to manageable loads.

The most obvious drawback of Voice 2.0 is that it’s useless where there’s little or no Internet access. That applies to most people in the world.

It may be that new applications for integrating voice, rich media and mobile applications into IP telephony will work perfectly well – in a vacuum. But there’s also the possibility that any one of these could be incompatible with existing business software or disable other applications. Even if these complications don’t occur, the new features may come with a learning curve that’s steep enough to provoke a negative reaction from workers being trained in their proper usage.

The person that Dan Fisher, president and CEO of consulting firm The Copper Group, calls the Digital Native is going to take to all the neat and cool features and possibilities of Voice 2.0 much more readily, as a rule, than the 50- or 60-something corporate lifer for whom the Web became a necessary evil or an acquired taste. That Digital Native, born after 1985, might give her dad an iPhone for his birthday so that he can videotape family gatherings and then edit and send the recording to friends and relatives. For the daughter, that’s one of myriad Voice 2.0 apps she’s comfortable using instinctively. For Dad, maybe that’s as much technological change as he dares to absorb for weeks or months, lest he feel overwhelmed rather than intrigued by the iPhone’s possibilities.

Therein lies the challenge that vendors, service and solutions providers face in acclimating many enterprise users to the new frontier that is Voice 2.0. The preferred approach with people who are uneasy with online work, as well as new to Voice 2.0, may be incremental, giving them one app at a time to master gradually, and then introducing additional apps that build on previously learned skills. For instance, for those technology laggards who at least know caller ID, learning the new-fangled answering machine features of a Voice 2.0 option like Google’s GrandCentral could be a piecemeal process – starting with screening a call and listening to it before you decide to take it. Then, perhaps, the newcomers can learn how to record their calls – with an announcement cuing them about the start of the recording – then figure out how to forward the messages to themselves by e-mail.

What would make the most sense to break down the resistance of those who fear being overwhelmed by all the intricacies of Voice 2.0? Start with fundamentals, those elements having to do with signaling and controlling voice and other message forms that Voice 2.0 makes possible. Two such elements stand out for the Voice 2.0 novice – presence and directories.

Presence technology will appeal to everyone, be they dinosaurs or bleeding-edge technocrats, because it will let them know whether the person they want to call is available, busy or even willing to take the call. This makes all calls more valuable and more timely, and even cuts down on unnecessary voice mail. Conceivably, you could even know before a conference call which potential attendees will be there for it.

Directories, in Voice 2.0, are customizable. You’re not assigned the type of data allowed; you determine it – be it name, address, phone and other contact details, personal and work interests and talents, publication subscriptions, references, credentials, you name it. It’s a very basic tool, easy to flesh out, with infinite data possibilities.

Opportunities

There are so many communities for whom Voice 2.0 represents a chance to put voice and Web integration to work on behalf of their business and personal interests. For telecoms, the conventional wisdom that Voice 2.0 imperils their position needn’t be true at all, as the carriers have the network reach and strength to make maximum use of new apps on behalf of their subscribers.

For Web developers, Voice 2.0 is an invitation to test and surpass the boundaries of their creative powers and imagination in devising apps that exhaust the full range of Voice 2.0 possibilities for fixed mobile convergence, the complete integration of Web 2.0 and rich media features and the range of functionality that can be established through a single phone number assigned to a wide range of communications devices and modes.

For enterprise workers, there’s the prospect of immediate, 24/7 availability at the home office and in the field to any business associates, clients, colleagues or prospects. It makes all their calls and other voice-enabled communications highly pertinent and valuable and drastically reduces – even eliminates – wasted time and effort.

For IT managers, this is a tool that lets them anticipate and respond to the real-world business-process needs of their workplace peers and subordinates. They can work in tandem with Web developers and solutions providers, educating them in the business logic that is specific to their industry. In turn, they can learn what they need to develop their own apps in-house from time to time – and in time-sensitive ways that respond to communications challenges as they arise.

Where everybody wins with Voice 2.0 is how it makes it so much easier and cheaper to talk in a variety of voice-enabled communication applications. The variations on the theme are many – among them online conferencing, app sharing, e-commerce and push-to-talk – but all of them are expanding voice communications, making 9-to-5 a thing of the past and making being open for business 24-7 an everyday reality of voice-Web integration.

Safety is another advantage. Authentication protocols for Voice 2.0 make its usage more secure and lessen the chance of connecting with people who fraudulently assume an identity other than their own. Usernames and passwords are among the tools that verify that people are who they claim to be. That not only saves time, but also untold amounts of money that would otherwise go to scammers obtaining your credentials under false pretenses. Voice biometric engines can “fingerprint” the voice user to make a correct identification. These can include suite offerings with multiple enrollment and verification methods that fit IVR, enterprise, Web, call center and mobile applications; LDAP (lightweight directory access protocol) software; vocal PIN replacements for conferencing or voicemail set-ups; and vocal time and location tracking products to keep tabs on populations like a mobile workforce.

Threats

With so much going for it, are there any reasons why Voice 2.0 might fall short of its promise and fail to realize the claims being made for it? Yes, and they start with whatever reluctance there is on the part of potential users to adopt the technology.

The reasons for hesitation are classic to almost any breakthrough technology which upsets the prevailing order and aren’t necessarily unique to Voice 2.0. Fear, for one – of being unable to understand or master the apps. Skepticism, for another – i.e., doubts that the technology can yield solutions to the specific business problems that voice-enabled web is intended to address. Lack of vision, too, is an obstacle: we’ve always done things this way and see no reason to change now. Economics is another. While the cost savings possible through Voice 2.0 have been readily identified, some will balk at any expenses involved in transitioning over to the technology, and perhaps even take a “this-is-too-good-to-be-true” approach to adoption.

Whatever the reasons, the danger of foot-dragging adoption is that the technology never gets used, which is, practically speaking, a death knell for it. This is a particular tragedy in developing nations and economies, where the growing sophistication of communications in the information age can mean the difference between ongoing poverty and ignorance and full entrance into the community of technologically advanced, prospering nations. Harvard Business School has found that it takes almost a half century, on average, between the invention of a technology and its adoption. Countries that lag well behind that average invariably suffer much lower levels of productivity and per capita income.

Voice 2.0 might also be pre-empted by some other technology advance – a Voice 3.0, if you will. Next-generation-network IVR technologies like SIP 2.0 come to mind. This is a VoIP-based media platform that creates the voice prompts, tones, DTMF recognition and recording capabilities that evolving networks need to utilize. It’s a fully-integrated application, much like a fleshed-out Voice 2.0 product would be.

Companies offering hosted-voice services, which have become known as Voice 3.0, bypass the 2.0 providers to let software developers link their applications straight to the PSTN. Click-to-call, where you phone someone from your Web page, is one example of this.

Or the threat to Voice 2.0 could be an easier way to text. Oftentimes, it’s easier to text than call; if a new technology came along – such as a new keyboard – that greatly simplified the texting process, it could undermine the position of voice-based apps.

Could there be too much of a good thing, too, where all sorts of unrelated Voice 2.0 features are thrown into the fixed/mobile convergence pot? This could blur the purpose of those features to the point of irrelevance or confusion.

Finally, there’s the fear of the unknown – specifically, the development of yet-to-appear applications that could make Voice 2.0 obsolete.

Major Players

The pre-eminent companies in the Voice 2.0 space have their distinctive characteristics. Vendors are distinguished chiefly by who provides a platform for building custom applications vs. whose set-up requires you to use or modify an off-the-shelf solution. In most cases, too, they won’t let customers keep their own numbers for operating those cool voice-enabled apps. Instead, users have to take a designated number, and they’ll be billed in the 3-5-cent-per-minute range for usage.

Twilio – As this decade begins, Twilio is the clear leader in Voice 2.0 technology. They’re unique in that they provide a pure platform play for developing apps; they create an agnostic foundation on which literally millions of developers can build Voice 2.0 applications. With almost everyone else, the product-and-service offering is a hybrid of solutions and platforms.

This is where their industry leadership comes in. Since everything on Twilio is built from scratch, they are, arguably, the most versatile in that they provide all the building-block code, tools and techniques to developers, who have free rein to create as they wish with their own servers and databases. Nobody else has as extensive (read: versatile) a set of communications options for talking with callers. You can use any audio format you wish – MP3, WAV, AIFF, etc. – for playing spoken-text audio to callers, and you can record caller audio as a WAV or MP3, or take the URL that represents the Twilio-hosted recording and store that URL in your database.
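As a rough sketch of the play-and-record flow just described, the Python below builds the kind of XML instruction document (Twilio calls it TwiML) that a developer’s own server would hand back to Twilio, then stores the recording URL that comes back afterwards. The example.com URLs, the `recordings` table and both function names are assumptions made for illustration. The `Play` and `Record` verbs and the `RecordingUrl` and `CallSid` callback parameters follow Twilio’s published conventions, but check the current documentation before relying on any detail.

```python
import sqlite3
import xml.etree.ElementTree as ET


def build_twiml(greeting_url: str, callback_url: str) -> str:
    """Return a TwiML document that plays a greeting, then records the caller."""
    response = ET.Element("Response")
    play = ET.SubElement(response, "Play")
    play.text = greeting_url  # hypothetical audio file hosted on your own server
    # Once the recording ends, Twilio requests callback_url with the details.
    ET.SubElement(response, "Record", {"action": callback_url, "maxLength": "30"})
    return ET.tostring(response, encoding="unicode")


def store_recording(db: sqlite3.Connection, params: dict) -> None:
    """Save the recording URL sent to the callback (table name is illustrative)."""
    db.execute("CREATE TABLE IF NOT EXISTS recordings (call_sid TEXT, url TEXT)")
    db.execute(
        "INSERT INTO recordings VALUES (?, ?)",
        (params["CallSid"], params["RecordingUrl"]),
    )
    db.commit()


twiml = build_twiml("https://example.com/greeting.mp3", "https://example.com/recorded")
print(twiml)

db = sqlite3.connect(":memory:")
# Simulated callback parameters; real values arrive in Twilio's HTTP request.
store_recording(db, {"CallSid": "CA-demo", "RecordingUrl": "https://example.com/rec/1"})
print(db.execute("SELECT url FROM recordings").fetchall())
```

In a live deployment the two functions would sit behind Web endpoints on the developer’s server; they are called directly here only so the sketch runs on its own.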

Where Twilio has a drawback, if any, is that it does have some pre-built applications but it doesn’t promote them. And there’s relatively little code available for those apps, though developers are welcome to copy and modify what code there is.

Ifbyphone – An example of a hybrid provider of solutions and platforms, they offer a variety of developer APIs – for phone mashups, cloud telephony and call initiation – through which new applications can be accessed. The phone API meshes voice technology with business processes; the cloud telephony API allows developers to put their Web skills to work integrating things like voice broadcasting and IVR into business apps; and call initiation has imaginative options for connecting callers directly and indirectly. There are also two data set-ups, for sending phone call data to Web servers in real time and when the call is completed.

Callfire – Another hybrid. Like Ifbyphone, they have a set of solutions you can choose from and modify, and they host them on a platform in the cloud. On the solutions end, they provide customized system deployments for up to several hundred server clusters: these include software integration with phone systems, and custom IVR and logic development for them. On the platform side, the choices include a voice broadcast interface and a scalable hosted cloud call center.

(One advantage to hybridization is that your customer pool is probably greater, along with your potential financial return from any given client who wants both ends of your business. The potential drawback is being neither fish nor fowl and perhaps blurring your focus in the eyes of potential customers.)

Voxeo – Voice 2.0 thought leadership guru Thomas Howe heads up this interesting operation, which is strongly oriented towards providing advanced IVR platform options. They collaborate with IVR app developers to create bundled solutions that meld Voxeo’s IVR platform and hosting solutions with pre-built IVR apps.

As platform-heavy as Voxeo is, they view platform features as products, and promote them as such – a stance which takes some getting used to for Voice 2.0 industry participants. Take their IMified platform – the largest of its kind for hosted instant messaging apps. Voxeo portrays it as an entity divided between IM-enabled apps and SMS-enabled (short text messaging) apps. It’s a way to assert the flexibility of their offerings. Other companies, like DSO Software, don’t split up IM and SMS, but rather combine them, as DSO does in its IMSMS tool for sending and receiving SMS messages in an IM time frame.

However bold-thinking Voxeo is, the wonder is that, with Howe at the helm, they’ve given relatively little funding to their ventures, compared with the likes of Twilio.

Simple Signal – This company’s forte is data center-based, in-cloud, unified communications management, with an emphasis on hosted PBX scalability through a nationwide VoIP network to rival that of a carrier. The clincher in their case is the purported savings that users will realize over traditional carriers: more than 50 percent, and up to 75 percent for businesses that utilize their SIP trunking. What’s really novel about their approach is their on-screen toolbar app that announces incoming calls and presents click-on, instant-access call routing options.

Simple Signal really occupies a neutral space between the platform and applications developers, in that it’s creating a means to integrate those two things. So they’re not really in competition with either camp.

Ribbit – They’re in that solutions provider camp. Two products in particular stand out, and they allow developers to utilize tools for creating advanced voice app features. One is pegged for business professionals and links mobile phones and the Web. The second, intended for sales force personnel, integrates cell phones and sophisticated voice automation features directly into Salesforce.com. In both cases, Ribbit has a back support system to help developers market and sell their applications.

Mobivox – This is another hybrid, leaning more in the direction of solutions than platforms, and those solutions are about call feature versatility from any phone. There’s the ability to make and receive calls from anywhere and keep your numbers private, a feature for dictating and sending SMS and e-mail communications, and conferencing capability, too.

Exposition of Terms

Let’s get down to terms – literally – to establish and understand the vocabulary of Voice 2.0.

Smart IVR – Interactive Voice Response (IVR) is where a computer registers keypad and voice input commands. Smart IVR is Web-based, and its integration with an automatic call distributor (ACD) facility provides fast and wide-ranging call-routing capability for an enterprise. It triggers an automatic response in whatever medium you’re using – be it voice, fax, e-mail or something else – so that users communicate their information almost instantaneously.
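To make the IVR-plus-ACD pairing concrete, here is a minimal sketch in Python. It assumes a hypothetical two-option menu and a toy distributor that hands each call to whichever agent has been waiting longest; the menu entries, agent names and class name are all invented for illustration, and a real system would sit on top of a telephony platform rather than plain function calls.

```python
from collections import deque

# Hypothetical menu: keypad digit or spoken keyword -> department queue.
MENU = {"1": "sales", "sales": "sales", "2": "support", "support": "support"}


class CallDistributor:
    """Toy ACD: hands each call to the agent who has waited longest."""

    def __init__(self, agents_by_queue: dict):
        self.agents = {q: deque(names) for q, names in agents_by_queue.items()}

    def route(self, queue: str) -> str:
        agents = self.agents[queue]
        agent = agents.popleft()  # longest-idle agent takes the call
        agents.append(agent)      # then goes to the back of the line
        return agent


def handle_input(acd: CallDistributor, caller_input: str) -> str:
    """Map a keypress or recognized word to a queue and pick an agent."""
    queue = MENU.get(caller_input.strip().lower())
    if queue is None:
        return "replay menu"      # unrecognized input: prompt again
    return f"{queue}: {acd.route(queue)}"


acd = CallDistributor({"sales": ["Ana", "Raj"], "support": ["Li"]})
print(handle_input(acd, "1"))      # sales: Ana
print(handle_input(acd, "Sales"))  # sales: Raj
print(handle_input(acd, "9"))      # replay menu
```

The same lookup serves keypad and voice input because both arrive as text once the IVR has interpreted them, which is what lets one routing rule cover several media.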

Voice to Text – The conversion of voiced words to text happens through a process known as speech recognition – also called automatic or computer speech recognition. While the recognition system can be keyed to identify a particular voice, it can also be set up to recognize anybody’s voice – as in a call center application. Some of the Voice 2.0 applications of voice-to-text include call routing, voice dialing, basic data entry and speech-to-text processing, as with e-mails or word processing.

Text to Voice – Software that converts text to speech is sophisticated enough to make the switch into voices that sound natural. It can take written text like MS Word documents, e-mails and PDF files and change it into spoken words. Some solutions (e.g., NaturalSoft’s NaturalReader) can also turn written text into audio files such as MP3 for iPods and CD players.

SIP – Session Initiation Protocol controls functions such as voice and video calls over the Internet. It can apply to starting, changing and ending both two-party and multi-party calling sessions. Where SIP modifies phone communications, it can change addresses or ports, expand the circle of callers, or add or subtract media streams. There are additional application possibilities like video conferencing, instant messaging, streaming multimedia and presence information.
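SIP messages are plain text, so the start, change and end of a session can be sketched in a few lines of Python. The example below builds three simplified requests in the general shape defined by RFC 3261: an INVITE that offers audio, a second INVITE that adds a video stream, and a BYE. The addresses, tags, ports and helper names are made up for illustration, and a real SIP stack handles much more (responses, authentication, retransmission, branch and tag generation).

```python
def sdp_body(media: list, version: int = 1) -> str:
    """Describe the media streams on offer; addresses and ports are made up."""
    lines = ["v=0", f"o=alice 1 {version} IN IP4 192.0.2.10", "s=-",
             "c=IN IP4 192.0.2.10", "t=0 0"] + media
    return "\r\n".join(lines) + "\r\n"


def sip_request(method: str, caller: str, callee: str, call_id: str,
                cseq: int, to_tag: str = "", sdp: str = "") -> str:
    """Build a simplified SIP request as text (illustrative, not a full stack)."""
    to_header = f"To: <sip:{callee}>" + (f";tag={to_tag}" if to_tag else "")
    lines = [
        f"{method} sip:{callee} SIP/2.0",
        "Via: SIP/2.0/UDP client.example.com;branch=z9hG4bK-demo",
        "Max-Forwards: 70",
        f"From: <sip:{caller}>;tag=alice-tag",
        to_header,
        f"Call-ID: {call_id}",
        f"CSeq: {cseq} {method}",
    ]
    if method == "INVITE":
        lines.append(f"Contact: <sip:{caller}>")
    if sdp:
        lines.append("Content-Type: application/sdp")
    lines.append(f"Content-Length: {len(sdp.encode())}")
    return "\r\n".join(lines) + "\r\n\r\n" + sdp


AUDIO = "m=audio 49170 RTP/AVP 0"
VIDEO = "m=video 51372 RTP/AVP 31"
ALICE, BOB = "alice@example.com", "bob@example.org"

# Start the session with audio only.
start = sip_request("INVITE", ALICE, BOB, "call-1", 1, sdp=sdp_body([AUDIO]))
# Change it mid-call: a second INVITE in the same dialog adds a video stream.
change = sip_request("INVITE", ALICE, BOB, "call-1", 2, to_tag="bob-tag",
                     sdp=sdp_body([AUDIO, VIDEO], version=2))
# End it.
end = sip_request("BYE", ALICE, BOB, "call-1", 3, to_tag="bob-tag")

print(start)
```

Note that the three requests share one Call-ID and count upward in CSeq, which is how both ends know they belong to the same session.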

VoIP – Voice over Internet Protocol transmits voice conversations through the Internet or any other IP-based network. It has many strong points: it can transmit multiple calls over a single broadband-connected phone line; it makes many PSTN options available (e.g., three-way calling and automatic redial); its architecture establishes secure calling through existing common protocols; and VoIP phones can connect to other Web-based services during voice calls, among them video, data-file and message exchange, and audio conferencing.
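A back-of-the-envelope calculation shows why one broadband line can carry several calls at once. The Python sketch below assumes an uncompressed 64 kbit/s voice codec (G.711), one packet every 20 milliseconds and 40 bytes of IP, UDP and RTP headers per packet; the 1 Mbit/s link speed is a hypothetical figure, and link-layer overhead is ignored, so treat the result as an upper bound.

```python
def calls_per_link(link_bps: int, codec_bps: int = 64_000,
                   packet_ms: int = 20, header_bytes: int = 40) -> int:
    """Estimate how many simultaneous one-way voice streams fit on a link."""
    packets_per_second = 1000 // packet_ms             # 50 packets each second
    payload_bytes = codec_bps * packet_ms // 1000 // 8  # 160 bytes of audio
    per_call_bps = (payload_bytes + header_bytes) * 8 * packets_per_second
    return link_bps // per_call_bps


# Hypothetical 1 Mbit/s uplink: each call needs 80,000 bit/s in each direction.
print(calls_per_link(1_000_000))  # 12
```

Compressed codecs shrink the audio payload and raise the count further, though the fixed per-packet headers then make up a larger share of each call.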

VoIP is at the heart of voice-enabled applications whose rate of development seems to be growing exponentially. These include voice-enabled Web pages that provide tech support or accommodate social networking; unified reach numbers that can smoothly migrate between devices; cell-based callback services; and VoIP and Microsoft Office-based relevance engines (i.e., search engine tools that link content and the demand for it).

While VoIP is foundational to Voice 2.0, it isn’t the last word in it. Rather than act as a telephony focal point (like it is with Skype or Cableco), it functions as a catalytic influence in development of Voice 2.0 services.

PBX – The Private Branch Exchange is a private intra-enterprise telephone network where users share some outside lines for calling out. There’s a switchboard and accompanying equipment that’s usually on premises. PBXs switch calls between internal extensions or an inside extension and the national telephone system.

POTS – Plain Old Telephone Service, as it’s called, is the vernacular reference to the Public Switched Telephone Network (PSTN). All the equipment and software works like the traditional telephone, via touch tones.

VoiceXML – Voice Extensible Markup Language is for writing Web pages that users interact with by hearing spoken prompts and jingles, and that they control through voice input. It is the link between the Web and telephones. The resulting audio dialogues contain features like synthesized speech, digitized audio, the recognition of spoken and DTMF keyed input, and telephony. To find out what VoiceXML “feels” like, prospective users can phone into a variety of tutorial voice portals. Some sites also host VoiceXML for free.

VoiceXML documents – also known as files or pages – describe several functions: the previously-mentioned spoken prompts (or synthetic speech), voiced word-and-phrase recognition, DTMF touch-tone recognition, and dialog flow control; and two additional actions: the output of audio files (which are downloaded) and streams (which are played in the Web browser), and telephony control (call transfer and hang-up).
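For a feel of what such a document looks like, here is a small sample that touches most of the functions listed above: a spoken prompt, voice and touch-tone choices, dialog flow between sections, a call transfer and a hang-up. It is wrapped in Python only so the markup can be checked as well-formed and its menu read back. The element and attribute names follow the W3C VoiceXML 2.0 vocabulary, while the wording, section names and phone number are invented for illustration; a real voice platform may expect additional attributes.

```python
import xml.etree.ElementTree as ET

VXML = """<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <menu id="main">
    <prompt>For sales, say sales or press 1. To hang up, say goodbye or press 2.</prompt>
    <choice dtmf="1" next="#sales">sales</choice>
    <choice dtmf="2" next="#bye">goodbye</choice>
  </menu>
  <form id="sales">
    <block><prompt>Transferring you now.</prompt></block>
    <transfer name="call" dest="tel:+15555550100" bridge="true"/>
  </form>
  <form id="bye">
    <block>
      <prompt>Thanks for calling.</prompt>
      <disconnect/>
    </block>
  </form>
</vxml>
"""

NS = {"v": "http://www.w3.org/2001/vxml"}
root = ET.fromstring(VXML)  # raises an error if the markup is not well-formed

# Read the menu back: each choice pairs a spoken word and a key with a target.
for choice in root.findall("v:menu/v:choice", NS):
    print(f"say '{choice.text}' or press {choice.get('dtmf')} -> {choice.get('next')}")

transfer = root.find(".//v:transfer", NS)
print("transfer destination:", transfer.get("dest"))
```

The `next` attributes are what carry the dialog flow: choosing sales jumps to the form that transfers the call, and choosing goodbye jumps to the form that disconnects.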

Soft phone – This is a software program for making phone calls over the Web using any given computer instead of dedicated hardware. It’s frequently designed to function like and resemble a traditional telephone. Ordinarily, the soft phone has a headset connected to the personal computer sound card, or it’s paired with a USB phone.  Calls can be made to other soft phones or traditional phones, and IP telephony service providers might make these calls between personal computers available for free.

Soft switch – A telecommunications network will have a central device, called a soft switch, that uses computer system software to connect calls between phone lines. It usually controls phone connections where circuit and packet networks intersect.